What does it mean to hash data and do I really care? | Dataspace (2024)

What is Hashing?

Hashing is simply passing some data through a formula that produces a result, called a hash. That hash is usually a string of characters, and the hashes generated by a given formula are always the same length, regardless of how much data you feed into it. For example, the MD5 formula always produces hashes that are 32 hexadecimal characters long. Regardless of whether you feed in the entire text of MOBY DICK or just the letter C, you’ll always get 32 characters back.
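If you'd like to see the fixed-length behavior for yourself, here is a minimal sketch using Python's standard hashlib module. The md5_hex helper and the repeated sentence standing in for a very long text are my own illustrative choices, not anything special.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


short_input = "C"
# Stand-in for a very long text: one sentence repeated many times.
long_input = "Call me Ishmael. " * 10_000

print(len(short_input), "characters in ->", len(md5_hex(short_input)), "characters out")
print(len(long_input), "characters in ->", len(md5_hex(long_input)), "characters out")
```

Both lines should report 32 characters out, however large the input is.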

Finally (and this is important) each time you run that data through the formula, you get the exact same hash out of it. So, for example, the MD5 hash of the string Dataspace is e2d48e7bc4413d04a4dcb1fe32c877f6, and it will be that same value every time.

Changing even one character will produce an entirely different result. For example, the MD5 hash of dataspace with a lowercase d is 8e8ff9250223973ebcd4d74cd7df26a7.
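The two properties above (same input, same hash; tiny change, very different hash) can be checked with a short sketch like this one. The variable names are mine, and the count of differing characters is just one rough way to show how little the two hashes have in common.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


first = md5_hex("Dataspace")
second = md5_hex("Dataspace")
lowercase = md5_hex("dataspace")

# Same input, same hash, every time.
print(first == second)

# One changed character gives a different hash.
print(first == lowercase)

# Rough measure of how different the two hashes are, position by position.
differing = sum(a != b for a, b in zip(first, lowercase))
print(f"{differing} of {len(first)} hex characters differ")
```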

Hashing is One-Way

Hashing works in one direction only – for a given piece of data, you’ll always get the same hash BUT you can’t turn a hash back into its original data. If you need to go in two directions, you need encrypting, rather than hashing.

With encrypting you pass some data through an encryption formula and get a result that looks something like a hash, but with the biggest difference being that you can take the encrypted result, run it through a decryption formula and get your original data back.
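To make the two-way versus one-way contrast concrete, here is a deliberately simplistic sketch. The XOR routine below is a toy I picked only because it fits in a few lines and round-trips; it is not a secure cipher and the key is made up. Real systems would use a vetted encryption library.

```python
import hashlib
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy XOR 'cipher': applying it twice with the same key restores the data."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"not-a-real-key"  # hypothetical key, for illustration only
message = "Dataspace".encode("utf-8")

# Two directions: encrypt, then decrypt with the same key.
encrypted = xor_bytes(message, key)
decrypted = xor_bytes(encrypted, key)
print(decrypted.decode("utf-8"))

# One direction: hashlib offers no function that turns this back into the message.
digest = hashlib.md5(message).hexdigest()
print(digest)
```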

Remember, hashing is different – you can’t get your original data back simply by running a formula on your hash (a bit about how to hack these, though, in a moment).

What Hash Formulae are Available?

There are a huge number of widely accepted hashing algorithms available for general use. For example, MD5, SHA1, SHA224, SHA256, Snefru… Over time these formulae have become more complex and produce longer hashes which are harder to hack.
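As a quick illustration of how hash length varies by formula, the sketch below runs the same string through four of the algorithms named above using Python's hashlib. Snefru is left out because it is not among the algorithms hashlib guarantees to provide.

```python
import hashlib

data = "Dataspace".encode("utf-8")

# Length of the hexadecimal hash produced by each algorithm.
lengths = {
    name: len(hashlib.new(name, data).hexdigest())
    for name in ("md5", "sha1", "sha224", "sha256")
}

for name, length in lengths.items():
    print(f"{name:>6}: {length} hex characters")
```

The hashes get longer as you move from MD5 (32 characters) through SHA1 (40) and SHA224 (56) to SHA256 (64).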

Hashing capability is available in standard libraries in common programming languages. Here’s a quick example coded in Python (call me if you’d like to walk through this code – I’d love to chat!):

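import hashlib

# Encode the text as UTF-8 bytes, then run it through the MD5 formula
md5_hash = hashlib.md5("Dataspace".encode("utf-8"))

# Print the hash as a hexadecimal string
print(md5_hash.hexdigest())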

The result comes back as: e2d48e7bc4413d04a4dcb1fe32c877f6

Notice that it’s the same as the hash value we created earlier! In the words of Bernadette Peters in THE JERK, “This s***t really works!”

Hashing and Passwords

When an online system stores your credentials, it usually stores both your username and password in a database. There’s a problem here, though: any employee who accesses the database, or any hacker who breaks into the system, can see everyone’s username and password. They can then go out to the logon screen for that system, type in that username and password, and get access to anything that you are allowed to do on that system.

However, if the system stores your password as a hash, then seeing it won’t do a hacker any good. He can see that the hash is, for example, 5f4dcc3b5aa765d61d8327deb882cf99, but he can’t use that to get into the system and look like you. He has no way of knowing that your password (i.e. the value you type into a logon screen) is actually the word password. On the system’s side, whenever you log in, it takes the password you give it, runs it through its hash formula and compares the result to what’s in its database. If they match, you’re in!
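The login check described above might look something like the sketch below. The stored_hashes dictionary stands in for the system's database and the username alice is invented for the example. I've used MD5 only to stay consistent with the hash shown above; a real system would more likely reach for a purpose-built password hashing function.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Hash a password the way this illustrative system stores it."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical credentials table: username -> stored password hash.
stored_hashes = {"alice": "5f4dcc3b5aa765d61d8327deb882cf99"}


def check_login(username: str, password: str) -> bool:
    """Hash the typed password and compare it to the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, hash_password(password))


print(check_login("alice", "password"))  # True
print(check_login("alice", "Password"))  # False
```

Notice that the system never needs to store, or even know, the password itself.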

Can I Break a Hash? Can I Keep Someone Else From Breaking it?

Can hashes be hacked? Absolutely. One of the easiest ways is to access a list of words and the hash that each results in. For example, there are websites that publish millions of words and their related hash values. Anyone (usually a hacker, actually) can go to these sites, search for a hash value and instantly find what the value was before it was hashed.
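Here is a scaled-down sketch of how such a lookup works. The five-word list is made up for the example; the real sites simply do the same thing with many millions of entries.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical word list; hash every word once, up front.
common_words = ["123456", "password", "qwerty", "letmein", "dragon"]
lookup = {md5_hex(word): word for word in common_words}

# A hash a hacker might have found in a stolen database.
leaked_hash = "5f4dcc3b5aa765d61d8327deb882cf99"
print(lookup.get(leaked_hash))  # password
```

Nothing was reversed here. The hash was simply recognized because someone had already hashed the same word.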

To protect against this, security professionals use a technique known as salting. To salt a hash, simply append a known value to the string before you hash it. For example, if every password is salted with the string ‘dog’ before it’s stored in a database, its hash will likely not be found in online databases. So, password salted with dog (i.e. passworddog) and then run through the MD5 calculator becomes 854007583be4c246efc2ee58bf3060e6. (In practice the salt is usually a random value generated for each user and stored alongside the hash, rather than one word shared by everyone.)

To use these passwords when you log in, the system takes the password that you enter, appends the word ‘dog’ to it, runs that string through the hashing algorithm, and finally looks up the result in its database to see if you’re really authorized and if you’ve typed in the right password.
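A rough sketch of salting and the matching login check follows. The function names are mine, and MD5 is used only to match the examples above. The last few lines go one step beyond the ‘dog’ example and show a random salt per user, which is the more common arrangement; the use of secrets.token_hex for that is my assumption about how one might generate it.

```python
import hashlib
import hmac
import secrets


def salted_hash(password: str, salt: str) -> str:
    """Append the salt to the password, then hash the combined string."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()


def verify(password: str, salt: str, stored_hash: str) -> bool:
    """Re-salt and re-hash the typed password, then compare to the stored hash."""
    return hmac.compare_digest(salted_hash(password, salt), stored_hash)


unsalted = hashlib.md5("password".encode("utf-8")).hexdigest()
stored = salted_hash("password", "dog")

print(stored != unsalted)                 # True: the salted hash is a different value
print(verify("password", "dog", stored))  # True
print(verify("passw0rd", "dog", stored))  # False

# With a random salt per user, two users with the same password
# end up with different stored hashes.
salt_a = secrets.token_hex(16)
salt_b = secrets.token_hex(16)
print(salted_hash("password", salt_a) != salted_hash("password", salt_b))
```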

Hey Ben, Do You Know of Other Cool Uses for Hashing?

Why, yes, there are some other great uses for hashing beyond storing passwords. Here are three:

  • Fighting computer viruses: When a computer virus ‘infects’ a program it does so by changing some of the code in that program, making it do something malicious. One way to protect against viruses, therefore, is to create a hash value for a program when it’s distributed to users (i.e. run the computer code through a hashing algorithm and get a hash). Then, whenever that program is run, create a new hash value for the file you’re about to run. Compare the new hash to the original hash. If the two values match then you’re fine. If they don’t match, someone has fiddled with your copy of the program.
  • Change data capture: When reading data into a data warehouse we frequently want to know if any records in our source system changed. To do this we sometimes read every field in every source record and compare it to every field in the related record in our data warehouse – a complex process that requires a lot of computer cycles. However, we can speed it up as follows:
    • Read all the fields in the source record, concatenate them together, and create a hash of the result
    • Compare that hash to a hash value that was stored on the related record in the data warehouse when it was last updated
    • If the two don’t match, you know that the source record has changed and the changes should be migrated to the warehouse
  • Creating smart keys: Dataspace recently released a software as a service (SaaS) product called Golden Record. Golden Record helps data professionals identify and link records together across databases. For example, it can tell you when the same person appears in a database and in a separate spreadsheet. Internally, the product uses hashes extensively. For example, each match is assigned a ‘key’. That key is actually a hash! This is different from traditional mechanisms where records, in this case matches, are assigned the next available sequential number as a key. Here’s why this is useful: because Golden Record knows the formula it used to create that hash, it can easily find any record / match because it also knows the data that was used to create that key. If, instead, the traditional, sequential number were used, the software would have to read through every record in its list of matches until it came to the one it needs.
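The three uses above can be sketched in a few lines each. Everything here is illustrative: the function names, the choice of SHA256, the separator between fields, and the sample records are my assumptions, and match_key in particular is only a guess at the general idea, not how Golden Record actually builds its keys.

```python
import hashlib
import os
import tempfile


def file_hash(path: str) -> str:
    """Hash a file's contents in chunks, so large programs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: str, published_hash: str) -> bool:
    """Compare the file's current hash to the hash published with the program."""
    return file_hash(path) == published_hash


# Assumed separator, so that ("ab", "c") and ("a", "bc") hash differently.
FIELD_SEPARATOR = "\x1f"


def record_hash(fields) -> str:
    """Concatenate a record's fields and hash the result."""
    joined = FIELD_SEPARATOR.join("" if f is None else str(f) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def record_changed(source_fields, stored_hash: str) -> bool:
    """True when the source record no longer matches the warehouse's stored hash."""
    return record_hash(source_fields) != stored_hash


def match_key(name: str, email: str) -> str:
    """Hypothetical smart key: a hash of the normalized identifying fields."""
    return record_hash([name.strip().lower(), email.strip().lower()])


# 1. Fighting viruses: detect a program file that changed after distribution.
with tempfile.TemporaryDirectory() as folder:
    program = os.path.join(folder, "program.bin")
    with open(program, "wb") as handle:
        handle.write(b"original code")
    published = file_hash(program)
    intact_before = is_unmodified(program, published)
    with open(program, "wb") as handle:
        handle.write(b"original code plus something malicious")
    intact_after = is_unmodified(program, published)
print(intact_before, intact_after)  # True False

# 2. Change data capture: compare one hash instead of every field.
warehouse_hash = record_hash(["1001", "Ada Lovelace", "London"])
print(record_changed(["1001", "Ada Lovelace", "London"], warehouse_hash))  # False
print(record_changed(["1001", "Ada Lovelace", "Paris"], warehouse_hash))   # True

# 3. Smart keys: recompute the key from the data and look the match up directly.
matches = {match_key("Ada Lovelace", "ada@example.com"): "match details"}
print(matches.get(match_key(" ada lovelace ", "ADA@example.com")))  # match details
```

In the third case the lookup is a single dictionary access: because the key can be recomputed from the data, there is no list of matches to read through.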

So…

OK, this one got a little out of hand. I was asked to write a short paragraph for our monthly email and ended up with four pages of text. Thanks for hearing me out. I just think the concept of and uses for hashes are way cooler than most people realize.
