Commonly referred to as a "digital fingerprint," a hash value is a fixed-length code computed from the contents of a computer file. It gives each file an identifier that corresponds to its contents. If the contents change, the file's hash value changes as well, indicating that the file is no longer the same as it was before. In e-discovery, comparing a file's hash values before and after collection verifies that the file was not altered along the way.

To understand how MD5 hashing relates to e-discovery, one must first know what a computer hash is. A hash algorithm is not encryption: it is a one-way function that takes the bits of a file and outputs a short, fixed-length text string that is, for practical purposes, unique to that content. Many hash algorithms have been created over the years, but the one most commonly used in e-discovery today is MD5 ("MD" being short for "message digest"), which produces a 128-bit value usually written as 32 hexadecimal characters. An MD5 hash might look something like:

A558c8b8295854fa69a2ad9a7cc75ab7
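As a rough illustration of how such a value is produced, the sketch below computes the MD5 hash of a file using Python's standard `hashlib` module. The function name `md5_of_file`, the chunk size, and the sample file are assumptions made for this example; commercial e-discovery tools compute hashes internally and may work differently.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hash of a file's contents as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical sample file, created only for this demonstration.
    with tempfile.TemporaryDirectory() as workdir:
        sample = Path(workdir) / "sample.txt"
        sample.write_bytes(b"Quarterly report, draft 1\n")
        print(md5_of_file(sample))
```

Run on any file, the function returns a 32-character hexadecimal string in the same form as the MD5 value shown earlier. Only the bytes inside the file are read, so the file's name and location play no part in the result.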
While the sequence above might look like a random assortment of letters and numbers, it is in fact a revealing digital code: an alphanumeric value representing the contents of a single computer file. If even one character of the data in a file is modified or deleted, the resulting MD5 hash will be completely different from the original. Conversely, if a file is defensibly collected and processed, its hash will not change, even if the file name has been modified, because the hash is computed from the file's contents rather than its name.

Why do hash codes matter for e-discovery?
- Data Integrity: Because any change to a document produces a different MD5 hash, comparing the hash recorded at collection with one computed later exposes any attempt to manipulate potentially relevant evidence.
- De-Duplication: Accurate MD5 hashing in the collection and processing phases of e-discovery allows exact duplicates and known system files to be identified and removed, which in turn lowers e-discovery costs by reducing data volumes in advance of attorney review.
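Both uses can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not a description of how any particular e-discovery platform works: the file names and contents are invented, the documents are held in memory as bytes, and the helper names `md5_hex`, `find_duplicates`, and `is_unchanged` are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of a byte string as hexadecimal text."""
    return hashlib.md5(data).hexdigest()


def find_duplicates(documents):
    """Group file names by the MD5 hash of their contents.

    `documents` maps a file name to its raw bytes. Only groups that
    contain more than one name (exact duplicates) are returned.
    """
    groups = defaultdict(list)
    for name, content in documents.items():
        groups[md5_hex(content)].append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}


def is_unchanged(hash_at_collection: str, hash_now: str) -> bool:
    """Compare two hex digests, ignoring letter case."""
    return hash_at_collection.lower() == hash_now.lower()


# Invented sample data: two files with identical bytes under different
# names, and a third whose content differs by a single character.
collected = {
    "memo.txt": b"Payment is due within 30 days.",
    "memo_renamed.txt": b"Payment is due within 30 days.",
    "memo_edited.txt": b"Payment is due within 90 days.",
}

duplicates = find_duplicates(collected)
for digest, names in duplicates.items():
    print(digest, names)

# Data integrity: the renamed copy still matches, the edited copy does not.
original_hash = md5_hex(collected["memo.txt"])
print(is_unchanged(original_hash, md5_hex(collected["memo_renamed.txt"])))
print(is_unchanged(original_hash, md5_hex(collected["memo_edited.txt"])))
```

Grouping by hash finds only exact, byte-for-byte duplicates. The renamed copy lands in the same group as the original, while the copy with one changed character gets a different hash and is treated as a distinct document.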