MD5 is a hashing algorithm that produces a 128-bit hash (digest), commonly used to verify the integrity of data.
When critical data is shared between two parties, the receiver needs to verify that the data was not corrupted in transit. (MD5 catches accidental corruption, but it is not collision resistant, so it should not be relied on against deliberate tampering.)
md5sum is a command-line utility that uses the MD5 algorithm to compute a 128-bit hash of its input.
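To make the "128-bit" part concrete, here is a minimal sketch using Node's built-in crypto module (an alternative to the md5 npm package, not what this post uses later). The helper name md5Hex is illustrative. Whatever the input length, the digest is 16 bytes, which prints as 32 hexadecimal characters (4 bits each).

```javascript
const crypto = require("crypto");

// Illustrative helper: hex-encoded MD5 digest of a string, encoded as UTF-8 first.
function md5Hex(input) {
  return crypto.createHash("md5").update(input, "utf8").digest("hex");
}

// Inputs of very different sizes all yield a 32-character (128-bit) hex digest.
for (const input of ["", "hello world!", "x".repeat(1000000)]) {
  const hex = md5Hex(input);
  console.log(`input length ${input.length} -> digest ${hex} (${hex.length} hex chars)`);
}
```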
Let's first see how it is done in a terminal. Ubuntu has a built-in md5sum command, and macOS has md5:
$ echo 'hello world!' | md5sum
c897d1410af8f2c74fba11b1db511e9e
For the string hello world!, the 128-bit hash (printed as 32 hexadecimal characters) is c897d1410af8f2c74fba11b1db511e9e.
Now let's compare the result with the Node.js library md5 from npm.
Create a simple package.json with md5 as a dependency, and in an index.js file use it to generate the MD5 hash of the string hello world!:
{
  "name": "md5",
  "version": "1.0.0",
  "description": "generate md5sum",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "md5": "^2.3.0"
  }
}
const md5sum = require("md5");
console.log(md5sum("hello world!"));
Let's run index.js
$ node index.js
fc3ff98e8c6a0d3087d515c0473f8677
The hash comes out as fc3ff98e8c6a0d3087d515c0473f8677, which is completely different from c897d1410af8f2c74fba11b1db511e9e.
By default, echo appends a newline character to the end of its output, so hello world! is actually hashed as hello world!\n. To tell echo not to add the trailing newline, use the -n option:
$ echo -n 'hello world!' | md5sum
fc3ff98e8c6a0d3087d515c0473f8677
Now the hash generated in the terminal and the one from the Node.js package are identical.
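The same newline effect can be reproduced entirely in JavaScript. This minimal sketch uses Node's built-in crypto module instead of the md5 package (a swapped-in technique, chosen only because it needs no install); the helper md5Hex is illustrative. The expected digests are the ones the terminal printed above.

```javascript
const crypto = require("crypto");

// Illustrative helper: hex MD5 of a string encoded as UTF-8.
const md5Hex = (s) => crypto.createHash("md5").update(s, "utf8").digest("hex");

// What `echo 'hello world!'` sends to md5sum: the text plus a trailing "\n".
const withNewline = md5Hex("hello world!\n");
// What `echo -n 'hello world!'` sends, and what md5("hello world!") hashes.
const withoutNewline = md5Hex("hello world!");

console.log(withNewline);    // c897d1410af8f2c74fba11b1db511e9e
console.log(withoutNewline); // fc3ff98e8c6a0d3087d515c0473f8677
```

A single extra byte is enough to produce an unrelated digest, which is exactly why the two outputs looked "completely different".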
This works for small strings, but what about files?
Let's create a simple text file named greetings.txt containing the string hello world!. The hash of the file is:
$ md5sum greetings.txt
c897d1410af8f2c74fba11b1db511e9e
This hash is identical to the one produced earlier by echo without the -n option. That is because the file ends with a newline: many text editors add one when saving, since POSIX defines a line as a sequence of characters terminated by a newline. Read more here
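To check this yourself, the minimal sketch below writes a stand-in for greetings.txt the way such an editor would save it (text plus one trailing newline), then inspects the last byte and hashes the raw contents. The temporary file name greetings-demo.txt is hypothetical, and hashing uses Node's built-in crypto module rather than the md5 package.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-in for greetings.txt: the text followed by a trailing newline.
const file = path.join(os.tmpdir(), "greetings-demo.txt");
fs.writeFileSync(file, "hello world!\n");

const bytes = fs.readFileSync(file); // raw Buffer, no text decoding
const endsWithNewline = bytes[bytes.length - 1] === 0x0a; // 0x0a is "\n"
const hex = crypto.createHash("md5").update(bytes).digest("hex");

console.log(endsWithNewline); // true
console.log(hex);             // c897d1410af8f2c74fba11b1db511e9e
```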
On the JavaScript side, reading the file and hashing its contents should give the same result:
const md5sum = require("md5");
const fs = require("fs");
const res = fs.readFileSync("./greetings.txt", { encoding: "utf8", flag: "r" });
console.log(md5sum(res));
$ node index.js
c897d1410af8f2c74fba11b1db511e9e
Yes, they match, because both tools hash exactly the same bytes.
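The file snippet above decodes the file as UTF-8 text, which is fine for greetings.txt. For binary files (images, video), decoding can change the bytes that get hashed: invalid UTF-8 bytes become the replacement character U+FFFD. A minimal sketch of the effect, using Node's built-in crypto module and two made-up bytes; md5Hex is an illustrative helper.

```javascript
const crypto = require("crypto");

// Illustrative helper: strings are hashed as UTF-8, Buffers as raw bytes.
const md5Hex = (data) => crypto.createHash("md5").update(data).digest("hex");

// Two bytes that are not valid UTF-8, as can appear inside a video file.
const raw = Buffer.from([0xff, 0xfe]);

// Decoding as UTF-8 (like readFileSync with { encoding: "utf8" }) turns each
// invalid byte into U+FFFD, which re-encodes to three bytes (EF BF BD).
const decoded = raw.toString("utf8");

console.log(Buffer.byteLength(decoded, "utf8")); // 6 bytes instead of 2
console.log(md5Hex(raw) === md5Hex(decoded));    // false
```

So for non-text files, hashing the Buffer returned by fs.readFileSync without an encoding is the safer choice.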
Let's consider a use case: we need to build a file upload service where a user selects a file on a website and it is uploaded to a remote server. To let the server verify the integrity of the uploaded file, its MD5 hash has to be shared along with it.
Should the hash be created in the browser? The answer is no.
Let's figure out why by trying to generate the hash of a large, 300 MB file with the same approach in Node.js:
const md5sum = require("md5");
const fs = require("fs");
const res = fs.readFileSync("largeFile.mp4", { encoding: "utf8", flag: "r" });
console.log(md5sum(res));
$ node index.js
../.npm/charenc@0.0.2/node_modules/charenc/charenc.js:6
return charenc.bin.stringToBytes(unescape(encodeURIComponent(str)));
^
RangeError: Invalid string length
at encodeURIComponent (<anonymous>)
at Object.stringToBytes (/Users/idks/Developer/test/node_modules/.pnpm/charenc@0.0.2/node_modules/charenc/charenc.js:6:49)
...
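It failed because JavaScript engines cap the maximum length of a string. The md5 package converts its input to bytes with encodeURIComponent (visible in the stack trace), and the resulting string exceeded that cap. Read more here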
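To see why a 300 MB file can overflow, this minimal sketch prints the string-length cap of the running Node.js build and shows how much encodeURIComponent (the call at the top of the stack trace) expands one U+FFFD replacement character, which is what invalid UTF-8 bytes become when a binary file is read with encoding "utf8". The exact cap varies between Node.js/V8 versions.

```javascript
const { constants } = require("buffer");

// Largest string (in UTF-16 code units) this Node.js/V8 build can create.
console.log(constants.MAX_STRING_LENGTH);

// One replacement character grows from 1 code unit to 9 ("%EF%BF%BD"),
// so a few hundred MB of binary data decoded as text can blow past the cap.
const expanded = encodeURIComponent("\uFFFD");
console.log(expanded, expanded.length); // %EF%BF%BD 9
```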
So, ideally, the browser should not be responsible for creating the MD5 hash. Instead, it is the user's responsibility to create the hash and share it with the remote server along with the file, so the server can verify its integrity.
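The server still has to hash the file it received and compare the result with the user's hash. A minimal sketch, assuming a Node.js server with the file already saved to disk: it uses Node's built-in crypto and fs modules (not the md5 package) and streams the file in chunks, so no giant string is ever built and the length limit above does not apply. The names md5File and verifyUpload are hypothetical.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Hypothetical helper: computes the hex MD5 of a file by feeding it to the hash
// chunk by chunk, so the whole file never has to sit in one string.
function md5File(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash("md5");
    fs.createReadStream(filePath)
      .on("data", (chunk) => hash.update(chunk)) // chunk is a Buffer of raw bytes
      .on("end", () => resolve(hash.digest("hex")))
      .on("error", reject);
  });
}

// Hypothetical server-side check: compare with the hash the user supplied
// (e.g. copied from md5sum output), ignoring case and surrounding whitespace.
async function verifyUpload(filePath, expectedHex) {
  const actual = await md5File(filePath);
  return actual === expectedHex.trim().toLowerCase();
}
```

For example, `verifyUpload("largeFile.mp4", hashFromUser)` resolves to true only when the uploaded bytes match what the user hashed locally (hashFromUser is a placeholder for the value sent with the upload).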