Thursday, February 2, 2012

Python MD5 Checksums

I recently did something novel: I checked the MD5 checksum of a file I downloaded.  For at least 5 years, software package distributors have, as a standard practice, included an MD5 checksum (a 32-character hexadecimal string) that can be used (by magic) to verify that the data you transferred arrived intact.  I've done this once or twice using software packages specifically designed for checking MD5s in a Windows environment, but I've never gotten any deeper than that.

Today I did it again, at the urging of Apache: "We recommend you use a mirror to download our release builds, but you must verify the integrity of the downloaded files using signatures downloaded from our main distribution directories." [their emphasis]  Oh, I must, must I?  Well then, okay.

I found the checksum of the package I wanted to download, the org.apache.commons.io package, MD5: 
da4d3ca0be4afeb78e6fde2047bad281
(from here: http://commons.apache.org/io/download_io.cgi)

I then downloaded the zipped io package and Googled how to compute an MD5 checksum with Python.  Turns out, it's pretty easy and fun.

This guy's blog had the simplest and easiest version:
http://abstracthack.wordpress.com/2007/10/19/calculating-md5-checksum/

He says, do this:
from md5 import md5
fname = "my file path"
s = md5(open(fname, "rb").read()).hexdigest()
print "md5 checksum: %s" % s
So I did this:
from md5 import md5
fname = "D:\\Downloads\\Win\\commons-io-2.1-bin.zip"
s = md5(open(fname, "rb").read()).hexdigest()
print "md5 checksum: %s" % s
and got this:
md5 checksum: da4d3ca0be4afeb78e6fde2047bad281 
Hooray!  That's the same string Apache published, so the download arrived intact.  So now I've done that.
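
A note for anyone trying this on Python 3: the md5 module used above was removed there, and print became a function, so the snippet won't run as written.  Here is a rough sketch of the same check using the standard library's hashlib instead.  The function names md5_of_file and matches_checksum and the 64 KB chunk size are my own choices for illustration, not anything from the blog linked above.  Reading in chunks is optional for a small zip, but it keeps memory use flat for big downloads.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file,
        # so the whole download never has to sit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file's MD5 with a published checksum string.

    Surrounding whitespace and letter case in `expected_hex` are ignored,
    since checksums are often copied and pasted from a web page.
    """
    return md5_of_file(path) == expected_hex.strip().lower()
```

With those two helpers, matches_checksum("D:\\Downloads\\Win\\commons-io-2.1-bin.zip", "da4d3ca0be4afeb78e6fde2047bad281") should come back True for the same file I checked above, and False if even one byte of the download differs.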
