Thursday, February 2, 2012

Python MD5 Checksums

I recently did something novel.  I checked the MD5 checksum of a file I downloaded.  For at least 5 years, software package distributors have, as a standard practice, published an MD5 checksum (a 32-character hexadecimal string) alongside their downloads.  You compute the same checksum over the file you received and compare the two; if they match, the data you transferred arrived intact.  I've done this once or twice using software packages specifically designed for checking MD5s in a Windows environment, but I've never gotten any deeper than that.

Today Apache urged me on: "We recommend you use a mirror to download our release builds, but you must verify the integrity of the downloaded files using signatures downloaded from our main distribution directories." [their emphasis]  Oh I must, must I?  Well then okay.

I found the checksum of the package I wanted to download, the org.apache.commons.io package, MD5: 
da4d3ca0be4afeb78e6fde2047bad281
(from here: http://commons.apache.org/io/download_io.cgi)

I then downloaded the zipped io package and Googled how to compute an MD5 checksum with Python... turns out, it's pretty easy and fun.

This guy's blog had the simplest and easiest version:
http://abstracthack.wordpress.com/2007/10/19/calculating-md5-checksum/

He says, do this:

from md5 import md5
fname = "my file path"
s = md5(open(fname, "rb").read()).hexdigest()
print "md5 checksum: %s" % s

So I did this:

from md5 import md5
fname = "D:\\Downloads\\Win\\commons-io-2.1-bin.zip"
s = md5(open(fname, "rb").read()).hexdigest()
print "md5 checksum: %s" % s

and got this:

md5 checksum: da4d3ca0be4afeb78e6fde2047bad281

Hooray!  So now I've done that.
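
The snippet above is Python 2 (the standalone md5 module and the print statement are gone in Python 3).  Here's a minimal sketch of the same check using the standard hashlib module; it reads the file in chunks so a large download doesn't have to fit in memory.  The function names and the chunk size are my own choices for illustration, not something from the original blog.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's MD5 to a published checksum, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For the download above, the call would be something like matches(r"D:\Downloads\Win\commons-io-2.1-bin.zip", "da4d3ca0be4afeb78e6fde2047bad281").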

Tuesday, September 20, 2011

NumPy and MatPlotLib

In a recent project, I needed a good way to plot the probability distribution of activation times of phone devices.  I had all the data available, but didn't want to have to plug it all into Excel every time I needed to see the data visualized, so I asked the internet what I needed and it told me that it was probably the hist function in MatPlotLib.  A prerequisite for MatPlotLib is NumPy.  After a medium-sized series of dead ends and red herrings, I determined that the easiest way to access MatPlotLib was to use the exe installers on their public websites even though I didn't initially want my users to have to do that to see the plots.  Turns out that if I wanted to compile NumPy (at least) from source and include it in my project, I'd have to deal with compiling it via GCC and G77 (Fortran 77 compiler) and I'd have to match the compiler version I use to compile NumPy to the compiler that was used to compile the version of Python that I'm running...and I knew I wouldn't do that stuff right on the first try.

Long story short, it was really easy once I selected the correct version of NumPy and MatPlotLib I wanted (note that MatPlotLib links you to the wrong NumPy SourceForge page).  I would suggest that most users just do it that way; it's much more difficult to include those packages in the project.  The documentation is really good for MatPlotLib, at least for the hist function.  NumPy has a lot of good stuff in it too, of course, and I like its docs as well.


When it was all done, I plotted some real data from one of our installs.  It looks rough, but it's just about all we need, and it will get some polish later.
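
For anyone who wants to try the same thing, here's a rough sketch of a normalized histogram with hist.  The data is synthetic stand-in data (the real activation times aren't in this post), and the function name, labels, and bin count are assumptions for illustration.  It uses the density argument from current MatPlotLib releases; older releases spelled that option normed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import numpy as np


def plot_activation_histogram(times, bins=20, path=None):
    """Plot a normalized histogram of activation times (in seconds)."""
    fig, ax = plt.subplots()
    # density=True scales the bars so that their total area is 1
    heights, edges, _ = ax.hist(times, bins=bins, density=True)
    ax.set_xlabel("Activation time (s)")
    ax.set_ylabel("Probability density")
    ax.set_title("Distribution of activation times")
    if path is not None:
        fig.savefig(path)  # e.g. "activation_times.png"
    plt.close(fig)
    return heights, edges


# Synthetic stand-in data, NOT the real measurements.
rng = np.random.default_rng(0)
sample_times = rng.lognormal(mean=3.0, sigma=0.5, size=1000)
heights, edges = plot_activation_histogram(sample_times)
```

Pass a path to write the figure to disk instead of opening it in Excel every time.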

Thursday, July 14, 2011

Unit Testing Suites

I was recently involved with an upgrade to the unit testing and automation suite that my company uses (I've mentioned it previously).  In developing and implementing the solution to our specific issues, I have had to comb the vast yet not very complete world of unit testing blogs, wikis, and even, at times, subject matter experts' personal emails.

One of those SMEs, or at least one of the advocates and project leaders, Bruno Kinoshita, is in the midst of developing a TAP (Test Anything Protocol) unit testing output display plug-in for Jenkins.  I asked if I could participate in his project to see if we could use the work he was doing for TAP and apply it more broadly to other unit testing suites.  One of the issues we had initially was that each of us uses a similar (but not compatible) flavor of unit testing, and there is no single standard for displaying test results or constructing unit test cases.

In familiarizing Bruno with some of the protocols I had come into contact with, I put together a short primer about what I knew.  I recognize that my own knowledge is limited to just the basics (and I may have gotten some of it wrong) and just what I understand about the suites.  Bruno suggested that posting the info to a blog would be helpful to others who are struggling to understand and possibly even choose which suite to use.  Feel free to correct me when I'm wrong, or to add your own understanding/impressions about the suites I mention.

Original message follows:
-----
Here's my understanding of the nomenclature:

First, xUnit is a generic term used to mean any language's unittest framework (they're all pretty similar in their implementation).  Here's the origin story: http://www.martinfowler.com/bliki/Xunit.html

Some of the unittest packages (for C++, .NET, Java, and Python) include CPPUnit/NUnit/JUnit/PyUnit, and they can all produce output that is consistent with, and inspired by, Smalltalk's unit testing format, SUnit (http://www.xprogramming.com/testfram.htm).  What's good about xUnit output is that no matter what language you use, there's probably a testing package that prints a generic xUnit output- all except JUnit produce output that looks like ...F...E...FF.., followed by lots of details on errors and failures.  JUnit has a formatter that is capable of producing XML unittest output, the kind that Jenkins reads.  (There's one for Perl too: http://search.cpan.org/~mcast/Test-Unit-0.25/lib/Test/Unit.pm)

There are two xUnit plugins we've been talking about- one for Jenkins (https://wiki.jenkins-ci.org/display/JENKINS/xUnit+Plugin) and one for Nose (http://packages.python.org/nose/plugins/xunit.html).  They both are designed to handle XML output, but in different ways- the xUnit plugin for Jenkins is designed to read xUnit style generic output- the ...F...E stuff- and produce XML that is consistent with the XML formatter that JUnit implements.  This is what Jenkins can read.  The xUnit plugin for Nose bypasses the need for the Jenkins plugin and natively produces XML-formatted (JUnit-consistent) test output that Jenkins can read.  Basically, everything has to end up in XML format for Jenkins to be happy.
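
As a rough idea of what "JUnit-consistent" XML looks like, here's a sketch that builds a report with the standard library.  The element and attribute names (testsuite, testcase, failure, error) follow the layout commonly seen in JUnit-style reports; exactly which subset Jenkins requires is an assumption on my part, and the function name and record format are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_junit_xml(suite_name, records):
    """Build JUnit-style XML from (classname, name, outcome, message) tuples.

    outcome is "pass", "failure", or "error" (a convention invented here).
    """
    records = list(records)
    suite = ET.Element("testsuite", {
        "name": suite_name,
        "tests": str(len(records)),
        "failures": str(sum(1 for r in records if r[2] == "failure")),
        "errors": str(sum(1 for r in records if r[2] == "error")),
    })
    for classname, name, outcome, message in records:
        case = ET.SubElement(suite, "testcase",
                             {"classname": classname, "name": name})
        if outcome in ("failure", "error"):
            # a child element marks the test case as failed or errored
            ET.SubElement(case, outcome, {"message": message}).text = message
    return ET.tostring(suite, encoding="unicode")


records = [
    ("demo.LoginTests", "test_valid_user", "pass", ""),
    ("demo.LoginTests", "test_bad_password", "failure", "expected a 401"),
    ("demo.LoginTests", "test_timeout", "error", "connection refused"),
]
xml_text = to_junit_xml("demo", records)
```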

When I mentioned our desire to have additional output including, possibly, data in the test case output, Kohsuke said he thought that might require a change to Jenkins' core, but that he was willing to do that to add the functionality.

One final note is that SubUnit includes a converter from TAP to Subunit.  It also includes a converter from Subunit to JUnit XML format.  This could be a helpful place to look (I'm sure that it's not a complete implementation of what TAP output can do, so you'd probably have to take the bits you want and then extend it). http://pypi.python.org/pypi/python-subunit/0.0.6


The G-Unit framework is a separate issue and should be handled with care.  http://en.wikipedia.org/wiki/G-unit ;-)
 
-Max

Thursday, April 21, 2011

testresources

I have been working on an implementation of testresources, written by Robert Collins, for some time now.  I've gotten it up and running and now have good output in Jenkins... therefore, it seems about time to throw all that away in favor of a different implementation.

Although testresources appears much discussed in Python testing circles (see here, here and here), the documentation that exists on implementing it is...fairly sparse, to put it politely.  So over the last couple of days, I've waded hip deep into the testresources package code.  I've found that the code itself is elegant and smart (I haven't tested the logic behind the various tools used in the implementation- digraph to graph, minimal spanning tree via Kruskal's algorithm, etc.- but I do see that Mr. Collins put a great deal of effort into making it smart and powerful).

In summary, using the testresources package is a great way to share expensive resources (in our case, a Selenium instance that fires up Firefox, imports a profile, and authenticates a user into our web interface) across multiple unit test cases in a suite that all require access to those resources.  The package even optimizes (spelling it optimise to emfasise the British origin of the code, I guess) the use of those resources by re-ordering test cases that call for the same resource so they run together.  In theory, the resource persists only as long as there is a test case waiting to use it, and then it is torn down via its native clean and tearDown methods.  I say "in theory" because if a test case changes the state of the resource and marks it dirtied, the resource is destroyed and re-created; the ordering isn't smart enough to put the test cases that dirty the resource at the end of the queue, so that the ones that can run happily one after the other don't get bogged down.  All told, it does allow you to do some things that PyUnit doesn't allow, and it implements them very well.

Unfortunately, it also limits how you can structure your test cases (everything needs to be flat when you optimize it- one test suite consisting of all the test cases).  That means you can't have nested suites (suites of suites), or sibling suites that contain cases that share resources.
This would work and be optimized with testresources:
<test suite>
     <test case 1>      - uses resource A
     <test case 2>
     <test case 3>      - uses resource A
     <test case 4>      - uses resource A
</test suite>
This doesn't:
<test suites>
     <test suite 1>
          <test case 1>      - uses resource A
          <test case 2>
          <test case 3>      - uses resource A
     </test suite 1>
     <test suite 2>
          <test case 4>      - uses resource A
     </test suite 2>
</test suites> 
So resource A would be created twice (at least), even though  it's essentially the same as the first structure (except that it uses a suite of test suites).
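
One way to get from the second structure back to the first is to flatten the nested suites before running them.  Here's a minimal sketch using only the standard unittest module; it is not part of the testresources API, and the helper names and the example test class are made up for illustration.

```python
import unittest


class ExampleCase(unittest.TestCase):
    # Placeholder test methods standing in for the four cases above.
    def test_case_1(self): pass
    def test_case_2(self): pass
    def test_case_3(self): pass
    def test_case_4(self): pass


def iter_cases(suite):
    """Yield every individual test case inside a (possibly nested) suite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            for case in iter_cases(item):
                yield case
        else:
            yield item


def flatten(suite):
    """Return a new single-level TestSuite holding the same cases in order."""
    flat = unittest.TestSuite()
    flat.addTests(iter_cases(suite))
    return flat


# Rebuild the nested "doesn't work" layout, then flatten it.
suite_1 = unittest.TestSuite([ExampleCase("test_case_1"),
                              ExampleCase("test_case_2"),
                              ExampleCase("test_case_3")])
suite_2 = unittest.TestSuite([ExampleCase("test_case_4")])
nested = unittest.TestSuite([suite_1, suite_2])
flat = flatten(nested)
```

The catch is that flattening throws away the grouping, which is exactly the organization you'd want to see in the test output.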

The next step, I think, is to try an implementation using the nose testing package.  More to follow.

Friday, April 8, 2011

Unit Testing

I'm currently working on a project to reorganize the output of our automated Selenium QA test cases to something that has a finer resolution (each functional test case should have its own output, and that output should be helpful/understandable).

There are a few challenges to this.  The continuous integration package that we currently use is Jenkins (until very recently known as Hudson).  Jenkins can read in XML output from test results, but we don't have any idea what standards Jenkins is looking for- can it handle nests of test cases (called a test suite)?  If it can handle test suite structures in XML, do we have any output generating test runners that can deliver that output?

Unfortunately, as is so often the case with open source code, the documentation to go along with these packages is sparse to nonexistent (to be fair, Jenkins has a pretty good Wiki that is just missing coverage on the pieces that I care about now).  I may have to resort to emailing/posting a message to the developer Kohsuke (who is really responsive, it seems) to figure out what Jenkins is looking for.

We've been looking at zope.test and zope.testrunner as possible candidates to speed up execution of our test cases (they allow an environment layer to be set and the unit test cases to run within that environment).  Currently, we use Python's built-in unittest package to implement test cases and suites.  Because of the way unittest works, each test case is run on the same level: suites of suites of test cases all just become a list of test cases that run in the order they were assembled, so Jenkins displays their results that way, all on the same level.  This is non-optimal because we want to assemble suites that organize sets of test cases around testing a specific function or page completely, and view the output organized the same way.

Because of the dependency on the intermediate XML output, we've been playing with subunit2pyunit within subunit, and before that collective.xmltestreport as a possible translator- we switched to looking at subunit because of that message.

In addition, I just found out how to use setuptools' easy_install, which is pretty fun on installs but an absolute pain to uninstall- might as well call it difficult_uninstall as a warning to potential future users.  (The advantage of easy_install for Python developers is that it sets PYTHONPATH references to packages, downloads the appropriate version much like apt-get on Linux, and does a lot of the dirty work of integrating it all into one location.)  Also, just a note: the zope package is not the same as zope.testrunner (in fact, the zope package doesn't include testrunner anymore, just test).