Work with BagIt packages from Python.
bagit is a Python library and command line utility for working with BagIt-style packages. BagIt is a minimalist packaging format for digital preservation. If you're not already familiar with BagIt, you may be interested in the BagIt Wikipedia article or the IETF specification.
bagit.py is a single-file Python module that you can drop into your project as needed, or you can install it globally with:
pip install bagit
Python v2.4+ is required.
From Python you can use the bagit module to make a bag like this:
import bagit
bag = bagit.make_bag('mydir', {'Contact-Name': 'Ed Summers'})
Or, if you've got an existing bag:
import bagit
bag = bagit.Bag('/path/to/bag')
Or from the command line:
bagit.py --contact-name 'Ed Summers' mydir
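To make it clearer what ends up on disk after bagging, here is a minimal standard-library sketch of the layout described by the BagIt specification: the payload moves into data/, and bagit.txt, a payload manifest and bag-info.txt are written alongside it. This is not how bagit.py is implemented; `sketch_make_bag` is a hypothetical helper, the choice of SHA-256 is an assumption made for the example, and the code is written for Python 3.

```python
import hashlib
import os
import shutil
import tempfile


def sketch_make_bag(bag_dir, bag_info):
    """Hypothetical helper: turn bag_dir into a minimal bag, in place."""
    # Move the existing contents into a data/ payload directory.
    # (Assumes bag_dir does not already contain an entry named "data".)
    payload = os.listdir(bag_dir)
    data_dir = os.path.join(bag_dir, "data")
    os.mkdir(data_dir)
    for name in payload:
        shutil.move(os.path.join(bag_dir, name), os.path.join(data_dir, name))

    # Bag declaration file required by the specification.
    with open(os.path.join(bag_dir, "bagit.txt"), "w") as f:
        f.write("BagIt-Version: 0.97\n")
        f.write("Tag-File-Character-Encoding: UTF-8\n")

    # Payload manifest: one "<checksum>  <relative path>" line per file.
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w") as manifest:
        for root, _dirs, files in os.walk(data_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                with open(path, "rb") as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
                rel = os.path.relpath(path, bag_dir).replace(os.sep, "/")
                manifest.write("%s  %s\n" % (digest, rel))

    # Optional metadata such as Contact-Name goes into bag-info.txt.
    with open(os.path.join(bag_dir, "bag-info.txt"), "w") as f:
        for key in sorted(bag_info):
            f.write("%s: %s\n" % (key, bag_info[key]))


# Demo on a throwaway directory containing a single file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "hello.txt"), "wb") as f:
    f.write(b"hello\n")
sketch_make_bag(demo_dir, {"Contact-Name": "Ed Summers"})
print(sorted(os.listdir(demo_dir)))
```

The sketch reads each file into memory in one go, which is fine for a demo but not for large payloads.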
If you want to validate a bag you can:
bag = bagit.Bag('/path/to/bag')
if bag.is_valid():
    print("yay :)")
else:
    print("boo :(")
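Conceptually, the core of validation is recomputing each payload checksum and comparing it with the manifest. The sketch below illustrates only that idea with the standard library. `sketch_is_valid` is a hypothetical helper, not the library's implementation, and it skips other checks a full validator would make, such as verifying bagit.txt or detecting payload files missing from the manifest. The SHA-256 manifest name is an assumption for the example.

```python
import hashlib
import os
import tempfile


def sketch_is_valid(bag_dir, algorithm="sha256"):
    """Hypothetical helper: True if every manifest entry matches the file on disk."""
    manifest_path = os.path.join(bag_dir, "manifest-%s.txt" % algorithm)
    if not os.path.isfile(manifest_path):
        return False
    with open(manifest_path) as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue
            # Each line is "<checksum>  <relative path>".
            expected, rel_path = line.split(None, 1)
            path = os.path.join(bag_dir, *rel_path.split("/"))
            if not os.path.isfile(path):
                return False
            with open(path, "rb") as fh:
                actual = hashlib.new(algorithm, fh.read()).hexdigest()
            if actual != expected:
                return False
    return True


# Build a tiny bag by hand, check it, then corrupt the payload and check again.
bag_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(bag_dir, "data"))
payload = os.path.join(bag_dir, "data", "hello.txt")
with open(payload, "wb") as f:
    f.write(b"hello\n")
with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w") as f:
    f.write("%s  data/hello.txt\n" % hashlib.sha256(b"hello\n").hexdigest())

valid_before = sketch_is_valid(bag_dir)
with open(payload, "ab") as f:
    f.write(b"tampered")
valid_after = sketch_is_valid(bag_dir)
print(valid_before, valid_after)
```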
If you'd like to generate the checksums using parallel system processes, instead of single process:
bagit.make_bag('mydir', {'Contact-Name': 'Ed Summers'}, processes=4)
or:
bagit.py --processes 4 --contact-name 'Ed Summers' mydir
bagit.py --help will give the full set of options.
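As a rough illustration of what spreading checksum generation over several processes can look like, here is a standard-library sketch built on multiprocessing.Pool. It is not taken from bagit.py: `checksum` and `checksum_all` are hypothetical names, and SHA-256 and the 1 MiB block size are assumptions for the example. Each file is hashed independently, so the work divides cleanly across workers.

```python
import hashlib
import multiprocessing
import os
import tempfile


def checksum(path, algorithm="sha256", block_size=1024 * 1024):
    """Return (path, hex digest), reading the file in fixed-size blocks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return path, digest.hexdigest()


def checksum_all(paths, processes=1):
    """Hypothetical helper: hash paths serially, or across worker processes."""
    if processes <= 1:
        return [checksum(p) for p in paths]
    pool = multiprocessing.Pool(processes=processes)
    try:
        # Pool.map returns results in the same order as the input paths.
        return pool.map(checksum, paths)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Create a few small files to hash.
    demo_dir = tempfile.mkdtemp()
    paths = []
    for i in range(8):
        path = os.path.join(demo_dir, "file%d.bin" % i)
        with open(path, "wb") as fh:
            fh.write(os.urandom(1024))
        paths.append(path)

    # The parallel and serial strategies should agree.
    parallel = checksum_all(paths, processes=4)
    assert parallel == checksum_all(paths)
    for path, digest in parallel:
        print("%s  %s" % (digest, os.path.basename(path)))
```

With files this small, process start-up cost outweighs any gain; a speed-up is more likely on bags with many large files.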
To run the tests from a checkout of the source:
% git clone git://github.com/LibraryOfCongress/bagit-python.git
% cd bagit-python
% python test.py
If you'd like to see how increasing the parallelization of bag creation affects the time it takes to create a bag on your system, try the included bench utility:
% ./bench.py