Tag Archives: software

‘The DROID we are looking for?’ – Using the open source tools

One of the objectives of this project is to examine how useful and practical the Open Source preservation tools are.

I thought that it would be worth recording my experience in installing and using the software in order to carry out the tests for the project. I’ve heard so many people over the years, from my mother to system administrators, use the phrase ‘I’m not very technical or IT-savvy’ so I won’t say that. I will be honest and confess I am just ‘IT-savvy’ enough to embark on something like this but not savvy enough to master it correctly. I think many of us working in records management occupy this sort of space but it shows that none of the following should be attempted without the input and guidance of your IT department.

Deciding to use open source tools to support records management is a different approach for a records manager. We are used to stages such as building a business case, procuring a system and implementing it. Open source tools are a lot simpler. Find it on SourceForge and – if your IT department says it’s OK – click download and there it is on your desktop.

DROID

According to its website DROID (Digital Record Object Identification) is “a software tool developed by The National Archives to perform automated batch identification of file formats”.

I found this downloading this tool very simple to do and it is very intuitive to use.

The functionality of DROID that stood out for me was the analysis and reporting that it could produce. Fundamental to my conception of the business case for electronic records management is the amount spent on storage and back-up of large, unstructured shared drives. Whilst I had used simple ‘search’ analysis in Windows Explorer to assess shared drives, the DROID tool did this reporting quicker and was able to produce it as a CSV download that allowed for analysis. The possibility for producing metrics to support a business case is therefore much enhanced with DROID.

This also provided me with the idea of using DROID to run regular reports on a structured, organised File Servers repository. One of the challenges of a ‘file servers/existing tools’ approach to electronic records management is the analysis and reporting that can be produced. DROID allows a simple manual process of taking a few minutes at most that could provide a regular ‘snapshot’ to map the development, growth and reduction of an electronic repository over time. I have demonstrated how this might work in scenarios 2 and 3 of my project output ‘Records management workflows’.

DROID therefore was an immediate help and has the potential to be a valuable tool for records managers.

XENA
According to its website, http://xena.sourceforge.net/index.php XENA (Xml Electronic Normalising for Archives) is a tool intended to “aid in the long term preservation of digital records”. The tool was developed by the National Archives of Australia.

XENA needed a few other things installed on my PC before I could get it up and running. Therefore I had to enlist the help of my IT support people to install OpenOffice and upgrade my JAVA settings accordingly. 

Once it was installed, XENA was quite easy to use. It follows the OAIS model by converting the document into a preservable digital object. You can view some of the content through a ‘XENA viewer’, but to make it properly available you can publish or ‘normalise’ the document, usually as an Open Office document or PDF.  (Further details on the outcome of our testing are available elsewhere on the project blog).

The concept of a safely preserved set of documents of which a copy can be created when required is quite a departure from the idea of a EDRMS master repository of documents which a user can search. Could a user – a member of staff or a member of the public – search a catalogue of document titles and keywords rather than the documents themselves? This might be impractical for ‘short retention /regularly accessed’ types of records. But it could be ideal for permanent retention documents (such as governance minutes, regulations, strategies) that cover both bases as valuable administrative records and documents of archival or public interest.

Challenges / Risks

1) There is an assumption that XENA and DROID will continue to be supported and relevant in years to come. That they are products of National Archives programmes does suggest a longevity that means the software is worth using for long-term preservation.  Would the Open Office output of XENA run more of a risk of obsolescence than the XENA objects?

2) The use of these tools requires a lot of manual work for the records manager – running reports, converting documents etc. – which can be automated by the use of an EDRMS system. However, you could argue that this sort of work was a key part of past – and still persisting – paper records management processes. Additionally, an EDRMS brings, alongside significant costs, a host of system administrator and training overheads which would be sidestepped by this sort of approach.

Conclusion

Both these open source tools, which are free to download and took little longer than 30 minutes to get up and running, provide some really interesting and valuable functionality for records managers. Converting them from a desktop ‘toy’ to a scalable working component in a records management system or process is a much more difficult step, but my testing of DROID and XENA has certainly given me plenty to think about.

Testing the DPSP

I did a day’s testing of the Digital Preservation Software Platform (DPSP) in December. The process has been enough to persuade me it isn’t quite right for what we intend with our project, but that’s not intended as a criticism of the system, which has many strengths.

The heart of DPSP is performing the normalisation of office documents to their nearest Open Office equivalent, using Xena. Around this, it builds a workflow that enables automated generation of checksums, quarantining of ingested records, virus checking, and deposit of converted records into an archival repository. It also generates a “manifest” file (not far apart from a transfer list), and logs all of the steps in its database.

The workflow is one that clearly suits The National Archives of Australia (NAA) and matches their practice, which involves thorough checking and QA to move objects in and out of quarantine, in and out of a normalisation stage, and finally into the repository. All of these steps require the generation of Unique IDs and folder destinations which probably match an accessioning or citation system at NAA; there’s no workaround for this, and I simply had to invent dummy IDs and folders for my purpose. The steps also oblige a user to log out of the database, and log back in so as to perform a different activity; this is required three times. This process is undoubtedly correct for a National Archives and I would expect nothing less. It just feels a bit too thorough for what we’re trying to demonstrate with the current project, our preservation policy is not yet this advanced, and there aren’t enough different staff members to cover all the functions. To put it another way, we’re trying to find a simple system that would enable one person (the records manager) to perform normalisations, and DPSP could be seen as “overkill”. I seem to recall PANDORA, the NLA’s web-archiving system, was similarly predicated on a series of workflows where tasks were divided up among several members of staff with extensive curation and QA.

My second concern is that multiple sets of AIPs are generated by DPSP. Presumably the intention is that the verified AIPs which end up in the repository will be the archive copies, and anything generated in the quarantine and transformation stages can be discarded eventually. However, this is not made explicit in the workflow, neither is the removal of such copies described in the manual.

My third problem is an area which I’ll have to revisit, because I must have missed something. When running the Xena transformations, DPSP creates two folders – one for “binary” and one for “normalised” objects. The distinction here is not clear to me yet. I’m also worried because out of 47 objects transformed, I ended up with only 31 objects in the “normalised” folder.

The gain with using DPSP is that we get checksums and virus checks built into the workflow, and complete audit trails also; so far with our method of “manual” transformations with XENA we have none of the above, although we do get checksums if we use DROID. But I found the DPSP workflow a little clunky and time-consuming, and somewhat counter-intuitive in navigating through the stages.