September 4, 2008 -- In late 2003, engineers at Harvard University Library and JSTOR developed an open-source tool called JHOVE (the JSTOR/Harvard Object Validation Environment) to validate file formats. Since its release, JHOVE has gradually gained acceptance worldwide as an essential tool for format validation.
Format validation is crucial to digital preservation and access. If you don’t know what a file is, or if its integrity is damaged, you may not be able to read, hear or see its content.
JHOVE was designed to process a digital object and determine three things: what the object claims to be (identification), whether it conforms to the requirements of that format (validation) and what its properties are (characterization). When JHOVE finds a file that it cannot validate, it flags the file. Though the process is automated, only a human can decide whether to accept the file as is or try to get a better version.
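To make the three steps concrete, here is a minimal sketch in Python for a single format, PNG. It is not JHOVE code and does not use the JHOVE API; the function names `identify`, `validate` and `characterize` are hypothetical, and the validation step checks only two of the format's requirements (it does not verify checksums, for example).

```python
import struct

# The fixed 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def identify(data: bytes) -> str:
    """Identification: what does the object claim to be?"""
    return "PNG" if data.startswith(PNG_SIGNATURE) else "unknown"


def validate(data: bytes) -> bool:
    """Validation: does the object meet the format's requirements?

    Illustrative only: checks that the first chunk is a 13-byte IHDR chunk.
    """
    # signature (8) + chunk length (4) + chunk type (4) + IHDR data (13) + CRC (4)
    if identify(data) != "PNG" or len(data) < 33:
        return False
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    return chunk_type == b"IHDR" and length == 13


def characterize(data: bytes) -> dict:
    """Characterization: report properties of the object."""
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }
```

A caller would typically run the steps in order and flag any file for which `validate` returns `False`, leaving the decision about that file to a person.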
It is easy to install and run. Some users embed the JHOVE Java code into their existing systems and integrate it into their digital-preservation workflows.
As adoption spread, so did awareness of the original tool's limits. "We came to realize a number of shortcomings," said Stephen Abrams, who was one of the Harvard University Library engineers at the heart of the JHOVE project. "Some things we now know we could've done better and some things we just didn't have the opportunity to do."
Equipped with a new set of requirements and support from the Library, Abrams and colleagues at Portico and Stanford began work on JHOVE2. Their goals are to:
- Change the JHOVE architecture to improve performance, simplify system integration and encourage third-party development and enhancement
- Provide significant new functions
- Implement existing and new functionality
The terminology in JHOVE2 has changed a bit from JHOVE. Identification and validation are the same, but characterization is now called feature extraction, which Abrams describes as "being able to examine formatted objects and extract and report on their salient internal properties."
A new function, assessment, will subjectively determine acceptability under local policy rules (by contrast, validation objectively determines acceptability with a "yes" or "no" answer to each requirement). In other words, a file might not be perfect, but the assessment function of JHOVE2 can tell you whether the file is good enough to keep.
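The difference between validation and assessment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the JHOVE2 design: the `Policy` class, its fields and the functions `is_valid` and `is_acceptable` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """A hypothetical local policy: which failures are tolerable, and a quality floor."""
    tolerated_failures: set = field(default_factory=set)
    min_width: int = 0


def is_valid(requirement_results: dict) -> bool:
    """Validation: objective, every requirement must get a "yes"."""
    return all(requirement_results.values())


def is_acceptable(requirement_results: dict, properties: dict, policy: Policy) -> bool:
    """Assessment: is the file good enough under this institution's policy?"""
    failed = {name for name, ok in requirement_results.items() if not ok}
    # Reject if any failed requirement is one the policy does not tolerate.
    if not failed <= policy.tolerated_failures:
        return False
    # Apply a policy threshold to an extracted property.
    return properties.get("width", 0) >= policy.min_width
```

Under this sketch the same file can fail validation yet pass assessment at one institution and fail it at another, because the answer depends on the policy supplied.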
The JHOVE2 engineers are in the prototyping phase. JHOVE2 should be available in early 2010 under an open-source license.