Library of Congress

Digital Preservation

ContextMiner: A Metadata Collection Tool

November 6, 2008 -- The University of North Carolina at Chapel Hill has developed a Web-based archiving tool as part of its NDIIPP VidArch project. The tool, ContextMiner, enables users to collect links to blogs and online videos along with extensive metadata.

A user begins by creating a scheduled, repeated collection activity called a "campaign." ContextMiner offers more than 20 optional descriptive and administrative metadata fields for each campaign. The user then selects the content source from which ContextMiner will extract the data. Currently the options are YouTube and blogs, but UNC plans to add more content sources in the future.

Next, the user creates keyword queries. For example, if the collection centered on Louis Armstrong, the user might ask ContextMiner to query YouTube and blogs for the keywords "Louis Armstrong," "New Orleans trumpet" and "Satchmo." The user schedules ContextMiner to query the sources daily, weekly or monthly. ContextMiner can also download contextual information such as tags, Web links and view counts.
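
As a rough illustration of the campaign setup described above, the following Python sketch models a campaign as a small data structure. It is not ContextMiner's actual code or API: the class name, field names, source labels and the 30-day approximation of "monthly" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical mapping; "monthly" is approximated as 30 days for simplicity.
FREQUENCY_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


@dataclass
class Campaign:
    """Illustrative model of a campaign (all names are assumptions)."""
    name: str
    sources: list[str]            # e.g. "youtube", "blogs"
    queries: list[str]            # keyword queries entered by the user
    frequency: str = "weekly"     # "daily", "weekly" or "monthly"
    metadata: dict[str, str] = field(default_factory=dict)  # optional descriptive fields

    def next_run(self, last_run: date) -> date:
        """Return the date of the next scheduled collection."""
        return last_run + timedelta(days=FREQUENCY_DAYS[self.frequency])

    def jobs(self) -> list[tuple[str, str]]:
        """Pair every source with every query for one collection run."""
        return [(source, query) for source in self.sources for query in self.queries]


campaign = Campaign(
    name="Louis Armstrong",
    sources=["youtube", "blogs"],
    queries=["Louis Armstrong", "New Orleans trumpet", "Satchmo"],
    frequency="weekly",
)
```

In this sketch, campaign.jobs() yields one (source, query) pair per combination, and campaign.next_run(last_run) gives the date of the following scheduled run.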

ContextMiner displays the query results as records in a table. Each record contains hyperlinks: embedded links to YouTube videos, links to blog pages and links to related Web sites.

ContextMiner does not download and archive videos or blog pages; it only links to the Web source. The developers hope to eventually offer tools and policies for exporting and sharing videos, blog pages and metadata. More information is available at http://www.contextminer.org/.