Collections as Data: Hack-to-Learn
May 17, 2017, Library of Congress, LJ-119
May 18, 2017, George Washington University, Gelman Library
Do you use text mining, social network analysis, data cleaning, or geocoding in your research? Do you have subject expertise in cataloging records, digital humanities, social media, or working with digital collections found in libraries, archives, or cultural heritage institutions? Are you curious about either?
In this two-day hack-to-learn workshop, the Library of Congress, George Washington University, and George Mason University will join forces to demo research tools and provide data sets for hands-on exploration. You’ll walk away with exposure to a variety of tools and techniques, a community of support, and an idea of the possibilities of collections as data.
The enrollment period for this event is now closed.
Tools we'll demo include: OpenRefine, Voyant, MALLET, Gephi, and Carto.
Collections as data sets we'll work with include:
The Phyllis Diller Gag File: 52,000 transcribed index cards of the comedian's jokes or gags provided by the Smithsonian's National Museum of American History and Smithsonian Transcription Center.
The Library of Congress Retrospective MARC records: 25 million MARC records available by categories such as maps, music, name and subject authorities.
Eleanor Roosevelt's "My Day" columns: 8,000 transcribed documents representing Eleanor Roosevelt's nationally syndicated newspaper column from George Washington University.
Tumblr End of Term Archive: Text and metadata from 56,864 Tumblr blog posts from 72 federal Tumblr blogs harvested as part of the End of Term Archive project.
Code of Conduct
[The following code of conduct is adapted from DPLAFest 2016]
Hack-to-Learn seeks to provide a welcoming, fun, and safe community and workshop experience for everyone. We do not tolerate harassment in any form. Discriminatory language and imagery (including sexual) are not appropriate for any event venue, including working groups, or any virtual environment such as the Hack-to-Learn Slack channel.
Harassment is understood as any behavior that threatens another person or group, or produces an unsafe environment. It includes offensive verbal comments or non-verbal expressions related to gender, gender identity, gender expression, sexual orientation, disability, physical appearance, body size, race, age, religious beliefs, sexual or discriminatory images in public spaces (including online), deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention.
By participating in this event, you are agreeing to adhere to Hack-to-Learn’s code of conduct. The code of conduct does not confer any legal rights on the participants, including participants who are sanctioned for violating the code of conduct.
Schedule
Day 1 (Library of Congress, LJ-119)
8:30-9:00am Registration and open help for installing tools
9:00-9:05 Welcome - Eugene Flanagan, Director of National Programs, Library of Congress
9:05-9:10 Event Goals - Kate Zwaard, Chief, National Digital Initiatives, Library of Congress
9:10-9:20 Code of Conduct, day’s agenda and other info - Jaime Mears, Program Specialist, National Digital Initiatives, Library of Congress
9:20-9:25 Introduce MARC data set - Ann Della Porta, Supervisory IT Specialist, ILS Program Office, Library of Congress
9:25-9:30 Wrangling MARC data - Wendy Mann, Head, Data Services Group, George Mason University
9:30-9:35 Introduce Diller data set - Kate Zwaard, Chief, National Digital Initiatives, Library of Congress
9:35-9:40 Introduce Eleanor Roosevelt Papers data set - Christy Regenhardt, Editor, Eleanor Roosevelt Papers Project, George Washington University Columbian College of Arts and Sciences
9:40-9:45 TEI format - Chris Alhambra, Visiting Scholar and Consulting Digital Editor for the Eleanor Roosevelt Papers Project, George Washington University Columbian College of Arts and Sciences
9:45-9:50 Wrangling TEI - Justin Littman, Software Developer, George Washington University Libraries
9:50-10:00 Introduce Tumblr dataset, JSON format, wrangling Tumblr data - Laura Wrubel, Software Development Librarian, George Washington University Libraries
10:00-10:30 OpenRefine tutorial with Peggy Griesinger, Metadata and Cataloging Librarian, George Mason University Libraries
10:30-10:45 Break
10:45-11:15 Voyant tutorial with Amanda Regan, Ph.D. Candidate, Department of History and Art History, George Mason University
11:15-11:45 MALLET tutorial with Joshua Catalano, Ph.D. Graduate Student, Department of History, George Mason University
11:45-1:15 Lunch on your own
1:15-1:30 Group formation activity. Get started in hands-on work
1:30-4:15 Groups work together
4:15-4:30 Wrap-up
5:00-6:30 No-host social hour at Union Pub.
Day 2 (GWU, Gelman Library, National Churchill Library and Center)
8:00-9:00 Open help for installing tools
9:00-9:15 Welcome
9:15-9:45 Gephi tutorial with Jen Stevens, Humanities and Social Sciences Librarian, George Mason University
9:45-10:15 Carto tutorial with George D. Oberle III, History Librarian, George Mason University
10:15-10:30 Break
10:30-10:45 Facilitate groups getting together
10:45-12:00 Group work
12:00-1:00 Lunch on your own
1:00-2:45 Group work
2:45-3:00 Break, set up for sharing
3:00-3:05 Welcome by Geneva Henry, Dean of Libraries and Academic Innovation, George Washington University
3:05-4:15 Group share
4:15-4:30 Wrap-up discussion, what's next?
For any questions about hack-to-learn, please contact Jaime Mears, Program Specialist, National Digital Initiatives Division - Library of Congress at [email protected].
Request ADA accommodations five business days in advance at (202) 707-6362 or [email protected].