The Indigenous Digital Archive is an open source crowdsourcing platform, with tools for preparing archival source material for the project and for managing the taxonomy that emerges from it. The project is being developed for the Museum of Indian Arts and Culture in Santa Fe. The archive will enable engagement with authentic public documents of community history, government actions, and civic life in New Mexico. The first phase will focus on open public records related to land and to the government Indian Boarding Schools from the late 1800s into the 1920s and 1930s.
The project builds on digital images of over 120 linear feet of United States government records captured on microfilm in the 1970s, amounting to approximately 500,000 pages of letters, reports, photographs and administrative material relating to the Indian Schools program in the Southwest, from the Indian Wars of the 1800s to the 1930s. These records have never before been readily available to the people and communities they relate to. Native Americans and other volunteers will work directly with this material on the platform as part of the crowdsourcing activities, and the results will be available to all.
This source material has no fine-grained electronic description; the raw material consists of microfilm rolls, each of which typically holds over 1,000 images. The platform aims to produce richly described and categorised records at a much finer level of detail: individual letters and reports, rather than units dictated by the microfilm medium.
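One way to picture this shift from roll-level to record-level description is to treat each archival unit as a contiguous run of images on a roll, with a boundary wherever a new letter or report begins. The sketch below is illustrative only: the names `ArchivalUnit` and `split_roll`, and the assumption that every unit is a contiguous image range, are hypothetical and not taken from the platform itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivalUnit:
    """Hypothetical model: a run of consecutive images on one roll, e.g. a single letter."""

    roll_id: str
    first_image: int  # 0-based index of the unit's first image (inclusive)
    last_image: int  # 0-based index of the unit's last image (inclusive)

    @property
    def page_count(self) -> int:
        return self.last_image - self.first_image + 1


def split_roll(roll_id: str, image_count: int, unit_starts: list[int]) -> list[ArchivalUnit]:
    """Split a roll into units at the given start indices.

    Each index in `unit_starts` marks the first image of a new unit. Image 0
    always begins a unit, so every image lands in exactly one unit.
    """
    starts = sorted(set(unit_starts) | {0})
    if starts[0] < 0 or starts[-1] >= image_count:
        raise ValueError("unit boundary falls outside the roll")
    # Each unit ends on the image just before the next unit starts.
    ends = [s - 1 for s in starts[1:]] + [image_count - 1]
    return [ArchivalUnit(roll_id, s, e) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    # A hypothetical 1,200-image roll where new documents begin at images 3, 10 and 500.
    for unit in split_roll("roll-042", 1200, [3, 10, 500]):
        print(unit.first_image, unit.last_image, unit.page_count)
```

Because boundaries are stored as start indices, a volunteer correcting a split only has to add or remove one index, and the page ranges of the neighbouring units follow automatically.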
The solution integrates Digital Library Cloud Services (DLCS) components with the Omeka S collection management system to provide an IIIF-powered collaborative archive platform. The structure of the archive itself is generated by the platform's users with the Sorting Room tool: volunteers and project administrators identify archival units (individual letters, reports, etc.) within the larger sets of images.
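In a IIIF-based platform, an identified archival unit can be published as its own IIIF Presentation manifest, with one canvas per page image. The following is a minimal sketch of such a manifest following the IIIF Presentation API 3.0 structure; the base URLs, the `build_manifest` helper and the choice of a level 1 Image API 3 service are assumptions for illustration, not a description of how DLCS or Omeka S actually generate manifests.

```python
from __future__ import annotations

import json


def build_manifest(base: str, label: str, images: list[dict]) -> dict:
    """Build a minimal IIIF Presentation 3.0 manifest with one canvas per image.

    `images` holds dicts with "service" (a IIIF Image API base URL), "width"
    and "height". All URLs used here are hypothetical placeholders.
    """
    canvases = []
    for i, img in enumerate(images, start=1):
        canvas_id = f"{base}/canvas/{i}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": img["width"],
            "height": img["height"],
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/image",
                    "type": "Annotation",
                    "motivation": "painting",  # the image is painted onto the canvas
                    "body": {
                        "id": f"{img['service']}/full/max/0/default.jpg",
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": img["width"],
                        "height": img["height"],
                        "service": [{"id": img["service"], "type": "ImageService3", "profile": "level1"}],
                    },
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/manifest",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


if __name__ == "__main__":
    pages = [{"service": "https://example.org/iiif/roll-042-0003", "width": 2000, "height": 3000}]
    print(json.dumps(build_manifest("https://example.org/units/17", "Letter, roll 42", pages), indent=2))
```

Publishing each unit as a separate manifest means any IIIF-compatible viewer can display a single letter or report without loading the whole roll.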
As record-sized archival units are identified, they are fed through the DLCS text pipeline. Each document is assessed for its suitability for OCR, text is extracted where possible, and named entity recognition is applied to that text. People, places, schools, tribes and other terms are extracted and can optionally be matched to external controlled vocabularies. As more text is processed, the growing body of entities can be refined and managed through the Topic Manager tool (see Applications section), identifying OCR mistakes, misclassified terms, duplicates and synonyms.
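The stages of this pipeline can be sketched as a single function per unit. This is a simplified illustration under loud assumptions: the confidence threshold, the dictionary lookup standing in for a real named entity recognition model, the alias table standing in for Topic Manager curation, and the vocabulary identifiers are all hypothetical, and none of it reflects the actual DLCS API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical threshold: units whose mean OCR confidence falls below this
# are left for hand transcription and tagging instead of automatic extraction.
MIN_OCR_CONFIDENCE = 0.6

# Toy gazetteer standing in for a real NER model: surface form -> entity type.
GAZETTEER = {
    "Santa Fe": "place",
    "Albuquerque Indian School": "school",
    "Albuquerque Indian Schol": "school",  # an example OCR misspelling
    "Navajo": "tribe",
}

# Hypothetical controlled-vocabulary identifiers (placeholders, not real records).
VOCABULARY = {"Santa Fe": "vocab:place/santa-fe"}

# Topic Manager-style curation: variant or misspelled form -> canonical label.
ALIASES = {"Albuquerque Indian Schol": "Albuquerque Indian School"}


@dataclass
class Entity:
    label: str
    kind: str
    vocab_id: str | None = None


@dataclass
class PipelineResult:
    ocr_ok: bool
    entities: list[Entity] = field(default_factory=list)


def recognise(text: str) -> list[tuple[str, str]]:
    """Return (surface form, type) pairs found by whole-phrase lookup."""
    return [
        (surface, kind)
        for surface, kind in GAZETTEER.items()
        if re.search(rf"\b{re.escape(surface)}\b", text)
    ]


def process_unit(text: str, mean_confidence: float) -> PipelineResult:
    # 1. Assess OCR suitability; skip extraction for unreadable material.
    if mean_confidence < MIN_OCR_CONFIDENCE:
        return PipelineResult(ocr_ok=False)
    # 2. Recognise entities, 3. fold variants into canonical labels,
    # 4. optionally attach a controlled-vocabulary identifier.
    seen: dict[str, Entity] = {}
    for surface, kind in recognise(text):
        label = ALIASES.get(surface, surface)
        if label not in seen:
            seen[label] = Entity(label, kind, VOCABULARY.get(label))
    return PipelineResult(ocr_ok=True, entities=list(seen.values()))
```

Keeping curated aliases outside the recogniser means that each correction made by a person is applied to every document processed afterwards, which is one way the entity set can improve as more text flows through.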
These automated and human-assisted processes quickly build up a navigable set of documents, using the recognised and refined entities to generate a generous interface that encourages exploration and discovery. Users can add to the growing body of entities, tag other material not amenable to OCR, transcribe more documents by hand, comment on the archival units and individual images, and provide their own narratives for the subjects presented.
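A navigable interface of this kind can be driven by an inverted index from entity labels to the archival units that mention them. The sketch below assumes, hypothetically, that machine-extracted entities and volunteer tags are both available as simple mappings from unit ID to labels; the function names are illustrative only. Merging the two sources is what lets material that could not be OCRed still surface through the tags people add.

```python
from __future__ import annotations

from collections import defaultdict


def build_browse_index(
    machine_entities: dict[str, list[str]],
    user_tags: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Invert unit -> labels into label -> sorted unit IDs, merging both sources."""
    index: dict[str, set[str]] = defaultdict(set)
    for source in (machine_entities, user_tags):
        for unit_id, labels in source.items():
            for label in labels:
                index[label].add(unit_id)  # a set drops duplicates across sources
    return {label: sorted(units) for label, units in sorted(index.items())}


def facet_counts(index: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Labels ordered by number of units (most first, ties alphabetical) for browse facets."""
    return sorted(
        ((label, len(units)) for label, units in index.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


if __name__ == "__main__":
    machine = {"unit-1": ["Santa Fe", "Navajo"], "unit-2": ["Santa Fe"]}
    tags = {"unit-3": ["Navajo", "land allotment"], "unit-2": ["Santa Fe"]}
    index = build_browse_index(machine, tags)
    print(index)
    print(facet_counts(index))
```

Ranking facets by the number of units they reach gives visitors an overview of the collection before they search, which suits the exploratory, browse-first approach described above.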
The project is funded by an IMLS National Leadership Grant and further prototyping funding from the Knight Foundation. The Museum of Indian Arts and Culture is conducting the Indigenous Digital Archive project in collaboration with the New Mexico State Library Tribal Libraries Program and the Indian Pueblo Cultural Center.