Repositories
From digitalHPS
A Repository for the Digital HPS Community
Central Repository
A. Characteristics of the central repository:
- able to store all digital objects with associated metadata
- easily interoperable with other repositories
- straightforward and adaptable
- able to be converted to more sophisticated uses in the future -- flexible
- quickly and easily serve the metadata and digital objects to secondary display sites such as History of MBL or Embryo Project
- accept digital objects and associated metadata through an easy-to-use GUI
- have excellent quick and advanced search features
- are triple stores necessary? -- yes
- Others will use the repository and pay a service fee to do so
- Submitters will be able to retain their own rights -- rights management
- Federated search
- Fewer than 1 million objects
- Minimum 5 GB, up to 3 hours of high-definition video
- Some relationships are essential, e.g., all photos by Hutner of Morgan at MBL
- Versioning is important
- Object access: public, dark, and password-protected, with automatically rolling access
- Easy to limit search by collection (i.e., only items from MBL)
- Search
- Good API
- Triple store????
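The access and search requirements above (public, dark, and password-protected objects, access that rolls automatically, and searches limited to one collection) could be prototyped along these lines. This is a minimal Python sketch under stated assumptions: the field names, the hypothetical `release_on` date that triggers the roll to public access, and the `search` function are illustrations only, not features of any of the platforms listed below.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Access levels named in the requirements; the string values are hypothetical.
PUBLIC, DARK, PASSWORD = "public", "dark", "password"


@dataclass
class DigitalObject:
    pid: str                           # persistent identifier (hypothetical format)
    title: str
    collection: str                    # e.g. "MBL" or "EP"
    access: str                        # PUBLIC, DARK, or PASSWORD
    release_on: Optional[date] = None  # hypothetical: date access rolls to PUBLIC


def effective_access(obj: DigitalObject, today: date) -> str:
    """Rolling access: once the release date has passed, the object is public."""
    if obj.release_on is not None and today >= obj.release_on:
        return PUBLIC
    return obj.access


def search(objects: List[DigitalObject], text: str, today: date,
           collection: Optional[str] = None, authenticated: bool = False) -> List[str]:
    """Case-insensitive title search, optionally limited to one collection."""
    hits = []
    for obj in objects:
        if collection is not None and obj.collection != collection:
            continue  # outside the requested collection
        level = effective_access(obj, today)
        if level == DARK or (level == PASSWORD and not authenticated):
            continue  # not visible to this user on this date
        if text.lower() in obj.title.lower():
            hits.append(obj.pid)
    return hits


objs = [
    DigitalObject("mbl:1", "Morgan at MBL", "MBL", PUBLIC),
    DigitalObject("mbl:2", "Morgan notebook", "MBL", DARK, release_on=date(2030, 1, 1)),
    DigitalObject("ep:1", "Morgan essay", "EP", PASSWORD),
]
print(search(objs, "morgan", date(2025, 1, 1), collection="MBL"))  # ['mbl:1']
```

Searching with a date after 2030-01-01 also returns `mbl:2` without anyone editing its record. That is one possible reading of "automatically rolling access" (a scheduled move from dark to public); the group may intend something different.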
B. Possible solutions for central repository
- Fedora Commons http://www.fedora-commons.org/
- there is a Fedora-Omeka plugin currently in the works, which would allow Omeka sites to access Fedora backends. http://omeka.org/codex/Plugins/FedoraConnector
- Dspace http://www.dspace.org/
- ir+ : http://code.google.com/p/irplus/
- Berkeley Electronic Press http://www.bepress.com/ir/
- ePrints http://www.eprints.org/software/
- eSciDoc https://www.escidoc.org/
- Just adding a potential solution here. The MPIWG is somewhat invested in eSciDoc and is eagerly awaiting a "usable" version. As the MPIWG explained to me, eSciDoc would hold all their material (as a true repository), while middleware such as the eXist XML database would sit between the user interface and the repository.
- Home-grown repository
????
Metadata
Use widely accepted metadata standards for digital objects.
A. Characteristics of metadata
- universally or widely used standards so we can exchange data easily with others
- metadata should be complete so future additions are not necessary
- flexible enough to be used for a variety of digital objects, such as text, images, moving images
- can design crosswalks to bring in metadata from other sources easily
- Includes: info about repository, rights, ownership and use
- Identification of objects with DOIs or Handles -- Felicity will check this out
- Individual projects may have an ISSN
- tag field available
- Minimal metadata required
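A crosswalk maps fields from one metadata schema onto another. As a rough sketch of the idea, the Python below maps a few MODS elements onto Dublin Core elements. The mapping table is a simplified, hypothetical one; the Library of Congress publishes an official MODS-to-Dublin Core mapping that a real crosswalk should follow, and the sample record is invented for illustration.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Simplified, hypothetical mapping from MODS paths to Dublin Core elements.
# A production crosswalk would follow the Library of Congress MODS-to-DC mapping.
MODS_TO_DC = {
    "mods:titleInfo/mods:title": "title",
    "mods:name/mods:namePart": "creator",
    "mods:originInfo/mods:dateIssued": "date",
    "mods:subject/mods:topic": "subject",
    "mods:accessCondition": "rights",
}


def mods_to_dc(mods_xml: str) -> dict:
    """Return {dc_element: [values]} for every mapped MODS field present."""
    root = ET.fromstring(mods_xml)
    record = {}
    for path, dc_element in MODS_TO_DC.items():
        values = [el.text.strip() for el in root.findall(path, MODS_NS)
                  if el.text and el.text.strip()]
        if values:
            record.setdefault(dc_element, []).extend(values)
    return record


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Photograph of Morgan at MBL</title></titleInfo>
  <name><namePart>Hutner</namePart></name>
</mods>"""
print(mods_to_dc(sample))
# {'title': ['Photograph of Morgan at MBL'], 'creator': ['Hutner']}
```

Keeping the mapping in a table rather than in code makes it easy to add crosswalks for other incoming sources without touching the conversion logic.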
B. Possible metadata standards to use
- Dublin Core http://dublincore.org/specifications/
- DC seems like the minimal standard. Everyone (or almost everyone) uses DC at a minimum and often adds other metadata according to needs.
- Modified Dublin Core
- MODS (used for the Embryo Project) http://www.loc.gov/standards/mods/
- VRA Core (mostly for visual objects) http://www.vraweb.org/projects/vracore4/
- EAD (Encoded Archival Description framework) http://www.loc.gov/ead/
- this is the standard for archival items, so if the repository will be an archive, using EAD might make sense
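Because Dublin Core is the common baseline, one route to exchanging records with others is the `oai_dc` format that OAI-PMH harvesters expect. The Python sketch below serializes a record of the 15 Dublin Core 1.1 elements into that XML shape; the function name and the input dictionary layout are assumptions, and the output omits the `xsi:schemaLocation` attribute a complete `oai_dc` record normally carries.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set (version 1.1).
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_oai_dc(record: dict) -> str:
    """Serialize {element: [values]} as a simple oai_dc XML record."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:  # every DC element is optional and repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_oai_dc({"title": ["Photograph of Morgan at MBL"], "creator": ["Hutner"]}))
```

Rejecting unknown element names keeps records strictly within plain DC; richer MODS, VRA Core, or EAD descriptions would be stored alongside this minimal record rather than squeezed into it.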
?????
Public web sites
The digital objects and metadata in the repository can be used to create public websites that inform and educate.
A. Characteristics of public website framework
- Flexible
- Easy to use
- Accepts user input
- Standards for how the repository is credited and how the sites are maintained
- Personalized user space, like MyMBL (I see this as a website component, not a repository component -- is this right?)
B. Possible solutions for public website framework
- OMEKA http://www.omeka.org/
- Drupal http://drupal.org/
- Custom created
????
Management
The repository will be stored at the MBL, maintained by the SIG, with extensive support from ASU and the Embryo Project. Jobs/roles will need to be assessed for a number of different areas of the repository, including management, curation, etc. (List other roles)
- What is the expected work flow?
- Who will add digital objects? Trained and certified contributors, working under a contract agreement, in separate user projects
- Who will add metadata?
- Who will build the websites?
Issues connected with Repositories for Digital HPS:
1. Rationale for use of a repository framework:
=> the main goal for all these projects is durability
=> in this context a repository has advantages over a relational database because we deal with:
  - Generally unstructured data and qualitative information
  - Extensibility of the repository framework (not dependent on a previously defined structure or schema)
  - Administrative metadata that include:
    - Audit trail
    - Checksum
    - Versioning
    - Provenance
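Repository platforms such as Fedora typically maintain this administrative metadata themselves. To show what the four items mean in practice, here is a minimal Python sketch: the class and field names are hypothetical, and SHA-256 is an assumed choice of checksum algorithm. Each new version records a checksum (for later fixity checks), a timestamp, and the responsible agent, and appends a line to an audit trail instead of overwriting earlier versions.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Version:
    number: int
    checksum: str  # SHA-256 hex digest of the content (fixity)
    created: str   # ISO 8601 UTC timestamp
    agent: str     # who made the change (provenance)


@dataclass
class ManagedObject:
    pid: str
    versions: List[Version] = field(default_factory=list)
    audit_trail: List[str] = field(default_factory=list)

    def add_version(self, content: bytes, agent: str) -> Version:
        """Store a new version; earlier versions are never overwritten."""
        v = Version(number=len(self.versions) + 1,
                    checksum=hashlib.sha256(content).hexdigest(),
                    created=datetime.now(timezone.utc).isoformat(),
                    agent=agent)
        self.versions.append(v)
        self.audit_trail.append(f"{v.created} {agent} created version {v.number}")
        return v

    def verify(self, content: bytes, number: int) -> bool:
        """Fixity check: does the content still match the checksum recorded for a version?"""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"no version {number}")
        return hashlib.sha256(content).hexdigest() == self.versions[number - 1].checksum


doc = ManagedObject("hps:demo")
doc.add_version(b"first draft", "editor")
doc.add_version(b"revised draft", "editor")
print(doc.verify(b"first draft", 1))  # True
print(doc.verify(b"altered", 1))      # False
```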
Object types
- Object classes: a finite number of object classes
  - Validation
  - Abstraction of structure to facilitate interoperability
  - Abstracted views of the content
    - Abstracted views of an object (new instantiations)
- Objects may be dynamic: the graph of relationships among objects will change constantly
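Relationships such as "all photos by Hutner of Morgan at MBL" are naturally stored as subject-predicate-object triples, which is what a triple store holds. The toy in-memory store below is only a sketch of the idea in Python: the predicate names are hypothetical, and a real deployment would use an RDF triple store with established vocabularies (for example Dublin Core terms) and a query language such as SPARQL.

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def remove(self, s: str, p: str, o: str) -> None:
        self.triples.discard((s, p, o))  # relationships can change at any time

    def query(self, s=None, p=None, o=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

    def subjects(self, p: str, o: str) -> set:
        """All subjects that have predicate p with object o."""
        return {s for s, _, _ in self.query(p=p, o=o)}


store = TripleStore()
store.add("photo:1", "creator", "Hutner")
store.add("photo:1", "depicts", "Morgan")
store.add("photo:1", "location", "MBL")
store.add("photo:2", "creator", "Hutner")
store.add("photo:2", "depicts", "another sitter")

# "All photos by Hutner of Morgan at MBL" is an intersection of three patterns.
hits = (store.subjects("creator", "Hutner")
        & store.subjects("depicts", "Morgan")
        & store.subjects("location", "MBL"))
print(hits)  # {'photo:1'}
```

Because relationships live outside the objects themselves, adding or removing a triple changes the graph without touching any stored object, which suits the expectation that relationships will change constantly.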
Reliable citation
- Persistent identifiers
- Versioning/citable versions; ability to understand changes in objects
- Finely grained policy enforcement
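Persistent identifiers and citable versions work together when an identifier never changes but can also point at one fixed version. The Python sketch below illustrates that behaviour with a hypothetical resolver: the `pid@vN` suffix convention, the `hps:` identifiers, and the example.org locations are all assumptions; in practice the base identifier would be a DOI or Handle.

```python
from typing import Dict, List


class Resolver:
    """Hypothetical resolver: a persistent ID never changes, versions are only appended."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}  # pid -> locations, oldest first

    def register(self, pid: str, location: str) -> str:
        """Record a new version of pid and return its citable identifier."""
        versions = self._versions.setdefault(pid, [])
        versions.append(location)
        return f"{pid}@v{len(versions)}"

    def resolve(self, identifier: str) -> str:
        """A bare pid resolves to the latest version; pid@vN to exactly version N."""
        pid, sep, number = identifier.partition("@v")
        versions = self._versions[pid]
        if not sep:
            return versions[-1]
        n = int(number)
        if not 1 <= n <= len(versions):
            raise KeyError(f"{pid} has no version {n}")
        return versions[n - 1]


resolver = Resolver()
resolver.register("hps:embryo-123", "https://example.org/objects/embryo-123/1")
resolver.register("hps:embryo-123", "https://example.org/objects/embryo-123/2")
print(resolver.resolve("hps:embryo-123"))     # https://example.org/objects/embryo-123/2
print(resolver.resolve("hps:embryo-123@v1"))  # https://example.org/objects/embryo-123/1
```

A citation of `hps:embryo-123@v1` keeps pointing at the text the author actually read, while the bare identifier always leads readers to the current version.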
Good APIs: simple, language-independent, etc.
- Hide complexity and administrative functions
Services
- Use cases to identify and specify tools and services. This page has links to some useful eSciDoc use cases ("usage scenarios"): http://www.escidoc.org/JSPWiki/en/ScholarlyWorkbench
- Open APIs
Security, access management
- Necessary conditions for collaboration
  - Authentication and authorization (AuthN/AuthZ)
  - Workflows need to accommodate collaboration