Metadata database models and EML creation at LTER sites

M.Gastil-Buhl (MCR) from contributions by D.Henshaw & S.Remillard (AND), J.Laundre (ARC), J.Walsh (BES), P.Tarrant (CAP), K.Baker, M.Kortz & J.Conners (CCE/PAL), D.Bahauddin (CDR), J.Chamblee (CWT), L.Powell (FCE), W.Sheldon (GCE), J.Campbell (HBR), E.Boose (HFR), K.Ramsey (JRN), S.Bohm (KBS), A.Skibbe (KNZ), E.Melendez-Colom (LUQ), S.Welch (MCM), C.Gries (NTL), H.Humphries (NWT), H.Garritt (PIE), M.O’Brien (SBC), K.Vanderbilt (SEV), N.Kaplan (SGS), J.Porter (VCR), I.San Gil (LNO/NBII)

This poster includes diagrams of metadata data models and the results of a survey asking all sites:

  1. which relational database management system (RDBMS) they use to store metadata
  2. how they create EML
  3. what past or potential collaboration they have with other LTER sites
  4. whether they participate in the Drupal Environmental Information Management System (DEIMS) group
  5. whether they use EML to populate their local data catalog
  6. what multiplicity of data tables per dataset they have

The purpose of this survey is to gather materials to start discussion.

As we prepare for data integration, we will each examine our IM System to ask whether it will satisfy potential new metrics, such as Metadata-Data Congruency, and so meet future network-level synthesis needs.

Some LTER sites already have such IM Systems in place. Those sites have succeeded because their architecture undergoes continuous development. Although those systems have proven agile enough to evolve with their own sites' increasing needs, how tightly coupled are they to their DBMS and scripting language? Can we port existing models to other LTER sites?

Metadata-data congruency can be enhanced when the data are included within an architecture that coordinates data and metadata databases. However, this poster focuses only on metadata, not the data per se, as a starting point.

All LTER sites share some common things ("entities" in database design jargon):

  • Publications
  • People
  • Sites
  • Taxa
  • Keywords
  • Studies or Projects
  • Datasets
  • Data Tables
  • Measurements
  • Units
  • Attribute Names

The Entity-Relationship Diagrams (ERD) show how these relate to each other. Three ERDs are shown in the poster as examples of different designs that model the same relationships.
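
As a rough illustration of how a few of these entities might relate, the following Python sketch builds a small in-memory SQLite schema. Every table and column name here is hypothetical, chosen for illustration only; it is not the schema of any site's database or of the ERDs in the poster. It assumes that a dataset can hold many data tables, that a data table can hold many attributes, and that people are linked to datasets through a role.

```python
import sqlite3

# Hypothetical, simplified schema covering a handful of the shared entities.
SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    surname   TEXT NOT NULL
);
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL
);
-- Many-to-many: a person can hold a role on many datasets.
CREATE TABLE dataset_person (
    dataset_id INTEGER NOT NULL REFERENCES dataset(dataset_id),
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    role       TEXT NOT NULL,
    PRIMARY KEY (dataset_id, person_id, role)
);
-- One-to-many: a dataset can contain several data tables.
CREATE TABLE data_table (
    data_table_id INTEGER PRIMARY KEY,
    dataset_id    INTEGER NOT NULL REFERENCES dataset(dataset_id),
    name          TEXT NOT NULL
);
CREATE TABLE unit (
    unit_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
-- One-to-many: a data table has several attributes (columns).
CREATE TABLE attribute (
    attribute_id  INTEGER PRIMARY KEY,
    data_table_id INTEGER NOT NULL REFERENCES data_table(data_table_id),
    name          TEXT NOT NULL,
    unit_id       INTEGER REFERENCES unit(unit_id)
);
"""


def build_demo_db():
    """Create the sketch schema in memory and load made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dataset (dataset_id, title) VALUES (?, ?)",
        [(1, "Example stream chemistry"), (2, "Example vegetation survey")],
    )
    conn.executemany(
        "INSERT INTO data_table (dataset_id, name) VALUES (?, ?)",
        [(1, "nutrients"), (1, "discharge"), (2, "plots")],
    )
    conn.commit()
    return conn


def tables_per_dataset(conn):
    """Return (dataset title, number of data tables) for every dataset."""
    return conn.execute(
        """
        SELECT d.title, COUNT(t.data_table_id)
        FROM dataset d
        LEFT JOIN data_table t ON t.dataset_id = d.dataset_id
        GROUP BY d.dataset_id
        ORDER BY d.dataset_id
        """
    ).fetchall()


if __name__ == "__main__":
    print(tables_per_dataset(build_demo_db()))
```

The query at the end reports the multiplicity of data tables per dataset, one of the survey questions, for the made-up rows.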

All sites need to present metadata on websites, in EML documents, and for other uses.

Longevity and Continuing Design
Some LTER sites’ models designed in the 1990s, such as those at VCR and AND, are still in use today, having migrated to new servers and new applications as technology changed. They remain useful because their schemata inherently model the characteristics of metadata and because continuing design has kept them in pace with evolving standards.

GCE Metabase, the AND Metadata Database, and DataZoo at CCE/PAL are three examples of mature models that are in production and form part of a larger IM System at their LTER sites. Web page display is just one of their uses. EML is currently generated by scripts from all three of these metadata databases. The AND and GCE metadata model designs pre-dated EML; the extraction of EML was developed after the initial design. EML is just one of several metadata standards these systems were designed to serve. All three undergo continuing development.
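
To make the idea of script-generated EML concrete, here is a minimal Python sketch that turns one dataset record into a simplified EML-style `<dataset>` fragment. It is not the script used by any of these sites. The function name, the input dictionary layout, and the example values are all assumptions for illustration, and the output omits many elements a complete, schema-valid EML document requires.

```python
import xml.etree.ElementTree as ET


def dataset_to_eml_fragment(dataset):
    """Build a simplified <dataset> element from a plain dictionary.

    `dataset` is a hypothetical record, as it might be assembled from
    metadata database queries: a title, creator surnames, and data tables
    with their attribute (column) names.
    """
    root = ET.Element("dataset")
    ET.SubElement(root, "title").text = dataset["title"]

    for surname in dataset["creators"]:
        creator = ET.SubElement(root, "creator")
        individual = ET.SubElement(creator, "individualName")
        ET.SubElement(individual, "surName").text = surname

    # One <dataTable> per data table, so multiplicity is preserved.
    for table in dataset["tables"]:
        data_table = ET.SubElement(root, "dataTable")
        ET.SubElement(data_table, "entityName").text = table["name"]
        attribute_list = ET.SubElement(data_table, "attributeList")
        for attribute_name in table["attributes"]:
            attribute = ET.SubElement(attribute_list, "attribute")
            ET.SubElement(attribute, "attributeName").text = attribute_name

    return root


# Made-up example record; none of these values come from a real dataset.
example = {
    "title": "Example stream chemistry",
    "creators": ["Doe"],
    "tables": [
        {"name": "nutrients", "attributes": ["sample_date", "nitrate"]},
        {"name": "discharge", "attributes": ["sample_date", "flow"]},
    ],
}

if __name__ == "__main__":
    print(ET.tostring(dataset_to_eml_fragment(example), encoding="unicode"))
```

Because the fragment is built from structured records rather than free text, every data table and attribute in the output corresponds to a row in the source, which is the property the constrained-model argument relies on.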

EML generated from the constrained model of a database is more likely to meet future metrics, especially if the data themselves are filtered through a connected system.

Web services are changing our options for development and use of data and metadata. The Unit Registry web service will soon be followed by the Controlled Vocabulary of Keywords and then by the NIS Administrative modules (bibliography and personnel). With this approach, sites may connect to services, replacing or synchronizing those parts of their local database. How will this affect our metadata database architecture?

Several sites are looking to participate in future development of metadata data models.

The GCE Metabase has been adopted by CWT and is planned to be ported to PostgreSQL for use by MCR and SBC.

Six LTER sites (LUQ, SEV, PIE, ARC, NTL, VCR) have pooled resources to develop a Drupal-based system for metadata storage, display, and EML creation. Legacy EML from LUQ, SEV, and NTL has been uploaded to the Drupal back-end database, which is now in use to serve web pages for these sites. PIE and ARC are next in line. Export to EML is currently being programmed.

The scope of this poster is limited to metadata data models, not storage of research data, even though the coupling of these is important to this discussion. Only relational databases are included, although some sites use native XML databases, such as eXist at CAP. GIS metadata is not discussed beyond <geographicCoverage>.

Generic metadata ERD ppt files attached below

generic_metadata_ERD.ppt (33 KB)
Poster_Gastil_et_al_IMC2010.pdf (271.52 KB)
MCR_generic_metadata_ERD.ppt (90 KB)
Metadata_Data_Model_Discussion.pdf (64.14 KB)
IMC_Session_VII_notes_mob.pdf (41.24 KB)
GCE_generic_metadata_ERD.ppt (102 KB)
Data_models2010_jpNotes.doc (28 KB)