LTER Best Practices for Units -- DRAFT

Todd Ackerman, Hap Garritt, Margaret O'Brien, Mark Servilla, Karen Baker, Mason Kortz, James Conners

We anticipate approximately 3-5 pages briefly describing units and how to create a customUnit. Units will be compliant with EML. Initially, the units document will be located at this node (news/committees/working_groups/unit_dictionary), but after acceptance from the community, it should be recognized as an LTER Best Practices for Units and related to the EML Best Practices for LTER sites that can be found at im_practices/metadata.

deprecated: Mar 2008, draft and outline; EML-2.1 release is expected by early October 08
Feb 2009: comments from Units WG
Mar 2009: release to larger test group

Unit Best Practices - New Draft Version started 2009 Feb 11

Overview and Recommendations
1. Introduction
2. Unit Standards
2.1. SI Units
2.2. Units in ISO
2.3. Units in STMML
2.4. Units in EML
3. LTER Site Implementation of Units
3.1. IM approach: EML Best Practices
3.2. Unit Challenges
3.2.1. Standard/custom unit duplication
3.2.2. Unit/attribute category overlap
3.2.3. Naming irregularities
3.2.4. Inappropriate use of unit field
3.2.5. Improperly defined unit (check ?? TMA: I think this originally referred to what is now 3.2.4)
4. Approaches
4.1. Community Unit Dictionary
4.2. Community Unit Registry
5. Best Practices for LTER Site Unit Creation
5.1. Unit Creation Conventions
5.2. Naming Conventions
5.3. EML Representation of Created Units
6. Future Issues
6.1. Clarifying attribute names
6.2. Vetting
6.3. Coordinating with Other Dictionaries
6.4. Automated conversions
7. Annotated Resource List

Overview and Recommendations

As a community with an understanding of the central role of semantics for data interoperability, the Long-Term Ecological Research (LTER) Information Management Committee (IMC) has taken on the challenge of refining metadata practices by focusing on standardizing scientific units used in practice. Units are a fundamental but problematic aspect of metadata capture. In taking on this challenge, we recognize the following three community activities as important contributions to the process of standards-making and data integration: 1) the identification of units locally, 2) collection of units into an accessible repository as an LTER network dictionary representing community conventions, and 3) development of a registry for managing units over time. In light of these activities, the LTER IMC recommends:

• Development of an LTER network-wide unit dictionary in accordance with ISO recommendations in order to identify and encourage community conventions as well as to extend the unit dictionary currently incorporated in EML 2.0.1 by creating an LTER community repository including all LTER dataset units.

• Design and development of an LTER registry for units so that local custom units are recognized at a site level and also as part of a cross-site collection.

• Establishment of a unit registry process for custom units to be vetted and designated as community units. Such a process can play a role in supporting ongoing scientific work, developing community conventions, and informing national standards-building efforts.

1. Introduction

With a developing understanding of the central role of semantics for data interoperability, the Long-Term Ecological Research (LTER) Information Management Committee (IMC) is addressing the challenge of refining metadata practices by focusing on units. Units specify a quantity of measurement and are fixed by a definition. They are independent of external conditions and provide a basis for standardizing the measurement of entities, enabling intercomparison of those measurements. Most units follow the rules of algebra. It is possible to both convert and to relate units automatically if they are well defined.

Scientific units represent a broader set of measures than the weights and sizes originally developed for commerce. A set of physical units has become well defined and recognized at a cross-community level though implementations may still vary. Biological units today, on the other hand, are frequently used according to local convention rather than having agreed-upon definitions. As a result there is semantic uncertainty associated with existing units as well as with new units. In addition to semantic uncertainty, structural ambiguity is another issue. Digital representations of units must also be aligned for automated conversions and cross-system comparisons, representing another realm of coordination and convergence of practices.

This ‘state of the units’ is sometimes considered related to domain maturity but often results from a combination of many other factors. Unit standardization is directly affected by the complexity and evolution of measured entities and measurement methods, coordination and communication of practices and processes, and a lack of awareness of existing resources and/or of technical infrastructures. The importance of the human-in-the-loop also bears on standardization efforts as both education and misconceptions can propagate through communities.

How a standard is developed and implemented varies and has ramifications in terms of enactment as well as for subsequent developments (Star and Lampland, 2009; Millerand et al, 2005). The aim of the LTER Unit Working Group (Unit WG) is to develop a set of best practices for the LTER Network that will guide development of a consistent set of network-recognized units with an LTER Unit Registry.

2. Unit Standards

A formal technical standard is a controlled artifact or established norm that specifies criteria or methods that guide practices. The implementation and enactment of standards is an ongoing process. Standards can develop in many ways, for example through a feedback system involving a process of consensus building over time, or as a declaration at a particular moment in time.

Numerous scientific units exist as local conventions or working standards that may be in use by an individual, a group, or a community. A local convention may influence the development of standards, because standards are not fixed; they may be changed.

2.1. SI Units
For units there is the widely recognized Système International d'Unités (SI) for physical quantities, with seven well-defined base units for measures that are essentially independent (i.e. meter, kilogram, second, ampere, kelvin, mole, and candela). Derived units are formed as products of powers of the base units and according to algebraic relations linking the quantities concerned. Non-SI units may have a relation to an SI unit called the ‘parent SI’, described via a multiplier and a constant.
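
As a worked illustration of the multiplier-and-constant relation (a sketch only; the symbols x, m and c are introduced here for illustration and are not taken from a standard), a value x in a non-SI unit can be mapped to its parent SI unit as follows:

```latex
% General form: multiplier m and constant c relate a unit to its parent SI unit
x_{\mathrm{SI}} = m \cdot x + c

% Degrees Celsius to kelvin: m = 1, c = 273.15
T_{\mathrm{K}} = 1 \cdot T_{^{\circ}\mathrm{C}} + 273.15

% Inch to meter: m = 0.0254, c = 0
l_{\mathrm{m}} = 0.0254 \cdot l_{\mathrm{in}} + 0
```

Most units need only the multiplier; the constant matters for offset scales such as temperature.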

2.2. Units in ISO
An International Standard, ISO 31 on ‘Quantities and units’, provides a style guide for the use of units of measurement. International bodies are working together on standards, ISO 80000 and IEC 80000, that may supersede the existing standards. Note that this represents an example of standards formation as an ongoing process.

2.3. Units in STMML
In today’s digital realm, a markup language for scientific, technical and medical publishing (STMML) has been developed as an XML Schema with a units component (Murray-Rust and Rzepa, 2002).

2.4. Units in EML
The Ecological Metadata Language (EML) is a metadata specification that has been adopted by a number of environmental science communities. There is a place for units in EML. EML documentation states:

" The authors decided to sharpen the model of attribute by nesting unit under measurementScale. Measurement Scale is a data typology borrowed from Statistics that was introduced in the 1940's. Under the adopted model, attributes are classified as nominal, ordinal, interval, and ratio. Though widely criticized, this classification is well-known and provides at least first-order utility in EML. For example, nesting unit under measurementScale allows EML to prevent its meaningless inclusion for categorical data -- an approach judged superior to making unit universally required or universally optional.

" The sharpening of the attribute model allowed the elimination of the unit type "undefined" from the standard unit dictionary (see eml-unitDictionary.xml). It seemed self-defeating to require the unit element exactly where appropriate, yet still allow its content to be undefined. An attribute that requires a unit definition is malformed until one is provided. The unit type "dimensionless" is preserved, however. In EML 2.0, it is synonymous with "unitless" and represents the case in which units cannot be associated with an attribute for some reason, despite the proper classification of that attribute as interval or ratio. Dimensionless may itself be an anomaly arising from the limitations of the adopted measurement scale typology."

In EML, a unit may be designated to be either 'standard' or 'custom'. Standard units are well defined by the existing unit dictionary in EML; custom units are those units not currently in the existing dictionary, though these would benefit from being defined within communities as well.

3. LTER Site Implementation of Units

3.1. IM approach: EML Best Practices
In contrast to a standard, a best practice represents a local convention, also known as a de facto, working or local standard. The IMC created the EML Best Practices to address the complexity of issues involved with implementation of EML across the LTER network. This document represents a coordination mechanism across the LTER sites, adding specificity through guidelines on how the EML fields will be interpreted and implemented for the LTER network. For example, a single EML tag could be interpreted as a single field campaign bounded in time or as a long-term project with an array of field campaigns.

The EML Best Practices offers a phased plan of metadata implementation by grouping metadata into five tiers, with each tier adding complexity and richness to the metadata record. The plan can be seen as a response to the need to develop metadata for datasets in phases over time rather than comprehensive metadata all at once. The LTER environment is especially complicated due to its long-term nature, as both legacy and emerging measurements call for diverse and evolving units. The sheer number of datasets adds complication, as broad metadata standards are in continual development alongside the practical development of not only unit and attribute conventions but also standardization processes.

While units are mentioned in the EML Best Practices document, the unit dictionary incorporated in EML was not considered critically in terms of use or of practice. Units conforming to EML are a required element for submission of a dataset. Subsequent to the publication of the EML Best Practices, the majority of datasets have been put into EML with varying unit interpretations and EML designations for unit components. As the datasets have become available for discussion and comparison across sites, the community dialogue about units has developed.

3.2. Unit Challenges
The Units Task Force examined approximately 60% of site units submitted to the KNB Metacat (called "candidate units"). Five common errors were identified from these submissions that involved the choice and the description of candidate units. These errors have to do with the standard/custom overlap, the unit/attribute relationship, naming irregularities, the inappropriate use of the ‘unit’ field, and improperly defined units.

3.2.1. Standard/Custom unit duplication
A standard unit is available but was not used. This could be due to many factors: legacy units developing at sites independently without easy reference to existing units, a unit created and assigned before a new dictionary version is released, or a misunderstanding of EML unit fields.
Example: The unit "meter" is in the existing EML dictionary but a custom unit "m" (for meter) was created.
Reference: See LTER Unit Best Practice 5.2.1
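
A check for this kind of duplication could be sketched as below. This is a minimal illustration, not an existing LTER tool: the `STANDARD_UNITS` table is a hypothetical excerpt (names and abbreviations are assumed, not copied from the EML dictionary), and `find_standard_duplicate` is a name introduced here.

```python
# Hypothetical excerpt of a standard unit dictionary: unit name -> abbreviation.
# These entries are assumptions for illustration only.
STANDARD_UNITS = {
    "meter": "m",
    "gram": "g",
    "kilogram": "kg",
    "second": "s",
}


def find_standard_duplicate(candidate):
    """Return the standard unit that a candidate custom unit duplicates, or None."""
    key = candidate.strip()
    for name, abbreviation in STANDARD_UNITS.items():
        # Names are compared case-insensitively; abbreviations are compared
        # exactly, because case carries meaning in unit symbols.
        if key.lower() == name.lower() or key == abbreviation:
            return name
    return None


if __name__ == "__main__":
    for candidate in ("m", "Meter", "gramPerMeterSquared"):
        print(candidate, "->", find_standard_duplicate(candidate))
```

A candidate that matches either the name or the abbreviation of a standard unit would be flagged before a custom unit is created.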

3.2.2. Unit/Attribute category overlap
In practice the use of units and attributes is often blurred. In computer science, an attribute is a specification that defines a property of an object, element or file. Attributes in EML - often referred to as variables, parameters, columns or field names in the environmental sciences - have a unit assigned when defined as measurement type of interval or ratio. In local scientific conventions, there is often a blending or overlap between units and attributes. On a structural level and for an unambiguous comparison of measurements, the attribute and units must be distinguished.

The candidate was called a custom unit, but (to the best of our knowledge), is really an attribute (example 1), or part of the attribute was included in the unit name (examples 2 and 3). The correct unit may have been already available as a standard unit (e.g. num/inch for example 1).

Example 1:
Attribute: bacterial abundance
Correct Unit: num
Improper unit: bacteria
Example 2:
Attribute: primary production
Correct Unit: mg/m3/day
Attribute description, Column name or subject qualifier: carbon
Improper unit: mgC/m3/day
Example 3:
Attribute: short shoot growth
Correct Unit: num/cm2
Improper unit: short shoots per cm2
Reference: See Appendix NIST#11; see LTER Unit Best Practices 5.2.2

3.2.3. Naming irregularities
A custom unit is created that does not follow naming conventions (see section 5.2).
Correct: meterSquared
Improper: squareMeter
Reference: See LTER Unit Best Practices 5.2.1-3

3.2.4. Inappropriate use of unit field
An attribute is given an incorrect measurement type, referencing a unit when it is not allowed. Measurement types are: interval (anything without a meaningful zero, e.g. degrees Celsius), ratio (anything with a meaningful zero, so that ratios of values can be calculated, e.g. length in meters), datetime (a variety of formatted times), nominal (e.g. a text station name), and ordinal (e.g. pH). Of these five, only interval and ratio allow for units. The error occurs when an attribute is described as measurementType "interval" or "ratio" but should be datetime, nominal or ordinal.
Dates are sometimes entered with units given as "yyyy" or "month". Dates (and date parts) should be typed as "datetime", with a notation such as "yyyy" as the pattern.
Reference: See LTER Unit Best Practices 5.3
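
The rule that only interval and ratio attributes carry units can be sketched as a simple check. This is an illustrative sketch only: the function name `check_attribute` and the message wording are assumptions, and the measurement type names are spelled as in the text above rather than as in the EML schema.

```python
# Measurement types named in the text; only the first two allow a unit.
UNIT_SCALES = {"interval", "ratio"}
ALL_SCALES = UNIT_SCALES | {"datetime", "nominal", "ordinal"}


def check_attribute(scale, unit=None):
    """Return a problem description, or None if scale and unit are consistent."""
    if scale not in ALL_SCALES:
        return f"unknown measurement type: {scale}"
    if scale in UNIT_SCALES and not unit:
        return f"{scale} attribute requires a unit"
    if scale not in UNIT_SCALES and unit:
        return f"{scale} attribute must not carry a unit (got '{unit}')"
    return None


if __name__ == "__main__":
    print(check_attribute("ratio", "meter"))
    print(check_attribute("datetime", "yyyy"))
    print(check_attribute("interval"))
```

A date column given the unit "yyyy" would be reported, since "yyyy" belongs in the datetime pattern rather than in the unit field.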

3.2.5 Obsolete Terms NIST #22 (check???)
3.2.5. Improperly defined units (check ?? TMA: I think this originally referred to what is now 3.2.4)
The candidate is a legitimate custom unit, but is not described correctly with respect to attribute and STMML.
examples: (coming soon)

4. Approaches

Successful data integration requires units to be defined clearly and consistently, and to be understood across communities and information systems. Syntactic guides are required for naming and abbreviating units. Dictionaries and unit registries provide semantic and structural guidelines to communities, minimizing and preventing inconsistencies in use and definition of units.

4.1. Community Unit Dictionary
A dictionary is a list of words with information about their definitions and characteristics. Having a unit registry associated with an LTER unit dictionary is a design feature that transforms the dictionary from a passive static list to an interactive community tool - a living dictionary.

Data dictionaries provide a mechanism to gather and preserve information about field observations as well as to inform both data collectors and data users. Unit and attribute dictionaries represent an organizational strategy and are one element of an information infrastructure. As we gain experience with the scope of our local and community data as well as with information classification, we begin to build an understanding of data typologies, units, and attributes. The process of creating dictionaries establishes a unique setting for dialogue between information system requirements, information managers, and earth scientists. A dictionary can create a bidirectional forum - one of both elicitation as well as prescription. It serves as a mechanism prompting self-organization; with an explicit organization, it exerts control. What's in the dictionary informs, yet is subject to discussion and update when appropriate processes are in place. So after all, is a dictionary just a controlled vocabulary list? Or is it a moderated forum informed by community needs? Dictionaries are an infrastructure element that may be enhanced by technical access, organizational flexibility, and community use.

Dictionaries are one of a suite of semantic tools for developing local and federated information infrastructure. In the semantically rich and chaotic realm of observational research, dictionaries serve as a point of engagement for participants in preparing for data sharing. They provide a place for data collectors to begin engaging with local community expectations and to align with often-complex community requirements. Technically, digital dictionaries also provide a common structure, providing a starting point to data description and system interoperability.

4.2. Community Unit Registry
Having a unit registry associated with an LTER unit dictionary is a design feature that transforms a dictionary from a passive static list to an interactive community tool - a living dictionary. In order to move from a static to a dynamic or ‘living’ dictionary, a bridge between development of units at sites and a process for addition of units to the network list is required. Three design features are being developed that essentially define a community unit registry from which a dictionary can be provided.

The first feature is providing easy access to the registry. Easy access includes both web access through a human-readable interface providing query from multiple perspectives and web services allowing remote query of the unit dictionary for working units and of the registry for access to the unit-creation process. Easy access facilitates the significant change in practices that collaborative endeavors require, offering a point of engagement that enables site-level sensitivity.

The second feature is the concept of 'scope'. The scope opens up unit work at multiple levels. Scope represents a strategy to define the working acceptance of a specific unit within the community. For example, when a site submits a term, it is flagged as "site-level" (e.g. PAL-LTER). Site-level implies that the unit is accepted only at the site for use in its metadata. Units that are under review by the Dictionary Working Group are designated as 'DWG', approved at the Network level as 'US-LTER', and at the cross-community level as 'EML'. The initial unit dictionary included all units defined within the EML 2.0.1 specification, and these thus have an EML designation. Such a plan is technically straightforward though it will take extensive community and organizational work to enact (Baker et al, 2006).

The third feature is the concept of ‘vetting’ that provides a guarantee that the unit is considered valid by a group at site or network level depending upon the scope. If a unit conforms to the best practices, and is not a duplicate of an existing unit, the ‘LTER Network’ scope will be added to that unit. This scope indicates that the unit has been reviewed and may be used with confidence.
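
The scope and vetting concepts described above can be sketched as a small data structure. This is a hypothetical illustration, not the registry's actual design: the class `RegistryEntry`, the functions `submit`, `start_review` and `finish_review`, and the use of a generic "site" label alongside a separate site field are all assumptions introduced here.

```python
from dataclasses import dataclass

# Scope labels from the text, ordered from narrowest to broadest (ordering assumed).
SCOPES = ("site", "DWG", "US-LTER", "EML")


@dataclass
class RegistryEntry:
    name: str
    scope: str
    site: str = ""  # submitting site, e.g. "PAL-LTER"


def submit(name, site):
    """Register a new unit at site level."""
    return RegistryEntry(name=name, scope="site", site=site)


def start_review(entry):
    """Mark a unit as under review by the Dictionary Working Group."""
    entry.scope = "DWG"


def finish_review(entry, network_names, conforms_to_best_practices):
    """Approve at network level only if conforming and not a duplicate."""
    approved = conforms_to_best_practices and entry.name not in network_names
    entry.scope = "US-LTER" if approved else "site"
    return approved


if __name__ == "__main__":
    entry = submit("gramPerMeterSquared", "PAL-LTER")
    start_review(entry)
    finish_review(entry, network_names={"meter"}, conforms_to_best_practices=True)
    print(entry)
```

A unit that fails review simply stays usable at its site, which matches the idea that any unit may be registered as site-specific.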

5. Best Practices for LTER Site Unit Creation

One goal for any community in need of unit conventions – local or larger-scale - is to educate community participants about how to create a valid unit. The definition of a Custom Unit is ultimately up to an individual or a group of participants since a unit may be proposed, defined and used locally at any time. For the LTER unit registry, any unit may be registered as a site-specific unit. However, there are points at which formalization of unit creation and registration practices becomes beneficial, giving guidance to the creator and enabling faster, more accurate unit definitions and comparisons. Community Unit Best Practices are a representation of previous lessons learned and enable site interactions.

5.1. Unit Creation Conventions
A goal of a digital registry is to make a community dictionary of terms easily available as a reference, so that a unit creator or metadata author can easily check whether a needed unit is already defined at the site level or at the network level. Note that, in checking a community repository before creating a unit, a site is moving from independent site-based work to site-level and network-level work where the levels are understood as interdependent. LTER IMC unit creation conventions for Custom Units:

5.1.1. Check what others in the community are doing before creating a new unit.
5.1.2. Place only the broadest and most essential measurement information in the unit; move all other information to the attribute level.

5.2. Naming Conventions
LTER IMC naming conventions for Custom Units:
5.2.1. Name first the unit then the modifier, e.g. 'meterSquared' rather than 'squareMeter'. This applies to each element in the unit name if multiple base units are brought together into a derived unit, e.g. 'gramPerMeterSquaredPerSecondSquared'.
5.2.2. Use ‘per’ as a linking term in the unit definition, e.g. ‘gramPerMeterSquared’.
5.2.3. Use camel case notation, e.g. ‘countPerMeterSquared’.
5.2.4. Spell out the unit rather than depending on abbreviations or shorthand.
5.2.5. Do not include numbers or special characters in the unit name.
5.2.6. Numbers are allowed in the unit abbreviation, but not symbols or special characters.
5.2.7. Singular terms are preferred over plural terms (e.g. gram rather than grams).
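
Some of the naming conventions above lend themselves to an automated check. The sketch below is a partial illustration under stated assumptions: `check_unit_name` and the `LEADING_MODIFIERS` list are names introduced here, and the check covers only the rules on special characters, camel case and modifier order. Plural forms and abbreviations are not detected, since they cannot be recognized reliably from spelling alone.

```python
import re

# Modifiers that should follow the unit rather than precede it (assumed list).
LEADING_MODIFIERS = ("square", "cubic")


def check_unit_name(name):
    """Return a list of naming problems found in a custom unit name."""
    problems = []
    if not re.fullmatch(r"[A-Za-z]+", name):
        problems.append("contains numbers or special characters")
    if name[:1].isupper():
        problems.append("camel case names should start with a lower-case letter")
    # Split camel case into words, e.g. "gramPerMeterSquared"
    # becomes ["gram", "Per", "Meter", "Squared"].
    for word in re.findall(r"[A-Z]?[a-z]+", name):
        if word.lower() in LEADING_MODIFIERS:
            problems.append(f"modifier '{word}' should follow the unit, e.g. meterSquared")
    return problems


if __name__ == "__main__":
    for name in ("gramPerMeterSquared", "squareMeter", "mg/m3/day"):
        print(name, check_unit_name(name))
```

An empty list means no problem was found by these particular checks; it does not guarantee that the name follows every convention.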

5.3. EML representation of created units
Murray-Rust and Rzepa state for scientific units “It is likely that several groups will develop a variety of approaches. STMML supports units through dictionaries so that is easy for a community to create new units. Sufficient information is held to manage dimension analysis, and to support the "dimensionless" unit in a richer manner.”

Units are represented in EML using the STMML tags <unitList>, <unit> and <unitType>. The <unitList> tag marks the beginning and end of a list of the units. Units have an associated <unitType>, which represents a set of units with a name. Units are described with the name and description tags:

<description>string describing the unit</description>

The relationship of the unit to SI (parent, multiplier) must be designated if there is to be support for unit conversions.
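
A custom unit description of this kind could be generated as in the sketch below. Several things here are assumptions to be checked against the EML release in use: the namespace URI, the attribute names (`id`, `name`, `unitType`, `parentSI`, `multiplierToSI`), and the example unit values, which are hypothetical. The helper `build_unit_list` is introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed STMML namespace URI; verify against the EML version being used.
STMML_NS = "http://www.xml-cml.org/schema/stmml-1.1"


def build_unit_list(units):
    """Build an stmml:unitList element from a list of unit dictionaries."""
    ET.register_namespace("stmml", STMML_NS)
    unit_list = ET.Element(f"{{{STMML_NS}}}unitList")
    for unit in units:
        # Everything except the description becomes an attribute of <unit>.
        attributes = {k: str(v) for k, v in unit.items() if k != "description"}
        element = ET.SubElement(unit_list, f"{{{STMML_NS}}}unit", attributes)
        description = ET.SubElement(element, f"{{{STMML_NS}}}description")
        description.text = unit["description"]
    return unit_list


# Hypothetical custom unit, for illustration only.
example = build_unit_list([{
    "id": "gramPerMeterSquared",
    "name": "gramPerMeterSquared",
    "unitType": "arealMassDensity",
    "parentSI": "kilogramPerMeterSquared",
    "multiplierToSI": 0.001,
    "description": "grams per square meter",
}])
xml_text = ET.tostring(example, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Including the parent and multiplier at creation time is what later makes conversion to the parent SI unit possible.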

6. Future Issues
Further work is required on a number of issues.

6.1. Clarifying attribute names and their description
There remains a gray area of what is a unit and what is an attribute as well as of what goes into an attribute name and how to approach describing an attribute more fully.
Attribute naming is described in the LTER Best Practices:
1) attributeName: the local name for a field in a table; often short or an abbreviation (e.g. temp)
2) attributeLabel: provides a full or long name (e.g. temperature)
3) attributeDefinition: an unambiguous description (e.g. air temperature); should be used for describing/qualifying chemical molecules or elements associated with particular units of the attribute name, e.g. primary production as mgC/m3/day or ammonium concentration ug/L as NH4-N

6.2. Vetting
With a unit registry, a question to address is who will be responsible for determining whether a unit meets the conventions laid out in the Best Practices? A group or individuals will need to be identified to review new units so that a unit registered initially with a scope of site-level can be validated at the network-level. This needs to be a long-term effort.

6.3. Coordinating with other dictionaries
Many groups that represent diverse levels of organization have dictionaries of various levels of flexibility and with different update processes. In time, processes for bringing these together will develop. The concept of ‘scope’ may play a role in the coordination.

6.4. Automating unit conversions
The EML unit schema identifies each unit with a parent unit when possible. Automated conversions are therefore possible between parent and child units and may become part of the queriable web interface or of web services at some point.
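
Conversion between two units that share a parent could be sketched as follows. This is an illustration under stated assumptions, not the registry's implementation: the `UNITS` table and the function names are introduced here, and the factors are ordinary conversion values rather than entries copied from the EML unit dictionary.

```python
# Hypothetical unit records: name -> (parent SI unit, multiplier, constant).
UNITS = {
    "meter": ("meter", 1.0, 0.0),
    "kilometer": ("meter", 1000.0, 0.0),
    "inch": ("meter", 0.0254, 0.0),
    "kelvin": ("kelvin", 1.0, 0.0),
    "celsius": ("kelvin", 1.0, 273.15),
}


def to_parent(value, unit):
    """Convert a value to the unit's parent SI unit."""
    _, multiplier, constant = UNITS[unit]
    return value * multiplier + constant


def from_parent(value, unit):
    """Convert a value in the parent SI unit back to the given unit."""
    _, multiplier, constant = UNITS[unit]
    return (value - constant) / multiplier


def convert(value, source, target):
    """Convert between two units by passing through their shared parent."""
    if UNITS[source][0] != UNITS[target][0]:
        raise ValueError(f"{source} and {target} do not share a parent SI unit")
    return from_parent(to_parent(value, source), target)


if __name__ == "__main__":
    print(convert(2.0, "kilometer", "meter"))
    print(convert(25.0, "celsius", "kelvin"))
```

Routing every conversion through the parent means each unit needs only one multiplier and one constant, rather than a factor for every pair of units.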

7. Annotated Resource List

7.1. A NIST reference on constants, units and uncertainty. It provides a bibliographic reference, a guide to SI units, and a document on unit conventions.

7.2. The Bureau International des Poids et Mesures (BIPM) website, with an SI document giving guidelines for creating units and also two non-SI unit lists giving recommended (i.e. time, plane angle, area, volume, and mass) and non-recommended units.

7.3. An article introducing STMML, a markup language for scientific, technical and medical publishing.

7.4. A list of standardUnits included in EML 2.0.1.

7.5. CUAHSI units and other controlled vocabularies, the CUAHSI Units and Observation Data Model (ODM), and best practices for adding data to CUAHSI.


The National Institute of Standards and Technology (NIST) provides a reminder of the uncertainty involved in units as well as a set of unit rules in the form of a checklist.
A number of the points on this checklist address specific issues that have arisen frequently in recent LTER community discussions of units. These are listed below for easy reference.

NIST#2 Abbreviations
Abbreviations such as sec, cc, or mps are avoided and only standard unit symbols, prefix symbols, unit names, and prefix names are used.
proper: s or second; cm3 or cubic centimeter; m/s or meter per second
improper: sec; cc; mps

NIST#3 Plurals
Unit symbols are unaltered in the plural.
proper: l = 75 cm
improper: l = 75 cms

NIST#5 Multiplication & division
A space or half-high dot is used to signify the multiplication of units. A solidus (i.e., slash), horizontal line, or negative exponent is used to signify the division of units. The solidus must not be repeated on the same line unless parentheses are used.
proper: The speed of sound is about 344 m·s^-1 (meters per second)
The decay rate of 113Cs is about 21 ms^-1 (reciprocal milliseconds)
m/s, m·s^-2, m·kg/(s^3·A), m·kg·s^-3·A^-1
m/s, m s^-2, m kg/(s^3 A), m kg s^-3 A^-1
improper: The speed of sound is about 344 ms^-1 (reciprocal milliseconds)
The decay rate of 113Cs is about 21 m·s^-1 (meters per second)
m ÷ s, m/s/s, m·kg/s^3/A

NIST#6 Typeface
Variables and quantity symbols are in italic type. Unit symbols are in roman type. Numbers should generally be written in roman type. These rules apply irrespective of the typeface used in the surrounding text. For more details, see Typefaces for symbols in scientific manuscripts
proper: She exclaimed, "That dog weighs 10 kg!" (surrounding text in italic, unit symbol kg in roman)
t = 3 s, where t (italic) is time and s (roman) is second
T = 22 K, where T (italic) is thermodynamic temperature, and K (roman) is kelvin
improper: He exclaimed, "That dog weighs 10 kg!" (unit symbol kg in italic)
t = 3 s, where t (roman) is time and s (italic) is second
T = 22 K, where T (roman) is thermodynamic temperature, and K (italic) is kelvin

NIST#7 Typeface
Superscripts and subscripts are in italic type if they represent variables, quantities, or running numbers. They are in roman type if they are descriptive.
subscript category   typeface   proper usage
quantity             italic     cp, specific heat capacity at constant pressure
descriptive          roman      mp, mass of a proton
running number       italic

NIST#8 Abbreviations
The combinations of letters "ppm," "ppb," and "ppt," and the terms part per million, part per billion, and part per trillion, and the like, are not used to express the values of quantities.
proper: 2.0 µL/L; 2.0 x 10^-6 V;
4.3 nm/m; 4.3 x 10^-9 l;
7 ps/s; 7 x 10^-12 t,
where V, l, and t are the quantity symbols for volume, length, and time.
improper: "ppm," "ppb," and "ppt," and the terms part per million, part per billion,
and part per trillion, and the like

NIST#9 Unit modifications
Unit symbols (or names) are not modified by the addition of subscripts or other information. The following forms, for example, are used instead.
proper: Vmax = 1000 V
a mass fraction of 10 %
improper: V= 1000 Vmax
10 % (m/m) or 10 % (by weight)

NIST#10 Percent
The symbol % is used to represent simply the number 0.01.
proper: l1 = l2(1 + 0.2 %), or
D = 0.2 %,
where D is defined by the relation D = (l1 - l2)/l2.
improper: the length l1 exceeds the length l2 by 0.2 %

NIST#11 Information & units
Information is not mixed with unit symbols or names.
proper: the water content is 20 mL/kg
improper: 20 mL H2O/ kg
20 mL of water/ kg

NIST#12 Math notation
It is clear to which unit symbol a numerical value belongs and which mathematical operation applies to the value of a quantity.
proper: 35 cm x 48 cm
1 MHz to 10 MHz or (1 to 10) MHz
20 °C to 30 °C or (20 to 30) °C
123 g ± 2 g or (123 ± 2) g
70 % ± 5 % or (70 ± 5) %
240 x (1 ± 10 %) V
improper: 35 x 48 cm
1 MHz-10 MHz or 1 to 10 MHz
20 °C-30 °C or 20 to 30 °C
123 ± 2 g
70 ± 5 %
240 V ± 10 % (one cannot add 240 V and 10 %)

NIST#13 Unit symbols & names
Unit symbols and unit names are not mixed and mathematical operations are not applied to unit names.
proper: kg/m3, kg · m-3, or kilogram per cubic meter
improper: kilogram/m3, kg/cubic meter, kilogram/cubic meter, kg per m3,
or kilogram per meter3.

NIST#14 Numerals & unit symbols
Values of quantities are expressed in acceptable units using Arabic numerals and symbols for units.
proper: m = 5 kg
the current was 15 A
improper: m = five kilograms
m = five kg
the current was 15 amperes

NIST#18 Standard symbols
Standardized quantity symbols are used. Similarly, standardized mathematical signs and symbols are used. More specifically, the base of "log" in equations is specified when required by writing log_a x (meaning log to the base a of x), lb x (meaning log_2 x), ln x (meaning log_e x), or lg x (meaning log_10 x).
proper: tan x
R for resistance
Ar for relative atomic mass
improper: tg x for tangent of x
words, acronyms, or ad hoc groups of letters

NIST#22 Obsolete Terms
The obsolete terms normality, molarity, and molal and their symbols N, M, and m are not used.
proper: amount-of-substance concentration of B (more commonly called concentration of
B), and its symbol cB and SI unit mol/m3 (or a related acceptable unit)
molality of solute B, and its symbol bB or mB and SI unit mol/kg
(or a related unit of the SI)
improper: normality and the symbol N, molarity and the symbol M
molal and the symbol m

Taylor, ed, 1995. Guide for the Use of the International System of Units (SI). (sp811.pdf)
Taylor, ed, 2001. The International System of Units (SI). (sp330.pdf)
Murray-Rust and Rzepa, 2002. STMML. A Markup Language for Scientific... (ds121.pdf)
List of standardUnits in CSV format: standardUnits_EML201.txt


