Initial definition of "PASTA-prototype online data"

We now have an initial, strawman definition of "data online". The Word document was circulated to and discussed by IMExec during their Dec 1 VTC, and verbally approved. It was sent to NISAC, but they did not have time to consider it during their late 2011 call.
This definition is close to what the PASTA developers have used as their starting point, and is consistent with the recent Metacat search or browse results display. This proposal is only a first step, but it represents a minimal standard based on dataset features that can actually be quantified.

Proposed definition (word-doc version attached below):
With this definition, the following examples would NOT be considered online:
1. EML metadata in the Network catalog with a URL located at any other XPath
2. Data that do not have EML metadata in the Network catalog (no matter where else metadata or data may reside)
3. Data that the public must specifically request from an individual (e.g., "Type II" according to

Note that this means that the old "Discovery Level EML" would not be considered "online data". This is reasonable because discovery-level EML puts only metadata online, not data. Generally, publishing only metadata advertises that certain research is occurring. It is likely that the network will still want to house such metadata, but those policies are not part of the current discussion.
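
As a rough illustration of the first exclusion listed above, the sketch below checks whether an EML record carries a URL at one specific path. The path used here (dataset/dataTable/physical/distribution/online/url), the function name, and both sample records are assumptions made for illustration only; the XPath that actually counts is the one given in the proposed definition.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: illustrative path only. The authoritative XPath is the one
# stated in the proposed definition, not this constant.
ASSUMED_URL_PATH = "./dataset/dataTable/physical/distribution/online/url"


def has_url_at_path(eml_text: str, path: str = ASSUMED_URL_PATH) -> bool:
    """Return True if the EML text has a non-empty URL element at `path`."""
    root = ET.fromstring(eml_text)
    return any((el.text or "").strip() for el in root.findall(path))


# Hypothetical record with a URL at the assumed entity-level path.
ENTITY_LEVEL = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.0">
  <dataset>
    <dataTable>
      <physical>
        <distribution><online><url>http://example.org/data/table1.csv</url></online></distribution>
      </physical>
    </dataTable>
  </dataset>
</eml:eml>"""

# Hypothetical record whose only URL sits at a different path
# (dataset-level distribution), so it fails the check.
OTHER_PATH = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.0">
  <dataset>
    <distribution><online><url>http://example.org/site-catalog</url></online></distribution>
  </dataset>
</eml:eml>"""

print(has_url_at_path(ENTITY_LEVEL))
print(has_url_at_path(OTHER_PATH))
```

A check of this kind only confirms that a URL is present at the chosen location. It says nothing about whether the URL resolves or what it returns.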

Once there is a basic definition of "online data", it can be refined further:

"Human-accessible online data" is the most basic category, because a human can almost always interpret or guess how to use what s/he is given; for example, a human can fill out a web form. Systems that have intervening forms (but eventually produce data via a web browser) provide "human-accessible online data".

"Machine-accessible online data" would be a data package in which data can be directly accessed with no other intervention. We know that this can be complex -- not all machines can automatically access all URLs. And when URLs include web forms, we know that use-tracking systems can be designed so that they don't impede machine access. This request does not include use-tracking policies.
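
One possible way to approximate the human/machine distinction in code is to look at what a data URL returns. The sketch below is a heuristic under a stated assumption, not part of the proposal: it treats an HTML response as a sign of an intervening page or form, and any other media type as directly delivered data. The function name and category labels are hypothetical.

```python
def classify_access(content_type: str) -> str:
    """Classify a data URL by the Content-Type header of its response.

    ASSUMPTION (heuristic): an HTML response suggests an intervening web
    page or form, so only a human is likely to get through to the data.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type:
        return "unknown"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "human-accessible"
    return "machine-accessible"


print(classify_access("text/csv; charset=utf-8"))
print(classify_access("text/html"))
```

A real check would need more care than this, since some servers return data with an HTML content type and some forms can be completed automatically.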

Request to NISAC: online_data_definition_2011.docx (154.01 KB)