The GCE Data Toolbox is a MATLAB software library that supports metadata-based analysis, visualization, transformation and management of ecological data sets. The toolbox is based on the GCE Data Structure, a specification for storing tabular data along with comprehensive metadata and QA/QC information. Toolbox programs query the metadata fields for all operations, so data values are managed and analyzed appropriately for the type of information they represent. This semantic processing approach supports highly automated and intelligent data analysis and ensures data set validity throughout all processing steps. Data structure size is limited only by available computer memory, and large data sets (e.g. >1 million records) can be managed and analyzed effectively using standard desktop computer hardware.
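The idea of storing descriptive metadata alongside each column, and consulting it before any operation, can be sketched in plain MATLAB. The structure below is a simplified illustration only: the field names (`name`, `units`, `datatype`, `values`, `flags`) and the type codes are assumptions chosen for this example, not the GCE Data Structure specification, and no toolbox functions are used.

```matlab
% Simplified, hypothetical layout of a metadata-aware table
% (not the actual GCE Data Structure specification).
s = struct('title', 'Example water temperature series');
s.name = {'Site', 'Temp_Water'};      % column names
s.units = {'none', 'degC'};           % column units
s.datatype = {'s', 'f'};              % assumed codes: 's' = string, 'f' = floating-point
s.values = {{'A'; 'A'; 'B'}, [21.5; 22.1; 19.8]};
s.flags = {'', ''};                   % empty = no QA/QC flags assigned

% Semantic processing: check the metadata before operating on a column,
% so that statistics are computed only for numeric columns.
col_means = cell(1, length(s.name));
for n = 1:length(s.name)
    if strcmp(s.datatype{n}, 'f')
        col_means{n} = mean(s.values{n});
    else
        col_means{n} = [];            % skip non-numeric columns
    end
end
```

Because the decision to compute a mean is driven by `s.datatype` rather than by the caller, the same loop can be applied to any table that follows this layout.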
Common tasks that can quickly be performed using the GCE Data Toolbox include:
* unit conversions and other data transformations, with automatic equation logging to metadata
* filtering records based on value ranges, multi-column criteria or mathematical expressions
* generating frequency histograms, line/scatter plots and map plots to visualize data
* statistically re-sampling data using aggregation, binning, and date/time scaling tools
* synthesizing data sets by joining or merging multiple structures
* exporting data and/or metadata in various ASCII and MATLAB formats for analysis in other programs
* indexing and searching stored data sets using thematic, temporal and spatial criteria
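Two of the tasks listed above, a unit conversion with the equation logged to metadata and a value-range filter, can be sketched in base MATLAB as follows. This is an illustration under stated assumptions: the `col` record, its `history` field, and the log message format are hypothetical and do not reproduce the toolbox's own functions or metadata layout.

```matlab
% Hypothetical column record: values plus descriptive metadata.
col.name = 'Temp_Water';
col.units = 'degC';
col.values = [21.5; 22.1; 19.8; 30.2];
col.history = {};                     % log of transformations applied to this column

% Unit conversion, with the equation recorded in the metadata.
equation = 'degF = degC * 9/5 + 32';
col.values = col.values * 9/5 + 32;
col.units = 'degF';
col.history{end+1} = sprintf('%s: converted units using %s', col.name, equation);

% Filter records by value range (keep values between 60 and 75 degF).
keep = col.values >= 60 & col.values <= 75;
filtered = col.values(keep);
```

Updating `col.units` and `col.history` in the same step as the values keeps the metadata consistent with the data, which is the property the equation logging described above is meant to provide.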
The GCE Data Toolbox also contains a dynamic, extensible QA/QC framework for tabular data sets of any size (see Sheldon, 2008). An unlimited number of QA/QC rules can be defined for each data set attribute, and an array of qualifier flags is transparently managed along with each column and updated dynamically whenever data values are modified or re-ordered. This rule-based QA/QC logic and the separation of data values from QA/QC flags obviate the need to delete questionable values from data sets, and permit flexible handling and display of QA/QC information during data analysis and export. For example:
* value flags can be displayed in the data editor and above the corresponding data values in plots
* flagged values can be included or excluded in statistical reports and summarized in aggregated data
* flags can be converted to coded data columns and displayed alongside the data values
* flagged values or rows containing flagged values can be omitted from exported data sets
* flagged values can be selectively deleted from data sets to permanently remove the values
* flags can be assigned visually on plots and propagated to dependent columns to augment or revise rule-based assignments
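The separation of data values from QA/QC flags described above can be sketched with a parallel character array in base MATLAB. The rules, thresholds, and flag codes (`I`, `Q`) below are assumptions made for this example; they are not the toolbox's rule syntax or its flag definitions.

```matlab
% Data values and a parallel flag array (one character per value, ' ' = unflagged).
values = [12.4; -3.0; 15.1; 250.0; 14.8];
flags = repmat(' ', size(values));

% Hypothetical rules: each pairs a logical test with a flag code.
flags(values < 0) = 'I';              % assumed code for invalid values
flags(values > 100) = 'Q';            % assumed code for questionable values

% Flagged values stay in the data set and are excluded only when requested.
unflagged = (flags == ' ');
mean_all = mean(values);
mean_unflagged = mean(values(unflagged));

% Re-ordering applies the same index to values and flags so they stay aligned.
[values_sorted, idx] = sort(values);
flags_sorted = flags(idx);
```

Nothing is deleted in this sketch: the same data can produce a report that includes or excludes flagged values simply by changing the logical index.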
The GCE Data Toolbox also supports mining data directly from the USGS NWIS database, the LTER ClimDB/HydroDB database, and the NOAA HADS real-time data server over the Internet. In addition, import filters and metadata templates are provided for NOAA NCDC data, Sea-Bird CTD data files, and Campbell Scientific array-based data loggers, along with generic import filters for delimited ASCII and MATLAB files. User-defined import filters and metadata templates can also be added to the toolbox menus at run time to support other data sources.
The GCE Data Toolbox and GCE Data Structure specification were developed using the MATLAB® programming language (The MathWorks, http://www.mathworks.com), and require MATLAB 6.5 (R13) or higher to run. However, a complete suite of graphical user interface programs is provided to augment the command-line functions, allowing users with little or no prior MATLAB experience to use the toolbox with minimal instruction. MATLAB is compatible with all major computer operating systems, including Microsoft Windows, Unix/Linux, Sun Solaris, and Apple Mac OS X.