Setting up Data Turbine with Matlab's GCE toolkit and ODM database (Windows)

This guide covers setting up the DataTurbine server, Matlab QA/QC, and the connection to the CUAHSI ODM MySQL database on a Windows machine.

First, download the necessary files below.

Download the newest DataTurbine at https://bitbucket.org/OSDT/osdt-v3.3-production/downloads
For older versions of DataTurbine, which do not include the DTMatlabTK, go to www.dataturbine.org
Download the GCE toolkit at https://gce-svn.marsci.uga.edu/trac/GCE_Toolbox/wiki/Downloads
Download the JDBC-MySQL database connector: http://dev.mysql.com/downloads/connector/j/

The purpose of this guide is to supplement the Open Source DataTurbine Initiative documentation by providing a more detailed description of installing DataTurbine on a Windows platform. ODM Tools for Windows, which connect to a Microsoft SQL Server database, can be found at www.his.cuahsi.org

Installing the Java Environment

Verify that Java is installed on your system by issuing the following command from the Windows command prompt:

C:\> java -version

You should get something like this:

java version "1.7.0_10"
Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
Java HotSpot(TM) Client VM (build 23.6-b04, mixed mode, sharing)

If you get something like this, then you can proceed to the Setting up DataTurbine step.

If the command instead reports that 'java' is not recognized as an internal or external command, there are two possibilities: (1) Java is not installed or (2) the Java path is not set (and thus the java command is not recognized). Most likely, it's the first case, and you need to install a Java Runtime Environment (for example, OpenJDK). There is plenty of documentation on the Java website about how to install it. If Java is installed but still not found, set the path as described in the next section.
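If you'd rather script this check, the idea can be sketched in Python. This is only a minimal sketch, assuming Python 3.7 or newer is installed; the function names parse_java_version and installed_java_version are made up for illustration. Note that java -version prints its banner to the error stream, which is why the code reads stderr.

```python
import re
import shutil
import subprocess


def parse_java_version(banner):
    """Return the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def installed_java_version():
    """Return the version of the java found on the PATH, or None if there is none."""
    if shutil.which("java") is None:
        # Either Java is not installed or its folder is not on the PATH.
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)
```

Calling installed_java_version() returns a version string when the java command is found, and None in either of the two failure cases described above.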

Setting the Java environment paths

The steps below show how to add Java to the Path environment variable. This tells your computer where to find the "java" program when you issue any java command from the command line.

1. Go to Start -> Run.
2. Type in sysdm.cpl and hit enter.
3. Navigate to the "Advanced" tab.
4. Click Environment Variables... in the lower right corner.

5. In the second box, labeled "System variables", find the variable named "Path" or "PATH".
6. Select Edit. *Be careful not to erase any content that's already there.*
7. If the variable already has content, APPEND the Java path to it: add a semicolon to the end of the existing value, followed by the path.
So, for example, depending on where Java was installed, you would add:
;C:\Program Files (x86)\Java\jre7\bin

8. Select OK in both dialogs.
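The append rule in step 7 can be sketched in Python. This is a minimal illustration only: the function name append_to_path is made up, and the code only computes the new value as a string. It does not change any system setting.

```python
def append_to_path(path_value, new_dir):
    """Return path_value with new_dir appended after a semicolon.

    Existing entries are kept, and new_dir is skipped if it is already listed.
    """
    entries = [entry.strip() for entry in path_value.split(";") if entry.strip()]
    # Windows paths are case-insensitive; also ignore a trailing backslash.
    known = [entry.rstrip("\\").lower() for entry in entries]
    if new_dir.rstrip("\\").lower() not in known:
        entries.append(new_dir)
    return ";".join(entries)


current = r"C:\Windows\system32;C:\Windows"
print(append_to_path(current, r"C:\Program Files (x86)\Java\jre7\bin"))
```

Running the function a second time with the same folder leaves the value unchanged, which is the behavior you want when editing the variable by hand as well.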

Another set of instructions for setting the path variable. http://www.java.com/en/download/help/path.xml

Yet another set of instructions for setting the path variable.
http://www.dataturbine.org/content/path

Setting up DataTurbine

To set up DataTurbine, download it from the DataTurbine website.

To install, issue this command:

java -jar PATH_TO_DOWNLOAD\RBNB-VX.X-install.jar

This walks you through an installer that places DataTurbine in a directory of your choosing. To run DataTurbine afterwards, use a command like this:

java -Xmx1024M -jar "C:\Program Files\RBNB\V3.3B1\bin\rbnb.jar" -H ..\dtarchive -a localhost:3333 > C:\log\dataturbine.out

-Xmx1024M gives the Java Virtual Machine up to 1 GB of heap memory
"C:\Program Files\RBNB\V3.3B1\bin\rbnb.jar" specifies where the main jar file is (the quotes are needed because the path contains a space)
-H flag with ..\dtarchive specifies where the DT archives are. You can put this wherever you like.
-a localhost:3333 specifies the address and port of the DT server
> C:\log\dataturbine.out specifies the location and name of the log file. You can put this wherever you like, but the directory must already exist.
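If you want to launch the server from a script instead of typing the command, the same flags can be assembled in Python. This is a sketch under assumptions: the function names build_rbnb_command and start_rbnb are hypothetical, and the paths are the example paths from this guide. Passing the arguments as a list sidesteps quoting problems with folders such as "Program Files".

```python
import subprocess


def build_rbnb_command(jar_path, archive_dir, address, max_heap="1024M"):
    """Build the argument list for launching the DataTurbine server."""
    return [
        "java",
        "-Xmx" + max_heap,  # maximum JVM heap size
        "-jar", jar_path,   # main DataTurbine jar file
        "-H", archive_dir,  # where the DT archives are kept
        "-a", address,      # host:port of the DT server
    ]


def start_rbnb(command, log_path):
    """Start the server in the background and send its output to log_path.

    Unlike the plain `>` redirect, this also captures error output.
    """
    log_file = open(log_path, "w")
    return subprocess.Popen(command, stdout=log_file, stderr=subprocess.STDOUT)


command = build_rbnb_command(
    r"C:\Program Files\RBNB\V3.3B1\bin\rbnb.jar", r"..\dtarchive", "localhost:3333")
print(command)
```

Calling start_rbnb(command, r"C:\log\dataturbine.out") would then start the server; it is left out of the example so that running the snippet has no side effects.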

Pushing data to DataTurbine

To put data into the DataTurbine server you just started with the command above, you'll need to create an application that pushes data into it. The attached software development guide gives a good idea of the data structures needed to create a DataTurbine source, and dataturbine.org provides further documentation. The nice thing about version V3.3B1 is that it includes the Matlab Toolkit, which can push data into DataTurbine. Currently, at the North Temperate Lakes station, we use a standalone Jython script (Python code that runs on the JVM) to push data from our raw LoggerNet .dat files to DataTurbine.

A more complete API for RBNB DataTurbine can be found at: http://dataturbine.org/sites/default/files/programs/RBNB/doc/index.html
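Before any data can be pushed, the raw file has to be read. The sketch below covers only that reading step, in plain Python, and it is not our production script. It assumes the .dat files use the TOA5 layout (four header lines, with the column names on the second line and the timestamp in the first column) and that all other columns are numeric. The function name read_toa5, the station name, and the sample values are made up for illustration.

```python
import csv
import io


def read_toa5(text):
    """Parse TOA5-style text into a list of {column name: value} dicts."""
    rows = list(csv.reader(io.StringIO(text)))
    field_names = rows[1]  # header line 2 holds the column names
    records = []
    for row in rows[4:]:   # data starts after the 4 header lines
        if not row:
            continue       # skip blank lines
        record = {field_names[0]: row[0]}  # keep the timestamp as text
        for name, value in zip(field_names[1:], row[1:]):
            # Missing values are written as "NAN"; float() turns that into nan.
            record[name] = float(value)
        records.append(record)
    return records


# Hypothetical sample file content, for illustration only.
SAMPLE = "\n".join([
    '"TOA5","TroutBog","CR1000","1234","CR1000.Std.22","CPU:tb.CR1","5678","Table1"',
    '"TIMESTAMP","RECORD","WaterTemp"',
    '"TS","RN","Deg C"',
    '"","","Avg"',
    '"2013-01-01 00:00:00",1,4.25',
    '"2013-01-01 00:10:00",2,"NAN"',
])

records = read_toa5(SAMPLE)
print(records[0])
```

Each resulting record carries one timestamp plus one value per channel, which is the shape you need when filling a DataTurbine source with the API linked above.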

Setting up the GCE Toolkit

Note: For more complete documentation, please visit the GCE SVN [https://gce-svn.marsci.uga.edu/trac/GCE_Toolbox/wiki]. These instructions are for version 3.7.0 of the GCE datatools.

Download the GCE toolkit from the GCE Toolbox Trac website at https://gce-svn.marsci.uga.edu/trac/GCE_Toolbox/wiki/Downloads

Unzip the files.

There is no 'installation' necessary. Start Matlab and navigate to the toolkit by issuing this command from the Matlab command line:

>> cd PATH_TO_DOWNLOAD\gce_datatools__public

Setting up DT-Matlab and DT-GCE functions

Using the DT-Matlab toolkit involves setting a couple of Java paths.

First, edit Matlab's startup.m

Follow the instructions below to set up the Matlab search path to use the GCE toolkit and the DTMatlab toolkit.

Navigate to your Matlab installation root folder (for example, C:\Program Files\Matlab\R2012b\).

Then, navigate to the toolbox\local\ directory.

Edit startup.m to contain the following (if there is no startup.m, create it):

javaaddpath('RBNB_ROOT_FOLDER\bin\rbnb.jar') %adds DataTurbine Java core to the Matlab java search path.

Note: in these examples, RBNB_ROOT_FOLDER stands for the DataTurbine installation directory, e.g. C:\Program Files\RBNB\V3.3B1\ (the folder that contains bin\).

javaaddpath('RBNB_ROOT_FOLDER\Matlab_Toolkit\') %adds DTMatlabToolkit core functions and GCE helper functions to the Matlab java search path.

javaaddpath('MYSQL_JDBC_DOWNLOAD_DIRECTORY\mysql-connector-java--bin.jar') %adds mysql JDBC connector to the Matlab java search path.

path(path,genpath('PATH_TO_DOWNLOAD\gce_datatools_370_public')) %adds the GCE toolkit to the Matlab search path.

You'll have to restart Matlab if you have it open for these changes to take effect.

The GCE toolkit has functions that use the DT Matlab Toolkit, but it is also possible to use the DT Matlab Toolkit functions alone to perform raw Matlab QA/QC functions. The documentation for the DT Matlab toolkit is attached to this page, and it can also be found in the GCE Toolkit files. The documentation for the 3.7.0 GCE Toolkit is also attached (pdf).

ODM hookup

Hooking up Matlab to the ODM database involves a little preparation. The main thing you'll need to do is construct mapping tables that define how variables are related in the ODM database. The main things you'll want to create are spreadsheets of:

-ODM Variable ID<->Method ID<->OffsetValue Mappings (odm_channel_mapping.mat)
-ODM Qualifiers (for QA/QC flags) (odm_qualifiers.mat)
-ODM Sites/Buoys (odm_sites.mat)
-ODM methods (optional) (odm_methods.mat)
-ODM variables (optional) (odm_variables.mat)

Note: All of the above have to be pre-loaded into the ODM database in order for a successful DataValue import.

If you look at the ODM setup at http://his.cuahsi.org/odmdatabases.html , you can download a sample SQL file to see how each variable (e.g. temperature) has a certain method (e.g. temp gathered from sonde) and offset value (e.g. at depth 1.0 m). Attached to this document you can find sample xlsx files that describe our current ODM constraints at NTL.

These spreadsheets need to be imported into Matlab and saved as GCE structs using the GCE Dataset Editor. The .mat files in parentheses above need to follow the exact naming convention and need to be saved in the GCE_INSTALLATION_PATH/settings/ directory.

The gce2odm.m file in the gce/extensions/ folder then uses the above files in .mat GCE format to perform the correct transformations for ODM data imports.
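To make the role of the channel mapping concrete, here is a small Python sketch of the kind of wide-to-skinny reshaping involved. It illustrates the idea only and is not how gce2odm.m is implemented. The function name wide_to_skinny, the column names Temp_1m and Temp_2m, and all ID values are made up; the output keys follow the ODM DataValues field names.

```python
def wide_to_skinny(rows, channel_mapping, site_id):
    """Turn wide rows (one column per channel) into one record per value."""
    skinny = []
    for row in rows:
        for column, mapping in channel_mapping.items():
            if column not in row:
                continue  # this channel was not harvested in this row
            skinny.append({
                "SiteID": site_id,
                "LocalDateTime": row["Date"],
                "VariableID": mapping["VariableID"],
                "MethodID": mapping["MethodID"],
                "OffsetValue": mapping["OffsetValue"],
                "DataValue": row[column],
            })
    return skinny


# Hypothetical mapping: one temperature variable measured at two depths.
channel_mapping = {
    "Temp_1m": {"VariableID": 1, "MethodID": 2, "OffsetValue": 1.0},
    "Temp_2m": {"VariableID": 1, "MethodID": 2, "OffsetValue": 2.0},
}
rows = [{"Date": "2013-01-01 00:00:00", "Temp_1m": 4.2, "Temp_2m": 4.0}]

skinny_rows = wide_to_skinny(rows, channel_mapping, site_id=5)
for record in skinny_rows:
    print(record)
```

One wide row with two channels becomes two skinny records, each tagged with the variable, method, and offset that the mapping table assigns to its column. This is why the mapping has to exist, and be loaded into the database, before any values are imported.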

Setting up a Metadata Template for QA/QC

See the gce_documentation.pdf that's attached for setting up Metadata templates in the GCE toolkit.

Automatic harvesting and ODM DB inserts

To apply the template to data harvested using DTharvest.m:

>> options = DTharvestStructGCE(fn,period,time_offset,template,title,server,source);
(where template = 'TroutBog Missing Val and meanSTD template', etc)

Issue help DTharvestStructGCE for syntax options.

To start harvesting:

>> DTharvest('start',server,source,options);

where server is the IP address of your DataTurbine server in string format (e.g. 'localhost')
and source is the name of your DataTurbine source (e.g. 'TroutBog')

Connecting to a database...

Once you understand how DTharvest works, you can start adding custom scripts (which are separate from the GCE toolkit) to insert the harvested data into your ODM database.

To do that, add your script name to the DTharvest workflow. Here is a simple example:

options.Workflow = [options.Workflow, 'odm_insert(data);']

An example script named odm_insert.m:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function odm_insert(data)

%convert gce table to odm skinny table, see help gce2odm for more info
[data_odm,msg] = gce2odm(data,'TB','Date',5,'C',0);

%issue this command once.
conn = database('YourDBName',...
'YourUserName',...
'YourPassCode',...
'com.mysql.jdbc.Driver',...
'jdbc:mysql://Your_DT_IP_Addr:3306/YourDBName');

%issue this command every hour.
gce_fastinsert(data_odm,conn,'DataValues');

%note that Matlab's fast insert inserts data 1 record at a time, so it is recommended
% to look into Matlab's MySQL bulk insert for large inserts.
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
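The closing note in the script recommends bulk inserts for large data sets. The batching idea can be sketched in Python as below. This is an illustration only: the function name batched_inserts is hypothetical, and the %s placeholder style is an assumption about the database driver you would pass the statements to.

```python
def batched_inserts(table, columns, rows, batch_size=500):
    """Yield (sql, params) pairs, each inserting up to batch_size rows at once."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table,
            ", ".join(columns),
            ", ".join([row_placeholder] * len(batch)),
        )
        # Flatten the batch so the values line up with the placeholders.
        params = [value for row in batch for value in row]
        yield sql, params


statements = list(batched_inserts(
    "DataValues", ["DataValue", "SiteID"], [(1.0, 1), (2.0, 1), (3.0, 1)], batch_size=2))
for sql, params in statements:
    print(sql, params)
```

With three rows and a batch size of two, this produces two statements instead of three single-row inserts; with realistic batch sizes the saving in round trips to the database is much larger.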

Attachment                              Size
DT Matlab Toolkit.pdf                   87.4 KB
gce_documentation.pdf                   612.22 KB
RBNB DataTurbine Developer Guide.pdf    830.98 KB
odm_channel_mapping.xlsx                10.64 KB
odm_methods.xlsx                        10.14 KB
odm_qualifiers.xlsx                     9.26 KB
odm_sites.xlsx                          9.45 KB
odm_variables.xlsx                      12.28 KB