Setting up DataTurbine with the Matlab GCE toolkit and an ODM database (Linux)

This is a guide for setting up the DataTurbine server, Matlab QA/QC, and the connection to a CUAHSI ODM MySQL database on a Linux machine.

First, download the necessary files below.

Download the newest DataTurbine at https://bitbucket.org/OSDT/osdt-v3.3-production/downloads
For older versions of DataTurbine, which do not include the DTMatlabTK, go to www.dataturbine.org
Download the GCE toolkit at https://gce-svn.marsci.uga.edu/trac/GCE_Toolbox/wiki/Downloads
Download the JDBC-MySQL database connector: http://dev.mysql.com/downloads/connector/j/

The purpose of this guide is to supplement the Open Source DataTurbine Initiative documentation by providing a more detailed description of installing DataTurbine on a Linux/Mac OS X platform. ODM tools for Windows, which connect to a Microsoft SQL Server database, can be found at www.his.cuahsi.org

Installing the Java Environment

Verify Java is installed on your system by issuing the following command from the terminal:

$ java -version

You should get something like this:

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~10.04.2)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

If you see output like this, you can proceed to the Setting up DataTurbine step.

If the command displays a "Command Not Found" error, there are two possibilities: (1) Java is not installed, or (2) the Java path is not set (so the java command is not recognized). Most likely it is the first case, and you need to install a Java runtime such as the OpenJDK Runtime Environment. There is plenty of documentation on the Java website about how to install the JDK.

Setting the Java environment paths

The next step is to set the JAVA_HOME path so you can easily install and run DataTurbine. Do this as the root user if you have root access. If you don't have root access, make sure sudo is enabled for your account (for more information on how to enable sudo for a user, type $ man visudo). To become root, issue this command from the terminal (without the dollar sign):

$ sudo su -

Then add the Java binary directory to your PATH, substituting the directory where your JDK or JRE is installed for JAVA_INSTALL_DIR:

# export PATH=$PATH:JAVA_INSTALL_DIR/bin

Again, verify Java is installed on your system by issuing the command:

# java -version

If you've downloaded and installed the Java Development Kit (JDK) and you're still not seeing an up-to-date Java version, or you are still getting a "Command Not Found" error, please refer to The Java Tutorials section on setting the PATH variable.

If you installed the JRE instead, you'll have to set JAVA_HOME to the JRE installation directory (often under /usr/lib/jvm/). For example, do this to get WebTurbine to work:

# export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/

NOTE: Whenever setting the JAVA_HOME or CLASSPATH variables, make sure you are running as the root user.

Setting up DataTurbine

To set up DataTurbine, download it from the DataTurbine website.

To install, issue this command:

# java -jar PATH_TO_DOWNLOAD/RBNB-VX.X-install.jar

This will take you through an installation process that installs DataTurbine in a directory of your choosing. To run DataTurbine, you'll then use a command like this:

java -Xmx1024M -jar ./bin/rbnb.jar -H ../dtarchive -a localhost:3333 >>/var/log/dataturbine/dataturbine.out 2>&1

-Xmx1024M gives the Java Virtual Machine 1 GB of memory.
./bin/rbnb.jar specifies where the main jar file is.
-H ../dtarchive specifies where the DT archives are stored.
-a localhost:3333 specifies the address and port the DataTurbine server listens on.
>>/var/log/dataturbine/dataturbine.out 2>&1 appends standard output and errors to the log file.

Pushing data to DataTurbine

In order to put data into the DataTurbine server you just started with the command above, you'll need to create an application that pushes the data into it. The attached software development guide gives a good idea of the data structures needed to create a DataTurbine source, and dataturbine.org also provides documentation on how to do that. The nice thing about version V3.3B1 is that it includes the Matlab Toolkit, which can push data into DataTurbine. Currently, at the North Temperate Lakes station, we use a standalone Jython script (Python code that runs on the JVM) to push data from our raw LoggerNet .dat files to DataTurbine.

A more complete API for RBNB DataTurbine can be found at: http://dataturbine.org/sites/default/files/programs/RBNB/doc/index.html
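
For reference, here is a minimal sketch of pushing a single channel into DataTurbine from Matlab by calling the RBNB Java API directly (this assumes rbnb.jar is already on Matlab's Java path, as set up in the startup.m section below; the source name, channel name, and cache/archive sizes are hypothetical examples, not a prescribed configuration):

%create a source with a 100-frame cache and a 10000-frame archive (example sizes)
src = com.rbnb.sapi.Source(100,'append',10000);
src.OpenRBNBConnection('localhost:3333','MatlabDemoSource');

cmap = com.rbnb.sapi.ChannelMap();
idx = cmap.Add('water_temp');            %register a channel; returns its index
cmap.PutTimeAuto('timeofday');           %timestamp samples with the system clock
cmap.PutDataAsFloat64(idx,21.7);         %one double sample (arrays also work)

src.Flush(cmap);                         %send the frame to the DataTurbine server
src.CloseRBNBConnection();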

Setting up the GCE Toolkit

Note: For more complete documentation, please visit the GCE SVN wiki [https://gce-svn.marsci.uga.edu/trac/GCE_Toolbox/wiki]. These instructions are for version 3.7.0 of the GCE datatools.

Download the GCE toolkit from the GCE Toolbox Trac website at https://gce-svn.marsci.uga.edu/trac/GCE_Toolbox/wiki/Downloads

Extract the files by issuing the following command in bash:

$ unzip /PATH/TO/DOWNLOAD/gce_datatools_*_public.zip

Now you can use Matlab to navigate to your GCE toolkit. There is no 'installation' necessary. Open up Matlab, and navigate to the toolkit by issuing this command from the Matlab command line:

>> cd /PATH/TO/DOWNLOAD/gce_datatools_370_public

Setting up DT-Matlab and DT-GCE functions

Using the DT-Matlab toolkit involves setting a couple of Java paths.

First, edit Matlab's startup.m

Follow the instructions below to set up the Matlab search path to use the GCE toolkit and the DTMatlab toolkit.

Navigate to your Matlab installation root folder.

Then, navigate to the toolbox/local/ directory.

Edit startup.m to include the following:

javaaddpath('RBNB_ROOT_FOLDER/bin/rbnb.jar') %adds DataTurbine Java core to the Matlab java search path.

javaaddpath('RBNB_ROOT_FOLDER/Matlab_Toolkit/') %adds DTMatlabToolkit core functions and GCE helper functions to the Matlab java search path.

javaaddpath('MYSQL_JDBC_DOWNLOAD_DIRECTORY/mysql-connector-java-VERSION-bin.jar') %adds the MySQL JDBC connector to the Matlab java search path.

addpath(genpath('GCE_INSTALLATION_PATH/')) %adds the GCE toolkit to the Matlab search path.

You'll have to restart Matlab if you have it open for these changes to take effect.
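
After restarting Matlab, the following optional sanity check (a suggestion, not part of the official setup) confirms the jars and GCE functions are visible; gce2odm is used later in this guide:

javaclasspath('-dynamic')            %should list rbnb.jar and the MySQL connector jar
cm = com.rbnb.sapi.ChannelMap();     %an error here means rbnb.jar is not on the Java path
which gce2odm                        %should resolve to a file under the GCE toolkit folder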

The GCE toolkit has functions that use the DT Matlab Toolkit, but it is also possible to use the DT Matlab Toolkit functions on their own to perform QA/QC on raw data in Matlab. The documentation for the DT Matlab Toolkit is attached to this page, and it can also be found in the GCE Toolkit files. The documentation for the 3.7.0 GCE Toolkit is also attached (pdf).
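
As an illustration of raw QA/QC without the GCE layer, here is a sketch that pulls the newest hour of one channel straight from DataTurbine using the RBNB Java API and applies a simple range check (the channel path, time window, and thresholds are made-up examples; the DT Matlab Toolkit provides wrapper functions for similar requests, see its attached documentation for the actual function names):

snk = com.rbnb.sapi.Sink();
snk.OpenRBNBConnection('localhost:3333','MatlabQCDemo');

req = com.rbnb.sapi.ChannelMap();
req.Add('TroutBog/water_temp');          %source/channel path (example name)
snk.Request(req,0,3600,'newest');        %request the newest 3600 seconds of data
resp = snk.Fetch(10000);                 %wait up to 10 s for the response

vals = double(resp.GetDataAsFloat64(0)); %first (and only) requested channel
flags = vals < -5 | vals > 40;           %flag physically implausible temperatures
fprintf('%d of %d samples flagged\n',sum(flags),numel(vals));

snk.CloseRBNBConnection();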

ODM hookup

Hooking Matlab up to the ODM database involves a little preparation. The main thing you'll need to do is construct mapping tables that define how variables are related in the ODM database. The main things you'll want to create are spreadsheets of:

-ODM Variable ID<->Method ID<->OffsetValue Mappings (odm_channel_mapping.mat)
-ODM Qualifiers (for QA/QC flags) (odm_qualifiers.mat)
-ODM Sites/Buoys (odm_sites.mat)
-ODM methods (optional) (odm_methods.mat)
-ODM variables (optional) (odm_variables.mat)

Note: All of the above have to be pre-loaded into the ODM database in order for a successful DataValues import.

If you look at the ODM setup at http://his.cuahsi.org/odmdatabases.html, you can download a sample SQL file to see how each variable (e.g. temperature) has a certain method (e.g. temperature gathered from a sonde) and offset value (e.g. at depth 1.0 m). Attached to this document you can find sample xlsx files that describe our current ODM constraints at NTL. These need to be imported into Matlab and saved as GCE structs using the GCE Dataset Editor. The .mat files in parentheses above need to follow that exact naming convention and need to be saved in the GCE_INSTALLATION_PATH/settings/ directory.
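
As a quick, optional sanity check that the mapping files are in place and named correctly, something like the following can be run from Matlab (GCE_INSTALLATION_PATH is the same placeholder used above; adjust the list to the .mat files you actually use):

%check that the required ODM mapping .mat files exist in the GCE settings directory
required = {'odm_channel_mapping.mat','odm_qualifiers.mat','odm_sites.mat'};
settings_dir = fullfile('GCE_INSTALLATION_PATH','settings');
for i = 1:numel(required)
   if exist(fullfile(settings_dir,required{i}),'file') ~= 2
      fprintf('missing %s in %s\n',required{i},settings_dir);
   end
end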

The gce2odm.m file in the gce/extensions/ folder then uses the above .mat files in GCE format to perform the correct transformations for ODM data imports.

Setting up a Metadata Template for QA/QC

See the gce_documentation.pdf that's attached for setting up Metadata templates in the GCE toolkit.

Automatic harvesting and ODM DB inserts

To apply the template to data harvested using DTharvest.m:

>> options = DTharvestStructGCE(fn,period,time_offset,template,title,server,source);
(where template = 'TroutBog Missing Val and meanSTD template', etc.)

Issue help DTharvestStructGCE for syntax options.

To start harvesting:

>> DTharvest('start',server,source,options);

where server is the address of your DataTurbine server as a string (e.g. 'localhost')
and source is the name of your DataTurbine source (e.g. 'TroutBog')

Connecting to a database...

Once you understand how DTharvest works, you can start adding custom scripts (separate from the GCE toolkit) to insert the harvested data into your ODM database.

To do that, you have to add your script name to the DTharvest workflow. Here is a simple example:

options.Workflow = [options.Workflow, 'odm_insert(data);']
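
Putting the pieces together, a full setup might look like the sketch below (the harvest file name, period, time offset, and titles are illustrative placeholders; check help DTharvestStructGCE for the exact meaning and units of each argument):

%end-to-end sketch: build harvest options, append the ODM insert step, start harvesting
server = 'localhost';                     %DataTurbine server address
source = 'TroutBog';                      %DataTurbine source name
options = DTharvestStructGCE('troutbog_harvest.mat',3600,0, ...
   'TroutBog Missing Val and meanSTD template', ...
   'Trout Bog buoy data',server,source);
options.Workflow = [options.Workflow, 'odm_insert(data);'];  %add the custom ODM insert
DTharvest('start',server,source,options);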

An example script named odm_insert.m:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function odm_insert(data)

%convert the GCE table to an ODM skinny table; see help gce2odm for more info
[data_odm,msg] = gce2odm(data,'TB','Date',5,'C',0);

if isempty(data_odm)
   disp(msg);  %report the conversion problem and skip this insert
   return;
end

%open a JDBC connection to the ODM MySQL database
conn = database('YourDBName',...
'YourUserName',...
'YourPassCode',...
'com.mysql.jdbc.Driver',...
'jdbc:mysql://Your_DB_Server_Addr:3306/YourDBName');

%insert the harvested records into the DataValues table (runs every harvest period)
gce_fastinsert(data_odm,conn,'DataValues');

%close the connection so repeated harvests don't leak database connections
close(conn);

%note that Matlab's fast insert adds data 1 record at a time, so it is recommended
%to look into MySQL bulk insert options for large inserts.

end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Attachments:
DT Matlab Toolkit.pdf (87.4 KB)
gce_documentation.pdf (612.22 KB)
RBNB DataTurbine Developer Guide.pdf (830.98 KB)
odm_channel_mapping.xlsx (10.64 KB)
odm_methods.xlsx (10.14 KB)
odm_qualifiers.xlsx (9.26 KB)
odm_sites.xlsx (9.45 KB)
odm_variables.xlsx (12.28 KB)