Loading Data by Script

Saada 1.5 allows operators to load data without writing configuration files. The configuration is deduced from the command parameters and some rules explained here.

The data loader is packaged in the Java class saadadb.loader.Loader. Because setting up a proper classpath is difficult, we strongly advise you to use the sant command included in the distribution:

saadaop@host: cd $SAADA_DB_HOME/bin

saadaop@host: ./sant saadadb.data.load [parameters] filenames

>> Basic mode: Only two parameters are required to load a data file. In this mode, product classes are created in Saada's CLASSIFIER mode. Class names are derived from the extension names. The extension name is also used as the record name (attribute namesaada).

-category=spectrum|image|table|misc indicates the category of the data to be loaded. In this mode, the dataloader has to identify which extension matches the requested category. The table below gives the selection rules applied.

Category	Dataset Selection Rule
spectrum	Takes the first extension of type table (FITS BINTABLE) for which a spectral coordinate can be found.
image	Takes the first extension of type image with valid WCS keywords.
table	Takes the first extension of type table (FITS BINTABLE).
misc	Takes the first extension.

If the rule cannot be applied with success, the data loading fails.

-collection=COLLECTION indicates in which Saada collection the selected dataset must be stored.

Multiple files can be specified at the same time.
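The selection rules in the table above can be sketched in a few lines of Python. This is only an illustrative model, not the Saada implementation: the extension descriptors (dictionaries with the keys type, has_wcs and has_spectral_coord) and the function name select_extension are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical model of the extensions of one data file. The keys
# "type", "has_wcs" and "has_spectral_coord" are assumptions made for
# this sketch; they are not Saada data structures.
Extension = Dict[str, object]


def select_extension(category: str, extensions: List[Extension]) -> Optional[Extension]:
    """Return the first extension matching the rule of the category.

    Returns None when no extension matches, which corresponds to the
    case where the data loading fails.
    """
    rules = {
        "spectrum": lambda e: e["type"] == "table" and bool(e.get("has_spectral_coord")),
        "image": lambda e: e["type"] == "image" and bool(e.get("has_wcs")),
        "table": lambda e: e["type"] == "table",
        "misc": lambda e: True,
    }
    if category not in rules:
        raise ValueError(f"unknown category: {category}")
    for extension in extensions:
        if rules[category](extension):
            return extension
    return None


# Made-up file content, for illustration only.
example_extensions = [
    {"name": "PRIMARY", "type": "image", "has_wcs": False},
    {"name": "EVENTS", "type": "table", "has_spectral_coord": False},
    {"name": "SPECTRUM", "type": "table", "has_spectral_coord": True},
]
```

With this made-up file, the category spectrum selects the SPECTRUM extension, table selects EVENTS, misc selects PRIMARY, and image selects nothing because the only image extension has no valid WCS.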

The extension can be specified in the command line.

-extension=extension_name : The dataloader will then look for an extension with the given name. If it exists, it will be validated against the rules given above.

A specific role can be assigned to selected keywords. The keywords used by the following options are not mandatory. Options using missing keywords are ignored.

-name=KW1,KW2... : The record (instance) name will be built by concatenating the values of the keywords KW1, KW2...

-ignore=KW1,KW2,... : The keywords KW1, KW2 are ignored. Keywords HISTORY and CONTINUE are always ignored.

-ukw=CUKW1:KW1,CUKW2:KW2... : Mapping of the collection user keywords. Collection user keywords are columns added to the collections by the operator when the database is created. They can be used to select data over the whole database with criteria defined by the operator. The user keyword CUKW1 will be set with the value of the keyword KW1, and so on. If a keyword is missing, the user keyword is not set and the loading process continues.
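The effect of the three keyword options above on one product header can be modelled as follows. This is a hedged sketch, not Saada code: the function names parse_ukw and apply_keyword_options are hypothetical, the header is represented as a plain dictionary, the values are concatenated without separator, and a -name option referring to a missing keyword is ignored as a whole. The last two points are interpretations of the text, not documented behaviour.

```python
# Keywords that are always ignored according to the text above.
ALWAYS_IGNORED = {"HISTORY", "CONTINUE"}


def parse_ukw(option):
    """Parse a -ukw value such as 'CUKW1:KW1,CUKW2:KW2' into a dict."""
    mapping = {}
    for pair in option.split(","):
        user_keyword, keyword = pair.split(":", 1)
        mapping[user_keyword.strip()] = keyword.strip()
    return mapping


def apply_keyword_options(header, name_kws=(), ignore_kws=(), ukw=None):
    """Model the -name, -ignore and -ukw options on one header (a dict)."""
    # -name: assumption: the option is ignored if any keyword is missing,
    # and values are joined without separator.
    if name_kws and all(kw in header for kw in name_kws):
        name = "".join(str(header[kw]) for kw in name_kws)
    else:
        name = None
    # -ukw: a missing keyword leaves the user keyword unset.
    user_keywords = {
        user_kw: header[kw] for user_kw, kw in (ukw or {}).items() if kw in header
    }
    # -ignore: HISTORY and CONTINUE are dropped in any case.
    ignored = ALWAYS_IGNORED | set(ignore_kws)
    keywords = {kw: value for kw, value in header.items() if kw not in ignored}
    return {"name": name, "user_keywords": user_keywords, "keywords": keywords}


# Made-up header, for illustration only.
example_header = {"OBJECT": "M31", "OBS_ID": 42, "HISTORY": "x", "TELESCOP": "XMM"}
example_result = apply_keyword_options(
    example_header,
    name_kws=["OBJECT", "OBS_ID"],
    ignore_kws=["TELESCOP"],
    ukw=parse_ukw("target:OBJECT,band:FILTER"),
)
```

In this example the keyword FILTER is absent from the header, so the user keyword band is left unset while target is set from OBJECT.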

The coordinate mapping can also be specified by the dataloader parameters. Coordinate mapping only concerns images and spectra. Tables and plots have no coordinates.

-pos=KW_RA,KW_DEC : This option specifies which keywords must be used for the right ascension and for the declination. A numerical constant value can also be given (in degrees).

-posmapping=only|first|last : Mapping rules given by the operator can contradict data found within the product. For instance, the position given by the mapping keywords can differ from the position given by WCS. This option tells the dataloader how to resolve such conflicts. -posmapping=only indicates that the mapping rule must be applied whatever the data content is. If the keywords given by the mapping don’t exist, the position is simply not set. -posmapping=first indicates that the dataloader will first apply the mapping before attempting to detect the position by its own means. -posmapping=last allows the dataloader to use the mapping given in the command line if it doesn’t find a position in the product.

-system=SYSTEM_KW,EQUINOX_KW : This option specifies which keywords must be used to read the coordinate system. A constant value (e.g. FK5:J2000) can also be given.

-sysmapping=only|first|last : Mapping rules given by the operator can contradict data found within the product. For instance, the coordinate system given by the mapping keywords can differ from the one found within the product. This option tells the dataloader how to resolve such conflicts. -sysmapping=only indicates that the mapping rule must be applied whatever the data content is. If the keywords given by the mapping don’t exist, the coordinate system is simply not set. -sysmapping=first indicates that the dataloader will first apply the mapping before attempting to detect the coordinate system by its own means. -sysmapping=last allows the dataloader to use the mapping given in the command line if it doesn’t find the system in the product.

Example: the command

saadaop@host: ./sant saadadb.data.load -pos=CorrRa__J200,CorrDec__J2000 -posmapping=last -system=FK5,J2000 -sysmapping=last -otheroptions filenames

will try to detect the coordinate system and the position automatically. If it fails to detect the coordinate system, it will take FK5/J2000. If it fails to detect the coordinates, it will take the keywords CorrRa__J200 and CorrDec__J2000.
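The priority between the operator mapping and the automatic detection, as described for -posmapping and -sysmapping, can be summarised by the small Python sketch below. It is only a model of the documented behaviour: the function name resolve_mapping is hypothetical, and None stands for "the mapping keywords are missing" or "nothing could be detected in the product".

```python
def resolve_mapping(mode, mapped, detected):
    """Choose between the value given by the operator mapping and the
    value detected in the product, following only|first|last.

    None means that the corresponding source gave no value. This is a
    model of the documented behaviour, not the Saada code.
    """
    if mode == "only":
        # The mapping is applied whatever the data content is;
        # if its keywords are missing, nothing is set.
        return mapped
    if mode == "first":
        # The mapping is tried first, then the automatic detection.
        return mapped if mapped is not None else detected
    if mode == "last":
        # The automatic detection is tried first, then the mapping.
        return detected if detected is not None else mapped
    raise ValueError(f"unknown mapping mode: {mode}")


# Made-up positions in degrees, for illustration only.
from_keywords = (10.5, -20.25)
from_wcs = (10.6, -20.3)
```

With these made-up positions, resolve_mapping("last", from_keywords, from_wcs) keeps the WCS position, whereas resolve_mapping("last", from_keywords, None) falls back to the position read from the mapping keywords.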

The spectral coordinate can also be specified through dataloader options. This setup only concerns the channel axis of spectra. It has no effect on other product categories.

-spc=COL_NAME : This option specifies which column of the spectral data extension must be taken as the spectral coordinate.

-spcunit=UNIT : Spectral coordinate unit. UNIT must be understood by Saada; otherwise the spectral coordinates are not set.

-spcmapping=only|first|last : Mapping rules given by the operator can contradict data found within the product. For instance, the spectral coordinates given by the mapping keywords can differ from those found in WCS keywords. This option tells the dataloader how to resolve such conflicts. -spcmapping=only indicates that the mapping rule must be applied whatever the data content is. If the keywords given by the mapping don’t exist, spectral coordinates are not set. -spcmapping=first indicates that the dataloader will first apply the mapping before attempting to detect the spectral coordinate by its own means. -spcmapping=last allows the dataloader to use the mapping given in the command line if it doesn’t find the spectral coordinate by its own means.

Example: the command

saadaop@host: ./sant saadadb.data.load -spc=DEVIATION -spcmapping=only -spcunit=Angstroem -otheroptions filenames

will use the data column named DEVIATION as the spectral channel axis and will consider this column as expressed in Angstroem.

Class mapping is certainly the most sensitive point of the dataloader setup. The first question is which product files can be considered members of the same product class. Keywords of products belonging to a given class must have the same scientific meaning and must be expressed in the same unit. This identification is a matter of scientific judgment and cannot be done automatically.

The second question is why the Saada dataloader has to bother operators with the concept of class at all. Theoretically, the concept of collection would be enough to classify ingested products. But a collection alone does not let users constrain queries on arbitrary keywords: only a few keywords are reported at collection level and can be used to constrain queries on heterogeneous datasets. To use the other native keywords in data selections, we have to group together the products for which these keywords make sense. These sets of products are the product classes. Saada operators are supposed to know enough about product classification not to ask the dataloader to load the wrong products.

Now we have to explain to the dataloader how to carry out the classification. The three options below address this problem. Units are not processed here: keywords with the same name and the same type but with values expressed in different units (e.g. km/s and m/s for velocities) will be put in the same column, regardless of possible discrepancies.

-classifier[=CLASSNAME] : This classification mode puts all products having exactly the same keywords in the same class. The class name can be specified (CLASSNAME). If the class name is not specified, it will be built from the first product filename. If several classes must be created, their names will be extended with an incremental number.

-classfusion=CLASSNAME : All ingested products will be grouped in the same class, named CLASSNAME. Keywords appearing with different types will be cast to a more general type (int -> float -> string).

-uniqueinstance=CLASSNAME : One class will be created for each input file. This option makes sense for tables, because one table has multiple entries. We advise you to use this mode to ingest external catalogue extractions, for instance. In this case, each input file is extracted from one external resource, which corresponds to one Saada class.
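The difference between the -classifier and -classfusion modes can be illustrated with a short Python model. It is a sketch under stated assumptions, not the Saada algorithm: each product is reduced to a dictionary mapping keyword names to one of the type names int, float or string, "exactly the same keywords" is read as "same names and same types", and the function names are hypothetical.

```python
from collections import defaultdict

# Types ordered from the most specific to the most general,
# following the chain int -> float -> string given above.
GENERALITY = ["int", "float", "string"]


def fuse_types(type_a, type_b):
    """Return the more general of two keyword types."""
    return max(type_a, type_b, key=GENERALITY.index)


def classify(products):
    """-classifier model: group the products having exactly the same
    keywords (assumption: same names and same types)."""
    classes = defaultdict(list)
    for filename, keywords in products.items():
        signature = frozenset(keywords.items())
        classes[signature].append(filename)
    return list(classes.values())


def class_fusion(products):
    """-classfusion model: one single class holding the union of all
    keywords, each one cast to the most general type encountered."""
    columns = {}
    for keywords in products.values():
        for keyword, keyword_type in keywords.items():
            if keyword in columns:
                columns[keyword] = fuse_types(columns[keyword], keyword_type)
            else:
                columns[keyword] = keyword_type
    return columns


# Made-up products, for illustration only.
example_products = {
    "a.fits": {"RA": "float", "EXPOSURE": "int"},
    "b.fits": {"RA": "float", "EXPOSURE": "int"},
    "c.fits": {"RA": "float", "EXPOSURE": "float", "OBSERVER": "string"},
}
```

With these made-up products, classify returns two groups (a.fits with b.fits, and c.fits alone), while class_fusion returns a single set of columns in which EXPOSURE has become a float.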

Miscellaneous options

-noload : Simulates the loading of the products and reports every step performed, without modifying the database.

-debug : Switches on the debug mode.

last update 2007-03-16
