Data Loader Configuration for MISC Products

MISC products are FITS files or VOTables with no specific purpose. Only the keywords of the primary header (for FITS files) or the table PARAMs (for VOTables) are loaded.

Data Loader Configurations: The SaadaDB administrator can define rules that help the dataloader build classes or populate collection keywords. A set of rules is called a configuration.

- A configuration is dedicated to one category (MISC, ENTRY, TABLE, SPECTRUM, IMAGE or FLATFILE).
- A configuration can be applied to populate any collection.
- After ingestion, no trace of the dataloader configuration used remains in the database.
- A configuration can be stored in a binary file or set as dataloader command line parameters.
- A configuration file can only be modified with the admin tool.
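As a rough illustration of these properties, the sketch below models a configuration in Python. The class `LoaderConfig`, its fields and the validation logic are hypothetical and do not reflect Saada's actual (binary) file format. The point is that a configuration carries exactly one category and no collection, which is why it can be reused to populate any collection.

```python
from dataclasses import dataclass, field
from typing import Dict

# The six categories named in the Saada documentation.
CATEGORIES = {"MISC", "ENTRY", "TABLE", "SPECTRUM", "IMAGE", "FLATFILE"}


@dataclass
class LoaderConfig:
    """Hypothetical in-memory model of a dataloader configuration."""

    category: str  # exactly one category per configuration
    name: str  # name chosen by the administrator
    # Rule name -> value. There is deliberately no "collection" field,
    # since a configuration can be applied to any collection.
    rules: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything that is not one of the known categories.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


misc = LoaderConfig("MISC", "xmm")
```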

Creating a new configuration: Some configuration parameters are common to all categories (except FLATFILE). Others depend on the category. The configuration parameters of MISC products are used by all other categories, so let's start with the MISC configuration.

Select the Dataloader Configuration tab and click the MISC button.

MISC configuration panel

The bottom bar contains six buttons:

- Data Sample: Opens a window with a data tree modeling a data file. This file is supposed to be representative of the files to be loaded with that configuration.
- Apply: Makes the configuration displayed on screen the current configuration for this category. The current configuration is used by the "Load data using the current configuration" command of the popup menu of the category node in the data tree.
- Show Loader Parameters: Shows the dataloader command line parameters equivalent to the current configuration.
- New Config: Resets the form and sets the current configuration to NULL.
- Load: Loads a configuration previously saved in a file.
- Save: Saves the current configuration in a file. Configuration files are stored in SAADA_DB_HOME/config. Configuration filenames have the form CATEGORY.name.config, where CATEGORY is the Saada category of the configuration and name is given by the administrator. Never modify this name!
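The filename convention used by the Save button can be sketched as follows. The helpers `config_path` and `parse_config_filename` are hypothetical, not part of Saada. The sketch assumes that the category itself never contains a dot, so everything between the first dot and the `.config` suffix is taken as the administrator-given name.

```python
from pathlib import Path
from typing import Tuple

SUFFIX = ".config"


def config_path(saada_db_home: str, category: str, name: str) -> Path:
    """Return SAADA_DB_HOME/config/CATEGORY.name.config (hypothetical helper)."""
    return Path(saada_db_home) / "config" / f"{category}.{name}{SUFFIX}"


def parse_config_filename(filename: str) -> Tuple[str, str]:
    """Split 'CATEGORY.name.config' into (CATEGORY, name)."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a configuration file: {filename!r}")
    stem = filename[: -len(SUFFIX)]
    # Assumption: the category never contains a dot, so split on the first one.
    category, dot, name = stem.partition(".")
    if not dot or not category or not name:
        raise ValueError(f"expected CATEGORY.name.config, got {filename!r}")
    return category, name
```

Parsing the name back out of the filename is one reason not to rename these files by hand: a tool relying on this pattern would read a different category or name.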

Using the Data Sample window: The most efficient way to set up a configuration is to open a view of a data file in the Data Sample window. Each keyword is represented by a node containing all available information about that keyword. These nodes can be dragged and dropped onto the text fields of the configuration panel (see image below).

Class Mapping Selection

Select a mapping mode: The mapping mode tells Saada what to do with products having different keyword sets. This is a key point of the mapping. From the end user's point of view, only queries restricted to one class can be constrained on product keywords, so it is better to gather as much data as possible into one class. On the other hand, the same keyword can have different meanings in different files, and such products must not be put in the same class. The mapping mode must be chosen with both constraints in mind.
- Classifier Mode: All products having exactly the same keyword set (names and types) are put in the same class (i.e. the same SQL table). The class is named after the content of the text field, followed by "_num", where num is the number of other existing classes with the same requested name.
- Fusion Mode: All products are put in the same class. New columns are added to the current class when needed, but columns are never removed. If columns of different products have the same name but different types, the most general type is kept (Boolean -> integer -> float -> string). Note that units are not taken into account by the class fusion mechanism.
- Rule of thumb: If the products to load come from one unique source (pipeline, instrument), they can be assumed to be consistent with each other and loaded in fusion mode. In other cases, the classifier mode is safer.
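The two mapping modes can be sketched in Python as below, assuming a product is reduced to a dict mapping keyword names to type names. `classify`, `fuse` and `TYPE_ORDER` are hypothetical illustrations, not Saada code. The sketch also assumes that the "_num" suffix is always appended and starts at 0, which the text above does not state explicitly.

```python
from typing import Dict, List

Keywords = Dict[str, str]  # keyword name -> type name

# Fusion-mode type order, from most specific to most general.
TYPE_ORDER = ["boolean", "integer", "float", "string"]


def classify(products: List[Keywords], requested_name: str) -> Dict[str, List[int]]:
    """Classifier mode: products with identical (name, type) sets share a class."""
    class_of: Dict[frozenset, str] = {}
    members: Dict[str, List[int]] = {}
    for index, keywords in enumerate(products):
        key = frozenset(keywords.items())  # keyword order does not matter
        if key not in class_of:
            # Suffix = number of classes already created with this name.
            class_of[key] = f"{requested_name}_{len(class_of)}"
            members[class_of[key]] = []
        members[class_of[key]].append(index)
    return members


def fuse(class_columns: Keywords, product: Keywords) -> Keywords:
    """Fusion mode: add new columns, widen types, never remove columns."""
    merged = dict(class_columns)
    for name, typ in product.items():
        if name in merged:
            # Keep the more general of the two types.
            rank = max(TYPE_ORDER.index(merged[name]), TYPE_ORDER.index(typ))
            merged[name] = TYPE_ORDER[rank]
        else:
            merged[name] = typ
    return merged
```

In this sketch `len(class_of)` only counts classes created during the same call; a real loader would also have to count classes already present in the database.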

Instance Name: The content of this text field specifies how to build the Saada name (collection level) of each loaded product. By default, the filename is used (followed by #num for table entries). The name is described by a list of keywords mixed with constant strings; list items are comma-separated. In the example below, product names are made of the value of the OBJECT keyword, followed by '_', followed by the value of the TELESCOP keyword. A file containing data on OBJECT=M33 observed with TELESCOP=XMM will be named M33_XMM. Missing keyword values are replaced with empty strings.
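A minimal sketch of the instance-name rule follows. The text above does not say how constant strings are distinguished from keyword names in the field; this sketch assumes constants are written between single quotes (e.g. `OBJECT,'_',TELESCOP`), and `build_instance_name` is a hypothetical helper. The #num suffix for table entries is not modeled.

```python
from typing import Dict


def build_instance_name(spec: str, header: Dict[str, str], filename: str) -> str:
    """Build a Saada instance name from a comma-separated spec (hypothetical)."""
    if not spec.strip():
        return filename  # default: the filename
    parts = []
    for item in spec.split(","):
        item = item.strip()
        if len(item) >= 2 and item[0] == item[-1] == "'":
            parts.append(item[1:-1])  # constant string (quoting is an assumption)
        else:
            parts.append(header.get(item, ""))  # missing keyword -> empty string
    return "".join(parts)
```

Because items are split on commas, this simple parser cannot express a constant that itself contains a comma.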

Drag & Drop Keyword

Ignored Keywords: This field contains the list of keywords that must not be loaded, given as a comma-separated list. Ignored keywords can contain wildcards (*). In the above example, all keywords whose name begins with EMSC are rejected. This feature can be used to limit the SQL row size (8 KB for PostgreSQL) or to reject instrumental values such as HIERARCH ESO keywords.
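The wildcard matching of ignored keywords can be sketched as follows. Only `*` is treated as a wildcard (every other character is matched literally, since the text mentions no other wildcard), and matching is assumed to be case-sensitive because FITS keyword names are uppercase. Both function names are hypothetical.

```python
import re
from typing import List


def compile_ignore_list(spec: str) -> List[re.Pattern]:
    """Turn 'EMSC*,DATE-OBS' into regexes; only '*' acts as a wildcard."""
    patterns = []
    for item in spec.split(","):
        item = item.strip()
        if item:
            # Escape each literal chunk, then join the chunks with '.*'.
            regex = ".*".join(re.escape(part) for part in item.split("*"))
            patterns.append(re.compile(regex))
    return patterns


def kept_keywords(keywords: List[str], spec: str) -> List[str]:
    """Return the keywords that match none of the ignore patterns."""
    patterns = compile_ignore_list(spec)
    return [k for k in keywords if not any(p.fullmatch(k) for p in patterns)]
```

Using `fullmatch` means `EMSC*` rejects `EMSC_A` but keeps `XEMSC`, i.e. the pattern matches names beginning with EMSC, as described above.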

Extended Collection Attributes: This section lists all collection-level attributes added at creation time (see). Extended collection attributes are populated with the values of the keywords given in the configuration. The current Saada release does not support populating extended attributes with keyword concatenations, as is possible for the Saada name.
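A sketch of the single-keyword restriction: each extended attribute is mapped to exactly one keyword, and a comma-separated list, which would mean a concatenation, is rejected. `populate_extended_attributes` is hypothetical, and returning `None` for a missing keyword (i.e. a NULL column value) is an assumption, not documented Saada behavior.

```python
from typing import Dict, Optional


def populate_extended_attributes(
    mapping: Dict[str, str], header: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Fill each extended attribute from exactly one keyword (hypothetical helper)."""
    values: Dict[str, Optional[str]] = {}
    for attribute, keyword in mapping.items():
        if "," in keyword:
            # Concatenations (as allowed for the Saada name) are not supported.
            raise ValueError(f"{attribute}: keyword concatenation is not supported")
        values[attribute] = header.get(keyword.strip())  # None if missing (assumed)
    return values
```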

last update 2011-06-01