Relationship Indexation



The first specifications for the Saada project focused on the capacity to deal with heterogeneous datasets and on the ability to accept complex queries based on relationships linking data to each other. If our database were populated with one collection of people and one collection of clothes, we would be able to answer queries such as the following quickly:

Looking for people older than 62 and having 2 red trousers but no white socks.

This kind of query mixes constraints on the searched data (older than 62), constraints on counterparts (trousers must be red and socks must be white) and cardinality constraints (2 red trousers and 0 white socks).

Tests carried out at the time showed that SQL was too slow to process such queries on large collections. We then decided to develop an engine dedicated to the processing of relationship patterns. The Saada query engine has been designed to process constraints on the searched data (people older than 62) and constraints on counterparts (having 2 red trousers but no white socks) separately, and then to merge the results.
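The split-and-merge idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Saada engine itself: the class and method names (SplitAndMergeSketch, olderThan, withColorCount, query) are hypothetical, people are reduced to integer identifiers, clothes are reduced to their colors, and "2 red trousers" is read as "exactly 2".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: evaluate each kind of constraint on its own, then merge.
class SplitAndMergeSketch {

    // Constraint on the searched data: keep people strictly older than 'limit'.
    static Set<Integer> olderThan(Map<Integer, Integer> ageByPerson, int limit) {
        Set<Integer> selected = new TreeSet<>();
        for (Map.Entry<Integer, Integer> e : ageByPerson.entrySet()) {
            if (e.getValue() > limit) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    // Constraint on counterparts plus cardinality: keep people linked to
    // exactly 'count' items of the given color (assumed reading of "2 red trousers").
    static Set<Integer> withColorCount(Set<Integer> people,
                                       Map<Integer, List<String>> colorsByPerson,
                                       String color, int count) {
        Set<Integer> selected = new TreeSet<>();
        for (Integer person : people) {
            List<String> colors =
                colorsByPerson.getOrDefault(person, Collections.<String>emptyList());
            if (Collections.frequency(colors, color) == count) {
                selected.add(person);
            }
        }
        return selected;
    }

    // Merge step: intersect the three partial results.
    static Set<Integer> query(Map<Integer, Integer> ageByPerson,
                              Map<Integer, List<String>> trousers,
                              Map<Integer, List<String>> socks) {
        Set<Integer> result = olderThan(ageByPerson, 62);
        result.retainAll(withColorCount(ageByPerson.keySet(), trousers, "red", 2));
        result.retainAll(withColorCount(ageByPerson.keySet(), socks, "white", 0));
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> ages = new HashMap<>();
        ages.put(1, 70);
        ages.put(2, 65);
        Map<Integer, List<String>> trousers = new HashMap<>();
        trousers.put(1, Arrays.asList("red", "red"));
        trousers.put(2, Arrays.asList("red", "red"));
        Map<Integer, List<String>> socks = new HashMap<>();
        socks.put(2, Arrays.asList("white"));
        // Person 2 is excluded by the "no white socks" constraint.
        System.out.println(query(ages, trousers, socks));
    }
}
```

Each partial selection only needs the data relevant to its own constraint, so the counterpart part can be served by a dedicated index rather than by SQL joins.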

The processing of constraints on counterparts (see correlation patterns) is based on indexes that resemble a Java HashMap and are stored in files.

  • These indexes are completely rebuilt each time the relationship is populated.
  • They are designed to be loaded quickly from file into the SaadaDB query engine.
  • They are locked in memory while a query is processed. This allows selected data to be tagged within the index itself, which avoids time-consuming copies.
  • Stored indexes are not always used. In some cases, a specific index can be built in memory while the correlation pattern is processed.
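To make the tagging idea concrete, here is a minimal sketch of a HashMap-like index whose links carry a tag set by the current query. It is an assumption-laden illustration, not the actual Saada index format: the names TaggedIndexSketch, Link, addLink and select are hypothetical, the file storage is left out, and the use of synchronized is just one possible reading of "locked while a query is processed".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory relationship index: primary id -> links to counterparts.
class TaggedIndexSketch {

    // One link of the relationship; 'tagged' is set in place by the current query.
    static final class Link {
        final int counterpartId;
        boolean tagged;

        Link(int counterpartId) {
            this.counterpartId = counterpartId;
        }
    }

    private final Map<Integer, List<Link>> linksByPrimary = new HashMap<>();

    void addLink(int primaryId, int counterpartId) {
        linksByPrimary.computeIfAbsent(primaryId, k -> new ArrayList<>())
                      .add(new Link(counterpartId));
    }

    // Tags the links whose counterpart was selected, then returns the primaries
    // having exactly 'cardinality' tagged links. The index is never copied.
    // 'synchronized' (an assumption) keeps one query at a time on the index.
    // Primaries with no link at all are absent from the index, so they are not
    // returned even when 'cardinality' is 0.
    synchronized Set<Integer> select(Set<Integer> selectedCounterparts, int cardinality) {
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, List<Link>> entry : linksByPrimary.entrySet()) {
            int taggedCount = 0;
            for (Link link : entry.getValue()) {
                link.tagged = selectedCounterparts.contains(link.counterpartId);
                if (link.tagged) {
                    taggedCount++;
                }
            }
            if (taggedCount == cardinality) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

For instance, if person 1 is linked to trousers 10 and 11 and the red trousers are 10, 11 and 13, then select called with that set and a cardinality of 2 returns person 1. The tag lives inside the index entry, so no intermediate list of matching links has to be built.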

The SQL limitation is a consequence of the difficulty of storing vectors in tables and of making efficient selections on their content.

Indexing of trouser color

last update 2008-05-17