The first specifications for the Saada project focused on the capacity to deal with heterogeneous datasets and the ability to accept complex queries based on relationships linking data to each other.
If our database were populated with one collection of people and one collection of clothes, we would be able to answer queries such as the following quickly:
Looking for people older than 62 having 2 red trousers but no white socks.
This kind of query mixes constraints on the searched data (older than 62), constraints on counterparts (trousers must be red and socks must be white) and cardinality constraints (2 red trousers and 0 white socks).
Tests performed at that time showed that SQL was too slow to process such queries on large collections. We therefore decided to develop an engine dedicated to processing relationship patterns. The Saada query engine has been designed to process constraints on the searched data (people older than 62) and constraints on counterparts (having 2 red trousers but no white socks) separately, and then to merge the results.
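The two-phase evaluation described above can be sketched as follows. This is a minimal illustration, not Saada's actual implementation: the class and method names, and the sample people/clothes data, are invented for the example. Each phase produces a set of matching object ids, and the final answer is the intersection of the two sets.

```java
import java.util.*;

// Hypothetical sketch of the two-phase evaluation: the constraint on the
// searched data and the constraints on counterparts are processed
// separately, then the two result sets are merged.
public class TwoPhaseQuery {
    // Ages of people, indexed by person id (illustrative data only).
    static Map<Integer, Integer> ages = Map.of(1, 70, 2, 50, 3, 65);
    // person id -> that person's clothes (illustrative data only).
    static Map<Integer, List<String>> clothes = Map.of(
        1, List.of("red trouser", "red trouser"),
        2, List.of("red trouser", "red trouser"),
        3, List.of("red trouser", "red trouser", "white sock"));

    // Phase 1: constraint on the searched data (older than 62).
    static Set<Integer> selectPeople() {
        Set<Integer> out = new HashSet<>();
        for (var e : ages.entrySet())
            if (e.getValue() > 62) out.add(e.getKey());
        return out;
    }

    // Phase 2: cardinality constraints on counterparts
    // (exactly 2 red trousers and 0 white socks).
    static Set<Integer> selectByCounterparts() {
        Set<Integer> out = new HashSet<>();
        for (var e : clothes.entrySet()) {
            long red = e.getValue().stream().filter("red trouser"::equals).count();
            long white = e.getValue().stream().filter("white sock"::equals).count();
            if (red == 2 && white == 0) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        // Merge step: intersection of the two result sets.
        Set<Integer> result = new HashSet<>(selectPeople());
        result.retainAll(selectByCounterparts());
        System.out.println(result); // person 1 only: old enough, right clothes
    }
}
```

With this data, person 2 fails the age constraint and person 3 fails the counterpart constraint (a white sock), so only person 1 survives the merge.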
The processing of constraints on counterparts (see correlation patterns) is based on indexes similar to Java HashMaps, stored in files.
- These indexes are completely rebuilt each time the relationship is populated.
- They are designed to be quickly loaded from file into the SaadaDB query engine.
- They are locked in memory while a query is processed. This allows tagging selected data within the index itself, avoiding time-consuming copies.
- Stored indexes are not always used. In some cases, a specific index can be built in memory while the correlation pattern is processed.
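The index properties listed above can be sketched as a small serializable class. Everything here is an assumption made for illustration (class name, map layout, tag set): the point is that the index maps a source object to its counterparts, can be written to and reloaded from a file as a whole, and carries selection tags inside itself so a query never needs to copy it.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch of a relationship index: a HashMap-like structure
// mapping a source object id to the ids of its counterparts. It is
// serializable, so it can be fully rebuilt when the relationship is
// populated, saved to a file, and reloaded quickly by the query engine.
public class RelationIndex implements Serializable {
    // source oid -> counterpart oids (assumed layout, not Saada's actual one)
    private final HashMap<Long, long[]> links = new HashMap<>();
    // selection tags stored in the index itself, so a query can mark
    // matching entries in place instead of copying the index
    private final HashSet<Long> tagged = new HashSet<>();

    public void addLink(long source, long[] counterparts) {
        links.put(source, counterparts);
    }

    public long[] counterpartsOf(long source) {
        return links.getOrDefault(source, new long[0]);
    }

    public void tag(long source)         { tagged.add(source); }
    public boolean isTagged(long source) { return tagged.contains(source); }
    public void clearTags()              { tagged.clear(); }

    // The index is never updated incrementally: it is rebuilt in full,
    // then saved with standard Java serialization.
    public void save(File f) throws IOException {
        try (var out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(this);
        }
    }

    public static RelationIndex load(File f)
            throws IOException, ClassNotFoundException {
        try (var in = new ObjectInputStream(new FileInputStream(f))) {
            return (RelationIndex) in.readObject();
        }
    }
}
```

Keeping the tags inside the index mirrors the locking behaviour described above: while the index is pinned in memory for a query, selections are recorded in place and simply cleared afterwards.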
The SQL limitation is a consequence of the difficulty of storing vectors in tables and of making efficient selections on their content.
last update 2008-05-17