Paola Gomez Barreto - Analysis and evaluation of document-oriented structures

Paola Gomez Barreto


Jury:

  • Claudia Roncancio, full professor, Grenoble INP - Ensimag, thesis supervisor
  • Rubby Casallas, professor, Universidad de Los Andes, TICSw, thesis co-supervisor
  • Franck Ravat, full professor, Université Toulouse I Capitole, reviewer
  • Philippe Roose, associate professor, Université de Pau, reviewer
  • Laurence Duchien, full professor, Université de Lille, examiner
  • Patrick Reignier, full professor, Grenoble INP - Ensimag, examiner

Nowadays, millions of different data sources produce a huge quantity of unstructured and semi-structured data that changes constantly. Information systems must manage this data while also providing scalability and performance. As a result, they have had to adapt to support heterogeneous databases, including NoSQL databases. These databases offer a schema-free model with great flexibility, but without a clear separation between the logical and physical layers. Data can be duplicated, split and/or incomplete, and it can also change as business needs evolve.

The flexibility and absence of schema in document-oriented NoSQL systems, such as MongoDB, allow new structuring alternatives to be explored without facing constraints. The choice of structure nevertheless remains important and critical, because it has several impacts to consider and must be made among many structuring options. We therefore propose to return to a design phase in which quality aspects and the impacts of the structure are considered, so that the decision can be made in a more informed manner.
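To make the notion of structuring alternatives concrete, here is a minimal sketch of two ways to store the same data in a document-oriented system. The author and book data, the field names, and the helper `count_value` are all hypothetical and chosen only for illustration; plain Python dictionaries stand in for MongoDB documents and collections.

```python
# Hypothetical data: two books written by the same author.
AUTHOR = {"_id": "a1", "name": "Ada"}
BOOKS = [{"_id": "b1", "title": "Notes"}, {"_id": "b2", "title": "Letters"}]


def referenced_structure():
    """Two collections; each book stores only the author's identifier."""
    return {
        "authors": [dict(AUTHOR)],
        "books": [{**book, "author_id": AUTHOR["_id"]} for book in BOOKS],
    }


def embedded_structure():
    """One collection; each book embeds a full copy of the author."""
    return {"books": [{**book, "author": dict(AUTHOR)} for book in BOOKS]}


def count_value(node, value):
    """Count how many times `value` occurs anywhere in a nested structure."""
    if isinstance(node, dict):
        return sum(count_value(child, value) for child in node.values())
    if isinstance(node, list):
        return sum(count_value(child, value) for child in node)
    return 1 if node == value else 0
```

In this sketch, the embedded variant needs a single collection to read a book together with its author, but the author's name is stored once per book, so an update has to touch every copy. The referenced variant stores the name once at the cost of a second collection. Trade-offs of this kind are what a design phase would need to weigh.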

In this context, we propose SCORUS, a system for the analysis and evaluation of document-oriented structures. It aims to facilitate the study of semi-structuring possibilities in document-oriented systems such as MongoDB, and to provide objective metrics that better highlight the advantages and disadvantages of each solution in relation to the needs of the users. To this end, a sequence of three phases can compose a design process. Each phase can also be performed independently for analysis and adjustment purposes. The general strategy of SCORUS is composed of:

1. Generation of a set of structuring alternatives: in this phase, we propose to start from a UML model of the data and to automatically produce a large set of possible structuring variants for this data.
2. Evaluation of the alternatives using a set of structural metrics: this evaluation takes a set of structuring variants and calculates the metrics against the modeled data.
3. Analysis of the evaluated alternatives: the metrics are used to analyze the interest of the considered alternatives and to choose the most appropriate one(s).
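The three phases can be sketched as a small pipeline. This is only an illustration under strong assumptions, not the SCORUS implementation: the entity and relationship lists are a hypothetical stand-in for a UML model, each relationship is reduced to a binary embed-or-reference choice, and the two metrics (number of collections, maximum nesting depth) are simple examples rather than the metrics defined in the thesis.

```python
from itertools import product

# Hypothetical simplified model standing in for a UML class diagram.
ENTITIES = ["Author", "Book", "Review"]
# (parent, child) pairs: the child is either embedded in the parent or referenced.
RELATIONS = [("Author", "Book"), ("Book", "Review")]


def generate_alternatives(relations):
    """Phase 1: yield one alternative per combination of embed/reference choices."""
    for choices in product(("embed", "reference"), repeat=len(relations)):
        yield dict(zip(relations, choices))


def evaluate(alternative, entities):
    """Phase 2: compute two illustrative structural metrics for one alternative."""
    embedded_in = {
        child: parent
        for (parent, child), choice in alternative.items()
        if choice == "embed"
    }

    def depth(entity):
        # Nesting level of an entity: 0 if it is stored at the top level.
        if entity not in embedded_in:
            return 0
        return 1 + depth(embedded_in[entity])

    return {
        "collections": sum(1 for e in entities if e not in embedded_in),
        "max_depth": max(depth(e) for e in entities),
    }


def analyze(relations, entities, max_depth):
    """Phase 3: keep alternatives within a depth limit, fewest collections first."""
    scored = [(alt, evaluate(alt, entities)) for alt in generate_alternatives(relations)]
    kept = [(alt, metrics) for alt, metrics in scored if metrics["max_depth"] <= max_depth]
    return sorted(kept, key=lambda pair: pair[1]["collections"])
```

Calling `analyze(RELATIONS, ENTITIES, max_depth=1)` discards the fully nested variant and ranks the remaining ones by number of collections. The selection criterion is an arbitrary example; in practice it would depend on the needs of the users, and each function can also be called on its own, mirroring the independence of the phases.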

This thesis presents the theoretical and software tools for SCORUS as well as experiments with MongoDB.