Mohannad Almasri - Reducing term mismatch probability by exploiting semantic relations between terms

Tuesday 27 Jun 2017, 08:00
Place: 
Organized by: Mohannad Almasri
Speaker: Mohannad Almasri
Teams: 

Jury members:

  • Patrice Bellot, professor, Université Aix-Marseille, reviewer
  • Mohand Boughanem, professor, Université Paul Sabatier, reviewer
  • Sylvie Calabretto, professor, Institut National des Sciences Appliquées (INSA) de Lyon, examiner
  • Marie Christine Rousset, professor, Université Grenoble Alpes, examiner
  • Catherine Berrut, professor, Université Grenoble Alpes, thesis co-supervisor
  • Jean-Pierre Chevallet, associate professor (HDR), Université Grenoble Alpes, thesis supervisor

 

Although modern retrieval systems typically use a multitude of features to rank documents, the backbone of search ranking is usually a standard retrieval model. This thesis addresses a limitation of standard retrieval models, the term mismatch problem, which occurs when query terms fail to appear in documents that are relevant to the query. Term mismatch is a long-standing problem in information retrieval. However, it has not been well understood how often term mismatch happens in retrieval, how important it is, or how it affects retrieval performance. This thesis answers these questions and proposes principled solutions to address this limitation. The research is enabled by a formal definition of term mismatch: the probability that a term does not appear in a document given that the document is relevant. This definition depends on both the document and the query. Based on this fact, we propose several approaches for reducing term mismatch probability by modifying documents or queries. These proposals are followed by a quantitative analysis that shows how much the proposed approaches reduce term mismatch probability while maintaining system performance. An essential component for reducing term mismatch probability is the knowledge resource that defines terms and their relationships. Our proposals exploit a variety of knowledge resources in order to produce effective modifications to documents or queries.
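
The definition of term mismatch given in the abstract can be illustrated with a small sketch. It is not taken from the thesis: it assumes that a set of relevant documents is known for a query, that each document is represented as a set of terms, and that the probability is estimated as a simple fraction. The function names, the toy documents, and the `related` dictionary standing in for a knowledge resource are all hypothetical.

```python
def mismatch_probability(term, relevant_docs):
    """Estimate P(term absent | document relevant) as the fraction
    of relevant documents that do not contain the term."""
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    missing = sum(1 for doc in relevant_docs if term not in doc)
    return missing / len(relevant_docs)


def expanded_mismatch_probability(term, related_terms, relevant_docs):
    """Same estimate, but a document counts as matching if it contains
    the term or any of its related terms (a simple query-side expansion)."""
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    candidates = {term} | set(related_terms)
    missing = sum(1 for doc in relevant_docs if candidates.isdisjoint(doc))
    return missing / len(relevant_docs)


# Toy set of documents assumed relevant to a query containing "car".
relevant_docs = [
    {"car", "engine", "repair"},
    {"automobile", "engine", "maintenance"},
    {"vehicle", "repair", "garage"},
    {"car", "garage"},
]

# Hypothetical knowledge resource: term -> semantically related terms.
related = {"car": ["automobile", "vehicle"]}

before = mismatch_probability("car", relevant_docs)
after = expanded_mismatch_probability("car", related["car"], relevant_docs)
print(before, after)
```

In this toy setting, "car" is missing from two of the four relevant documents, and counting the related terms as matches removes both misses. Whether such a modification also maintains retrieval performance is a separate question, which the abstract says is addressed by quantitative analysis.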