Parantapa Goswami - Learning Information Retrieval Functions and Parameters on Unlabeled Collections

Monday, 6 October 2014, 12:00
Organised by: Parantapa Goswami
Speaker: Parantapa Goswami
Jury:

  • M. Fabio CRESTANI - Università della Svizzera italiana - Reviewer
  • Mme Josianne MOTHE - Institut de Recherche en Informatique de Toulouse - Reviewer
  • M. Chengxiang ZHAI - University of Illinois at Urbana-Champaign - Examiner
  • M. Patrick GALLINARI - Université Pierre et Marie Curie - Examiner
  • Mme Marie-Christine ROUSSET - Université Grenoble Alpes - Examiner
  • M. Eric GAUSSIER - Université Grenoble Alpes - Thesis supervisor
  • M. Massih-Reza AMINI - Université Grenoble Alpes - Thesis co-supervisor

 

Technical production: Djamel Hadji | All rights reserved

The present study focuses on (a) predicting parameters of already existing standard IR models and (b) learning new IR functions.

We first explore various statistical methods to estimate the collection parameter of the family of information-based models. This parameter determines the behavior of a term in the collection. In earlier studies it was set to the average number of documents in which the term appears, without full justification. We introduce a fully formalized estimation method that yields improved versions of these models over the original ones. However, this method applies only to the collection parameter within the information-based model framework.
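As a toy illustration of the default setting mentioned above, the traditional value of the collection parameter for a term is simply its document frequency divided by the number of documents. The corpus and function name below are hypothetical, for illustration only:

```python
# Toy corpus: each document is a list of terms (illustrative only).
docs = [
    ["retrieval", "model", "term"],
    ["model", "parameter"],
    ["term", "retrieval", "retrieval"],
]

def default_collection_param(term, docs):
    """Default setting from earlier studies: the fraction of documents
    in which the term appears (document frequency / N)."""
    n_docs_with_term = sum(term in d for d in docs)
    return n_docs_with_term / len(docs)

print(default_collection_param("retrieval", docs))  # 2/3
```

The estimation method developed in the thesis replaces this heuristic default with a formally derived value.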

To overcome this limitation, we propose a transfer learning approach that can predict values of any parameter of any IR model. This approach uses relevance judgments on a past collection to learn a regression function that infers parameter values for each individual query on a new, unlabeled target collection. The proposed method not only outperforms the standard IR models with their default parameter values, but also performs better than, or on par with, popular parameter-tuning methods that use relevance judgments on the target collection.
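A minimal sketch of this idea, with made-up per-query features and tuned parameter values standing in for a real labeled source collection (the feature choice here is purely illustrative, not the thesis's actual feature set):

```python
import numpy as np

# Hypothetical features per source query (e.g. query length, mean IDF
# of query terms) and the parameter value tuned on that labeled source
# collection for each query. Values are made up for illustration.
X_source = np.array([[2, 1.5], [3, 2.0], [5, 1.2], [4, 1.8]], float)
y_source = np.array([0.6, 0.7, 0.4, 0.65])

# Fit a linear regression (with intercept) by least squares.
A = np.hstack([X_source, np.ones((len(X_source), 1))])
w, *_ = np.linalg.lstsq(A, y_source, rcond=None)

def predict_param(query_features):
    """Infer a per-query parameter value on an unlabeled target collection."""
    return float(np.append(query_features, 1.0) @ w)

print(predict_param([3, 1.6]))
```

The regression model here is a plain linear fit for brevity; any regressor mapping query features to parameter values would fit the same scheme.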

We then investigate transfer learning techniques that directly transfer relevance information from a source collection to derive "pseudo-relevance" judgments on an unlabeled target collection. From these derived pseudo-relevance judgments, a ranking function for documents in the target collection is learned with any standard learning algorithm. In our experiments the learned function outperforms standard IR models as well as other state-of-the-art transfer learning algorithms.
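The pseudo-relevance derivation step can be sketched as follows, using set-overlap similarity on toy documents; the similarity measure, threshold, and data are illustrative assumptions, not the method's actual components:

```python
# Hypothetical source collection (documents judged relevant) and an
# unlabeled target collection; documents are term sets for simplicity.
source_relevant = [{"neural", "ranking"}, {"ranking", "model"}]
target_docs = {
    "d1": {"ranking", "model", "query"},
    "d2": {"cooking", "recipe"},
}

def jaccard(a, b):
    """Set-overlap similarity between two term sets."""
    return len(a & b) / len(a | b)

def pseudo_relevance(doc, threshold=0.3):
    """Transfer relevance: a target document is pseudo-relevant if it is
    sufficiently similar to some relevant source document."""
    return int(any(jaccard(doc, s) >= threshold for s in source_relevant))

pseudo_labels = {d: pseudo_relevance(t) for d, t in target_docs.items()}
print(pseudo_labels)  # {'d1': 1, 'd2': 0}
```

The resulting pseudo-labels can then be fed to any standard learning-to-rank algorithm in place of true relevance judgments.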

Although a ranking function learned in this way is effective, its form is predetermined by the learning algorithm used. We therefore introduce an exhaustive discovery approach that searches for ranking functions in a space of simple functions. Our experiments show that several of the discovered functions are highly competitive with standard IR models.
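An exhaustive search over a space of simple functions can be sketched like this; the building blocks, operators, toy statistics, and quality criterion below are illustrative assumptions, not the thesis's actual search space:

```python
import math
import itertools

# Toy per-document statistics (tf, idf, normalized doc length) for one
# query, with binary relevance labels (illustrative data only).
data = [((3, 2.0, 1.0), 1), ((1, 2.0, 1.5), 0), ((2, 0.5, 0.8), 0)]

# Space of simple building blocks over (tf, idf, dl).
atoms = {
    "tf": lambda tf, idf, dl: tf,
    "idf": lambda tf, idf, dl: idf,
    "tf/dl": lambda tf, idf, dl: tf / dl,
    "log(1+tf)": lambda tf, idf, dl: math.log(1 + tf),
}
ops = {"*": lambda a, b: a * b, "+": lambda a, b: a + b}

def pairwise_accuracy(score):
    """Fraction of (relevant, non-relevant) pairs ranked correctly."""
    pairs = [(a, b) for a in data for b in data if a[1] > b[1]]
    return sum(score(*a[0]) > score(*b[0]) for a, b in pairs) / len(pairs)

# Exhaustively enumerate candidate functions of the form (atom op atom).
candidates = {}
for (n1, f1), (n2, f2) in itertools.product(atoms.items(), repeat=2):
    for op_name, op in ops.items():
        name = f"({n1} {op_name} {n2})"
        candidates[name] = (
            lambda g1, g2, o: lambda tf, idf, dl: o(g1(tf, idf, dl), g2(tf, idf, dl))
        )(f1, f2, op)

best_name = max(candidates, key=lambda n: pairwise_accuracy(candidates[n]))
print(best_name, pairwise_accuracy(candidates[best_name]))
```

A real instantiation would enumerate deeper expressions and evaluate each candidate with a standard IR metric on full test collections, but the enumerate-then-score structure is the same.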