Debasis Ganguly - PhD Transfer Talk - 29th May 2012

Video Category: Transfer Talk

Title: Topical Relevance Models

Abstract: My work focuses on investigating the potential benefits of positional segmentation and topical segmentation for information retrieval (IR). While positional segmentation involves decomposing a text into topically coherent contiguous blocks, topical segmentation is more general in the sense that it imposes a probability distribution of the terms over a set of latent topics. We exploit positional segmentation to improve retrieval effectiveness in the following ways: i) segmenting the top-ranked initially retrieved documents into smaller units to compute sub-document level similarities with respect to the query, so as to use only the relevant sub-units of documents for feedback; ii) segmenting the top-ranked documents into topics, where each topic pertains to one particular aspect of the information need expressed in the query, and retrieving relevant documents against each of these aspects; iii) segmenting very long queries into smaller and more focused retrieval units. After describing our positional segmentation methodologies for feedback documents and queries, I provide empirical validation of the hypothesis that positional segmentation is useful for IR.
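
As a rough illustration of item (i), the sketch below splits documents into contiguous blocks and keeps only the blocks most similar to the query. It is a minimal sketch under stated assumptions, not the method presented in the talk: fixed-length word windows stand in for topically coherent blocks, raw term-frequency cosine similarity stands in for the sub-document similarity measure, and the names `segment`, `cosine` and `top_segments` are hypothetical.

```python
import math
from collections import Counter


def segment(text, size=5):
    """Split text into contiguous blocks of at most `size` words.

    Assumption: fixed-length windows are a simplification; the talk's
    segments are topically coherent blocks found by a segmentation method.
    """
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def top_segments(query, docs, k=2, size=5):
    """Return up to k segments most similar to the query, best first.

    Segments with zero similarity are dropped, so only sub-units that
    share at least one term with the query are kept for feedback.
    """
    q = Counter(query.lower().split())
    scored = []
    for doc in docs:
        for seg in segment(doc, size):
            scored.append((cosine(q, Counter(seg)), " ".join(seg)))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for score, seg in scored[:k] if score > 0]
```

In a feedback setting, `docs` would be the top-ranked documents from an initial retrieval run, and the returned segments would replace whole documents as the source of expansion terms.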

To exploit topical segmentation of both feedback documents and queries, I then propose a probabilistic generative model, named the topical relevance model (TRLM), to estimate each aspect of the information need implicitly or explicitly expressed in a query. Experimental results show that the TRLM can significantly improve retrieval effectiveness. I also show that associative document search, which involves using full documents as queries to retrieve related documents in the collection, can be improved by applying the TRLM. I then describe my proposal to apply the TRLM to compose dynamic summaries from small document segments in response to a query.