Object-level Web Information Retrieval

  • Zaiqing Nie
  • Yunxiao Ma
  • Ji-Rong Wen
  • Wei-Ying Ma

MSR-TR-2005-11

The primary function of current Web search engines is essentially relevance ranking at the document level. However, a great deal of structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval unfortunately leads to highly inaccurate relevance ranking when answering object-oriented queries. In this paper, we consider a paradigm shift that enables searching at the object level. Traditional information retrieval models take the document as the retrieval unit and assume that the content of a document is reliable. However, this reliability assumption no longer holds in the object retrieval context, where there usually exist multiple copies of information about the same object. These copies may be inconsistent because of the varying quality of Web sites and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we will not achieve satisfactory retrieval performance. We introduce a probabilistic model that handles the inconsistency problem using source quality information, and our empirical evaluation shows that our object-level model is significantly better than existing document-level models.
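
To make the inconsistency problem concrete, the following is a minimal sketch of one way to reconcile conflicting copies of an attribute value by weighting each copy with the quality of its source. It is an illustration only and is not the probabilistic model introduced in the report: the function name resolve_attribute, the quality scores in [0, 1], and the sample values are all hypothetical assumptions.

```python
from collections import defaultdict


def resolve_attribute(observations):
    """Pick the most credible value for one attribute of one object.

    observations: list of (value, source_quality) pairs, where
    source_quality is a hypothetical reliability score in [0, 1]
    (an assumption of this sketch, not a quantity defined in the report).
    Returns (best_value, share_of_total_weight).
    """
    weights = defaultdict(float)
    for value, quality in observations:
        # Each extracted copy votes for its value, weighted by source quality.
        weights[value] += quality

    total = sum(weights.values())
    if total == 0:
        # No observations, or every source has zero quality.
        return None, 0.0

    best = max(weights, key=weights.get)
    return best, weights[best] / total


# Hypothetical copies of a "publication year" attribute for one object,
# extracted from three sources of differing quality.
observations = [
    ("2005", 0.9),
    ("2005", 0.6),
    ("2004", 0.3),
]
best_value, confidence = resolve_attribute(observations)
```

Under these assumed scores, the value reported by the two higher-quality sources wins over the value from the low-quality source, whereas an unweighted merge would treat all three copies as equally trustworthy.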