{"id":307607,"date":"2007-05-23T14:00:11","date_gmt":"2007-05-23T21:00:11","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/?p=307607"},"modified":"2016-10-18T23:02:24","modified_gmt":"2016-10-19T06:02:24","slug":"putting-search-context","status":"publish","type":"post","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/blog\/putting-search-context\/","title":{"rendered":"Putting Search into Context"},"content":{"rendered":"<p><em>By Rob Knies, Managing Editor, Microsoft Research<\/em><\/p>\n<p>The breathtaking ascendance of Internet search over the past decade has tended to obscure the limitations of the underlying technology. So quickly has search been embraced by hundreds of millions worldwide that it is entirely natural for people to spend more time marveling over what they\u2019ve gained rather than focusing on the potential for improvement.<\/p>\n<p>Luckily, though, while users revel in their unprecedented access to information, <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/people\/silviu\/\" target=\"_blank\">Silviu-Petru Cucerzan<\/a> has his sights trained on the horizons of search.<\/p>\n<p>Cucerzan, a researcher for the Text Mining, Search and Navigation group within <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/lab\/microsoft-research-redmond\/\" target=\"_blank\">Microsoft Research Redmond<\/a>, is working on an approach he calls Information-Centric Browsing and Search, and his work promises to make the search experience much more robust, productive, and user-friendly.<\/p>\n<p>His project explores the space of contextual search and instant access to information, in which the content of a document is analyzed to provide hyperlinks to key concepts. 
In the process of doing so, it identifies the most appropriate matches for ambiguous terms by considering them in the context in which they are used.<\/p>\n<p>\u201cI believe this could change a lot of what we\u2019re doing,\u201d Cucerzan says, \u201ca lot of how search is done in general.\u201d<\/p>\n<p>Let\u2019s say you\u2019re reading a Web story on college football that mentions a running back named Bush winning the Heisman Trophy. You want to know a bit more, so you begin a Web search for \u201cBush.\u201d What happens? You get a multitude of search results, few of which have anything to do with the football player.<\/p>\n<p>But what if you had a tool that could analyze the story you\u2019re reading, understand that you\u2019re seeking information about the Bush\u2014Reggie Bush\u2014who played football for the University of Southern California, and deliver only links about that player\u2014and other pertinent information, such as the USC team itself, the conference in which it plays, or his 2005 Heisman award?<\/p>\n<p>With such a tool at hand, the search process moves from looking for a needle in a haystack to looking for a needle in a pincushion. You get what you need.<\/p>\n<p>\u201cThis would enable applications to communicate with each other in a space of concepts,\u201d Cucerzan says. \u201cRight now, applications do not communicate with each other in terms of information. If I browse a document in a window and then go to a search engine, the context is completely lost. There\u2019s no communication between different applications, even between different instances of the same application. My search engine is completely unaware of the kind of document I\u2019m reading or the document I\u2019m editing or whatever I have on my computer.\u201d<\/p>\n<p>Part of the disconnect is that we use search in different ways at different times. 
The way we search during work can be entirely different from the way we search during our leisure time.<\/p>\n<p>\u201cWe shift contexts a lot,\u201d Cucerzan says. \u201cThe fact that I\u2019m reading a lot of machine-learning documents doesn\u2019t mean that all that stuff is relevant when I read news. It depends on which persona is using the system.<\/p>\n<p>\u201cThe most important thing, to me, is the current context. If I\u2019m reading a news story and I query something, that\u2019s probably what the query is about.\u201d<\/p>\n<p>That\u2019s the motivation for his work.<\/p>\n<p>\u201cI was trying,\u201d he says, \u201cto create some technology to bias search-engine results by making the engine aware of what I\u2019m currently doing. What exactly is the additional information that one has to send to the search engine? Just by looking at a document I\u2019m browsing, it\u2019s pretty difficult to say. But if we\u2019re in a space of concepts and we can predict what the most important concepts from the document are, in absolute terms or with respect to a query, then the search engine or any other application could ask for these concepts and use them to meet the user\u2019s informational needs as captured by the current context.<\/p>\n<p>\u201cIt gives really good results,\u201d Cucerzan says of his technology, \u201cespecially for ambiguous queries\u2014and a lot of queries are ambiguous. It takes you from generic results to results that look beautiful in a particular context.\u201d<\/p>\n<p>The project offers a novel user interface to analyze a document and provide contextually relevant search results. An enhanced browser view is divided into two panes and a few specialized buttons. The pane on the left displays the document being viewed. A button on the address bar enables the user to process the document for contextual analysis. The right-hand pane offers relevant information from authoritative collections and relevant Web news and image search results. 
Other buttons enable the user to toggle on or off the search-contextualization and query-disambiguation features, if desired.<\/p>\n<p>Once the process button is pressed, the tool analyzes the document, identifies key concepts, and links those concepts to the appropriate Web pages. Having seen words such as \u201cfootball\u201d and \u201cUSC\u201d in the same story as \u201cBush,\u201d the tool retrieves links to pages about Reggie, not George W. And, after the analysis, if ambiguity remains about precisely which meaning a term has, a list of associations appears, from which the user can select the most appropriate association and receive the search results he or she is seeking.<\/p>\n<p>The preferred results are retained for a particular document, giving the user a personalized Web resource for any analyzed document. This can be particularly effective for amassing a collection of concept-based bookmarks.<\/p>\n<p>\u201cThey become active as the concept becomes active in context,\u201d Cucerzan explains. \u201cNow, I have about 300 or 400 bookmarks. I\u2019m afraid to bookmark one more page, because I know I\u2019m not going to be able to find it and it will make it harder to find anything else.<\/p>\n<p>\u201cWith this browser,\u201d he adds, \u201cthe only bookmarks that are active are those for which the concepts are active. That makes a huge difference.\u201d<\/p>\n<p>Instead of having to sift through a collection of irrelevant search results and a plethora of bookmarks not currently pertinent, Cucerzan\u2019s technology delivers tailor-made information to the user when the user needs it.<\/p>\n<p>\u201cIt\u2019s really nice to have all that information at your fingertips,\u201d he says. \u201cThis is very powerful. 
I can create my personalized view of the Web, based on concepts.\u201d<\/p>\n<p>That personalization is saved on the user\u2019s computer, so the next time a similar document is analyzed, the same conceptual links can be invoked.<\/p>\n<p>The possibilities for such a scenario are many and varied, but consider a couple. What if lots of users chose to share their preferences and their contextual searches?<\/p>\n<p>\u201cIf we were to collect this information from a lot of users,\u201d Cucerzan suggests, \u201cthe system could learn from them and get better and better. Of course, we\u2019d need an agreement from the users, but if they knew that they could help improve the system to their own and other users\u2019 benefit, I am sure most would agree to provide implicit feedback.\u201d<\/p>\n<p>Another possible use could have an even more far-reaching effect.<\/p>\n<p>\u201cWhat if 70 percent of the people that have at least one bookmark for Reggie Bush have one particular page bookmarked?\u201d Cucerzan asks rhetorically. \u201cThen, when somebody searches for \u2018Reggie Bush,\u2019 what is the best page to show up at the top? At that point, I may not care about other algorithmic search results. I know that hundreds of thousands of people have bookmarked that page in their personalized view of the Web, and I will trust their judgment.<\/p>\n<p>\u201cIt could change the paradigm of search if we had bookmarks on this growing space of concepts from a tremendous number of people. Basically, people would be voting with their bookmarking clicks on what\u2019s important on the Web.\u201d<\/p>\n<p>Cucerzan has taken an interesting route to arrive at this point. His early interest in mathematics led to a fascination with computers, and while pursuing a bachelor\u2019s degree in science at the University of Bucharest in his native Romania, he worked on optical character recognition, building a system that made it to the finals of the European Academic Software Award. 
That sparked an interest in natural-language processing, for which he received his Ph.D. from Johns Hopkins University. Upon joining Microsoft, he began working on query-log mining and information extraction, supplying a new spelling-correction technology for several Microsoft products and a tool for question answering for Encarta\u00ae.<\/p>\n<p>His current project has been bolstered by the contributions of a couple of Microsoft Research colleagues, Mike Schultz and Robert Ragno. Schultz built the data infrastructure employed by the Information-Centric Browsing and Search tool. And Ragno built an application-programming interface to pull information from <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/login.live.com\/\" target=\"_blank\">Windows Live\u2122 Search<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<p>The system uses pre-processed information from Wikipedia and Encarta and now features 1.6 million indexed concepts. It\u2019s still growing, to the benefit of hundreds of people who are using it. And Cucerzan is looking for more.<\/p>\n<p>\u201cThe experience wouldn\u2019t be quite complete unless we also link it to the search box in <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/download\/internet-explorer.aspx\" target=\"_blank\">Internet Explorer<\/a>\u00ae,\u201d he says. \u201cIt would be nice to get feedback from users who are using this on a daily basis.<\/p>\n<p>\u201cEverybody who sees it says, \u2018Wow, I want this!\u2019 That\u2019s our feedback as of now: Everybody wants such a technology. But would people actually use it daily instead of their regular browser? We don\u2019t know.\u201d<\/p>\n<p>What he does know is that the project is plowing new ground in the fields of search and text mining.<\/p>\n<p>\u201cWhat\u2019s new here is the large scale of this concept recognition and disambiguation,\u201d Cucerzan says. 
\u201cThere is no other system that can go to any document on the Web right now and extract all this stuff.<\/p>\n<p>\u201cThe other exciting technology is the context-aware search. If we contextualize the search, we can get more relevant results, based on what the important concepts are in a document we are reading or editing.\u201d<\/p>\n<p>Blazing trails in the still nascent years of Web search is rewarding.<\/p>\n<p>\u201cWhat I\u2019m really happy about,\u201d Cucerzan says, \u201cis that it started from an idea that we weren\u2019t sure was doable, because nobody else had done it before on such a scale.<\/p>\n<p>\u201cTo get a fully implemented system that works and people can download and use as their browser\u2014that\u2019s a really nice thing.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Rob Knies, Managing Editor, Microsoft Research The breathtaking ascendance of Internet search over the past decade has tended to obscure the limitations of the underlying technology. 
So quickly has search been embraced by hundreds of millions worldwide that it is entirely natural for people to spend more time marveling over what they\u2019ve gained rather [&hellip;]<\/p>\n","protected":false},"author":39507,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194460],"tags":[215618,215615,202077,204595],"research-area":[13555],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-307607","post","type-post","status-publish","format-standard","hentry","category-search-and-information-retrieval","tag-concept-based-bookmarks","tag-encarta","tag-internet-explorer","tag-windows-live","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"May 23, 2007","formattedExcerpt":"By Rob Knies, Managing Editor, Microsoft Research The breathtaking ascendance of Internet search over the past decade has tended to obscure the limitations of the underlying technology. 
So quickly has search been embraced by hundreds of millions worldwide that it is entirely natural for people&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/posts\/307607","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/comments?post=307607"}],"version-history":[{"count":2,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/posts\/307607\/revisions"}],"predecessor-version":[{"id":308660,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/posts\/307607\/revisions\/308660"}],"wp:attachment":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media?parent=307607"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/categories?post=307607"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/tags?post=307607"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=307607"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=307607"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=307607"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\
/wp-json\/wp\/v2\/msr-locale?post=307607"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=307607"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=307607"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=307607"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=307607"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}