{"id":159975,"date":"2010-01-01T00:00:00","date_gmt":"2010-01-01T00:00:00","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/msr-research-item\/fingerprinting-ratings-for-collaborative-filtering-theoretical-and-empirical-analysis\/"},"modified":"2018-10-16T20:03:43","modified_gmt":"2018-10-17T03:03:43","slug":"fingerprinting-ratings-for-collaborative-filtering-theoretical-and-empirical-analysis","status":"publish","type":"msr-research-item","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/publication\/fingerprinting-ratings-for-collaborative-filtering-theoretical-and-empirical-analysis\/","title":{"rendered":"Fingerprinting Ratings For Collaborative Filtering &#8220;\u201d Theoretical and Empirical Analysis"},"content":{"rendered":"<p>We consider \ufb01ngerprinting methods for collaborative \ufb01ltering (CF) systems. In general, CF systems show their real strength when supplied with enormous data sets. Earlier work already suggests sketching techniques to handle massive amounts of information, but most prior analysis has so far been limited to non-ranking application scenarios and has focused mainly on a theoretical analysis. We demonstrate how to use \ufb01ngerprinting methods to compute a family of rank correlation coef\ufb01cients. Our methods allow identifying users who have similar rankings over a certain set of items, a problem that lies at the heart of CF applications. We show that our method allows approximating rank correlations with high accuracy and con\ufb01dence. We examine the suggested methods empirically through a recommender system for the Net\ufb02ix dataset, showing that the required \ufb01ngerprint sizes are even smaller than the theoretical analysis suggests. We also explore the of use standard hash functions rather than min-wise independent hashes and the relation between the quality of the \ufb01nal recommendations and the \ufb01ngerprint size.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We consider \ufb01ngerprinting methods for collaborative \ufb01ltering (CF) systems. In general, CF systems show their real strength when supplied with enormous data sets. Earlier work already suggests sketching techniques to handle massive amounts of information, but most prior analysis has so far been limited to non-ranking application scenarios and has focused mainly on a theoretical [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"yobach"},{"type":"user_nicename","value":"rherb"},{"type":"user_nicename","value":"t-yobach"}],"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"SPIRE 2010","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"SPIRE 2010","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2010-01-01","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2010,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13561,13555],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-159975","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-algorithms","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_publishername":"","msr_edition":"SPIRE 2010","msr_affiliation":"","msr_published_date":"2010-01-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"207268","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"SketchEmp.pdf","viewUrl":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/02\/SketchEmp.pdf","id":207268,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":207268,"url":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/02\/SketchEmp.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"yobach","user_id":35033,"rest_url":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=yobach"},{"type":"user_nicename","value":"rherb","user_id":33390,"rest_url":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=rherb"},{"type":"user_nicename","value":"t-yobach","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/159975","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/159975\/revisions"}],"predecessor-version":[{"id":521495,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/159975\/revisions\/521495"}],"wp:attachment":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media?parent=159975"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=159975"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=159975"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=159975"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=159975"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=159975"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=159975"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=159975"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=159975"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=159975"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=159975"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=159975"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=159975"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}