{"id":305405,"date":"2011-10-31T09:30:58","date_gmt":"2011-10-31T16:30:58","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/?p=305405"},"modified":"2016-10-13T11:29:23","modified_gmt":"2016-10-13T18:29:23","slug":"helping-kinect-recognize-faces","status":"publish","type":"post","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/blog\/helping-kinect-recognize-faces\/","title":{"rendered":"Helping Kinect Recognize Faces"},"content":{"rendered":"<p><em>By Douglas Gantenbein, Senior Writer, Microsoft News Center<\/em><\/p>\n<p>To use a <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.xbox.com\/en-US\/xbox-360\/accessories\/kinect\" target=\"_blank\">Kinect for Xbox 360<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> gaming device is to see something akin to magic. Different people move in and out of its view, and Kinect recognizes the change in a player and responds accordingly.<\/p>\n<p>It accomplishes this task despite the enormous variation in what it sees. Lighting can change within a room. A player might appear close to the Kinect one minute, farther away the next. 
And faces change second to second as players react to the action.<\/p>\n<p>Kinect Identity, as the device\u2019s player-recognition tool set is called, recognizes people by looking for three visual cues:<\/p>\n<ul>\n<li>The heights of the players.<\/li>\n<li>The color of their clothing.<\/li>\n<li>Their faces.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-305414\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/Kinect-Contributions.png\" alt=\"Kinect contributions\" width=\"250\" height=\"250\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/Kinect-Contributions.png 250w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/Kinect-Contributions-150x150.png 150w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/Kinect-Contributions-180x180.png 180w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/>That last element is the key. Players might be close in height. They might be wearing similarly colored clothing. But faces are as individual as, well, individuals.<\/p>\n<p>That\u2019s where Microsoft Research work played a significant role in helping Kinect learn who is who. Jian Sun\u2014a senior researcher with <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/lab\/microsoft-research-asia\/\" target=\"_blank\">Microsoft Research Asia<\/a>\u2019s <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/group\/visual-computing\/\" target=\"_blank\">Visual Computing Group<\/a>\u2014worked with colleagues outside Microsoft on solving the complicated task of teaching a machine how to recognize people when they change poses, frown or smile, have shadows across their face, or are brightly lit.<\/p>\n<p>Identifying a face is not an easy task for a machine.<\/p>\n<p>\u201cThe fundamental difficulty comes from intrapersonal variation,\u201d Sun says. 
\u201cThe face of a single person can appear very different under different conditions. Due to lighting, expressions, or poses, there can be even bigger differences than between two people.\u201d<\/p>\n<h2>Prior Work<\/h2>\n<p>Sun began work on face recognition three years ago. His work contributed to a face-recognition feature in <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/download\/details.aspx?id=26689\" target=\"_blank\">Windows Live Photo Gallery<\/a> that enables users to tag and search for photos of friends or family members using face recognition.<\/p>\n<div id=\"attachment_305417\" style=\"width: 320px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-305417\" class=\"size-full wp-image-305417\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/Jian-Sun.png\" alt=\"Jian Sun\" width=\"310\" height=\"407\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/Jian-Sun.png 310w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/Jian-Sun-229x300.png 229w\" sizes=\"auto, (max-width: 310px) 100vw, 310px\" \/><p id=\"caption-attachment-305417\" class=\"wp-caption-text\">Jian Sun<\/p><\/div>\n<p>Sun acknowledges that, at least for now, a machine never can be 100 percent successful at detecting all the variations a single face can exhibit. 
The trick, he says, is giving Kinect the ability to make extremely educated guesses.<\/p>\n<p>Much of the face-recognition technology in Kinect is based on a paper called <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/CVPR10_FaceReco.pdf\" target=\"_blank\"><em>Face Recognition with Learning-based Descriptor<\/em><\/a>, co-authored by Sun along with Zhimin Cao, from The Chinese University of Hong Kong; Qi Yin, from Tsinghua University; and professor Xiaoou Tang, from The Chinese University of Hong Kong.<\/p>\n<p>Most face-recognition tools take what seems like the obvious route: They compare any faces they see with a stored database of faces. While simple, this approach stumbles when confronted by faces with different lighting, or when a face is scowling while the face stored as a reference is smiling.<\/p>\n<p>Sun and his co-authors devised a method for teaching a device to recognize faces based on what facial features are most prominent under different poses or lighting. That is, a nose or the left or right eye might be more critical to recognition than other features, depending on the pose.<\/p>\n<p>The technique uses two steps. First, it extracts nine key landmarks from a face: nose, mouth, eyes, and so on. The images are filtered to remove illumination variations, then assigned a compact snippet of computer code.<\/p>\n<p>Next, the system determines the facial pose\u2014whether the subject is looking straight at the camera or looking left or right. Poses can vary widely, of course, so Kinect uses an algorithm that determines what seems to be the most likely candidate. 
The system then matches the subject\u2019s eyes, mouth, or nose to images in its database and finds the best match.<\/p>\n<p>The facial-recognition tool also determines where the face is appearing in its field of view and \u201cnormalizes\u201d the facial size to compensate for whether the player is near the Kinect or far away.<\/p>\n<p>Under most conditions, the face-recognition tool achieves a success rate of nearly 85 percent.<\/p>\n<h2>Additional Approaches<\/h2>\n<p>Sun\u2019s team also contributed to Kinect\u2019s two other player-recognition approaches: identifying a player based on clothing and on height. Working with the Kinect product team, Sun and his colleague <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/people\/yichenw\/\" target=\"_blank\">Yichen Wei<\/a> helped develop Kinect&#8217;s approach to avoiding mistakes. And those do occur\u2014as impressive as the facial-recognition technology is, it\u2019s not perfect.<\/p>\n<p>For each new game session, Kinect gathers the players\u2019 characteristics\u2014face, height, and clothing color\u2014and matches them against information it has stored about previous players. For Kinect to \u201cidentify\u201d a player, it must have one positive response, such as a recognized height, and no negative responses, such as wrong clothing color.<\/p>\n<p>The facial-recognition component acts as something of a tiebreaker. It\u2019s part of the recognition process itself, of course, and in the case of a strong facial match, it will identify a player even if one of the other identifiers\u2014height or clothing color\u2014comes back as a negative match.<\/p>\n<p>The adoption of Microsoft Research\u2019s work by Kinect occurred in part through serendipity. 
<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.leyvand.com\/\" target=\"_blank\">Tommer Leyvand<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, now a principal development lead with the Kinect team, interned at Microsoft Research Asia in 2005, and became familiar with that facility\u2019s facial-recognition work.<\/p>\n<p>By early 2009, Leyvand was part of the Kinect team in Redmond.<\/p>\n<p>\u201cWhen Kinect came along, we knew we were going to need facial recognition as part of it, and I knew Microsoft Research Asia had a lot of papers out on that technology,\u201d he says. \u201cThey have been part of our visual-features team ever since\u2014and they came over to Redmond at crunch time to help get Kinect ready for shipping. It was a very close working relationship.\u201d<\/p>\n<p>Microsoft Research Asia scientist Yichen Wei worked with Sun and the Kinect team on assembling the final Kinect Identity tool set.<\/p>\n<p>Kinect\u2019s ability to recognize people serves two purposes. One is to identify players, automatically sign them into their <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.xbox.com\/en-US\/live\/\" target=\"_blank\">Xbox LIVE<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> account, and deploy their avatars. The other comes during a game in which players can enter and exit: Kinect keeps track of when changes are made and puts the right player into the game.<\/p>\n<p>The way it does so is what makes Kinect so amazing to people.<\/p>\n<h2>\u2018Part of the Experience\u2019<\/h2>\n<p>\u201cIt becomes part of the experience,\u201d Leyvand says. \u201cThe magic is when you don\u2019t do anything. You just stand there, and it knows who you are.\u201d<\/p>\n<p>Sun is working on how the next generation of Kinect will handle identities. 
He also is pursuing a new approach to facial recognition, one that recognizes faces in the same way people do. In a second paper on face recognition, <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2016\/10\/cvpr11_faceapmodel.pdf\" target=\"_blank\"><em>An Associate-Predict Model for Face Recognition<\/em><\/a>, Sun and co-authors Yin and Tang conjecture that a person takes prior memories of other people and uses those to predict how a particular person will appear under different settings.<\/p>\n<p>To recognize a face that has changed pose or is under different lighting, the associate-predict model begins by building a database of \u201cgeneric\u201d faces. Facial components are broken down by key facial landmarks\u2014such as eye centers and mouth corners\u2014and 12 other facial features. This serves as the recognition engine\u2019s basic \u201cmemory\u201d of how faces appear under different conditions or in different poses.<\/p>\n<p>In the next step, the face of a specific subject\u2014such as a Kinect player\u2014is compared to the 28 different \u201cmemory\u201d images: seven poses times four lighting variations. The recognition engine \u201cassociates\u201d the subject\u2019s face to the memory bank of stored faces, matching one or more key facial features, such as an eye that is looking to the left and is on the shadowed side of a face. Then it uses that information to make an educated guess as to what the subject\u2019s face will look like with a different pose or under different lighting.<\/p>\n<p>The current Kinect\u2019s player-recognition ability seems uncanny. Future generations of the device could appear to be downright supernatural.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Douglas Gantenbein, Senior Writer, Microsoft News Center To use a Kinect for Xbox 360 gaming device is to see something akin to magic. 
Different people move in and out of its view, and Kinect recognizes the change in a player and responds accordingly. It accomplishes this task despite the enormous variation in what it [&hellip;]<\/p>\n","protected":false},"author":39507,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194455],"tags":[187393,214160,196135,214154,214157,204599,187150],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-305405","post","type-post","status-publish","format-standard","hentry","category-machine-learning","tag-face-recognition","tag-facial-features","tag-kinect-for-xbox-360","tag-kinect-identity","tag-player-recognition","tag-windows-live-photo-gallery","tag-xbox-live","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[247949],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"October 31, 2011","formattedExcerpt":"By Douglas Gantenbein, Senior Writer, Microsoft News Center To use a Kinect for Xbox 360 gaming device is to see something akin to magic. Different people move in and out of its view, and Kinect recognizes the change in a player and responds accordingly. 
It&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/posts\/305405","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/comments?post=305405"}],"version-history":[{"count":1,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/posts\/305405\/revisions"}],"predecessor-version":[{"id":305420,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/posts\/305405\/revisions\/305420"}],"wp:attachment":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media?parent=305405"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/categories?post=305405"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/tags?post=305405"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=305405"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=305405"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=305405"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=305405"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\
/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=305405"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=305405"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=305405"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=305405"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}