{"id":377990,"date":"2017-04-18T11:51:36","date_gmt":"2017-04-18T18:51:36","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/?post_type=msr-project&#038;p=377990"},"modified":"2019-08-19T10:03:33","modified_gmt":"2019-08-19T17:03:33","slug":"deep-reinforcement-learning-goal-oriented-dialogue","status":"publish","type":"msr-project","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/project\/deep-reinforcement-learning-goal-oriented-dialogue\/","title":{"rendered":"Deep Reinforcement Learning for Goal-Oriented Dialogues"},"content":{"rendered":"<p>Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems, at SLT 2018. [<a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/publication\/microsoft-dialogue-challenge-building-end-to-end-task-completion-dialogue-systems\/\">Proposal<\/a>] All the data, source code and schedule information will be updated <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/xiul-msr\/e2e_dialog_challenge\">here<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<p style=\"text-align: left;\">This project aims to develop intelligent dialogue agents to help users effectively accomplish tasks via natural language conversation. A typical goal-oriented dialogue system contains three major components: natural language understanding (NLU), natural language generation (NLG), and dialogue management (DM) that consists of state tracking and policy learning. 
Our research focuses on deep reinforcement learning approaches for dialogue management in goal-oriented dialogue settings such as movie ticket booking, trip planning, and sales assistance.<\/p>\n<div id=\"attachment_398516\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-398516\" class=\"wp-image-398516 size-large\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2017\/04\/composite-dialogue-1024x459.png\" alt=\"\" width=\"1024\" height=\"459\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2017\/04\/composite-dialogue-1024x459.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2017\/04\/composite-dialogue-300x134.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2017\/04\/composite-dialogue-768x344.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2017\/04\/composite-dialogue.png 1715w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-398516\" class=\"wp-caption-text\">Composite Task Completion Dialogue System<\/p><\/div>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/MiuLab\/TC-Bot\"><strong>User Simulator<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\nTraining reinforcement learners is challenging because they need an environment to operate in. Thus, we developed a user simulator for learning and evaluation. [<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1612.05688\">Li et al. 
2016<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>]<\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/MiuLab\/KB-InfoBot\"><strong>Infobot<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\nWe developed the first end-to-end reinforcement learning agent with differentiable knowledge base access [<a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/publication\/towards-end-end-reinforcement-learning-dialogue-agents-information-access\/\">Dhingra et al. ACL 2017<\/a>], and the first end-to-end dialogue policy trained with both supervised and reinforcement learning [<a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/publication\/end-end-lstm-based-dialog-control-optimized-supervised-reinforcement-learning\/\">Williams et al. 2016<\/a>].<\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/MiuLab\/TC-Bot\"><strong>Task-completion bot<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\nWe developed an end-to-end learning framework for task-completion neural dialogue systems [<a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/publication\/end-end-task-completion-neural-dialogue-systems\/\">Li et al. IJCNLP 2017<\/a>]. We also developed BBQ Networks (Bayes-by-Backprop Q-Networks), which perform efficient exploration for dialogue policy learning [<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1608.05081\">Lipton et al. 
2017<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>], as well as efficient actor-critic methods that substantially reduce the sample complexity for end-to-end learning of LSTM-based dialogue policies [<a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/publication\/sample-efficient-deep-reinforcement-learning-dialog-control\/\">Asadi et al. 2016<\/a>].<\/p>\n<p><strong>Composite Task-completion bot<\/strong><br \/>\nWe developed a composite task-completion dialogue system based on hierarchical reinforcement learning, which learns dialogue policies that operate at different temporal scales, and demonstrated its significant improvement over flat deep reinforcement learning in both simulation and human evaluation [<a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/publication\/composite-task-completion-dialogue-system-via-hierarchical-deep-reinforcement-learning\/\">Peng et al. EMNLP 2017<\/a>]. (<em>The source code will be released soon.<\/em>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems, at SLT 2018. [Proposal] All the data, source code and schedule information will be updated here. This project aims to develop intelligent dialogue agents to help users effectively accomplish tasks via natural language conversation. 
A typical goal-oriented dialogue system contains three major components: natural language understanding (NLU), [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-377990","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2016-04-18","related-publications":[482721,500372,493946,437235,438609,454764,369872,377222,379010,376403,418055,372953,340424,347972,339806,294722,294719,305843,591757,552729,508631,506330,502184],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Jianfeng Gao","user_id":32246,"people_section":"Research Team","alias":"jfgao"},{"type":"guest","display_name":"Kavosh Asadi","user_id":399587,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Yun-Nung (Vivian) Chen","user_id":398510,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Bhuwan Dhingra","user_id":398498,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Zachary Lipton","user_id":398495,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Baolin Peng","user_id":398489,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Da Tang","user_id":398501,"people_section":"Past Interns & 
Visitors","alias":""}],"msr_research_lab":[199565],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/377990","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":35,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/377990\/revisions"}],"predecessor-version":[{"id":604194,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/377990\/revisions\/604194"}],"wp:attachment":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media?parent=377990"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=377990"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=377990"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=377990"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=377990"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}