{"id":717085,"date":"2021-01-13T07:51:45","date_gmt":"2021-01-13T15:51:45","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/?post_type=msr-project&#038;p=717085"},"modified":"2021-01-15T00:46:14","modified_gmt":"2021-01-15T08:46:14","slug":"data2text-automated-text-generation-from-structured-data","status":"publish","type":"msr-project","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/project\/data2text-automated-text-generation-from-structured-data\/","title":{"rendered":"Data2Text: Automated Text Generation from Structured Data"},"content":{"rendered":"<div style=\"width: 640px;\" class=\"wp-video\"><video class=\"wp-video-shortcode\" id=\"video-717085-1\" width=\"640\" height=\"360\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/microsoft.sharepoint.com\/teams\/Data2Text\/DocLib1\/Data2Text%20Introduction.mp4?_=1\" \/><a href=\"https:\/\/microsoft.sharepoint.com\/teams\/Data2Text\/DocLib1\/Data2Text%20Introduction.mp4\">https:\/\/microsoft.sharepoint.com\/teams\/Data2Text\/DocLib1\/Data2Text%20Introduction.mp4<\/a><\/video><\/div>\n<p>The Data2Text (or Data-to-Text) project aims to automatically generate fluent, fact-based descriptions or utterances from data tables. Typical business applications of text generation include generating financial and sports news stories, generating product descriptions, and analyzing and interpreting business data and Internet of Things data. 
See below for a few Data2Text applications.<\/p>\n<p>Product Description Generation <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription.png\" alt=\"graphical user interface, application\" width=\"1250\" height=\"288\" class=\"alignnone size-full wp-image-717112\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription.png 1250w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription-300x69.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription-1024x236.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription-768x177.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription-16x4.png 16w\" sizes=\"auto, (max-width: 1250px) 100vw, 1250px\" \/><\/p>\n<p>Writing Assistant <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription2.png\" alt=\"\" width=\"1246\" height=\"216\" class=\"alignnone size-full wp-image-717115\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription2.png 1246w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription2-300x52.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription2-1024x178.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription2-768x133.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-ProductDescription2-16x3.png 16w\" sizes=\"auto, (max-width: 1246px) 100vw, 1246px\" \/><\/p>\n<p>Fact-based QA <img loading=\"lazy\" 
decoding=\"async\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedQA.png\" alt=\"\" width=\"1246\" height=\"202\" class=\"alignnone size-full wp-image-717109\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedQA.png 1246w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedQA-300x49.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedQA-1024x166.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedQA-768x125.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedQA-16x3.png 16w\" sizes=\"auto, (max-width: 1246px) 100vw, 1246px\" \/><\/p>\n<p>Fact-based Conversation <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedConversation.png\" alt=\"\" width=\"1254\" height=\"134\" class=\"alignnone size-full wp-image-717106\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedConversation.png 1254w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedConversation-300x32.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedConversation-1024x109.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedConversation-768x82.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-FactBasedConversation-16x2.png 16w\" sizes=\"auto, (max-width: 1254px) 100vw, 1254px\" \/><\/p>\n<p>Analytic Narrative Generation <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-AnalyticDataNatrrativeGeneration.png\" 
alt=\"graphical user interface, application, Word\" width=\"1250\" height=\"224\" class=\"alignnone size-full wp-image-717103\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-AnalyticDataNatrrativeGeneration.png 1250w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-AnalyticDataNatrrativeGeneration-300x54.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-AnalyticDataNatrrativeGeneration-1024x184.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-AnalyticDataNatrrativeGeneration-768x138.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-AnalyticDataNatrrativeGeneration-16x3.png 16w\" sizes=\"auto, (max-width: 1250px) 100vw, 1250px\" \/><\/p>\n<p>The mainstream methods of data-to-text generation fall into two groups: rule-based and template-based approaches, and neural network-based approaches. Rule-based and template-based approaches dominate the relevant applications because they are interpretable and controllable, which makes it easier to ensure the correctness of the generated text. However, creating rules and extracting high-quality templates requires labor-intensive manual feature engineering. In contrast, neural network-based models are mainly data-driven, need little human intervention, and can readily produce rich and fluent text descriptions. However, users often cannot directly control the content generation, and it is difficult to ensure that the generated texts are faithful to their input data. 
<\/p>\n<p>Fact-based Data-to-Text Generation <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-process.png\" alt=\"\" width=\"2421\" height=\"473\" class=\"alignnone size-full wp-image-717157\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-process.png 2421w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-process-300x59.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-process-1024x200.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-process-768x150.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-process-1536x300.png 1536w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-process-2048x400.png 2048w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2021\/01\/D2T-process-16x3.png 16w\" sizes=\"auto, (max-width: 2421px) 100vw, 2421px\" \/><\/p>\n<p>The Data2Text project aims to develop automated, high-fidelity data-to-text generation technologies that address the shortcomings of both template-based and neural network-based approaches.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Data2Text project aims to automatically generate fluent, fact-based descriptions or utterances from a data table. Typical business applications of text generation include generating financial and sports news stories, generating product descriptions, and analyzing and interpreting business data and Internet of Things data. Figure 1 gives an example of the automatic generation of weather forecasts. 
Figure 1a shows structured weather data collected by various sensors; the machine takes the data in Figure 1a as input and outputs the weather forecast in Figure 1b.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13563,13545],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-717085","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-data-platform-analytics","msr-research-area-human-language-technologies","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2016-09-01","related-publications":[714370,714388,714415,714688,714694,714700,714730],"related-downloads":[],"related-videos":[],"related-groups":[144919,714577],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[{"id":0,"name":"Articles","content":"<ol>\r\n \t<li><a href=\"https:\/\/zhuanlan.zhihu.com\/p\/57709494\">Summary of recent data-to-text generation papers<\/a> (in Chinese, 2019-02-26, \"\u6570\u636e\u5230\u6587\u672c\u751f\u6210\u7684\u8fd1\u671f\u4f18\u8d28\u8bba\u6587\u89e3\u8bfb\"<a href=\"https:\/\/zhuanlan.zhihu.com\/p\/57709494\">)<\/a>.<\/li>\r\n \t<li><a href=\"https:\/\/zhuanlan.zhihu.com\/p\/26477869\">Learning to write data-based articles automatically<\/a> (in Chinese, 2017-04-21, \"\u5982\u4f55\u8ba9\u4eba\u5de5\u667a\u80fd\u5b66\u4f1a\u7528\u6570\u636e\u8bf4\u8bdd\"<a href=\"https:\/\/zhuanlan.zhihu.com\/p\/26477869\">)<\/a>.<\/li>\r\n<\/ol>"}],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Chin-Yew Lin","user_id":31493,"people_section":"Section name 
0","alias":"cyl"}],"msr_research_lab":[199560],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/717085","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":19,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/717085\/revisions"}],"predecessor-version":[{"id":717541,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/717085\/revisions\/717541"}],"wp:attachment":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media?parent=717085"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=717085"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=717085"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=717085"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=717085"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}