{"id":862047,"date":"2022-07-17T00:36:39","date_gmt":"2022-07-17T07:36:39","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/?post_type=msr-project&#038;p=862047"},"modified":"2022-08-08T08:29:00","modified_gmt":"2022-08-08T15:29:00","slug":"nuwa-infinity","status":"publish","type":"msr-project","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/project\/nuwa-infinity\/","title":{"rendered":"NUWA Infinity"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background- card-background--full-bleed\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"403\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2022\/07\/7.5M-scaled.jpg\" class=\"attachment-full size-full\" alt=\"Cover image\" style=\"object-position: 50% 71%\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2022\/07\/7.5M-scaled.jpg 2560w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2022\/07\/7.5M-300x47.jpg 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2022\/07\/7.5M-1024x161.jpg 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2022\/07\/7.5M-768x121.jpg 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2022\/07\/7.5M-1536x242.jpg 1536w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2022\/07\/7.5M-2048x323.jpg 2048w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2022\/07\/7.5M-240x38.jpg 240w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 align-self-center\">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card 
-->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 id=\"nuwa-infinity\">NUWA Infinity<\/h1>\n\n\n\n<p>A multimodal generative foundation model designed to generate high-quality images and videos from text, image, or video input.<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<p>NUWA-Infinity is a generative model for infinite visual synthesis, defined as the task of generating arbitrarily sized, high-resolution images or long-duration videos. An autoregressive-over-autoregressive generation mechanism is proposed for this variable-size generation task: a global patch-level autoregressive model captures dependencies between patches, and a local token-level autoregressive model captures dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related, already generated patches as context for the patch currently being generated, significantly reducing computation cost without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) decides suitable generation orders for different visual synthesis tasks and learns order-aware positional embeddings. Compared to <a href=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/video\/research-talk-nuwa-neural-visual-world-creation-with-multimodal-pretraining\/\">NUWA<\/a>, which also covers images and videos, NUWA-Infinity offers superior visual synthesis capabilities in terms of resolution and variable-size generation.
<\/p>\n\n\n\n<p>Please visit our homepage to <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/nuwa-infinity.microsoft.com\" target=\"_blank\" rel=\"noopener noreferrer\">read more ><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"NUWAInfinity\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/9ocpo6VVbho?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n","protected":false},"excerpt":{"rendered":"<p>A multimodal generative foundation model designed to generate high-quality images and videos from text, image, or video input. 
NUWA-Infinity is a generative model for infinite visual synthesis, defined as the task of generating arbitrarily sized, high-resolution images or long-duration videos. An autoregressive-over-autoregressive generation mechanism is proposed to deal with [&hellip;]<\/p>\n","protected":false},"featured_media":862062,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-862047","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Scarlett Li","user_id":37736,"people_section":"Section name 0","alias":"scarli"},{"type":"user_nicename","display_name":"Yu Liu","user_id":35030,"people_section":"Section name 0","alias":"yluiu"},{"type":"user_nicename","display_name":"Yang Ou","user_id":37742,"people_section":"Section name 0","alias":"yaou"},{"type":"user_nicename","display_name":"Lijuan Wang","user_id":32680,"people_section":"Section name 0","alias":"lijuanw"},{"type":"user_nicename","display_name":"Yan Xia","user_id":34972,"people_section":"Section name 0","alias":"yanxia"},{"type":"user_nicename","display_name":"Fan Yang","user_id":31782,"people_section":"Section name 
0","alias":"fanyang"}],"msr_research_lab":[199560],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/862047","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":19,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/862047\/revisions"}],"predecessor-version":[{"id":865341,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/862047\/revisions\/865341"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media\/862062"}],"wp:attachment":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media?parent=862047"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=862047"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=862047"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=862047"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=862047"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}