{"id":704944,"date":"2020-11-10T18:55:13","date_gmt":"2020-11-11T02:55:13","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/?post_type=msr-research-item&#038;p=704944"},"modified":"2020-11-10T18:55:13","modified_gmt":"2020-11-11T02:55:13","slug":"projection-efficient-subgradient-method-and-optimal-nonsmooth-frank-wolfe-method","status":"publish","type":"msr-research-item","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/publication\/projection-efficient-subgradient-method-and-optimal-nonsmooth-frank-wolfe-method\/","title":{"rendered":"Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method"},"content":{"rendered":"<p>We consider the classical setting of optimizing a nonsmooth Lipschitz continuous convex function over a convex constraint set, when having access to a (stochastic) first-order oracle (FO) for the function and a projection oracle (PO) for the constraint set. It is well known that to achieve\u00a0<span id=\"MathJax-Element-1-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-1\" class=\"math\"><span id=\"MathJax-Span-2\" class=\"mrow\"><span id=\"MathJax-Span-3\" class=\"mi\">\u03f5<\/span><\/span><\/span><\/span>-suboptimality in high-dimensions,\u00a0<span id=\"MathJax-Element-2-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-4\" class=\"math\"><span id=\"MathJax-Span-5\" class=\"mrow\"><span id=\"MathJax-Span-6\" class=\"mi\">\u0398<\/span><span id=\"MathJax-Span-7\" class=\"mo\">(<\/span><span id=\"MathJax-Span-8\" class=\"msubsup\"><span id=\"MathJax-Span-9\" class=\"mi\">\u03f5<\/span><sup>\u22122<\/sup><span id=\"MathJax-Span-14\" class=\"mo\">)<\/span><\/span><\/span><\/span>\u00a0FO calls are necessary. This is achieved by the projected subgradient method (PGD). However, PGD also entails\u00a0<span id=\"MathJax-Element-3-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-15\" class=\"math\"><span id=\"MathJax-Span-16\" class=\"mrow\"><span id=\"MathJax-Span-17\" class=\"mi\">O<\/span><span id=\"MathJax-Span-18\" class=\"mo\">(<\/span><span id=\"MathJax-Span-19\" class=\"msubsup\"><span id=\"MathJax-Span-20\" class=\"mi\">\u03f5<span id=\"MathJax-Span-4\" class=\"math\"><span id=\"MathJax-Span-5\" class=\"mrow\"><span id=\"MathJax-Span-8\" class=\"msubsup\"><sup>\u22122<\/sup><\/span><\/span><\/span><\/span><\/span><span id=\"MathJax-Span-25\" class=\"mo\">)<\/span><\/span><\/span><\/span>\u00a0PO calls, which may be computationally costlier than FO calls (e.g. nuclear norm constraints). Improving this PO calls complexity of PGD is largely unexplored, despite the fundamental nature of this problem and extensive literature. We present first such improvement. This only requires a mild assumption that the objective function, when extended to a slightly larger neighborhood of the constraint set, still remains Lipschitz and accessible via FO. In particular, we introduce MOPES method, which carefully combines Moreau-Yosida smoothing and accelerated first-order schemes. This is guaranteed to find a feasible\u00a0<span id=\"MathJax-Element-4-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-26\" class=\"math\"><span id=\"MathJax-Span-27\" class=\"mrow\"><span id=\"MathJax-Span-28\" class=\"mi\">\u03f5<\/span><\/span><\/span><\/span>-suboptimal solution using only\u00a0<span id=\"MathJax-Element-5-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-29\" class=\"math\"><span id=\"MathJax-Span-30\" class=\"mrow\"><span id=\"MathJax-Span-31\" class=\"mi\">O<\/span><span id=\"MathJax-Span-32\" class=\"mo\">(<\/span><span id=\"MathJax-Span-33\" class=\"msubsup\"><span id=\"MathJax-Span-34\" class=\"mi\">\u03f5<span id=\"MathJax-Span-4\" class=\"math\"><span id=\"MathJax-Span-5\" class=\"mrow\"><span id=\"MathJax-Span-8\" class=\"msubsup\"><sup>\u22121<\/sup><\/span><\/span><\/span><\/span><\/span><span id=\"MathJax-Span-39\" class=\"mo\">)<\/span><\/span><\/span><\/span>\u00a0PO calls and optimal\u00a0<span id=\"MathJax-Element-6-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-40\" class=\"math\"><span id=\"MathJax-Span-41\" class=\"mrow\"><span id=\"MathJax-Span-42\" class=\"mi\">O<\/span><span id=\"MathJax-Span-43\" class=\"mo\">(<\/span><span id=\"MathJax-Span-44\" class=\"msubsup\"><span id=\"MathJax-Span-45\" class=\"mi\">\u03f5<span id=\"MathJax-Span-4\" class=\"math\"><span id=\"MathJax-Span-5\" class=\"mrow\"><span id=\"MathJax-Span-8\" class=\"msubsup\"><sup>\u22122<\/sup><\/span><\/span><\/span><\/span><\/span><span id=\"MathJax-Span-50\" class=\"mo\">)<\/span><\/span><\/span><\/span>\u00a0FO calls. Further, instead of a PO if we only have a linear minimization oracle (LMO, a la Frank-Wolfe) to access the constraint set, an extension of our method, MOLES, finds a feasible\u00a0<span id=\"MathJax-Element-7-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-51\" class=\"math\"><span id=\"MathJax-Span-52\" class=\"mrow\"><span id=\"MathJax-Span-53\" class=\"mi\">\u03f5<\/span><\/span><\/span><\/span>-suboptimal solution using\u00a0<span id=\"MathJax-Element-8-Frame\" class=\"MathJax\" tabindex=\"0\"><span id=\"MathJax-Span-54\" class=\"math\"><span id=\"MathJax-Span-55\" class=\"mrow\"><span id=\"MathJax-Span-56\" class=\"mi\">O<\/span><span id=\"MathJax-Span-57\" class=\"mo\">(<\/span><span id=\"MathJax-Span-58\" class=\"msubsup\"><span id=\"MathJax-Span-59\" class=\"mi\">\u03f5<span id=\"MathJax-Span-4\" class=\"math\"><span id=\"MathJax-Span-5\" class=\"mrow\"><span id=\"MathJax-Span-8\" class=\"msubsup\"><sup>\u22122<\/sup><\/span><\/span><\/span><\/span><\/span><span id=\"MathJax-Span-64\" class=\"mo\">)<\/span><\/span><\/span><\/span>\u00a0LMO calls and FO calls&#8212;both match known lower bounds, resolving a question left open since White (1993). Our experiments confirm that these methods achieve significant speedups over the state-of-the-art, for a problem with costly PO and LMO calls.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We consider the classical setting of optimizing a nonsmooth Lipschitz continuous convex function over a convex constraint set, when having access to a (stochastic) first-order oracle (FO) for the function and a projection oracle (PO) for the constraint set. It is well known that to achieve\u00a0\u03f5-suboptimality in high-dimensions,\u00a0\u0398(\u03f5\u22122)\u00a0FO calls are necessary. This is achieved by [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"ACM","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"NeurIPS 2020","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2020-11-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/neurips.cc\/","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13546],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-704944","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-computational-sciences-mathematics","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2020-11-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"ACM","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2010.01848","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Kiran Thekumparampil","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Prateek Jain","user_id":33278,"rest_url":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Prateek Jain"},{"type":"user_nicename","value":"Praneeth Netrapalli","user_id":33279,"rest_url":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Praneeth Netrapalli"},{"type":"text","value":"Sewoong Oh","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/704944","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/704944\/revisions"}],"predecessor-version":[{"id":705205,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/704944\/revisions\/705205"}],"wp:attachment":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media?parent=704944"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=704944"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=704944"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=704944"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=704944"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=704944"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=704944"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=704944"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=704944"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=704944"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=704944"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=704944"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=704944"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}