{"id":681471,"date":"2020-09-28T08:00:43","date_gmt":"2020-09-28T15:00:43","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/?post_type=msr-project&#038;p=681471"},"modified":"2021-03-10T19:47:37","modified_gmt":"2021-03-11T03:47:37","slug":"coax-rl","status":"publish","type":"msr-project","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/project\/coax-rl\/","title":{"rendered":"coax: A Modular RL Package"},"content":{"rendered":"<h2>coax<\/h2>\n<p>coax is a modular Reinforcement Learning (RL) Python package for solving <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/gym.openai.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">OpenAI Gym<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> environments with <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/jax.readthedocs.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">JAX<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>-based function approximators (using <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/dm-haiku.readthedocs.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Haiku<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>).<\/p>\n<h3>RL concepts, not agents<\/h3>\n<p>The primary thing that sets <strong>coax<\/strong> apart from other packages is that is designed to align with the core RL concepts, not with the high-level concept of an <em>agent<\/em>. This makes\u00a0<strong>coax<\/strong> more modular and user-friendly for RL researchers and practitioners.<\/p>\n<h3>You&#8217;re in control<\/h3>\n<p>Other RL frameworks often hide structure that you (the RL practitioner) are interested in. Most notably, the neural network architecture of the function approximators is often hidden from you. In <strong>coax<\/strong>, the network architecture takes center stage. 
You are in charge of defining your own forward-pass function.<\/p>\n<p>Another bit of structure that other RL frameworks hide from you is the main training loop. This makes it hard to take an algorithm from paper to code. The design of\u00a0<strong>coax<\/strong> is agnostic to the details of your training loop. You are in charge of how and when you update your function approximators.<\/p>\n<h3>Learn More<\/h3>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/coax.readthedocs.io\/\" target=\"_blank\" rel=\"noopener noreferrer\">Documentation ><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/coax.readthedocs.io\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-695523 size-medium\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_docs-300x218.png\" alt=\"Documentation of coax, plug-n-play Reinforcement Learning in Python with OpenAI Gym and JAX\" width=\"600\" height=\"436\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_docs-300x218.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_docs-1024x745.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_docs-768x559.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_docs.png 1038w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/coax-dev\/coax\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub ><span class=\"sr-only\"> (opens in new 
tab)<\/span><\/a><\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/coax-dev\/coax\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-695544 size-medium\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_code-300x266.png\" alt=\"coax on GitHub\" width=\"600\" height=\"532\" srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_code-300x266.png 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_code-1024x907.png 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_code-768x680.png 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/09\/coax_code.png 1038w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/note.microsoft.com\/MSR-Webinar-Beginner-Guide-RL-Registration-Live.html\" target=\"_blank\" rel=\"noopener noreferrer\">Webinar ><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/note.microsoft.com\/MSR-Webinar-Beginner-Guide-RL-Registration-Live.html\"><img decoding=\"async\" class=\"alignnone size-medium wp-image-714427\" src=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788.jpg\" alt=\"a picture of kristian holsheimer next to his webinar title beginners guide to reinforcement learning\" width=\"600\" 
srcset=\"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788.jpg 1400w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-300x169.jpg 300w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-1024x576.jpg 1024w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-768x432.jpg 768w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-16x9.jpg 16w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-1066x600.jpg 1066w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-655x368.jpg 655w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-343x193.jpg 343w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-640x360.jpg 640w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-960x540.jpg 960w, https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-content\/uploads\/2020\/12\/MSR_kristian_holsheimer_Webinar_Hero_1400x788-1280x720.jpg 1280w\" sizes=\"(max-width: 1400px) 100vw, 1400px\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>coax is a modular Reinforcement Learning (RL) Python package for solving OpenAI Gym environments with JAX-based function approximators (using 
Haiku).<\/p>\n","protected":false},"featured_media":686862,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-681471","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[],"related-downloads":[],"related-videos":[],"related-groups":[740044],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/681471","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":26,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/681471\/revisions"}],"predecessor-version":[{"id":732436,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/681471\/revisions\/732436"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media\/686862"}],"wp:attachment":[{"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/media?parent=681471"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=681471"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/
wp\/v2\/msr-locale?post=681471"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=681471"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=681471"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}