{"id":698455,"date":"2020-10-16T00:44:48","date_gmt":"2020-10-16T07:44:48","guid":{"rendered":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/?post_type=msr-blog-post&#038;p=698455"},"modified":"2020-10-26T21:32:22","modified_gmt":"2020-10-27T04:32:22","slug":"enabling-linear-acceleration-and-lossless-performance-for-large-scale-deep-learning-training-a-bmuf-based-adam-optimizer-parallelization-practice","status":"publish","type":"msr-blog-post","link":"https:\/\/newed.any0.dpdns.org\/en-us\/research\/articles\/enabling-linear-acceleration-and-lossless-performance-for-large-scale-deep-learning-training-a-bmuf-based-adam-optimizer-parallelization-practice\/","title":{"rendered":"Enabling linear acceleration and lossless performance for large-scale deep learning training, a BMUF-based Adam optimizer parallelization practice"},"content":{"rendered":"<p>As an adaptive learning rate stochastic optimization method, Adam has been widely used in the field of deep learning since it was first proposed in 2014. In order to improve its training efficiency when applied to large-scale tasks, Adam is often combined with the synchronous stochastic gradient (SSG) technique to run in parallel on multiple workers. In this article, we refer to this method as \u201cSync-Adam.\u201d<\/p>\n<p>Essentially, Sync-Adam speeds up training by distributing the gradient computation of samples within a minibatch to multiple workers, and so communication occurs very frequently. With the increase of parallel workers, the number of samples within a minibatch also increases proportionally, which often hurts the final model performance. To solve the poor scalability issue of SSG-based Adam, we plugged Adam into a Blockwise Model-Update Filtering (BMUF) framework.<\/p>\n<p>BMUF is a communication-efficient distributed optimization framework proposed by researchers from MSRA\u2019s speech group in 2016. This framework periodically synchronizes model-update information among parallel workers and combines it with historically updated information to improve the performance of the global model. Compared with the SSG-based method, BMUF realizes low communication frequency, near linear speed up, and very little model performance degradation. Given all these advantages, BMUF has been widely used in the industry to train large-scale deep learning based models.<\/p>\n<p>We used BMUF to parallelize Adam and evaluated it on Microsoft\u2019s large-scale OCR and speech product tasks. Experimental results show that BMUF-Adam delivers almost linear speed-up without performance degradation with up to 64 workers on large-scale OCR tasks, and similar effects are achieved on large vocabulary continuous speech recognition (LVCSR) task with up to 32 workers.<\/p>\n<p>Next, we will investigate how we can empower Adam with BMUF to achieve further positive results.<\/p>\n<h2>Review of BMUF<\/h2>\n<p>In the BMUF-based training framework, we have N parallel workers. A worker can be a GPU card, a compute node, etc. Given a block of training data, it will be partitioned into N splits, and each split will contain \u03c4 mini-batches. Starting from a common initial model parameter \u03b8_(t-\u03c4)^((init)), all workers will update their local models with respective data splits for \u03c4 steps in parallel to obtain N local models{\u03b8_(t,1),\u03b8_(t,2),\u2026,\u03b8_(t,N)}. This procedure is called \u201cintra-block parallel optimization\u201d (IBPO). 
Instead of treating the averaged local models θ̄_t as the updated global model, BMUF introduces a block momentum to leverage historical update information and obtain a better global model, as follows:

[equation image: BMUF-Adam-1.png]

[equation image: BMUF-Adam-2.png]

where η is the block momentum and ζ is the block learning rate. We set ζ = 1 in this article, which is common practice for BMUF. As discussed in [2], block momentum can compensate for each mini-batch's inadequate contribution to the final model update caused by the averaging operation, thereby improving model performance. We define a new variable ρ_n to represent the number of equivalent mini-batches required to obtain Δ_n. Since

[equation image: BMUF-Adam-3.png]

therefore:

[equation image: BMUF-Adam-4.png]

If η = 1 − 1/N, then according to [2], lim_{n→∞} ρ_n = Nτ, which equals the number of mini-batches in a data block. This deduction shows that, in the limit, Δ_n can simulate the model update resulting from processing the Nτ mini-batches of a data block in serial.
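As a quick numerical check of this limit, the snippet below iterates the equivalent-mini-batch recursion implied above with ζ = 1, namely ρ_n = ηρ_{n−1} + τ (each averaged intra-block update contributing roughly τ equivalent mini-batches); the function name and the concrete numbers are only for illustration.

```python
def equivalent_minibatches(N, tau, num_blocks):
    """Iterate rho_n = eta * rho_{n-1} + tau with eta = 1 - 1/N.
    The closed form is tau * (1 - eta**n) / (1 - eta), which tends to
    tau / (1 - eta) = N * tau, the number of mini-batches in one data block."""
    eta = 1.0 - 1.0 / N
    rho = 0.0
    for _ in range(num_blocks):
        rho = eta * rho + tau
    return rho

print(equivalent_minibatches(N=16, tau=8, num_blocks=200))  # ~128.0 = 16 * 8
```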
This conclusion requires the assumption that θ̄_t − θ_{t−τ}^{(init)} is stationary.

In this article, we use the Nesterov block momentum technique to obtain the initial model for the next IBPO operation:

[equation image: BMUF-Adam-5.png]

Since

[equation image: BMUF-Adam-6.png]

therefore,

[equation image: BMUF-Adam-7.png]

Review of Adam

Adam is an adaptive learning rate stochastic optimizer whose learning rates are determined by the first and second moments of the stochastic gradient, as follows:

[equation image: BMUF-Adam-8.png]

where ⊙ denotes element-wise multiplication and g_t is the stochastic gradient of the t-th mini-batch. Based on the assumption that E[g_t] and E[g_t ⊙ g_t] are stationary, Adam uses exponential moving averages to estimate these two moments as follows:

[equation image: BMUF-Adam-9.png]

where

[equation image: BMUF-Adam-10.png]
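For reference, one update step of standard Adam [1] looks like the following sketch (NumPy, with the usual default hyper-parameters; the function name is ours):

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update [1]: exponential moving averages of the
    gradient's first and second moments, bias-corrected, set a
    per-parameter step size."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * (g * g)
    m_hat = m / (1.0 - beta1 ** t)             # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```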
BMUF-Adam

Since BMUF is a general distributed optimization framework, we can plug Adam into it as the local optimizer. According to our review of BMUF and Adam above, if we plug Adam into BMUF without any change, Adam's first- and second-order moments will be mismatched with the model parameters. Assume we set:

[equation image: BMUF-Adam-11.png]

Obviously, due to the existence of ηΔ_n, the averaged first and second moments become stale with respect to the initial model parameters of the next IBPO operation. Since the number of equivalent mini-batches of ηΔ_n is ηρ_n, we believe the m_t^{(init)}, v_t^{(init)} that are compatible with θ_t^{(init)} can be obtained by processing ηρ_n mini-batches starting from m̄_t, v̄_t.
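To get a feel for the staleness involved, note that after τ Adam steps the exponential moving average still assigns weight β_1^τ to the initial first moment inside each local m_{t,i}, and hence inside m̄_t. The short snippet below simply evaluates this factor for a few settings; the numbers are illustrative, not from the article's experiments.

```python
# Weight that the stale initial moment m_{t-tau}^{(init)} retains inside the
# averaged first moment after tau local Adam steps: beta1 ** tau.
for beta1 in (0.9, 0.5):
    for tau in (8, 32, 128):
        print(f"beta1={beta1}, tau={tau}: stale weight = {beta1 ** tau:.4f}")
# With a small tau (e.g. 8) and beta1 = 0.9, the stale term still carries
# roughly 43% of m-bar, which is what motivates the re-estimation derived
# below as well as the use of a smaller beta1.
```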
According to the first-moment update formula, we have:

[equation image: BMUF-Adam-12.png]

Combined with the stationarity assumption on E[g_t], we have:

[equation image: BMUF-Adam-13.png]

where E[g^{(n)}] represents the expectation of the mini-batch stochastic gradient in the n-th IBPO operation. Therefore,

[equation image: BMUF-Adam-14.png]

because

[equation image: BMUF-Adam-15.png]

Finally, we obtain:

[equation image: BMUF-Adam-16.png]

From the formulations of m̄_t and m_t^{(init)}, it is not difficult to see that when τ is small and N is large, m_{t−τ}^{(init)} contributes too much to m̄_t, which results in a more serious inconsistency between θ_t^{(init)} and m̄_t; this phenomenon is verified in our experiments. With the introduction of the equivalent mini-batch count, the contribution of the stale m_{t−τ}^{(init)} to m_t^{(init)} is reduced significantly, which improves model performance. Meanwhile, using a small β_1 can also reduce the contribution of m_{t−τ}^{(init)}. The same analysis applies to the second moment. Finally, we obtain BMUF-Adam as follows:

[algorithm image: BMUF-Adam-17.png]
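To summarize the block-level synchronization step in code, here is a rough sketch of what one sync could look like. The model update with Nesterov block momentum follows the BMUF formulation in [2] (ζ = 1, η = 1 − 1/N), while the moment re-estimation is written under our reading of the derivation above (stationary gradient statistics, ηρ_n extra equivalent mini-batches); treat the exact formulas and all names here as illustrative assumptions rather than the article's exact algorithm, which is given in the figure above and in [3].

```python
import numpy as np

def bmuf_adam_sync(theta_prev, theta_init_prev, delta_prev, rho_prev,
                   m_init_prev, v_init_prev, locals_, N, tau,
                   beta1=0.9, beta2=0.999):
    """One block-level synchronization of BMUF-Adam (sketch).

    locals_ holds the (theta, m, v) triples produced by the N workers in the
    last IBPO pass; theta_prev / theta_init_prev are the previous global and
    initial models, delta_prev the previous block-level update, and rho_prev
    the previous equivalent-mini-batch count."""
    eta = 1.0 - 1.0 / N
    theta_bar = np.mean([w[0] for w in locals_], axis=0)
    m_bar = np.mean([w[1] for w in locals_], axis=0)
    v_bar = np.mean([w[2] for w in locals_], axis=0)

    # Blockwise model-update filtering with Nesterov block momentum [2].
    delta = eta * delta_prev + (theta_bar - theta_init_prev)
    theta = theta_prev + delta
    theta_init = theta + eta * delta

    # Equivalent mini-batches behind delta (zeta = 1).
    rho = eta * rho_prev + tau

    # Re-estimate the moments so they match theta_init: decay the averaged
    # moments as if eta * rho extra mini-batches with stationary gradient
    # statistics had been processed (our reconstruction of the derivation).
    g_mean = (m_bar - beta1 ** tau * m_init_prev) / (1.0 - beta1 ** tau)
    g2_mean = (v_bar - beta2 ** tau * v_init_prev) / (1.0 - beta2 ** tau)
    m_init = beta1 ** (eta * rho) * m_bar + (1.0 - beta1 ** (eta * rho)) * g_mean
    v_init = beta2 ** (eta * rho) * v_bar + (1.0 - beta2 ** (eta * rho)) * g2_mean

    return theta, theta_init, delta, rho, m_init, v_init
```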
Experimental results

We first verified our method on Microsoft's large-scale English OCR task, comparing different moment-estimation strategies when Adam is plugged into BMUF. The experimental results are listed in Table 1 (τ = 8 in these experiments):

Table 1: experimental results [table image: BMUF-Adam-18.png]

We find that when the simple averaging strategy is used, model performance degrades more and more seriously as the number of parallel workers increases. The proposed strategy relieves the degradation significantly, and combined with a small β_1 the model performance improves further. Next, we compared Sync-Adam with BMUF-Adam on this task. Experimental results are listed in Table 2:

Table 2: a comparison of Sync-Adam and BMUF-Adam [table image: BMUF-Adam-19.png]

As the number of parallel workers increases, Sync-Adam suffers increasingly serious performance degradation and its speedup ratio worsens.
In contrast, BMUF-Adam achieves almost linear speedup with little performance degradation.

We also verified BMUF-Adam on an LVCSR task with 6,000 hours of Microsoft product data; the results are listed in Table 3:

Table 3: experimental results on the speech recognition task [table image: BMUF-Adam-20.png]

Again, BMUF-Adam achieves almost linear speedup with little performance degradation, showing excellent scalability.

Conclusion

In this article, we used BMUF to parallelize the widely used Adam algorithm. Experimental results show that, compared with the traditional SSG-based method, BMUF-Adam achieves faster training speed, better model performance, and better scalability. This algorithm has been applied to several Microsoft products, and we welcome everyone to give it a try.

More details:
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9052983

References:
[1] D. P. Kingma, J. Ba, "Adam: a method for stochastic optimization," Proc. ICLR-2015, arXiv:1412.6980.
[2] K. Chen, Q. Huo, "Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering," Proc. ICASSP-2016, pp. 5880-5884.
[3] K. Chen, H. Ding, Q. Huo, "Parallelizing Adam optimizer with blockwise model-update filtering," Proc. ICASSP-2020, pp. 3027-3031.