{"id":75228,"date":"2026-03-04T12:03:02","date_gmt":"2026-03-04T12:03:02","guid":{"rendered":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/"},"modified":"2026-03-04T12:03:02","modified_gmt":"2026-03-04T12:03:02","slug":"physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/","title":{"rendered":"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks"},"content":{"rendered":"<p>Current end-to-end robotic policies, in particular Vision-Language-Action (VLA) models, typically operate on a single observation or a very short history. This \u2018lack of memory\u2019 makes long-horizon tasks, such as cleaning a kitchen or following a complex recipe, prone to failure, while naively feeding the model a long history of observations quickly becomes computationally intractable. 
To address this, researchers from Physical Intelligence, Stanford, UC Berkeley, and MIT have introduced <strong>Multi-Scale Embodied Memory (MEM)<\/strong>.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"2116\" height=\"1046\" data-attachment-id=\"78201\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/03\/03\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/screenshot-2026-03-03-at-9-53-17-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1.png\" data-orig-size=\"2116,1046\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-03-03 at 9.53.17\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-300x148.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-1024x506.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1.png\" alt=\"\" class=\"wp-image-78201\" \/><figcaption class=\"wp-element-caption\">https:\/\/www.pi.website\/download\/Mem.pdf<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Dual-Scale Memory Architecture<\/strong><\/h3>\n<p>MEM factorizes robotic memory into two distinct scales to balance semantic context with real-time control constraints.<\/p>\n<h4 class=\"wp-block-heading\"><strong>(1) Short-Term Video 
Memory<\/strong><\/h4>\n<p>Tasks that require fine-grained spatial awareness, such as resolving self-occlusions or adapting a grasp, need dense visual history. MEM therefore uses an efficient video encoder that extends a standard Vision Transformer (ViT). To stay within the real-time inference budget (the 380 ms \u2018real-time barrier\u2019), the encoder avoids joint attention over all patches of all frames. Instead, it uses <strong>Space-Time Separable Attention<\/strong>, interleaving spatial attention within each frame with causal temporal attention across frames every fourth layer.<\/p>\n<p>Joint attention over all <em>nK<\/em> tokens costs <em>O<\/em>((<em>nK<\/em>)<sup>2<\/sup>) = <em>O<\/em>(<em>n<sup>2<\/sup>K<sup>2<\/sup><\/em>), where <em>n<\/em> is the number of spatial patches per frame and <em>K<\/em> is the number of timesteps. Separable attention replaces this with <em>K<\/em> spatial attentions of cost <em>O<\/em>(<em>n<sup>2<\/sup><\/em>) each and <em>n<\/em> temporal attentions of cost <em>O<\/em>(<em>K<sup>2<\/sup><\/em>) each, for a total of <em>O<\/em>(<em>Kn<sup>2<\/sup> + nK<sup>2<\/sup><\/em>). In the upper layers, tokens from past timesteps are dropped, so only the current observation\u2019s representation is passed to the VLA backbone; the backbone therefore sees the same number of tokens as it would in a single-frame model.<\/p>\n<h4 class=\"wp-block-heading\"><strong>(2) Long-Term Language Memory<\/strong><\/h4>\n<p>To handle tasks spanning up to 15 minutes, MEM represents past semantic events in natural language. The system factorizes the policy as:<\/p>\n<div class=\"wp-block-mathml-mathmlblock\">$$\\pi(a_{t:t+H}, l_{t+1}, m_{t+1} \\mid o_{t-T:t}, m_t, g) \\approx \\pi_{LL}(a_{t:t+H} \\mid o_{t-K:t}, l_{t+1}, g)\\,\\pi_{HL}(l_{t+1}, m_{t+1} \\mid o_t, m_t, g)$$\n<\/div>\n<p>Here, a high-level policy (<em>\u03c0<sub>HL<\/sub><\/em>) maintains a running language summary (<em>m<sub>t<\/sub><\/em>) of past events and generates the next subtask instruction (<em>l<sub>t+1<\/sub><\/em>) for a low-level policy (<em>\u03c0<sub>LL<\/sub><\/em>). The high-level policy sees only the current observation <em>o<sub>t<\/sub><\/em> and the summary <em>m<sub>t<\/sub><\/em>, while the low-level policy sees only the most recent <em>K<\/em> observations, so anything older than the short-term window reaches the policy only through the language memory. 
The language memory is trained on LLM-generated summaries that compress past events (e.g., \u2018I placed three bowls\u2019 rather than a description of each bowl\u2019s individual attributes), reducing the risk of a distribution shift between training and inference.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"2152\" height=\"898\" data-attachment-id=\"78203\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/03\/03\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/screenshot-2026-03-03-at-9-53-58-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.58-PM-1.png\" data-orig-size=\"2152,898\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-03-03 at 9.53.58\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.58-PM-1-300x125.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.58-PM-1-1024x427.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.58-PM-1.png\" alt=\"\" class=\"wp-image-78203\" \/><figcaption class=\"wp-element-caption\">https:\/\/www.pi.website\/download\/Mem.pdf<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Implementation and Performance<\/strong><\/h3>\n<p>The researchers integrated MEM into the <strong><em>\u03c0<\/em><sub>0.6<\/sub> VLA<\/strong>, which is initialized from a pre-trained <strong>Gemma 3-4B<\/strong> model. 
The model was pre-trained on a diverse mixture of robot demonstrations, vision-language tasks, and internet video data.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Key Results:<\/strong><\/h4>\n<ul class=\"wp-block-list\">\n<li><strong>In-Context Adaptation<\/strong>: MEM lets the robot adjust its manipulation strategy after a recent failure. In evaluation, this yielded a <strong>+62%<\/strong> gain in success rate on opening refrigerators with unknown hinge directions and a <strong>+11%<\/strong> gain on picking up chopsticks placed at varying heights.<\/li>\n<li><strong>Long-Horizon Tasks<\/strong>: The model successfully performed 15-minute tasks such as \u2018Recipe Setup\u2019 (retrieving ingredients from multiple locations) and \u2018Kitchen Cleaning\u2019 (washing dishes and wiping counters), which VLAs without memory failed considerably more often.<\/li>\n<li><strong>Efficiency<\/strong>: The video encoder lets the model process up to 16 observation frames (spanning roughly one minute) while staying within the real-time inference budget on a single NVIDIA H100 GPU.<\/li>\n<\/ul>\n<p>MEM shows that combining dense, short-term visual tokens with compressed, long-term language summaries lets VLAs extend their \u2018working memory\u2019 without prohibitive computational cost.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/www.pi.website\/download\/Mem.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a> <\/strong>and<strong>\u00a0<a href=\"https:\/\/www.pi.website\/research\/memory\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/03\/03\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/\">Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Current end-to-end robotic policies, specifically Vision-Language-Action (VLA) models, typically operate on a single observation or a very short history. This \u2018lack of memory\u2019 makes long-horizon tasks, such as cleaning a kitchen or following a complex recipe, computationally intractable or prone to failure. To address this, researchers from Physical Intelligence, Stanford, UC Berkeley, and MIT have introduced Multi-Scale Embodied Memory (MEM). https:\/\/www.pi.website\/download\/Mem.pdf The Dual-Scale Memory Architecture MEM factorizes robotic memory into two distinct scales to balance semantic context with real-time control constraints. (1) Short-Term Video Memory For tasks requiring fine-grained spatial awareness\u2014like resolving self-occlusions or adapting a grasp\u2014dense visual data is required. MEM utilizes an efficient video encoder that extends standard Vision Transformers (ViTs). 
To maintain real-time inference (the 380ms \u2018real-time barrier\u2019), the architecture avoids joint attention over all patches. Instead, it uses Space-Time Separable Attention, interleaving spatial attention within frames with causal-temporal attention across frames every fourth layer. The computational complexity is reduced from O(n\u00b2K\u00b2) to O(Kn\u00b2 + nK\u00b2), where n is the number of spatial patches and K is the number of timesteps. By dropping tokens from past timesteps in upper layers, the model passes only the current observation\u2019s representation to the VLA backbone, keeping the token count invariant compared to single-frame models. (2) Long-Term Language Memory To handle tasks spanning up to 15 minutes, MEM uses a language-based representation for semantic events. The system decomposes the action prediction as: $$\\pi(a_{t:t+H}, l_{t+1}, m_{t+1} \\mid o_{t-T:t}, m_t, g) \\approx \\pi_{LL}(a_{t:t+H} \\mid o_{t-K:t}, l_{t+1}, g)\\,\\pi_{HL}(l_{t+1}, m_{t+1} \\mid o_t, m_t, g)$$ Here, a high-level policy (\u03c0_HL) maintains a running language summary (m_t) of past events and generates subtask instructions (l_{t+1}) for a low-level policy (\u03c0_LL). This language memory is trained using LLM-generated summaries that compress information (e.g., \u2018I placed three bowls\u2019 instead of individual attributes), reducing the risk of training-inference distribution shifts. https:\/\/www.pi.website\/download\/Mem.pdf Implementation and Performance The research team integrated MEM into the \u03c00.6 VLA, which is initialized from a pre-trained Gemma 3-4B model. The model was pre-trained on a diverse mixture of robot demonstrations, vision-language tasks, and internet video data. Key Results: In-Context Adaptation: MEM enables robots to adapt manipulation strategies based on recent failures. In evaluation, this led to a +62% success rate increase in opening refrigerators with unknown hinge directions and a +11% increase in picking up chopsticks at variable heights. 
Long-Horizon Tasks: The model successfully performed 15-minute tasks like \u2018Recipe Setup\u2019 (retrieving ingredients from multiple locations) and \u2018Kitchen Cleaning\u2019 (washing dishes and wiping counters). Memory-less VLAs failed these tasks significantly more often. Efficiency: The video encoder allows the model to process up to 16 observation frames (spanning ~1 minute) while remaining under critical real-time inference thresholds on a single NVIDIA H100 GPU. MEM demonstrates that combining dense, short-term visual tokens with compressed, long-term language summaries allows VLAs to scale their \u2018working memory\u2019 without incurring prohibitive computational costs. Check out the\u00a0Paper and\u00a0Technical details. The post Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks appeared first on 
MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":75229,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-75228","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" 
content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/de\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/de\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-04T12:03:02+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks\",\"datePublished\":\"2026-03-04T12:03:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/\"},\"wordCount\":592,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/\",\"url\":\"https:\/\/youzu
m.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/\",\"name\":\"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png\",\"datePublished\":\"2026-03-04T12:03:02+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-
1-m9t52T.png\",\"width\":2116,\"height\":1046},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin 
NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/de\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/de\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/","og_locale":"de_DE","og_type":"article","og_title":"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/de\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-03-04T12:03:02+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"admin NU","Gesch\u00e4tzte 
Lesezeit":"3\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks","datePublished":"2026-03-04T12:03:02+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/"},"wordCount":592,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/","url":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gem
ma-3-4b-vlas-15-minute-context-for-complex-tasks\/","name":"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks - YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png","datePublished":"2026-03-04T12:03:02+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png","width":2116,"height":1046},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-s
ystem-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/de\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png",2116,1046,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png",2116,1046,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T.png",2116,1046,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-150x150.png",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-300x148.png",300,148,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-1024x506.png",1024,506,true],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-1536x759.png",1536,759,true],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-2048x1012.png",2048,1012,true],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-18x9.png",18,9,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-300x300.png",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-600x297.png",600,297,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-03-at-9.53.17-PM-1-m9t52T-100x100.png",100,100,true]},"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/de\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/de\/category\/ai-club\/\" 
rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/de\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Current end-to-end robotic policies, specifically Vision-Language-Action (VLA) models, typically operate on a single observation or a very short history. This \u2018lack of memory\u2019 makes long-horizon tasks, such as cleaning a kitchen or following a complex recipe, computationally intractable or prone to failure. To address this, researchers from Physical Intelligence, Stanford, UC Berkeley, and MIT have&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/75228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/comments?post=75228"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/posts\/75228\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media\/75229"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/media?parent=75228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/categories?post=75228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/de\/wp-json\/wp\/v2\/tags?post=75228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}