
How robots learn: A brief, contemporary history

Roboticists used to dream big but build small. They'd hope to match or exceed the extraordinary complexity of the human body, and then they'd spend their careers refining robotic arms for auto plants. Aim for C-3PO; end up with the Roomba.

The real ambition for many of these researchers was the robot of science fiction—one that could move through the world, adapt to different environments, and interact safely and helpfully with people. For the socially minded, such a machine could help those with mobility issues, ease loneliness, or do work too dangerous for humans. For the more financially inclined, it would mean a bottomless source of wage-free labor. Either way, a long history of failure left most of Silicon Valley hesitant to bet on helpful robots.

That has changed. The machines are as yet unbuilt, but the money is flowing: Companies and investors put $6.1 billion into humanoid robots in 2025 alone, four times what was invested in 2024. What happened? A revolution in how machines have learned to interact with the world.

Imagine you'd like a pair of robot arms installed in your home purely to do one thing: fold clothes. How would it learn to do that? You could start by writing rules. Check the fabric to figure out how much deformation it can tolerate before tearing. Identify a shirt's collar. Move the gripper to the left sleeve, lift it, and fold it inward by exactly this distance. Repeat for the right sleeve. If the shirt is rotated, turn the plan accordingly. If the sleeve is twisted, correct it. Very quickly the number of rules explodes, but a complete accounting of them could produce reliable results. This was the original craft of robotics: anticipating every possibility and encoding it in advance.

Around 2015, the cutting edge started to do things differently: Build a digital simulation of the robotic arms and the clothes, and give the program a reward signal every time it folds successfully and a ding every time it fails. This way, it gets better by trying all sorts of techniques through trial and error, over millions of iterations—the same way AI got good at playing games.
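To make the reward-signal idea concrete, here is a minimal toy sketch of such a trial-and-error loop in Python. The fold_in_simulation function, its made-up target numbers, and the simple hill-climbing are illustrative assumptions standing in for a real physics simulator and a real reinforcement-learning algorithm; no lab trains robots exactly this way, but the structure (attempt, score, adjust, repeat millions of times) is the same.

```python
import random

def fold_in_simulation(policy_params):
    """Stand-in for a physics simulator: scores one simulated folding attempt.

    A real setup would run simulated arms against simulated cloth; here we just
    measure how close the parameters are to an arbitrary "good" fold.
    """
    target = [0.7, 0.2, 0.9]  # made-up ideal grip positions / fold distances
    error = sum((p - t) ** 2 for p, t in zip(policy_params, target))
    return 1.0 if error < 0.01 else -error  # reward on success, a "ding" otherwise

def train(iterations=100_000, step=0.05):
    """Toy hill-climbing: keep random tweaks that raise the simulated reward."""
    params = [random.random() for _ in range(3)]
    best_reward = fold_in_simulation(params)
    for _ in range(iterations):
        candidate = [p + random.uniform(-step, step) for p in params]
        reward = fold_in_simulation(candidate)
        if reward > best_reward:  # trial and error: keep what works
            params, best_reward = candidate, reward
    return params, best_reward

if __name__ == "__main__":
    learned, reward = train()
    print(f"learned parameters: {learned}, final reward: {reward:.3f}")
```

Real systems replace the hill-climbing with reinforcement-learning algorithms and a full physics engine, but the loop of attempting, scoring, and adjusting is the core of the approach.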
The arrival of ChatGPT in 2022 catalyzed the current boom. Trained on vast amounts of text, large language models work not through trial and error but by learning to predict what word should come next in a sentence. Similar models adapted to robotics were soon able to absorb pictures, sensor readings, and the position of a robot's joints and predict the next action the machine should take, issuing dozens of motor commands every second.

This conceptual shift—to reliance on AI models that ingest large amounts of data—seems to work whether that helpful robot is supposed to talk to people, move through an environment, or even do complicated tasks. And it was paired with other ideas about how to accomplish this new way of learning, like deploying robots even if they aren't yet perfect so they can learn from the environment they're meant to work in. Today, Silicon Valley roboticists are dreaming big again. Here's how that happened.

Jibo
A movable social robot carried out conversations long before the age of LLMs.

An MIT robotics researcher named Cynthia Breazeal introduced an armless, legless, faceless robot called Jibo to the world in 2014. It looked, in fact, like a lamp. Breazeal's aim was to create a social robot for families, and the idea pulled in $3.7 million in a crowdfunding campaign. Early preorders cost $749.

The early Jibo could introduce itself and dance to entertain kids, but that was about it. The vision was always for it to become a sort of embodied assistant that could handle everything from scheduling and emails to telling stories. It earned a number of devoted users, but ultimately the company shut down in 2019.

A crowdfunding campaign started in 2014 and drew 4,800 Jibo preorders. COURTESY OF MIT MEDIA LAB

In retrospect, one thing that Jibo really needed was better language capabilities. It was competing against Apple's Siri and Amazon's Alexa, and all those technologies at the time relied on heavy scripting. In broad terms, when you spoke to them, software would translate your speech into text, analyze what you wanted, and create a response pulled from preapproved snippets. Those snippets could be charming, but they were also repetitive and simply boring—downright robotic. That was especially a challenge for a robot that was supposed to be social and family oriented.

What has happened since, of course, is a revolution in how machines can generate language. Voice mode from any leading AI provider is now engaging and impressive, and multiple hardware startups are trying (and failing) to build products that take advantage of it.

But that comes with a new risk: While scripted conversations can't really go off the rails, ones generated by AI certainly can. Some popular AI toys have, for example, talked to kids about how to find matches and knives.

OpenAI Dactyl
A robot hand trained with simulations tries to model the unpredictability and variation of the real world.

By 2018, every leading robotics lab was trying to scrap the old scripted rules and train robots through trial and error. OpenAI tried to train its robotic hand, Dactyl, virtually—with digital models of the hand and of the palm-size cubes Dactyl was supposed to manipulate. The cubes had letters and numbers on their faces; the model might set a task like "Rotate the cube so the red side with the letter O faces upward." Here's the problem: A robotic hand might get really good at doing this in its simulated world, but when you take that program and ask it to work on a real version in the real world, the slight differences between the two can cause things to go awry. Colors might be slightly different, or the deformable rubber in the robot's fingertips could turn out to be stretchier than it was in simulation. Dactyl, part of OpenAI's first attempt at robotics,


The Download: bad news for inner Neanderthals, and AI warfare’s human illusion

This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology.

The problem with thinking you're part Neanderthal

There's a theory that many of us have an "inner Neanderthal." The idea is that Homo sapiens and a cousin species once bred, leaving some people today with a trace of Neanderthal DNA.

This DNA is arguably the 21st century's most celebrated discovery in human evolution. But in 2024, a pair of French geneticists called into question the theory's very foundations.

They proposed that what scientists interpret as interbreeding could instead be explained by population structure—the way genes concentrate in smaller, isolated groups. Find out what it all means for human evolution.

—Ben Crair

This story is from the next issue of our print magazine, which is all about nature. Subscribe now to read it when it lands on Wednesday, April 22.

Why having "humans in the loop" in an AI war is an illusion

—Uri Maoz

AI is starting to shape real wars. It's at the center of a legal battle between Anthropic and the Pentagon, playing a growing role in the conflict with Iran, and raising questions about how much humans should remain "in the loop."

Under Pentagon guidelines, human oversight is meant to provide accountability, context, and security. But the idea of "humans in the loop" is a comforting distraction. The real danger isn't that machines will act without oversight; it's that human overseers have no idea what the machines are actually "thinking." Thankfully, science may offer a way forward. Read the full op-ed on the urgent need for new safeguards around AI warfare.

The must-reads

I've combed the internet to find you today's most fun/important/scary/fascinating stories about technology.

1 Despite blacklisting Anthropic, the White House wants its new model
Trump officials are negotiating access to Mythos. (Axios)
+ Anthropic said it was too dangerous for a public release. (Bloomberg $)
+ Finance ministers are alarmed about the security risks. (BBC)
+ Anthropic just rolled out a model that's less risky than Mythos. (CNBC)
+ The Pentagon has pursued a culture war against the company. (MIT Technology Review)

2 Sam Altman's side hustles have raised conflict-of-interest concerns
His opaque investments could influence decisions at OpenAI. (WSJ $)
+ A jury will soon decide if OpenAI abandoned its founding mission. (Wired $)
+ The company is making a big play for science. (MIT Technology Review)

3 A Starlink outage during drone tests exposed the Pentagon's SpaceX reliance
It was one of several Navy test disruptions linked to Starlink. (Reuters $)
+ The DoD is also tapping Ford and GM for military innovations. (NYT $)

4 Data center delays threaten to choke AI expansion
40% of this year's projects are at risk of falling behind schedule. (FT $)
+ Partly because no one wants a data center in their backyard. (MIT Technology Review)

5 Alibaba just released its own version of a world model
Happy Oyster is the latest attempt to extend AI's ability to comprehend physical reality. (SCMP)
+ But they still need to understand cause and effect. (FT $)

6 Google's Gemini is now generating AI images tailored to personal data
By analyzing users' Google services and data. (Quartz)
+ Google says it will cut the need for detailed prompts. (TechCrunch)

7 OpenAI is beefing up its agentic coding and development system
Its Codex update is a direct shot at Claude Code. (The Verge)
+ But not everyone is convinced about AI coding. (MIT Technology Review)

8 Europe's online age verification app is here
It's available for free to any company that wants it. (Wired $)

9 Smartglasses are giving Korean theaters hope of a K-Pop moment
Their AI-powered translations are taking the shows to the world. (NYT $)

10 Global voice actors are fighting Hollywood's AI push
Their voices are training the models that are replacing them. (Rest of World)

Quote of the day

"There's this dark period between now and some time in the future where the advantage is very much offensive AI."

—Rob Joyce, former director of cybersecurity at the National Security Agency, tells Bloomberg how AI is creating new hacking threats.

One More Thing

COURTESY OF NOVEON MAGNETICS

The race to produce rare earth elements

Access to rare earth elements will determine which countries meet their goals for lowering emissions or generating energy from non-fossil-fuel sources. But some nations, including the US, are worried about the supply of these elements.

China dominates the market, while extraction in the US is limited. As a result, scientists and companies are exploring unconventional sources. Read the full story on their search for critical minerals.

—Mureji Fatunde

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)
+ This ska cover of Rage Against the Machine is an upbeat way to start a revolution.
+ We finally know how far Stretch Armstrong can really stretch.
+ Customize these ambient sounds to wash away disruptive thoughts.
+ Here's proof childhood dreams can come true: a girl guiding a seal to perform tricks.


Why having “humans in the loop” in an AI war is an illusion

The availability of artificial intelligence for use in warfare is at the center of a legal battle between Anthropic and the Pentagon. This debate has become urgent, with AI playing a bigger role than ever before in the current conflict with Iran. AI is no longer just helping humans analyze intelligence. It is now an active player—generating targets in real time, controlling and coordinating missile interceptions, and guiding lethal swarms of autonomous drones.

Most of the public conversation regarding the use of AI-driven autonomous lethal weapons centers on how much humans should remain "in the loop." Under the Pentagon's current guidelines, human oversight supposedly provides accountability, context, and nuance while reducing the risk of hacking.

AI systems are opaque "black boxes"

But the debate over "humans in the loop" is a comforting distraction. The immediate danger is not that machines will act without human oversight; it is that human overseers have no idea what the machines are actually "thinking."

The Pentagon's guidelines are fundamentally flawed because they rest on the dangerous assumption that humans understand how AI systems work. Having studied intentions in the human brain for decades and in AI systems more recently, I can attest that state-of-the-art AI systems are essentially "black boxes." We know the inputs and outputs, but the artificial "brain" processing them remains opaque. Even their creators cannot fully interpret them or understand how they work. And when AIs do provide reasons, they are not always trustworthy.

The illusion of human oversight in autonomous systems

In the debate over human oversight, a fundamental question is going unasked: Can we understand what an AI system intends to do before it acts?

Imagine an autonomous drone tasked with destroying an enemy munitions factory. The automated command and control system determines that the optimal target is a munitions storage building. It reports a 92% probability of mission success because secondary explosions of the munitions in the building will thoroughly destroy the facility. A human operator reviews the legitimate military objective, sees the high success rate, and approves the strike.

But what the operator does not know is that the AI system's calculation included a hidden factor: Beyond devastating the munitions factory, the secondary explosions would also severely damage a nearby children's hospital. The emergency response would then focus on the hospital, ensuring the factory burns down. To the AI, maximizing disruption in this way meets its given objective. But to a human, it is potentially committing a war crime by violating the rules protecting civilian life.

Keeping a human in the loop may not provide the safeguard people imagine, because the human cannot know the AI's intention before it acts. Advanced AI systems do not simply execute instructions; they interpret them. If operators fail to define their objectives carefully enough—a highly likely scenario in high-pressure situations—the "black box" system could be doing exactly what it was told and still not acting as humans intended. This "intention gap" between AI systems and human operators is precisely why we hesitate to deploy frontier black-box AI in civilian health care or air traffic control, and why its integration into the workplace remains fraught—yet we are rushing to deploy it on the battlefield.

To make matters worse, if one side in a conflict deploys fully autonomous weapons, which operate at machine speed and scale, the pressure to remain competitive would push the other side to rely on such weapons too. This means the use of increasingly autonomous—and opaque—AI decision-making in war is only likely to grow.

The solution: Advance the science of AI intentions

The science of AI must comprise both building highly capable AI technology and understanding how this technology works. Huge advances have been made in developing and building more capable models, driven by record investments—forecast by Gartner to grow to around $2.5 trillion in 2026 alone. In contrast, investment in understanding how the technology works has been minuscule. We need a massive paradigm shift.

Engineers are building increasingly capable systems. But understanding how these systems work is not just an engineering problem—it requires an interdisciplinary effort. We must build the tools to characterize, measure, and intervene in the intentions of AI agents before they act. We need to map the internal pathways of the neural networks that drive these agents so that we can build a true causal understanding of their decision-making, moving beyond merely observing inputs and outputs.

A promising way forward is to combine techniques from mechanistic interpretability (breaking neural networks down into human-understandable components) with insights, tools, and models from the neuroscience of intentions. Another idea is to develop transparent, interpretable "auditor" AIs designed to monitor the behavior and emergent goals of more capable black-box systems in real time.

Developing a better understanding of how AI functions will enable us to rely on AI systems for mission-critical applications. It will also make it easier to build more efficient, more capable, and safer systems. Colleagues and I are exploring how ideas from neuroscience, cognitive science, and philosophy—fields that study how intentions arise in human decision-making—might help us understand the intentions of artificial systems. We must prioritize these kinds of interdisciplinary efforts, including collaborations between academia, government, and industry.

However, we need more than just academic exploration. The tech industry—and the philanthropists funding AI alignment, which strives to encode human values and goals into these models—must direct substantial investments toward interdisciplinary interpretability research. Furthermore, as the Pentagon pursues increasingly autonomous systems, Congress must mandate rigorous testing of AI systems' intentions, not just their performance. Until we achieve that, human oversight over AI may be more illusion than safeguard.

Uri Maoz is a cognitive and computational neuroscientist specializing in how the brain transforms intentions into actions. A professor at Chapman University with appointments at UCLA and Caltech, he leads an interdisciplinary initiative focused on understanding and measuring intentions in artificial intelligence systems (ai-intentions.org).


The Download: cyberscammers’ banking bypasses, and carbon removal troubles

This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology.

Cyberscammers are bypassing banks' security with illicit tools sold on Telegram

Inside a money-laundering center in Cambodia, an employee opens a banking app on his phone. It asks for a photo linked to the account, so he uploads a picture of a 30-something Asian man.

The app then requests a video "liveness" check. The scammer holds up a static image of a woman who doesn't match the account. After 90 seconds, he's in.

The exploit relies on illicit hacking services sold on Telegram that break "Know Your Customer" (KYC) facial scans. MIT Technology Review found 22 channels and groups advertising these services. This is what we discovered.

—Fiona Kelliher

Is carbon removal in trouble?

—Casey Crownhart

Last week, news emerged that Microsoft was pausing carbon removal purchases. It was a bombshell—Microsoft effectively is the carbon removal market, single-handedly purchasing around 80% of all contracted carbon removal.

The report sparked fear across the industry, raising questions about the future of carbon removal and the role of Big Tech. Read the full story.

This story is from The Spark, our weekly newsletter exploring the technology that could combat the climate crisis. Sign up to receive it in your inbox every Wednesday.

The quest to measure our relationship with nature

—Emma Marris

Humans have done some destructive things to the ecosystems around us. But conservationists are learning that we can also be a force for good.

To understand how we work best with nature, a group of scientists, authors, and philosophers have developed new measurements of human-nonhuman relationships. Now, a team in the United Nations is continuing the work. Find out why—and what they hope to achieve.

This story is from the next issue of our print magazine, which is all about nature. Subscribe now to read it when it lands on Wednesday, April 22.

The must-reads

I've combed the internet to find you today's most fun/important/scary/fascinating stories about technology.

1 Ukraine says Russian troops have surrendered to robots
They claim a fully automated attack captured army positions for the first time in history. (404 Media)
+ Europe's vision for future wars is full of drones. (MIT Technology Review)

2 Monkeys with BCIs are navigating virtual worlds using only their thoughts
The research could help people with paralysis. (New Scientist)
+ But these implants still face a critical test. (MIT Technology Review)

3 NASA wants to put nuclear reactors on the Moon
They could power lunar bases and extend spaceflight. (Wired $)
+ NASA is also building a nuclear-powered spacecraft. (MIT Technology Review)

4 Plans for online age verification in the US are raising red flags
Experts warn of compliance issues and potential data breaches. (NBC News)
+ In the EU, an age verification app is about to launch. (Reuters $)

5 An AI chip boom just pushed Taiwan's stock market past the UK's
It's risen past $4 trillion to become the world's seventh largest. (FT $)
+ Future AI chips could be built on glass. (MIT Technology Review)

6 The public backlash against data centers is intensifying in the US
Protests and litigation are blocking projects. (CNBC)
+ One potential solution? Putting them in space. (MIT Technology Review)

7 Five-minute EV charging is becoming a reality
China's BYD has started rolling it out. (Gizmodo)
+ "Extended-range electric vehicles" are about to hit US streets. (Atlantic $)

8 Stealth signals are bypassing Iran's internet blackout
Files hidden in satellite TV broadcasts keep information flowing. (IEEE)

9 Shoe brand Allbirds made a shock pivot to AI, sending stock up 700%
No bubble to see here, folks. (CNBC)
+ What even is the AI bubble? (MIT Technology Review)

10 The largest ever map of the universe is complete
It captures 47 million galaxies and quasars. (Space.com)

Quote of the day

"I like the internet as much as anybody, but we've got to go on an internet diet. We don't need to pay for corporations to do their internet stuff."

—Sylvia Whitt, a 78-year-old retiree based in Virginia, tells the Washington Post why they're protesting against data centers.

One More Thing

ISRAEL VARGAS

AI and the future of sex

Some Republican lawmakers want to criminalize porn and arrest its creators. But what if porn is wholly created by an algorithm? In that case, whether it's obscene, ethical, or safe becomes a secondary issue. The primary concern will be what it means for porn to be "real"—and what the answer demands from all of us.

Technological advances could even remove the "messy humanity" from sex itself. The rise of AI-generated porn may be a symptom of a new synthetic sexuality, not the cause. Read the full story.

—Leo Herrera

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)
+ An animator turned his son's drawings into epic anime characters.
+ Hundreds of baby green sea turtles made a spectacular first journey to the ocean.
+ You can now track rocket launches from take-off to orbit in real time.
+ These musical mistakes prove that even the classics aren't perfect.


Making AI operational in constrained public sector environments

The AI boom has hit across industries, and public sector organizations are facing pressure to accelerate adoption. At the same time, government institutions face distinct constraints around security, governance, and operations that set them apart from their business counterparts. For this reason, purpose-built small language models (SLMs) offer a promising path to operationalizing AI in these environments.

A Capgemini study found that 79 percent of public sector executives globally are wary of AI's data security, an understandable figure given the heightened sensitivity of government data and the legal obligations surrounding its use. As Han Xiao, vice president of AI at Elastic, says, "Government agencies must be very restricted about what kind of data they send to the network. This sets a lot of boundaries on how they think about and manage their data." The fundamental need for control over sensitive information is one of many factors complicating AI deployment, particularly when compared against the private sector's standard operational assumptions.

Unique operational challenges

When private-sector entities expand AI, they typically assume certain conditions will be in place, including continuous connectivity to the cloud, reliance on centralized infrastructure, acceptance of incomplete model transparency, and limited restrictions on data movement. For many state institutions, however, accepting these conditions could be anything from dangerous to impossible.

Government agencies must ensure that their data stays under their control, that information can be checked and verified, and that operational disruptions are kept to an absolute minimum. At the same time, they often have to run their systems in environments where internet connectivity is limited, unreliable, or unavailable. These complexities prevent many promising public sector AI pilots from moving beyond experimentation. "Many people undervalue the operating challenge of AI," Xiao says. "The public sector needs AI to perform reliably on all kinds of data, and then to be able to grow without breaking. Continuity of operations is often underestimated." An Elastic survey of public sector leaders found that 65 percent struggle to use data continuously in real time and at scale.

Infrastructure constraints compound the problem. Government organizations may also struggle to obtain the graphics processing units (GPUs) used to train and run complex AI models. As Xiao points out, "Government doesn't often purchase GPUs, unlike the private sector—they're not used to managing GPU infrastructure. So accessing a GPU to run the model is a bottleneck for much of the public sector."

A smaller, more practical model

The many nonnegotiable requirements in the public sector make large language models (LLMs) untenable. But SLMs can be housed locally, offering greater security and control. SLMs are specialized AI models that typically use billions rather than hundreds of billions of parameters, making them far less computationally demanding than the largest LLMs. The public sector does not need to build ever-larger models housed in offsite, centralized locations. An empirical study found that SLMs performed as well as or better than LLMs. SLMs allow sensitive information to be used effectively and efficiently while avoiding the operational complexity of maintaining large models. Xiao puts it this way: "It is easy to use ChatGPT to do proofreading. It's very difficult to run your own large language models just as smoothly in an environment with no network access."

SLMs are purpose-built for the needs of the department or agency that will use them. The data is stored securely outside the model and is accessed only when queried. Carefully engineered prompts ensure that only the most relevant information is retrieved, providing more accurate responses. Using methods such as smart retrieval, vector search, and verifiable source grounding, AI systems can be built that cater to public sector needs.

Thus, the next phase of AI adoption in the public sector may be to bring the AI tool to the data, rather than sending the data out into the cloud. Gartner predicts that by 2027, small, specialized AI models will be used three times more than LLMs.
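As a rough sketch of the "bring the AI tool to the data" pattern described above, the snippet below ranks locally stored documents against a query and hands only the most relevant snippets to a locally hosted model. The toy embed function and the echoing generate_locally placeholder are assumptions for illustration; a real deployment would swap in the agency's own embedding model, vector search engine, and on-premises SLM, and nothing here reflects Elastic's specific implementation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding (letter frequencies). A real deployment would call a
    locally hosted embedding model instead."""
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def generate_locally(prompt: str) -> str:
    """Placeholder for a locally hosted SLM call; here it just echoes the prompt."""
    return f"[local SLM would answer here based on]\n{prompt}"

def answer(query: str, documents: list[str], top_k: int = 2) -> str:
    """Rank locally stored documents by similarity to the query, then pass only
    the most relevant snippets to the SLM, so data never leaves the network."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: float(embed(d) @ q), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    prompt = (
        "Answer using only the sources below and cite the source you used.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return generate_locally(prompt)

if __name__ == "__main__":
    docs = ["Procurement report for road maintenance, 2024.",
            "Minutes of the public transport committee meeting.",
            "Invoice records for winter road salting services."]
    print(answer("What was procured for road maintenance?", docs))
```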
Superior search capabilities

"When people in the public sector hear AI, they probably think about ChatGPT. But we can be much more ambitious," says Xiao. "AI can revolutionize how the government searches and manages the large amounts of data they have." Looking beyond chatbots reveals one of AI's most immediate opportunities: dramatically improved search.

Like many organizations, the public sector has mountains of unstructured data—including technical reports, procurement documents, minutes, and invoices. Today's AI, however, can deliver results sourced from mixed media, like readable PDFs, scans, images, spreadsheets, and recordings, and in multiple languages. All of this can be indexed by SLM-powered systems to provide tailored responses and to draft complex texts in any language, while ensuring outputs are legally compliant. "The public sector has a lot of data, and they don't always know how to use this data. They don't know what the possibilities are," says Xiao.

Even more powerful, AI can help government employees interpret the data they access. "Today's AI can provide you with a completely new view of how to harness that data," says Xiao. A well-trained SLM can interpret legal norms, extract insights from public consultations, support data-driven executive decision-making, and improve public access to services and administrative information. This can contribute to dramatic improvements in how the public sector conducts its operations.

The small-language promise

Focusing on SLMs shifts the conversation from how comprehensive the model can be to how efficient it is. LLMs incur significant performance and computational costs and require specialized hardware that many public entities cannot afford. Despite requiring some capital expenses, SLMs are less resource-intensive than LLMs, so they tend to be cheaper and reduce environmental impact.

Public sector agencies often face stringent audit requirements, and SLM algorithms can be documented and certified as transparent. Some countries, particularly in Europe, also have privacy regulations such as GDPR that SLMs can be designed to meet. Tailored training data produces more targeted results, reducing errors, bias, and hallucinations that AI is prone to. As Xiao puts it, "Large language models generate text based on what they were trained on, so there is a cut-off


Treating enterprise AI as an operating layer

There's a fault line running through enterprise AI, and it's not the one getting the most attention. The public conversation still tracks foundation models and benchmarks — GPT versus Gemini, reasoning scores, and marginal capability gains. But in practice, the more durable advantage is structural: who owns the operating layer where intelligence is applied, governed, and improved. One model treats AI as an on-demand utility; the other embeds it as an operating layer — the combination of workflow software, data capture, feedback loops, and governance that sits between models and real work — that compounds with use.

Model providers like OpenAI and Anthropic sell intelligence as a service: you have a problem, you call an API, you get an answer. That intelligence is general-purpose, largely stateless, and only loosely connected to the day-to-day workflow where decisions are made. It's highly capable and increasingly interchangeable. The distinction that matters is whether intelligence resets on every prompt or accumulates over time.

Incumbent organizations, by contrast, can treat AI as an operating layer: instrumentation across workflows, feedback loops from human decisions, and governance that turns individual tasks into reusable policy. In that setup, every exception, correction, and approval becomes a chance to learn — and intelligence can improve as the platform absorbs more of the organization's work. The organizations most likely to shape the enterprise AI era are those that can embed intelligence directly into operational platforms and instrument those platforms so work generates usable signals.

The prevailing narrative says nimble startups will out-innovate incumbents by building AI-native from scratch. If AI is primarily a model problem, that story holds. But in many enterprise domains, AI is a systems problem — integrations, permissions, evaluation, and change management — where advantage accrues to whoever already sits inside high-volume, high-stakes workflows and converts that position into learning and automation.

The inversion: AI executes, humans adjudicate

Traditional services organizations are built on a simple architecture: humans use software to do expert work. Operators log into systems, navigate workflows, make decisions, and process cases. Technology is the medium. Human judgment is the product.

An AI-native platform inverts this. It ingests a problem, applies accumulated domain knowledge, executes autonomously what it can with high confidence, and routes targeted sub-tasks to human experts when the situation demands judgment that the system can't yet reliably provide. But inverting human-AI interaction isn't just a UI redesign — it requires raw material. It's only possible when the platform is built on a foundation of domain expertise, behavioral data, and operational knowledge accumulated over years.

The three compounding assets incumbents already own

AI-native startups begin with a clean architectural slate and can move quickly. What they can't easily manufacture is the raw material that makes domain AI defensible at scale:

- Proprietary operational data
- A large workforce of domain experts whose day-to-day decisions generate training signals
- Accumulated tacit knowledge about how complex work actually gets done

Services companies already have all three. But these ingredients aren't moats on their own. They become an advantage only when a company can systematically convert messy operations into AI-ready signals and institutional knowledge — then feed the results back into the workflow so the system keeps improving.

Codifying expertise into reusable signals

In most services organizations, expertise is tacit and perishable. The best operators know things they cannot easily articulate: heuristics developed over the years, edge-case intuitions, and pattern recognition that operate below the level of conscious reasoning.

At Ensemble, the strategy for addressing this challenge is knowledge distillation: the systematic conversion of expert judgment and operational decisions into machine-readable training signals. In health-care revenue cycle management, for example, systems can be seeded with explicit domain knowledge and then deepen their coverage through structured daily interaction with operators. In Ensemble's implementation, the system identifies gaps, formulates targeted questions, and cross-checks answers across multiple experts to capture both consensus and edge-case nuance. It then synthesizes these inputs into a living knowledge base that reflects the situational reasoning behind expert-level performance.

Turning decisions into a learning flywheel

Once a system is constrained enough to be trusted, the next question is how it gets better without waiting for annual model upgrades. Every time a skilled operator makes a decision, they generate more than a completed task. They generate a potential labeled example — context paired with an expert action (and sometimes an outcome). At scale, across thousands of operators and millions of decisions, that stream can power supervised learning, evaluation, and targeted forms of reinforcement — teaching systems to behave more like experts in real conditions. For example, if an organization processes 50,000 cases a week and captures just three high-quality decision points per case, that's 150,000 labeled examples every week without creating a separate data-collection program.

A more advanced human-in-the-loop design places experts inside the decision process, so systems learn not just what the right answer was, but how ambiguity gets resolved. Practically, humans intervene at branch points — selecting from AI-generated options, correcting assumptions, and redirecting the workflow. Each intervention becomes a high-value training signal. When the platform detects an edge case or a deviation from the expected process, it can prompt for a brief, structured rationale, capturing decision factors without requiring lengthy free-form reasoning logs.
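As a sketch of what capturing those branch-point decisions might look like, the record below pairs the context an operator saw with the option they chose, plus an optional short rationale for edge cases. The field names are hypothetical, not Ensemble's actual schema; the arithmetic at the bottom simply restates the back-of-the-envelope volume from the paragraph above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionPoint:
    """One operator intervention captured as a potential labeled example:
    the context the system presented, the options it proposed, and the
    expert's choice, with a brief rationale requested only for edge cases."""
    case_id: str
    context: dict                  # features the operator saw at the branch point
    proposed_options: list[str]    # AI-generated candidate actions
    expert_action: str             # what the operator actually chose
    rationale: str = ""            # short structured reason, edge cases only
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Back-of-the-envelope volume from the text above:
cases_per_week = 50_000
decision_points_per_case = 3
labeled_examples_per_week = cases_per_week * decision_points_per_case  # 150,000
```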
Building toward expertise amplification

The goal is to permanently embed the accumulated expertise of thousands of domain experts — their knowledge, decisions, and reasoning — into an AI platform that amplifies what every operator can accomplish. Done well, this produces a quality of execution that neither humans nor AI achieve independently: higher consistency, improved throughput, and measurable operational gains. Operators can focus on more consequential work, supported by an AI that has already completed the analytical groundwork across thousands of analogous prior cases.

The broader implication for enterprise leaders is straightforward. Advantage in AI won't be determined by access to general-purpose models alone. It will come from an organization's ability to capture, refine, and compound what it knows (its data, decisions, and operational judgment) while building the controls required for high-stakes environments.
As AI shifts from experimentation to infrastructure, the most durable edge may belong to the


Google DeepMind Releases Gemini Robotics-ER 1.6: Bringing Enhanced Embodied Reasoning and Instrument Reading to Physical AI

The Google DeepMind research team introduced Gemini Robotics-ER 1.6, a significant upgrade to its embodied reasoning model designed to serve as the "cognitive brain" of robots operating in real-world environments. The model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning, and success detection — acting as the high-level reasoning model for a robot, capable of executing tasks by natively calling tools like Google Search, vision-language-action models (VLAs), or any other third-party user-defined functions.

Here is the key architectural idea to understand: Google DeepMind takes a dual-model approach to robotics AI. Gemini Robotics 1.5 is the vision-language-action (VLA) model — it processes visual inputs and user prompts and directly translates them into physical motor commands. Gemini Robotics-ER, on the other hand, is the embodied reasoning model: it specializes in understanding physical spaces, planning, and making logical decisions, but does not directly control robotic limbs. Instead, it provides high-level insights to help the VLA model decide what to do next. Think of it as the difference between a strategist and an executor — Gemini Robotics-ER 1.6 is the strategist.

https://deepmind.google/blog/gemini-robotics-er-1-6/

What's New in Gemini Robotics-ER 1.6

Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, and success detection. But the key addition is a capability that did not exist in prior versions at all: instrument reading.

Pointing as a Foundation for Spatial Reasoning

Pointing — the model's ability to identify precise pixel-level locations in an image — is far more powerful than it sounds. Points can be used to express spatial reasoning (precision object detection and counting), relational logic (making comparisons such as identifying the smallest item in a set, or defining from-to relationships like "move X to location Y"), motion reasoning (mapping trajectories and identifying optimal grasp points), and constraint compliance (reasoning through complex prompts like "point to every object small enough to fit inside the blue cup").

https://deepmind.google/blog/gemini-robotics-er-1-6/

In internal benchmarks, Gemini Robotics-ER 1.6 demonstrates a clear advantage over its predecessor. Gemini Robotics-ER 1.6 correctly identifies the number of hammers, scissors, paintbrushes, pliers, and garden tools in a scene, and does not point to requested items that are not present in the image — such as a wheelbarrow and a Ryobi drill. In comparison, Gemini Robotics-ER 1.5 fails to identify the correct number of hammers or paintbrushes, misses scissors altogether, and hallucinates a wheelbarrow. For AI robotics professionals this matters because hallucinated object detections in robotic pipelines can cause cascading downstream failures — a robot that "sees" an object that isn't there will attempt to interact with empty space.
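To illustrate how a robot stack might consume pointing output downstream, here is a minimal hypothetical sketch: it parses a list of labeled points, converts a requested object's pixel location into a 3D grasp target, and returns nothing if the model never pointed at that object, so a missing or hallucinated detection cannot trigger a grasp. The JSON shape, the [y, x] pixel convention, and the pixel_to_workspace helper are assumptions for illustration, not the documented Gemini Robotics-ER interface.

```python
import json

def pixel_to_workspace(point_yx, depth_m, intrinsics):
    """Hypothetical pinhole-camera deprojection from a pixel to a 3D point (meters)."""
    y, x = point_yx
    fx, fy, cx, cy = intrinsics
    return ((x - cx) * depth_m / fx, (y - cy) * depth_m / fy, depth_m)

def pick_target(model_response: str, wanted_label: str, depth_m: float, intrinsics):
    """Parse a (hypothetical) pointing response of the form
    [{"label": "...", "point": [y, x]}, ...] and return a 3D grasp target
    for the requested object, or None if the model did not point to it."""
    for item in json.loads(model_response):
        if item["label"] == wanted_label:
            return pixel_to_workspace(item["point"], depth_m, intrinsics)
    return None  # never act on objects the model did not actually point to

if __name__ == "__main__":
    response = '[{"label": "scissors", "point": [412, 230]}, {"label": "hammer", "point": [118, 640]}]'
    target = pick_target(response, "scissors", depth_m=0.45,
                         intrinsics=(600.0, 600.0, 320.0, 240.0))
    print("grasp target (x, y, z in meters):", target)
```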
Success Detection and Multi-View Reasoning

In robotics, knowing when a task is finished is just as important as knowing how to start it. Success detection serves as a critical decision-making engine that allows an agent to intelligently choose between retrying a failed attempt or progressing to the next stage of a plan. This is a harder problem than it looks. Most modern robotics setups include multiple camera views, such as an overhead and a wrist-mounted feed. This means a system needs to understand how different viewpoints combine to form a coherent picture at each moment and across time. Gemini Robotics-ER 1.6 advances multi-view reasoning, enabling it to better fuse information from multiple camera streams, even in occluded or dynamically changing environments.

Instrument Reading: A Real-World Breakthrough

The genuinely new capability in Gemini Robotics-ER 1.6 is instrument reading — the ability to interpret analog gauges, pressure meters, sight glasses, and digital readouts in industrial settings. This task stems from facility inspection needs, a critical focus area for Boston Dynamics. Spot, a Boston Dynamics robot, is able to visit instruments throughout a facility and capture images of them for Gemini Robotics-ER 1.6 to interpret.

Instrument reading requires complex visual reasoning: one must precisely perceive a variety of inputs — including the needles, liquid level, container boundaries, tick marks, and more — and understand how they all relate to each other. In the case of sight glasses, this involves estimating how much liquid fills the sight glass while accounting for distortion from the camera perspective. Gauges typically have text describing the unit, which must be read and interpreted, and some have multiple needles referring to different decimal places that need to be combined.

https://deepmind.google/blog/gemini-robotics-er-1-6/

Gemini Robotics-ER 1.6 achieves its instrument readings by using agentic vision (a capability that combines visual reasoning with code execution, introduced with Gemini 3.0 Flash and extended in Gemini Robotics-ER 1.6). The model takes intermediate steps: first zooming into an image to get a better read of small details in a gauge, then using pointing and code execution to estimate proportions and intervals, and ultimately applying world knowledge to interpret meaning.

Gemini Robotics-ER 1.5 achieves a 23% success rate on instrument reading, Gemini 3.0 Flash reaches 67%, Gemini Robotics-ER 1.6 reaches 86%, and Gemini Robotics-ER 1.6 with agentic vision hits 93%. One important caveat: Gemini Robotics-ER 1.5 was evaluated without agentic vision because it does not support that capability. The other three models were evaluated with agentic vision enabled for the instrument reading task, making the 23% baseline less a performance gap and more a fundamental architectural difference. For AI developers evaluating model generations, this distinction matters — you are not comparing apples to apples across the full benchmark column.

Key Takeaways

Gemini Robotics-ER 1.6 is a reasoning model, not an action model: It acts as the high-level "brain" of a robot — handling spatial understanding, task planning, and success detection — while the separate VLA model (Gemini Robotics 1.5) handles the actual physical motor commands.

Pointing is more powerful than it looks: Gemini Robotics-ER 1.6's pointing capability goes far beyond simple object detection — it enables relational logic, motion trajectory mapping, grasp point identification, and constraint-based reasoning, all of which are foundational to reliable robotic manipulation.

Instrument reading is the biggest new capability: Built in collaboration with Boston Dynamics' Spot robot for industrial facility inspection, Gemini Robotics-ER 1.6 can
