YouZum


AI, Committee, News, Uncategorized

Google DeepMind Introduces an AI-Enabled Mouse Pointer Powered by Gemini That Captures Visual and Semantic Context Around the Cursor

The mouse pointer has sat at the center of personal computing for more than half a century. It tracks cursor position. It registers clicks. Beyond that, it does almost nothing. Google DeepMind researchers outlined a set of experimental principles and demos for an AI-enabled pointer that goes considerably further: one that understands not just where you are pointing, but what you are pointing at and why it matters. The system is powered by Gemini and is currently in the experimental stage. Two demos are live in Google AI Studio today: one for editing an image and one for finding places on a map, both operable by pointing and speaking. A deeper integration called Magic Pointer is also rolling out inside Chrome, and a further integration is planned for Googlebook, Google’s new line of Gemini-powered laptops announced this week.

https://deepmind.google/blog/ai-pointer/

What DeepMind is Targeting

The frustration DeepMind researchers are addressing is a familiar one for anyone who has tried to use an AI assistant while already in the middle of work. Because a typical AI tool lives in its own window, users need to drag their world into it. The research team wants the opposite — intuitive AI that meets users across all the tools they use, without interrupting their flow.

In practice, today’s AI workflow often looks like this: you are working inside a document or a browser tab, you spot something you want to ask about, you switch to a chat interface, you re-describe what you were looking at, you run the query, and you paste the result back. This maps to a concrete technical gap: current LLM interfaces are largely text-in, text-out. They have no awareness of the screen state around them. The AI-enabled pointer is an attempt to close that gap by giving the model real-time visual and semantic context derived from cursor position and hover state — without requiring users to manually serialize that context into a written prompt.

Four interaction principles

DeepMind researchers have developed four principles that together shift the hard work of conveying context and intent from the user to the computer, replacing text-heavy prompts with simpler, more intuitive interactions.

The first is Maintain the flow. AI capabilities should work across all apps, not force users into ‘AI detours’ between them. The prototype AI-enabled pointer is available wherever the user is working. For example, they could point at a PDF and request a bullet-point summary to paste directly into an email, hover over a table of statistics and request a pie chart version, or highlight a recipe and ask for all the ingredients doubled. This is a direct architectural stance: instead of building AI assistance as a sidecar application, the capability lives at the pointer level and is present in whichever tool the user is already working in.

The second is Show and tell. Current AI models demand precise instructions. To get a good response, a user has to write a detailed prompt. An AI-enabled pointer would streamline this process by smoothly capturing the visual and semantic context around the pointer, letting the computer ‘see’ and understand what’s important to the user. In the experimental system, just point, and the AI knows exactly which word, paragraph, part of an image, or code block the user needs help with.
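DeepMind has not published an interface for this, but a purely hypothetical sketch helps make “capturing the visual and semantic context around the pointer” concrete: cursor position, hover state, and a cropped screen region are bundled together with the spoken instruction into one multimodal request. Every field and function name below is invented for illustration, not part of any announced API.

```python
from dataclasses import dataclass

@dataclass
class PointerContext:
    """Hypothetical context packet captured around the cursor (illustrative only)."""
    cursor_xy: tuple[int, int]   # screen coordinates of the pointer
    hovered_element: str         # e.g. "paragraph", "table-cell", "image-region"
    visible_text: str            # text content under and around the cursor
    screenshot_crop: bytes       # pixels of a region cropped around the cursor
    app_name: str                # the tool the user is already working in

@dataclass
class PointerQuery:
    context: PointerContext
    spoken_instruction: str      # e.g. "summarize this as bullet points"

def build_multimodal_request(q: PointerQuery) -> list[dict]:
    """Serialize pointer context plus speech into a single model request (sketch)."""
    return [
        {"type": "image", "data": q.context.screenshot_crop},
        {"type": "text",
         "text": (f"App: {q.context.app_name}. "
                  f"Hovered element: {q.context.hovered_element}. "
                  f"Nearby text: {q.context.visible_text}. "
                  f"User said: {q.spoken_instruction}")},
    ]
```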
From a technical standpoint, this means the system treats cursor hover state and the surrounding UI content as structured model inputs — comparable to how multimodal models process image and text together, except here the visual region is dynamically cropped and contextualized in real time around a moving cursor.

The third is Embrace the power of ‘This’ and ‘That’. In everyday interactions with each other, humans rarely speak in long, detailed paragraphs. We might say, ‘Fix this’, ‘Move that here’, or ‘What does this mean?’ — while relying on physical gestures and our shared context to fill in any gaps in understanding. An AI system that understands this combination of context, pointing and speech would allow users to make complex requests in natural shorthand, no fiddly prompting required. The name of the principle is deliberate: deictic language (words like ‘this’ and ‘that’ that depend on physical reference to carry meaning) is how humans naturally communicate when they can point at something. The AI-enabled pointer is designed to handle exactly that class of instruction without needing the user to spell out what “this” refers to.

The fourth is Turn pixels into actionable entities. For decades, computers have only tracked where we are pointing. AI can now also understand what the user is pointing at. This transforms pixels into structured entities, such as places, dates, and objects, that users can interact with instantly. A photo of a scribbled note becomes an interactive to-do list; a paused frame in a travel video becomes a booking link for that cool-looking restaurant. For ML engineers, this is the most technically substantive of the four principles. It describes an entity extraction step that happens at inference time on whatever visual content is under the cursor — converting raw pixel regions into typed, actionable objects rather than leaving them as unstructured screen content.

Where it is going

Google DeepMind is now integrating these principles to reimagine pointing in Chrome and the new Googlebook laptop experience. Starting now, instead of writing a complex prompt, users can use their pointer to ask Gemini in Chrome about the part of the webpage they care about. For example, selecting a few products on a page and asking to compare them, or pointing to where they want to visualize a new couch in their living room.

Key Takeaways

Google DeepMind introduces experimental demos of an AI-enabled mouse pointer powered by Gemini that captures visual and semantic context around the cursor — no manual prompting required.

The system is built on four principles: Maintain the flow, Show and tell, Embrace the power of “This” and “That”, and Turn pixels into actionable entities.

“Turn pixels into actionable entities” is the key technical idea — the pointer converts on-screen


AI, Committee, News, Uncategorized

A plan to make drugs in orbit is going commercial

Varda Space Industries, a startup that’s been pitching its ability to perform drug experiments in space, says it has signed up the pharmaceutical company United Therapeutics in what may be remembered as a notable step toward in-orbit manufacturing. The idea of building things in outer space for use on Earth has so far been explored mostly on board the International Space Station, and only in small-scale experiments backed by governments. But Varda, based in El Segundo, California, is now telling drug companies it has a practical, and repeatable, way to produce novel molecules in microgravity. “This is the first commercial path to products made in space,” says Michael Reilly, Varda’s chief strategy officer.

The scientific idea is that chemical mixtures have different properties under weightless conditions. For instance, water will hang together in a wiggly sphere, since without gravity, surface tension is the strongest force present. The plan is to launch versions of United Therapeutics’ drugs into orbit, where they can be allowed to form solid crystals. The hope is that in microgravity, they’ll take on atomic arrangements not seen on Earth, possibly leading to new versions with improved stability or other valuable properties.

United is led by CEO Martine Rothblatt, who worked on early telecommunications satellites. Since then, she’s built a multibillion-dollar health franchise with a succession of drugs to treat a lung disease called pulmonary arterial hypertension, which her daughter suffers from, and a subsidiary developing genetically modified pigs as a source of organs for transplantation. Rothblatt says space could be the next step if orbital conditions permit United to identify “even more amazing” versions of its drugs.

Space to reformulate

Pharmaceutical companies often try to keep their blockbuster franchises alive by creating improved versions of drugs or reformulating them—for example, making the switch from a pill to an inhaled version, as United has done with some of its products. Doing so can keep imitators at bay and create extra decades of patent protection. Assisting drugmakers are specialist companies, such as Halozyme and MannKind, that earn profits by helping to reformulate other companies’ drugs, often taking a royalty on future sales.

That’s the business Varda has been trying to break into—by using excursions into space instead of nebulizers, patches, or nanoparticles. The company was formed in 2021 by Delian Asparouhov, a partner at Peter Thiel’s Founders Fund, along with Will Bruey, a former avionics engineer with Elon Musk’s SpaceX who is now Varda’s CEO. The pair’s bet is that space manufacturing will become viable once rocket launches become frequent enough—and cheap enough—to support a business model in which raw materials are sent into orbit, processed, and then returned to Earth in a new form.

And that’s starting to happen. To get into space, Varda has been purchasing rides from SpaceX—which now launches a rocket every two or three days, usually a reusable Falcon 9. Those rockets have a nose cone, or payload fairing, about the size of a moving truck that gets filled with satellites or instruments, which are then released into orbit. Starting in 2023, Varda began sending up small satellites that have a boulder-size capsule attached.
The capsule contains equipment to carry out experiments, and it can detach and fall back to Earth, entering the atmosphere at a speed of around Mach 25 before slowing via air resistance and eventually drifting to land with a parachute. (Varda lands its craft in the Australian outback.)

That speedy reentry has also drawn interest from the US military, including the Air Force, which has paid Varda to fly instruments and take measurements relevant to hypersonic missile technology. Of the six craft Varda has paid to put into orbit so far, half have been dedicated to military research and half carried drug-related demonstrations. At Varda, such “dual use” of technology is accepted as part of being in the space business, which remains reliant on government support. The company’s founders say Varda may be the only company that employs hypersonic engineers and pharmaceutical chemists under the same roof.

At Varda’s headquarters, drug samples are loaded into a spinning arm that creates extra-high g-forces. While that’s the opposite of microgravity, increased weight can provide clues about whether a drug will act differently under new conditions. COURTESY VARDA

Launching industries

Actual space manufacturing remains mostly an aspirational project. In 2021, Jeff Bezos, after his first trip aloft in a rocket, suggested that polluting industries should be moved beyond the atmosphere. “We need to take all heavy industry, all polluting industry, and move it into space. And keep Earth as this beautiful gem of a planet that it is,” he told MSNBC.

Weight is the big obstacle to such dreams. It still costs around $7,000 to launch a single kilogram of payload into orbit, which makes it impractical to, say, send cotton into space to be dyed there, or even to launch the acids and solvents needed to make a semiconductor chip. But drugs may be among the few exceptions to this economic rule, since pound for pound, they can be as valuable as rare radioactive isotopes and fine-cut diamonds. For instance, just one kilogram of the weight-loss drug Ozempic is worth more than $100 million at retail. (The reason your Ozempic bill is only $1,000 a month is that minute quantities of the active ingredient are present in the shots.)

That’s why Varda thinks it may eventually be able to manufacture drugs in orbit. However, its effort with United is more of a flying experiment to learn whether the company’s lung medicines will crystallize differently in microgravity.

The terms of the deal between Varda and United aren’t public, and the companies haven’t said which specific drugs the collaboration will study. But Rothblatt did confirm that United is paying Varda to help it identify new crystal forms of its drugs (also called polymorphs), which it hopes could have improved properties. “One has to do the experiment to find out if that is so. The first part of the experiment is to see what polymorphs of


AI, Committee, News, Uncategorized

Mira Murati’s Thinking Machines Lab Introduces Interaction Models: A Native Multimodal Architecture for Real-Time Human-AI Collaboration

Most AI systems today work in turns. You type or speak, the model waits, processes your input, and then responds. That’s the entire interaction loop. Thinking Machines Lab, an AI research lab, is arguing that this model of interaction is a fundamental bottleneck. The Thinking Machines Lab team introduced a research preview of a new class of system they call interaction models to address it. The main idea behind their research is that interactivity should be native to the model itself, not bolted on as an afterthought.

What’s Wrong with Turn-Based AI

If you’ve built anything with a language model or voice API, you’ve worked around the limitations of turn-based interaction. The model has no awareness of what’s happening while you’re still typing or speaking. It can’t see you pause mid-sentence, notice your camera feed, or react to something visual in real time. While the model is generating, it’s equally blind — perception freezes until it finishes or gets interrupted. This creates a narrow channel for human-AI collaboration that limits how much of a person’s knowledge, intent, and judgment can reach the model, and how much of the model’s work can be understood.

To work around this, most real-time AI systems use a harness — a collection of separate components stitched together to simulate responsiveness. A common example is voice-activity detection (VAD), which predicts when a user has finished speaking so a turn-based model knows when to start generating. This harness is made out of components that are meaningfully less intelligent than the model itself, and it precludes capabilities like proactive visual reactions, speaking while listening, or responding to cues that are never explicitly stated aloud.

Thinking Machines Lab’s argument is a version of the ‘bitter lesson’ in machine learning: hand-crafted systems will eventually be outpaced by scaling general capabilities. For interactivity to scale with intelligence, it must be part of the model itself. With this approach, scaling a model makes it smarter and a better collaborator.

https://thinkingmachines.ai/blog/interaction-models/

The Architecture: Multi-Stream, Micro-Turn Design

The system has two components working in parallel: an interaction model that maintains constant real-time exchange with the user, and a background model that handles deeper reasoning tasks asynchronously. The interaction model is always on — continuously taking in audio, video, and text and producing responses in real time. When a task requires sustained reasoning (tool use, web search, longer-horizon planning), it delegates to the background model by sending a rich context package containing the full conversation — not a standalone query. Results stream back as the background model produces them, and the interaction model interleaves those updates into the conversation at a moment appropriate to what the user is currently doing, rather than as an abrupt context switch. Both models share their context throughout. Think of it like one person who keeps you engaged in conversation while a colleague in the background looks something up and passes notes forward in real time.

The key architectural decision enabling this is time-aligned micro-turns. The interaction model continuously interleaves the processing of 200ms worth of input with the generation of 200ms worth of output. Rather than consuming a complete user turn and generating a complete response, both input and output are treated as streams processed in 200ms chunks.
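Thinking Machines has not published its serving code in the post, but the micro-turn idea can be sketched as a simple loop. The input_stream, model, and output_stream interfaces below are assumptions made for illustration, not their API.

```python
import asyncio

CHUNK_MS = 200  # micro-turn size described in the post

async def micro_turn_loop(input_stream, model, output_stream):
    """Illustrative sketch of time-aligned micro-turns (not TML's actual code).

    Instead of "read a full user turn, then generate a full reply", the loop
    alternates: ingest ~200ms of audio/video/text, then emit up to ~200ms of
    output, so perception never freezes while the model is speaking.
    """
    while True:
        # 1. Consume the next 200ms of multimodal input (audio, video patches, text).
        chunk_in = await input_stream.next_chunk(ms=CHUNK_MS)
        model.ingest(chunk_in)  # appended to the shared, persistent context

        # 2. Produce at most 200ms worth of output for this micro-turn.
        chunk_out = model.generate(max_duration_ms=CHUNK_MS)
        if chunk_out:  # the model may stay silent and keep listening
            await output_stream.play(chunk_out)

        # 3. Long-horizon work (tool use, search) is delegated to a background
        #    model; its partial results are ingested like any other input chunk.
        for update in model.poll_background_results():
            model.ingest(update)
```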
This is what allows the model to speak while listening, react to visual cues without being prompted verbally, handle true simultaneous speech, and make tool calls and browse the web while the conversation is still in progress — weaving results back in as they arrive.

Encoder-free early fusion is the specific design choice that makes multimodal processing work at this cadence. Rather than routing audio and video through large, separate pretrained encoders (like a Whisper-style ASR model or a standalone TTS decoder), the architecture uses minimal pre-processing. Audio signals are ingested as dMel and transformed via a lightweight embedding layer. Video frames are split into 40×40 patches encoded by an hMLP. Audio output uses a flow head for decoding. All components are co-trained from scratch together with the transformer — there is no separately pretrained encoder or decoder at any stage.

On the inference side, the 200ms chunk design creates engineering challenges. Existing LLM inference libraries aren’t optimized for frequent small prefills — they carry significant per-turn overhead. Thinking Machines implemented streaming sessions, where the client sends each 200ms chunk as a separate request while the inference server appends chunks into a persistent sequence in GPU memory, avoiding repeated memory reallocations and metadata computations (a toy sketch of this chunk-append pattern follows at the end of this section). They’ve upstreamed a version of this to SGLang, the open-source inference framework. Additionally, they use a gather+gemv strategy for MoE kernels instead of standard grouped gemm, following prior work from PyTorch and Cursor, to optimize for the latency-sensitive shapes required by bidirectional serving.

https://thinkingmachines.ai/blog/interaction-models/

Benchmarks: Where It Stands

The model, named TML-Interaction-Small, is a 276B parameter Mixture-of-Experts (MoE) with 12B active parameters. The benchmark table distinguishes between Instant models (no extended reasoning) and Thinking models (with reasoning). TML-Interaction-Small is an Instant model.

Among all Instant models in the comparison, it achieves the highest score on Audio MultiChallenge APR at 43.4% — above GPT-realtime-2.0 (minimal) at 37.6%, GPT-realtime-1.5 at 34.7%, and Gemini-3.1-flash-live-preview (minimal) at 26.8%. The Thinking models, GPT-realtime-2.0 (xhigh) at 48.5% and Gemini-3.1-flash-live (high) at 36.1%, use extended reasoning to achieve their scores.

On FD-bench v1.5, which measures interaction quality across user interruption, backchanneling, talking-to-others, and background speech scenarios, TML-Interaction-Small scores 77.8 average quality — compared to 54.3 for Gemini-3.1-flash-live (minimal), 48.3 for GPT-realtime-1.5, and 47.8 for GPT-realtime-2.0 (xhigh). On FD-bench v1 turn-taking latency, the model responds in 0.40 seconds — compared to 0.57s for Gemini, 0.59s for GPT-realtime-1.5, and 1.18s for GPT-realtime-2.0 (minimal). On FD-bench v3, which evaluates response quality and tool use (audio + tools combined), TML-Interaction-Small (with background agent enabled) scores 82.8% Response Quality / 68.0% Pass@1 — the highest in the comparison table.

https://thinkingmachines.ai/blog/interaction-models/

The Thinking Machines research team also introduced new internal benchmarks targeting capabilities that no existing model handles:

TimeSpeak — Tests whether the model initiates speech at user-specified times with correct content.
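As referenced above, here is a toy sketch of the streaming-session chunk-append pattern described in the inference discussion. It is not the actual SGLang change; the model methods allocate_kv_cache, prefill, and decode are invented stand-ins, used only to show why appending each 200ms chunk into one persistent sequence avoids re-prefilling the whole history on every request.

```python
import torch

class StreamingSession:
    """Illustrative sketch of a 'streaming session' (not the SGLang implementation)."""

    def __init__(self, model, max_tokens: int, device: str = "cuda"):
        self.model = model
        # Persistent KV/context buffer allocated once for the whole session, so
        # each 200ms chunk is appended in place instead of re-allocating memory
        # and recomputing metadata per request.
        self.kv_cache = model.allocate_kv_cache(max_tokens, device=device)
        self.length = 0

    def append_chunk(self, chunk_tokens: torch.Tensor) -> None:
        """Prefill only the new ~200ms of tokens against the persistent cache."""
        self.model.prefill(chunk_tokens, kv_cache=self.kv_cache, start=self.length)
        self.length += chunk_tokens.shape[0]

    def generate_micro_turn(self, max_new_tokens: int):
        """Decode a small burst of output for this micro-turn, reusing the cache."""
        return self.model.decode(self.kv_cache, self.length, max_new_tokens)
```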


AI, Committee, News, Uncategorized

The Download: making drugs in orbit and NASA’s nuclear-powered spacecraft

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

A plan to make drugs in orbit is going commercial

A startup called Varda Space Industries is betting that the future of pharmaceuticals lies in orbit. The company has signed a deal with United Therapeutics to test whether drugs crystallize differently in microgravity, potentially creating improved versions with new properties. The idea sounds futuristic, but falling launch costs and reusable rockets are making space-based manufacturing seem increasingly plausible. Varda says the partnership could mark an important step toward building products in orbit for use back on Earth. Discover how space could become the next frontier for drug development. —Antonio Regalado

MIT Technology Review Narrated: NASA is building the first nuclear reactor-powered interplanetary spacecraft. How will it work?

Just before Artemis II began its historic slingshot around the moon, NASA revealed an even grander space travel plan. By the end of 2028, the agency aims to fly a nuclear reactor-powered interplanetary spacecraft to Mars. A successful mission would herald a new era in spaceflight—and might just give the US the edge in the race against China. But the project remains shrouded in mystery. MIT Technology Review picked the brains of nuclear power and propulsion experts to find out how the nuclear-powered spacecraft might work. —Robin George Andrews

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we publish each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Sam Altman claims Elon Musk tried to seize control of OpenAI
Altman said Musk initially wanted 90% of the equity. (AFP)
+ And that control should go to his children when he dies. (BBC)
+ Altman also accused Musk of twice trying to end its non-profit status. (NPR)
+ Musk’s motivations for the suit are under scrutiny. (MIT Technology Review)

2 Google and SpaceX are in talks to launch data centers into orbit
SpaceX could join Suncatcher, Google’s orbital data center project. (WSJ $)
+ The project’s first launch is slated for early 2027. (Guardian)
+ Anthropic and SpaceX have also discussed orbital data centers. (Wired $)
+ But there are a few hurdles to overcome. (MIT Technology Review)

3 Jensen Huang has joined Donald Trump’s high-stakes mission to China
Nvidia is lobbying to sell its AI chips in the country. (Bloomberg $)
+ Elon Musk and Tim Cook are also on the trip. (CNBC)
+ But a tech rivalry and distrust have sapped hopes for big deals. (Reuters $)

4 ICE agents have a list of 20 million people on their iPhones, thanks to Palantir
An ICE official said Palantir is speeding up raids and arrests. (404 Media)
+ ICE has also used facial recognition and Paragon spyware. (TechCrunch)

5 Defense tech firm Anduril just doubled its valuation to over $60 billion
In a $5 billion funding round led by Thrive Capital and a16z. (FT $)
+ Anduril, which makes AI-backed weapons, may go public next year. (NYT $)

6 Meta employees are protesting computer-tracking at work
Flyers posted at offices are urging staff to oppose the program. (Reuters $)
+ Meta plans to track workers’ clicks and keystrokes to train AI. (CNBC)

7 OpenAI is facing another wrongful death lawsuit over ChatGPT medical advice
The chatbot’s tips allegedly led to a teenager’s overdose. (Ars Technica)

8 The Canvas learning platform has paid hackers to delete stolen student data
It caved to ransomware demands after the biggest-ever edtech breach. (BBC)

9 Scientific researchers are thinking twice about using AI
Due to price hikes, usage limitations, and unreliable outputs. (Nature)

10 The latest AI compute solution? Putting data centers in your home
Hardware hosts get subsidized electricity and internet. (Ars Technica)

Quote of the day

“Mr Musk did try to kill it.”

—Sam Altman claims that Elon Musk tried to destroy rather than protect OpenAI’s non-profit operations, the Guardian reports.

One More Thing

YOSHI SODEOKA

Why does AI hallucinate?

Chatbot fails are now a familiar meme. Meta’s short-lived scientific chatbot generated wiki articles about the history of bears in space. Lawyers have submitted court documents filled with legal citations fabricated by ChatGPT. Air Canada was ordered to honor a refund policy invented by its customer service chatbot. This tendency to make things up—known as hallucination—is one of the biggest obstacles holding chatbots back from more widespread adoption. Here’s why they do it—and why we still can’t fix it. —Will Douglas Heaven

This story is part of MIT Technology Review Explains, our series untangling the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ A historian has unearthed the etymology of every single dinosaur name.
+ Hummus on the moon is getting closer to reality after scientists grew chickpeas in lunar soil.
+ Witness the patience of a master paper artist in this gallery of intricate, handmade sculptures.
+ Want to tell the time alphabetically? Me neither, but this cursed clock is an intriguing reason to try.


AI, Committee, News, Uncategorized

Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon

Researchers at Tilde Research have released Aurora, a new optimizer for training neural networks that addresses a structural flaw in the widely used Muon optimizer. The flaw quietly kills off a significant fraction of MLP neurons during training and keeps them permanently dead. Aurora comes with a 1.1B parameter pretraining experiment, a new state-of-the-art result on the modded-nanoGPT speedrun benchmark, and open-source code.

What is Muon?

To understand Aurora, it helps to first understand Muon. The Muon optimizer attracted attention in the ML community after outperforming AdamW in wall-clock time to convergence on the nanoGPT speedrun competition — a community benchmark that measures how fast you can train a GPT-style model to a target validation loss. Since then, Muon has been adopted in frontier-scale model training by several research groups.

Muon’s key algorithmic step is computing the polar factor of the gradient matrix. For a gradient matrix G with thin Singular Value Decomposition (SVD) G = UΣVᵀ, Muon computes polar(G) = UVᵀ, which is the closest semi-orthogonal matrix to G in the Frobenius norm. This orthogonalized gradient is then used to update the weights: W ← W − η UVᵀ for a learning rate η. The use of matmul-only iterative algorithms to compute the polar factor is what makes Muon practical at scale.

The NorMuon Puzzle: Row Normalization Helps, But Why?

Before Aurora, NorMuon led the modded-nanoGPT speedrun. It introduced a row-normalization step—similar to Adam’s per-parameter scaling—that adjusted the polar factor by its inverse RMS norm. While this often pulls the update away from a strictly orthogonal gradient, NorMuon still yields impressive results. The Tilde team set out to understand exactly what gap in Muon’s formulation NorMuon was addressing.

The Core Problem: Row-Norm Anisotropy and Neuron Death in Tall Matrices

The research team discovered that the Muon optimizer unintentionally “kills” a large portion of neurons in tall weight matrices, such as those found in SwiGLU-based MLP layers. Because it is mathematically impossible for these specific matrix shapes to stay perfectly orthogonal while keeping row updates even, the optimizer ends up giving massive updates to some neurons while virtually ignoring others. This results in a “death spiral” where under-performing neurons receive less signal over time, eventually becoming permanently inactive. The study revealed that by the 500th training step, more than one in four neurons are effectively dead. This isn’t just a local issue; the lack of activity in these neurons starves subsequent layers of the signal they need, spreading the inefficiency throughout the model. Aurora solves this by using a new mathematical approach that enforces uniform updates across all neurons without sacrificing the benefits of orthogonalization.

The Intermediate Step: U-NorMuon

Before arriving at Aurora, the research introduces an intermediate fix called U-NorMuon. The key observation is that NorMuon normalizes each row to unit norm (norm = 1), but this is actually the wrong target for a tall matrix. For a column-orthogonal tall matrix, the mathematically correct average row norm is √(n/m), not 1. U-NorMuon corrects this by normalizing tall matrix rows to have norm √(n/m) instead of 1. In experiments at 340M scale, U-NorMuon outperforms both Muon and standard NorMuon and completely eliminates the neuron death phenomenon — leverage scores become approximately isotropic throughout training.
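Tilde’s released code is the reference implementation; the snippet below is only an unofficial PyTorch sketch of the two ingredients discussed above: a matmul-only polar-factor approximation (a plain cubic Newton–Schulz iteration, used here as a simplified stand-in for Muon’s tuned variant) and the U-NorMuon row rescale to √(n/m) for a tall m×n gradient.

```python
import torch

def polar_factor(G: torch.Tensor, steps: int = 10) -> torch.Tensor:
    """Approximate polar(G) = UV^T with a cubic Newton-Schulz iteration.

    Simplified stand-in for Muon's matmul-only orthogonalization (production
    Muon uses a tuned quintic iteration). G is tall: shape (m, n) with m >= n.
    """
    X = G / (G.norm() + 1e-7)            # scale so singular values start in (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ (X.T @ X)
    return X                             # approximately column-orthogonal: X^T X ~ I_n

def unormuon_update(G: torch.Tensor) -> torch.Tensor:
    """U-NorMuon as described above: rescale every row of the polar factor to the
    tall-matrix target sqrt(n/m), instead of NorMuon's target of 1."""
    m, n = G.shape
    U = polar_factor(G)
    row_norms = U.norm(dim=1, keepdim=True).clamp_min(1e-7)
    return U / row_norms * (n / m) ** 0.5  # every neuron (row) gets an equal-size update

# Illustration on a random tall "gradient": plain polar(G) has somewhat uneven
# row norms, while the U-NorMuon update's row norms are exactly sqrt(n/m).
if __name__ == "__main__":
    G = torch.randn(4096, 1024)
    print(polar_factor(G).norm(dim=1).std())     # > 0: rows receive unequal updates
    print(unormuon_update(G).norm(dim=1).std())  # ~ 0: rows equalized
```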
Crucially, U-NorMuon propagates this benefit to layers it doesn’t directly touch: keeping up/gate rows alive ensures isotropic gradient flow into the down-projection, stabilizing its column leverage without any direct intervention. However, U-NorMuon still has a problem: it forcefully overrides the polar factor with uniform row norms, sacrificing polar factor precision, which is both theoretically undesirable and empirically costly in the Muon framework (the paper shows that Muon achieves monotonically lower loss with more precise orthogonalization). This is the motivation for Aurora.

Aurora: Steepest Descent Under Two Joint Constraints

Aurora reformulates the update-selection problem from scratch. Rather than running orthogonalization and then patching it with row normalization, Aurora asks: what is the optimal update under the joint constraint of left semi-orthogonality and uniform row norms? Formally, for a tall m×n matrix, Aurora solves:

U* = argmax_U Tr(GᵀU)   subject to   UᵀU = I_n   and   ‖U_i:‖² = n/m for every row i

The research shows that these two constraints together force all singular values of U to exactly equal 1. This means the joint constraint still produces a valid left semi-orthogonal update, not a compromised one. This is the key insight that separates Aurora from NorMuon and U-NorMuon: it achieves row-norm uniformity and orthogonality simultaneously rather than trading one off against the other.

The research also provides two algorithmic implementations of Aurora’s solution. The Riemannian Aurora uses a gradient projection approach restricted to the joint Stiefel/equal-row-leverage manifold. The vanilla Aurora is a simpler, more practical implementation. Both are open-sourced. For non-tall (wide and square) matrices, row-norm uniformity is already implied by orthogonality, so Aurora leaves those parameters unchanged.

Results

Aurora was used to train a 1.1B model that achieves 100x data efficiency on open-source internet data and outperforms larger models on general evals like HellaSwag. At 1B scale, Aurora achieves large gains over both Muon and NorMuon. On the modded-nanoGPT optimization speedrun, Aurora’s submitted run outperforms the prior state-of-the-art (which was NorMuon). Untuned Aurora carries only a 6% compute overhead over traditional Muon and is designed as a drop-in replacement.

The research team also found that Aurora’s performance gains scale with MLP width, suggesting it is particularly effective for networks with large MLP expansion factors — which is consistent with the neuron death hypothesis, since wider MLPs have more tall matrices and more opportunity for leverage anisotropy to compound.

Key Takeaways

Muon’s polar factor update inherits row-norm anisotropy on tall matrices, causing over 25% of MLP neurons to permanently die as early as step 500 of training.

Aurora solves this by finding the optimal update under a joint constraint of left semi-orthogonality and uniform row norms — achieving both simultaneously rather than trading one off against the other.

At 1.1B scale, Aurora achieves 100x data efficiency on open-source internet data, outperforms larger models on HellaSwag, and


AI, Committee, News, Uncategorized

Beyond Multiple Choice: Evaluating Steering Vectors for Summarization

arXiv:2505.24859v3 Announce Type: replace-cross Abstract: Steering vectors are a lightweight method for controlling text properties by adding a learned bias to language model activations at inference time. While predominantly studied for multiple-choice and toy tasks, their effectiveness in free-form generation remains largely unexplored. Moving “Beyond Multiple Choice,” we evaluate steering vectors for controlling topical focus, sentiment, toxicity, and readability in abstractive summaries across the SAMSum, NEWTS, and arXiv datasets. We find that steering effectively controls targeted properties, but high steering strengths consistently induce degenerate repetition and factual hallucinations. Prompting alone preserves summary quality but offers weaker control. Combining both methods yields the strongest control and the most favorable efficacy-quality trade-off at moderate steering strengths. Our work demonstrates that steering vectors face a critical control-quality trade-off in free-form generation, and that hybrid approaches offer the best balance in practice.
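The abstract describes steering vectors as a learned bias added to language model activations at inference time. A minimal, generic sketch of that mechanism looks like the following; the model name, layer index, steering strength, and the random vector (standing in for a learned one) are placeholders, not the paper’s setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                     # placeholder model, not the paper's
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer_idx = 6                           # which transformer block to steer (assumption)
alpha = 4.0                             # steering strength; the paper finds high values degrade quality
steer = torch.randn(model.config.hidden_size)  # stands in for a *learned* steering vector

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; hidden states are the first element.
    hidden = output[0]
    return (hidden + alpha * steer.to(hidden.dtype),) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)
ids = tok("Summarize the meeting notes:", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40)[0]))
handle.remove()  # restore unsteered behavior
```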


AI, Committee, News, Uncategorized

LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs

arXiv:2603.14937v3 Announce Type: replace-cross Abstract: Text-rich graphs, which integrate complex structural dependencies with abundant textual information, are ubiquitous yet remain challenging for existing learning paradigms. Conventional methods and even LLM-hybrids compress rich text into static embeddings or summaries before structural reasoning, creating an information bottleneck and detaching updates from the raw content. We argue that in text-rich graphs, the text is not merely a node attribute but the primary medium through which structural relationships are manifested. We introduce RAMP, a Raw-text Anchored Message Passing approach that moves beyond using LLMs as mere feature extractors and instead recasts the LLM itself as a graph-native aggregation operator. RAMP exploits the text-rich nature of the graph via a novel dual-representation scheme: it anchors inference on each node’s raw text during each iteration while propagating dynamically optimized messages from neighbors. It further handles both discriminative and generative tasks under a single unified generative formulation. Extensive experiments show that RAMP effectively bridges the gap between graph propagation and deep text reasoning, achieving competitive performance and offering new insights into the role of LLMs as graph kernels for general-purpose graph learning.
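As a rough illustration of the general idea in the abstract — the LLM itself acting as the aggregation operator in message passing over a text-rich graph — here is a heavily simplified sketch. It is not RAMP’s actual prompting or implementation; the graph interface and the llm callable are assumptions.

```python
def llm(prompt: str) -> str:
    """Assumed text-in/text-out callable; plug in any chat/completions client."""
    raise NotImplementedError

def message_passing_round(graph, messages: dict) -> dict:
    """One round: each node's update is anchored on its raw text plus the current
    messages from its neighbors, with the LLM producing the aggregated message."""
    new_messages = {}
    for node in graph.nodes:                      # assumed graph interface
        neighbor_msgs = "\n".join(messages[n] for n in graph.neighbors(node))
        prompt = (
            f"Node text:\n{graph.text[node]}\n\n"
            f"Messages from neighbors:\n{neighbor_msgs}\n\n"
            "Write an updated message summarizing this node in light of its neighbors."
        )
        new_messages[node] = llm(prompt)
    return new_messages
```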


AI, Committee, News, Uncategorized

The Download: a Nobel winner on AI, and the case for fixing everything

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Three things in AI to watch, according to a Nobel-winning economist

A few months before he won the Nobel Prize in economics in 2024, Daron Acemoglu published a paper that earned him few fans in Silicon Valley. He argued that AI would give only a small boost to US productivity and would not eliminate the need for human work. Two years later, Acemoglu’s measured take has not caught on. The technology has advanced quite a bit since his cautious predictions, but the data is still largely on his side. MIT Technology Review spoke with him to understand if any of the latest developments have changed his thesis. Here are the three things Acemoglu is paying closest attention to in AI right now. —James O’Donnell

This story is from The Algorithm, our weekly newsletter giving you the inside track on all things AI. Sign up to receive it in your inbox every Monday.

The case for fixing everything

Stewart Brand, the counterculture icon and tech industry legend, considers maintenance a “civilizational” act. His new book argues that taking responsibility for maintaining something, whether a motorcycle, a monument, or the planet, can be radical. Brand argues that maintainers haven’t gotten the laurels they deserve—and he’s right. Yet his vision of maintenance often feels solitary: profound, but more about personal fulfillment than tending to a shared world or making it better. Read the full review of his handsome new book, Maintenance: Of Everything, Part One. —Lee Vinsel

Lee Vinsel is an associate professor of science, technology, and society at Virginia Tech, a cofounder of The Maintainers, and the host of Peoples & Things, a podcast about human life with technology.

This story is from the latest edition of our print magazine, which is all about nature. Subscribe now to read the full issue and receive future print copies once they land.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The first zero-day exploit built by AI has been discovered
Google spotted and stopped the attempted “mass exploitation event.” (CNBC)
+ The hackers used AI to discover an unknown bug. (NYT $)
+ AI-powered hacking has exploded into an industrial-scale threat. (Guardian)
+ New tools are simplifying online crime. (MIT Technology Review)

2 OpenAI just launched its answer to Claude Mythos
Daybreak patches vulnerabilities before attackers find them. (The Verge)
+ Sam Altman said it will “continuously secure software.” (Gizmodo)
+ It will rival Anthropic’s Claude Mythos, which arrived a month ago. (BBC)
+ OpenAI is allowing wider access to its cyber models than Anthropic. (CNBC)

3 Trump is heading to China to spread the gospel of American tech
While taking cues from Beijing’s more stringent approach. (Guardian)
+ But investors want Trump and Xi to stay out of AI’s way. (Reuters $)
+ Elon Musk and Tim Cook are joining him on the trip this week. (BBC)

4 Ilya Sutskever has testified on Sam Altman’s “pattern of lying”
OpenAI co-founder Sutskever took the stand in the Altman v. Musk trial. (BI)
+ He said he spent a year gathering proof of Altman’s dishonesty. (Reuters $)
+ But he also added to OpenAI’s defense. (Wired $)
+ While Satya Nadella called attempts to remove Altman “amateur city.” (FT $)
+ Here’s what happened last week in the trial. (MIT Technology Review)

5 A new hantavirus vaccine is in the works
Moderna and Korea University are developing an mRNA vaccine. (Wired $)
+ Here’s what you need to know about the cruise ship outbreak. (MIT Technology Review)

6 Texas has sued Netflix over alleged data harvesting and “addictive” design
AG Ken Paxton accuses Netflix of secretly collecting and selling user data. (Quartz)
+ And spying on children while deliberately fostering addiction. (Guardian)

7 A data center guzzled 30 million gallons of water—and no one noticed
The curious case serves as a warning for other data center projects. (Ars Technica)

8 Europe is reportedly selling spyware to human rights abusers
EU states allegedly sold the tech to countries violating rights. (Bloomberg $)

9 The US government’s AI vetting announcement has mysteriously vanished
It had detailed a security test agreement with Google, xAI, and Microsoft. (Gizmodo)

10 Amazon staff are using AI for pointless tasks just to inflate usage scores
In a bid to impress managers. (FT $)
+ An AI expert says we should stop using AI so much. (MIT Technology Review)

Quote of the day

“This is like the cheating husband complaining about the cheating wife.”

—Anupam Chander, a professor of law and technology at Georgetown Law School, tells the New York Times that Elon Musk’s hypocrisy over OpenAI becoming a for-profit company will undermine his courtroom battle with Sam Altman.

One More Thing

STUART BRADFORD

How sounds can turn us on to the wonders of the universe

For decades, astronomy has relied on visual information to make sense of the cosmos: images, charts, and graphs. Now, some researchers are trying something different: listening to the universe. Using sonification, the process of turning information into sound, they’re helping blind and visually impaired researchers explore the cosmos—and even uncover patterns that might otherwise go unnoticed. The approach is spreading beyond astronomy into fields like climate science, navigation, and education. Discover how sound could make science more accessible—and even more revealing. —Corey S. Powell

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ This musical mashup beautifully blends LCD Soundsystem with Twin Peaks.
+ Match your speculative ideas to sci-fi stories with the Extrapolated Futures Archive.
+ A live-action/animation hybrid Coyote vs. ACME is coming soon—and the first trailer just dropped.
+ Want to surf elsewhere in the galaxy? Here’s what it would be like to catch waves on distant planets.

