
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing

arXiv:2502.20592v3 Announce Type: replace

Abstract: Recent advances in test-time scaling have shown promising results in improving Large Language Model (LLM) performance through strategic computation allocation during inference. While this approach has demonstrated strong improvements in logical and mathematical reasoning tasks, its application to natural language generation (NLG), particularly summarization, remains unexplored. Multi-Document Summarization (MDS), a fundamental task in NLG, presents unique challenges by requiring models to extract and synthesize essential information across multiple lengthy documents. Unlike reasoning tasks, MDS demands a more nuanced approach to prompt design and ensemble methods, as no single “best” prompt can satisfy diverse summarization requirements. We propose a novel framework leveraging test-time scaling for MDS. Our approach employs prompt ensemble techniques to generate multiple candidate summaries using various prompts, then combines them with an aggregator to produce a refined summary. To evaluate our method effectively, we also introduce two new LLM-based metrics: the Consistency-Aware Preference (CAP) score and the LLM Atom-Content-Unit (LLM-ACU) score, which assess summary quality while addressing the positional bias inherent in traditional automatic evaluation. Our extensive experiments demonstrate that this framework significantly enhances summary quality while also revealing the practical scaling boundaries of MDS tasks.
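The two-stage pattern the abstract describes (a prompt ensemble producing candidate summaries, then an aggregator fusing them) can be sketched in a few lines. The code below is a minimal illustration of that general pattern, not the authors' implementation: `call_llm` is a hypothetical stand-in for any chat-completion API, and the three prompts are invented examples rather than the paper's actual prompt set.

```python
# Minimal sketch of prompt-ensemble summarization with an aggregator.
# call_llm() is a hypothetical stand-in for any chat-completion API;
# the prompts below are illustrative, not the paper's prompt set.

from typing import Callable, List

PROMPTS = [
    "Summarize the key facts shared across all of these documents:\n{docs}",
    "Write a concise summary emphasizing points of disagreement:\n{docs}",
    "Summarize these documents for a non-expert reader:\n{docs}",
]

def ensemble_summarize(docs: List[str], call_llm: Callable[[str], str]) -> str:
    joined = "\n\n---\n\n".join(docs)
    # Stage 1: generate one candidate summary per prompt in the ensemble.
    candidates = [call_llm(p.format(docs=joined)) for p in PROMPTS]
    # Stage 2: an aggregator call fuses the candidates into one summary.
    numbered = "\n\n".join(
        f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates)
    )
    aggregate_prompt = (
        "Merge the candidate summaries below into a single faithful, "
        "non-redundant summary of the source documents.\n\n" + numbered
    )
    return call_llm(aggregate_prompt)
```

The point of the design is that no single prompt is trusted to be "best": diverse prompts surface different content, and the aggregator spends extra test-time compute reconciling them.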


By putting AI into everything, Google wants to make it invisible 

If you want to know where AI is headed, this year’s Google I/O has you covered. The company’s annual showcase of next-gen products, which kicked off yesterday, has all of the pomp and pizzazz, the sizzle reels and celebrity walk-ons, that you’d expect from a multimillion-dollar marketing event. But it also shows us just how fast this still-experimental technology is being subsumed into a lineup designed to sell phones and subscription tiers. Never before have I seen this thing we call artificial intelligence appear so normal.

Yes, Google’s lineup of consumer-facing products is the slickest on offer. The firm is bundling most of its multimodal models into its Gemini app, including the new Imagen 4 image generator and the new Veo 3 video generator. That means you can now access Google’s full range of generative models via a single chatbot. It also announced Gemini Live, a feature that lets you share your phone’s screen or your camera’s view with the chatbot and ask it about what it can see. Those features were previously seen only in demos of Project Astra, a “universal AI assistant” that Google DeepMind is working on. Now Google is inching toward putting Project Astra into the hands of anyone with a smartphone.

Google is also rolling out AI Mode, an LLM-powered front end to search. This can now pull in personal information from Gmail or Google Docs to tailor searches to users. It will include Deep Search, which can break a query down into hundreds of individual searches and then summarize the results; a version of Project Mariner, Google DeepMind’s browser-using agent; and Search Live, which lets you hold up your camera and ask it what it sees.

This is the new frontier. It’s no longer about who has the most powerful models, but who can spin them into the best products. OpenAI’s ChatGPT includes many features similar to Gemini’s. But with its existing ecosystem of consumer services and billions of existing users, Google has a clear advantage. Power users who want access to the latest versions of everything on display can now sign up for Google AI Ultra for $250 a month.

When OpenAI released ChatGPT in late 2022, Google was caught on the back foot and had to jump into a higher gear to catch up. With this year’s product lineup, it feels like Google has stuck the landing. On a preview call, Google’s CEO Sundar Pichai claimed that AI Overviews, a precursor to AI Mode that provides LLM-generated summaries of search results, had turned out to be popular with hundreds of millions of users. He speculated that many of them may not even know (or care) whether or not they were using AI; it was just a cool new feature.

Google I/O gives a broader glimpse of that future, one where AI is invisible. “More intelligence is available, for everyone, everywhere,” Pichai told his audience. I think we are expected to marvel. But by putting AI in everything, Google is turning AI into a technology we won’t notice and may not even bother to name.


Sampling Without Data is Now Scalable: Meta AI Releases Adjoint Sampling for Reward-Driven Generative Modeling

Data Scarcity in Generative Modeling

Generative models traditionally rely on large, high-quality datasets to produce samples that replicate the underlying data distribution. However, in fields like molecular modeling or physics-based inference, acquiring such data can be computationally infeasible or even impossible. Instead of labeled data, only a scalar reward, typically derived from a complex energy function, is available to judge the quality of generated samples. This presents a significant challenge: how can one train generative models effectively without direct supervision from data?

Meta AI Introduces Adjoint Sampling, a New Learning Algorithm Based on Scalar Rewards

Meta AI tackles this challenge with Adjoint Sampling, a novel learning algorithm designed for training generative models using only scalar reward signals. Built on the theoretical framework of stochastic optimal control (SOC), Adjoint Sampling reframes the training process as an optimization task over a controlled diffusion process. Unlike standard generative models, it does not require explicit data. Instead, it learns to generate high-quality samples by iteratively refining them using a reward function, often derived from physical or chemical energy models.

Adjoint Sampling excels in scenarios where only an unnormalized energy function is accessible. It produces samples that align with the target distribution defined by this energy, bypassing the need for corrective methods like importance sampling or MCMC, which are computationally intensive.

Source: https://arxiv.org/abs/2504.11713

Technical Details

The foundation of Adjoint Sampling is a stochastic differential equation (SDE) that models how sample trajectories evolve. The algorithm learns a control drift u(x, t) such that the final state of these trajectories approximates a desired distribution (e.g., Boltzmann). A key innovation is its use of Reciprocal Adjoint Matching (RAM), a loss function that enables gradient-based updates using only the initial and final states of sample trajectories. This sidesteps the need to backpropagate through the entire diffusion path, greatly improving computational efficiency.

By sampling from a known base process and conditioning on terminal states, Adjoint Sampling constructs a replay buffer of samples and gradients, allowing multiple optimization steps per sample. This on-policy training method provides scalability unmatched by previous approaches, making it suitable for high-dimensional problems like molecular conformer generation.

Moreover, Adjoint Sampling supports geometric symmetries and periodic boundary conditions, enabling models to respect molecular invariances like rotation, translation, and torsion. These features are crucial for physically meaningful generative tasks in chemistry and physics.
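Since the description above stays at a high level, here is a schematic PyTorch sketch of the general setup under stated assumptions: a small network plays the control drift u(x, t), an Euler-Maruyama loop simulates the controlled SDE dx = u(x, t) dt + σ dW, and only endpoint pairs are kept in a replay buffer. The toy energy function and the placeholder regression loss are invented for illustration; this does not reproduce the paper's actual Reciprocal Adjoint Matching objective.

```python
# Schematic sketch of a controlled-diffusion sampler trained from an
# energy function only. NOT Meta's implementation: the energy and the
# loss below are illustrative placeholders.

import torch
import torch.nn as nn

class ControlNet(nn.Module):
    """Learned control drift u_theta(x, t)."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))

@torch.no_grad()
def rollout(u, dim, n, steps=100, sigma=1.0):
    """Euler-Maruyama simulation of dx = u dt + sigma dW on t in [0, 1].
    Only the endpoints (x0, xT) are returned, never the full path."""
    dt = 1.0 / steps
    x0 = torch.randn(n, dim)  # known base distribution
    x = x0.clone()
    for k in range(steps):
        t = torch.tensor([[k * dt]])
        x = x + u(x, t) * dt + sigma * dt ** 0.5 * torch.randn_like(x)
    return x0, x

dim = 2
u = ControlNet(dim)
opt = torch.optim.Adam(u.parameters(), lr=1e-3)

def energy(x):
    # Toy ring-shaped energy, invented purely for illustration.
    return 0.5 * (x.norm(dim=-1) - 2.0) ** 2

# Endpoint-only replay buffer: each rollout is stored as (x0, xT) pairs,
# so gradient steps can reuse simulations without backprop through paths.
buffer = [rollout(u, dim, n=256) for _ in range(4)]
for x0, xT in buffer:  # x0 kept because the full objective uses both ends
    xT = xT.requires_grad_(True)
    grad_E = torch.autograd.grad(energy(xT).sum(), xT)[0]
    t1 = torch.ones(1, 1)
    # Placeholder regression: steer the terminal-time drift toward the
    # negative energy gradient. The paper's Reciprocal Adjoint Matching
    # loss differs; this only shows the endpoint-only training structure.
    loss = ((u(xT.detach(), t1) + grad_E.detach()) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The design point the sketch illustrates is the source of the claimed efficiency: because the loss consumes only trajectory endpoints and energy gradients, each stored sample supports multiple optimization steps with no backpropagation through the diffusion path.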
Performance Insights and Benchmark Results

Adjoint Sampling achieves state-of-the-art results in both synthetic and real-world tasks. On synthetic benchmarks such as the Double-Well (DW-4) and Lennard-Jones (LJ-13 and LJ-55) potentials, it significantly outperforms baselines like DDS and PIS, especially in energy efficiency. For example, where DDS and PIS require 1,000 energy evaluations per gradient update, Adjoint Sampling uses only three, with similar or better performance in Wasserstein distance and effective sample size (ESS).

In a practical setting, the algorithm was evaluated on large-scale molecular conformer generation using the eSEN energy model trained on the SPICE-MACE-OFF dataset. Adjoint Sampling, especially its Cartesian variant with pretraining, achieved up to 96.4% recall and 0.60 Å mean RMSD, surpassing RDKit ETKDG, a widely used chemistry-based baseline, across all metrics. The method generalizes well to the GEOM-DRUGS dataset, showing substantial improvements in recall while maintaining competitive precision. The algorithm’s ability to explore the configuration space broadly, aided by its stochastic initialization and reward-based learning, results in greater conformer diversity, which is critical for drug discovery and molecular design.

Conclusion: A Scalable Path Forward for Reward-Driven Generative Models

Adjoint Sampling represents a major step forward in generative modeling without data. By leveraging scalar reward signals and an efficient on-policy training method grounded in stochastic control, it enables scalable training of diffusion-based samplers with minimal energy evaluations. Its integration of geometric symmetries and its ability to generalize across diverse molecular structures position it as a foundational tool in computational chemistry and beyond.


Google just leapfrogged every competitor with mind-blowing AI that can think deeper, shop smarter, and create videos with dialogue

Google unveiled major AI advancements at I/O 2025, including Gemini 2.5 with Deep Think, AI Mode in Search, Veo 3 for video with audio, and a $249 Ultra plan aimed at power users and enterprises.


Everything you need to know about estimating AI’s energy and emissions burden

When we set out to write a story on the best available estimates for AI’s energy and emissions burden, we knew there would be caveats and uncertainties to these numbers. But, we quickly discovered, the caveats are the story too.

This story is a part of MIT Technology Review’s series “Power Hungry: AI and our energy future,” on the energy demands and carbon costs of the artificial-intelligence revolution.

Measuring the energy used by an AI model is not like evaluating a car’s fuel economy or an appliance’s energy rating. There’s no agreed-upon method or public database of values. There are no regulators who enforce standards, and consumers don’t get the chance to evaluate one model against another.

Despite the fact that billions of dollars are being poured into reshaping energy infrastructure around the needs of AI, no one has settled on a way to quantify AI’s energy usage. Worse, companies are generally unwilling to disclose their own piece of the puzzle. There are also limitations to estimating the emissions associated with that energy demand, because the grid hosts a complicated, ever-changing mix of energy sources.

It’s a big mess, basically. So, that said, here are the many variables, assumptions, and caveats that we used to calculate the consequences of an AI query. (You can see the full results of our investigation here.)

Measuring the energy a model uses

Companies like OpenAI, dealing in “closed-source” models, generally offer access to their systems through an interface where you input a question and receive an answer. What happens in between (which data center in the world processes your request, the energy it takes to do so, and the carbon intensity of the energy sources used) remains a secret, knowable only to the companies. There are few incentives for them to release this information, and so far, most have not.

That’s why, for our analysis, we looked at open-source models. They serve as a very imperfect proxy, but the best one we have. (OpenAI, Microsoft, and Google declined to share specifics on how much energy their closed-source models use.)

The best resources for measuring the energy consumption of open-source AI models are AI Energy Score, ML.Energy, and MLPerf Power. The team behind ML.Energy assisted us with our text and image model calculations, and the team behind AI Energy Score helped with our video model calculations.

Text models

AI models use up energy in two phases: when they initially learn from vast amounts of data, called training, and when they respond to queries, called inference. When ChatGPT was launched a few years ago, training was the focus, as tech companies raced to keep up and build ever-bigger models. But now, inference is where the most energy is used.

The most accurate way to understand how much energy an AI model uses in the inference stage is to directly measure the amount of electricity used by the server handling the request. Servers contain all sorts of components: powerful chips called GPUs that do the bulk of the computing, other chips called CPUs, fans to keep everything cool, and more. Researchers typically measure the amount of power the GPU draws and estimate the rest (more on this shortly).

To do this, we turned to PhD candidate Jae-Won Chung and associate professor Mosharaf Chowdhury at the University of Michigan, who lead the ML.Energy project. Once we collected figures for different models’ GPU energy use from their team, we had to estimate how much energy is used for other processes, like cooling. We examined research literature, including a 2024 paper from Microsoft, to understand how much of a server’s total energy demand GPUs are responsible for. It turns out to be about half. So we took the team’s GPU energy estimate and doubled it to get a sense of total energy demands.
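To make that measure-the-GPU-and-estimate-the-rest approach concrete, here is a minimal sketch, assuming an NVIDIA GPU and the pynvml bindings; `run_inference` is a hypothetical stand-in for whatever model call is being measured, and the final doubling applies the roughly 50% rule of thumb described above.

```python
# Minimal sketch: sample GPU power draw while a model call runs, then
# integrate power over time to estimate energy. Assumes an NVIDIA GPU
# and the pynvml bindings; run_inference() is a hypothetical stand-in.

import time
import threading
import pynvml

def measure_gpu_energy(run_inference, interval_s=0.05):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    samples = []  # (timestamp in seconds, power in watts)
    done = threading.Event()

    def poll():
        while not done.is_set():
            mw = pynvml.nvmlDeviceGetPowerUsage(handle)  # milliwatts
            samples.append((time.monotonic(), mw / 1000.0))
            time.sleep(interval_s)

    t = threading.Thread(target=poll)
    t.start()
    run_inference()          # the model call being measured
    done.set()
    t.join()
    pynvml.nvmlShutdown()

    # Trapezoidal integration of power over time -> energy in joules.
    gpu_joules = sum(
        0.5 * (p0 + p1) * (t1 - t0)
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )
    # Rough whole-server estimate: GPUs account for about half of server
    # energy per the research cited above, so double the GPU figure.
    return gpu_joules, 2.0 * gpu_joules
```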
The ML.Energy team uses a batch of 500 prompts from a larger dataset to test models. The hardware is kept the same throughout; the GPU is a popular Nvidia chip called the H100. We decided to focus on models of three sizes from the Meta Llama family: small (8 billion parameters), medium (70 billion), and large (405 billion). We also identified a selection of prompts to test. We compared these with the averages for the entire batch of 500 prompts.

Image models

Stable Diffusion 3 from Stability AI is one of the most commonly used open-source image-generating models, so we made it our focus. Though we tested multiple sizes of the text-based Meta Llama model, we focused on one of the most popular sizes of Stable Diffusion 3, with 2 billion parameters.

The team uses a dataset of example prompts to test a model’s energy requirements. Though the energy used by large language models is determined partially by the prompt, this isn’t true for diffusion models. Diffusion models can be programmed to go through a prescribed number of “denoising steps” when they generate an image or video, with each step being an iteration of the algorithm that adds more detail to the image. For a given step count and model, all images generated have the same energy footprint. The more steps, the higher quality the end result, but the more energy used. The number of steps varies by model and application, but 25 is pretty common, and that’s what we used for our standard quality. For higher quality, we used 50 steps.

We mentioned that GPUs are usually responsible for about half of the energy demands of large language model requests. There is not sufficient research to know how this changes for diffusion models that generate images and videos. In the absence of a better estimate, and after consulting with researchers, we opted to stick with this 50% rule of thumb for images and videos too.

Video models

Chung and Chowdhury do test video models, but only ones that generate short, low-quality GIFs. We don’t think the videos these models produce mirror the fidelity of the AI-generated video that many people are used to seeing. Instead, we turned to Sasha Luccioni and the team behind AI Energy Score.
