YouZum


AI, Committee, News, Uncategorized

Technology Innovation Institute TII Releases Falcon-H1: Hybrid Transformer-SSM Language Models for Scalable, Multilingual, and Long-Context Understanding

Addressing Architectural Trade-offs in Language Models

As language models scale, balancing expressivity, efficiency, and adaptability becomes increasingly challenging. Transformer architectures dominate thanks to their strong performance across a wide range of tasks, but they are computationally expensive, particularly in long-context scenarios, because of the quadratic complexity of self-attention. Structured State Space Models (SSMs), on the other hand, offer improved efficiency and linear scaling, yet often lack the nuanced sequence modeling required for complex language understanding. A combined architecture that leverages the strengths of both approaches is needed to support diverse applications across environments.

Introducing Falcon-H1: A Hybrid Architecture

The Falcon-H1 series, released by the Technology Innovation Institute (TII), introduces a hybrid family of language models that combine Transformer attention mechanisms with Mamba2-based SSM components. This architecture is designed to improve computational efficiency while maintaining competitive performance across tasks requiring deep contextual understanding. Falcon-H1 covers a wide parameter range, from 0.5B to 34B, catering to use cases from resource-constrained deployments to large-scale distributed inference. The design aims to address common bottlenecks in LLM deployment: memory efficiency, scalability, multilingual support, and the ability to handle extended input sequences.

Source: https://falcon-lm.github.io/blog/falcon-h1/

Architectural Details and Design Objectives

Falcon-H1 adopts a parallel structure in which attention heads and Mamba2 SSMs operate side by side. This design allows each mechanism to contribute independently to sequence modeling: attention heads specialize in capturing token-level dependencies, while SSM components support efficient long-range information retention. The series supports a context length of up to 256K tokens, which is particularly useful for applications in document summarization, retrieval-augmented generation, and multi-turn dialogue systems. Model training incorporates a customized maximal update parametrization (μP) recipe and optimized data pipelines, allowing for stable and efficient training across model sizes.

The models are trained with a focus on multilingual capabilities. The architecture natively handles 18 languages, including English, Chinese, Arabic, Hindi, and French, and the framework is extensible to over 100 languages, supporting localization and region-specific model adaptation.

Empirical Results and Comparative Evaluation

Despite relatively modest parameter counts, Falcon-H1 models demonstrate strong empirical performance: Falcon-H1-0.5B achieves results comparable to 7B-parameter models released in 2024; Falcon-H1-1.5B-Deep performs on par with leading 7B to 10B Transformer models; and Falcon-H1-34B matches or exceeds the performance of models such as Qwen3-32B, Llama4-Scout-17B/109B, and Gemma3-27B across several benchmarks. Evaluations emphasize both general-purpose language understanding and multilingual benchmarks. Notably, the models achieve strong performance across both high-resource and low-resource languages without requiring excessive fine-tuning or additional adaptation layers.

Source: https://falcon-lm.github.io/blog/falcon-h1/

Deployment and inference are supported through integration with open-source tools such as Hugging Face Transformers.
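To picture the parallel attention-plus-SSM layout, here is a minimal, illustrative sketch. It is not Falcon-H1's actual block: the SSM branch is a toy diagonal linear recurrence standing in for the Mamba2 mixer, and the two branches are simply summed into the residual stream, whereas the real model's gating and channel allocation follow the paper.

```python
# Illustrative "parallel hybrid" block: attention and an SSM branch process the
# same input side by side. The ToyDiagonalSSM is a stand-in for Mamba2, used
# only to show the dataflow; real SSM layers use gating and a parallel scan.
import torch
import torch.nn as nn

class ToyDiagonalSSM(nn.Module):
    """Toy linear state space: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    def __init__(self, dim):
        super().__init__()
        self.log_a = nn.Parameter(torch.zeros(dim))  # per-channel decay, squashed to (0, 1)
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Parameter(torch.ones(dim))

    def forward(self, x):                      # x: (batch, seq, dim)
        a = torch.sigmoid(self.log_a)
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):              # sequential scan for clarity
            h = a * h + self.b * x[:, t]
            outs.append(self.c * h)
        return torch.stack(outs, dim=1)

class ParallelHybridBlock(nn.Module):
    def __init__(self, dim, n_heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ssm = ToyDiagonalSSM(dim)

    def forward(self, x):
        u = self.norm(x)
        attn_out, _ = self.attn(u, u, u, need_weights=False)  # token-level dependencies
        ssm_out = self.ssm(u)                                 # long-range recurrence
        return x + attn_out + ssm_out                         # residual sum of both branches

x = torch.randn(2, 16, 64)
print(ParallelHybridBlock(64, 8)(x).shape)  # torch.Size([2, 16, 64])
```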
FlashAttention-2 compatibility further reduces memory usage during inference, offering an attractive efficiency-performance balance for enterprise use.

Conclusion

Falcon-H1 represents a methodical effort to refine language model architecture by integrating complementary mechanisms, attention and SSMs, within a unified framework. By doing so, it addresses key limitations in both long-context processing and scaling efficiency. The model family provides a range of options for practitioners, from lightweight variants suitable for edge deployment to high-capacity configurations for server-side applications. Through its multilingual coverage, long-context capabilities, and architectural flexibility, Falcon-H1 offers a technically sound foundation for research and production use cases that demand performance without compromising efficiency or accessibility.

Check out the Official Release, Models on Hugging Face and GitHub Page. All credit for this research goes to the researchers of this project.
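For readers who want to try the models, a minimal loading sketch with Hugging Face Transformers follows. The repository id below is an assumption for illustration; check the Falcon-H1 collection on Hugging Face for exact model names, and note that device_map="auto" requires the accelerate package.

```python
# Hedged usage sketch: loading a Falcon-H1 checkpoint via Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-1.5B-Instruct"   # assumed repo id; verify on Hugging Face
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tok("Summarize the Falcon-H1 architecture in one sentence.",
             return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```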

Technology Innovation Institute TII Releases Falcon-H1: Hybrid Transformer-SSM Language Models for Scalable, Multilingual, and Long-Context Understanding Read the article »

AI, Committee, News, Uncategorized

Three takeaways about AI’s energy use and climate impacts

This week, we published Power Hungry, a package all about AI and energy. At the center of this package is the most comprehensive look yet at AI’s growing power demand, if I do say so myself. This data-heavy story is the result of over six months of reporting by me and my colleague James O’Donnell (and the work of many others on our team). Over that time, with the help of leading researchers, we quantified the energy and emissions impacts of individual queries to AI models and tallied what it all adds up to, both right now and for the years ahead.

There’s a lot of data to dig through, and I hope you’ll take the time to explore the whole story. But in the meantime, here are three of my biggest takeaways from working on this project.

1. The energy demands of AI are anything but constant.

If you’ve heard estimates of AI’s toll, it’s probably a single number associated with a query, likely to OpenAI’s ChatGPT. One popular estimate is that writing an email with ChatGPT uses 500 milliliters (or roughly a bottle) of water. But as we started reporting, I was surprised to learn just how much the details of a query can affect its energy demand. No two queries are the same, for several reasons, including their complexity and the particulars of the model being queried.

One key caveat here is that we don’t know much about “closed source” models; for these, companies hold back the details of how they work. (OpenAI’s ChatGPT and Google’s Gemini are examples.) Instead, we worked with researchers who ran open-source AI models, for which the source code is publicly available. With open-source models, it’s possible to directly measure the energy used to respond to a query rather than just guess: the researchers generated text, images, and video and measured the energy the underlying chips required to perform each task.

Even just within text responses, there was a pretty large range of energy needs. A complicated travel itinerary consumed nearly 10 times as much energy as a simple request for a few jokes, for example. An even bigger difference comes from the size of the model used: larger models with more parameters used up to 70 times more energy than smaller ones for the same prompts. As you might imagine, there’s also a big difference between text, images, and video. Videos generally took hundreds of times more energy to generate than text responses.

2. What’s powering the grid will greatly affect the climate toll of AI’s energy use.

As the resident climate reporter on this project, I was excited to take the expected energy toll and translate it into an expected emissions burden. Powering a data center with a nuclear reactor or a whole bunch of solar panels and batteries will not affect our planet the same way as burning mountains of coal. To quantify this idea, we used a figure called carbon intensity, a measure of how dirty a unit of electricity is on a given grid.

We found that the exact same query, with the exact same energy demand, will have a very different climate impact depending on what the data center is powered by, and that depends on the location and the time of day. For example, querying a data center in West Virginia could cause nearly twice the emissions of querying one in California, according to calculations based on average data from 2024.
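The carbon-intensity arithmetic itself is simple: emissions per query are the query’s energy multiplied by the grid’s carbon intensity. The sketch below shows only that multiplication; the energy figure and the two intensities are made-up placeholders, not the package’s measured values.

```python
# Illustrative only: same query, same energy, different grid = different emissions.
def query_emissions_g(energy_wh: float, intensity_g_per_kwh: float) -> float:
    """Grams of CO2 for one query: energy (Wh) x carbon intensity (gCO2/kWh)."""
    return (energy_wh / 1000.0) * intensity_g_per_kwh

energy_wh = 3.0  # assumed energy for one text query, in watt-hours
for grid, intensity in [("grid A (coal-heavy)", 700.0), ("grid B (cleaner mix)", 350.0)]:
    print(f"{grid}: {query_emissions_g(energy_wh, intensity):.2f} g CO2")
```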
This point shows why it matters where tech giants are building data centers, what the grid looks like in their chosen locations, and how that might change with more demand from the new infrastructure.

3. There is still so much that we don’t know when it comes to AI and energy.

Our reporting resulted in estimates that are some of the most specific and comprehensive out there. But ultimately, we still have no idea what many of the biggest, most influential models are adding up to in terms of energy and emissions. None of the companies we reached out to were willing to provide numbers during our reporting. Not one.

Adding up our estimates can only go so far, in part because AI is increasingly everywhere. While today you might generally have to go to a dedicated site and type in questions, in the future AI could be stitched into the fabric of our interactions with technology. (See my colleague Will Douglas Heaven’s new story on Google’s I/O showcase: “By putting AI into everything, Google wants to make it invisible.”) AI could be one of the major forces that shape our society, our work, and our power grid. Knowing more about its consequences could be crucial to planning our future.

To dig into our reporting, give the main story a read. And if you’re looking for more details on how we came up with our numbers, you can check out this behind-the-scenes piece. There are also some great related stories in this package, including one from James Temple on the data center boom in the Nevada desert, one from David Rotman about how AI’s rise could entrench natural gas, and one from Will Douglas Heaven on a few technical innovations that could help make AI more efficient. Oh, and I also have a piece on why nuclear isn’t the easy answer some think it is. Find them, and the rest of the stories in the package, here.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

Three takeaways about AI’s energy use and climate impacts Read the article »

AI, Committee, News, Uncategorized

A new atomic clock in space could help us measure elevations on Earth

In 2003, engineers from Germany and Switzerland began building a bridge across the Rhine River simultaneously from both sides. Months into construction, they found that the two sides did not meet: the German side hovered 54 centimeters above the Swiss side. The misalignment occurred because the German engineers had measured elevation with a historic level of the North Sea as their zero point, while the Swiss ones had used the Mediterranean Sea, which was 27 centimeters lower. We may speak colloquially of elevations with respect to “sea level,” but Earth’s seas are actually not level. “The sea level is varying from location to location,” says Laura Sanchez, a geodesist at the Technical University of Munich in Germany. (Geodesists study our planet’s shape, orientation, and gravitational field.) While the two teams knew about the 27-centimeter difference, they mixed up which side was higher. Ultimately, Germany lowered its side to complete the bridge.

To prevent such costly construction errors, in 2015 scientists in the International Association of Geodesy voted to adopt the International Height Reference Frame, or IHRF, a worldwide standard for elevation. It’s the third-dimensional counterpart to latitude and longitude, says Sanchez, who helps coordinate the standardization effort.

Now, a decade after its adoption, geodesists are looking to update the standard, using the most precise clock ever to fly in space. That clock, called the Atomic Clock Ensemble in Space, or ACES, launched into orbit from Florida last month, bound for the International Space Station. ACES, which was built by the European Space Agency, consists of two connected atomic clocks, one containing cesium atoms and the other containing hydrogen, combined to produce a single set of ticks with higher precision than either clock alone.

Pendulum clocks are accurate to only about a second per day, as the rate at which a pendulum swings can vary with humidity, temperature, and the weight of extra dust. Atomic clocks in current GPS satellites will lose or gain a second on average every 3,000 years. ACES, on the other hand, “will not lose or gain a second in 300 million years,” says Luigi Cacciapuoti, an ESA physicist who helped build and launch the device. (In 2022, China installed a potentially stabler clock on its space station, but the Chinese government has not publicly shared the clock’s performance after launch, according to Cacciapuoti.)

From space, ACES will link to some of the most accurate clocks on Earth to create a synchronized clock network, which will support its main purpose: to perform tests of fundamental physics. But it’s of special interest to geodesists because it can be used to make gravitational measurements that will help establish a more precise zero point from which to measure elevation across the world.

Alignment over this “zero point” (basically where you stick the end of the tape measure to measure elevation) is important for international collaboration. It makes it easier, for example, to monitor and compare sea-level changes around the world. It is especially useful for building infrastructure involving flowing water, such as dams and canals. In 2020, the international height standard even resolved a long-standing dispute between China and Nepal over Mount Everest’s height. For years, China said the mountain was 8,844.43 meters; Nepal measured it at 8,848. Using the IHRF, the two countries finally agreed that the mountain was 8,848.86 meters.
A worker performs tests on ACES in a cleanroom at the Kennedy Space Center in Florida. (Image: ESA-T. Peignier)

To create a standard zero point, geodesists create a model of Earth known as a geoid. Every point on the surface of this lumpy, potato-shaped model experiences the same gravity, which means that if you dug a canal at the height of the geoid, the water within the canal would be level and would not flow. Distance from the geoid establishes a global system for altitude. However, the current model lacks precision, particularly in Africa and South America, says Sanchez.

Today’s geoid has been built using instruments that directly measure Earth’s gravity. These have been carried on satellites, which excel at getting a global but low-resolution view, and have also been used to get finer details via expensive ground- and airplane-based surveys. But geodesists have not had the funding to survey Africa and South America as extensively as other parts of the world, particularly in difficult terrain such as the Amazon rainforest and the Sahara Desert.

To understand the discrepancy in precision, imagine a bridge that spans Africa from the Mediterranean coast to Cape Town, South Africa. If it’s built using the current geoid, the two ends of the bridge will be misaligned by tens of centimeters. In comparison, you’d be off by at most five centimeters if you were building a bridge spanning North America.

To improve the geoid’s precision, geodesists want to create a worldwide network of clocks, synchronized from space. The idea works according to Einstein’s theory of general relativity, which states that the stronger the gravitational field, the more slowly time passes. The 2014 sci-fi movie Interstellar illustrates an extreme version of this so-called time dilation: two astronauts spend a few hours in extreme gravity near a black hole and return to a shipmate who has aged more than two decades. Similarly, Earth’s gravity grows weaker the higher in elevation you are. Your feet, for example, experience slightly stronger gravity than your head when you’re standing. Assuming you live to be about 80 years old, over a lifetime your head will age tens of billionths of a second more than your feet.

A clock network would allow geodesists to compare the ticking of clocks all over the world. They could then use the variations in time to map Earth’s gravitational field much more precisely, and consequently create a more precise geoid. The most accurate clocks today are precise enough to measure variations in time that map onto centimeter-level differences in elevation. “We want to have the accuracy level at the one-centimeter or sub-centimeter level,” says Jürgen Müller, a geodesist at Leibniz University Hannover in Germany. Specifically, geodesists would
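A quick back-of-the-envelope check of the centimeter claim: near Earth’s surface, two clocks separated by a height dh differ in rate by a fraction of roughly g*dh/c². The constants below are standard; the one-part-in-10^18 clock stability is an assumption chosen to represent today’s best optical clocks.

```python
# Gravitational time dilation near Earth's surface: fractional rate difference
# between two clocks separated by height dh is ~ g*dh/c^2.
g = 9.81      # m/s^2, surface gravity
c = 2.998e8   # m/s, speed of light

shift_per_meter = g / c**2                      # ~1.1e-16 per meter of height
print(f"fractional shift per meter: {shift_per_meter:.2e}")

clock_stability = 1e-18                         # assumed state-of-the-art stability
resolvable_height = clock_stability / shift_per_meter
print(f"resolvable height difference: {resolvable_height * 100:.1f} cm")  # ~0.9 cm
```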

A new atomic clock in space could help us measure elevations on Earth Read the article »

AI, Committee, News, Uncategorized

SQLong: Enhanced NL2SQL for Longer Contexts with LLMs

arXiv:2502.16747v2 Announce Type: replace Abstract: Open-weight large language models (LLMs) have significantly advanced performance in the Natural Language to SQL (NL2SQL) task. However, their effectiveness diminishes when dealing with large database schemas, as the context length increases. To address this limitation, we present SQLong, a novel and efficient data augmentation framework designed to enhance LLM performance in long-context scenarios for the NL2SQL task. SQLong generates augmented datasets by extending existing database schemas with additional synthetic CREATE TABLE commands and corresponding data rows, sampled from diverse schemas in the training data. This approach effectively simulates long-context scenarios during finetuning and evaluation. Through experiments on the Spider and BIRD datasets, we demonstrate that LLMs finetuned with SQLong-augmented data significantly outperform those trained on standard datasets. These results point to SQLong’s practicality and its impact on improving NL2SQL capabilities in real-world settings with complex database schemas.
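The core augmentation idea, as described in the abstract, can be sketched in a few lines: pad an example’s schema with distractor CREATE TABLE statements sampled from other schemas until a target context length is reached. Function and parameter names below are illustrative, not from the SQLong codebase.

```python
# Hedged sketch of schema padding for long-context NL2SQL finetuning.
import random

def augment_schema(target_schema: list[str], schema_pool: list[str],
                   target_token_budget: int, rng: random.Random) -> list[str]:
    """Extend a schema (list of CREATE TABLE strings) with distractor tables."""
    augmented = list(target_schema)
    distractors = [s for s in schema_pool if s not in target_schema]
    rng.shuffle(distractors)
    # Add distractor tables until the (whitespace-token) budget is reached.
    while distractors and sum(len(s.split()) for s in augmented) < target_token_budget:
        augmented.append(distractors.pop())
    rng.shuffle(augmented)  # avoid always placing the gold tables first
    return augmented

rng = random.Random(0)
gold = ["CREATE TABLE singer (singer_id INT, name TEXT);"]
pool = [f"CREATE TABLE t{i} (id INT, val TEXT);" for i in range(200)]
print(len(augment_schema(gold, pool, target_token_budget=500, rng=rng)))
```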

SQLong: Enhanced NL2SQL for Longer Contexts with LLMs Read the article »

AI, Committee, News, Uncategorized

Forensic deepfake audio detection using segmental speech features

arXiv:2505.13847v1 Announce Type: cross Abstract: This study explores the potential of using acoustic features of segmental speech sounds to detect deepfake audio. These features are highly interpretable because of their close relationship with human articulatory processes and are expected to be more difficult for deepfake models to replicate. The results demonstrate that certain segmental features commonly used in forensic voice comparison are effective in identifying deepfakes, whereas some global features provide little value. These findings underscore the need to approach audio deepfake detection differently for forensic voice comparison and offer a new perspective on leveraging segmental features for this purpose.
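The abstract does not list the exact feature set, but vowel formants are a classic segmental feature in forensic voice comparison, so the sketch below measures F1/F2 at a vowel midpoint using parselmouth (a Python wrapper around Praat). The file path and segment times are placeholders; this is one plausible measurement, not the paper’s pipeline.

```python
# Hedged sketch: measuring formants of a hand-labeled vowel with parselmouth.
import parselmouth

snd = parselmouth.Sound("utterance.wav")       # placeholder path
formants = snd.to_formant_burg()               # Burg-method formant tracking

vowel_start, vowel_end = 0.42, 0.58            # assumed hand-labeled vowel segment (s)
midpoint = 0.5 * (vowel_start + vowel_end)
f1 = formants.get_value_at_time(1, midpoint)   # first formant (Hz)
f2 = formants.get_value_at_time(2, midpoint)   # second formant (Hz)
print(f"F1={f1:.0f} Hz, F2={f2:.0f} Hz")
```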

Forensic deepfake audio detection using segmental speech features Read the article »

AI, Committee, News, Uncategorized

Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing

arXiv:2502.20592v3 Announce Type: replace Abstract: Recent advances in test-time scaling have shown promising results in improving Large Language Model (LLM) performance through strategic computation allocation during inference. While this approach has demonstrated strong improvements in logical and mathematical reasoning tasks, its application to natural language generation (NLG), particularly summarization, remains unexplored. Multi-Document Summarization (MDS), a fundamental task in NLG, presents unique challenges by requiring models to extract and synthesize essential information across multiple lengthy documents. Unlike reasoning tasks, MDS demands a more nuanced approach to prompt design and ensemble methods, as no single “best” prompt can satisfy diverse summarization requirements. We propose a novel framework leveraging test-time scaling for MDS. Our approach employs prompt ensemble techniques to generate multiple candidate summaries using various prompts, then combines them with an aggregator to produce a refined summary. To evaluate our method effectively, we also introduce two new LLM-based metrics: the Consistency-Aware Preference (CAP) score and LLM Atom-Content-Unit (LLM-ACU) score, which assess summary quality while addressing the positional bias inherent in traditional automatic evaluation. Our extensive experiments demonstrate that this framework significantly enhances summary quality while also revealing the practical scaling boundaries of test-time scaling for MDS tasks.
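The prompt-ensemble-plus-aggregator loop from the abstract can be sketched generically. Here `call_llm` is a hypothetical stand-in for any chat-completion API, and the prompt wording is illustrative rather than the paper’s.

```python
# Minimal sketch of prompt ensembling for multi-document summarization:
# generate candidates under different prompts, then aggregate them.
from typing import Callable

def ensemble_summarize(docs: list[str], prompts: list[str],
                       call_llm: Callable[[str], str]) -> str:
    joined = "\n\n---\n\n".join(docs)
    candidates = [call_llm(f"{p}\n\nDocuments:\n{joined}") for p in prompts]
    numbered = "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates))
    aggregator_prompt = (
        "Merge the candidate summaries below into a single faithful summary, "
        "keeping facts supported by multiple candidates.\n\n" + numbered
    )
    return call_llm(aggregator_prompt)

# Usage with a dummy model, just to show the call pattern:
print(ensemble_summarize(
    ["doc one", "doc two"],
    ["Summarize concisely.", "Summarize with key entities."],
    call_llm=lambda p: f"[summary of {len(p)} chars]",
))
```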

Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing Read the article »

AI, Committee, News, Uncategorized

By putting AI into everything, Google wants to make it invisible 

If you want to know where AI is headed, this year’s Google I/O has you covered. The company’s annual showcase of next-gen products, which kicked off yesterday, has all of the pomp and pizzazz, the sizzle reels and celebrity walk-ons, that you’d expect from a multimillion-dollar marketing event. But it also shows us just how fast this still-experimental technology is being subsumed into a lineup designed to sell phones and subscription tiers. Never before have I seen this thing we call artificial intelligence appear so normal.

Yes, Google’s lineup of consumer-facing products is the slickest on offer. The firm is bundling most of its multimodal models into its Gemini app, including the new Imagen 4 image generator and the new Veo 3 video generator. That means you can now access Google’s full range of generative models via a single chatbot. It also announced Gemini Live, a feature that lets you share your phone’s screen or your camera’s view with the chatbot and ask it about what it can see. Those features were previously seen only in demos of Project Astra, a “universal AI assistant” that Google DeepMind is working on. Now Google is inching toward putting Project Astra into the hands of anyone with a smartphone.

Google is also rolling out AI Mode, an LLM-powered front end to search. This can now pull in personal information from Gmail or Google Docs to tailor searches to users. It will include Deep Search, which can break a query down into hundreds of individual searches and then summarize the results; a version of Project Mariner, Google DeepMind’s browser-using agent; and Search Live, which lets you hold up your camera and ask it what it sees.

This is the new frontier. It’s no longer about who has the most powerful models but who can spin them into the best products. OpenAI’s ChatGPT includes many features similar to Gemini’s. But with its existing ecosystem of consumer services and billions of existing users, Google has a clear advantage. Power users wanting access to the latest versions of everything on display can now sign up for Google AI Ultra for $250 a month.

When OpenAI released ChatGPT in late 2022, Google was caught on the back foot and had to jump into a higher gear to catch up. With this year’s product lineup, it feels like Google has stuck the landing. On a preview call, Google’s CEO Sundar Pichai claimed that AI Overviews, a precursor to AI Mode that provides LLM-generated summaries of search results, had turned out to be popular with hundreds of millions of users. He speculated that many of them may not even know (or care) whether they were using AI; it was just a cool new feature.

Google I/O gives a broader glimpse of that future, one where AI is invisible. “More intelligence is available, for everyone, everywhere,” Pichai told his audience. I think we are expected to marvel. But by putting AI in everything, Google is turning AI into a technology we won’t notice and may not even bother to name.

By putting AI into everything, Google wants to make it invisible Read the article »

AI, Committee, News, Uncategorized

Sampling Without Data is Now Scalable: Meta AI Releases Adjoint Sampling for Reward-Driven Generative Modeling

Data Scarcity in Generative Modeling

Generative models traditionally rely on large, high-quality datasets to produce samples that replicate the underlying data distribution. However, in fields like molecular modeling or physics-based inference, acquiring such data can be computationally infeasible or even impossible. Instead of labeled data, only a scalar reward, typically derived from a complex energy function, is available to judge the quality of generated samples. This presents a significant challenge: how can one train generative models effectively without direct supervision from data?

Meta AI Introduces Adjoint Sampling, a New Learning Algorithm Based on Scalar Rewards

Meta AI tackles this challenge with Adjoint Sampling, a novel learning algorithm designed for training generative models using only scalar reward signals. Built on the theoretical framework of stochastic optimal control (SOC), Adjoint Sampling reframes the training process as an optimization task over a controlled diffusion process. Unlike standard generative models, it does not require explicit data. Instead, it learns to generate high-quality samples by iteratively refining them using a reward function, often derived from physical or chemical energy models.

Adjoint Sampling excels in scenarios where only an unnormalized energy function is accessible. It produces samples that align with the target distribution defined by this energy, bypassing the need for corrective methods like importance sampling or MCMC, which are computationally intensive.

Source: https://arxiv.org/abs/2504.11713

Technical Details

The foundation of Adjoint Sampling is a stochastic differential equation (SDE) that models how sample trajectories evolve. The algorithm learns a control drift u(x, t) such that the final state of these trajectories approximates a desired distribution (e.g., Boltzmann). A key innovation is its use of Reciprocal Adjoint Matching (RAM), a loss function that enables gradient-based updates using only the initial and final states of sample trajectories. This sidesteps the need to backpropagate through the entire diffusion path, greatly improving computational efficiency.

By sampling from a known base process and conditioning on terminal states, Adjoint Sampling constructs a replay buffer of samples and gradients, allowing multiple optimization steps per sample. This on-policy training method provides scalability unmatched by previous approaches, making it suitable for high-dimensional problems like molecular conformer generation. Moreover, Adjoint Sampling supports geometric symmetries and periodic boundary conditions, enabling models to respect molecular invariances like rotation, translation, and torsion. These features are crucial for physically meaningful generative tasks in chemistry and physics.

Performance Insights and Benchmark Results

Adjoint Sampling achieves state-of-the-art results in both synthetic and real-world tasks. On synthetic benchmarks such as the Double-Well (DW-4) and Lennard-Jones (LJ-13 and LJ-55) potentials, it significantly outperforms baselines like DDS and PIS, especially in energy efficiency. For example, where DDS and PIS require 1000 evaluations per gradient update, Adjoint Sampling uses only three, with similar or better performance in Wasserstein distance and effective sample size (ESS).

In a practical setting, the algorithm was evaluated on large-scale molecular conformer generation using the eSEN energy model trained on the SPICE-MACE-OFF dataset.
Adjoint Sampling, especially its Cartesian variant with pretraining, achieved up to 96.4% recall and 0.60 Å mean RMSD, surpassing RDKit ETKDG, a widely used chemistry-based baseline, across all metrics. The method generalizes well to the GEOM-DRUGS dataset, showing substantial improvements in recall while maintaining competitive precision. The algorithm’s ability to explore the configuration space broadly, aided by its stochastic initialization and reward-based learning, results in greater conformer diversity, which is critical for drug discovery and molecular design.

Conclusion: A Scalable Path Forward for Reward-Driven Generative Models

Adjoint Sampling represents a major step forward in generative modeling without data. By leveraging scalar reward signals and an efficient on-policy training method grounded in stochastic control, it enables scalable training of diffusion-based samplers with minimal energy evaluations. Its integration of geometric symmetries and its ability to generalize across diverse molecular structures position it as a foundational tool in computational chemistry and beyond.

Check out the Paper, Model on Hugging Face and GitHub Page. All credit for this research goes to the researchers of this project.
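To make the controlled-SDE picture concrete, here is a toy Euler-Maruyama sketch of dX_t = u(X_t, t) dt + sigma dW_t. The drift below is hand-coded as the negative gradient of a one-dimensional double-well energy, purely to show how a control steers samples toward low-energy regions; the actual algorithm learns u with the RAM loss from initial and terminal states only, which this sketch does not implement.

```python
# Toy controlled diffusion toward a double-well energy (illustration only).
import torch

def energy(x):
    """Double-well energy with minima near x = -1 and x = +1."""
    return (x**2 - 1.0)**2

def drift(x):
    """Stand-in control drift: -grad E(x). Adjoint Sampling learns this instead."""
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(energy(x).sum(), x)
    return -grad

n_steps, dt, sigma = 500, 0.01, 0.5
x = torch.randn(2048)                 # samples from the base process
for _ in range(n_steps):              # Euler-Maruyama integration of the SDE
    x = x + drift(x) * dt + sigma * (dt ** 0.5) * torch.randn_like(x)

# Most samples should end up near one of the two wells at +/- 1.
print(f"fraction near a well: {((x.abs() - 1).abs() < 0.3).float().mean():.2f}")
```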

Sampling Without Data is Now Scalable: Meta AI Releases Adjoint Sampling for Reward-Driven Generative Modeling Read the article »
