
The Download: the desert data center boom, and how to measure Earth’s elevations

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The data center boom in the desert

In the high desert east of Reno, Nevada, construction crews are flattening the golden foothills of the Virginia Range, laying the foundations of a data center city. Google, Tract, Switch, EdgeCore, Novva, Vantage, and PowerHouse are all operating, building, or expanding huge facilities nearby. Meanwhile, Microsoft has acquired more than 225 acres of undeveloped property, and Apple is expanding its existing data center just across the Truckee River from the industrial park.

The corporate race to amass computing resources to train and run artificial intelligence models and store information in the cloud has sparked a data center boom in the desert—and it’s just far enough away from Nevada’s communities to elude wide notice and, some fear, adequate scrutiny. Read the full story.

—James Temple

This story is part of Power Hungry: AI and our energy future—our new series shining a light on the energy demands and carbon costs of the artificial intelligence revolution. Check out the rest of the package here.

A new atomic clock in space could help us measure elevations on Earth

In 2003, engineers from Germany and Switzerland began building a bridge across the Rhine River simultaneously from both sides. Months into construction, they found that the two sides did not meet. The German side hovered 54 centimeters above the Swiss one. The misalignment happened because they measured elevation from sea level differently.

To prevent such costly construction errors, in 2015 scientists in the International Association of Geodesy voted to adopt the International Height Reference Frame, or IHRF, a worldwide standard for elevation. Now, a decade after its adoption, scientists are looking to update the standard—by using the most precise clock ever to fly in space. Read the full story.

—Sophia Chen

Three takeaways about AI’s energy use and climate impacts

—Casey Crownhart

This week, we published Power Hungry, a package all about AI and energy. At the center of this package is the most comprehensive look yet at AI’s growing power demand, if I do say so myself. This data-heavy story is the result of over six months of reporting by me and my colleague James O’Donnell (and the work of many others on our team). Over that time, with the help of leading researchers, we quantified the energy and emissions impacts of individual queries to AI models and tallied what it all adds up to, both right now and for the years ahead.

There’s a lot of data to dig through, and I hope you’ll take the time to explore the whole story. But in the meantime, here are three of my biggest takeaways from working on this project. Read the full story.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

MIT Technology Review Narrated: Congress used to evaluate emerging technologies. Let’s do it again.

Artificial intelligence comes with a shimmer and a sheen of magical thinking. And if we’re not careful, politicians, employers, and other decision-makers may accept at face value the idea that machines can and should replace human judgment and discretion. One way to combat that might be resurrecting the Office of Technology Assessment, a Congressional think tank that detected lies and tested tech until it was shuttered in 1995.
This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we’re publishing each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 OpenAI is buying Jony Ive’s AI startup
The former Apple design guru will work with Sam Altman to design an entirely new range of devices. (NYT $)
+ The deal is worth a whopping $6.5 billion. (Bloomberg $)
+ Altman gave OpenAI staff a preview of its AI ‘companion’ devices. (WSJ $)
+ AI products to date have failed to set the world alight. (The Atlantic $)

2 Microsoft has blocked employee emails containing ‘Gaza’ or ‘Palestine’
Although the term ‘Israel’ does not trigger such a block. (The Verge)
+ Protest group No Azure for Apartheid has accused the company of censorship. (Fortune $)

3 DOGE needs to do its work in secret
That’s what the Trump administration is claiming to the Supreme Court, at least. (Ars Technica)
+ It’s trying to avoid being forced to hand over internal documents. (NYT $)
+ DOGE’s tech takeover threatens the safety and stability of our critical data. (MIT Technology Review)

4 US banks are racing to embrace cryptocurrency
Ahead of new stablecoin legislation. (The Information $)
+ Attendees at Trump’s crypto dinner paid over $1 million for the privilege. (NBC News)
+ Bitcoin has surged to an all-time peak yet again. (Reuters)

5 China is making huge technological leaps
Thanks to the billions it’s poured into narrowing the gap between it and the US. (WSJ $)
+ Nvidia’s CEO has branded America’s chip curbs on China ‘a failure.’ (FT $)
+ There can be no winners in a US-China AI arms race. (MIT Technology Review)

6 Disordered eating content is rife on TikTok
But a pocket of creators are dedicated to debunking the worst of it. (Wired $)

7 The US military is interested in the world’s largest aircraft
The gigantic WindRunner plane will have an 80-metre wingspan. (New Scientist $)
+ Phase two of military AI has arrived. (MIT Technology Review)

8 How AI is shaking up animation
New tools are slashing the costs of creating episodes by up to 90%. (NYT $)
+ Generative AI is reshaping South Korea’s webcomics industry. (MIT Technology Review)

9 Tesla’s Cybertruck is a flop
Sorry, Elon. (Fast Company $)
+ The vehicles’ resale value is plummeting. (The Daily Beast)

10 Google’s new AI video generator loves this terrible joke
Which appears to originate from a Reddit post. (404 Media)
+ What happened


Google DeepMind Releases Gemma 3n: A Compact, High-Efficiency Multimodal AI Model for Real-Time On-Device Use

Researchers are reimagining how models operate as demand skyrockets for faster, smarter, and more private AI on phones, tablets, and laptops. The next generation of AI isn’t just lighter and faster; it’s local. By embedding intelligence directly into devices, developers are unlocking near-instant responsiveness, slashing memory demands, and putting privacy back into users’ hands. With mobile hardware rapidly advancing, the race is on to build compact, lightning-fast models that are intelligent enough to redefine everyday digital experiences.

A major concern is delivering high-quality, multimodal intelligence within the constrained environments of mobile devices. Unlike cloud-based systems that have access to extensive computational power, on-device models must perform under strict RAM and processing limits. Multimodal AI, capable of interpreting text, images, audio, and video, typically requires large models, which most mobile devices cannot handle efficiently. Also, cloud dependency introduces latency and privacy concerns, making it essential to design models that can run locally without sacrificing performance.

Earlier models like Gemma 3 and Gemma 3 QAT attempted to bridge this gap by reducing size while maintaining performance. Designed for use on cloud or desktop GPUs, they significantly improved model efficiency. However, these models still required robust hardware and could not fully overcome mobile platforms’ memory and responsiveness constraints. Despite supporting advanced functions, they often involved compromises limiting their real-time smartphone usability.

Researchers from Google and Google DeepMind introduced Gemma 3n. The architecture behind Gemma 3n has been optimized for mobile-first deployment, targeting performance across Android and Chrome platforms. It also forms the underlying basis for the next version of Gemini Nano. The innovation represents a significant leap forward by supporting multimodal AI functionalities with a much lower memory footprint while maintaining real-time response capabilities. This marks the first open model built on this shared infrastructure, and it is made available to developers in preview, allowing immediate experimentation.

The core innovation in Gemma 3n is the application of Per-Layer Embeddings (PLE), a method that drastically reduces RAM usage. While the raw model sizes include 5 billion and 8 billion parameters, they behave with memory footprints equivalent to 2 billion and 4 billion parameter models. The dynamic memory consumption is just 2GB for the 5B model and 3GB for the 8B version. Also, it uses a nested model configuration where a 4B active memory footprint model includes a 2B submodel trained through a technique known as MatFormer. This allows developers to dynamically switch performance modes without loading separate models. Further advancements include KV cache (KVC) sharing and activation quantization, which reduce latency and increase response speed. For example, response time on mobile improved by 1.5x compared to Gemma 3 4B while maintaining better output quality.

The performance metrics achieved by Gemma 3n reinforce its suitability for mobile deployment. It excels in automatic speech recognition and translation, allowing seamless conversion of speech to translated text. On multilingual benchmarks like WMT24++ (ChrF), it scores 50.1%, highlighting its strength in Japanese, German, Korean, Spanish, and French.
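To make the nested-submodel idea more concrete, here is a minimal, illustrative sketch of MatFormer-style slicing: a feed-forward layer whose smaller submodel is a prefix slice of the full layer’s weights, so different capacity levels can be selected at inference time without loading a separate model. This is not Gemma 3n’s actual implementation; all names and sizes below are invented for illustration.

```python
# Illustrative sketch only: a nested feed-forward layer in which a cheaper
# "submodel" is obtained by using a prefix of the hidden units, in the spirit
# of MatFormer-style nested training. Not Google's implementation.
import torch
import torch.nn as nn

class NestedFFN(nn.Module):
    def __init__(self, d_model: int = 512, d_hidden_full: int = 2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden_full)
        self.down = nn.Linear(d_hidden_full, d_model)

    def forward(self, x: torch.Tensor, frac: float = 1.0) -> torch.Tensor:
        # Use only the first `frac` of the hidden units: frac=1.0 is the full
        # layer, frac=0.5 behaves like a nested, cheaper submodel.
        h = int(self.up.out_features * frac)
        hidden = torch.relu(x @ self.up.weight[:h].T + self.up.bias[:h])
        return hidden @ self.down.weight[:, :h].T + self.down.bias

x = torch.randn(1, 16, 512)
ffn = NestedFFN()
full = ffn(x, frac=1.0)   # higher quality, more compute
small = ffn(x, frac=0.5)  # cheaper mode selected at runtime, same weights
print(full.shape, small.shape)
```

Because every capacity level shares one set of weights, switching modes per layer (a rough analogue of mix’n’match) costs nothing extra in storage.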
Gemma 3n’s mix’n’match capability allows the creation of submodels optimized for various quality and latency combinations, offering developers further customization. The architecture supports interleaved inputs from different modalities (text, audio, images, and video), allowing more natural and context-rich interactions. It also performs offline, ensuring privacy and reliability even without network connectivity. Use cases include live visual and auditory feedback, context-aware content generation, and advanced voice-based applications.

Several key takeaways from the research on Gemma 3n include:

- Built through collaboration between Google, DeepMind, Qualcomm, MediaTek, and Samsung System LSI, and designed for mobile-first deployment.
- Raw model sizes of 5B and 8B parameters, with operational footprints of 2GB and 3GB, respectively, using Per-Layer Embeddings (PLE).
- 1.5x faster response on mobile versus Gemma 3 4B.
- Multilingual benchmark score of 50.1% on WMT24++ (ChrF).
- Accepts and understands audio, text, image, and video, enabling complex multimodal processing and interleaved inputs.
- Supports dynamic trade-offs using MatFormer training with nested submodels and mix’n’match capabilities.
- Operates without an internet connection, ensuring privacy and reliability.
- Preview available via Google AI Studio and Google AI Edge, with text and image processing capabilities.

In conclusion, this innovation provides a clear pathway for making high-performance AI portable and private. By tackling RAM constraints through innovative architecture and enhancing multilingual and multimodal capabilities, the researchers offer a viable solution for bringing sophisticated AI directly into everyday devices. The flexible submodel switching, offline readiness, and fast response time mark a comprehensive approach to mobile-first AI. The research addresses the balance of computational efficiency, user privacy, and dynamic responsiveness. The result is a system capable of delivering real-time AI experiences without sacrificing capability or versatility, fundamentally expanding what users can expect from on-device intelligence.

Check out the Technical details and Try it here. All credit for this research goes to the researchers of this project.


This AI Paper Introduces MathCoder-VL and FigCodifier: Advancing Multimodal Mathematical Reasoning with Vision-to-Code Alignment

Multimodal mathematical reasoning enables machines to solve problems involving textual information and visual components like diagrams and figures. This requires combining language understanding and visual interpretation to make sense of complex mathematical contexts. Such capabilities are vital in education, automated tutoring, and document analysis, where problems are often presented with a blend of text and images.

A major obstacle in this area is the lack of high-quality, precise alignment between math images and their textual or symbolic representations. Most datasets used to train large multimodal models are derived from image captions in natural settings, which often miss the detailed elements essential for mathematical accuracy. This creates problems for models that rely on these data sources, making them unreliable when dealing with geometry, figures, or technical diagrams. A model’s performance in mathematical reasoning depends heavily on its ability to correctly interpret and link these visual details with mathematical expressions or instructions.

In the past, some approaches tried to address this by either enhancing the visual encoders or using manually crafted datasets. However, these methods tend to produce low image diversity, relying on hand-coded or template-based generation, which limits their applicability. Some efforts, like Math-LLaVA and MAVIS, developed synthetic datasets and used templates or predefined categories. Still, they could not dynamically create a wide variety of math-related visuals. This shortfall restricts the learning scope of models and leaves them struggling with more complex or less structured mathematical problems.

Researchers from the Multimedia Laboratory at The Chinese University of Hong Kong and CPII under InnoHK introduced a novel approach called MathCoder-VL. This method combines a vision-to-code model named FigCodifier and a synthetic data engine. They constructed the ImgCode-8.6M dataset using a model-in-the-loop strategy, which allowed them to iteratively build the largest image-code dataset to date. Further, they developed MM-MathInstruct-3M, a multimodal instruction dataset enriched with newly synthesized images. The MathCoder-VL model is trained in two stages: mid-training on ImgCode-8.6M to improve visual-text alignment and fine-tuning on MM-MathInstruct-3M to strengthen reasoning abilities.

The FigCodifier model works by translating mathematical figures into code that can recreate those figures exactly. This code-image pairing ensures strict alignment and accuracy, unlike caption-based datasets. The process begins with 119K image-code pairs from DaTikZ and expands through iterative training using images collected from textbooks, K12 datasets, and arXiv papers. The final dataset includes 8.6 million code-image pairs and covers various mathematical topics. FigCodifier also supports Python-based rendering, which adds variety to image generation. The system filters low-quality data by checking code validity and removing redundant or unhelpful visuals, resulting in 4.3M high-quality TikZ-based and 4.3M Python-based pairs.

Performance evaluations show that MathCoder-VL outperforms multiple open-source models. The 8B version achieved 73.6% accuracy on the MathVista Geometry Problem Solving subset, surpassing GPT-4o and Claude 3.5 Sonnet by 8.9% and 9.2%, respectively. It also scored 26.1% on MATH-Vision and 46.5% on MathVerse. In Chinese-language benchmarks, it achieved 51.2% on GAOKAO-MM.
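As a rough sketch of the code-validity filtering described above, the snippet below keeps a candidate figure-code pair only if the generated Python (matplotlib) code actually executes and renders to an image. It illustrates the general idea rather than the authors’ released pipeline; the helper name and sample snippet are invented.

```python
# Sketch of validity filtering for figure-code pairs: keep a pair only if the
# model-generated plotting code runs and saves a figure. Illustrative only.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display required
import matplotlib.pyplot as plt

def renders_successfully(figure_code: str, out_path: str) -> bool:
    """Return True if `figure_code` executes and saves an image to out_path."""
    try:
        namespace = {"plt": plt}
        exec(figure_code, namespace)   # run the model-generated plotting code
        plt.savefig(out_path)
        return True
    except Exception:
        return False                   # invalid code -> discard the pair
    finally:
        plt.close("all")

candidate = "plt.figure(); plt.plot([0, 1], [0, 1]); plt.title('y = x')"
print(renders_successfully(candidate, "figure.png"))  # True -> keep the pair
```

A TikZ-based variant would follow the same pattern, invoking a LaTeX compiler instead of matplotlib and checking that compilation succeeds.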
Returning to the benchmark results: on the We-Math benchmark, MathCoder-VL solved two-step problems at 58.6%, outperforming GPT-4o’s 58.1%. Its performance on three-step problems reached 52.1%, again exceeding GPT-4o’s 43.6%. Compared with its base model, InternVL2-8B, it showed gains of 6.1% on MATH-Vision and 11.6% on MathVista.

This work clearly defines the problem of insufficient visual-textual alignment in multimodal math reasoning and provides a scalable and innovative solution. The introduction of FigCodifier and synthetic datasets allows models to learn from accurate, diverse visuals paired with exact code, significantly boosting their reasoning abilities. MathCoder-VL represents a practical advancement in this field, demonstrating how thoughtful model design and high-quality data can overcome longstanding limitations in mathematical AI.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


Technology Innovation Institute TII Releases Falcon-H1: Hybrid Transformer-SSM Language Models for Scalable, Multilingual, and Long-Context Understanding

Addressing Architectural Trade-offs in Language Models

As language models scale, balancing expressivity, efficiency, and adaptability becomes increasingly challenging. Transformer architectures dominate due to their strong performance across a wide range of tasks, but they are computationally expensive—particularly for long-context scenarios—due to the quadratic complexity of self-attention. On the other hand, Structured State Space Models (SSMs) offer improved efficiency and linear scaling, yet often lack the nuanced sequence modeling required for complex language understanding. A combined architecture that leverages the strengths of both approaches is needed to support diverse applications across environments.

Introducing Falcon-H1: A Hybrid Architecture

The Falcon-H1 series, released by the Technology Innovation Institute (TII), introduces a hybrid family of language models that combine Transformer attention mechanisms with Mamba2-based SSM components. This architecture is designed to improve computational efficiency while maintaining competitive performance across tasks requiring deep contextual understanding. Falcon-H1 covers a wide parameter range—from 0.5B to 34B—catering to use cases from resource-constrained deployments to large-scale distributed inference. The design aims to address common bottlenecks in LLM deployment: memory efficiency, scalability, multilingual support, and the ability to handle extended input sequences.

Source: https://falcon-lm.github.io/blog/falcon-h1/

Architectural Details and Design Objectives

Falcon-H1 adopts a parallel structure where attention heads and Mamba2 SSMs operate side by side. This design allows each mechanism to independently contribute to sequence modeling: attention heads specialize in capturing token-level dependencies, while SSM components support efficient long-range information retention. The series supports a context length of up to 256K tokens, which is particularly useful for applications in document summarization, retrieval-augmented generation, and multi-turn dialogue systems. Model training incorporates a customized microparameterization (μP) recipe and optimized data pipelines, allowing for stable and efficient training across model sizes.

The models are trained with a focus on multilingual capabilities. The architecture is natively equipped to handle 18 languages, with coverage including English, Chinese, Arabic, Hindi, French, and others. The framework is extensible to over 100 languages, supporting localization and region-specific model adaptation.

Empirical Results and Comparative Evaluation

Despite relatively modest parameter counts, Falcon-H1 models demonstrate strong empirical performance:

- Falcon-H1-0.5B achieves results comparable to 7B-parameter models released in 2024.
- Falcon-H1-1.5B-Deep performs on par with leading 7B to 10B Transformer models.
- Falcon-H1-34B matches or exceeds the performance of models such as Qwen3-32B, Llama4-Scout-17B/109B, and Gemma3-27B across several benchmarks.

Evaluations emphasize both general-purpose language understanding and multilingual benchmarks. Notably, the models achieve strong performance across both high-resource and low-resource languages without requiring excessive fine-tuning or additional adaptation layers.

Source: https://falcon-lm.github.io/blog/falcon-h1/

Deployment and inference are supported through integration with open-source tools such as Hugging Face Transformers.
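For instance, a minimal inference sketch with Hugging Face Transformers might look like the following. The model id is an assumption based on the release naming (check the official Falcon-H1 collection on Hugging Face for exact names), and a recent transformers version may be required for the hybrid architecture.

```python
# Minimal inference sketch with Hugging Face Transformers. The repository name
# below is assumed for illustration; verify it against the Falcon-H1 collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-0.5B-Instruct"  # assumed id; verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the advantages of hybrid attention-SSM language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```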
FlashAttention-2 compatibility further reduces memory usage during inference, offering an attractive efficiency-performance balance for enterprise use.

Conclusion

Falcon-H1 represents a methodical effort to refine language model architecture by integrating complementary mechanisms—attention and SSMs—within a unified framework. By doing so, it addresses key limitations in both long-context processing and scaling efficiency. The model family provides a range of options for practitioners, from lightweight variants suitable for edge deployment to high-capacity configurations for server-side applications. Through its multilingual coverage, long-context capabilities, and architectural flexibility, Falcon-H1 offers a technically sound foundation for research and production use cases that demand performance without compromising on efficiency or accessibility.

Check out the Official Release, Models on Hugging Face, and GitHub Page. All credit for this research goes to the researchers of this project.


Three takeaways about AI’s energy use and climate impacts

This week, we published Power Hungry, a package all about AI and energy. At the center of this package is the most comprehensive look yet at AI’s growing power demand, if I do say so myself.

This data-heavy story is the result of over six months of reporting by me and my colleague James O’Donnell (and the work of many others on our team). Over that time, with the help of leading researchers, we quantified the energy and emissions impacts of individual queries to AI models and tallied what it all adds up to, both right now and for the years ahead.

There’s a lot of data to dig through, and I hope you’ll take the time to explore the whole story. But in the meantime, here are three of my biggest takeaways from working on this project.

1. The energy demands of AI are anything but constant.

If you’ve heard estimates of AI’s toll, it’s probably a single number associated with a query, likely to OpenAI’s ChatGPT. One popular estimate is that writing an email with ChatGPT uses 500 milliliters (or roughly a bottle) of water. But as we started reporting, I was surprised to learn just how much the details of a query can affect its energy demand. No two queries are the same, for several reasons, including their complexity and the particulars of the model being queried.

One key caveat here is that we don’t know much about “closed source” models; for these, companies hold back the details of how they work. (OpenAI’s ChatGPT and Google’s Gemini are examples.) Instead, we worked with researchers who measured the energy it takes to run open-source AI models, for which the source code is publicly available.

Using open-source models, it’s possible to directly measure the energy used to respond to a query rather than just guess. The researchers we worked with generated text, images, and video and measured the energy the chips running the models required to perform each task.

Even just within the text responses, there was a pretty large range of energy needs. A complicated travel itinerary consumed nearly 10 times as much energy as a simple request for a few jokes, for example. An even bigger difference comes from the size of the model used. Larger models with more parameters used up to 70 times more energy than smaller ones for the same prompts.

As you might imagine, there’s also a big difference between text, images, and video. Videos generally took hundreds of times more energy to generate than text responses.

2. What’s powering the grid will greatly affect the climate toll of AI’s energy use.

As the resident climate reporter on this project, I was excited to take the expected energy toll and translate it into an expected emissions burden.

Powering a data center with a nuclear reactor or a whole bunch of solar panels and batteries will not affect our planet the same way as burning mountains of coal. To quantify this idea, we used a figure called carbon intensity, a measure of how dirty a unit of electricity is on a given grid.

We found that the same exact query, with the same exact energy demand, will have a very different climate impact depending on what the data center is powered by, and that depends on the location and the time of day. For example, querying a data center in West Virginia could cause nearly twice the emissions of querying one in California, according to calculations based on average data from 2024.
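As a back-of-envelope illustration of how carbon intensity turns energy into emissions, the short snippet below multiplies the same per-query energy figure by two different grid intensities. The numbers are placeholders chosen for illustration, not figures from the Power Hungry analysis.

```python
# Back-of-envelope: emissions = energy used x carbon intensity of the grid.
# All numbers below are illustrative placeholders, not reported figures.
energy_per_query_wh = 3.0               # assumed energy for one query, watt-hours
grid_intensity_g_per_kwh = {            # assumed average grid carbon intensities
    "coal-heavy grid": 900,
    "cleaner grid mix": 250,
}

for grid, intensity in grid_intensity_g_per_kwh.items():
    grams_co2 = energy_per_query_wh / 1000 * intensity  # Wh -> kWh, then g CO2e
    print(f"{grid}: {grams_co2:.2f} g CO2e per query")
```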
This point shows why it matters where tech giants are building data centers, what the grid looks like in their chosen locations, and how that might change with more demand from the new infrastructure.

3. There is still so much that we don’t know when it comes to AI and energy.

Our reporting resulted in estimates that are some of the most specific and comprehensive out there. But ultimately, we still have no idea what many of the biggest, most influential models are adding up to in terms of energy and emissions. None of the companies we reached out to were willing to provide numbers during our reporting. Not one.

Adding up our estimates can only go so far, in part because AI is increasingly everywhere. While today you might generally have to go to a dedicated site and type in questions, in the future AI could be stitched into the fabric of our interactions with technology. (See my colleague Will Douglas Heaven’s new story on Google’s I/O showcase: “By putting AI into everything, Google wants to make it invisible.”)

AI could be one of the major forces that shape our society, our work, and our power grid. Knowing more about its consequences could be crucial to planning our future.

To dig into our reporting, give the main story a read. And if you’re looking for more details on how we came up with our numbers, you can check out this behind-the-scenes piece.

There are also some great related stories in this package, including one from James Temple on the data center boom in the Nevada desert, one from David Rotman about how AI’s rise could entrench natural gas, and one from Will Douglas Heaven on a few technical innovations that could help make AI more efficient. Oh, and I also have a piece on why nuclear isn’t the easy answer some think it is. Find them, and the rest of the stories in the package, here.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.


A new atomic clock in space could help us measure elevations on Earth

In 2003, engineers from Germany and Switzerland began building a bridge across the Rhine River simultaneously from both sides. Months into construction, they found that the two sides did not meet. The German side hovered 54 centimeters above the Swiss side.

The misalignment occurred because the German engineers had measured elevation with a historic level of the North Sea as their zero point, while the Swiss ones had used the Mediterranean Sea, which was 27 centimeters lower. We may speak colloquially of elevations with respect to “sea level,” but Earth’s seas are actually not level. “The sea level is varying from location to location,” says Laura Sanchez, a geodesist at the Technical University of Munich in Germany. (Geodesists study our planet’s shape, orientation, and gravitational field.) While the two teams knew about the 27-centimeter difference, they mixed up which side was higher. Ultimately, Germany lowered its side to complete the bridge.

To prevent such costly construction errors, in 2015 scientists in the International Association of Geodesy voted to adopt the International Height Reference Frame, or IHRF, a worldwide standard for elevation. It’s the third-dimensional counterpart to latitude and longitude, says Sanchez, who helps coordinate the standardization effort.

Now, a decade after its adoption, geodesists are looking to update the standard—by using the most precise clock ever to fly in space.

That clock, called the Atomic Clock Ensemble in Space, or ACES, launched into orbit from Florida last month, bound for the International Space Station. ACES, which was built by the European Space Agency, consists of two connected atomic clocks, one containing cesium atoms and the other containing hydrogen, combined to produce a single set of ticks with higher precision than either clock alone.

Pendulum clocks are only accurate to about a second per day, as the rate at which a pendulum swings can vary with humidity, temperature, and the weight of extra dust. Atomic clocks in current GPS satellites will lose or gain a second on average every 3,000 years. ACES, on the other hand, “will not lose or gain a second in 300 million years,” says Luigi Cacciapuoti, an ESA physicist who helped build and launch the device. (In 2022, China installed a potentially stabler clock on its space station, but the Chinese government has not publicly shared the clock’s performance after launch, according to Cacciapuoti.)

From space, ACES will link to some of the most accurate clocks on Earth to create a synchronized clock network, which will support its main purpose: to perform tests of fundamental physics.

But it’s of special interest for geodesists because it can be used to make gravitational measurements that will help establish a more precise zero point from which to measure elevation across the world. Alignment over this “zero point” (basically where you stick the end of the tape measure to measure elevation) is important for international collaboration. It makes it easier, for example, to monitor and compare sea-level changes around the world. It is especially useful for building infrastructure involving flowing water, such as dams and canals. In 2020, the international height standard even resolved a long-standing dispute between China and Nepal over Mount Everest’s height. For years, China said the mountain was 8,844.43 meters; Nepal measured it at 8,848. Using the IHRF, the two countries finally agreed that the mountain was 8,848.86 meters.
A worker performs tests on ACES at a cleanroom at the Kennedy Space Center in Florida. (ESA-T. Peignier)

To create a standard zero point, geodesists create a model of Earth known as a geoid. Every point on the surface of this lumpy, potato-shaped model experiences the same gravitational potential, which means that if you dug a canal at the height of the geoid, the water within the canal would be level and would not flow. Distance from the geoid establishes a global system for altitude.

However, the current model lacks precision, particularly in Africa and South America, says Sanchez. Today’s geoid has been built using instruments that directly measure Earth’s gravity. These have been carried on satellites, which excel at getting a global but low-resolution view, and have also been used to get finer details via expensive ground- and airplane-based surveys. But geodesists have not had the funding to survey Africa and South America as extensively as other parts of the world, particularly in difficult terrain such as the Amazon rainforest and Sahara Desert.

To understand the discrepancy in precision, imagine a bridge that spans Africa from the Mediterranean coast to Cape Town, South Africa. If it’s built using the current geoid, the two ends of the bridge will be misaligned by tens of centimeters. In comparison, you’d be off by at most five centimeters if you were building a bridge spanning North America.

To improve the geoid’s precision, geodesists want to create a worldwide network of clocks, synchronized from space. The idea works according to Einstein’s theory of general relativity, which states that the stronger the gravitational field, the more slowly time passes. The 2014 sci-fi movie Interstellar illustrates an extreme version of this so-called time dilation: two astronauts spend a few hours in extreme gravity near a black hole, only to return to a shipmate who has aged more than two decades. Similarly, Earth’s gravity grows weaker the higher in elevation you are. Your feet, for example, experience slightly stronger gravity than your head when you’re standing. Assuming you live to be about 80 years old, over a lifetime your head will age tens of billionths of a second more than your feet.

A clock network would allow geodesists to compare the ticking of clocks all over the world. They could then use the variations in time to map Earth’s gravitational field much more precisely, and consequently create a more precise geoid. The most accurate clocks today are precise enough to measure variations in time that map onto centimeter-level differences in elevation.

“We want to have the accuracy level at the one-centimeter or sub-centimeter level,” says Jürgen Müller, a geodesist at Leibniz University Hannover in Germany. Specifically, geodesists would


SQLong: Enhanced NL2SQL for Longer Contexts with LLMs

arXiv:2502.16747v2 Announce Type: replace
Abstract: Open-weight large language models (LLMs) have significantly advanced performance in the Natural Language to SQL (NL2SQL) task. However, their effectiveness diminishes when dealing with large database schemas, as the context length increases. To address this limitation, we present SQLong, a novel and efficient data augmentation framework designed to enhance LLM performance in long-context scenarios for the NL2SQL task. SQLong generates augmented datasets by extending existing database schemas with additional synthetic CREATE TABLE commands and corresponding data rows, sampled from diverse schemas in the training data. This approach effectively simulates long-context scenarios during finetuning and evaluation. Through experiments on the Spider and BIRD datasets, we demonstrate that LLMs finetuned with SQLong-augmented data significantly outperform those trained on standard datasets. These results demonstrate SQLong’s practicality and its impact on improving NL2SQL capabilities in real-world settings with complex database schemas.
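A minimal sketch of the schema-augmentation idea described in the abstract might look like the following: distractor CREATE TABLE statements sampled from other schemas are appended to a target schema to simulate a long-context prompt (the paper also samples corresponding data rows, which this sketch omits). The function and example schemas are invented for illustration and are not the authors’ released code.

```python
# Illustrative sketch of SQLong-style schema padding: append CREATE TABLE
# statements sampled from other schemas so training prompts become long-context.
import random

def augment_schema(target_schema: str,
                   other_schemas: list[str],
                   num_extra_tables: int = 20,
                   seed: int = 0) -> str:
    """Append randomly sampled distractor CREATE TABLE statements."""
    rng = random.Random(seed)
    pool = [stmt.strip() + ";" for schema in other_schemas
            for stmt in schema.split(";") if "CREATE TABLE" in stmt.upper()]
    extras = rng.sample(pool, k=min(num_extra_tables, len(pool)))
    rng.shuffle(extras)
    return target_schema.rstrip() + "\n" + "\n".join(extras)

# Hypothetical usage with toy schemas
target = "CREATE TABLE singer (id INT, name TEXT, age INT);"
others = ["CREATE TABLE car (id INT, model TEXT); CREATE TABLE owner (id INT, name TEXT);"]
print(augment_schema(target, others, num_extra_tables=2))
```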


Forensic deepfake audio detection using segmental speech features

arXiv:2505.13847v1 Announce Type: cross
Abstract: This study explores the potential of using acoustic features of segmental speech sounds to detect deepfake audio. These features are highly interpretable because of their close relationship with human articulatory processes and are expected to be more difficult for deepfake models to replicate. The results demonstrate that certain segmental features commonly used in forensic voice comparison are effective in identifying deepfakes, whereas some global features provide little value. These findings underscore the need to approach audio deepfake detection differently for forensic voice comparison and offer a new perspective on leveraging segmental features for this purpose.
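To illustrate the segmental approach in spirit, the sketch below computes acoustic features over labelled phone segments rather than over the whole recording. It uses mean MFCCs via librosa as a simple stand-in for the forensic segmental features the paper actually studies, and the file path and segment times are placeholders.

```python
# Illustrative only: per-segment acoustic features instead of whole-file
# ("global") features. MFCC means stand in for the paper's segmental features.
import numpy as np
import librosa

def segment_features(wav_path: str, segments: list[tuple[float, float, str]]):
    """Return a mean MFCC vector for each (start_s, end_s, label) segment."""
    y, sr = librosa.load(wav_path, sr=None)
    feats = {}
    for start, end, label in segments:
        chunk = y[int(start * sr):int(end * sr)]
        mfcc = librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=13)
        feats[label] = mfcc.mean(axis=1)   # one feature vector per segment
    return feats

# Placeholder usage: phone-level segment times would come from forced alignment.
segments = [(0.10, 0.25, "a"), (0.40, 0.52, "i")]
feats = segment_features("sample.wav", segments)  # "sample.wav" is a placeholder
print({label: vec.shape for label, vec in feats.items()})
```

These per-segment vectors could then feed any standard classifier to separate genuine from synthesized speech.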
