AI Archives - Page 4 of 203

AI, Committee, News, Uncategorized

Five things you need to know about AI

admin NU / June 9, 2026

At SXSW London last week I gave a talk called “Five things you need to know about AI,” in which I shared what I think are the biggest themes in AI right now. I pulled a few things from our first AI10 list, an annual guide to the most important trends in this buzzy world, but I also veered off on a number of tangents. In my half-hour slot, I tried to cover the key talking points that I think help to make sense of what’s going on in tech, and thus the economy, today. (I gave a talk with the same title at SXSW London last year with five different things you needed to know. A lot has happened since then!)

So: This is how I’m thinking about AI midway through 2026. Let me know if you would pick different points!

1. Strictly speaking, I didn’t need to show up to give this talk.

Tongue in cheek? Maybe. But generative AI tools have already become mundane, used by millions to automate everyday office tasks (including producing and delivering talks). It’s no surprise that one of the biggest questions out there right now is what this all means for jobs. People are confused and scared.

The frustrating answer is that despite the hype coming from the top about the potential for AI to join the workforce soon, and viral social media posts yelling that something big is happening, there is almost no data to say either way what kind of effect this technology will have on employment and the economy overall. That’s not to say it won’t have an impact, even a huge one, but it’s just too soon to tell.

In theory, teams of agents working together toward common goals could become assembly lines for white-collar work, doing to offices this century what Henry Ford’s innovations did to factories in the 20th century. In theory. Because in order to know what will happen to jobs, we need to know what will happen inside the companies that create those jobs. But most companies are still figuring that out.

2. AI is getting scary (for real this time).
There have been scary stories about AI for years: claims that it will kill us all or bring about the end of civilization. There’s still a loud crowd of doomers, but those scenarios remain dystopian science fiction. What’s happened instead is that many of the worst near-term, real-world fears have come true.

Take deepfakes, AI-generated images or videos of people doing things they didn’t actually do. Deepfakes have been used to incite violence, swing votes, and sow distrust. Trump’s White House is among those creating and publishing fake images. Many deepfakes are also used to abuse women and girls. One study found that 98% of deepfakes are pornographic and 99% involve women.

Another concern is the rise of dangerous and delusional relationships with chatbots. Many people turn to chatbots to seek private advice and to feel heard. But there are now multiple lawsuits against AI companies alleging that the technology encouraged or aided suicides and other forms of self-harm.

AI is also being used in warfare in new and worrying ways. LLMs are now giving advice, not just being used for analysis. One US defense official told my colleague James O’Donnell that you could now give a military chatbot a list of targets and ask which one to hit first. Anyone who uses AI knows that its output needs to be reviewed carefully. In fast-paced, high-stress active conflict, the risk that corners get cut is high.

3. A lot of people really hate AI.

I checked out an anti-AI protest in London earlier this year and found a very broad mix of complaints. Banners proclaiming the end times bounced along to chants of “Stop the slop! Stop the slop!” Protests are getting more organized and drawing larger crowds.

There’s pushback from fans of films and video games, who object to the use of generative AI in their favorite titles. In one notable case, the acclaimed 2025 game Clair Obscur was stripped of an award when the developers admitted to using AI in just one small, specific part of its production.
And there’s the data center backlash. The US has more than 5,400 data centers and counting. With the energy demands of AI growing, people are unhappy about the environmental impact and their rising electricity bills. Activists are managing to stall development in a number of places. Regulation is becoming politically popular.

Grassroots movements like QuitGPT have gained momentum. A small number have turned to violence; a few weeks ago somebody threw a Molotov cocktail at Sam Altman’s house. It’s not clear where all this leads. But the apocalyptic hype from tech leaders is not helping people stay calm.

4. AI for science is a very big deal.

It’s early days yet, but the potential for AI to help make a genuine and important scientific discovery is greater than ever. Google DeepMind has developed Co-Scientist, a multipurpose tool that can help researchers dig up and compare previous results, generate hypotheses, and devise experiments to test them. OpenAI told me this year that its North Star is the goal of building a fully automated researcher by 2028.

Mathematicians are excited too. Fundamental math underpins many everyday technologies, from internet security to video streaming. The last few months have seen a string of claims that AI has cracked unsolved math problems. And software that can solve really hard math problems will be able (so the argument goes) to solve more general-purpose real-world problems too.

What are the downsides? Some scientists are warning that an overreliance on AI tools could narrow the scope of research because scientists may choose problems that are most suited to AI assistance. There are also concerns that AI-assisted research will lead to a flood of inaccurate or fake results: science slop.

5. AI is everywhere all at once.

So where does that leave us? There are a lot of exciting things, a lot of worrying things, and a


AI, Committee, News, Uncategorized

Learning to lead in a hybrid human-AI enterprise

admin NU / June 9, 2026

As adoption of AI agents looks set to surge by as much as 300% in the next two years, leadership teams are carefully considering the implications of a hybrid human-AI workforce. Unlike existing enterprise-level automation that relies on manual input, AI agents are capable of autonomously coordinating complex tasks, interacting with multiple tools and environments across an organization. In early applications that center on customer service, HR, and sales, adoption of agentic AI has led to productivity gains of 30-50%.

Their autonomy positions agents more as collaborators than tools, working side-by-side with human employees in blended teams that look poised to upend traditional workplace dynamics. More than three-quarters of HR leaders believe that the deployment of AI agents will transform existing workplace norms, driving a complete reappraisal of how roles and responsibilities are distributed, how skills are prioritized, and how workplace culture is shaped. Though many admit they’re in the early or preparatory phase of this shift, 86% of chief HR officers predict that navigating digital labor shaped by agentic AI will be a central component of their role in the years ahead.

Fluency in the change management aspect of agentic AI adoption will be a crucial differentiator when it comes to unlocking the full potential of the technology going forward, believes Ateet Jayaswal, chief culture and employee experience officer at Wipro, a leading technology services and consulting company. This moment is one that, he says, “calls for a mindset shift in how HR leaders would enable their organizations.”

Redeploying roles to enable higher-value work

As AI agents assume ownership of more complex and integral tasks, the distribution of roles and responsibilities within an organization will undergo significant change. It’s estimated that three-quarters of current roles will require redesign, reskilling, or redeployment by 2030 as a result of agentic AI.
For leadership, this shift should be about reskilling employees toward higher-value work in order to optimize the potential of an agent-human hybrid workforce, says Jayaswal. For example, Wipro is a complex organization of 240,000 employees across 65 countries. It previously had multiple policies, documents, and knowledge fragmented across different systems, which delayed responses to employee queries. But the company has recently integrated a custom agentic AI assistant (an agent co-created in partnership with enterprise agentic AI platform Ema Unlimited) that can swiftly navigate this complex system, assuming responsibility for 50 HR tasks that had previously fallen to human employees.

With the help of an AI agent, average response time to queries has dropped from 48 hours to five seconds. Human employees have more time to focus on work “that requires a creative and imaginative mind and cross-functional collaboration, leveraging diverse ideas and thoughts to problem-solve,” says Jayaswal. The AI agent, meanwhile, handles rote administrative tasks like sorting timesheets or helping employees navigate policies and take actions in the flow of work.

When reallocating employee responsibilities, though, it is imperative that humans remain in the loop, Jayaswal cautions. When agentic AI is incorporated into enterprise technology, it must work with sensitive and personal data and therefore needs even more stringent guardrails and constraints than consumer applications. “When you expose an AI agent to organizational data, when you integrate it into multiple enterprise systems, then pathways around the AI agent become extremely important,” he says. “It’s an evolving space that leadership needs to have front-of-mind.” Governance should include robust data privacy rules and the establishment of governance layers, such as an AI council, he suggests.

At a fundamental level, the adoption of AI agents will force a re-evaluation of human roles, believes Jayaswal.
Rather than employees primarily performing repetitive tasks or troubleshooting, a significant proportion of their time will shift to designing, teaching, and optimizing an AI agent that can do this work for them with far greater speed and predictability and without the agent getting bored. “The nature of your job changes from being the hero who comes in to solve the problem to designing the hero who can solve the problem,” he summarizes. “The individuals who I have seen thrive in this environment are the ones who make this shift.”

An evolving employee skillset

Just as roles and responsibilities will be reconfigured to reflect the input of AI agents, the core skills of human employees will be reprioritized. More than four in five HR leaders say they’re planning to reskill workers to become more competitive in a market shaped by AI agents.

Technical skills will be increasingly important. Leading employers such as Salesforce, Danone, and Walmart are already rolling out dedicated AI and digital skills programs that aim to equip everyone from frontline workers to C-suite executives with a baseline level of AI literacy in response to the pervasiveness of the technology.

But desirable soft skills will also evolve, Jayaswal points out. Employees who assign tasks to an AI agent need to plainly articulate what modular steps may be needed to accomplish a task, what the desired outcome should be, and what parameters or guardrails need to be in place to ensure the agent doesn’t access or share confidential data.

As HR executives adapt to a blended workforce, three skills are emerging as top priorities during recruitment, according to a recent survey: relationship building, like forging constructive partnerships and account management; collaboration; and adaptability.
Maintaining a healthy workplace culture

In freeing up human employees to focus on higher-value tasks, the hope is that AI agents can elevate the employee experience, deepening fulfilment and satisfaction in the workplace. “At Wipro, our vision is to improve the life of Wiproites,” says Jayaswal. “We are taking away non-value added work by embracing modern ways of collaborating, engaging, and transacting, leaving associates with higher order work content.”

But leadership teams embracing agentic AI will also need to plan for the new pressures and stressors that the technology can place on a workforce. There are already confusion and knowledge gaps, with 73% of HR leaders reporting their employees don’t yet understand how digital labor will impact their work. Many organizations have opted to define AI agents as


AI, Committee, News, Uncategorized

The Download: whole-body rejuvenation drugs and five things to know about AI

admin NU / June 9, 2026

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

David Sinclair plans to test whole-body rejuvenation drugs in the XPrize competition

The outspoken longevity scientist David Sinclair has predicted that, one day, you’ll go to the doctor and get a prescription that will make you 10 years younger. MIT Technology Review has learned of his latest step toward this: human tests of a “reprogramming” drug.

Sinclair, a biologist at Harvard Medical School, plans to launch the tests in a $101 million competition organized by the XPrize Foundation. The winners will “restore” a person to an earlier apparent age, as measured by improvements in immune, cognitive, and muscle function. The grand prize goes to any team able to show a 10-year (or greater) relative improvement after one year of treatment.

Sinclair says he plans to give an oral drug mixture to volunteers, in a bid to seek “evidence for age restoration in humans.” Find out how he hopes to reverse ageing through chemical reprogramming.

- Antonio Regalado

Five things you need to know about AI

- Will Douglas Heaven

At SXSW London last week, I gave a talk called “Five things you need to know about AI,” in which I shared what I think are the biggest themes in AI right now. I pulled a few things from our first AI10 list, an annual guide to the top trends in this buzzy world, but I also veered off on several tangents. In my half-hour slot, I tried to cover the key talking points that I think help to make sense of what’s going on in tech, and thus the economy, today.

Five key thoughts emerged: AI is everywhere all at once, it’s getting scary, a backlash is growing, it’s becoming a big deal for science, and I didn’t even need to show up at the talk. Read the full story for all the details.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 OpenAI has confidentially filed for a US IPO
The listing could come as early as September. (Reuters $)
+ OpenAI is targeting a valuation of up to $1 trillion. (Financial Times $)
+ The IPO will test investor appetite for AI companies. (WSJ $)
+ The move follows IPO filings from Anthropic and SpaceX. (CNN)

2 The US claims BYD, Baidu, Alibaba, and others are aiding China’s military
The Pentagon added them to a list of military-linked companies. (WSJ $)
+ The designations limit their operations in the US. (BBC)
+ The new additions also include humanoid firm Unitree. (TechCrunch)
+ The Pentagon is adapting to China’s tech rise. (MIT Technology Review)

3 Apple’s long-awaited AI overhaul of Siri is finally here
“Siri AI” promises to be a more conversational assistant. (NYT $)
+ It includes a standalone app and screen-reading features. (Reuters $)
+ And arrives after two years of repeated delays. (Axios)

4 The White House and Congress are working to limit state AI laws
A new deal would curb state rules for federal legislation. (Axios)
+ AI regulation has divided US politicians. (MIT Technology Review)

5 Meta is launching a “workforce academy” for building data centers
The five-week program is free of charge and guarantees a job. (WSJ $)
+ It arrives shortly after Meta laid off 8,000 employees. (NPR)

6 Taiwan is mulling curbs on AI chip exports to China
The new controls would further align with US restrictions. (Bloomberg $)
+ Future AI chips could be built on glass. (MIT Technology Review)

7 Meta has quietly removed face-recognition code from its smart glasses app
The code identified by investigators has disappeared. (Wired $)

8 Humanoid robots are edging towards the battlefield
American and Chinese militaries are pursuing the tech. (BBC)

9 The world’s first wind-powered underwater data center has launched
It uses less power and water than land-based equivalents. (Guardian)

10 You could get some benefits of sleep without having to nod off
If new brain stimulation works as well on humans as on mice, that is. (New Scientist $)

Quote of the day

“You’re on the train, but you know that there’s no destination.”

- Clara Shih, a former top AI executive at Salesforce and Meta, tells the New York Times that AI training can’t keep up with the field’s advances.

One More Thing

Inside the race to make human sex cells in the lab

Illustrations by Amrita Marino

An embryo forms when sperm meets egg. But what if we could start with other cells: if a blood sample or skin biopsy could be transformed into “artificial” sperm and eggs? What if those were all you needed to make a baby?

That’s the promise of a radical approach to reproduction. Scientists have already created artificial eggs and sperm from mouse cells and used them to create mouse pups. Artificial human sex cells are next.

The advances could herald the end of infertility, but they raise major scientific and ethical challenges. Read the full story on the new recipes for sperm and eggs.

- Jessica Hamzelou

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ These chefs turn Pop-Tarts into the desserts that inspired them.
+ A choir has beautifully transformed System of a Down’s “Chop Suey!”
+ Scientists finally traced crabs’ sideways walk in this fascinating study of evolution.
+ This nostalgic essay on the family computer is a touching throwback to early internet life.

Top image credit: Stephanie Arnett/MIT Technology Review | Getty Images

Please send Pop-Tarts to hi@technologyreview.com. You can follow me on LinkedIn. Thanks for reading!

- Thomas


AI, Committee, News, Uncategorized

Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API

admin NU / June 9, 2026

Google just announced Gemini 3.5 Live Translate. It is the company’s latest audio model for live speech-to-speech translation. Speech-to-speech means spoken audio goes in, and translated spoken audio comes out. The model detects over 70 languages automatically and generates translated speech. It preserves the speaker’s intonation, pacing, and pitch in the output.

Turn-by-turn systems wait for a speaker to finish before responding. Gemini 3.5 Live Translate generates speech continuously instead. It balances a trade-off between waiting for context and translating immediately. More context improves quality. Faster output keeps the translation in sync with the speaker. The result stays a few seconds behind the speaker throughout a session.

Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is a single audio model (gemini-3.5-live-translate-preview), not a chat assistant. It processes speech as the audio streams in, rather than after a full sentence. It handles multilingual input without manual configuration. Its noise robustness lets applications run in loud, unpredictable environments.

The model is rolling out across three surfaces. Developers get it in public preview through the Gemini Live API and Google AI Studio. Enterprises get a private preview in Google Meet starting this month. Everyone else gets it through the Google Translate app on Android and iOS.

How the Continuous Streaming Works

The design difference matters for building real-time features. A conversational Live agent uses turn-based interactions. It relies on pauses, intent detection, and interruption handling. Live Translation uses continuous stream processing instead. It translates as the speaker talks, without waiting for turns to end.

To hold strict real-time latency thresholds, the translation path accepts audio input only. Text input is not supported in translation mode. The model also drops tool use and system instructions in this mode.
That keeps it a focused translator pipeline rather than a general agent.

Building With the Live API

Developers configure translation inside the Live API session setup. You set a translationConfig block within the generationConfig. The targetLanguageCode field takes a BCP-47 code, such as “pl” or “es”. BCP-47 is the standard format for language tags like en or pt-BR. The field defaults to “en”.

The echoTargetLanguage boolean controls input that is already in the target language. When true, the model echoes that speech. When false, it stays silent. You can also enable inputAudioTranscription and outputAudioTranscription for text transcripts.

Audio formats are fixed. Input is raw 16-bit PCM at 16kHz, mono, little-endian. Output is raw 16-bit PCM at 24kHz, mono, little-endian. PCM is uncompressed raw audio. You send audio in chunks of 100ms. For client-side apps, ephemeral tokens on the v1alpha endpoint avoid exposing your API key.

| Dimension | Live Agent | Live Translation |
| --- | --- | --- |
| Model role | Assistant that listens, reasons, and acts | Interpreter / real-time translator pipeline |
| Interaction | Turn-based, with interruption handling | Continuous stream processing, no turns |
| Tools | Function calling, Google Search, instructions | Translation only, no tools or instructions |
| Inputs | Text, audio, video, and image | Audio only, for strict latency |
| Configuration | Generation, speech, tools, instructions | targetLanguageCode and echoTargetLanguage |

Use Case

The model targets live interpretation across several settings. Google lists multilingual calls, meetings, lessons, and broadcasts.

Developer platforms reduce the integration work for real-time media. Agora, Fishjam, LiveKit, Pipecat, and Vision Agents already use the Live API. These platforms handle the complex real-time media streaming infrastructure. That lets developers focus on the user experience instead. Google’s example app demonstrates dubbing and simultaneous multi-language translation.
Grab is testing the model for driver-and-traveler communication at pickups. Grab users make over 10 million voice calls per month. CJ ENM, LiveKit, and others reported positive feedback on quality, accuracy, and low latency.

How It Changes Google Meet and Translate

According to Google’s official release, Google Meet will soon use 3.5 Live Translate for speech translation. The table shows the stated before-and-after for Meet.

| Capability | Previous Meet | With 3.5 Live Translate |
| --- | --- | --- |
| Languages | 5 | 70+ |
| Combinations per meeting | Only to and from English | 2000+ combinations |
| Access | Existing interface | Updated interface for instant access |

The Meet update is in private preview for select business Workspace customers this month. A broader rollout follows later this year.

In the Translate app, the Live translate feature works with any connected headphones. It mirrors the speaker’s tone across 70+ languages. Android also gains a listening mode. You hold the phone to your ear like a regular call. The translated audio then streams through the earpiece, without others hearing.

Key Takeaways

- Gemini 3.5 Live Translate is Google’s latest audio model for live speech-to-speech translation across 70+ languages.
- It streams continuously instead of turn-by-turn, staying a few seconds behind the speaker.
- Developers can configure it via the Live API using targetLanguageCode and echoTargetLanguage; audio-only, 16kHz in, 24kHz out.
- It rolls out to the Gemini Live API, Google Meet (5→70+ languages), and the Translate app.
- All generated audio carries an imperceptible SynthID watermark for detectability.
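The configuration fields and fixed audio formats described in this post can be put together in a small sketch. It is not taken from Google’s documentation: the field names translationConfig, generationConfig, targetLanguageCode, echoTargetLanguage, inputAudioTranscription, and outputAudioTranscription come from the article, while the surrounding "model" key, the placement of the transcription fields, and the helper names are assumptions made for illustration. The only firm part is the arithmetic: 100ms of 16-bit mono PCM at 16kHz is 16,000 × 2 × 0.1 = 3,200 bytes.

```python
BYTES_PER_SAMPLE = 2          # 16-bit PCM
INPUT_SAMPLE_RATE_HZ = 16_000  # input format stated in the article
OUTPUT_SAMPLE_RATE_HZ = 24_000  # output format stated in the article
CHUNK_MS = 100                 # chunk duration stated in the article


def build_setup(target_language="en", echo_target_language=False):
    """Hypothetical session setup payload.

    Only the translationConfig-inside-generationConfig nesting is described
    in the article; the rest of the layout is an assumption.
    """
    return {
        "model": "gemini-3.5-live-translate-preview",
        "generationConfig": {
            "translationConfig": {
                "targetLanguageCode": target_language,       # BCP-47 tag
                "echoTargetLanguage": echo_target_language,  # echo or stay silent
            }
        },
        # Assumed placement: the article only says these can be enabled.
        "inputAudioTranscription": {},
        "outputAudioTranscription": {},
    }


def chunk_size_bytes(sample_rate_hz=INPUT_SAMPLE_RATE_HZ, chunk_ms=CHUNK_MS):
    """Bytes in one mono 16-bit PCM chunk of the given duration."""
    return sample_rate_hz * BYTES_PER_SAMPLE * chunk_ms // 1000


def split_pcm(pcm, size=None):
    """Yield consecutive chunks of raw PCM bytes; the last one may be shorter."""
    size = size or chunk_size_bytes()
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

For example, one second of input audio (32,000 bytes) splits into ten 3,200-byte chunks, and a 100ms chunk of the 24kHz output is 4,800 bytes. How the chunks are actually sent over the Live API connection is left out here, since the article does not describe it.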
The post Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API appeared first on MarkTechPost.


AI, Committee, News, Uncategorized

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription

admin NU / June 8, 2026

Last week Microsoft AI announced MAI-Transcribe-1.5. It is the second iteration of the company’s in-house speech-to-text family. The model targets accuracy across 43 languages, accents, and noisy environments. The Microsoft team positions it for production transcription workloads.

What is MAI-Transcribe-1.5

MAI-Transcribe-1.5 is an automatic speech recognition (ASR) model. It takes audio as input and returns text. Microsoft built it in-house, not on a third-party base. The model handles 43 languages with a single system. It is optimized for diverse accents, dialects, and real-world acoustic conditions.

Microsoft is integrating it into Copilot, Teams, GitHub, and Dynamics 365 Contact Centre. It is also available in Foundry, Microsoft’s model platform.

The Accuracy Case

Accuracy here is measured by word error rate (WER). Lower WER means fewer mistakes per transcribed word. Microsoft reports best-in-class WER across 43 languages on FLEURS. FLEURS is a standard multilingual transcription benchmark.

On the Artificial Analysis leaderboard, the model posts a WER of 2.4%. That places it third on a competitive open benchmark. So the picture is split: Microsoft claims first place on FLEURS, and the model ranks third on Artificial Analysis.

The language expansion is the other accuracy story. Coverage grew from 25 languages to 43. The 18 new languages were added without compromising accuracy. Ten of them are South Asian, including Bengali, Tamil, and Telugu. Eight are European, such as Ukrainian, Greek, and Catalan.

Speed

MAI-Transcribe-1.5 leads on accuracy-times-speed on the Artificial Analysis leaderboard. It runs up to 5x faster than models of comparable accuracy. The effect is largest on long audio files. The model can transcribe an hour of audio in under 15 seconds. Microsoft cites up to 5x speedups over Gemini 3.1, Scribe v2, and GPT-4o-Transcribe on long audio. Against the prior MAI-Transcribe-1, the Azure card lists up to 5.7x faster long-form inference.
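To make the WER figures above concrete, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. It assumes simple whitespace tokenization and no text normalization; the benchmarks cited in the article may normalize casing and punctuation differently, so this will not reproduce their exact numbers. The function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = cur[j - 1] + 1   # extra word in output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

With the reference "the quick brown fox jumps" and the output "the quick brown box jumped", two of five words are wrong, so the function returns 0.4. On this scale, a WER of 2.4% corresponds to roughly one error in every 42 reference words.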
For batch pipelines processing large archives, that latency gap compounds quickly.

Keyword (Entity) Biasing: The Feature Worth Understanding

Generic transcribers often fail on domain-specific words. These include people’s names, product names, medical terms, and internal acronyms. Those words frequently matter most to enterprise users.

MAI-Transcribe-1.5 adds keyword biasing, also called entity biasing. You supply a list of domain-specific keywords. The Azure card supports up to 200 keywords. The model biases its predictions toward that list.

Critically, it does not blindly force matches. It uses shared context to decide when biasing should apply. Microsoft reports a 30% WER reduction on FLEURS when biasing is used.

A short example shows the effect. Without biasing, names render as “Sean,” “Oif,” and “Societal.” With a supplied name list, the model recovers “Shaun,” “Aoife,” and “Xochitl.” This is relevant for meetings, healthcare, and call centers with niche vocabulary.

Use Cases

The Azure model card lists concrete production scenarios. Each maps to a common engineering workload:

- Video captions for media and content platforms.
- Accessibility tools that depend on accurate captions.
- Meeting transcription for Teams-style collaboration tools.
- Call analysis for contact centers and support analytics.
- Content creation workflows that need fast draft transcripts.
- Voice agents that convert speech to text before reasoning.

Automatic language identification helps when the input language is unknown. The model detects the spoken language without a manual setting.

MAI-Transcribe-1.5 vs MAI-Transcribe-1

The table below compares the two generations using stated facts only.
| Attribute | MAI-Transcribe-1 | MAI-Transcribe-1.5 |
| --- | --- | --- |
| Languages covered | 25 | 43 |
| Keyword/entity biasing | Not listed | Up to 200 keywords |
| Long-form inference speed | Baseline | Up to 5.7x faster |
| Artificial Analysis WER | Not specified | 2.4% (ranked #3) |
| FLEURS position (per Microsoft) | State-of-the-art | Best-in-class across 43 languages |
| Automatic language identification | Not specified | Yes |
| Lifecycle | Prior release | Generally available (GA) |
| Input / Output | Audio / Text | Audio / Text |

Strengths and Limitations

Strengths:

- 43-language coverage from a single model, up from 25.
- Keyword/entity biasing yields up to 30% WER reduction on FLEURS.
- Sub-15-second transcription for an hour of audio.
- Generally available now through Azure AI Foundry.
- Robust on noisy, real-world audio, per Microsoft.

Limitations:

- No diarization yet, so speaker labels are unavailable.
- No native streaming API, so real-time use is limited.
- Several accuracy, speed, and cost claims are first-party.
- Ranked third on Artificial Analysis, behind two competitors.

Sources

- Introducing MAI-Transcribe-1.5 (Microsoft AI)
- MAI-Transcribe-1.5 model card (Azure AI Foundry)
- MAI-Transcribe-1.5 Foundry API documentation
- MAI-Transcribe-1.5 Cookbook
- MAI Playground

The post Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription appeared first on MarkTechPost.


AI, Committee, News, Uncategorized

Why this year’s World Cup ball may not fly as far

admin NU / June 8, 2026

Much is new about this month’s upcoming FIFA World Cup tournament, which will be held in the US, Canada, and Mexico. It hosts more teams than ever before. It’s the first to occur in three different host countries. And, like predecessor cups for over half a century, it will employ a soccer ball with a brand-new design.

One group of researchers that has been testing the physics of World Cup balls for the past 20 years recently studied this new entry, called the Trionda. Made by Adidas, the Trionda features four red, green, and blue panels textured with deep grooves and maple leaf, green eagle, and star emblems to represent the three host countries. Through wind-tunnel experiments, the research team found that this ball improves over previous versions in some ways, but long-distance kicks might not go as far as they did in the past.

“The simple picture is that Trionda may very slightly punish extreme distance, but it should reward clean technique and predictable flight,” says team member John Eric Goff, who researches sports physics and is an incoming professor of engineering practice at Purdue University. “Goalkeepers, defenders hitting long passes, and long-range shooters are where I would look first for visible differences.”

[Image: Researchers used a wind tunnel to study the Trionda ball at the University of Tsukuba. Credit: Takeshi Asai, Sungchan Hong, and Richong Liu]

Adidas has been designing new balls for each World Cup since the 1970s. Some of the design changes in the first few decades were aesthetic: The 1986 ball featured graphics inspired by Aztec temples for the Mexico tournament, and 1994’s had space graphics in honor of the moon landing’s 25th anniversary. There were some structural differences too, such as upgraded foam cores and improved water resistance. But by and large, the balls used the same design of 32 pentagonal panels stitched together.

That changed in the 2006 World Cup in Germany, when Adidas introduced the +Teamgeist ball.
It featured just 14 curved panels, which were thermally bonded together rather than stitched. The design helped keep moisture out so the ball wouldn’t grow heavier throughout the game, Goff says. It was around this time that he started studying soccer balls. In the years since then, he and his colleagues have followed the transformations as Adidas has released balls with different surface textures and even fewer panels, design changes significant enough to affect game play.

In-flight motion

Goff discovered early on that by analyzing a ball’s trajectory data, he could derive its drag coefficient, a number that determines the air resistance it experiences midflight at a given speed. Shortly after, he began working with a team in Japan to analyze how the World Cup ball’s in-flight behavior changes with each new design.

The experiments, carried out at the University of Tsukuba in Japan, have been purposely consistent over the years because “maintaining continuity is important for comparing new data with historical data sets,” says Takeshi Asai, a professor there who works on the experiments. They entail attaching the ball to a metal rod connected to an instrument called a force balance, which measures aerodynamic forces such as drag and lift as the ball is exposed to the same wind speeds it would experience in a real soccer game: seven to 35 meters per second. The team tests the ball in different orientations, “but you can only do a few because the Trionda ball is $170,” Goff says, and each new test effectively destroys it.

The experiments show the team how the drag coefficient changes with speed, and Goff then writes code to simulate the ball’s overall trajectory as it flies through the air. The team’s analysis has shown how recent World Cup balls evolved since the eight-panel Jabulani ball for the 2010 event.
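To make the link between a speed-dependent drag coefficient and kick distance concrete, here is a minimal Python sketch of that kind of trajectory simulation. It is not Goff's code, and none of the numbers are measurements from the study: the ball mass, radius, air density, the two drag-coefficient plateaus, and the smooth step used for the transition are all assumptions chosen only for illustration. Spin and lift are ignored.

```python
import math

# All constants below are illustrative assumptions, not data from the study.
RHO = 1.2          # air density in kg/m^3 (assumed)
MASS = 0.43        # ball mass in kg (assumed)
RADIUS = 0.11      # ball radius in m (assumed)
AREA = math.pi * RADIUS ** 2
G = 9.81           # gravitational acceleration in m/s^2


def drag_coefficient(speed, crisis_speed, cd_low_speed=0.5,
                     cd_high_speed=0.2, width=2.0):
    """Hypothetical smooth step: high drag below crisis_speed, low drag above."""
    blend = 1.0 / (1.0 + math.exp(-(speed - crisis_speed) / width))
    return cd_low_speed + (cd_high_speed - cd_low_speed) * blend


def kick_range(speed, angle_deg, crisis_speed, cd_high_speed=0.2, dt=0.001):
    """Integrate a drag-only, spin-free kick and return the horizontal range in m."""
    angle = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    while True:
        v = math.hypot(vx, vy)
        cd = drag_coefficient(v, crisis_speed, cd_high_speed=cd_high_speed)
        # Drag force is 0.5 * rho * Cd * A * v^2, directed against the velocity.
        k = 0.5 * RHO * cd * AREA * v / MASS
        vx += -k * vx * dt
        vy += (-G - k * vy) * dt
        x += vx * dt
        y += vy * dt
        if y <= 0.0:
            return x


if __name__ == "__main__":
    for cd_fast in (0.20, 0.25):
        distance = kick_range(30.0, 35.0, crisis_speed=12.0, cd_high_speed=cd_fast)
        print(f"high-speed Cd {cd_fast:.2f}: range {distance:.1f} m")
```

Raising `cd_high_speed` while keeping everything else fixed shortens the simulated range, which is the qualitative effect a higher high-speed drag coefficient would be expected to have on long kicks.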
The Jabulani faced much criticism from players, particularly goalkeepers, who said it had a deceptive trajectory that “dipped wickedly,” as one player told the Guardian.

[Image: The 2010 Jabulani ball (left) had eight panels and a smooth texture that translated into unpredictable performance. Later balls, like the 2014 Brazuca (center) and this year’s Trionda (right), have fewer panels but more roughness. Credits: Alamy; Adobe Stock; Takeshi Asai, Sungchan Hong, Richong Liu]

The ball had one key flaw: It was too smooth. Even though its drag coefficient was relatively low at high speeds, once the ball slowed to a certain point the coefficient would ratchet up, causing it to lose speed quite fast and behave as the 2010 players complained. This sudden transition, called the drag crisis, occurs at higher speeds for smoother balls, but with added texture like seams and grooves, the transition can be avoided until a ball reaches lower speeds. This allows the ball to travel farther and generally behave in a more predictable way during typical play.

“It’s the same reason why golf balls have dimples and baseballs have those nice 108 double stitches. If those rough features of those balls were not there, you would not get anywhere near the kind of distance when those balls are thrown or hit that you see now,” Goff says. “There has to be some kind of a roughness on the ball to move this transition to a smaller speed.”

New grooves

Subsequent designs have been able to push the drag crisis to lower speeds, according to the analysis by Goff and his colleagues. The Brazuca ball used in 2014, for instance, has only six panels, but their total seam length is much longer, adding to the surface’s roughness. And this year’s Trionda ball contains just four panels, but each panel also has three deep grooves for more texture.

There’s a trade-off to this roughness, though. While Goff and his colleagues found that the Trionda ball experiences the drag crisis at the slowest speed since 2010, its drag coefficient is also higher than that of the other balls at high speeds. That means that even though the most dramatic change doesn’t happen until the ball is moving quite slowly, the ball will still slow down faster than its recent predecessors during the faster portion


The Download: how the World Cup ball will fly and OpenAI’s “super app”

admin NU / June 8, 2026

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Why this year’s World Cup ball may not fly as far

Much is new about this month’s FIFA World Cup tournament. It hosts more teams than ever before. It’s the first to occur in three different host countries. And, like every World Cup for over half a century, it will employ a football with a brand-new design.

Through wind-tunnel experiments, researchers found that long-distance kicks with Adidas’s new Trionda ball might not travel as far as they did in the past. The payoff is a more predictable flight path, something players have not always enjoyed from World Cup balls.

Find out how a few grooves and seams can change the way the game is played.

- Jenna Ahart

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 OpenAI plans to turn ChatGPT into a ‘super app’ before its IPO
The revamp would combine coding tools and AI agents. (Financial Times $)
+ The super app ambitions first emerged last year. (Fast Company)
+ OpenAI is also building a fully automated researcher. (MIT Technology Review)

2 Trump wants the US government to take a stake in AI companies
He will meet AI leaders to discuss the plan. (BBC)
+ Which would create “a partnership with the American public.” (Reuters $)
+ He wants a slice of the AI boom. (Axios)

3 Google has agreed to pay SpaceX $30 billion for AI computing power
The $920 million-a-month contract runs through June 2029. (NYT $)
+ Google will use about 110,000 Nvidia GPUs owned by SpaceX. (CNBC)
+ It comes days after Anthropic struck a SpaceX data center deal. (WSJ $)

4 AI is set to make everyday life more expensive
Its insatiable thirst for resources is likely to push up inflation. (WP $)
+ We did the math on AI’s energy footprint. (MIT Technology Review)

5 Europe is accelerating its withdrawal from US Big Tech
New analysis reveals dozens of moves to alternative providers. (Wired $)
+ Last week, the EU launched a “made in Europe” drive. (Reuters $)

6 ICE plans to give local police a new facial recognition app
It would allow them to verify a person’s immigration status. (404 Media)
+ Is the Pentagon allowed to surveil Americans with AI? (MIT Technology Review)

7 Silicon Valley’s lure is fading for India’s tech talent
Due to Trump’s immigration policies and AI-driven layoffs. (Rest of World)

8 ‘Recursive self-improvement’ has sparked fears of AI escaping control
Nobody is sure about the consequences of RSI. (The Economist $)
+ Here are five ways that AI is learning to improve itself. (MIT Technology Review)

9 Gene-edited embryos are getting closer, but a key safety gap remains
Current techniques still fail to edit every cell. (New Scientist $)
+ “Base-edited baby” is one of our 10 Breakthrough Technologies for 2026. (MIT Technology Review)

10 NASA astronauts will wear high-tech Prada underwear on their moon trips
Ventilation tubes are knitted into the garments. (The Verge)

Quote of the day

“Chat is dead.”

- A senior OpenAI employee tells the Financial Times why the company is shifting focus from chatbots to AI agents.

One More Thing

[Image credit: Beth Hoeckel]

How AI is helping historians better understand our past

The digitization of historical records is making it possible to study the past in new ways. Historians are now using machine learning, particularly deep neural networks, to analyze everything from centuries-old astronomy textbooks to ancient Greek inscriptions.

The technology is helping researchers uncover new patterns in the historical record. But it also introduces risks, including the possibility that machine learning will slip bias or outright falsifications into our understanding of the past. Read the full story on how AI is transforming the study of history.

- Moira Donovan

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Take a tour of extinct everyday objects to travel back to pre-smartphone life.
+ This a cappella cover of “I Want To Know What Love Is” nails the power-ballad drama.
+ Korea’s ingenious “one-a-day” banana packs are designed so each one ripens sequentially.
+ Casino dialogue has been synced over Looney Tunes footage in this unexpectedly perfect mashup.


The Practitioner’s Guide to AgentOps

admin NU / June 8, 2026

According to Futurum Research’s 2025 market overview of agentic AI platforms,


Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs

admin NU / June 8, 2026

Inference speed is becoming a competitive metric for large language models. Xiaomi’s MiMo team just released MiMo-V2.5-Pro-UltraSpeed, built in collaboration with the TileRT systems group. It decodes faster than 1000 tokens per second (TPS) on a 1-trillion-parameter model. The Xiaomi team describes this as a first at trillion-parameter scale. Demos show generation peaks near 1200 tokens per second. The notable part is the hardware: it runs on commodity GPUs, not custom silicon.

What is MiMo-V2.5-Pro-UltraSpeed

UltraSpeed is a high-speed serving mode for the existing MiMo-V2.5-Pro model. The base model uses a Mixture-of-Experts (MoE) architecture at trillion-parameter scale. UltraSpeed targets generation speed rather than model capability: it changes how fast the model produces output tokens. The speedup comes from three coordinated techniques across the model and the serving system. Xiaomi calls this approach extreme model-system codesign. Crucially, the entire stack runs on a single standard 8-GPU commodity node.

The Speed Case: Three Layers Working Together

The first layer is FP4 quantization. At trillion scale, FP8 or FP16 weights create heavy memory and bandwidth pressure. Lower bit-width weights move through memory faster, which directly lifts decode speed. Xiaomi uses the MXFP4 format, applied selectively to the MoE Experts only. Other modules keep higher precision, reported as FP8 by TileRT. Experts hold most parameters and tolerate quantization best, so the tradeoff is favorable. Quantization-Aware Training (QAT) keeps benchmark quality essentially on par with the original.

The second layer is DFlash speculative decoding, covered in detail below. The third layer is TileRT, the system that executes everything on the GPU. Each technique alone is not enough. The 1000 TPS result needs all three aligned tightly.

DFlash: Parallel Drafting Without a Serial Bottleneck

Standard speculative decoding uses a small draft model to guess upcoming tokens. The large model then verifies those guesses in parallel. Rejection sampling keeps output identical to normal decoding, so quality is lossless. The problem is that the draft model still generates tokens one at a time.

DFlash, a method from the research community, removes that constraint. It uses block-level masked parallel prediction: the draft model fills a whole block of masked positions in one forward pass. Xiaomi tuned DFlash with the Muon second-order optimizer and model self-distillation. The draft model uses Sliding Window Attention (SWA) only, matching the MiMo-V2 design. This makes per-prediction compute constant rather than growing with context length. Block size is capped at 8 to limit verification cost and raise concurrency.

Acceptance length measures how many draft tokens survive verification each round.

Scenario | Acceptance Length
---------|------------------
Coding | 6.30
Math / Reasoning | 5.56
Agent | 4.29

In coding, six to seven of eight draft tokens are accepted per round. Some samples reach a maximum of 7.14.

TileRT: Squeezing the Microseconds

At 1000 TPS, each operator runs for only microseconds. Traditional systems launch operators one by one, and each launch costs time. Those gaps fracture the execution stream and become the real bottleneck. TileRT replaces this with a Persistent Engine Kernel that stays resident on the GPU. It uses Warp Specialization to split data movement, compute, and communication into coordinated roles. Small operations like RMSNorm, RoPE, and KV cache writes turn into bottlenecks at this scale. The system was co-designed with the FP4 and DFlash choices, not added afterward.

Use Cases

The release targets latency-sensitive work where waiting breaks the loop:

- Parallel reasoning: run many Best-of-N or tree-search paths within the same wall-clock time.
- Coding agents: faster code generation cuts the wait between agent steps.
- Real-time decision loops: trading signal generation, fraud interception, and live dialogue.
- Interactive prototyping: demos show a Snake game in about 10 seconds and a macOS interface in about one minute.

These are throughput-bound workloads where raw token speed is the binding constraint.

How It Compares

The first table contrasts the two routes to extreme decode speed: custom silicon and commodity GPUs.

Approach | Hardware | How speed is achieved
---------|----------|----------------------
Cerebras | Wafer-Scale integration (custom) | Scale on a single custom wafer
Groq | Custom architecture | Pure on-chip SRAM
MiMo × TileRT | Commodity GPUs (8-GPU node) | Model-system codesign: FP4 + DFlash + TileRT

The second table compares the standard model with the UltraSpeed mode.

Dimension | MiMo-V2.5-Pro | MiMo-V2.5-Pro-UltraSpeed
----------|---------------|-------------------------
Decode speed | Baseline | ~10× faster (1000+ TPS)
Price | 1× | 3×
Weight precision | Standard | FP4 MoE Experts via QAT
Decoding | Standard autoregressive | DFlash speculative decoding
Access | Standard model plans | API only, application-based trial
Token Plan | Supported | Not supported

Access, Pricing, and Open Source

UltraSpeed ships through a limited, application-based window. The API trial runs June 9 to June 23, 2026. Pricing is 3× the standard MiMo-V2.5-Pro rate, for roughly 10× the speed. It is API only, and the Token Plan is not supported. Approved users also receive free Chat access during the trial. Chat limits apply: 10 queue entries daily, 30-minute sessions, and 5-minute idle release. Xiaomi open-sourced the MiMo-V2.5-Pro-FP4-DFlash checkpoint on Hugging Face. TileRT has open-sourced select modules on GitHub.

Strengths and Limitations

Strengths

- 1000+ TPS on a 1T model without custom silicon.
- Lossless decoding through rejection sampling in DFlash.
- FP4 applied only where tolerance is highest, preserving quality.
- An open checkpoint lets the community test the claims.

Limitations

- Access is gated, short, and approval-based at launch.
- Pricing triples per token versus the standard model.
- Acceptance length drops in open-ended conversation.
- Independent third-party speed verification is not yet public.
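The verification step behind the "lossless" claim can be sketched with the standard speculative-sampling acceptance rule: a drafted token is kept with probability min(1, p/q), where q is the draft model's probability and p the target model's, and on the first rejection a replacement is drawn from the renormalised residual max(0, p - q). The Python below is a toy illustration of that textbook rule only. It is not Xiaomi's or DFlash's implementation, the function name `verify_block` and the dictionary-based distributions are hypothetical, and the bonus token that a real system samples after a fully accepted block is omitted.

```python
import random


def verify_block(draft_tokens, draft_probs, target_probs, rng):
    """Toy verification of one drafted block.

    draft_tokens: tokens proposed by the draft model, one per position.
    draft_probs / target_probs: one dict per position mapping token -> probability.
    Returns (accepted_prefix, replacement), where replacement is None
    when every drafted token was accepted.
    """
    accepted = []
    for tok, q, p in zip(draft_tokens, draft_probs, target_probs):
        accept_prob = min(1.0, p.get(tok, 0.0) / q[tok])
        if rng.random() < accept_prob:
            accepted.append(tok)
            continue
        # First rejection: resample from the residual distribution and stop,
        # because later draft positions were conditioned on the rejected token.
        residual = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
        total = sum(residual.values())
        tokens = list(residual)
        weights = [residual[t] / total for t in tokens]
        return accepted, rng.choices(tokens, weights=weights)[0]
    return accepted, None


if __name__ == "__main__":
    rng = random.Random(0)
    agree = {"a": 0.5, "b": 0.5}
    block = ["a", "b"] * 4  # block of 8, matching the cap mentioned in the article
    accepted, replacement = verify_block(block, [agree] * 8, [agree] * 8, rng)
    print(len(accepted), replacement)
```

When the draft and target distributions agree at every position, the whole block survives; the number of tokens surviving a round is what the article reports as acceptance length.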
Key Takeaways

- Xiaomi MiMo and TileRT decode a 1-trillion-parameter model past 1000 tokens per second on commodity GPUs.
- The speedup comes from three layers: FP4 quantization, DFlash speculative decoding, and the TileRT runtime.
- FP4 (MXFP4) is applied only to MoE Experts; QAT keeps capability essentially on par.
- DFlash predicts a whole masked block per forward pass, hitting 6.30 average acceptance length in coding.
- UltraSpeed runs on a single 8-GPU node via an application-based API trial, June 9–23, 2026.

Marktechpost’s Visual Explainer

GUIDE • INFERENCE SYSTEMS
MiMo-V2.5-Pro-UltraSpeed: 1000+ Tokens Per Second on a 1T Model
Xiaomi MiMo & TileRT: FP4 quantization, DFlash speculative decoding, and a microsecond-scale runtime.

01 / 08 What It Is
Xiaomi’s MiMo team built it with the TileRT systems group. It decodes over 1000 tokens/s on a 1-trillion-parameter model.
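For the FP4 layer, the article names the MXFP4 format but does not describe Xiaomi's quantizer. The sketch below shows block-scaled 4-bit rounding as I understand the OCP Microscaling (MX) format: blocks of 32 values share one power-of-two scale, and each value is rounded to the nearest E2M1 magnitude (0, 0.5, 1, 1.5, 2, 3, 4, 6). It works on plain Python floats rather than packed bits, the function names are hypothetical, and it says nothing about the Quantization-Aware Training step.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (sign stored separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2      # exponent of the largest E2M1 binade (6 = 1.5 * 2**2)
BLOCK_SIZE = 32    # values that share one scale in MXFP4


def quantize_block(values):
    """Return (shared_exponent, codes); codes are signed E2M1 grid values."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0.0] * len(values)
    # Power-of-two scale chosen so the largest value lands in the top binade.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    codes = []
    for v in values:
        magnitude = min(abs(v) / scale, E2M1_GRID[-1])  # saturate at 6
        nearest = min(E2M1_GRID, key=lambda g: abs(g - magnitude))
        codes.append(math.copysign(nearest, v))
    return shared_exp, codes


def dequantize_block(shared_exp, codes):
    """Recover approximate values from the shared exponent and element codes."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.013 * (i - 16) for i in range(BLOCK_SIZE)]
    exp, codes = quantize_block(weights)
    restored = dequantize_block(exp, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"shared exponent {exp}, worst absolute error {worst:.4f}")
```

Each weight then costs 4 bits plus a small share of the per-block scale, which is where the memory and bandwidth saving relative to FP8 or FP16 weights comes from.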


Google’s New Colab CLI Lets Developers and AI Agents Run Python on Remote Colab GPUs and TPUs From the Terminal

admin NU / June 7, 2026

This week, Google’s AI team released the Colab CLI. The tool connects your local terminal to remote Colab runtimes. It lets developers and AI agents run code on cloud GPUs and TPUs. You stay in your terminal the entire time. The CLI is open source under the Apache 2.0 license.

What is Google Colab CLI

The Colab CLI is a command-line interface for Google Colab. You can create sessions, run code, and manage files from the terminal. Any agent with terminal access can call the tool. That includes Claude Code, Codex, and Google’s Antigravity. Google ships a prepackaged skill file named COLAB_SKILL.md. It gives agents built-in context on how to use the CLI. Installation uses a single uv tool install command from the GitHub repository.

    uv tool install git+https://github.com/googlecolab/google-colab-cli

A minimal session looks like this:

    colab new                            # provision a CPU session
    echo "print('hello')" | colab exec   # run code
    colab stop                           # release the VM

How the Commands Work

The CLI groups commands into sessions, execution, files, and automation. colab new provisions a session, with CPU as the default. Add --gpu T4, --gpu L4, --gpu A100, or --gpu H100 for a GPU. TPU options are v5e1 and v6e1. colab exec runs Python from stdin, a .py file, or a notebook. The exec command reads files locally and ships their contents, so local edits need no separate upload step. colab stop terminates the session and releases the VM.

Other commands cover files and authentication. colab upload and colab download move files between local and remote. colab drivemount mounts Google Drive, defaulting to /content/drive. colab auth authenticates the VM for Google Cloud services.

colab exec and Artifact Recovery: The Core Loop

The core loop is short. You provision a runtime, run a script, then pull results back. colab download retrieves models, datasets, and other files. colab log exports session history as .ipynb, .md, .txt, or .jsonl. So a remote run becomes a replayable notebook on your disk. colab repl and colab console give interactive access to the VM. colab install adds packages with uv, falling back to pip. Session metadata is stored at ~/.config/colab-cli/sessions.json.

Example: Fine-Tuning Gemma 3 1B

Google’s official release demonstrates an agent-driven fine-tuning job. The task fine-tunes google/gemma-3-1b-it using QLoRA. It trains on a Text-to-SQL dataset to improve SQL generation. The Antigravity agent runs the full pipeline with five commands.

    colab new --gpu T4
    colab install transformers datasets peft trl bitsandbytes accelerate
    colab exec -f finetune_run.py
    colab log --output gemma_finetune_log.ipynb
    colab stop

The agent then downloads the adapter model, adapter config, tokenizer config, and tokenizer. You can load and serve the fine-tuned model locally. The user typed no manual cloud provisioning command.

Use Cases

- Offload laptop-bound training to a remote GPU or TPU without leaving the terminal.
- Let agents like Claude Code, Codex, or Antigravity run end-to-end ML pipelines.
- Fine-tune small models, such as Gemma 3 1B, with QLoRA remotely.
- Script notebook execution and export replayable .ipynb logs for reproducibility.
- Debug interactively on the VM through colab repl or colab console.

Colab CLI vs Browser-Based Colab

The CLI does not replace the notebook UI. It targets scripted, automated, and agent-driven work instead. Here is how the two workflows compare across common tasks.

Dimension | Browser-Based Colab | Colab CLI
----------|---------------------|----------
Interface | Web notebook UI | Local terminal
Accelerator selection | Runtime menu in the browser | --gpu / --tpu flags on colab new
Agent use | Manual, UI-driven | Any terminal agent via commands
Run local scripts | Paste or upload into cells | colab exec -f script.py
Artifact retrieval | Manual download or Drive | colab download, colab log
Package install | !pip inside a cell | colab install (uv, then pip)
Session control | Browser-managed runtime | colab new, colab stop, colab status
Agent skill file | None | Bundled COLAB_SKILL.md

Strengths and Considerations

Strengths:

- Terminal-native workflow fits scripts, CI, and agent loops.
- One command provisions T4, L4, A100, or H100 GPUs.
- exec ships local file contents, so no upload step is needed.
- Logs export to replayable notebook formats for reproducibility.
- Open source under Apache 2.0, with a bundled agent skill file.
- Works with multiple agents, not a single vendor’s tool.

Considerations:

- Access requires authentication; the default strategy is oauth2.
- repl and console need a TTY when run interactively. Pipe stdin to use those two commands inside scripts.
- Compute still runs on Colab’s backend and its runtime model.

Key Takeaways

- Google’s Colab CLI runs code on remote Colab GPUs and TPUs from your local terminal.
- One command provisions accelerators: colab new --gpu with T4, L4, A100, or H100, plus TPUs.
- colab exec ships local .py and .ipynb files to the runtime without an upload step.
- Any terminal agent (Claude Code, Codex, Antigravity) can drive it via a bundled COLAB_SKILL.md.
- It is open source under Apache 2.0, and colab log exports replayable notebook logs.

Marktechpost Visual Explainer

Google Colab CLI: Terminal Guide

1 / 8 Overview: Run Colab GPUs and TPUs from your terminal
The Google Colab CLI connects your local terminal to remote Colab runtimes. Developers and AI agents run code on cloud accelerators without leaving the shell.
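Because the CLI is driven entirely by shell commands, the five-step fine-tuning pipeline from the article can also be scripted. The Python sketch below only assembles the commands quoted in the article and, outside dry-run mode, passes them to `subprocess.run`. The helper names `finetune_pipeline` and `run_pipeline` are hypothetical, and the double-dash spelling of `--gpu` and `--output` is an assumption based on the flags as printed in the article. It has not been checked against the CLI itself.

```python
import subprocess


def finetune_pipeline(gpu="T4", script="finetune_run.py",
                      log_path="gemma_finetune_log.ipynb"):
    """Build the five commands from the article's Gemma fine-tuning example."""
    packages = ["transformers", "datasets", "peft", "trl",
                "bitsandbytes", "accelerate"]
    return [
        ["colab", "new", "--gpu", gpu],           # provision the session
        ["colab", "install", *packages],          # install dependencies
        ["colab", "exec", "-f", script],          # run the local script remotely
        ["colab", "log", "--output", log_path],   # export the session history
        ["colab", "stop"],                        # release the VM
    ]


def run_pipeline(commands, dry_run=True):
    """Run (or just print) each command; always attempt the final stop step."""
    *steps, stop = commands
    executed = []

    def run(cmd, check):
        executed.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=check)

    try:
        for cmd in steps:
            run(cmd, check=True)
    finally:
        # Release the VM even if an earlier step failed.
        run(stop, check=False)
    return executed


if __name__ == "__main__":
    run_pipeline(finetune_pipeline(), dry_run=True)
```

Keeping the stop step in a `finally` block is a design choice for unattended runs: a failed training script should not leave a billed accelerator session running.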
Announced June 5, 2026 • Open source under Apache 2.0

Step 1: What it is
A command-line interface for Google Colab. It connects your local terminal to remote Colab runtimes. You create sessions, run code, and manage files from the terminal. Any terminal-based AI agent can call it too.

Step 2: Install and quick start
Install with a single command, then run a first session.

    uv tool install git+https://github.com/googlecolab/google-colab-cli
    colab new                            # provision a CPU session
    echo "print('hello')" | colab exec   # run code
    colab stop                           # release the VM

Step 3: Provision GPUs and TPUs
Request an accelerator when you create the session. CPU is the default.

    colab new --gpu T4
    colab new --gpu A100
    colab new --tpu v6e1

Accelerator availability depends on your active Colab plan.

Step 4: Run local scripts remotely
The exec command reads your file locally and ships its contents. No separate
