YouZum


A Coding Implementation on Deepgram Python SDK for Transcription, Text-to-Speech, Async Audio Processing, and Text Intelligence

In this tutorial, we build an advanced hands-on workflow with the Deepgram Python SDK and explore how modern voice AI capabilities come together in a single Python environment. We set up authentication, connect both synchronous and asynchronous Deepgram clients, and work directly with real audio data to understand how the SDK handles transcription, speech generation, and text analysis in practice. We transcribe audio from both a URL and a local file, inspect confidence scores, word-level timestamps, speaker diarization, paragraph formatting, and AI-generated summaries, and then extend the pipeline to async processing for faster, more scalable execution. We also generate speech with multiple TTS voices, analyze text for sentiment, topics, and intents, and examine advanced transcription controls such as keyword search, replacement, boosting, raw response access, and structured error handling. Through this process, we create a practical, end-to-end Deepgram voice AI workflow that is both technically detailed and easy to adapt for real-world applications. 
```python
!pip install deepgram-sdk httpx --quiet

import os, asyncio, textwrap, urllib.request
from getpass import getpass
from deepgram import DeepgramClient, AsyncDeepgramClient
from deepgram.core.api_error import ApiError
from IPython.display import Audio, display

DEEPGRAM_API_KEY = getpass("Enter your Deepgram API key: ")
os.environ["DEEPGRAM_API_KEY"] = DEEPGRAM_API_KEY

client = DeepgramClient(api_key=DEEPGRAM_API_KEY)
async_client = AsyncDeepgramClient(api_key=DEEPGRAM_API_KEY)

AUDIO_URL = "https://dpgr.am/spacewalk.wav"
AUDIO_PATH = "/tmp/sample.wav"
urllib.request.urlretrieve(AUDIO_URL, AUDIO_PATH)

def read_audio(path=AUDIO_PATH):
    with open(path, "rb") as f:
        return f.read()

def _get(obj, key, default=None):
    """Get a field from either a dict or an object — v6 returns both."""
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)

def get_model_name(meta):
    mi = _get(meta, "model_info")
    if mi is None:
        return "n/a"
    return _get(mi, "name", "n/a")

def tts_to_bytes(response) -> bytes:
    """v6 generate() returns a generator of chunks or an object with .stream."""
    if hasattr(response, "stream"):
        return response.stream.getvalue()
    return b"".join(chunk for chunk in response if isinstance(chunk, bytes))

def save_tts(response, path: str) -> str:
    with open(path, "wb") as f:
        f.write(tts_to_bytes(response))
    return path

print("Deepgram client ready | sample audio downloaded")

print("\n" + "=" * 60)
print("SECTION 2: Pre-Recorded Transcription from URL")
print("=" * 60)

response = client.listen.v1.media.transcribe_url(
    url=AUDIO_URL,
    model="nova-3",
    smart_format=True,
    diarize=True,
    language="en",
    utterances=True,
    filler_words=True,
)

transcript = response.results.channels[0].alternatives[0].transcript
print(f"\nFull Transcript:\n{textwrap.fill(transcript, 80)}")

confidence = response.results.channels[0].alternatives[0].confidence
print(f"\nConfidence: {confidence:.2%}")

words = response.results.channels[0].alternatives[0].words
print("\nFirst 5 words with timing:")
for w in words[:5]:
    print(f"  '{w.word}' start={w.start:.2f}s end={w.end:.2f}s conf={w.confidence:.2f}")

print("\nSpeaker Diarization (first 5 words):")
for w in words[:5]:
    speaker = getattr(w, "speaker", None)
    if speaker is not None:
        print(f"  Speaker {int(speaker)}: '{w.word}'")

meta = response.metadata
print(f"\nMetadata: duration={meta.duration:.2f}s channels={int(meta.channels)} model={get_model_name(meta)}")
```

We install the Deepgram SDK and its dependencies, then securely set up authentication using our API key. We initialize both synchronous and asynchronous Deepgram clients, download a sample audio file, and define helper functions to make it easier to work with mixed response objects, audio bytes, model metadata, and streamed TTS outputs. We then run our first pre-recorded transcription from a URL and inspect the transcript, confidence score, word-level timestamps, speaker diarization, and metadata to understand the structure and richness of the response.
```python
print("\n" + "=" * 60)
print("SECTION 3: Pre-Recorded Transcription from File")
print("=" * 60)

file_response = client.listen.v1.media.transcribe_file(
    request=read_audio(),
    model="nova-3",
    smart_format=True,
    diarize=True,
    paragraphs=True,
    summarize="v2",
)

alt = file_response.results.channels[0].alternatives[0]
paragraphs = getattr(alt, "paragraphs", None)
if paragraphs and _get(paragraphs, "paragraphs"):
    print("\nParagraph-Formatted Transcript:")
    for para in _get(paragraphs, "paragraphs")[:2]:
        sentences = " ".join(_get(s, "text", "") for s in (_get(para, "sentences") or []))
        print(f"  [Speaker {int(_get(para, 'speaker', 0))}, "
              f"{_get(para, 'start', 0):.1f}s–{_get(para, 'end', 0):.1f}s] {sentences[:120]}...")
else:
    print(f"\nTranscript: {alt.transcript[:200]}...")

if getattr(file_response.results, "summary", None):
    short = _get(file_response.results.summary, "short", "")
    if short:
        print(f"\nAI Summary: {short}")

print(f"\nConfidence: {alt.confidence:.2%}")
print(f"Word count : {len(alt.words)}")

print("\n" + "=" * 60)
print("SECTION 4: Async Parallel Transcription")
print("=" * 60)

async def transcribe_async():
    audio_bytes = read_audio()

    async def from_url(label):
        r = await async_client.listen.v1.media.transcribe_url(
            url=AUDIO_URL, model="nova-3", smart_format=True,
        )
        print(f"  [{label}] {r.results.channels[0].alternatives[0].transcript[:100]}...")

    async def from_file(label):
        r = await async_client.listen.v1.media.transcribe_file(
            request=audio_bytes, model="nova-3", smart_format=True,
        )
        print(f"  [{label}] {r.results.channels[0].alternatives[0].transcript[:100]}...")

    await asyncio.gather(from_url("From URL"), from_file("From File"))

await transcribe_async()
```

We move from URL-based to file-based transcription by sending raw audio bytes directly to the Deepgram API, enabling richer options such as paragraphs and summarization.
We inspect the returned paragraph structure, speaker segmentation, summary output, confidence score, and word count to see how the SDK supports more readable and analysis-friendly transcription results. We also introduce asynchronous processing and run URL-based and file-based transcription in parallel, helping us understand how to build faster, more scalable voice AI pipelines.

```python
print("\n" + "=" * 60)
print("SECTION 5: Text-to-Speech")
print("=" * 60)

sample_text = (
    "Welcome to the Deepgram advanced tutorial. "
    "This SDK lets you transcribe audio, generate speech, "
    "and analyse text — all with a simple Python interface."
)

tts_path = save_tts(
    client.speak.v1.audio.generate(text=sample_text, model="aura-2-asteria-en"),
    "/tmp/tts_output.mp3",
)
size_kb = os.path.getsize(tts_path) / 1024
print(f"TTS audio saved → {tts_path} ({size_kb:.1f} KB)")
display(Audio(tts_path))

print("\n" + "=" * 60)
print("SECTION 6: Multiple TTS Voices Comparison")
print("=" * 60)

voices = {
    "aura-2-asteria-en": "Asteria (female, warm)",
    "aura-2-orion-en": "Orion (male, deep)",
    "aura-2-luna-en": "Luna (female, bright)",
}
for model_id, label in voices.items():
    try:
        path = save_tts(
            client.speak.v1.audio.generate(text="Hello! I am a Deepgram voice model.", model=model_id),
            f"/tmp/tts_{model_id}.mp3",
        )
        print(f"{label}")
        display(Audio(path))
    except Exception as e:
        print(f"{label} — {e}")

print("\n" + "=" * 60)
print("SECTION 7: Text Intelligence — Sentiment, Topics, Intents")
print("=" * 60)

review_text = (
    "I absolutely love this product! It arrived quickly, the quality is "
    "outstanding, and customer support was incredibly helpful when I had "
    "a question. I would definitely recommend it to anyone looking for "
    "a reliable solution. Five stars!"
)

read_response = client.read.v1.text.analyze(
    request={"text": review_text},
    language="en",
    sentiment=True,
    topics=True,
    intents=True,
    summarize=True,
)
results = read_response.results
```

We focus on speech generation by converting text to audio using Deepgram’s text-to-speech API and saving the resulting audio as an MP3 file. We then compare multiple TTS voices to hear how different voice models behave and how easily we can switch between them while keeping the same code pattern. After that, we begin working with the Read API by passing the review text into Deepgram’s text intelligence system to analyze language beyond simple transcription.

```python
if getattr(results, "sentiments", None):
    overall = results.sentiments.average
    print(f"Sentiment: {_get(overall, 'sentiment', '?').upper()} "
          f"(score={_get(overall, 'sentiment_score', 0):.3f})")
    for seg in (_get(results.sentiments, "segments") or [])[:2]:
        print(f"  •
```


A Coding Implementation on Microsoft’s OpenMementos with Trace Structure Analysis, Context Compression, and Fine-Tuning Data Preparation

In this tutorial, we work with Microsoft’s OpenMementos dataset and explore how reasoning traces are structured through blocks and mementos in a practical, Colab-ready workflow. We stream the dataset efficiently, parse its special-token format, inspect how reasoning and summaries are organized, and measure the compression provided by the memento representation across different domains. As we move through the analysis, we also visualize dataset patterns, align the streamed format with the richer full subset, simulate inference-time compression, and prepare the data for supervised fine-tuning. In this way, we build both an intuitive and technical understanding of how OpenMementos captures long-form reasoning while preserving compact summaries that can support efficient training and inference.

```python
!pip install -q -U datasets transformers matplotlib pandas

import re, itertools, textwrap
from collections import Counter
from typing import Dict

import pandas as pd
import matplotlib.pyplot as plt
from datasets import load_dataset

DATASET = "microsoft/OpenMementos"
ds_stream = load_dataset(DATASET, split="train", streaming=True)
first_row = next(iter(ds_stream))

print("Columns :", list(first_row.keys()))
print("Domain  :", first_row["domain"], "| Source:", first_row["source"])
print("Problem head:", first_row["problem"][:160].replace("\n", " "), "...")
```

We install the required libraries and import the core tools needed for dataset streaming, parsing, analysis, and visualization. We then connect to the Microsoft OpenMementos dataset in streaming mode to inspect it without downloading the entire dataset locally. By reading the first example, we begin understanding the dataset schema, the problem format, and the domain and source metadata attached to each reasoning trace.
```python
# Literal pipes must be escaped inside the regexes, since | is alternation
BLOCK_RE = re.compile(r"<\|block_start\|>(.*?)<\|block_end\|>", re.DOTALL)
SUMMARY_RE = re.compile(r"<\|summary_start\|>(.*?)<\|summary_end\|>", re.DOTALL)
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def parse_memento(response: str) -> Dict:
    blocks = [m.strip() for m in BLOCK_RE.findall(response)]
    summaries = [m.strip() for m in SUMMARY_RE.findall(response)]
    think_m = THINK_RE.search(response)
    final_ans = response.split("</think>")[-1].strip() if "</think>" in response else ""
    return {"blocks": blocks,
            "summaries": summaries,
            "reasoning": (think_m.group(1) if think_m else ""),
            "final_answer": final_ans}

parsed = parse_memento(first_row["response"])
print(f"\n→ {len(parsed['blocks'])} blocks, {len(parsed['summaries'])} mementos parsed")
print("First block   :", parsed["blocks"][0][:140].replace("\n", " "), "...")
print("First memento :", parsed["summaries"][0][:140].replace("\n", " "), "...")

N_SAMPLES = 500
rows = []
for i, ex in enumerate(itertools.islice(
        load_dataset(DATASET, split="train", streaming=True), N_SAMPLES)):
    p = parse_memento(ex["response"])
    if not p["blocks"] or len(p["blocks"]) != len(p["summaries"]):
        continue
    blk_c = sum(len(b) for b in p["blocks"])
    sum_c = sum(len(s) for s in p["summaries"])
    blk_w = sum(len(b.split()) for b in p["blocks"])
    sum_w = sum(len(s.split()) for s in p["summaries"])
    rows.append(dict(domain=ex["domain"], source=ex["source"],
                     n_blocks=len(p["blocks"]),
                     block_chars=blk_c, summ_chars=sum_c,
                     block_words=blk_w, summ_words=sum_w,
                     compress_char=sum_c / max(blk_c, 1),
                     compress_word=sum_w / max(blk_w, 1)))
    if (i + 1) % 100 == 0:
        print(f"  processed {i+1}/{N_SAMPLES}")

df = pd.DataFrame(rows)
print(f"\nAnalyzed {len(df)} rows. Domain counts:")
print(df["domain"].value_counts().to_string())

per_dom = df.groupby("domain").agg(
    n=("domain", "count"),
    median_blocks=("n_blocks", "median"),
    median_block_words=("block_words", "median"),
    median_summ_words=("summ_words", "median"),
    median_char_ratio=("compress_char", "median"),
    median_word_ratio=("compress_word", "median"),
).round(3)
print("\nPer-domain medians (ratio = mementos / blocks):")
print(per_dom.to_string())
```

We define the regex-based parser that extracts reasoning blocks, memento summaries, the main thinking section, and the final answer from each response. We test the parser on the first streamed example and confirm that the block-summary structure is being captured correctly. We then run a streaming analysis over multiple samples to compute block counts, word counts, character counts, and compression ratios, which helps us study how the dataset behaves across examples and domains.

We visualize the dataset’s structural patterns by plotting block counts, compression ratios, and the relationship between block size and memento size. We compare these distributions across domains to see how reasoning organization differs between math, code, and science examples. We also stream one example from the full subset and inspect its additional sentence-level and block-alignment fields, which helps us understand the richer internal annotation pipeline behind the dataset.
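The plots described above can be sketched directly from the columns computed in the analysis loop. This is an illustrative reconstruction, not the article's own plotting code, run here on a tiny synthetic stand-in for the real df (same column names):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside notebooks
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in with the same columns the streaming analysis produces
df = pd.DataFrame({
    "domain":        ["math", "math", "code", "science"],
    "n_blocks":      [6, 9, 4, 7],
    "block_words":   [1200, 2100, 800, 1500],
    "summ_words":    [180, 310, 140, 260],
    "compress_word": [0.15, 0.15, 0.17, 0.17],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
df["n_blocks"].plot.hist(ax=axes[0], bins=5, title="Blocks per trace")
df["compress_word"].plot.hist(ax=axes[1], bins=5, title="Word compression ratio")
axes[2].scatter(df["block_words"], df["summ_words"])
axes[2].set(title="Block vs memento size", xlabel="block words", ylabel="memento words")
fig.tight_layout()
fig.savefig("/tmp/openmementos_overview.png")
```

Swapping the synthetic frame for the real df from the loop above reproduces the per-domain comparison; a `df.groupby("domain")` boxplot is a natural extension.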
```python
def compress_trace(response: str, keep_last_k: int = 1) -> str:
    blocks, summaries = BLOCK_RE.findall(response), SUMMARY_RE.findall(response)
    if not blocks or len(blocks) != len(summaries):
        return response
    out, n = ["<think>"], len(blocks)
    for i, (b, s) in enumerate(zip(blocks, summaries)):
        if i >= n - keep_last_k:
            out.append(f"<|block_start|>{b}<|block_end|>")
            out.append(f"<|summary_start|>{s}<|summary_end|>")
        else:
            out.append(f"<|summary_start|>{s}<|summary_end|>")
    out.append("</think>")
    out.append(response.split("</think>")[-1])
    return "\n".join(out)

orig, comp = first_row["response"], compress_trace(first_row["response"], 1)
print(f"\nOriginal   : {len(orig):>8,} chars")
print(f"Compressed : {len(comp):>8,} chars ({len(comp)/len(orig)*100:.1f}% of original)")

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("gpt2")
MEM_TOKENS = ["<|block_start|>", "<|block_end|>",
              "<|summary_start|>", "<|summary_end|>",
              "<think>", "</think>"]
tok.add_special_tokens({"additional_special_tokens": MEM_TOKENS})

def tlen(s):
    return len(tok(s, add_special_tokens=False).input_ids)

blk_tok = sum(tlen(b) for b in parsed["blocks"])
sum_tok = sum(tlen(s) for s in parsed["summaries"])
print(f"\nTrace-level token compression for this example:")
print(f"  block tokens   = {blk_tok}")
print(f"  memento tokens = {sum_tok}")
print(f"  compression    = {blk_tok / max(sum_tok, 1):.2f}× (paper reports ~6×)")

def to_chat(ex):
    return {"messages": [
        {"role": "user", "content": ex["problem"]},
        {"role": "assistant", "content": ex["response"]},
    ]}

chat_stream = load_dataset(DATASET, split="train", streaming=True).map(to_chat)
chat_ex = next(iter(chat_stream))
print("\nSFT chat example (truncated):")
for m in chat_ex["messages"]:
    print(f"  [{m['role']:9s}] {m['content'][:130].replace(chr(10), ' ')}...")
```

We simulate inference-time compression by rewriting a reasoning trace so that older blocks are replaced by their mementos while the latest blocks remain intact. We then compare the original and compressed trace lengths to see how much context can be reduced in practice. After that, we integrate a tokenizer, add special memento tokens, measure token-level compression, and convert the dataset to an SFT-style chat format suitable for training workflows.

```python
def render_trace(response: str, width: int = 220) -> None:
    p = parse_memento(response)
    print("=" * 72)
    print(f"{len(p['blocks'])} blocks ·
```


Three reasons why DeepSeek’s new model matters

On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than its last generation, thanks to a new design that helps it handle large amounts of text more efficiently. Like DeepSeek’s previous models, V4 is open source, meaning it is available for anyone to download, use, and modify.

V4 marks DeepSeek’s most significant release since R1, the reasoning model it launched in January 2025. R1, which was trained on limited computing resources, stunned the global AI industry with its strong performance and efficiency, turning DeepSeek from a little-known research team into China’s best-known AI company almost overnight. It also helped set off a wave of open-weight model releases from other Chinese AI firms. DeepSeek has kept a relatively low profile since then—but earlier this month, it effectively teased V4’s release when it added “expert” and “flash” modes to the online version of its model, prompting speculation that the updates were tied to a bigger upcoming release. While the company has become a powerful symbol of China’s AI ambitions, its big return to cutting-edge frontier models comes after months of turbulence—including major personnel departures, delays to previous model launches, and growing scrutiny from both the US and Chinese governments.

So, will V4 shake the AI field the way R1 did? Almost certainly not, but here are three big reasons why this release matters.

1. It breaks new ground for an open-source model. As with R1 before it, DeepSeek claims that V4’s performance rivals the best models available at a fraction of the price. This is great news for developers and for companies using the tech, because it means they can access frontier AI capabilities on their own terms, and without worrying about skyrocketing costs. The new model comes in two versions, both of which are available on DeepSeek’s website and in its app, with API access also open to developers.
V4-Pro is a larger model built for coding and complex agent tasks, and V4-Flash is a smaller version designed to be faster and cheaper to run. Both versions offer reasoning modes, in which the model can carefully parse a user’s prompt and show each step as it works through the problem. For V4-Pro, DeepSeek charges $1.74 per million input tokens and $3.48 per million output tokens, a fraction of the cost of comparable models from OpenAI and Anthropic. V4-Flash is even cheaper, at about $0.14 per million input tokens and about $0.28 per million output tokens, making it one of the cheapest top-tier models available. This makes it a very appealing model to build applications on.

In terms of performance, V4 is, perhaps unsurprisingly, a huge jump from R1—and it seems to be a strong alternative to just about all the latest big AI models. On the major benchmarks, according to results shared by the company, DeepSeek V4-Pro competes with leading closed-source models, matching the performance of Anthropic’s Claude-Opus-4.6, OpenAI’s GPT-5.4, and Google’s Gemini-3.1. And compared with other open-source models, such as Alibaba’s Qwen-3.5 or Z.ai’s GLM-5.1, DeepSeek V4 exceeds them all on coding, math, and STEM problems, making it one of the strongest open-source models ever released.

DeepSeek also says that V4-Pro now ranks among the strongest open-source models on benchmarks for agentic coding tasks and performs well on other tests that measure the ability to carry out multistep problems. Its writing ability and world knowledge also lead the field, according to benchmarking results shared by the company. In a technical report released alongside the model, DeepSeek shared results from an internal survey of 85 experienced developers: more than 90% included V4-Pro among their top model choices for coding tasks. DeepSeek says it has specifically optimized V4 for popular agent frameworks such as Claude Code, OpenClaw, and CodeBuddy.

2. It delivers on a new approach to memory efficiency. One of the key innovations of V4 is its long context window—the amount of text the model can process at once. Both versions can handle 1 million tokens, which is large enough to fit all three volumes of The Lord of the Rings and The Hobbit combined. The company says this context window size is now the default across all DeepSeek services, and it matches what is offered by cutting-edge versions of models like Gemini and Claude.

But it’s important to know not just that DeepSeek has made this leap, but how it did so. V4 makes significant architectural changes to the company’s former models—especially in the attention mechanism, which is the feature of AI models that helps them understand each part of a prompt in relation to the rest. As the prompt text gets longer, these comparisons become much more costly, making attention one of the main bottlenecks for long-context models. DeepSeek’s innovation was to make the model more selective about what it pays attention to. Instead of treating all earlier text as equally important, V4 compresses older information and focuses on the parts most likely to matter in the present moment, while still keeping nearby text in full so it does not miss important details.

DeepSeek says this sharply reduces the cost of using long context. In a 1-million-token context, V4-Pro uses only 27% of the computing power required by its previous model, V3.2, while cutting memory use to 10%. The reduction in V4-Flash is even larger, using just 10% of the computing power and 7% of the memory. In practice, this could make it cheaper to build tools that need to work across huge amounts of material, such as an AI coding assistant that can read an entire codebase or a research agent that can analyze a long archive of documents without constantly forgetting what came before. DeepSeek’s interest in long context windows didn’t start with V4.
Over the past year and a half, the company has quietly published a series of papers on how AI models “remember” information, experimenting with compression and mathematical techniques to extend what AI models could realistically handle. 3.


Meet GitNexus: An Open-Source MCP-Native Knowledge Graph Engine That Gives Claude Code and Cursor Full Codebase Structural Awareness

There is a quiet failure mode that lives at the center of every AI-assisted coding workflow. You ask Claude Code, Cursor, or Windsurf to modify a function. The agent does it confidently, cleanly, and incorrectly — because it had no idea that 47 other functions depended on the return type it just changed. Breaking changes ship. The test suite screams. And you spend the next two hours untangling what the model should have known before it touched a single line.

An Indian computer science student built GitNexus to fix that. The open-source project, now sitting at 28,000+ stars and 3,000+ forks on GitHub with 45 contributors, describes itself as ‘the nervous system for agent context.’ That description undersells what it actually does.

What Actually is GitNexus?

GitNexus is a code intelligence layer, not a documentation tool. It indexes an entire repository into a structured knowledge graph — mapping every function call, import, class inheritance, interface implementation, and execution flow — and then exposes that graph to AI agents through a Model Context Protocol (MCP) server. The agents stop guessing. They query.

To understand why this is significant, you need to understand what AI coding agents currently operate on. Most tools like Cursor, Claude Code, and Windsurf rely on either file-based context windows (they read the files nearby and hope for the best) or traditional Graph RAG approaches (they query a graph with a series of prompts, hoping to discover what matters). Neither approach gives an agent a structural map of the repository before it acts. GitNexus pre-computes the entire dependency structure at index time. When an agent asks ‘what depends on this function?’, it gets a complete, confidence-scored answer in one query, instead of chaining 10 successive queries that each risk missing something.
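The difference is easy to see with a toy sketch (hypothetical symbol names, not GitNexus internals): once call edges are inverted at index time, the full blast radius of a symbol is a single graph lookup rather than a chain of searches.

```python
from collections import defaultdict

# Hypothetical call edges discovered at index time: (caller, callee)
calls = [
    ("handleLogin", "UserService.authenticate"),
    ("UserController.login", "UserService.authenticate"),
    ("UserService.authenticate", "hashPassword"),
]

# Invert the edges once, when the repository is indexed
callers = defaultdict(set)
for caller, callee in calls:
    callers[callee].add(caller)

def impact(symbol, seen=None):
    """All transitive upstream callers of `symbol` (its 'blast radius')."""
    seen = set() if seen is None else seen
    for c in callers[symbol]:
        if c not in seen:
            seen.add(c)
            impact(c, seen)
    return seen

# One query returns every function at risk if hashPassword's signature changes
print(sorted(impact("hashPassword")))
```

A real implementation adds confidence scores and depth grouping, but the pre-computation idea is the same.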
The Indexing Pipeline

Running npx gitnexus analyze from the root of a repository kicks off a multi-phase indexing pipeline that does the following: First, it walks the file tree and maps folder and file relationships (the Structure phase). Then it parses every function, class, method, and interface using Tree-sitter ASTs (Abstract Syntax Trees). Tree-sitter is a high-performance, incremental parser originally developed at GitHub that produces concrete syntax trees for any supported language. GitNexus uses it to extract symbols with precision that regex or simple text search cannot match.

After parsing, GitNexus performs cross-file resolution: it resolves imports, function calls, class inheritance, constructor inference, and self/this receiver types across the whole codebase. This is the step where it learns that UserController in src/controllers/user.ts calls into UserService, which authRouter imports, which handleLogin depends on. Next comes clustering — GitNexus groups related symbols into functional communities using Leiden community detection on the call graph, assigning each cluster a cohesion score. Then it traces execution flows from entry points through full call chains to build what it calls ‘processes.’ Finally, it indexes everything for hybrid search using BM25 (a keyword ranking algorithm), semantic vector embeddings, and RRF (Reciprocal Rank Fusion) to merge results. The graph is stored in LadybugDB, an embedded graph database with native vector support formerly known as KuzuDB. This entire pipeline runs locally — no code leaves your machine.

A particularly useful flag for teams: gitnexus analyze --skills takes the Leiden community detection one step further. Instead of only grouping symbols internally, it generates a custom SKILL.md file for each detected functional area of your codebase under .claude/skills/generated/.
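Reciprocal Rank Fusion, used in the hybrid-search step of the indexing pipeline above, is a standard formula: each result scores the sum of 1/(k + rank) across the rankings it appears in. A minimal sketch (k=60 is the common default in the literature; GitNexus's actual parameters are an assumption here):

```python
def rrf(rankings, k=60):
    """Fuse several ranked result lists (best first) into one ordering."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25   = ["auth.ts", "login.ts", "user.ts"]      # keyword ranking
vector = ["login.ts", "session.ts", "auth.ts"]   # embedding ranking
print(rrf([bm25, vector]))
# → ['login.ts', 'auth.ts', 'session.ts', 'user.ts']
```

Results that rank well in both lists rise to the top without either scorer's raw scores needing to be comparable.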
Each skill file describes that module’s key files, entry points, execution flows, and cross-area connections — so an AI agent working in the authentication module gets targeted architectural context for that specific area, not a generic overview of the entire repo. Skills are regenerated on each --skills run to stay current.

https://github.com/abhigyanpatwari/GitNexus

Seven Tools and Two Prompts Your Agent Gets

Once indexed, GitNexus registers an MCP server that exposes seven tools and two guided prompts to your AI agent. impact runs blast radius analysis. Given a target symbol, it returns every upstream caller grouped by depth with confidence scores — handleLogin [CALLS 90%], UserController [CALLS 85%] — so the agent knows what it risks breaking before it touches anything. context gives a 360-degree view of any symbol: its callers, its callees, every process it participates in, and which step of each process it occupies. query runs process-grouped hybrid search across the codebase, returning matching symbols alongside the execution flows they belong to. detect_changes performs git-diff impact analysis — it maps changed lines to affected processes and assigns a risk level before you commit. rename executes coordinated multi-file symbol renames using the graph for high-confidence edits and text search for the rest, with a dry-run mode to preview changes before applying them. cypher exposes raw Cypher graph queries for engineers who want to write custom traversals against the knowledge graph directly. list_repos handles the multi-repo case — GitNexus uses a global registry at ~/.gitnexus/ so one MCP server can serve multiple indexed repositories simultaneously.

Beyond the tools, GitNexus also exposes two MCP prompts for guided workflows. detect_impact runs a pre-commit change analysis that surfaces scope, affected processes, and an overall risk level — think of it as a structured checklist before any significant edit.
generate_map produces architecture documentation directly from the knowledge graph, complete with Mermaid diagrams, making it useful for onboarding engineers or documenting a codebase that has grown faster than its docs.

Editor Support and Deepest Integration with Claude Code

GitNexus supports Claude Code, Cursor, Codex, OpenCode, and Windsurf. Editor support varies by tier. Windsurf gets MCP only. Cursor, Codex, and OpenCode get MCP plus agent skills. Claude Code gets the full stack: MCP tools, agent skills (Exploring, Debugging, Impact Analysis, Refactoring), PreToolUse hooks that enrich every search with graph context before Claude acts, and PostToolUse hooks that auto-reindex after commits. For Claude Code users, GitNexus installs itself completely — hooks, skills, and an AGENTS.md / CLAUDE.md context file — in a single npx gitnexus analyze command.

The Model Democratization Angle

One of the less obvious implications of this architecture is what it does for smaller models. Because


Google DeepMind Introduces Vision Banana: An Instruction-Tuned Image Generator That Beats SAM 3 on Segmentation and Depth Anything V3 on Metric Depth Estimation

For years, the computer vision community has operated on two separate tracks: generative models (which produce images) and discriminative models (which understand them). The assumption was straightforward — models good at making pictures aren’t necessarily good at reading them. A new paper from Google, titled “Image Generators are Generalist Vision Learners” (arXiv:2604.20329), published April 22, 2026, blows that assumption apart. A team of Google DeepMind researchers introduced Vision Banana, a single unified model that surpasses or matches state-of-the-art specialist systems across a wide range of visual understanding tasks — including semantic segmentation, instance segmentation, monocular metric depth estimation, and surface normal estimation — while simultaneously retaining the original image generation capabilities of its base model.

https://arxiv.org/pdf/2604.20329

The LLM Analogy That Changes Everything

If you’ve worked with large language models, you already understand the two-phase playbook: first, pretrain a base model on massive text data using a generative objective, then apply instruction-tuning to align it for downstream tasks. The pretraining phase is where the model develops a rich internal representation of language that can be repurposed for almost anything. The Google team’s core claim is that image generation training plays the exact same foundational role for vision. Their base model, Nano Banana Pro (NBP), is Google’s state-of-the-art image generator. By performing a lightweight instruction-tuning pass — mixing a small proportion of computer vision task data at a very low ratio into NBP’s original training mixture — they created Vision Banana. The key insight: generating photorealistic images implicitly requires a model to understand geometry, semantics, depth, and object relationships. Vision Banana learns to express that latent knowledge in measurable, decodable formats.
Critically, no training data from any of the evaluation benchmarks is included in the instruction-tuning mixture — ensuring that all results reflect true generalist capability rather than in-domain memorization.

How It Works: Perception as Image Generation

Rather than adding specialized decoder heads or regression modules for each task, all vision task outputs are parameterized as RGB images. The model is instruction-tuned to produce visualizations that follow precise, invertible color schemes — meaning the generated images can be decoded back into quantitative outputs for benchmark evaluation. The research team identified three key advantages of this strategy. First, it supports a wide variety of tasks with a single unified model — after instruction-tuning, only the prompt changes, not the weights. Second, it requires relatively little new training data, since instruction-tuning only teaches the model how to format computer vision outputs as RGB. Third, it helps the model retain its original image generation capabilities, since the outputs are simply new RGB images.

For semantic segmentation, the model is prompted with instructions such as: “Generate a segmentation visualization of this image, using the color mapping: {‘cat’: ‘red’, ‘background’: ‘yellow’}.” Each pixel is colored by its predicted class, and because color assignments are specified in the prompt, no fixed label vocabulary is needed. For instance segmentation, since the number of instances is unknown in advance, Vision Banana uses a per-class inference strategy — running a separate pass per class and dynamically assigning unique colors to each instance. Masks are recovered by clustering pixels with similar colors under a distance threshold. Metric depth estimation uses a bijective mapping between unbounded metric depth values in [0, ∞) and bounded RGB values in [0, 1]³.
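Because the color scheme is specified in the prompt, decoding a segmentation visualization back into masks reduces to nearest-color matching. A minimal sketch, where the palette and the tiny test image are illustrative rather than taken from the paper:

```python
import numpy as np

def decode_segmentation(image, palette):
    """Map each pixel of an RGB visualization back to a class mask.

    `image` is an (H, W, 3) float array in [0, 1]; `palette` maps class
    names to RGB triples (the same mapping given in the prompt).
    """
    names = list(palette)
    colors = np.array([palette[n] for n in names])               # (C, 3)
    # Distance from every pixel to every palette color; argmin picks the class.
    dist = np.linalg.norm(image[:, :, None, :] - colors, axis=-1)  # (H, W, C)
    idx = dist.argmin(axis=-1)
    return {n: idx == i for i, n in enumerate(names)}

# Toy 2x2 image: three red ("cat") pixels and one yellow ("background") pixel.
palette = {"cat": (1.0, 0.0, 0.0), "background": (1.0, 1.0, 0.0)}
img = np.zeros((2, 2, 3))
img[..., 0] = 1.0
img[1, 1, 1] = 1.0
masks = decode_segmentation(img, palette)
```

The same argmin-over-palette idea extends to the per-class instance strategy: after a per-class pass, pixels are clustered by color similarity to separate individual instances.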
A power transform (shape parameter λ = −3, scale parameter c = 10/3) first “curves” metric depth values, which are then encoded as a false-color visualization that traverses the edges of the RGB cube, following the structure of a 3D Hilbert curve. This transform is strictly invertible, so the generated depth image decodes cleanly back to physical metric distances. Crucially, no camera parameters — neither intrinsics nor extrinsics — are required at training or inference time. The model infers absolute scale purely from visual cues and world knowledge embedded during pretraining. The depth training data is also entirely synthetic, generated from simulation rendering engines, with zero real-world depth data used. For surface normal estimation, the mapping is more direct: surface normals are unit vectors (x, y, z) ranging from −1.0 to 1.0, which map naturally to RGB channels. Facing-left normals encode as pinkish-red; facing-up normals encode as light green; normals pointing toward the camera encode as light blue/purple.

The Numbers: Beating Specialists at Their Own Game

Vision Banana’s results across benchmarks — all in zero-shot transfer settings, where the model has never seen any training data from the evaluated datasets — are significant:

- Semantic segmentation on Cityscapes val: mIoU of 0.699, compared to SAM 3’s 0.652 — a 4.7-point gain.
- Referring expression segmentation on RefCOCOg UMD val: cIoU of 0.738, edging out SAM 3 Agent’s 0.734.
- Reasoning segmentation on ReasonSeg val: gIoU of 0.793, beating SAM 3 Agent’s 0.770 — and notably surpassing even non-zero-shot methods trained on in-domain data, including X-SAM.
- Instance segmentation on SA-Co/Gold: pmF1 of 0.540, on par with DINO-X (0.552), and ahead of Gemini 2.5 (0.461), APE-D (0.369), and OWLv2 (0.420) under zero-shot transfer.
- Metric depth estimation: average δ1 of 0.882 across six major benchmarks; on the four datasets where Depth Anything V3 was evaluated (NYU, ETH3D, DIODE-Indoor, KITTI), Vision Banana scores 0.929 versus Depth Anything V3’s 0.918 — while using zero real-world training data and no camera parameters.
- Surface normal estimation: average mean angle error of 18.928° across four datasets, compared to Lotus-2’s 19.642°. On indoor datasets specifically, Vision Banana achieves the lowest mean angle error (15.549°) and lowest median angle error (9.300°) among all compared methods.

On generative benchmarks, Vision Banana holds its own against its base model: it achieves a 53.5% win rate against Nano Banana Pro on GenAI-Bench (text-to-image), and a 47.8% win rate on ImgEdit (image editing), where Nano Banana Pro scores 52.2%. Overall, the results confirm that lightweight instruction-tuning does not degrade the model’s generative capabilities.

Key Takeaways

Image generation pretraining is a generalist vision learner: just as LLM pretraining unlocks emergent language understanding, Google’s research shows that training on image generation naturally develops powerful internal visual representations that transfer to perception tasks
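The invertible encodings described earlier can be sketched numerically. The normal mapping below is the standard affine convention, and the depth bijection is one plausible form built from the quoted λ and c; the paper's exact formula and its Hilbert-curve color coding are not reproduced here:

```python
import numpy as np

LAM, C = -3.0, 10.0 / 3.0   # shape and scale parameters quoted in the article

def depth_to_unit(d):
    """One plausible bijection from metric depth [0, inf) to [0, 1).
    Illustrative assumption: the paper's exact power transform may differ."""
    return 1.0 - (1.0 + np.asarray(d) / C) ** LAM

def unit_to_depth(y):
    """Exact inverse, so a decoded depth map recovers metric distances."""
    return C * ((1.0 - np.asarray(y)) ** (1.0 / LAM) - 1.0)

def normal_to_rgb(n):
    """Affine map from unit normals in [-1, 1]^3 to RGB in [0, 1]^3.
    (The paper's sign/axis color conventions may differ from this sketch.)"""
    return (np.asarray(n) + 1.0) / 2.0

def rgb_to_normal(rgb):
    """Exact inverse of normal_to_rgb."""
    return np.asarray(rgb) * 2.0 - 1.0

depths = np.array([0.0, 1.0, 5.0, 50.0])
assert np.allclose(unit_to_depth(depth_to_unit(depths)), depths)  # lossless

normal = np.array([0.2, -0.5, 0.8])
assert np.allclose(rgb_to_normal(normal_to_rgb(normal)), normal)
```

Because both maps are strictly invertible, benchmark evaluation can decode the generated RGB images back into metric depths and unit normals without loss.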


Cross-Domain Data Selection and Augmentation for Automatic Compliance Detection

arXiv:2604.21469v1 Announce Type: new Abstract: Automating the detection of regulatory compliance remains a challenging task due to the complexity and variability of legal texts. Models trained on one regulation often fail to generalise to others. This limitation underscores the need for principled methods to improve cross-domain transfer. We study data selection as a strategy to mitigate negative transfer in compliance detection framed as a natural language inference (NLI) task. Specifically, we evaluate four approaches for selecting augmentation data from a larger source domain: random sampling, Moore-Lewis’s cross-entropy difference, importance weighting, and embedding-based retrieval. We systematically vary the proportion of selected data to analyse its effect on cross-domain adaptation. Our findings demonstrate that targeted data selection substantially reduces negative transfer, offering a practical path toward scalable and reliable compliance automation across heterogeneous regulations.
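Moore-Lewis selection can be sketched with two language models: score each candidate by its in-domain cross-entropy minus its general-domain cross-entropy and keep the lowest scores. The add-one-smoothed unigram models and tiny corpora below are toy stand-ins for the real LMs, not the paper's setup:

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Toy unigram LM with add-one smoothing; returns per-token cross-entropy."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda sent: -sum(
        math.log((counts[w] + 1) / (total + vocab)) for w in sent.split()
    ) / max(len(sent.split()), 1)

def moore_lewis_select(candidates, in_domain, general, k):
    """Keep the k candidates whose cross-entropy difference
    H_in(x) - H_gen(x) is lowest, i.e. the most in-domain-like."""
    h_in, h_gen = unigram_lm(in_domain), unigram_lm(general)
    return sorted(candidates, key=lambda s: h_in(s) - h_gen(s))[:k]

in_domain = ["the processor shall retain records", "records retention applies"]
general = ["the cat sat on the mat", "dogs run fast"]
pool = ["the controller shall retain records", "the cat runs fast"]
print(moore_lewis_select(pool, in_domain, general, k=1))
# selects the compliance-like sentence over the general-domain one
```

Varying k is the knob the paper studies: the proportion of selected augmentation data controls how much source-domain material is mixed in.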


Learning Dynamic Representations and Policies from Multimodal Clinical Time-Series with Informative Missingness

arXiv:2604.21235v1 Announce Type: cross Abstract: Multimodal clinical records contain structured measurements and clinical notes recorded over time, offering rich temporal information about the evolution of patient health. Yet these observations are sparse, and whether they are recorded depends on the patient’s latent condition. Observation patterns also differ across modalities, as structured measurements and clinical notes arise under distinct recording processes. While prior work has developed methods that accommodate missingness in clinical time series, how to extract and use the information carried by the observation process itself remains underexplored. We therefore propose a patient representation learning framework for multimodal clinical time series that explicitly leverages informative missingness. The framework combines (1) a multimodal encoder that captures signals from structured and textual data together with their observation patterns, (2) a Bayesian filtering module that updates a latent patient state over time from observed multimodal signals, and (3) downstream modules for offline treatment policy learning and patient outcome prediction based on the learned patient state. We evaluate the framework on ICU sepsis cohorts from MIMIC-III, MIMIC-IV, and eICU. It improves both offline treatment policy learning and adverse outcome prediction, achieving FQE 0.679 versus 0.528 for clinician behavior and AUROC 0.886 for post-72-hour mortality prediction on MIMIC-III.
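The Bayesian filtering idea — update a latent patient state only when a modality is observed, while gaps in observation leave the state increasingly uncertain — can be sketched with a scalar Kalman-style filter. The dynamics, noise levels, and scalar state below are illustrative assumptions, not the paper's model:

```python
def filter_state(observations, q=0.05, r=0.2):
    """Scalar Kalman-style filter over a sparse clinical time series.

    `observations` is a list where None marks a missing measurement.
    On missing steps we only propagate (predict), so the posterior
    variance grows, encoding increased uncertainty about the patient.
    q is process noise, r is measurement noise (illustrative values).
    """
    mean, var = 0.0, 1.0
    trajectory = []
    for z in observations:
        var += q                      # predict: uncertainty grows each step
        if z is not None:             # update only when a value was recorded
            gain = var / (var + r)
            mean += gain * (z - mean)
            var *= (1.0 - gain)
        trajectory.append((mean, var))
    return trajectory

traj = filter_state([1.0, None, None, 1.2])
# variance rises across the two unobserved steps, then drops at the update
```

A model that also treats *whether* a value was recorded as a signal, as the paper proposes, would additionally feed the missingness mask itself into the state update.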


Health-care AI is here. We don’t know if it actually helps patients.

I don’t need to tell you that AI is everywhere. Or that it is being used, increasingly, in hospitals. Doctors are using AI to help them with notetaking. AI-based tools are trawling through patient records, flagging people who may require certain support or treatments. They are also used to interpret medical exam results and X-rays. A growing number of studies suggest that many of these tools can deliver accurate results. But there’s a bigger question here: Does using them actually translate into better health outcomes for patients? We don’t yet have a good answer. That’s what Jenna Wiens, a computer scientist at the University of Michigan, and Anna Goldenberg of the University of Toronto, argue in a paper published in the journal Nature Medicine this week. Wiens tells me she has spent years investigating how AI might benefit health care. For the first decade of her career she tried to pitch the technology to clinicians. Over the last few years, she says, it’s as though “a switch flipped.” Health-care providers not only appear much more interested in the promise of these technologies, they have also begun rapidly deploying them. The problem is that many providers aren’t rigorously assessing how well they actually work. Take “ambient AI” tools, for example. Also known as AI scribes, they “listen” to conversations between doctors and patients, then transcribe and summarize them. Multiple tools are available, and they are already being widely adopted by health-care providers. A few months ago, a staffer at a major New York medical center who develops AI tools for doctors told me that, anecdotally, medics are “overjoyed” by the technology—it allows them to focus all their attention on their patients during appointments, and it saves them from a lot of time-consuming paperwork. Early studies support these anecdotes and suggest that the tools can reduce clinician burnout. That’s all well and good. But what about patient health outcomes? 
“[Researchers] have evaluated provider or clinician and patient satisfaction, but not really how these tools are affecting clinical decision-making,” says Wiens. “We just don’t know.” The same holds true for other AI-based technologies used in health-care settings. Some are used to predict patients’ health trajectories, others to recommend treatments. They are designed to make health care more effective and efficient. But even a tool that is “accurate” won’t necessarily improve health outcomes. AI might speed up the interpretation of a chest X-ray, for example. But how much will a doctor rely on its analysis? How will that tool affect the way a doctor interacts with patients or recommends treatment? And ultimately: What will this mean for those patients? The answers to those questions might vary between hospitals or departments and could depend on clinical workflows, says Wiens. They might also differ between doctors at various stages of their careers. Take the AI scribes, as another example. Some research on AI use in education suggests that such tools can impact the way people cognitively process information. Could they affect the way a doctor processes a patient’s information? Will the tools affect the way medical students think about patient data in a way that impacts care? These questions need to be explored, says Wiens. “We like things that save us time, but we have to think about the unintended consequences of this,” she says. In a study published in January 2025, Paige Nong at the University of Minnesota and her colleagues found that around 65% of US hospitals used AI-assisted predictive tools. Only two-thirds of those hospitals evaluated their accuracy. Even fewer assessed them for bias. The number of hospitals using these tools has probably increased since then, says Wiens. Those hospitals, or entities other than the companies developing the tools, need to evaluate how much they help in specific settings. 
There’s a possibility that they could leave patients worse off, although it’s more likely that AI tools just aren’t as beneficial as health-care providers might assume they are, says Wiens. “I do believe in the potential of AI to really improve clinical care,” says Wiens, who stresses that she doesn’t want to stop the adoption of AI tools in health care. She just wants more information about how they are affecting people. “I have to believe that in the future it’s not all AI or no AI,” she says. “It’s somewhere in between.” This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. 


Google DeepMind Introduces Decoupled DiLoCo: An Asynchronous Training Architecture Achieving 88% Goodput Under High Hardware Failure Rates

Training frontier AI models is, at its core, a coordination problem. Thousands of chips must communicate with each other continuously, synchronizing every gradient update across the network. When one chip fails or even slows down, the entire training run can stall. As models scale toward hundreds of billions of parameters, that fragility becomes increasingly untenable. Google DeepMind is now proposing a different model entirely: its researchers introduced Decoupled DiLoCo (Distributed Low-Communication), a distributed training architecture that decouples compute into asynchronous, fault-isolated ‘islands,’ enabling large language model pre-training across geographically distant data centers without the tight synchronization that makes conventional approaches brittle at scale.

The Problem with Traditional Distributed Training

To understand why Decoupled DiLoCo is important, it helps to understand how distributed training typically works. Standard data-parallel training replicates a model across many accelerators (GPUs or TPUs), each processing a different mini-batch of data. After each forward and backward pass, gradients must be averaged across every device — a process called AllReduce — before the next training step can begin. This blocking synchronization step means every device must wait for the slowest one. Across thousands of chips spanning multiple data centers, that bottleneck is not just inconvenient; it makes global-scale training effectively impractical. Bandwidth is another hard constraint: conventional data-parallel training requires approximately 198 Gbps of inter-datacenter bandwidth across eight data centers — far beyond what standard wide-area networking (WAN) can support between geographically distributed facilities.

How Decoupled DiLoCo Works

Decoupled DiLoCo builds on two prior systems from Google.
The first is Pathways, which introduced a distributed AI system based on asynchronous data flow, allowing different compute resources to work at their own pace without blocking on one another. The second is DiLoCo, which cut the inter-datacenter bandwidth required for distributed training by having each worker perform many local gradient steps before communicating with peers — dramatically reducing how much data needs to flow between data centers. Decoupled DiLoCo brings both ideas together. Built on top of Pathways, training is divided across separate clusters of accelerators called learner units — the ‘islands’ of compute. Each learner unit trains semi-independently, performing many local steps, before sharing a compressed gradient signal with an outer optimizer that aggregates updates across all learner units. Because this outer synchronization step is asynchronous, a chip failure or slow learner unit in one island does not block the others from continuing to train. The bandwidth savings are dramatic: Decoupled DiLoCo reduces required inter-datacenter bandwidth from 198 Gbps to just 0.84 Gbps across eight data centers — multiple orders of magnitude lower — making it compatible with standard internet-scale connectivity between facilities rather than requiring custom high-speed network infrastructure.

Self-Healing Through Chaos Engineering

One of the most technically significant properties of Decoupled DiLoCo is its fault tolerance. The research team used chaos engineering — deliberately injecting artificial hardware failures into a running system to test its robustness — during training runs. The system continued training after the loss of entire learner units, then seamlessly reintegrated those units when they came back online. This behavior is what the research team describes as ‘self-healing’.
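The inner/outer structure of local steps plus an aggregating outer optimizer can be sketched in a few lines of NumPy. Plain SGD stands in for the inner optimizer and heavy-ball momentum for the outer one; the toy regression task, step counts, and learning rates are illustrative assumptions, not Decoupled DiLoCo's production configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true                       # noiseless linear-regression task

def local_sgd(theta, Xs, ys, steps=50, lr=0.05):
    """Inner phase: one learner unit takes many local gradient steps
    on its own data shard before any cross-unit communication."""
    theta = theta.copy()
    for _ in range(steps):
        grad = 2 * Xs.T @ (Xs @ theta - ys) / len(ys)
        theta -= lr * grad
    return theta

theta = np.zeros(4)                  # shared outer parameters
momentum = np.zeros(4)
shards = np.array_split(np.arange(256), 4)   # 4 "learner units"

for _ in range(40):
    # Each unit communicates only a pseudo-gradient: start minus end params.
    deltas = [theta - local_sgd(theta, X[s], y[s]) for s in shards]
    avg_delta = np.mean(deltas, axis=0)       # the only cross-unit traffic
    momentum = 0.5 * momentum + avg_delta     # outer momentum on pseudo-grads
    theta -= 0.7 * momentum

assert np.allclose(theta, w_true, atol=1e-2)  # converges to the true weights
```

The bandwidth saving falls out of the structure: one small pseudo-gradient crosses the network per outer round instead of a full gradient exchange per inner step, and making that outer round asynchronous is what isolates failures to a single island.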
In simulations involving 1.2 million chips under high failure rates, Decoupled DiLoCo maintained a goodput (the fraction of time the system is performing useful training) of 88%, compared to just 27% for standard data-parallel methods. Goodput is the practical metric that matters here: a training run with high nominal compute but low goodput wastes significant resources.

https://deepmind.google/blog/decoupled-diloco/

Critically, these resilience gains come with minimal degradation in model quality. In real-world experiments using Gemma 4 models, Decoupled DiLoCo achieved an average ML benchmark accuracy of 64.1%, compared to 64.4% for the conventional baseline — a difference well within the noise of typical evaluation variance.

Training a 12B Model Across Four U.S. Regions

The research team validated Decoupled DiLoCo at production scale by training a 12 billion parameter model across four separate U.S. regions using just 2–5 Gbps of wide-area networking — a bandwidth level achievable with existing commercial internet infrastructure between data center facilities. The system accomplished this more than 20 times faster than conventional synchronization methods. The key reason: rather than forcing compute to pause and wait for communication to complete, Decoupled DiLoCo overlaps the required communication with longer periods of computation, eliminating the “blocking” bottlenecks that make conventional distributed training slow at global scale.

Mixing Hardware Generations

An underappreciated implication of the architecture is its support for heterogeneous hardware. Because learner units operate asynchronously, they do not need to run on identical hardware at the same clock speed. The research team demonstrated training runs that mixed TPU v6e and TPU v5p chips — different hardware generations with different performance characteristics — in a single training job, without degrading ML performance relative to homogeneous runs.
This has two practical consequences. First, it extends the useful life of existing hardware, allowing older accelerators to continue contributing meaningfully to large-scale training. Second, because new hardware generations do not arrive everywhere at once, training across generations can ease the recurring logistical and capacity bottlenecks that arise during hardware transitions — a real operational challenge for organizations running large training infrastructure.

Key Takeaways

- Decoupled DiLoCo eliminates the single-point-of-failure problem in large-scale AI training by dividing training across asynchronous, fault-isolated “islands” of compute called learner units — so a chip or cluster failure in one island does not stall the rest of the training run.
- The architecture reduces inter-datacenter bandwidth requirements by orders of magnitude — from 198 Gbps down to 0.84 Gbps across eight data centers — making globally distributed pre-training feasible over standard wide-area networking rather than custom high-speed infrastructure.
- Decoupled DiLoCo is self-healing: using chaos engineering to simulate real hardware failures, the system maintained 88% goodput compared to just 27% for standard data-parallel training under high failure rates, and seamlessly reintegrated offline learner units when they came back online.
- The approach was validated at production scale, successfully training a 12 billion parameter model across four


The Download: supercharged scams and studying AI healthcare

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

We’re in a new era of AI-driven scams

When ChatGPT was released in late 2022, it showed how easily generative AI could create human-like text. This quickly caught the eye of cybercriminals, who began using LLMs to compose malicious emails. Since then, they’ve adopted AI for everything from turbocharged phishing and hyperrealistic deepfakes to automated vulnerability scans. Many organizations are now struggling to cope with the sheer volume of cyberattacks. AI is making them faster, cheaper, and easier to carry out, a problem set to worsen as more cybercriminals adopt these tools — and their capabilities improve. Read the full story on how AI is reshaping cybercrime.
—Rhiannon Williams

“Supercharged scams” is one of the 10 Things That Matter in AI Right Now, our essential guide to what’s really worth your attention in the field. Subscribers can watch an exclusive roundtable unveiling the technologies and trends on the list, with analysis from MIT Technology Review’s AI reporter Grace Huckins and executive editors Amy Nordrum and Niall Firth.

Healthcare AI is here. We don’t know if it actually helps patients.

Doctors are using AI to help them with notetaking. AI-based tools are trawling through patient records, flagging people who may require certain support or treatments. They are also used to interpret medical exam results and X-rays. A growing number of studies suggest that many of these tools can deliver accurate results. But there’s a bigger question here: Does using them actually translate into better health outcomes for patients? We don’t yet have a good answer — here’s why.
—Jessica Hamzelou

The story is from The Checkup, our weekly newsletter that gives you the latest from the worlds of health and biotech. Sign up to receive it in your inbox every Thursday.
The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1. DeepSeek has unveiled its long-awaited new AI model
The Chinese company has just launched preview versions of DeepSeek-V4. (CNN)
+ It says V4 is the most powerful open-source platform. (Bloomberg $)
+ And rivals top closed-source models from OpenAI and DeepMind. (SCMP)
+ The model is adapted for Huawei chip technology. (Reuters $)

2. More countries are curbing children’s social media access
Norway is set to enforce the latest ban. (Reuters $)
+ The Philippines could follow soon. (Bloomberg $)
+ Americans are pushing to get AI out of schools. (The New Yorker)

3. The US has accused China of mass AI theft as tensions rise
A White House memo claims Chinese firms are exploiting American models. (BBC)
+ Beijing calls the accusations “slander.” (Ars Technica)

4. OpenAI set itself apart from Anthropic by widely releasing its new model
It’s releasing GPT-5.5 to all ChatGPT users, despite cybersecurity concerns. (NYT $)
+ OpenAI says the new model is better at coding and more efficient. (The Verge)

5. Meta is cutting 10% of jobs to offset AI spending
Roughly 8,000 layoffs are set to be announced on May 20. (QZ)
+ Anti-AI protests are growing. (MIT Technology Review)

6. Palantir is facing a backlash from employees
Thanks to its work with ICE and the Trump administration. (Wired $)
+ Surveillance tech is reshaping the fight for privacy. (MIT Technology Review)

7. The era of free access to advanced AI is coming to an end
AI labs are under mounting pressure to start turning profits. (The Verge)

8. Elon Musk’s feud with Sam Altman is heading to court
The case has already revealed several unflattering secrets. (WP $)

9. A new movement is encouraging people to ditch their smartphones for a month
“Month Offline” is like a Dry January for smartphones. (The Atlantic)

10. Spotify has revealed its most-streamed music of the last 20 years
Featuring Taylor Swift, Bad Bunny, and The Weeknd. (Gizmodo)

Quote of the day

“We want a childhood where children get to be children. Play, friendships, and everyday life must not be taken over by algorithms and screens.”
—Norwegian Prime Minister Jonas Gahr Store, announcing age restrictions for social media.

One More Thing

NASA/JPL-CALTECH VIA WIKIMEDIA COMMONS; CRAFT NASA/JPL-CALTECH/SWRI/MSSS; IMAGE PROCESSING: KEVIN M. GILL

The search for extraterrestrial life is targeting Jupiter’s icy moon Europa

As astronomers have discovered more about Europa over the past few decades, Jupiter’s fourth-largest moon has excited planetary scientists interested in the geophysics of alien worlds. All that water and energy — and hints of elements essential for building organic molecules — point to an extraordinary possibility. In the depths of its ocean, or perhaps crowded in subsurface lakes or below icy surface vents, Jupiter’s big, bright moon could host life. To find further evidence, NASA is now searching for signs of alien existence on Europa. Read the full story on the mission.
—Stephen Ornes

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Here’s a fun look at the secret collaborations of pop history.
+ Meet the mannequins showing how the “ideal” body has evolved.
+ A photographer has cataloged all 12,795 objects in her home into an archive of a life.
+ Slime molds are unexpectedly beautiful when viewed through these high-detail macro shots.
