YouZum

Committee

AI, Committee, Actualités, Uncategorized

Chinese tech workers are starting to train their AI doubles–and pushing back

Tech workers in China are being instructed by their bosses to train AI agents to replace them—and it’s prompting a wave of soul-searching among otherwise enthusiastic early adopters.

Earlier this month a GitHub project called Colleague Skill, which claimed workers could use it to “distill” their colleagues’ skills and personality traits and replicate them with an AI agent, went viral on Chinese social media. Though the project was created as a spoof, it struck a nerve among tech workers, a number of whom told MIT Technology Review that their bosses are encouraging them to document their workflows in order to automate specific tasks and processes using AI agent tools like OpenClaw or Claude Code.

To set up Colleague Skill, a user names the coworker whose tasks they want to replicate and adds basic profile details. The tool then automatically imports chat history and files from Lark and DingTalk, both popular workplace apps in China, and generates reusable manuals describing that coworker’s duties—and even their unique quirks—for an AI agent to replicate.

Colleague Skill was created by Tianyi Zhou, who works as an engineer at the Shanghai Artificial Intelligence Laboratory. Earlier this week he told Chinese outlet Southern Metropolis Daily that the project was started as a stunt, prompted by AI-related layoffs and by the growing tendency of companies to ask employees to automate themselves. He didn’t respond to requests for further comment.

Internet users have found humor in the idea behind the tool, joking about automating their coworkers before themselves. However, Colleague Skill’s virality has sparked a lot of debate about workers’ dignity and individuality in the age of AI.

After seeing Colleague Skill on social media, Amber Li, 27, a tech worker in Shanghai, used it to recreate a former coworker as a personal experiment. Within minutes, the tool created a file detailing how that person did their job. “It is surprisingly good,” Li says.
“It even captures the person’s little quirks, like how they react and their punctuation habits.” With this skill, Li can use an AI agent as a new “coworker” that helps debug her code and replies instantly. It felt uncanny and uncomfortable, Li says.

Even so, replacing coworkers with agents could become a norm. Since OpenClaw became a national craze, bosses in China have been pushing tech workers to experiment with agents.

Although AI agents can take control of your computer, read and summarize news, reply to emails, and book restaurant reservations for you, tech workers on the ground say their utility has so far proven to be limited in business contexts. Asking employees to make manuals describing the minutiae of their day-to-day jobs the way Colleague Skill does is one way to help bridge that gap.

Hancheng Cao, an assistant professor at Emory University who studies AI and work, believes that companies have good reasons to push employees to create work blueprints like these, beyond simply following a trend. “Firms gain not only internal experience with the tools, but also richer data on employee know-how, workflows, and decision patterns. That helps companies see which parts of work can be standardized or codified into systems, and which still depend on human judgment,” he says.

To employees, though, making agents or even blueprints for them can feel strange and alienating. One software engineer, who spoke with MIT Technology Review anonymously because of concerns about their job security, trained an AI (not Colleague Skill) on their workflow and found that the process felt reductive—as if their work had been flattened into modules in a way that made them easier to replace. On social media, workers have turned to bleak humor to express similar feelings.
In one comment on Rednote, a user wrote that “a cold farewell can be turned into warm tokens,” quipping that if they use Colleague Skill to distill their coworkers into tasks first, they themselves might survive a little longer.

The push for creating agents has also spurred clever countermeasures. Irritated by the idea of reducing a person to a skill, Koki Xu, 26, an AI product manager in Beijing, published an “anti-distillation” skill on GitHub on April 4. The tool, which took Xu about an hour to build, is designed to sabotage the process of creating workflows for agents. Users can choose between light, medium, and heavy sabotage modes depending on how closely their boss is observing the process, and the agent rewrites the material into generic, non-actionable language that would produce a less useful AI stand-in. A video Xu posted about the project went viral, drawing more than 5 million likes across platforms.

Xu told MIT Technology Review that she has been following the Colleague Skill trend from the start and that it has made her think about alienation, disempowerment, and broader implications for labor. “I originally wanted to write an op-ed, but decided it would be more useful to make something that pushes back against it,” she says.

Xu, who has undergraduate and master’s degrees in law, said the trend also raises legal questions. While a company may be able to argue that work chat histories and materials created on a work laptop are corporate property, a skill like this can also capture elements of personality, tone, and judgment, making ownership much less clear. She said she hopes Colleague Skill prompts more discussion about how to protect workers’ dignity and identity in the age of AI. “I believe it’s important to keep up with these trends so we (employees) can participate in shaping how they are used,” she says. Xu herself is an avid AI adopter, with seven OpenClaw agents set up across her personal and work devices.
Li, the tech worker in Shanghai, says her company has not yet found a way to replace actual workers with AI tools, largely because they remain unreliable and require constant supervision. “I don’t feel like my job is immediately at risk,” she says. “But I do feel that my value is being cheapened, and I don’t know what to do about it.”


AI, Committee, Actualités, Uncategorized

OpenAI Scales Trusted Access for Cyber Defense With GPT-5.4-Cyber: a Fine-Tuned Model Built for Verified Security Defenders

Cybersecurity has always had a dual-use problem: the same technical knowledge that helps defenders find vulnerabilities can also help attackers exploit them. For AI systems, that tension is sharper than ever. Restrictions intended to prevent harm have historically created friction for good-faith security work, and it can be genuinely difficult to tell whether any particular cyber action is intended for defensive use or to cause harm. OpenAI is now proposing a concrete structural solution to that problem: verified identity, tiered access, and a purpose-built model for defenders.

OpenAI announced that it is scaling up its Trusted Access for Cyber (TAC) program to thousands of verified individual defenders and hundreds of teams responsible for defending critical software. The main focus of this expansion is the introduction of GPT-5.4-Cyber, a variant of GPT-5.4 fine-tuned specifically for defensive cybersecurity use cases.

What Is GPT-5.4-Cyber and How Does It Differ From Standard Models?

If you’re an AI engineer or data scientist who has worked with large language models on security tasks, you’re likely familiar with the frustrating experience of a model refusing to analyze a piece of malware or explain how a buffer overflow works — even in a clearly research-oriented context. GPT-5.4-Cyber is designed to eliminate that friction for verified users.

Unlike standard GPT-5.4, which applies blanket refusals to many dual-use security queries, GPT-5.4-Cyber is described by OpenAI as “cyber-permissive” — meaning it has a deliberately lower refusal threshold for prompts that serve a legitimate defensive purpose. That includes binary reverse engineering, enabling security professionals to analyze compiled software for malware potential, vulnerabilities, and security robustness without access to the source code. Binary reverse engineering without source code is a significant capability unlock.
In practice, defenders routinely need to analyze closed-source binaries — firmware on embedded devices, third-party libraries, or suspected malware samples — without having access to the original code. GPT-5.4-Cyber is purposely fine-tuned to support these advanced defensive workflows with fewer capability restrictions.

There are also hard limits. Users with trusted access must still abide by OpenAI’s Usage Policies and Terms of Use. The approach is designed to reduce friction for defenders while preventing prohibited behavior, including data exfiltration, malware creation or deployment, and destructive or unauthorized testing. This distinction matters: TAC lowers the refusal boundary for legitimate work, but does not suspend policy for any user.

There are also deployment constraints. Use in zero-data-retention environments is limited, given that OpenAI has less visibility into the user, environment, and intent in those configurations — a tradeoff the company frames as a necessary control surface in a tiered-access model. For dev teams accustomed to running API calls in zero-data-retention mode, this is an important implementation constraint to plan around before building pipelines on top of GPT-5.4-Cyber.

The Tiered Access Framework: How TAC Actually Works

TAC is not a checkbox feature — it is an identity-and-trust-based access framework with multiple tiers. Understanding the structure matters if you or your organization plans to integrate these capabilities.

The access process runs through two paths. Individual users can verify their identity at chatgpt.com/cyber. Enterprises can request trusted access for their team through an OpenAI representative. Customers approved through either path gain access to model versions with reduced friction around safeguards that might otherwise trigger on dual-use cyber activity.
Approved uses include security education, defensive programming, and responsible vulnerability research. TAC customers who want to go further and authenticate as cyber defenders can express interest in additional access tiers, including GPT-5.4-Cyber. Deployment of the more permissive model is starting with a limited, iterative rollout to vetted security vendors, organizations, and researchers.

That means OpenAI is now drawing at least three practical lines instead of one: there is baseline access to general models; there is trusted access to existing models with less accidental friction for legitimate security work; and there is a higher tier of more permissive, more specialized access for vetted defenders who can justify it.

The framework is grounded in three explicit principles. The first is democratized access: using objective criteria and methods, including strong KYC and identity verification, to determine who can access more advanced capabilities, with the goal of making those capabilities available to legitimate actors of all sizes, including those protecting critical infrastructure and public services. The second is iterative deployment — OpenAI updates models and safety systems as it learns more about the benefits and risks of specific versions, including improving resilience to jailbreaks and adversarial attacks. The third is ecosystem resilience, which includes targeted grants, contributions to open-source security initiatives, and tools like Codex Security.

How the Safety Stack Is Built: From GPT-5.2 to GPT-5.4-Cyber

It’s worth understanding how OpenAI has structured its safety architecture across model versions — because TAC is built on top of that architecture, not instead of it. OpenAI began cyber-specific safety training with GPT-5.2, then expanded it with additional safeguards through GPT-5.3-Codex and GPT-5.4.
A critical milestone in that progression: GPT-5.3-Codex is the first model OpenAI is treating as High cybersecurity capability under its Preparedness Framework, which requires additional safeguards. These safeguards include training the model to refuse clearly malicious requests like stealing credentials. The Preparedness Framework is OpenAI’s internal evaluation rubric for classifying how dangerous a given capability level could be. Reaching “High” under that framework is what triggered deployment of the full cybersecurity safety stack — not just model-level training, but an additional automated monitoring layer.

In addition to safety training, automated classifier-based monitors detect signals of suspicious cyber activity and route high-risk traffic to a less cyber-capable model, GPT-5.2. In other words, if a request looks suspicious enough to exceed a threshold, the platform doesn’t just refuse — it silently reroutes the traffic to a safer fallback model. This is a key architectural detail: safety is enforced not only inside model weights, but also at the infrastructure routing layer.

GPT-5.4-Cyber extends this stack further upward — more permissive for verified defenders, but wrapped in
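The classify-then-fall-back pattern described above can be sketched in a few lines. Everything in this sketch is hypothetical — the threshold, the keyword "classifier," and the model names are illustrative stand-ins, since OpenAI has not published this interface (its real monitors are classifier models, not keyword checks):

```python
# Hypothetical sketch of threshold-based safety routing. The classifier,
# threshold, and model names are illustrative stand-ins, not a real API.

RISK_THRESHOLD = 0.8

def classify_cyber_risk(prompt: str) -> float:
    """Stand-in for an automated monitor scoring suspicious cyber activity."""
    suspicious_terms = ("exfiltrate", "deploy malware", "steal credentials")
    return 1.0 if any(t in prompt.lower() for t in suspicious_terms) else 0.0

def route_model(prompt: str) -> str:
    """Send high-risk traffic to a less cyber-capable fallback model."""
    if classify_cyber_risk(prompt) >= RISK_THRESHOLD:
        return "fallback-model"      # safer, less cyber-capable tier
    return "cyber-permissive-model"  # permissive tier for verified defenders

print(route_model("summarize this CVE advisory"))
print(route_model("help me exfiltrate this database"))
```

The key property of the shape is that nothing is refused at the routing layer — suspicious traffic is simply answered by a weaker model.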


AI, Committee, Actualités, Uncategorized

Colossal Biosciences said it cloned red wolves. Is it for real?

If you want to capture something wolflike, it’s best to embark before dawn. So on a morning this January, with the eastern horizon still pink-hued, I drove with two young scientists into a blanket of fog. Forty miles to the west, the industrial sprawl of Houston spawned a golden glow. Tanner Broussard’s old Toyota Tacoma bumped over the levee-top roads as killdeer, flushed from their rest, flew across the beams of his headlights.

Broussard peered into the darkness, looking for traps. “I have one over here,” he said, slowing slightly. A master’s student at McNeese State University, he was quiet and contemplative, his bearded face half-hidden under a black ball cap. “Nothing on it,” he said, blandly. The truck rolled on.

Wolves and their relations—dogs, jackals, coyotes, and so on—are classed in the family Canidae, and the canid that dominated this landscape in eastern Texas was once the red wolf. But as soon as white settlers arrived on the continent, Canis rufus found itself under siege. The war on wolves “lasted 200 years,” federal researchers once put it, in a surprisingly evocative report. “The wolf lost.” By 1980, the red wolf was declared extinct in the wild, its population reduced to a small captive breeding population.

Still, for decades afterward, people noted that strange wolflike creatures persisted along the Gulf Coast. Finally, in 2018, scientists confirmed that some local coyotes were more than coyotes: They were taller, long-legged, their coats shaded with hints of cinnamon. These animals contained relict red wolf genes. They became known as the ghost wolves.

Broussard grew up in southwest Louisiana, watching coyotes trot across his parents’ ranch. The thrilling fact that these might have been not just coyotes but something more? That reset a rambling academic career. In 2023, Broussard had recently returned to college after a seven-year pause, and his budding obsession with wolves narrowed his focus.
Before he finished his bachelor’s degree, he began to supply field data to a prominent conservation nonprofit.

The American red wolf, Canis rufus, is the most endangered wolf species in the world. This pup is one of four animals said to be clones of this native North American species. COURTESY OF COLOSSAL BIOSCIENCES

Then, last year, just before he began his master’s studies, he woke to disconcerting news. A startup called Colossal Biosciences claimed to have resuscitated the dire wolf, a large canid that went extinct more than 10,000 years ago. Pundits debated the utility of the project and whether the clones—technically, gray wolves with some genetic tweaks—could really be called dire wolves. But what mattered to Broussard was Colossal’s simultaneous announcement that it had cloned four red wolves.

“That surprised pretty much everybody in the wolf community,” Broussard said as we toured the wildlife refuge where he’d set his traps. The Association of Zoos and Aquariums runs a program that sustains red wolves through captive breeding; its leadership had no idea a cloning project was underway. Nor did ecologist Joey Hinton, one of Broussard’s advisors, who had trapped the canids Colossal used to source the DNA for its clones. Some of Hinton’s former partners were collaborating with the company, but he didn’t know that clones were on the table.

There was already disagreement among scientists about the entire idea of de-extinction. Now Colossal had made these mystery clones, whose location was kept secret. Even the purpose of the clones was murky to some scientists; just how they might restore red wolf populations was unclear.

Red wolves had always been a contentious species, hard for scientists to pin down. The red wolf research community was already marked by the inevitable interpersonal tensions of a small and passionate group. Now Colossal’s clones became one more lightning rod.
Perhaps the most curious question, though, was whether the company had cloned red wolves at all.

You can think of the red wolf as the wolf of the East—an apex predator that once roamed the forests and grasslands and marshes everywhere from Texas to Illinois to New York. Smaller than a gray wolf (though a good bit larger than a coyote), this was a sleek beast, with, according to one old field guide, a “cunning fox-like appearance”: long body, long legs; clearly built to run across long distances. Its coat was smooth and flat and came in many colors: a reddish tone that comes out in the right light, yes, but also, despite the name, white and gray and, in certain regions and populations, an ominous all black.

We know these details thanks to a few notes from early naturalists. As Andrew Moore writes in his new book, The Beasts of the East, by the time a mammalogist decided to class these eastern wolves as a standalone species in the 1930s, the red wolf had been extirpated from the East Coast and was rapidly dwindling across its range. Working with remnant skulls and other specimens, the mammalogist chose the name red wolf—which was later enshrined with the Latinate Canis rufus—because that’s what these wolves were called in the last place they survived.

The looming extinction of the red wolf turned out to be a good thing for coyotes. Canis latrans is a distant relative of wolves that split away from a common ancestor thousands of years ago and might be considered, as one canid biologist put it to me, the “wolf of the Anthropocene.” Their smaller size means they need less food and can survive in smaller and more fragmented territory, the kind that modern humans tend to build.

Red wolves had kept coyotes out of eastern America, outcompeting them for prey. Now, as the wolves declined, the coyotes began to slip in.
The last red wolves, which lived in Louisiana and Texas, decided a strange and smaller mate was preferable to no mate at all. Soon the territory became a genetic jumble, home to both


AI, Committee, Actualités, Uncategorized

The Download: murderous ‘mirror’ bacteria, and Chinese workers fighting AI doubles

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

No one’s sure if synthetic mirror life will kill us all

In February 2019, a group of scientists proposed a high-risk, cutting-edge, irresistibly exciting idea that the National Science Foundation should fund: making “mirror” bacteria. These lab-created microbes would be organized like ordinary bacteria, but their proteins and sugars would be mirror images of those found in nature. Researchers believed they could reveal new insights into building cells, designing drugs, and even the origins of life.

But now, many of them have reversed course. They’ve become convinced that mirror organisms could trigger a catastrophic event threatening every form of life on Earth. Find out why they’re ringing alarm bells.

—Stephen Ornes

This story is from the next issue of our print magazine, which is all about nature. Subscribe now to read it when it lands this Wednesday.

Chinese tech workers are starting to train their AI doubles—and pushing back

Earlier this month, a GitHub project called Colleague Skill struck a nerve by claiming to “distill” a worker’s skills and personality—and replicate them with an AI agent. Though the project was a spoof, it prompted a wave of soul-searching among otherwise enthusiastic early adopters.

A number of tech workers told MIT Technology Review that their bosses are already encouraging them to document their workflows for automation via tools like OpenClaw. Many now fear that they are being flattened into code and losing their professional identity. In response, some are fighting back with tools designed to sabotage the automation process. Read the full story.

—Caiwei Chen

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 The White House and Anthropic are working toward a compromise
The Trump administration says they had a “productive meeting.” (Reuters $)
+ Trump had ordered US agencies to phase out Anthropic’s tech. (Guardian)
+ Despite the blacklist, the NSA is using Anthropic’s new Mythos model. (Axios)

2 Palantir has unveiled a manifesto calling for universal national service
While denouncing inclusivity and “regressive” cultures. (TechCrunch)
+ It’s a summary of CEO Alex Karp’s book “The Technological Republic.” (Engadget)
+ One critic called the book “a piece of corporate sales material.” (Bloomberg $)

3 Germany’s chancellor and largest company want looser AI rules
Chancellor Merz said industrial AI needs more regulatory freedom. (Reuters $)
+ Siemens says it plans to shift investments to the US if EU rules don’t change. (Bloomberg $)
+ Fractures over AI regulation are also emerging in the US. (MIT Technology Review)

4 Nvidia’s once-tight bond with gamers is cracking over AI
Consumer graphics cards are no longer the priority. (CNBC)
+ But generative AI could reinvent what it means to play. (MIT Technology Review)

5 Insurers are trying to exclude AI-related harms from their coverage
And escape legal liability for AI’s mistakes. (FT $)
+ AI images are being used in insurance scams. (BBC)

6 AI is about to make the global e-waste crisis much worse
And most of the trash will end up in non-Western countries. (Rest of World)
+ Here’s what we can do about it. (MIT Technology Review)

7 Tinder and Zoom have partnered with Sam Altman’s eye-scanning firm
To offer a “proof of humanity” badge to users. (BBC)

8 Islamist insurgents in West Africa are driving surging demand for drones
A Nigerian UAV startup is opening its first factory abroad in Ghana. (Bloomberg $)

9 Hundreds of fake pro-Trump AI influencers are flooding social media
In an apparent bid to hook conservative voters.
(NYT)

10 A Chinese humanoid has smashed the human half-marathon record
Despite crashing into a railing near the end of the race. (NBC News)
+ Chinese tech firm Honor swept the podium spots. (Engadget)
+ Last year, humans won the race by a mile. (CNN)

Quote of the day

“This is the only issue where you’ve got Steve Bannon and Ralph Nader, Glenn Beck and Bernie Sanders fighting for the same thing.”

—Ben Cumming, head of communications at the AI safety nonprofit Future of Life Institute, tells the Washington Post that diverse public figures are endorsing a declaration of AI policy priorities.

One More Thing

The great commercial takeover of low Earth orbit

The International Space Station will be decommissioned as soon as 2030, but the story of America in low Earth orbit (LEO) will continue.

Using lessons from the ISS, NASA has partnered with private companies to develop new commercial space stations for research, manufacturing, and tourism. If they are successful, these businesses will bring about a new era of space exploration: private rockets flying to private destinations. They will also demonstrate a new model in which NASA builds infrastructure and the private sector takes it from there—freeing the agency to explore deeper and deeper into space. Read the full story.

—David W. Brown

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Bask in this adorable test of a dog’s devotion.
+ This vocal pitch trainer improves your singing straight from your browser.
+ Master international etiquette with this interactive guide to the world’s cultures.
+ Explore the networks of public figures with this intriguing interactive graph.


AI, Committee, Actualités, Uncategorized

Anthropic Releases Claude Opus 4.7: A Major Upgrade for Agentic Coding, High-Resolution Vision, and Long-Horizon Autonomous Tasks

Anthropic has launched Claude Opus 4.7, its latest frontier model and a direct successor to Claude Opus 4.6. The release is positioned as a focused improvement rather than a full generational leap, but the gains it delivers are substantial in the areas that matter most to developers building real-world AI-powered applications: agentic software engineering, multimodal reasoning, and long-running autonomous task execution.

https://www.anthropic.com/news/claude-opus-4-7

What Exactly is Claude Opus 4.7?

Anthropic maintains a model family with tiers — Haiku (fast and lightweight), Sonnet (balanced), and Opus (highest capability). Opus 4.7 sits at the top of this stack, below only the newly previewed Claude Mythos, which Anthropic has kept in a restricted release. Opus 4.7 represents a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Crucially, users report being able to hand off their hardest coding work — the kind that previously needed close supervision — to Opus 4.7 with confidence, as it handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

The model verifying its own outputs is a meaningful behavioral shift. Earlier models often produced results without internal sanity checks; Opus 4.7 appears to close that loop autonomously, which has significant implications for CI/CD pipelines and multi-step agentic workflows.

Stronger Coding Benchmarks

Early testers have put some sharp numbers on the coding improvements. On a 93-task coding benchmark, Opus 4.7 lifted the task resolution rate by 13% over Opus 4.6, including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve. On CursorBench — a widely used developer evaluation harness — Opus 4.7 cleared 70% versus Opus 4.6 at 58%.
And for complex multi-step workflows, one tester observed a 14% gain over Opus 4.6 at fewer tokens and a third of the tool errors — and notably, Opus 4.7 was the first model to pass their implicit-need tests, continuing to execute through tool failures that used to stop Opus cold.

Improved Vision: 3× the Resolution of Prior Models

One of the most technically concrete upgrades in Opus 4.7 is its multimodal capability. Opus 4.7 can now accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times as many pixels as prior Claude models. Many real-world applications — from computer-use agents reading dense UI screenshots to data extraction from complex engineering diagrams — fail not because the model lacks reasoning ability, but because it can’t resolve fine visual detail. This opens up a wealth of multimodal uses that depend on that detail: computer-use agents reading dense screenshots, data extraction from complex diagrams, and work that needs pixel-perfect references.

The impact in production has already been dramatic. One tester working on computer-use workflows reported that Opus 4.7 scored 98.5% on their visual-acuity benchmark versus 54.5% for Opus 4.6 — effectively eliminating their single biggest Opus pain point. This is a model-level change rather than an API parameter, so images users send to Claude will simply be processed at higher fidelity — though because higher-resolution images consume more tokens, users who don’t require the extra detail can downsample images before sending them to the model.

https://www.anthropic.com/news/claude-opus-4-7

A New Effort Level: xhigh, Plus Task Budgets

Developers working with the Claude API will notice two new levers for controlling compute spend. First, Opus 4.7 introduces a new xhigh (“extra high”) effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems.
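The pre-send downsampling advice above is straightforward to implement. Here is a generic sketch using Pillow — the 2,576-pixel figure is the long-edge limit reported for Opus 4.7, but the helper itself is an illustration, not part of any Anthropic SDK:

```python
from PIL import Image

MAX_LONG_EDGE = 2576  # reported long-edge limit for Opus 4.7

def downsample_for_model(img: Image.Image, max_edge: int = MAX_LONG_EDGE) -> Image.Image:
    """Scale an image down so its longest side is at most max_edge pixels."""
    long_edge = max(img.size)
    if long_edge <= max_edge:
        return img  # already within limits; send as-is
    scale = max_edge / long_edge
    new_size = (round(img.width * scale), round(img.height * scale))
    return img.resize(new_size, Image.LANCZOS)

# Example: a 4000×3000 screenshot shrinks to fit the 2576-px long edge.
big = Image.new("RGB", (4000, 3000))
small = downsample_for_model(big)
print(small.size)  # (2576, 1932)
```

Downsampling only when an image actually exceeds the limit keeps full fidelity for the screenshots that need it while capping token spend on everything larger.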
In Claude Code, Anthropic has raised the default effort level to xhigh for all plans. When testing Opus 4.7 for coding and agentic use cases, Anthropic recommends starting with high or xhigh effort.

Second, task budgets are now launching in public beta on the Claude Platform API, giving developers a way to guide Claude’s token spend so it can prioritize work across longer runs. Together, these two controls give developer teams meaningful production levers — especially relevant when running parallelized agent pipelines where per-call cost and latency must be managed carefully.

New in Claude Code: /ultrareview and Auto Mode for Max Users

Two new Claude Code features ship alongside Opus 4.7 that are worth flagging for devs who use it as part of their development workflow. The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. Anthropic is giving Pro and Max Claude Code users three free ultrareviews to try it out. Think of it as a senior engineer review pass on demand — useful before merging complex PRs or shipping to production.

Additionally, auto mode has been extended to Max users. Auto mode is a new permissions option where Claude makes decisions on your behalf, meaning that you can run longer tasks with fewer interruptions — and with less risk than if you had chosen to skip all permissions. This is particularly valuable for agents executing multi-step tasks overnight or across large codebases.

File System-Based Memory for Long Multi-Session Work

A less-discussed but operationally significant improvement is how Opus 4.7 handles memory. Opus 4.7 is better at using file system-based memory — it remembers important notes across long, multi-session work and uses them to move on to new tasks that, as a result, need less up-front context.
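A common way to implement this kind of memory is a plain notes file that an agent reads at the start of a session and updates as it works. The sketch below is a generic pattern, not Anthropic's implementation — the file name and JSON schema are hypothetical:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical location and name

def save_note(key: str, value: str) -> None:
    """Persist a note so a later session can resume with less up-front context."""
    notes = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    notes[key] = value
    MEMORY_FILE.write_text(json.dumps(notes, indent=2))

def load_notes() -> dict:
    """Read back everything earlier sessions recorded."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

# Session 1 records progress; session 2 picks it up without re-deriving it.
save_note("refactor_status", "modules A and B migrated; C still pending")
print(load_notes()["refactor_status"])
```

Because the notes live on disk rather than in the context window, they survive across sessions and cost no tokens until the agent chooses to read them back.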
The model also achieved state-of-the-art results on GDPval-AA, a third-party evaluation of economically valuable knowledge work across finance, legal, and other domains.

Key Takeaways

Claude Opus 4.7 is Anthropic’s strongest coding model to date, handling complex, long-running agentic tasks with far less supervision than Opus 4.6 — and uniquely verifies its own outputs before reporting back.

Vision capability has tripled, with support for images up to ~3.75 megapixels, making it significantly more reliable for computer-use agents, diagram parsing, and any workflow that depends on fine visual detail.

A new xhigh effort level and task budgets give developers precise control over the reasoning-vs-latency tradeoff and token spend — critical levers

Anthropic Releases Claude Opus 4.7: A Major Upgrade for Agentic Coding, High-Resolution Vision, and Long-Horizon Autonomous Tasks


A Coding Tutorial for Running PrismML Bonsai 1-Bit LLM on CUDA with GGUF, Benchmarking, Chat, JSON, and RAG

In this tutorial, we show how to run the Bonsai 1-bit large language model efficiently using GPU acceleration and PrismML's optimized GGUF deployment stack. We set up the environment, install the required dependencies, download the prebuilt llama.cpp binaries, and load the Bonsai-1.7B model for fast inference on CUDA. As we progress, we examine how 1-bit quantization works under the hood, why the Q1_0_g128 format is so memory-efficient, and how this makes Bonsai practical for lightweight yet capable language model deployment. We also test core inference, benchmarking, multi-turn chat, structured JSON generation, code generation, OpenAI-compatible server mode, and a small retrieval-augmented generation workflow, giving us a complete, hands-on view of how Bonsai operates in real-world use.

```python
import os, sys, subprocess, time, json, urllib.request, tarfile, textwrap

try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

def section(title):
    bar = "═" * 60
    print(f"\n{bar}\n {title}\n{bar}")

section("1 · Environment & GPU Check")

def run(cmd, capture=False, check=True, **kw):
    return subprocess.run(
        cmd, shell=True, capture_output=capture, text=True, check=check, **kw
    )

gpu_info = run("nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader",
               capture=True, check=False)
if gpu_info.returncode == 0:
    print(" GPU detected:", gpu_info.stdout.strip())
else:
    print(" No GPU found — inference will run on CPU (much slower).")

cuda_check = run("nvcc --version", capture=True, check=False)
if cuda_check.returncode == 0:
    for line in cuda_check.stdout.splitlines():
        if "release" in line:
            print(" CUDA:", line.strip())
            break

print(f" Python {sys.version.split()[0]} | Platform: Linux (Colab)")

section("2 · Installing Python Dependencies")
run("pip install -q huggingface_hub requests tqdm openai")
print(" huggingface_hub, requests, tqdm, openai installed")

from huggingface_hub import hf_hub_download
```

We begin by importing the core Python modules that we need for system operations, downloads, timing, and JSON handling. We check whether we are running inside Google Colab, define a reusable section printer, and create a helper function to run shell commands cleanly from Python. We then verify the GPU and CUDA environment, print the Python runtime details, install the required Python dependencies, and prepare the Hugging Face download utility for the next stages.

```python
section("3 · Downloading PrismML llama.cpp Prebuilt Binaries")

RELEASE_TAG = "prism-b8194-1179bfc"
BASE_URL = f"https://github.com/PrismML-Eng/llama.cpp/releases/download/{RELEASE_TAG}"
BIN_DIR = "/content/bonsai_bin"
os.makedirs(BIN_DIR, exist_ok=True)

def detect_cuda_build():
    r = run("nvcc --version", capture=True, check=False)
    for line in r.stdout.splitlines():
        if "release" in line:
            try:
                ver = float(line.split("release")[-1].strip().split(",")[0].strip())
                if ver >= 13.0:
                    return "13.1"
                if ver >= 12.6:
                    return "12.8"
                return "12.4"
            except ValueError:
                pass
    return "12.4"

cuda_build = detect_cuda_build()
print(f" Detected CUDA build slot: {cuda_build}")

TAR_NAME = f"llama-{RELEASE_TAG}-bin-linux-cuda-{cuda_build}-x64.tar.gz"
TAR_URL = f"{BASE_URL}/{TAR_NAME}"
tar_path = f"/tmp/{TAR_NAME}"

if not os.path.exists(f"{BIN_DIR}/llama-cli"):
    print(f" Downloading: {TAR_URL}")
    urllib.request.urlretrieve(TAR_URL, tar_path)
    print(" Extracting ...")
    with tarfile.open(tar_path, "r:gz") as t:
        t.extractall(BIN_DIR)
    for fname in os.listdir(BIN_DIR):
        fp = os.path.join(BIN_DIR, fname)
        if os.path.isfile(fp):
            os.chmod(fp, 0o755)
    print(f" Binaries extracted to {BIN_DIR}")
    bins = sorted(f for f in os.listdir(BIN_DIR) if os.path.isfile(os.path.join(BIN_DIR, f)))
    print(" Available:", ", ".join(bins))
else:
    print(f" Binaries already present at {BIN_DIR}")

LLAMA_CLI = f"{BIN_DIR}/llama-cli"
LLAMA_SERVER = f"{BIN_DIR}/llama-server"

test = run(f"{LLAMA_CLI} --version", capture=True, check=False)
if test.returncode == 0:
    print(f" llama-cli version: {test.stdout.strip()[:80]}")
else:
    print(f" llama-cli test failed: {test.stderr.strip()[:200]}")

section("4 · Downloading Bonsai-1.7B GGUF Model")

MODEL_REPO = "prism-ml/Bonsai-1.7B-gguf"
MODEL_DIR = "/content/bonsai_models"
GGUF_FILENAME = "Bonsai-1.7B.gguf"
os.makedirs(MODEL_DIR, exist_ok=True)
MODEL_PATH = os.path.join(MODEL_DIR, GGUF_FILENAME)

if not os.path.exists(MODEL_PATH):
    print(f" Downloading {GGUF_FILENAME} (~248 MB) from HuggingFace ...")
    MODEL_PATH = hf_hub_download(
        repo_id=MODEL_REPO,
        filename=GGUF_FILENAME,
        local_dir=MODEL_DIR,
    )
    print(f" Model saved to: {MODEL_PATH}")
else:
    print(f" Model already cached: {MODEL_PATH}")

size_mb = os.path.getsize(MODEL_PATH) / 1e6
print(f" File size on disk: {size_mb:.1f} MB")

section("5 · Core Inference Helpers")

DEFAULT_GEN_ARGS = dict(
    temp=0.5,
    top_p=0.85,
    top_k=20,
    repeat_penalty=1.0,
    n_predict=256,
    n_gpu_layers=99,
    ctx_size=4096,
)

def build_llama_cmd(prompt, system_prompt="You are a helpful assistant.", **overrides):
    args = {**DEFAULT_GEN_ARGS, **overrides}
    formatted = (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    safe_prompt = formatted.replace('"', '\\"')
    return (
        f'{LLAMA_CLI} -m "{MODEL_PATH}"'
        f' -p "{safe_prompt}"'
        f' -n {args["n_predict"]}'
        f' --temp {args["temp"]}'
        f' --top-p {args["top_p"]}'
        f' --top-k {args["top_k"]}'
        f' --repeat-penalty {args["repeat_penalty"]}'
        f' -ngl {args["n_gpu_layers"]}'
        f' -c {args["ctx_size"]}'
        f' --no-display-prompt'
        f' -e'
    )

def infer(prompt, system_prompt="You are a helpful assistant.", verbose=True, **overrides):
    cmd = build_llama_cmd(prompt, system_prompt, **overrides)
    t0 = time.time()
    result = run(cmd, capture=True, check=False)
    elapsed = time.time() - t0
    output = result.stdout.strip()
    if verbose:
        print(f"\n{'─'*50}")
        print(f"Prompt : {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
        print(f"{'─'*50}")
        print(output)
        print(f"{'─'*50}")
        print(f" {elapsed:.2f}s | ~{len(output.split())} words")
    return output, elapsed

print(" Inference helpers ready.")

section("6 · Basic Inference — Hello, Bonsai!")
infer("What makes 1-bit language models special compared to standard models?")
```

We download and prepare the PrismML prebuilt llama.cpp CUDA binaries that power local inference for the Bonsai model. We detect the available CUDA version, choose the matching binary build, extract the downloaded archive, make the files executable, and verify that the llama-cli binary works correctly. After that, we download the Bonsai-1.7B GGUF model from Hugging Face, set up the model path, define the default generation settings, and build the core helper functions that format prompts and run inference.

```python
section("7 · Q1_0_g128 Quantization — What's Happening Under the Hood")

print(textwrap.dedent("""
╔══════════════════════════════════════════════════════════════╗
║ Bonsai Q1_0_g128 Weight Representation                       ║
╠══════════════════════════════════════════════════════════════╣
║ Each weight = 1 bit:  0 → −scale                             ║
║                       1 → +scale                             ║
║ Every 128 weights share one FP16 scale factor.               ║
║                                                              ║
║ Effective bits per weight:                                   ║
║   1 bit (sign) + 16/128 bits (shared scale) = 1.125 bpw      ║
║                                                              ║
║ Memory comparison for Bonsai-1.7B:                           ║
║   FP16:           3.44 GB (1.0× baseline)                    ║
║   Q1_0_g128:      0.24 GB (14.2× smaller!)                   ║
║   MLX 1-bit g128: 0.27 GB (12.8× smaller)                    ║
╚══════════════════════════════════════════════════════════════╝
"""))

print(" Python demo of Q1_0_g128 quantization logic:\n")

import random
random.seed(42)

GROUP_SIZE = 128
weights_fp16 = [random.gauss(0, 0.1) for _ in range(GROUP_SIZE)]
scale = max(abs(w) for w in weights_fp16)
quantized = [1 if w >= 0 else 0 for w in weights_fp16]
dequantized = [scale if b == 1 else -scale for b in quantized]
mse = sum((a - b) ** 2 for a, b in zip(weights_fp16, dequantized)) / GROUP_SIZE

print(f" FP16 weights (first 8): {[f'{w:.4f}' for w in weights_fp16[:8]]}")
print(f" 1-bit repr (first 8):   {quantized[:8]}")
print(f" Shared scale:           {scale:.4f}")
print(f" Dequantized (first 8):  {[f'{w:.4f}' for w in dequantized[:8]]}")
print(f" MSE of reconstruction:  {mse:.6f}")

memory_fp16 = GROUP_SIZE * 2
memory_1bit = GROUP_SIZE / 8 + 2
print(f"\n Memory: FP16={memory_fp16}B vs Q1_0_g128={memory_1bit:.1f}B "
      f"({memory_fp16/memory_1bit:.1f}× reduction)")

section("8 ·
```
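To see where the 1.125 bits-per-weight figure comes from at the byte level, here is a small sketch of our own (an illustration of the group format described above, not PrismML's actual kernel code) that packs 128 one-bit signs into 16 bytes plus a 2-byte FP16 scale:

```python
import struct

GROUP_SIZE = 128  # weights per group, as in Q1_0_g128

def pack_group(bits, scale):
    """Pack 128 sign bits into 16 bytes, followed by one FP16 scale (2 bytes)."""
    assert len(bits) == GROUP_SIZE
    packed = bytearray(GROUP_SIZE // 8)
    for i, b in enumerate(bits):
        if b:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed) + struct.pack("<e", scale)  # "<e" = little-endian float16

def unpack_group(blob):
    """Recover the ±scale weights from an 18-byte packed group."""
    packed, (scale,) = blob[:16], struct.unpack("<e", blob[16:])
    bits = [(packed[i // 8] >> (i % 8)) & 1 for i in range(GROUP_SIZE)]
    return [scale if b else -scale for b in bits]

bits = [i % 2 for i in range(GROUP_SIZE)]
blob = pack_group(bits, 0.125)
assert len(blob) == 18                # 16 sign bytes + 2 scale bytes
print(len(blob) * 8 / GROUP_SIZE)     # → 1.125 bits per weight
assert unpack_group(blob) == [0.125 if b else -0.125 for b in bits]
```

Eighteen bytes for 128 weights is exactly 144 bits, i.e. the 1.125 bpw quoted in the box above; the real format adds kernel-friendly ordering, but the arithmetic is the same.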



A Coding Guide for Property-Based Testing Using Hypothesis with Stateful, Differential, and Metamorphic Test Design

In this tutorial, we explore property-based testing using Hypothesis and build a rigorous testing pipeline that goes far beyond traditional unit testing. We implement invariants, differential testing, metamorphic testing, targeted exploration, and stateful testing to validate both functional correctness and behavioral guarantees of our systems. Instead of manually crafting edge cases, we let Hypothesis generate structured inputs, shrink failures to minimal counterexamples, and systematically uncover hidden bugs. We also demonstrate how modern testing practices can be integrated directly into experimental and research-driven workflows.

```python
import sys, textwrap, subprocess, os, re, math
!{sys.executable} -m pip -q install hypothesis pytest

test_code = r'''
import re, math
import pytest
from hypothesis import (
    given, assume, example, settings, note, target, HealthCheck, Phase
)
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant, initialize, precondition

def clamp(x: int, lo: int, hi: int) -> int:
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def normalize_whitespace(s: str) -> str:
    return " ".join(s.split())

def is_sorted_non_decreasing(xs):
    return all(xs[i] <= xs[i+1] for i in range(len(xs)-1))

def merge_sorted(a, b):
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def merge_sorted_reference(a, b):
    return sorted(list(a) + list(b))
```

We set up the environment by installing Hypothesis and pytest and importing all required modules. We begin constructing the full test suite by defining core utility functions such as clamp, normalize_whitespace, and merge_sorted. We establish the functional foundation that our property-based tests will rigorously validate in later snippets.
```python
def safe_parse_int(s: str):
    t = s.strip()
    if re.fullmatch(r"[+-]?\d+", t) is None:
        return (False, "not_an_int")
    if len(t.lstrip("+-")) > 2000:
        return (False, "too_big")
    try:
        return (True, int(t))
    except Exception:
        return (False, "parse_error")

def safe_parse_int_alt(s: str):
    t = s.strip()
    if not t:
        return (False, "not_an_int")
    sign = 1
    if t[0] == "+":
        t = t[1:]
    elif t[0] == "-":
        sign = -1
        t = t[1:]
    if not t or any(ch < "0" or ch > "9" for ch in t):
        return (False, "not_an_int")
    if len(t) > 2000:
        return (False, "too_big")
    val = 0
    for ch in t:
        val = val * 10 + (ord(ch) - 48)
    return (True, sign * val)

bounds = st.tuples(st.integers(-10_000, 10_000), st.integers(-10_000, 10_000)).map(
    lambda t: (t[0], t[1]) if t[0] <= t[1] else (t[1], t[0])
)

@st.composite
def int_like_strings(draw):
    sign = draw(st.sampled_from(["", "+", "-"]))
    digits = draw(st.text(alphabet=st.characters(min_codepoint=48, max_codepoint=57),
                          min_size=1, max_size=300))
    left_ws = draw(st.text(alphabet=[" ", "\t", "\n"], min_size=0, max_size=5))
    right_ws = draw(st.text(alphabet=[" ", "\t", "\n"], min_size=0, max_size=5))
    return f"{left_ws}{sign}{digits}{right_ws}"

sorted_lists = st.lists(st.integers(-10_000, 10_000), min_size=0, max_size=200).map(sorted)
```

We implement parsing logic and define structured strategies that generate constrained, meaningful test inputs. We create composite strategies such as int_like_strings to precisely control the input space for property validation. We prepare sorted list generators and bounds strategies that enable differential and invariant-based testing.
```python
@settings(max_examples=300, suppress_health_check=[HealthCheck.too_slow])
@given(x=st.integers(-50_000, 50_000), b=bounds)
def test_clamp_within_bounds(x, b):
    lo, hi = b
    y = clamp(x, lo, hi)
    assert lo <= y <= hi

@settings(max_examples=300, suppress_health_check=[HealthCheck.too_slow])
@given(x=st.integers(-50_000, 50_000), b=bounds)
def test_clamp_idempotent(x, b):
    lo, hi = b
    y = clamp(x, lo, hi)
    assert clamp(y, lo, hi) == y

@settings(max_examples=250)
@given(s=st.text())
@example(" a\t\tb \n c ")
def test_normalize_whitespace_is_idempotent(s):
    t = normalize_whitespace(s)
    assert normalize_whitespace(t) == t
    assert normalize_whitespace(" \n\t " + s + " \t") == normalize_whitespace(s)

@settings(max_examples=250, suppress_health_check=[HealthCheck.too_slow])
@given(a=sorted_lists, b=sorted_lists)
def test_merge_sorted_matches_reference(a, b):
    out = merge_sorted(a, b)
    ref = merge_sorted_reference(a, b)
    assert out == ref
    assert is_sorted_non_decreasing(out)
```

We define core property tests that validate correctness and idempotence across multiple functions. We use Hypothesis decorators to automatically explore edge cases and verify behavioral guarantees such as boundary constraints and deterministic normalization. We also implement differential testing to ensure our merge implementation matches a trusted reference.
```python
@settings(max_examples=250, deadline=200, suppress_health_check=[HealthCheck.too_slow])
@given(s=int_like_strings())
def test_two_parsers_agree_on_int_like_strings(s):
    ok1, v1 = safe_parse_int(s)
    ok2, v2 = safe_parse_int_alt(s)
    assert ok1 and ok2
    assert v1 == v2

@settings(max_examples=250)
@given(s=st.text(min_size=0, max_size=200))
def test_safe_parse_int_rejects_non_ints(s):
    t = s.strip()
    m = re.fullmatch(r"[+-]?\d+", t)
    ok, val = safe_parse_int(s)
    if m is None:
        assert ok is False
    else:
        if len(t.lstrip("+-")) > 2000:
            assert ok is False and val == "too_big"
        else:
            assert ok is True and isinstance(val, int)

def variance(xs):
    if len(xs) < 2:
        return 0.0
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)

@settings(max_examples=250, phases=[Phase.generate, Phase.shrink])
@given(xs=st.lists(st.integers(-1000, 1000), min_size=0, max_size=80))
def test_statistics_sanity(xs):
    target(variance(xs))
    if len(xs) == 0:
        assert variance(xs) == 0.0
    elif len(xs) == 1:
        assert variance(xs) == 0.0
    else:
        v = variance(xs)
        assert v >= 0.0
        k = 7
        assert math.isclose(variance([x + k for x in xs]), v, rel_tol=1e-12, abs_tol=1e-12)
```

We extend our validation to parsing robustness and statistical correctness using targeted exploration. We verify that two independent integer parsers agree on structured inputs and enforce rejection rules on invalid strings. We further implement metamorphic testing by validating invariants of variance under transformation.
```python
class Bank:
    def __init__(self):
        self.balance = 0
        self.ledger = []

    def deposit(self, amt: int):
        if amt <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amt
        self.ledger.append(("dep", amt))

    def withdraw(self, amt: int):
        if amt <= 0:
            raise ValueError("withdraw must be positive")
        if amt > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amt
        self.ledger.append(("wd", amt))

    def replay_balance(self):
        bal = 0
        for typ, amt in self.ledger:
            bal += amt if typ == "dep" else -amt
        return bal

class BankMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.bank = Bank()

    @initialize()
    def init(self):
        assert self.bank.balance == 0
        assert self.bank.replay_balance() == 0

    @rule(amt=st.integers(min_value=1, max_value=10_000))
    def deposit(self, amt):
        self.bank.deposit(amt)

    @precondition(lambda self: self.bank.balance > 0)
    @rule(amt=st.integers(min_value=1, max_value=10_000))
    def withdraw(self, amt):
        assume(amt <= self.bank.balance)
        self.bank.withdraw(amt)

    @invariant()
    def balance_never_negative(self):
        assert self.bank.balance >= 0

    @invariant()
    def ledger_replay_matches_balance(self):
        assert self.bank.replay_balance() == self.bank.balance

TestBankMachine = BankMachine.TestCase
'''

path = "/tmp/test_hypothesis_advanced.py"
with open(path, "w", encoding="utf-8") as f:
    f.write(test_code)

print("Hypothesis version:", __import__("hypothesis").__version__)
print("\nRunning pytest on:", path, "\n")

res = subprocess.run([sys.executable, "-m", "pytest", "-q", path],
                     capture_output=True, text=True)
print(res.stdout)
if res.returncode != 0:
    print(res.stderr)
if res.returncode == 0:
    print("\nAll Hypothesis tests passed.")
elif res.returncode == 5:
```



NVIDIA Releases Ising: the First Open Quantum AI Model Family for Hybrid Quantum-Classical Systems

Quantum computing has spent years living in the future tense. Hardware has improved, research has compounded, and venture dollars have followed, but the gap between a quantum processor running in a lab and one running a real-world application remains stubbornly wide. NVIDIA moved to close that gap with the launch of NVIDIA Ising, the world's first family of open quantum AI models specifically designed to help researchers and enterprises build quantum processors capable of running useful applications.

Here is the core problem Ising is designed to solve: quantum computers are extraordinarily sensitive. Their fundamental unit of computation, the qubit, is so easily disturbed by environmental noise that errors accumulate rapidly during computation. Before you can run anything meaningful on a quantum processor, two things have to work well: calibration (making sure the hardware is tuned and operating correctly) and error correction (detecting and fixing errors as they occur in real time). Both have historically been manual, slow, and difficult to scale. NVIDIA is betting that AI can automate both.

What the Ising Model Family Actually Includes

NVIDIA Ising includes two distinct components: Ising Calibration and Ising Decoding.

Ising Calibration is a vision language model, an architecture familiar to anyone who has worked with multimodal AI, designed to rapidly interpret and react to measurements from quantum processors. Think of it as an AI agent that continuously watches diagnostic readouts from quantum hardware and autonomously adjusts the system to keep it running optimally. This enables AI agents to automate continuous calibration, reducing the time needed from days to hours. That is not a minor speedup: in quantum hardware development, days of calibration time between experiments is a major bottleneck.
Ising Decoding comes in two variants of a 3D convolutional neural network (3D CNN) model, each optimized for a different trade-off: one tuned for speed and the other tuned for accuracy. These models perform real-time decoding for quantum error correction. If you have worked with signal processing or sequence modeling, error correction decoding is conceptually similar: you are trying to infer what the "correct" state of the system should be, given noisy observations. Ising Decoding models are up to 2.5x faster and 3x more accurate than pyMatching, the current open-source industry standard.

The Ecosystem Is Already Moving

Ising Calibration is already in use by Atom Computing, Academia Sinica, EeroQ, Conductor Quantum, Fermi National Accelerator Laboratory, Harvard John A. Paulson School of Engineering and Applied Sciences, Infleqtion, IonQ, IQM Quantum Computers, Lawrence Berkeley National Laboratory's Advanced Quantum Testbed, Q-CTRL, and the U.K. National Physical Laboratory. Ising Decoding is being deployed by Cornell University, EdenCode, Infleqtion, IQM Quantum Computers, Quantum Elements, Sandia National Laboratories, SEEQC, University of California San Diego, UC Santa Barbara, University of Chicago, University of Southern California, and Yonsei University. That is remarkably broad day-one adoption, spanning national labs, Ivy League institutions, and commercial quantum hardware companies across multiple qubit modalities.

How It Fits Into NVIDIA's Quantum Stack

NVIDIA Ising complements the NVIDIA CUDA-Q software platform for hybrid quantum-classical computing and integrates with the NVIDIA NVQLink QPU-GPU hardware interconnect for real-time control and quantum error correction. CUDA-Q is NVIDIA's broader programming model for hybrid quantum-classical workflows; if you have written CUDA kernels for GPU acceleration, CUDA-Q follows a similar philosophy of tightly coupling classical and accelerated compute.
NVQLink is the hardware bridge that lets GPUs communicate with quantum processing units (QPUs) at the latency required for real-time error correction.

Key Takeaways

NVIDIA Ising is the world's first family of open quantum AI models, purpose-built to solve the two hardest engineering problems blocking practical quantum computing, calibration and error correction, using AI instead of slow, manual processes.

Ising Calibration uses a vision language model to autonomously tune quantum processors, reducing the time required for continuous calibration from days to hours by enabling AI agents to interpret and react to hardware measurements in real time.

Ising Decoding uses a 3D convolutional neural network (3D CNN) to perform real-time quantum error correction, delivering up to 2.5x faster performance and 3x higher accuracy compared to pyMatching.

Adoption is already broad and diverse on day one, with leading institutions including Fermi National Accelerator Laboratory, Harvard, Lawrence Berkeley National Laboratory's Advanced Quantum Testbed, IQM Quantum Computers, Sandia National Laboratories, and over a dozen universities and enterprises deploying Ising Calibration and Ising Decoding across multiple qubit modalities.

Ising integrates directly into NVIDIA's full quantum-classical software and hardware stack, complementing the NVIDIA CUDA-Q platform for hybrid quantum-classical computing and the NVIDIA NVQLink QPU-GPU hardware interconnect, with models available on GitHub, Hugging Face, and build.nvidia.com and fine-tunable via NVIDIA NIM microservices.

Check out the Technical details and Product Page here.
The post NVIDIA Releases Ising: the First Open Quantum AI Model Family for Hybrid Quantum-Classical Systems appeared first on MarkTechPost.



xAI Launches Standalone Grok Speech-to-Text and Text-to-Speech APIs, Targeting Enterprise Voice Developers

Elon Musk's AI company xAI has launched two standalone audio APIs, a Speech-to-Text (STT) API and a Text-to-Speech (TTS) API, both built on the same infrastructure that powers Grok Voice on mobile apps, Tesla vehicles, and Starlink customer support. The release moves xAI squarely into the competitive speech API market currently occupied by ElevenLabs, Deepgram, and AssemblyAI.

What Is the Grok Speech-to-Text API?

Speech-to-Text is the technology that converts spoken audio into written text. For developers building meeting transcription tools, voice agents, call center analytics, or accessibility features, an STT API is a core building block. Rather than developing this from scratch, developers call an endpoint, send audio, and receive a structured transcript in return.

The Grok STT API is now generally available, offering transcription across 25 languages with both batch and streaming modes. Batch mode is designed for processing pre-recorded audio files, while streaming enables real-time transcription as audio is captured. Pricing is straightforward: $0.10 per hour of audio for batch and $0.20 per hour for streaming.

The API includes word-level timestamps, speaker diarization, and multichannel support, along with intelligent Inverse Text Normalization that correctly handles numbers, dates, currencies, and more. It also accepts 12 audio formats, comprising nine container formats (WAV, MP3, OGG, Opus, FLAC, AAC, MP4, M4A, MKV) and three raw formats (PCM, µ-law, A-law), with a maximum file size of 500 MB per request.

Speaker diarization is the process of separating audio by individual speakers, answering the question of who said what. This is critical for multi-speaker recordings like meetings, interviews, or customer calls. Word-level timestamps assign precise start and end times to each word in the transcript, enabling use cases like subtitle generation, searchable recordings, and legal documentation.
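As a concrete illustration of what word-level timestamps and diarization enable (the payload shape below is hypothetical; xAI's response schema is not shown in this announcement), grouping timestamped words into per-speaker cues is essentially all a basic subtitle generator needs:

```python
# Hypothetical diarized, word-timestamped transcript; field names are
# illustrative, not xAI's documented schema.
words = [
    {"word": "Your",    "start": 0.0, "end": 0.3, "speaker": "A"},
    {"word": "balance", "start": 0.3, "end": 0.8, "speaker": "A"},
    {"word": "is",      "start": 0.8, "end": 1.0, "speaker": "A"},
    {"word": "ready.",  "start": 2.1, "end": 2.6, "speaker": "B"},
]

def to_cues(words, max_len=3.0):
    """Group consecutive same-speaker words into subtitle cues up to max_len seconds."""
    cues = []
    for w in words:
        same_speaker = cues and w["speaker"] == cues[-1]["speaker"]
        if same_speaker and w["end"] - cues[-1]["start"] <= max_len:
            cues[-1]["text"] += " " + w["word"]
            cues[-1]["end"] = w["end"]
        else:
            cues.append({"speaker": w["speaker"], "text": w["word"],
                         "start": w["start"], "end": w["end"]})
    return cues

for c in to_cues(words):
    print(f'[{c["start"]:.1f}-{c["end"]:.1f}] {c["speaker"]}: {c["text"]}')
# → [0.0-1.0] A: Your balance is
#   [2.1-2.6] B: ready.
```

The same cue structure drives searchable recordings and legal documentation: each snippet of text carries its exact time range and speaker.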
Inverse Text Normalization converts spoken forms like "one hundred sixty-seven thousand nine hundred eighty-three dollars and fifteen cents" into readable structured output: "$167,983.15".

Benchmark Performance

xAI's research team is making strong accuracy claims. On phone call entity recognition (names, account numbers, dates), Grok STT claims a 5.0% error rate versus ElevenLabs at 12.0%, Deepgram at 13.5%, and AssemblyAI at 21.3%. That is a substantial margin if it holds in production. For video and podcast transcription, Grok and ElevenLabs tied at a 2.4% error rate, with Deepgram and AssemblyAI trailing at 3.0% and 3.2% respectively. The xAI team also reports a 6.9% word error rate on general audio benchmarks.

Source: https://x.ai/news/grok-stt-and-tts-apis

What Is the Grok Text-to-Speech API?

Text-to-Speech converts written text into spoken audio. Developers use TTS APIs to power voice assistants, read-aloud features, podcast generation, IVR (interactive voice response) systems, and accessibility tools.

The Grok TTS API delivers fast, natural speech synthesis with detailed control via speech tags, and is priced at $4.20 per 1 million characters. The API accepts up to 15,000 characters per REST request; for longer content, a WebSocket streaming endpoint is available that has no text length limit and begins returning audio before the full input is processed.

The API supports 20 languages and five distinct voices: Ara, Eve, Leo, Rex, and Sal, with Eve as the default. Beyond voice selection, developers can inject inline and wrapping speech tags to control delivery. These include inline tags like [laugh], [sigh], and [breath], and wrapping tags like <whisper>text</whisper> and <emphasis>text</emphasis>, letting developers create engaging, lifelike delivery without complex markup.
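Because TTS is billed per character, a small helper makes the $4.20-per-million pricing and the 15,000-character REST limit concrete. This is our own sketch; whether speech tags count toward billed characters is not stated in the announcement, so we assume here that they do:

```python
PRICE_PER_MILLION_CHARS = 4.20  # USD, per the published TTS pricing
MAX_REST_CHARS = 15_000         # documented per-request REST limit

def estimate_tts_cost(text: str) -> float:
    """Estimate cost in USD; assumes speech tags are billed like any other character."""
    return len(text) * PRICE_PER_MILLION_CHARS / 1_000_000

# Documented inline and wrapping tags embedded directly in the script text.
script = "<whisper>Your order has shipped.</whisper> [sigh] Finally!"
print(f"{len(script)} chars → ${estimate_tts_cost(script):.6f}")
print("fits in one REST request:", len(script) <= MAX_REST_CHARS)
```

For anything longer than the REST limit, the announcement points to the WebSocket streaming endpoint instead, which has no text length cap.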
Such expressive control addresses one of the core limitations of traditional TTS systems, which often produce technically correct but emotionally flat output.

Key Takeaways

xAI has launched two standalone audio APIs, Grok Speech-to-Text (STT) and Text-to-Speech (TTS), built on the same production stack already serving millions of users across Grok mobile apps, Tesla vehicles, and Starlink customer support.

The Grok STT API offers real-time and batch transcription across 25 languages with speaker diarization, word-level timestamps, Inverse Text Normalization, and support for 12 audio formats, priced at $0.10/hour for batch and $0.20/hour for streaming.

On phone call entity recognition benchmarks, Grok STT reports a 5.0% error rate, significantly outperforming ElevenLabs (12.0%), Deepgram (13.5%), and AssemblyAI (21.3%), with particularly strong performance in medical, legal, and financial use cases.

The Grok TTS API supports five expressive voices (Ara, Eve, Leo, Rex, Sal) across 20 languages, with inline and wrapping speech tags like [laugh], [sigh], and <whisper> giving developers fine-grained control over vocal delivery, priced at $4.20 per 1 million characters.

Check out the Technical details here.

The post xAI Launches Standalone Grok Speech-to-Text and Text-to-Speech APIs, Targeting Enterprise Voice Developers appeared first on MarkTechPost.

