YouZum

AI

AI, Committee, Actualités, Uncategorized

Sigmoid vs ReLU Activation Functions: The Inference Cost of Losing Geometric Context

A deep neural network can be understood as a geometric system, where each layer reshapes the input space to form increasingly complex decision boundaries. For this to work effectively, layers must preserve meaningful spatial information — particularly how far a data point lies from these boundaries — since this distance enables deeper layers to build rich, non-linear representations. Sigmoid disrupts this process by compressing all inputs into a narrow range between 0 and 1. As values move away from decision boundaries, they become indistinguishable, causing a loss of geometric context across layers. This leads to weaker representations and limits the effectiveness of depth. ReLU, on the other hand, preserves magnitude for positive inputs, allowing distance information to flow through the network. This enables deeper models to remain expressive without requiring excessive width or compute. In this article, we focus on this forward-pass behavior — analyzing how Sigmoid and ReLU differ in signal propagation and representation geometry using a two-moons experiment, and what that means for inference efficiency and scalability. 
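The saturation claim is easy to verify numerically before building anything. A self-contained sketch (the input values are illustrative, not part of the article's experiment) comparing how the two activations treat points at different distances from a boundary:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

# Two points on the same side of a decision boundary: one near, one far.
near, far = np.array([2.0]), np.array([10.0])

# Sigmoid squashes both toward 1 -- the 8-unit distance gap almost vanishes.
sig_gap = float(sigmoid(far)[0] - sigmoid(near)[0])

# ReLU passes positive magnitudes through unchanged -- the gap survives.
relu_gap = float(relu(far)[0] - relu(near)[0])

print(f"sigmoid gap: {sig_gap:.4f}")  # tiny: deeper layers can barely tell them apart
print(f"relu gap:    {relu_gap:.4f}")
```

After one Sigmoid layer the two points are nearly indistinguishable, while ReLU preserves exactly how far apart they were; stacked over several layers, this is the difference in geometric context the experiment below makes visible.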
Setting up the dependencies

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
```

```python
plt.rcParams.update({
    "font.family": "monospace",
    "axes.spines.top": False,
    "axes.spines.right": False,
    "figure.facecolor": "white",
    "axes.facecolor": "#f7f7f7",
    "axes.grid": True,
    "grid.color": "#e0e0e0",
    "grid.linewidth": 0.6,
})

T = {
    "bg": "white",
    "panel": "#f7f7f7",
    "sig": "#e05c5c",
    "relu": "#3a7bd5",
    "c0": "#f4a261",
    "c1": "#2a9d8f",
    "text": "#1a1a1a",
    "muted": "#666666",
}
```

Creating the dataset

To study the effect of activation functions in a controlled setting, we first generate a synthetic dataset using scikit-learn’s make_moons. This creates a non-linear, two-class problem where simple linear boundaries fail, making it ideal for testing how well neural networks learn complex decision surfaces. We add a small amount of noise to make the task more realistic, then standardize the features using StandardScaler so both dimensions are on the same scale — ensuring stable training. The dataset is then split into training and test sets to evaluate generalization. Finally, we visualize the data distribution. This plot serves as the baseline geometry that both Sigmoid and ReLU networks will attempt to model, allowing us to later compare how each activation function transforms this space across layers.
```python
X, y = make_moons(n_samples=400, noise=0.18, random_state=42)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

fig, ax = plt.subplots(figsize=(7, 5))
fig.patch.set_facecolor(T["bg"])
ax.set_facecolor(T["panel"])
ax.scatter(X[y == 0, 0], X[y == 0, 1], c=T["c0"], s=40,
           edgecolors="white", linewidths=0.5, label="Class 0", alpha=0.9)
ax.scatter(X[y == 1, 0], X[y == 1, 1], c=T["c1"], s=40,
           edgecolors="white", linewidths=0.5, label="Class 1", alpha=0.9)
ax.set_title("make_moons — our dataset", color=T["text"], fontsize=13)
ax.set_xlabel("x₁", color=T["muted"]); ax.set_ylabel("x₂", color=T["muted"])
ax.tick_params(colors=T["muted"]); ax.legend(fontsize=10)
plt.tight_layout()
plt.savefig("moons_dataset.png", dpi=140, bbox_inches="tight")
plt.show()
```

Creating the Network

Next, we implement a small, controlled neural network to isolate the effect of activation functions. The goal here is not to build a highly optimized model, but to create a clean experimental setup where Sigmoid and ReLU can be compared under identical conditions. We define both activation functions (Sigmoid and ReLU) along with their derivatives, and use binary cross-entropy as the loss since this is a binary classification task. The TwoLayerNet class represents a simple 3-layer feedforward network (2 hidden layers + output), where the only configurable component is the activation function. A key detail is the initialization strategy: we use He initialization for ReLU and Xavier initialization for Sigmoid, ensuring that each network starts in a fair and stable regime based on its activation dynamics. The forward pass computes activations layer by layer, while the backward pass performs standard gradient descent updates.
Importantly, we also include diagnostic methods like get_hidden and get_z_trace, which allow us to inspect how signals evolve across layers — this is crucial for analyzing how much geometric information is preserved or lost. By keeping architecture, data, and training setup constant, this implementation ensures that any difference in performance or internal representations can be directly attributed to the activation function itself — setting the stage for a clear comparison of their impact on signal propagation and expressiveness.

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

def sigmoid_d(a):
    # derivative of sigmoid, expressed in terms of the activation a = sigmoid(z)
    return a * (1 - a)

def relu(z):
    return np.maximum(0, z)

def relu_d(z):
    return (z > 0).astype(float)

def bce(y, yhat):
    return -np.mean(y * np.log(yhat + 1e-9) + (1 - y) * np.log(1 - yhat + 1e-9))

class TwoLayerNet:
    def __init__(self, activation="relu", seed=0):
        np.random.seed(seed)
        self.act_name = activation
        self.act = relu if activation == "relu" else sigmoid
        self.dact = relu_d if activation == "relu" else sigmoid_d
        # He init for ReLU, Xavier for Sigmoid
        scale = lambda fan_in: np.sqrt(2 / fan_in) if activation == "relu" else np.sqrt(1 / fan_in)
        self.W1 = np.random.randn(2, 8) * scale(2)
        self.b1 = np.zeros((1, 8))
        self.W2 = np.random.randn(8, 8) * scale(8)
        self.b2 = np.zeros((1, 8))
        self.W3 = np.random.randn(8, 1) * scale(8)
        self.b3 = np.zeros((1, 1))
        self.loss_history = []

    def forward(self, X, store=False):
        z1 = X @ self.W1 + self.b1; a1 = self.act(z1)
        z2 = a1 @ self.W2 + self.b2; a2 = self.act(z2)
        z3 = a2 @ self.W3 + self.b3; out = sigmoid(z3)
        if store:
            self._cache = (X, z1, a1, z2, a2, z3, out)
        return out

    def backward(self, lr=0.05):
        X, z1, a1, z2, a2, z3, out = self._cache
        n = X.shape[0]
        dout = (out - self.y_cache) / n
        dW3 = a2.T @ dout; db3 = dout.sum(axis=0, keepdims=True)
        da2 = dout @ self.W3.T
        # relu_d takes the pre-activation z; sigmoid_d takes the activation a
        dz2 = da2 * (self.dact(z2) if self.act_name == "relu" else self.dact(a2))
        dW2 = a1.T @ dz2; db2 = dz2.sum(axis=0, keepdims=True)
        da1 = dz2 @ self.W2.T
        dz1 = da1 * (self.dact(z1) if self.act_name == "relu" else self.dact(a1))
        dW1 = X.T @ dz1; db1 = dz1.sum(axis=0, keepdims=True)
        for p, g in [(self.W3, dW3), (self.b3, db3), (self.W2, dW2),
                     (self.b2, db2), (self.W1, dW1), (self.b1, db1)]:
            p -= lr * g

    def train_step(self, X, y, lr=0.05):
        self.y_cache = y.reshape(-1, 1)
        out = self.forward(X, store=True)
        loss = bce(self.y_cache, out)
        self.backward(lr)
        return loss

    def get_hidden(self, X, layer=1):
        """Return post-activation values for layer 1 or 2."""
        z1 = X @ self.W1 + self.b1; a1 = self.act(z1)
        if layer == 1:
            return a1
        z2 = a1 @ self.W2 + self.b2
        return self.act(z2)
```
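The initialization strategy in __init__ can be sanity-checked in isolation. A standalone sketch (the layer width and sample count are arbitrary choices for illustration) showing why He scaling suits ReLU: it keeps the mean squared activation near 1 even though ReLU zeroes half the signal, while Xavier scaling keeps the variance of the linear pre-activations feeding Sigmoid near 1.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 512
x = rng.normal(size=(10_000, fan_in))  # unit-variance inputs

# He init: std sqrt(2/fan_in); Xavier init: std sqrt(1/fan_in)
W_he = rng.normal(size=(fan_in, fan_in)) * np.sqrt(2 / fan_in)
W_xavier = rng.normal(size=(fan_in, fan_in)) * np.sqrt(1 / fan_in)

relu_out = np.maximum(0, x @ W_he)   # ReLU discards the negative half
lin_out = x @ W_xavier               # pre-activation fed to Sigmoid

print(f"mean square after ReLU + He: {np.mean(relu_out**2):.3f}")  # ~1
print(f"variance after Xavier:       {lin_out.var():.3f}")          # ~1
```

The factor of 2 in He initialization exactly compensates for the half of the signal ReLU removes; without it, activation magnitudes would shrink layer by layer and depth would starve the network of signal.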



Is fake grass a bad idea? The AstroTurf wars are far from over.

A rare warm spell in January melted enough snow to uncover Cornell University’s newest athletic field, built for field hockey. Months before, it was a meadow teeming with birds and bugs; now it’s more than an acre of synthetic turf roughly the color of the felt on a pool table, almost digital in its saturation. The day I walked up the hill from a nearby creek to take a look, the metal fence around the field was locked, but someone had left a hallway-size piece of the new simulated grass outside the perimeter. It was bristly and tough, but springy and squeaky under my booted feet. I could imagine running around on it, but it would definitely take some getting used to. My companion on this walk seemed even less favorably disposed to the thought. Yayoi Koizumi, a local environmental advocate, has been fighting synthetic-turf projects at Cornell since 2023. A petite woman dressed that day in a faded plum coat over a teal vest, with a scarf the colors of salmon, slate, and sunflowers, Koizumi compulsively picked up plastic trash as we walked: a red Solo cup, a polyethylene Dunkin’ container, a five-foot vinyl panel. She couldn’t bear to leave this stuff behind to fragment into microplastic bits—as she believes the new field will. “They’ve covered the living ground in plastic,” she said. “It’s really maddening.”  The new pitch is one part of a $70 million plan to build more recreational space at the university. As of this spring, Cornell plans to install something like a quarter million square feet of synthetic grass—what people have colloquially called “astroturf” since the middle of the last century. University PR says it will be an important part of a “health-promoting campus” that is “supportive of holistic individual, social, and ecological well-being.” Koizumi runs an anti-plastic environmental group called Zero Waste Ithaca, which says that’s mostly nonsense. This fight is more than just the usual town-versus-gown tension. 
Synthetic turf used to be the stuff of professional sports arenas and maybe a suburban yard or two; today communities across the United States are debating whether to lay it down on playgrounds, parks, and dog runs. Proponents say it’s cheaper and hardier than grass, requiring less water, fertilizer, and maintenance—and that it offers a uniform surface for more hours and more days of the year than grass fields, a competitive advantage for athletes and schools hoping for a more robust athletic program. But while new generations of synthetic turf look and feel better than that mid-century stuff, it’s still just plastic. Some evidence suggests it sheds bits that endanger users and the environment, and that it contains PFAS “forever chemicals”—per- and polyfluoroalkyl substances, which are linked to a host of health issues. The padding within the plastic grass is usually made from shredded tires, which might also pose health risks. And plastic fields need to be replaced about once a decade, creating lots of waste. Yet people are buying a lot of the stuff. In 2001, Americans installed just over 7 million square meters of synthetic turf, just shy of 11,000 metric tons. By 2024, that number was 79 million square meters—enough to carpet all of Manhattan and then some, almost 120,000 metric tons. Synthetic turf covers 20,000 athletic fields and tens of thousands of parks, playgrounds, and backyards. And the US is just 20% of the global market. Those increases worry folks who study microplastics and environmental pollution. Any actual risk is hard to parse; the plastic-making industry insists that synthetic fields are safe if properly installed, but lots of researchers think that isn’t so.
“They’re very expensive, they contain toxic chemicals, and they put kids at unnecessary risk,” says Philip Landrigan, a Boston College epidemiologist who has studied environmental toxins like lead and microplastics. But at Cornell, where real estate is limited and demand for athletic facilities is high, synthetic turf was a tempting option. As Frank Rossi, a professor of turf science at Cornell, told me: “It all comes down to land and demand.”

In 1965, Houston’s new, domed baseball stadium was an icon of space-age design. But the Astrodome had a problem: the sun. Deep in the heart of Texas, it shined brightly through the Astrodome’s skylights—so much so that players kept missing fly balls. So the club painted over the skylights. Denied sunlight, the grass in the outfield withered and died. A replacement was already in the works. In the late 1950s a Ford Foundation–funded educational laboratory determined that a soft, grasslike surface material would give city kids more places to play outside and had prevailed upon the Monsanto corporation to invent one. The result was clipped blades of nylon stuck to a rubber base, which the company called ChemGrass. Down it went into Houston’s outfield, where it got a new, buzzier name: AstroTurf.

Workers lay artificial turf at the Astrodome in Houston on July 13, 1966. Developed by Monsanto, the material was originally known as ChemGrass but was later renamed AstroTurf after the stadium. AP PHOTO/ED KOLENOVSKY, FILE

That first generation of simulated lawn was brittle and hard, but quality has improved. Today, there are a few competing products, but they’re all made by extruding a petroleum-based polymer—that’s plastic—through tiny holes and then stitching or fusing the resulting fibers to a carpetlike bottom. That gets attached to some kind of padding, also plastic.
In the 1970s the industry started layering that over infill, usually sand; by the 1990s, “third generation” synthetic turf had switched to softer fibers made of polyethylene. Beneath that, they added infill that combined sand and a soft, cheap shredded rubber made from discarded automobile tires, which pile up by the hundreds of millions every year. This “crumb rubber” provides padding and fills spaces between the blades and the backing. In the early 1980s, nearly



Desalination technology, by the numbers

When I started digging into desalination technology for a new story, I couldn’t help but obsess over the numbers. I’d known on some level that desalination—pulling salt out of seawater to produce fresh water—was an increasingly important technology, especially in water-stressed regions including the Middle East. But just how much some countries rely on desalination, and how big a business it is, still surprised me. For more on how this crucial water infrastructure is increasingly vulnerable during the war in Iran, check out my latest story. Here, though, let’s look at the state of desalination technology, by the numbers.

Desalination produces 77% of all fresh water and 99% of drinking water in Qatar.

Globally, we rely on desalination for just 1% of fresh-water withdrawals. But for some countries in the Middle East, and particularly for the Gulf Cooperation Council countries (Bahrain, Qatar, Kuwait, the United Arab Emirates, Saudi Arabia, and Oman), it’s crucial. Qatar, home to over 3 million people, is one of the most staggering examples, with nearly all its drinking water supplies coming from desalination. But many major cities in the region couldn’t exist without the technology. There are no permanent rivers on the Arabian Peninsula, and supplies of fresh water are incredibly limited, so countries rely on facilities that can take in seawater and pull out the salt and other impurities.

The Middle East is home to just 6% of the world’s population and over 27% of its desalination facilities.

The region has historically been water-scarce, and that trend is only continuing as climate change pushes temperatures higher and changes rainfall patterns. Of the 17,910 desalination facilities that are operational globally, 4,897 are located in the Middle East, according to a 2026 study in npj Clean Water. The technology supplies not only municipal water used by homes and businesses, but also industries including agriculture, manufacturing, and increasingly data centers.
One massive desalination plant in Saudi Arabia produces over 1 million cubic meters of fresh water per day.

The Ras Al-Khair water and power plant in Eastern Province, Saudi Arabia, is one of a growing number of gigantic plants that output upwards of a million cubic meters of water each day. That amount of water can meet the needs of millions of people in Riyadh City. Producing it takes a lot of power—the attached power plant has a capacity of 2.4 gigawatts. While this plant is just one of thousands across the region, it’s an example of a growing trend: The average size of a desalination plant is about 10 times what it was 15 years ago, according to data from the International Energy Agency. Communities are increasingly turning to larger plants, which can produce water more efficiently than smaller ones.

Between 2024 and 2028, the Middle East’s desalination capacity could grow by over 40%.

Desalination is only going to be more crucial for life in the Middle East. The region is expected to spend over $25 billion on capital expenses for desalination facilities between 2024 and 2028, according to the 2026 npj Clean Water study. More massive plants are expected to come online in Saudi Arabia, Iraq, and Egypt during that time. All this growth could consume a lot of electricity. Between growth of the technology generally and the move toward plants that use electricity rather than fossil fuels, desalination could add 190 terawatt-hours of electricity demand globally by 2035, according to IEA data. That’s the equivalent of about 60 million households.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.
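The household equivalence behind that 190 TWh figure is easy to check with back-of-the-envelope arithmetic (the implied per-household consumption is derived here, not stated in the IEA data cited above):

```python
added_demand_twh = 190      # projected extra global demand from desalination by 2035
households = 60_000_000     # household equivalence cited above

# 1 TWh = 1e9 kWh, so divide total kWh by the number of households
kwh_per_household = added_demand_twh * 1e9 / households
print(f"{kwh_per_household:,.0f} kWh per household per year")  # ~3,167
```

That works out to roughly 3,200 kWh of annual electricity use per household, a plausible global average, which is the assumption the equivalence rests on.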



The Download: AstroTurf wars and exponential AI growth

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Is fake grass a bad idea? The AstroTurf wars are far from over.

In 2001, Americans installed just over 7 million square meters of synthetic turf. By 2024, that number was 79 million square meters—enough to carpet all of Manhattan and then some. The increase worries folks who study microplastics and environmental pollution. While the plastic-making industry insists that synthetic fields are safe if properly installed, lots of researchers think that isn’t so. Find out why AstroTurf has ignited heated debates.

—Douglas Main

This story is from the next issue of our print magazine, packed with stories all about nature. Subscribe now to read the full thing when it lands on Wednesday, April 22.

Mustafa Suleyman: AI development won’t hit a wall anytime soon—here’s why

—Mustafa Suleyman, Microsoft AI CEO and Google DeepMind co-founder

The skeptics keep predicting that AI compute will soon hit a wall—and keep getting proven wrong. To understand why that is, you need to look at the forces driving the AI explosion. Three advances are enabling exponential progress: faster basic calculators, high-bandwidth memory, and technologies that turn disparate GPUs into enormous supercomputers. Where does all this get us? Read the full op-ed on the future of AI development to learn more.

Desalination technology, by the numbers

—Casey Crownhart

When I started digging into desalination technology for a new story, I couldn’t help but obsess over the numbers. I knew on some level that desalination—pulling salt out of seawater to produce fresh water—was an increasingly important technology, especially in water-stressed regions including the Middle East. But just how much some countries rely on desalination, and how big a business it is, still surprised me. Here are the extraordinary numbers behind the crucial water source.
This story is from The Spark, our weekly newsletter on the tech that could combat the climate crisis. Sign up to receive it in your inbox every Wednesday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Meta has launched the first AI model from its Superintelligence Labs
Muse Spark is the company’s first model in a year. (Reuters $)
+ The closed model brings reasoning capabilities to the Meta AI app. (Engadget)
+ It’s built by Meta’s Superintelligence Labs, the unit led by Alexandr Wang. (TechCrunch)

2 Anthropic has lost a bid to pause the Pentagon’s blacklisting
An appeals court in Washington, DC denied the request. (CNBC)
+ A California judge had temporarily blocked the blacklisting in March. (NPR)
+ The mixed rulings leave Anthropic in a legal limbo. (Wired $)
+ And open doors for smaller AI rivals. (Reuters $)

3 New evidence suggests Adam Back invented Bitcoin
The British cryptographer may be the real Satoshi Nakamoto. (NYT $)
+ Back denies the claims. (BBC)
+ There’s a dark side to crypto’s permissionless dream. (MIT Technology Review)

4 Gen Z is cooling on AI
The share feeling angry about it has risen from 22% to 31% in a year. (Axios)
+ Anti-AI protests are also growing. (MIT Technology Review)

5 War in the Gulf could tilt the cloud race toward China
Huawei is pitching “multi-cloud” resilience to Gulf clients. (Rest of World)

6 Meta has killed a leaderboard of its AI token users
It showed the top 250 users. (The Information $)
+ Meta blamed data leaks for the shutdown. (Fortune)
+ It encouraged “tokenmaxxing,” a growing phenomenon in Big Tech. (NYT $)

7 Did Artemis II really tell us anything new about space?
Or was it primarily a PR exercise? (Ars Technica)

8 Israeli attacks have brutally exposed Lebanon’s digital infrastructure
It’s managing a modern crisis without modern technology. (Wired $)

9 AI models could offer mathematicians a common language
They hope it will simplify the process of verifying proofs. (Economist)

10 A “self-doxing” rave is helping trans people stay safe online
It’s among a series of digital self-defenses. (404 Media)

Quote of the day

“I feel like anything that I’m interested in has the potential of maybe getting replaced, even in the next few years.”

—Sydney Gill, a freshman at Rice University, tells the New York Times why she’s soured on AI.

One More Thing

A view inside ATLAS, one of two general-purpose detectors at the Large Hadron Collider. MAXIMILIEN BRICE/CERN

Inside the hunt for new physics at the world’s largest particle collider

In 2012, data from CERN’s Large Hadron Collider (LHC) unearthed a particle called the Higgs boson. The discovery answered a nagging question: where do fundamental particles, such as the ones that make up all the protons and neutrons in our bodies, get their mass? But now particle physicists have reached an impasse in their quest to discover, produce, and study new particles at colliders. Find out what they’re trying to do about it.

—Dan Garisto

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Enjoy this tale of the “joke” sound that accidentally defined 90s rave culture.
+ Take a nostalgic trip through the websites of the early 00s.
+ One for animal lovers: sperm whales have teamed up to support a newborn.
+ Here’s a long overdue answer to a vital question: can the world’s largest mousetrap catch a limousine?



Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution

Z.AI, the AI platform developed by the team behind the GLM model family, has released GLM-5.1 — its next-generation flagship model developed specifically for agentic engineering. Unlike models optimized for clean, single-turn benchmarks, GLM-5.1 is built for agentic tasks, with significantly stronger coding capabilities than its predecessor, and achieves state-of-the-art performance on SWE-Bench Pro while leading GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

Architecture: DSA, MoE, and Asynchronous RL

Before diving into what GLM-5.1 can do, it’s worth understanding what it’s built on — because the architecture is meaningfully different from a standard dense transformer. GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. The model uses a glm_moe_dsa architecture (a Mixture of Experts (MoE) model combined with DSA). For AI developers evaluating whether to self-host, this matters: MoE models activate only a subset of their parameters per forward pass, which can make inference significantly more efficient than a comparably sized dense model, though they require specific serving infrastructure. On the training side, GLM-5 implements a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupling generation from training. Novel asynchronous agent RL algorithms further improve RL quality, enabling the model to learn from complex, long-horizon interactions more effectively. This is what allows the model to handle agentic tasks with the kind of sustained judgment that single-turn RL training struggles to produce.

The Plateau Problem GLM-5.1 Is Solving

To understand what makes GLM-5.1 different at inference time, it helps to understand a specific failure mode in LLMs used as agents.
Previous models — including GLM-5 — tend to exhaust their repertoire early: they apply familiar techniques for quick initial gains, then plateau. Giving them more time doesn’t help. This is a structural limitation for any developer trying to use an LLM as a coding agent. The model applies the same playbook it knows, hits a wall, and stops making progress regardless of how long it runs. GLM-5.1, by contrast, is built to stay effective on agentic tasks over much longer horizons. The model handles ambiguous problems with better judgment and stays productive over longer sessions. It breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision. By revisiting its reasoning and revising its strategy through repeated iteration, GLM-5.1 sustains optimization over hundreds of rounds and thousands of tool calls. This sustained performance requires more than a larger context window: the model must maintain goal alignment over extended execution, reducing strategy drift, error accumulation, and ineffective trial and error, enabling truly autonomous execution for complex engineering tasks.

Benchmarks: Where GLM-5.1 Stands

On SWE-Bench Pro, GLM-5.1 achieves a score of 58.4, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, setting a new state-of-the-art result. The broader benchmark profile shows a well-rounded model. GLM-5.1 scores 95.3 on AIME 2026, 94.0 on HMMT Nov. 2025, 82.6 on HMMT Feb. 2026, and 86.2 on GPQA-Diamond — a graduate-level science reasoning benchmark. On agentic and tool-use benchmarks, GLM-5.1 scores 68.7 on CyberGym (a substantial jump from GLM-5’s 48.3), 68.0 on BrowseComp, 70.6 on τ³-Bench, and 71.8 on MCP-Atlas (Public Set) — the last one particularly relevant given MCP’s growing role in production agent systems. On Terminal-Bench 2.0, the model scores 63.5, rising to 66.5 when evaluated with Claude Code as the scaffolding.
Across 12 representative benchmarks covering reasoning, coding, agents, tool use, and browsing, GLM-5.1 demonstrates a broad and well-balanced capability profile. This shows that GLM-5.1 is not a single-metric improvement — it advances simultaneously across general intelligence, real-world coding, and complex task execution. In terms of overall positioning, GLM-5.1’s general capability and coding performance are broadly aligned with Claude Opus 4.6.

8-Hour Sustained Execution: What That Actually Means

The most important difference in GLM-5.1 is its capacity for long-horizon task execution. GLM-5.1 can work autonomously on a single task for up to 8 hours, completing the full process from planning and execution to testing, fixing, and delivery. For developers building autonomous agents, this changes the scope of what’s possible. Rather than orchestrating a model over dozens of short-lived tool calls, you can hand GLM-5.1 a complex objective and let it run a complete ‘experiment–analyze–optimize’ loop autonomously. The concrete engineering demonstrations make this tangible: GLM-5.1 can build a complete Linux desktop environment from scratch in 8 hours; perform 178 rounds of autonomous iteration on a vector database task and improve performance to 1.5× the initial version; and optimize a CUDA kernel, increasing speedup from 2.6× to 35.7× through sustained tuning. That CUDA kernel result is notable for ML engineers: improving a kernel from 2.6× to 35.7× speedup through autonomous iterative optimization is a level of depth that would take a skilled human engineer significant time to replicate manually.

Model Specifications and Deployment

GLM-5.1 is a 754-billion-parameter MoE model released under the MIT license on HuggingFace. It operates with a 200K context window and supports up to 128K maximum output tokens — both important for long-horizon tasks that need to hold large codebases or extended reasoning chains in memory.
GLM-5.1 supports thinking mode (offering multiple thinking modes for different scenarios), streaming output, function calling, context caching, structured output, and MCP for integrating external tools and data sources. For local deployment, the following open-source frameworks support GLM-5.1: SGLang (v0.5.10+), vLLM (v0.19.0+), xLLM (v0.8.0+), Transformers (v0.5.3+), and KTransformers (v0.5.3+). For API access, the model is available through Z.AI’s API platform. Getting started requires installing zai-sdk via pip and initializing a ZaiClient with your API key.

Key Takeaways

GLM-5.1 sets a new state-of-the-art on SWE-Bench Pro with a score of 58.4, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro — making it one of the strongest publicly benchmarked models for real-world software engineering tasks at the time of release. The model is built for long-horizon autonomous execution, capable of working on a single complex task for up to 8 hours — running experiments, revising strategies, and iterating across hundreds of rounds and thousands of tool calls.
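For readers wanting to try the API access path described above, a minimal sketch of a request is shown below. The model slug, client name, and chat-completions method follow zai-sdk's documented pattern, but treat them as assumptions and confirm against the SDK docs before relying on them; the actual network call is left commented out.

```python
# Minimal request sketch for GLM-5.1 via Z.AI's API platform.
# Assumptions: "glm-5.1" is the model slug, and zai-sdk exposes a ZaiClient
# with an OpenAI-style chat.completions.create method (check the SDK docs).
request = {
    "model": "glm-5.1",
    "messages": [
        {"role": "user",
         "content": "Profile this function and propose an optimization plan."}
    ],
    "max_tokens": 4096,
}

# from zai import ZaiClient                       # pip install zai-sdk
# client = ZaiClient(api_key="YOUR_API_KEY")
# response = client.chat.completions.create(**request)
# print(response.choices[0].message.content)

print(request["model"])
```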



FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation–Full Version

arXiv:2604.05551v1 Announce Type: new
Abstract: Self-conditioning has been central to the success of continuous diffusion language models, as it allows models to correct previous errors. Yet its ability degrades precisely in the regime where diffusion is most attractive for deployment: few-step sampling for fast inference. In this study, we show that when models have only a few denoising steps, inaccurate self-conditioning induces a substantial approximation gap; these errors compound across denoising steps and ultimately dominate sample quality. To address this, we propose a novel training framework that handles these errors during learning by perturbing the self-conditioning signal to match inference noise, improving robustness to prior estimation errors. In addition, we introduce a token-level noise-awareness mechanism that prevents training from saturating, thereby improving optimization. Extensive experiments across conditional generation benchmarks demonstrate that our framework surpasses standard continuous diffusion models while providing up to 400x faster inference, and remains competitive against other one-step diffusion frameworks.



The Download: water threats in Iran and AI’s impact on what entrepreneurs make

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Desalination plants in the Middle East are increasingly vulnerable

As the conflict in Iran has escalated, a crucial resource is under fire: the desalination technology that supplies water in the region. President Donald Trump has threatened to destroy “possibly all desalinization plants” in Iran if the Strait of Hormuz is not reopened. The impact on farming, industry, and—crucially—drinking water in the Middle East could be severe. Find out why. —Casey Crownhart

This story is part of MIT Technology Review Explains, our series untangling the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

AI is changing how small online sellers decide what to make

For small entrepreneurs, deciding what to sell and where to make it has traditionally been a slow, labor-intensive process. Now that work is increasingly being done by AI. Tools like Alibaba’s Accio compress weeks of product research and supplier hunting into a single chat. Business owners and e-commerce experts say they’re making sourcing more accessible—and slashing the time from product idea to launch. Read the full story on how AI is leveling the path to global manufacturing. —Caiwei Chen

The gig workers who are training humanoid robots at home

When Zeus, a medical student in Nigeria, returns to his apartment from a long day at the hospital, he straps his iPhone to his forehead and records himself doing chores. Zeus is a data recorder for Micro1, which sells the data he collects to robotics firms. As these companies race to build humanoids, videos from workers like Zeus have become the hottest new way to train them. Micro1 has hired thousands of them in more than 50 countries, including India, Nigeria, and Argentina.
The jobs pay well locally, but raise thorny questions around privacy and informed consent. The work can be challenging—and weird. Read the full story. —Michelle Kim

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we’re publishing each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Anthropic’s new model found security problems in every OS and browser
Claude Mythos has been heralded as a cybersecurity “reckoning.” (The Verge)
+ Anthropic is limiting the rollout over hacking fears. (CNBC)
+ It’s also launching a project that lets Mythos flag vulnerabilities. (Gizmodo)
+ Apple, Google, and Microsoft have joined the initiative. (ZDNET)

2 Iranian hackers are targeting American critical infrastructure
Their focus is on energy and water infrastructure. (Wired)
+ They’re targeting industrial control devices. (TechCrunch)

3 Google’s AI Overviews deliver millions of incorrect answers per hour
Despite a 90% accuracy rate. (NYT $)
+ AI means the end of internet search as we’ve known it. (MIT Technology Review)

4 Elon Musk is trying to oust OpenAI CEO Sam Altman in a lawsuit
As remedies for Altman allegedly defrauding him. (CNBC)
+ Musk wants any damages given to OpenAI’s nonprofit arm. (WSJ $)

5 ICE has admitted it’s using powerful spyware
Tools that can intercept encrypted messages. (NPR)
+ Immigration agencies are also weaponizing AI videos. (MIT Technology Review)

6 Greece has joined the countries banning kids from social media
Under-15s will be blocked from 2027. (Reuters)
+ Australia introduced the world’s first social media ban for children. (Guardian)
+ Indonesia recently rolled out the first one in Southeast Asia. (DW)
+ Experts say they’re a lazy fix. (CNBC)

7 Intel will help Elon Musk build his Terafab in Texas
They aim to manufacture chips for AI projects. (Engadget)
+ Musk says it will be the largest-ever semiconductor factory. (Engadget)
+ Future AI chips could be built on glass. (MIT Technology Review)

8 TikTok is building a second billion-euro data center in Finland
It’s moving data storage for European users. (Reuters)
+ Finland has become a magnet for data centers. (Bloomberg $)
+ But nobody wants one in their backyard. (MIT Technology Review)

9 Plans for Canada’s first “virtual gated community” have sparked a row
The AI-powered surveillance system has divided neighbors. (Guardian)
+ Is the Pentagon allowed to surveil Americans with AI? (MIT Technology Review)

10 The high-tech engineering of the “space toilet” has been revealed
Artemis II is the first mission to carry one around the moon. (Vox)

Quote of the day

“This case has always been about Elon generating more power and more money for what he wants. His lawsuit remains nothing more than a harassment campaign that’s driven by ego, jealousy and a desire to slow down a competitor.” —OpenAI criticizes Musk’s legal action in an X post.

One More Thing

Inside the US government’s brilliantly boring websites

You may not notice it, but your experience on every US government website is carefully crafted. Each site combines an official web design system (USWDS) and a custom typeface. They aim to make government websites not only good-looking but accessible and functional for all. MIT Technology Review dug into the system’s history and features. Find out what we discovered. —Jon Keegan

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)
+ Rejoice in the splendor of the “Earthset” image captured by Artemis II.
+ Meet the fearless cat chasing off bears.
+ This document vividly explains what makes the octopus so unique.
+ Revealed: the rhythmic secret that makes emo music so angsty.

The Download: water threats in Iran and AI’s impact on what entrepreneurs make Read the article »


Mustafa Suleyman: AI development won’t hit a wall anytime soon—here’s why

We evolved for a linear world. If you walk for an hour, you cover a certain distance. Walk for two hours and you cover double that distance. This intuition served us well on the savannah. But it catastrophically fails when confronting AI and the core exponential trends at its heart.

From the time I began work on AI in 2010 to now, the amount of compute that goes into training frontier AI models has grown by a staggering 1 trillion times—from roughly 10¹⁴ flops (floating-point operations, the core unit of computation) for early systems to over 10²⁶ flops for today’s largest models. This is an explosion. Everything else in AI follows from this fact.

The skeptics keep predicting walls. And they keep being wrong in the face of this epic generational compute ramp. Often, they point out that Moore’s Law is slowing. They also mention a lack of data, or they cite limitations on energy. But when you look at the combined forces driving this revolution, the exponential trend seems quite predictable. To understand why, it’s worth looking at the complex and fast-moving reality beneath the headlines.

Think of AI training as a room full of people working calculators. For years, adding computational power meant adding more people with calculators to that room. Much of the time those workers sat idle, drumming their fingers on desks, waiting for the numbers to come through for their next calculation. Every pause was wasted potential. Today’s revolution goes beyond more and better calculators (although it delivers those); it is actually about ensuring that all those calculators never stop, and that they work together as one. Three advances are now converging to enable this.

First, the basic calculators got faster. Nvidia’s chips have delivered an eightfold increase in raw performance in just six years, from 312 teraflops in 2020 to 2,500 teraflops today. Our own Maia 200 chip, launched this January, delivers 30% better performance per dollar than any other hardware in our fleet.

Second, the numbers arrive faster thanks to a technology called HBM, or high bandwidth memory, which stacks chips vertically like tiny skyscrapers; the latest generation, HBM3, triples the bandwidth of its predecessor, feeding data to processors fast enough to keep them busy all the time.

Third, the room of people with calculators became an office and then a whole campus or city. Technologies like NVLink and InfiniBand connect hundreds of thousands of GPUs into warehouse-size supercomputers that function as single cognitive entities. A few years ago this was impossible.

These gains all come together to deliver dramatically more compute. Where training a language model took 167 minutes on eight GPUs in 2020, it now takes under four minutes on equivalent modern hardware. To put this in perspective: Moore’s Law would predict only about a 5x improvement over this period. We saw 50x. We’ve gone from two GPUs training AlexNet, the image recognition model that kicked off the modern boom in deep learning in 2012, to over 100,000 GPUs in today’s largest clusters, each one individually far more powerful than its predecessors.

Then there’s the revolution in software. Research from Epoch AI suggests that the compute required to reach a fixed performance level halves approximately every eight months, much faster than the traditional 18-to-24-month doubling of Moore’s Law. The costs of serving some recent models have collapsed by a factor of up to 900 on an annualized basis. AI is becoming radically cheaper to deploy.

The numbers for the near future are just as staggering. Consider that leading labs are growing capacity at nearly 4x annually. Since 2020, the compute used to train frontier models has grown 5x every year. Global AI-relevant compute is forecast to hit 100 million H100-equivalents by 2027, a tenfold increase in three years. Put all this together and we’re looking at something like another 1,000x in effective compute by the end of 2028. It’s plausible that by 2030 we’ll bring an additional 200 gigawatts of compute online every year—akin to the peak energy use of the UK, France, Germany, and Italy put together.

What does all this get us? I believe it will drive the transition from chatbots to nearly human-level agents—semiautonomous systems capable of writing code for days, carrying out weeks- and months-long projects, making calls, negotiating contracts, managing logistics. Forget basic assistants that answer questions. Think teams of AI workers that deliberate, collaborate, and execute. Right now we’re only in the foothills of this transition, and the implications stretch far beyond tech. Every industry built on cognitive work will be transformed.

The obvious constraint here is energy. A single refrigerator-size AI rack consumes 120 kilowatts, equivalent to 100 homes. But this hunger collides with another exponential: Solar costs have fallen by a factor of nearly 100 over 50 years; battery prices have dropped 97% over three decades. There is a pathway to clean scaling coming into view.

The capital is deployed. The engineering is delivering. The $100 billion clusters, the 10-gigawatt power draws, the warehouse-scale supercomputers … these are no longer science fiction. Ground is being broken for these projects now across the US and the world. As a result, we are heading toward true cognitive abundance. At Microsoft AI, this is the world our superintelligence lab is planning for and building.

Skeptics accustomed to a linear world will continue predicting diminishing returns. They will continue being surprised. The compute explosion is the technological story of our time, full stop. And it is still only just beginning.

Mustafa Suleyman is CEO of Microsoft AI.
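The essay's growth arithmetic can be checked directly. A minimal sketch using only the figures quoted above (5x yearly training-compute growth, efficiency halving roughly every eight months); the back-of-envelope lands in the same ballpark as the essay's projections:

```python
def compound_growth(factor_per_period: float, periods: float) -> float:
    """Total multiplicative growth after `periods` periods."""
    return factor_per_period ** periods

# 10^14 -> 10^26 flops is a factor of 10^12, i.e. one trillion, as stated.
flops_growth = 1e26 / 1e14

# 5x-per-year compute growth compounds to 125x over three years; stacking
# algorithmic efficiency (a halving every 8 months is about 2^4.5, roughly
# 23x over the same span) pushes effective compute past the ~1,000x figure.
hardware = compound_growth(5, 3)
efficiency = compound_growth(2, 36 / 8)
effective = hardware * efficiency
```

Compounding two moderate exponentials like this is exactly why linear intuition underestimates the trend the essay describes.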

Mustafa Suleyman: AI development won’t hit a wall anytime soon—here’s why Read the article »
