
Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs

arXiv:2601.22795v1 Announce Type: new Abstract: Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that it is possible to prune a significant portion of the parameters, while only marginally impacting performance. This suggests that the computation is not uniformly distributed across the parameters. We introduce here a technique to systematically quantify computation density in LLMs. In particular, we design a density estimator drawing on mechanistic interpretability. We experimentally test our estimator and find that: (1) contrary to what has been often assumed, LLM processing generally involves dense computation; (2) computation density is dynamic, in the sense that models shift between sparse and dense processing regimes depending on the input; (3) per-input density is significantly correlated across LLMs, suggesting that the same inputs trigger either low or high density. Investigating the factors influencing density, we observe that predicting rarer tokens requires higher density, and increasing context length often decreases the density. We believe that our computation density estimator will contribute to a better understanding of the processing at work in LLMs, challenging their symbolic interpretation.



NeUQI: Near-Optimal Uniform Quantization Parameter Initialization for Low-Bit LLMs

arXiv:2505.17595v3 Announce Type: replace-cross Abstract: Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with uniform quantization representation is favored due to its efficiency and ease of deployment, as uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on low-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they mainly focus on quantization methodologies, while the initialization of quantization parameters remains underexplored and still relies on the conventional Min-Max formula. In this work, we identify the limitations of the Min-Max formula, move beyond its constraints, and propose NeUQI, a method that efficiently determines near-optimal initialization for uniform quantization. Our NeUQI simplifies the joint optimization of the scale and zero-point by deriving the zero-point for a given scale, thereby reducing the problem to a scale-only optimization. Benefiting from the improved quantization parameters, our NeUQI consistently outperforms existing methods in the experiments with the LLaMA and Qwen families on various settings and tasks. Furthermore, when combined with a lightweight distillation strategy, NeUQI even achieves superior performance to PV-tuning, a considerably more resource-intensive method.
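To ground the terminology in the abstract, here is a minimal, self-contained sketch of the conventional Min-Max initialization it criticizes, together with a toy scale-only search in which the zero-point is derived from each candidate scale rather than optimized jointly. The reconstruction-error objective and the naive zero-point derivation are illustrative assumptions; this is not NeUQI's actual algorithm, whose derivation is given in the paper.

import numpy as np

def minmax_init(w: np.ndarray, bits: int = 4):
    # Conventional Min-Max formula: stretch the uniform grid over [w.min(), w.max()].
    qmax = 2 ** bits - 1
    scale = (w.max() - w.min()) / qmax
    zero_point = np.round(-w.min() / scale)
    return scale, zero_point

def dequantize(w: np.ndarray, scale: float, zero_point: float, bits: int = 4):
    # Quantize to the uniform grid and map back, to measure reconstruction error.
    qmax = 2 ** bits - 1
    q = np.clip(np.round(w / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale

def scale_only_search(w: np.ndarray, bits: int = 4, num_candidates: int = 200):
    # Toy scale-only search: for each candidate scale the zero-point is derived
    # (here naively, by pinning w.min() to grid point 0) instead of being
    # optimized jointly with the scale.
    base_scale, _ = minmax_init(w, bits)
    best_scale, best_zp, best_err = None, None, np.inf
    for factor in np.linspace(0.4, 1.2, num_candidates):
        scale = base_scale * factor
        zero_point = np.round(-w.min() / scale)
        err = np.mean((w - dequantize(w, scale, zero_point, bits)) ** 2)
        if err < best_err:
            best_scale, best_zp, best_err = scale, zero_point, err
    return best_scale, best_zp, best_err

w = np.random.randn(4096).astype(np.float32)
s0, z0 = minmax_init(w)
print("Min-Max init MSE:", np.mean((w - dequantize(w, s0, z0)) ** 2))
print("Scale-search MSE:", scale_only_search(w)[2])

The sketch only illustrates why the initialization matters: once the zero-point is a function of the scale, the search space collapses to a single parameter, which is the general idea the abstract describes.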



What’s next for EV batteries in 2026

MIT Technology Review's What's Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

Demand for electric vehicles and the batteries that power them has never been hotter. In 2025, EVs made up over a quarter of new vehicle sales globally, up from less than 5% in 2020. Some regions are seeing even higher uptake: In China, more than 50% of new vehicle sales last year were battery electric or plug-in hybrids. In Europe, more purely electric vehicles hit the roads in December than gas-powered ones. (The US is the notable exception here, dragging down the global average with a small sales decline from 2024.)

As EVs become increasingly common on the roads, the battery world is growing too. Looking ahead, we could soon see wider adoption of new chemistries, including some that deliver lower costs or higher performance. Meanwhile, the geopolitics of batteries are shifting, and so is the policy landscape. Here's what's coming next for EV batteries in 2026 and beyond.

A big opportunity for sodium-ion batteries

Lithium-ion batteries are the default chemistry used in EVs, personal devices, and even stationary storage systems on the grid today. But in a tough environment in some markets like the US, there's a growing interest in cheaper alternatives. Automakers right now largely care just about batteries' cost, regardless of performance improvements, says Kara Rodby, a technical principal at Volta Energy Technologies, a venture capital firm that focuses on energy storage technology.

Sodium-ion cells have long been held up as a potentially less expensive alternative to lithium. The batteries are limited in their energy density, so they deliver a shorter range than lithium-ion. But sodium is also more abundant, so they could be cheaper.

Sodium's growth has been cursed, however, by the very success of lithium-based batteries, says Shirley Meng, a professor of molecular engineering at the University of Chicago. A lithium-ion battery cell cost $568 per kilowatt-hour in 2013, but that cost had fallen to just $74 per kilowatt-hour by 2025—quite the moving target for cheaper alternatives to chase.

Sodium-ion batteries currently cost about $59 per kilowatt-hour on average. That's less expensive than the average lithium-ion battery. But if you consider only lithium iron phosphate (LFP) cells, a lower-end type of lithium-ion battery that averages $52 per kilowatt-hour, sodium is still more expensive today.

We could soon see an opening for sodium-ion batteries, though. Lithium prices have been ticking up in recent months, a shift that could soon slow or reverse the steady downward march of prices for lithium-based batteries.

Sodium-ion batteries are already being used commercially, largely for stationary storage on the grid. But we're starting to see sodium-ion cells incorporated into vehicles, too. The Chinese companies Yadea, JMEV, and HiNa Battery have all started producing sodium-ion batteries in limited numbers for EVs, including small, short-range cars and electric scooters that don't require a battery with high energy density. CATL, a Chinese battery company that's the world's largest, says it recently began producing sodium-ion cells. The company plans to launch its first EV using the chemistry by the middle of this year.

Today, both production and demand for sodium-ion batteries are heavily centered in China. That's likely to continue, especially after a cutback in tax credits and other financial support for the battery and EV industries in the US. One of the biggest sodium-battery companies in the US, Natron, ceased operations last year after running into funding issues.

We could also see progress in sodium-ion research: Companies and researchers are developing new materials for components including the electrolyte and electrodes, so the cells could get more comparable to lower-end lithium-ion cells in terms of energy density, Meng says.

Major tests for solid-state batteries

As we enter the second half of this decade, many eyes in the battery world are on big promises and claims about solid-state batteries. These batteries could pack more energy into a smaller package by removing the liquid electrolyte, the material that ions move through when a battery is charging and discharging. With a higher energy density, they could unlock longer-range EVs.

Companies have been promising solid-state batteries for years. Toyota, for example, once planned to have them in vehicles by 2020. That timeline has been delayed several times, though the company says it's now on track to launch the new cells in cars in 2027 or 2028.

Historically, battery makers have struggled to produce solid-state batteries at the scale needed to deliver a commercially relevant supply for EVs. There's been progress in manufacturing techniques, though, and companies could soon actually make good on their promises, Meng says.

Factorial Energy, a US-based company making solid-state batteries, provided cells for a Mercedes test vehicle that drove over 745 miles on a single charge in a real-world test in September. The company says it plans to bring its tech to market as soon as 2027. QuantumScape, another major solid-state player in the US, is testing its cells with automotive partners and plans to have its batteries in commercial production later this decade.

Before we see true solid-state batteries, we could see hybrid technologies, often referred to as semi-solid-state batteries. These commonly use materials like gel electrolytes, reducing the liquid inside cells without removing it entirely. Many Chinese companies are looking to build semi-solid-state batteries before transitioning to entirely solid-state ones, says Evelina Stoikou, head of battery technologies and supply chains at BloombergNEF, an energy consultancy.

A global patchwork

The picture for the near future of the EV industry looks drastically different depending on where you're standing. Last year, China overtook Japan as the country with the most global auto sales. And more than one in three EVs made in 2025 had a CATL battery in it. Simply put, China is dominating the global battery industry, and that doesn't seem likely to change anytime soon.

China's influence outside its domestic market is growing especially quickly. CATL is expected to begin production this year at its second



NVIDIA AI Brings Nemotron-3-Nano-30B to NVFP4 with Quantization Aware Distillation (QAD) for Efficient Reasoning Inference

NVIDIA has released Nemotron-3-Nano-30B-A3B-NVFP4, a production checkpoint that runs a 30B parameter reasoning model in 4 bit NVFP4 format while keeping accuracy close to its BF16 baseline. The model combines a hybrid Mamba2 Transformer Mixture of Experts architecture with a Quantization Aware Distillation (QAD) recipe designed specifically for NVFP4 deployment. Overall, it is an ultra-efficient NVFP4 precision version of Nemotron-3-Nano that delivers up to 4x higher throughput on Blackwell B200.

https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4

What is Nemotron-3-Nano-30B-A3B-NVFP4?

Nemotron-3-Nano-30B-A3B-NVFP4 is a quantized version of Nemotron-3-Nano-30B-A3B-BF16, trained from scratch by the NVIDIA team as a unified reasoning and chat model. It is built as a hybrid Mamba2 Transformer MoE network:

- 30B parameters in total
- 52 layers in depth
- 23 Mamba2 and MoE layers
- 6 grouped query attention layers with 2 groups
- Each MoE layer has 128 routed experts and 1 shared expert
- 6 experts are active per token, which gives about 3.5B active parameters per token

The model is pre-trained on 25T tokens using a Warmup Stable Decay learning rate schedule with a batch size of 3072, a peak learning rate of 1e-3 and a minimum learning rate of 1e-5. Post-training follows a 3 stage pipeline:

- Supervised fine tuning on synthetic and curated data for code, math, science, tool calling, instruction following and structured outputs.
- Reinforcement learning with synchronous GRPO across multi step tool use, multi turn chat and structured environments, and RLHF with a generative reward model.
- Post-training quantization to NVFP4 with an FP8 KV cache and a selective high precision layout, followed by QAD.

The NVFP4 checkpoint keeps the attention layers and the Mamba layers that feed into them in BF16, quantizes the remaining layers to NVFP4 and uses FP8 for the KV cache.

NVFP4 format and why it matters

NVFP4 is a 4 bit floating point format designed for both training and inference on recent NVIDIA GPUs. The main properties of NVFP4:

- Compared with FP8, NVFP4 delivers 2 to 3 times higher arithmetic throughput.
- It reduces memory usage by about 1.8 times for weights and activations.
- It extends MXFP4 by reducing the block size from 32 to 16 and introduces two level scaling.
- The two level scaling uses E4M3 FP8 scales per block and an FP32 scale per tensor.

The smaller block size allows the quantizer to adapt to local statistics, and the dual scaling increases dynamic range while keeping quantization error low. For very large LLMs, simple post-training quantization (PTQ) to NVFP4 already gives decent accuracy across benchmarks. For smaller models, especially those with heavy post-training pipelines, the research team notes that PTQ causes non negligible accuracy drops, which motivates a training based recovery method.

From QAT to QAD

Standard Quantization Aware Training (QAT) inserts a pseudo quantization step into the forward pass and reuses the original task loss, such as next token cross entropy. This works well for convolutional networks, but the research team lists 2 main issues for modern LLMs:

- Complex multi stage post-training pipelines with SFT, RL and model merging are hard to reproduce.
- Original training data for open models is often unavailable in public form.

Quantization Aware Distillation (QAD) changes the objective instead of the full pipeline. A frozen BF16 model acts as the teacher and the NVFP4 model is the student. Training minimizes the KL divergence between their output token distributions, not the original supervised or RL objective.
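A minimal, hypothetical sketch of one QAD style update step in PyTorch is shown below: a frozen full precision teacher and a (fake-)quantized student score the same token batch, and the student is trained to minimize the KL divergence between the two output distributions. The Hugging Face style model call, the batch layout and the temperature knob are illustrative assumptions, not NVIDIA's actual training code.

import torch
import torch.nn.functional as F

def qad_step(student, teacher, batch, optimizer, temperature: float = 1.0):
    # Assumption: both models are causal LMs returning `.logits` of shape
    # (batch, seq_len, vocab); `student` carries fake-quantized NVFP4 weights.
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]

    with torch.no_grad():  # the BF16 teacher stays frozen
        teacher_logits = teacher(input_ids=input_ids,
                                 attention_mask=attention_mask).logits

    student_logits = student(input_ids=input_ids,
                             attention_mask=attention_mask).logits

    # KL divergence between teacher and student next-token distributions,
    # replacing the original cross entropy or RL objective.
    teacher_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    student_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    loss = F.kl_div(student_logprobs, teacher_logprobs,
                    log_target=True, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()  # gradients flow into the quantized student's parameters
    optimizer.step()
    return loss.item()

Because the objective only compares the two models' output distributions over the same input tokens, it needs neither the original labels nor the reward models used earlier in the post-training pipeline.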
The research team highlights 3 properties of QAD:

- It aligns the quantized model with the high precision teacher more accurately than QAT.
- It stays stable even when the teacher has already gone through several stages, such as supervised fine tuning, reinforcement learning and model merging, because QAD only tries to match the final teacher behavior.
- It works with partial, synthetic or filtered data, because it only needs input text to query the teacher and student, not the original labels or reward models.

Benchmarks on Nemotron-3-Nano-30B

Nemotron-3-Nano-30B-A3B is one of the RL heavy models in the QAD research. The table in the linked report compares accuracy on AA-LCR, AIME25, GPQA-D, LiveCodeBench-v5 and SciCode-TQ for the BF16 baseline and the NVFP4-PTQ, NVFP4-QAT and NVFP4-QAD variants.

https://research.nvidia.com/labs/nemotron/files/NVFP4-QAD-Report.pdf

Key Takeaways

- Nemotron-3-Nano-30B-A3B-NVFP4 is a 30B parameter hybrid Mamba2 Transformer MoE model that runs in 4 bit NVFP4 with an FP8 KV cache and a small set of BF16 layers preserved for stability, while keeping about 3.5B active parameters per token and supporting context windows up to 1M tokens.
- NVFP4 is a 4 bit floating point format with block size 16 and two level scaling, using E4M3 FP8 per block scales and an FP32 per tensor scale, which gives about 2 to 3 times higher arithmetic throughput and about 1.8 times lower memory cost than FP8 for weights and activations.
- Quantization Aware Distillation (QAD) replaces the original task loss with KL divergence to a frozen BF16 teacher, so the NVFP4 student directly matches the teacher's output distribution without replaying the full SFT, RL and model merge pipeline or needing the original reward models.
- Using the new Quantization Aware Distillation method, the NVFP4 version achieves up to 99.4% of the BF16 model's accuracy.
- On AA-LCR, AIME25, GPQA-D, LiveCodeBench and SciCode, NVFP4-PTQ shows noticeable accuracy loss and NVFP4-QAT degrades further, while NVFP4-QAD recovers performance to near BF16 levels, reducing the gap to only a few points across these reasoning and coding benchmarks.

Check out the Paper and Model Weights.

The post NVIDIA AI Brings Nemotron-3-Nano-30B to NVFP4 with Quantization Aware Distillation (QAD) for Efficient Reasoning Inference appeared first on MarkTechPost.



A Coding Implementation to Training, Optimizing, Evaluating, and Interpreting Knowledge Graph Embeddings with PyKEEN

In this tutorial, we walk through an end-to-end, advanced workflow for knowledge graph embeddings using PyKEEN, actively exploring how modern embedding models are trained, evaluated, optimized, and interpreted in practice. We start by understanding the structure of a real knowledge graph dataset, then systematically train and compare multiple embedding models, tune their hyperparameters, and analyze their performance using robust ranking metrics. We focus not just on running pipelines but on building intuition for link prediction, negative sampling, and embedding geometry, ensuring we understand why each step matters and how it affects downstream reasoning over graphs.

!pip install -q pykeen torch torchvision

import warnings
warnings.filterwarnings('ignore')

import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple

from pykeen.pipeline import pipeline
from pykeen.datasets import Nations, FB15k237, get_dataset
from pykeen.models import TransE, ComplEx, RotatE, DistMult
from pykeen.training import SLCWATrainingLoop, LCWATrainingLoop
from pykeen.evaluation import RankBasedEvaluator
from pykeen.triples import TriplesFactory
from pykeen.hpo import hpo_pipeline
from pykeen.sampling import BasicNegativeSampler
from pykeen.losses import MarginRankingLoss, BCEWithLogitsLoss
from pykeen.trackers import ConsoleResultTracker

print("PyKEEN setup complete!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

We set up the complete experimental environment by installing PyKEEN and its deep learning dependencies, and by importing all required libraries for modeling, evaluation, visualization, and optimization. We ensure a clean, reproducible workflow by suppressing warnings and verifying the PyTorch and CUDA configurations for efficient computation.
print("\n" + "="*80)
print("SECTION 2: Dataset Exploration")
print("="*80 + "\n")

dataset = Nations()
print(f"Dataset: {dataset}")
print(f"Number of entities: {dataset.num_entities}")
print(f"Number of relations: {dataset.num_relations}")
print(f"Training triples: {dataset.training.num_triples}")
print(f"Testing triples: {dataset.testing.num_triples}")
print(f"Validation triples: {dataset.validation.num_triples}")

print("\nSample triples (head, relation, tail):")
for i in range(5):
    h, r, t = dataset.training.mapped_triples[i]
    head = dataset.training.entity_id_to_label[h.item()]
    rel = dataset.training.relation_id_to_label[r.item()]
    tail = dataset.training.entity_id_to_label[t.item()]
    print(f"  {head} --[{rel}]--> {tail}")

def analyze_dataset(triples_factory: TriplesFactory) -> pd.DataFrame:
    """Compute basic statistics about the knowledge graph."""
    stats = {
        'Metric': [],
        'Value': []
    }
    stats['Metric'].extend(['Entities', 'Relations', 'Triples'])
    stats['Value'].extend([
        triples_factory.num_entities,
        triples_factory.num_relations,
        triples_factory.num_triples
    ])
    unique, counts = torch.unique(triples_factory.mapped_triples[:, 1], return_counts=True)
    stats['Metric'].extend(['Avg triples per relation', 'Max triples for a relation'])
    stats['Value'].extend([counts.float().mean().item(), counts.max().item()])
    return pd.DataFrame(stats)

stats_df = analyze_dataset(dataset.training)
print("\nDataset Statistics:")
print(stats_df.to_string(index=False))

We load and explore the Nations knowledge graph to understand its scale, structure, and relational complexity before training any models. We inspect sample triples to build intuition about how entities and relations are represented internally using indexed mappings. We then compute core statistics such as relation frequency and triple distribution, allowing us to reason about graph sparsity and modeling difficulty upfront.
print("\n" + "="*80)
print("SECTION 3: Training Multiple Models")
print("="*80 + "\n")

models_config = {
    'TransE': {
        'model': 'TransE',
        'model_kwargs': {'embedding_dim': 50},
        'loss': 'MarginRankingLoss',
        'loss_kwargs': {'margin': 1.0}
    },
    'ComplEx': {
        'model': 'ComplEx',
        'model_kwargs': {'embedding_dim': 50},
        'loss': 'BCEWithLogitsLoss',
    },
    'RotatE': {
        'model': 'RotatE',
        'model_kwargs': {'embedding_dim': 50},
        'loss': 'MarginRankingLoss',
        'loss_kwargs': {'margin': 3.0}
    }
}

training_config = {
    'training_loop': 'sLCWA',
    'negative_sampler': 'basic',
    'negative_sampler_kwargs': {'num_negs_per_pos': 5},
    'training_kwargs': {
        'num_epochs': 100,
        'batch_size': 128,
    },
    'optimizer': 'Adam',
    'optimizer_kwargs': {'lr': 0.001}
}

results = {}
for model_name, config in models_config.items():
    print(f"\nTraining {model_name}...")
    result = pipeline(
        dataset=dataset,
        model=config['model'],
        model_kwargs=config.get('model_kwargs', {}),
        loss=config.get('loss'),
        loss_kwargs=config.get('loss_kwargs', {}),
        **training_config,
        random_seed=42,
        device='cuda' if torch.cuda.is_available() else 'cpu'
    )
    results[model_name] = result
    print(f"\n{model_name} Results:")
    print(f"  MRR: {result.metric_results.get_metric('mean_reciprocal_rank'):.4f}")
    print(f"  Hits@1: {result.metric_results.get_metric('hits_at_1'):.4f}")
    print(f"  Hits@3: {result.metric_results.get_metric('hits_at_3'):.4f}")
    print(f"  Hits@10: {result.metric_results.get_metric('hits_at_10'):.4f}")

We define a consistent training configuration and systematically train multiple knowledge graph embedding models to enable fair comparison. We use the same dataset, negative sampling strategy, optimizer, and training loop while allowing each model to leverage its own inductive bias and loss formulation. We then evaluate and record standard ranking metrics, such as MRR and Hits@K, to quantitatively assess each embedding approach's performance on link prediction.

print("\n" + "="*80)
print("SECTION 4: Model Comparison")
print("="*80 + "\n")

metrics_to_compare = ['mean_reciprocal_rank', 'hits_at_1', 'hits_at_3', 'hits_at_10']
comparison_data = {metric: [] for metric in metrics_to_compare}
model_names = []

for model_name, result in results.items():
    model_names.append(model_name)
    for metric in metrics_to_compare:
        comparison_data[metric].append(
            result.metric_results.get_metric(metric)
        )

comparison_df = pd.DataFrame(comparison_data, index=model_names)
print("Model Comparison:")
print(comparison_df.to_string())

fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Model Performance Comparison', fontsize=16)
for idx, metric in enumerate(metrics_to_compare):
    ax = axes[idx // 2, idx % 2]
    comparison_df[metric].plot(kind='bar', ax=ax, color='steelblue')
    ax.set_title(metric.replace('_', ' ').title())
    ax.set_ylabel('Score')
    ax.set_xlabel('Model')
    ax.grid(axis='y', alpha=0.3)
    ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
plt.tight_layout()
plt.show()

We aggregate evaluation metrics from all trained models into a unified comparison table for direct performance analysis. We visualize key ranking metrics using bar charts, allowing us to quickly identify strengths and weaknesses across different embedding approaches.
print("\n" + "="*80)
print("SECTION 5: Hyperparameter Optimization")
print("="*80 + "\n")

hpo_result = hpo_pipeline(
    dataset=dataset,
    model='TransE',
    n_trials=10,
    training_loop='sLCWA',
    training_kwargs={'num_epochs': 50},
    device='cuda' if torch.cuda.is_available() else 'cpu',
)

print("\nBest Configuration Found:")
print(f"  Embedding Dim: {hpo_result.study.best_params.get('model.embedding_dim', 'N/A')}")
print(f"  Learning Rate: {hpo_result.study.best_params.get('optimizer.lr', 'N/A')}")
print(f"  Best MRR: {hpo_result.study.best_value:.4f}")

print("\n" + "="*80)
print("SECTION 6: Link Prediction")
print("="*80 + "\n")

best_model_name = comparison_df['mean_reciprocal_rank'].idxmax()
best_result = results[best_model_name]
model = best_result.model
print(f"Using {best_model_name} for predictions")

def predict_tails(model, dataset, head_label: str, relation_label: str, top_k: int = 5):
    """Predict most likely tail entities for a given head and relation."""
    head_id = dataset.entity_to_id[head_label]
    relation_id = dataset.relation_to_id[relation_label]
    num_entities = dataset.num_entities
    heads = torch.tensor([head_id] * num_entities).unsqueeze(1)
    relations = torch.tensor([relation_id] * num_entities).unsqueeze(1)
    tails = torch.arange(num_entities).unsqueeze(1)
    batch = torch.cat([heads, relations, tails], dim=1)
    with torch.no_grad():
        scores = model.predict_hrt(batch)
    top_scores, top_indices = torch.topk(scores.squeeze(), k=top_k)
    predictions = []
    for score, idx in zip(top_scores, top_indices):
        tail_label = dataset.entity_id_to_label[idx.item()]
        predictions.append((tail_label, score.item()))
    return predictions

if dataset.training.num_entities > 10:
    sample_head = list(dataset.entity_to_id.keys())[0]
    sample_relation = list(dataset.relation_to_id.keys())[0]
    print(f"\nTop predictions for: {sample_head} --[{sample_relation}]--> ?")
    predictions = predict_tails(
        best_result.model,
        dataset.training,
        sample_head,
        sample_relation,
        top_k=5
    )
    for rank, (entity, score) in enumerate(predictions, 1):
        print(f"  {rank}. {entity} (score: {score:.4f})")

We apply automated hyperparameter optimization to systematically search for a stronger TransE configuration that improves ranking performance without manual tuning. We then select the best-performing model based on MRR and use it to perform practical link prediction by scoring all possible tail entities for a given head-relation pair.

print("\n" +



Inside the marketplace powering bespoke AI deepfakes of real women

Civitai—an online marketplace for buying and selling AI-generated content, backed by the venture capital firm Andreessen Horowitz—is letting users buy custom instruction files for generating celebrity deepfakes. Some of these files were specifically designed to make pornographic images banned by the site, a new analysis has found.

The study, from researchers at Stanford and Indiana University, looked at people's requests for content on the site, called "bounties." The researchers found that between mid-2023 and the end of 2024, most bounties asked for animated content—but a significant portion were for deepfakes of real people, and 90% of these deepfake requests targeted women. (Their findings have not yet been peer reviewed.)

The debate around deepfakes, as illustrated by the recent backlash to explicit images on the X-owned chatbot Grok, has revolved around what platforms should do to block such content. Civitai's situation is a little more complicated. Its marketplace includes actual images, videos, and models, but it also lets individuals buy and sell instruction files called LoRAs that can coach mainstream AI models like Stable Diffusion into generating content they were not trained to produce. Users can then combine these files with other tools to make deepfakes that are graphic or sexual. The researchers found that 86% of deepfake requests on Civitai were for LoRAs.

In these bounties, users requested "high quality" models to generate images of public figures like the influencer Charli D'Amelio or the singer Gracie Abrams, often linking to their social media profiles so their images could be grabbed from the web. Some requests specified a desire for models that generated the individual's entire body, accurately captured their tattoos, or allowed hair color to be changed. Some requests targeted several women in specific niches, like artists who record ASMR videos. One request was for a deepfake of a woman said to be the user's wife.

Anyone on the site could offer up AI models they worked on for the task, and the best submissions received payment—anywhere from $0.50 to $5. And nearly 92% of the deepfake bounties were awarded.

Neither Civitai nor Andreessen Horowitz responded to requests for comment.

It's possible that people buy these LoRAs to make deepfakes that aren't sexually explicit (though they'd still violate Civitai's terms of use, and they'd still be ethically fraught). But Civitai also offers educational resources on how to use external tools to further customize the outputs of image generators—for example, by changing someone's pose. The site also hosts user-written articles with details on how to instruct models to generate pornography. The researchers found that the amount of porn on the platform has gone up, and that the majority of requests each week are now for NSFW content.

"Not only does Civitai provide the infrastructure that facilitates these issues; they also explicitly teach their users how to utilize them," says Matthew DeVerna, a postdoctoral researcher at Stanford's Cyber Policy Center and one of the study's leaders.

The company used to ban only sexually explicit deepfakes of real people, but in May 2025 it announced it would ban all deepfake content. Nonetheless, countless requests for deepfakes submitted before this ban now remain live on the site, and many of the winning submissions fulfilling those requests remain available for purchase, MIT Technology Review confirmed.
"I believe the approach that they're trying to take is to sort of do as little as possible, such that they can foster as much—I guess they would call it—creativity on the platform," DeVerna says.

Users buy LoRAs with the site's online currency, called Buzz, which is purchased with real money. In May 2025, Civitai's credit card processor cut off the company because of its ongoing problem with nonconsensual content. To pay for explicit content, users must now use gift cards or cryptocurrency to buy Buzz; the company offers a different scrip for non-explicit content.

Civitai automatically tags bounties requesting deepfakes and lists a way for the person featured in the content to manually request its takedown. This system means that Civitai has a reasonably successful way of knowing which bounties are for deepfakes, but it's still leaving moderation to the general public rather than carrying it out proactively.

A company's legal liability for what its users do isn't totally clear. Generally, tech companies have broad legal protections against such liability for their content under Section 230 of the Communications Decency Act, but those protections aren't limitless. For example, "you cannot knowingly facilitate illegal transactions on your website," says Ryan Calo, a professor specializing in technology and AI at the University of Washington's law school. (Calo wasn't involved in this new study.)

Civitai joined OpenAI, Anthropic, and other AI companies in 2024 in adopting design principles to guard against the creation and spread of AI-generated child sexual abuse material. This move followed a 2023 report from the Stanford Internet Observatory, which found that the vast majority of AI models named in child sexual abuse communities were Stable Diffusion-based models "predominantly obtained via Civitai."

But adult deepfakes have not gotten the same level of attention from content platforms or the venture capital firms that fund them. "They are not afraid enough of it. They are overly tolerant of it," Calo says. "Neither law enforcement nor civil courts adequately protect against it. It is night and day."

Civitai received a $5 million investment from Andreessen Horowitz (a16z) in November 2023. In a video shared by a16z, Civitai cofounder and CEO Justin Maier described his goal of building the main place where people find and share AI models for their own individual purposes. "We've aimed to make this space that's been very, I guess, niche and engineering-heavy more and more approachable to more and more people," he said.

Civitai is not the only company with a deepfake problem in a16z's investment portfolio; in February, MIT Technology Review first reported that another company, Botify AI, was hosting AI companions resembling real actors that stated their age as under 18, engaged in sexually charged conversations, offered "hot photos," and in some instances described



The Download: US immigration agencies’ AI videos, and inside the Vitalism movement

This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology.

DHS is using Google and Adobe AI to make videos

The news: The US Department of Homeland Security is using AI video generators from Google and Adobe to make and edit content shared with the public, a new document reveals. The document, released on Wednesday, provides an inventory of which commercial AI tools DHS uses for tasks ranging from generating drafts of documents to managing cybersecurity.

Why it matters: It comes as immigration agencies have flooded social media with content to support President Trump's mass deportation agenda—some of which appears to be made with AI—and as workers in tech have put pressure on their employers to denounce the agencies' activities. Read the full story.

—James O'Donnell

How the sometimes-weird world of lifespan extension is gaining influence

—Jessica Hamzelou

For the last couple of years, I've been following the progress of a group of individuals who believe death is humanity's "core problem." Put simply, they say death is wrong—for everyone. They've even said it's morally wrong. They established what they consider a new philosophy, and they called it Vitalism.

Vitalism is more than a philosophy, though—it's a movement for hardcore longevity enthusiasts who want to make real progress in finding treatments that slow or reverse aging. Not just through scientific advances, but by persuading influential people to support their movement, and by changing laws and policies to open up access to experimental drugs. And they're starting to make progress.

This article first appeared in The Checkup, MIT Technology Review's weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

The AI Hype Index: Grok makes porn, and Claude Code nails your job

Separating AI reality from hyped-up fiction isn't always easy. That's why we've created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry. Take a look at this month's edition of the index here.

The must-reads

I've combed the internet to find you today's most fun/important/scary/fascinating stories about technology.

1 Capgemini is no longer tracking immigrants for ICE
After the French company was queried by the country's government over the contract. (WP $)
+ Here's how the agency typically keeps tabs on its targets. (NYT $)
+ US senators are pushing for answers about its recent surveillance shopping spree. (404 Media)
+ ICE's tactics would get real soldiers killed, apparently. (Wired $)

2 The Pentagon is at loggerheads with Anthropic
The AI firm is reportedly worried its tools could be used to spy on Americans. (Reuters)
+ Generative AI is learning to spy for the US military. (MIT Technology Review)

3 It's relatively rare for AI chatbots to lead users down harmful paths
But when it does, it can have incredibly dangerous consequences. (Ars Technica)
+ The AI doomers feel undeterred. (MIT Technology Review)

4 GPT-4o's days are numbered
OpenAI says just 0.1% of users are using the model every day. (CNBC)
+ It's the second time that it's tried to turn the sycophantic model off in under a year. (Insider $)
+ Why GPT-4o's sudden shutdown left people grieving. (MIT Technology Review)

5 An AI toy company left its chats with kids exposed
Anyone with a Gmail account was able to simply access the conversations—no hacking required. (Wired $)
+ AI toys are all the rage in China—and now they're appearing on shelves in the US too. (MIT Technology Review)

6 SpaceX could merge with xAI later this year
Ahead of a planned blockbuster IPO of Elon Musk's companies. (Reuters)
+ The move would be welcome news for Musk fans. (The Information $)
+ A SpaceX-Tesla merger could also be on the cards. (Bloomberg $)

7 We're still waiting for a reliable male contraceptive
Take a look at the most promising methods so far. (Bloomberg $)

8 AI is bringing traditional Chinese medicine to the masses
And it's got the full backing of the country's government. (Rest of World)

9 The race back to the Moon is heating up
Competition between the US and China is more intense than ever. (Economist $)

10 What did the past really smell like?
AI could help scientists to recreate history's aromas—including mummies and battlefields. (Knowable Magazine)

Quote of the day

"I think the tidal wave is coming and we're all standing on the beach."

—Bill Zysblat, a music business manager, tells the Financial Times about the existential threat AI poses to the industry.

One more thing

Therapists are secretly using ChatGPT. Clients are triggered.

Declan would never have found out his therapist was using ChatGPT had it not been for a technical mishap. The connection was patchy during one of their online sessions, so Declan suggested they turn off their video feeds. Instead, his therapist began inadvertently sharing his screen.

For the rest of the session, Declan was privy to a real-time stream of ChatGPT analysis rippling across his therapist's screen; the therapist was taking what Declan was saying, putting it into ChatGPT, and then parroting its answers.

But Declan is not alone. In fact, a growing number of people are reporting receiving AI-generated communiqués from their therapists. Clients' trust and privacy are being abandoned in the process. Read the full story.

—Laurie Clarke

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet 'em at me.)

+ Sinkholes are seriously mysterious. Is there a way to stay one step ahead of them?
+ This beautiful pixel art is super impressive.
+ Amid the upheaval in their city, residents of Minneapolis recently demonstrated both their resistance and community spirit in the annual Art Sled Rally (thanks Paul!)
+ How on Earth is Tomb Raider 30 years old?!



AI2 Releases SERA, Soft Verified Coding Agents Built with Supervised Training Only for Practical Repository Level Automation Workflows

Allen Institute for AI (AI2) researchers introduce SERA, Soft Verified Efficient Repository Agents, a coding agent family that aims to match much larger closed systems using only supervised training and synthetic trajectories.

What is SERA?

SERA is the first release in AI2's Open Coding Agents series. The flagship model, SERA-32B, is built on the Qwen 3 32B architecture and is trained as a repository level coding agent. On SWE-bench Verified at 32K context, SERA-32B reaches a 49.5 percent resolve rate. At 64K context it reaches 54.2 percent. These numbers place it in the same performance band as open weight systems such as Devstral-Small-2 with 24B parameters and GLM-4.5 Air with 110B parameters, while SERA remains fully open in code, data, and weights.

The series includes four models today: SERA-8B, SERA-8B GA, SERA-32B, and SERA-32B GA. All are released on Hugging Face under an Apache 2.0 license.

Soft Verified Generation

The training pipeline relies on Soft Verified Generation, SVG. SVG produces agent trajectories that look like realistic developer workflows, then uses patch agreement between two rollouts as a soft signal of correctness. The process, with a small illustration after this list, is:

- First rollout: A function is sampled from a real repository. The teacher model, GLM-4.6 in the SERA-32B setup, receives a bug style or change description and operates with tools to view files, edit code, and run commands. It produces a trajectory T1 and a patch P1.
- Synthetic pull request: The system converts the trajectory into a pull request style description. This text summarizes intent and key edits in a format similar to real pull requests.
- Second rollout: The teacher starts again from the original repository, but now it only sees the pull request description and the tools. It produces a new trajectory T2 and patch P2 that tries to implement the described change.
- Soft verification: The patches P1 and P2 are compared line by line. A recall score r is computed as the fraction of modified lines in P1 that appear in P2. When r equals 1 the trajectory is hard verified. For intermediate values, the sample is soft verified.

The key result from the ablation study is that strict verification is not required. When models are trained on T2 trajectories with different thresholds on r, even r equals 0, performance on SWE-bench Verified is similar at a fixed sample count. This suggests that realistic multi step traces, even if noisy, are valuable supervision for coding agents.

https://allenai.org/blog/open-coding-agents
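As a concrete illustration of the soft verification signal, the sketch below computes a toy patch-agreement recall between two unified diffs: the fraction of lines modified in P1 that also appear among the lines modified in P2. The diff parsing is deliberately simplistic and is an assumption for illustration, not AI2's actual implementation.

# Toy soft-verification score: recall of P1's modified lines inside P2.
# Assumes `patch_1` and `patch_2` are unified-diff strings; the real SERA
# pipeline may normalize and match lines differently.
def modified_lines(patch: str) -> set[str]:
    """Collect added/removed code lines from a unified diff, ignoring headers."""
    lines = set()
    for line in patch.splitlines():
        if line.startswith(("+++", "---", "@@")):
            continue  # skip diff metadata
        if line.startswith(("+", "-")):
            lines.add(line[1:].strip())
    return lines

def soft_verification_recall(patch_1: str, patch_2: str) -> float:
    """r = |modified(P1) intersect modified(P2)| / |modified(P1)|."""
    p1, p2 = modified_lines(patch_1), modified_lines(patch_2)
    if not p1:
        return 0.0
    return len(p1 & p2) / len(p1)

# Example: identical single-line edits give r == 1.0, i.e. a hard verified sample;
# intermediate values of r are kept as soft verified training data.
example_p1 = "@@ -1 +1 @@\n-return a\n+return a + b\n"
example_p2 = "@@ -1 +1 @@\n-return a\n+return a + b\n"
print(soft_verification_recall(example_p1, example_p2))  # 1.0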
Data scale, training, and cost

SVG is applied to 121 Python repositories derived from the SWE-smith corpus. Across GLM-4.5 Air and GLM-4.6 teacher runs, the full SERA datasets contain more than 200,000 trajectories from both rollouts, making this one of the largest open coding agent datasets.

SERA-32B is trained on a subset of 25,000 T2 trajectories from the Sera-4.6-Lite T2 dataset. Training uses standard supervised fine tuning with Axolotl on Qwen-3-32B for 3 epochs, learning rate 1e-5, weight decay 0.01, and maximum sequence length 32,768 tokens.

Many trajectories are longer than the context limit. The research team defines a truncation ratio, the fraction of steps that fit into 32K tokens. They prefer trajectories that already fit, and for the rest they select slices with a high truncation ratio. This ordered truncation strategy clearly outperforms random truncation when SWE-bench Verified scores are compared.

The reported compute budget for SERA-32B, including data generation and training, is about 40 GPU days. Using a scaling law over dataset size and performance, the research team estimated that the SVG approach is around 26 times cheaper than reinforcement learning based systems such as SkyRL-Agent and 57 times cheaper than earlier synthetic data pipelines such as SWE-smith for reaching similar SWE-bench scores.

https://allenai.org/blog/open-coding-agents

Repository specialization

A central use case is adapting an agent to a specific repository. The research team studies this on three major SWE-bench Verified projects: Django, SymPy, and Sphinx. For each repository, SVG generates on the order of 46,000 to 54,000 trajectories. Due to compute limits, the specialization experiments train on 8,000 trajectories per repository, mixing 3,000 soft verified T2 trajectories with 5,000 filtered T1 trajectories.

At 32K context, these specialized students match or slightly outperform the GLM-4.5-Air teacher, and also compare well with Devstral-Small-2 on those repository subsets. For Django, a specialized student reaches a 52.23 percent resolve rate versus 51.20 percent for GLM-4.5-Air. For SymPy, the specialized model reaches 51.11 percent versus 48.89 percent for GLM-4.5-Air.

Key Takeaways

- SERA turns coding agents into a supervised learning problem: SERA-32B is trained with standard supervised fine tuning on synthetic trajectories from GLM-4.6, with no reinforcement learning loop and no dependency on repository test suites.
- Soft Verified Generation removes the need for tests: SVG uses two rollouts and the patch overlap between P1 and P2 to compute a soft verification score, and the research team shows that even unverified or weakly verified trajectories can train effective coding agents.
- Large, realistic agent dataset from real repositories: The pipeline applies SVG to 121 Python projects from the SWE-smith corpus, producing more than 200,000 trajectories and creating one of the largest open datasets for coding agents.
- Efficient training with explicit cost and scaling analysis: SERA-32B trains on 25,000 T2 trajectories, and the scaling study shows that SVG is about 26 times cheaper than SkyRL-Agent and 57 times cheaper than SWE-smith at similar SWE-bench Verified performance.

Check out the Paper, Repo and Model Weights.

The post AI2 Releases SERA, Soft Verified Coding Agents Built with Supervised Training Only for Practical Repository Level Automation Workflows appeared first on MarkTechPost.



Robbyant Open Sources LingBot World: a Real Time World Model for Interactive Simulation and Embodied AI

Robbyant, the embodied AI unit inside Ant Group, has open sourced LingBot-World, a large scale world model that turns video generation into an interactive simulator for embodied agents, autonomous driving and games. The system is designed to render controllable environments with high visual fidelity, strong dynamics and long temporal horizons, while staying responsive enough for real time control.

From text to video to text to world

Most text to video models generate short clips that look realistic but behave like passive movies. They do not model how actions change the environment over time. LingBot-World is built instead as an action conditioned world model. It learns the transition dynamics of a virtual world, so that keyboard and mouse inputs, together with camera motion, drive the evolution of future frames.

Formally, the model learns the conditional distribution of future video tokens, given past frames, language prompts and discrete actions. At training time, it predicts sequences up to about 60 seconds. At inference time, it can autoregressively roll out coherent video streams that extend to around 10 minutes, while keeping scene structure stable.

Data engine, from web video to interactive trajectories

A core design in LingBot-World is a unified data engine. It provides rich, aligned supervision for how actions change the world while covering diverse real scenes. The data acquisition pipeline combines 3 sources:

- Large scale web videos of humans, animals and vehicles, from both first person and third person views
- Game data, where RGB frames are strictly paired with user controls such as W, A, S, D and camera parameters
- Synthetic trajectories rendered in Unreal Engine, where clean frames, camera intrinsics and extrinsics and object layouts are all known

After collection, a profiling stage standardizes this heterogeneous corpus. It filters for resolution and duration, segments videos into clips and estimates missing camera parameters using geometry and pose models. A vision language model scores clips for quality, motion magnitude and view type, then selects a curated subset.

On top of this, a hierarchical captioning module builds 3 levels of text supervision:

- Narrative captions for whole trajectories, including camera motion
- Scene static captions that describe environment layout without motion
- Dense temporal captions for short time windows that focus on local dynamics

This separation lets the model disentangle static structure from motion patterns, which is important for long horizon consistency.

Architecture, MoE video backbone and action conditioning

LingBot-World starts from Wan2.2, a 14B parameter image to video diffusion transformer. This backbone already captures strong open domain video priors. The Robbyant team extends it into a mixture of experts DiT with 2 experts. Each expert has about 14B parameters, so the total parameter count is 28B, but only 1 expert is active at each denoising step. This keeps inference cost similar to a dense 14B model while expanding capacity.

A curriculum extends training sequences from 5 seconds to 60 seconds. The schedule increases the proportion of high noise timesteps, which stabilizes global layouts over long contexts and reduces mode collapse for long rollouts.

To make the model interactive, actions are injected directly into the transformer blocks. Camera rotations are encoded with Plücker embeddings. Keyboard actions are represented as multi hot vectors over keys such as W, A, S, D. These encodings are fused and passed through adaptive layer normalization modules, which modulate hidden states in the DiT.
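To make the action conditioning concrete, here is a minimal, hypothetical sketch of such an adapter: keyboard keys are encoded as a multi-hot vector, fused with a camera embedding, and mapped to per-channel scale and shift terms that modulate a DiT block's hidden states through adaptive layer normalization. Module names, dimensions and the fusion scheme are illustrative assumptions, not LingBot-World's actual code.

import torch
import torch.nn as nn

class ActionAdaLN(nn.Module):
    def __init__(self, hidden_dim: int, num_keys: int = 4, cam_dim: int = 6):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        # Only this small adapter would be trained; the DiT backbone stays frozen.
        self.to_scale_shift = nn.Sequential(
            nn.Linear(num_keys + cam_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, 2 * hidden_dim),
        )

    def forward(self, hidden: torch.Tensor, keys: torch.Tensor, camera: torch.Tensor):
        # hidden: (B, T, hidden_dim), keys: (B, num_keys) multi-hot, camera: (B, cam_dim)
        action = torch.cat([keys, camera], dim=-1)
        scale, shift = self.to_scale_shift(action).chunk(2, dim=-1)
        return self.norm(hidden) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

# Example: W and D pressed, plus a placeholder 6-dim camera-motion embedding
# standing in for Plücker features.
block = ActionAdaLN(hidden_dim=128)
keys = torch.tensor([[1.0, 0.0, 0.0, 1.0]])          # multi-hot over W, A, S, D
camera = torch.randn(1, 6)
out = block(torch.randn(1, 16, 128), keys, camera)    # modulated hidden states
print(out.shape)  # torch.Size([1, 16, 128])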
Only the action adapter layers are fine tuned; the main video backbone stays frozen, so the model retains visual quality from pre-training while learning action responsiveness from a smaller interactive dataset.

Training uses both image to video and video to video continuation tasks. Given a single image, the model can synthesize future frames. Given a partial clip, it can extend the sequence. This results in an internal transition function that can start from arbitrary time points.

LingBot-World-Fast, distillation for real time use

The mid-trained model, LingBot-World-Base, still relies on multi step diffusion and full temporal attention, which are expensive for real time interaction. The Robbyant team introduces LingBot-World-Fast as an accelerated variant.

The fast model is initialized from the high noise expert and replaces full temporal attention with block causal attention. Inside each temporal block, attention is bidirectional. Across blocks, it is causal. This design supports key value caching, so the model can stream frames autoregressively with lower cost.

Distillation uses a diffusion forcing strategy. The student is trained on a small set of target timesteps, including timestep 0, so it sees both noisy and clean latents. Distribution Matching Distillation is combined with an adversarial discriminator head. The adversarial loss updates only the discriminator. The student network is updated with the distillation loss, which stabilizes training while preserving action following and temporal coherence.

In experiments, LingBot-World-Fast reaches 16 frames per second when processing 480p videos on a system with 1 GPU node and maintains end to end interaction latency under 1 second for real time control.

Emergent memory and long horizon behavior

One of the most interesting properties of LingBot-World is emergent memory. The model maintains global consistency without explicit 3D representations such as Gaussian splatting. When the camera moves away from a landmark such as Stonehenge and returns after about 60 seconds, the structure reappears with consistent geometry. When a car leaves the frame and later reenters, it appears at a physically plausible location, not frozen or reset.

The model can also sustain ultra long sequences. The research team shows coherent video generation that extends up to 10 minutes, with stable layout and narrative structure.

VBench results and comparison to other world models

For quantitative evaluation, the research team used VBench on a curated set of 100 generated videos, each longer than 30 seconds. LingBot-World is compared to 2 recent world models, Yume-1.5 and HY-World-1.5. On VBench, LingBot-World reports the scores listed in the paper's table:

https://arxiv.org/pdf/2601.20540v1

These scores are higher than both baselines for imaging quality, aesthetic quality and dynamic degree. The dynamic degree margin is large, 0.8857 compared to 0.7612 and 0.7217, which indicates richer scene transitions and more complex motion that respond to user inputs. Motion

