YouZum

News


A Coding Deep Dive into Differentiable Computer Vision with Kornia Using Geometry Optimization, LoFTR Matching, and GPU Augmentations

We implement an advanced, end-to-end Kornia tutorial and demonstrate how modern, differentiable computer vision can be built entirely in PyTorch. We start by constructing GPU-accelerated, synchronized augmentation pipelines for images, masks, and keypoints, then move into differentiable geometry by optimizing a homography directly through gradient descent. We also show how learned feature matching with LoFTR integrates with Kornia’s RANSAC to estimate robust homographies and produce a simple stitched output, even under constrained or offline-safe conditions. Finally, we ground these ideas in practice by training a lightweight CNN on CIFAR-10 using Kornia’s GPU augmentations, highlighting how research-grade vision pipelines translate naturally into learning systems. Check out the FULL CODES here.

import os, math, time, random, urllib.request
from dataclasses import dataclass
from typing import Tuple
import sys, subprocess

def pip_install(pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

pip_install([
    "kornia==0.8.2", "torch", "torchvision",
    "matplotlib", "numpy", "opencv-python-headless"
])

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms.functional as TF
import matplotlib.pyplot as plt
import cv2
import kornia
import kornia.augmentation as K
import kornia.geometry.transform as KG
from kornia.geometry.ransac import RANSAC
from kornia.feature import LoFTR

torch.manual_seed(0)
np.random.seed(0)
random.seed(0)

# Select the available compute device, used by the print below and throughout the tutorial.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Torch:", torch.__version__)
print("Kornia:", kornia.__version__)
print("Device:", device)

We begin by setting up a fully reproducible environment, installing Kornia and its core dependencies to ensure GPU-accelerated, differentiable computer vision runs smoothly in Google Colab. We then import and organize PyTorch, Kornia, and supporting libraries, establishing a clean foundation for geometry, augmentation, and feature-matching workflows. We set the random seed and select the available compute device so that all subsequent experiments remain deterministic, debuggable, and performance-aware. Check out the FULL CODES here.
def to_tensor_img_uint8(img_bgr_uint8: np.ndarray) -> torch.Tensor:
    img_rgb = cv2.cvtColor(img_bgr_uint8, cv2.COLOR_BGR2RGB)
    t = torch.from_numpy(img_rgb).permute(2, 0, 1).float() / 255.0
    return t.unsqueeze(0)

def show(img_t: torch.Tensor, title: str = "", max_size: int = 900):
    x = img_t.detach().float().cpu().clamp(0, 1)
    if x.shape[1] == 1:
        x = x.repeat(1, 3, 1, 1)
    x = x[0].permute(1, 2, 0).numpy()
    h, w = x.shape[:2]
    scale = min(1.0, max_size / max(h, w))
    if scale < 1.0:
        x = cv2.resize(x, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_AREA)
    plt.figure(figsize=(7, 5))
    plt.imshow(x)
    plt.axis("off")
    plt.title(title)
    plt.show()

def show_mask(mask_t: torch.Tensor, title: str = ""):
    x = mask_t.detach().float().cpu().clamp(0, 1)[0, 0].numpy()
    plt.figure(figsize=(6, 4))
    plt.imshow(x)
    plt.axis("off")
    plt.title(title)
    plt.show()

def download(url: str, path: str):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)

def safe_download(url: str, path: str) -> bool:
    try:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        return True
    except Exception as e:
        print("Download failed:", e)
        return False

def make_grid_mask(h: int, w: int, cell: int = 32) -> torch.Tensor:
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    m = (((yy // cell) % 2) ^ ((xx // cell) % 2)).float()
    return m.unsqueeze(0).unsqueeze(0)

def draw_matches(img0_rgb: np.ndarray, img1_rgb: np.ndarray,
                 pts0: np.ndarray, pts1: np.ndarray, max_draw: int = 200) -> np.ndarray:
    h0, w0 = img0_rgb.shape[:2]
    h1, w1 = img1_rgb.shape[:2]
    out = np.zeros((max(h0, h1), w0 + w1, 3), dtype=np.uint8)
    out[:h0, :w0] = img0_rgb
    out[:h1, w0:w0 + w1] = img1_rgb
    n = min(len(pts0), len(pts1), max_draw)
    if n == 0:
        return out
    idx = np.random.choice(len(pts0), size=n, replace=False) if len(pts0) > n else np.arange(n)
    for i in idx:
        x0, y0 = pts0[i]
        x1, y1 = pts1[i]
        x1_shift = x1 + w0
        p0 = (int(round(x0)), int(round(y0)))
        p1 = (int(round(x1_shift)), int(round(y1)))
        cv2.circle(out, p0, 2, (255, 255, 255), -1, lineType=cv2.LINE_AA)
        cv2.circle(out, p1, 2, (255, 255, 255), -1, lineType=cv2.LINE_AA)
        cv2.line(out, p0, p1, (255, 255, 255), 1, lineType=cv2.LINE_AA)
    return out

def normalize_img_for_loftr(img_rgb01: torch.Tensor) -> torch.Tensor:
    if img_rgb01.shape[1] == 3:
        return kornia.color.rgb_to_grayscale(img_rgb01)
    return img_rgb01

We define a set of reusable helper utilities for image conversion, visualization, safe data downloading, and synthetic mask generation, keeping the vision pipeline clean and modular. We also implement robust visualization and matching helpers that allow us to inspect augmented images, masks, and LoFTR correspondences directly during experimentation. We normalize image inputs to the exact tensor formats expected by Kornia and LoFTR, ensuring that all downstream geometry and feature-matching components operate consistently and correctly. Check out the FULL CODES here.
print("\n[1] Differentiable augmentations: image + mask + keypoints")

B, C, H, W = 1, 3, 256, 384
img = torch.rand(B, C, H, W, device=device)
mask = make_grid_mask(H, W, cell=24).to(device)
kps = torch.tensor([[
    [40.0, 40.0],
    [W - 50.0, 50.0],
    [W * 0.6, H * 0.8],
    [W * 0.25, H * 0.65],
]], device=device)

aug = K.AugmentationSequential(
    K.RandomResizedCrop((224, 224), scale=(0.6, 1.0), ratio=(0.8, 1.25), p=1.0),
    K.RandomHorizontalFlip(p=0.5),
    K.RandomRotation(degrees=18.0, p=0.7),
    K.ColorJiggle(0.2, 0.2, 0.2, 0.1, p=0.8),
    data_keys=["input", "mask", "keypoints"],
    same_on_batch=True
).to(device)

img_aug, mask_aug, kps_aug = aug(img, mask, kps)

print("image:", tuple(img.shape), "->", tuple(img_aug.shape))
print("mask :", tuple(mask.shape), "->", tuple(mask_aug.shape))
print("kps  :", tuple(kps.shape), "->", tuple(kps_aug.shape))
print("Example keypoints (before -> after):")
print(torch.cat([kps[0], kps_aug[0]], dim=1))

show(img, "Original (synthetic)")
show_mask(mask, "Original mask (synthetic)")
show(img_aug, "Augmented (synced)")
show_mask(mask_aug, "Augmented mask (synced)")

We construct a synchronized, fully differentiable augmentation pipeline that applies the same geometric transformations to images, masks, and keypoints on the GPU. We generate synthetic data to clearly demonstrate how spatial consistency is preserved across modalities while still introducing realistic variability through cropping, rotation, flipping, and color jitter. We visualize the before-and-after results to verify that the augmented images, segmentation masks, and keypoints remain perfectly aligned after transformation. Check out the FULL CODES here.

print("\n[2] Differentiable homography alignment by optimization")

base = torch.rand(1, 1, 240, 320, device=device)
show(base, "Base image (grayscale)")

true_H_px = torch.eye(3, device=device).unsqueeze(0)
true_H_px[:, 0, 2] = 18.0
true_H_px[:, 1, 2] = -12.0
true_H_px[:, 0, 1] = 0.03
true_H_px[:, 1, 0] = -0.02
true_H_px[:, 2, 0] = 1e-4
true_H_px[:, 2, 1] = -8e-5

target = KG.warp_perspective(base, true_H_px, dsize=(base.shape[-2], base.shape[-1]), align_corners=True)
show(target, "Target (base warped by true homography)")

p = torch.zeros(1, 8, device=device, requires_grad=True)

def params_to_H(p8: torch.Tensor) -> torch.Tensor:
    Bp = p8.shape[0]
    Hm = torch.eye(3, device=p8.device).unsqueeze(0).repeat(Bp, 1, 1)
    Hm[:, 0, 0] = 1.0 + p8[:, 0]
    Hm[:, 0, 1] = p8[:, 1]
    Hm[:, 0, 2] = p8[:, 2]
    Hm[:, 1, 0] = p8[:, 3]
    Hm[:, 1, 1] = 1.0 + p8[:, 4]
    Hm[:, 1, 2] = p8[:, 5]
    Hm[:, 2, 0] = p8[:, 6]
    Hm[:, 2, 1] = p8[:, 7]
    return Hm

opt = torch.optim.Adam([p], lr=0.08)
losses = []
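The excerpt stops just as the optimization loop begins. As a minimal sketch of how that loop could continue, assuming a simple mean-squared-error photometric loss between the warped base image and the target (the exact loss, step count, and logging in the full tutorial may differ):

for step in range(300):
    opt.zero_grad()
    H_est = params_to_H(p)                       # current homography estimate from the 8 parameters
    warped = KG.warp_perspective(
        base, H_est, dsize=(base.shape[-2], base.shape[-1]), align_corners=True
    )
    loss = F.mse_loss(warped, target)            # photometric alignment error (assumed objective)
    loss.backward()                              # gradients flow through the differentiable warp into p
    opt.step()
    losses.append(loss.item())
    if step % 50 == 0:
        print(f"step {step:3d} | loss {loss.item():.6f}")

print("Recovered homography:", params_to_H(p).detach().cpu().numpy())
print("True homography:", true_H_px.cpu().numpy())

Because Kornia's warp_perspective is differentiable, the gradient of the photometric loss flows back into the eight homography parameters, which is the core idea behind the differentiable-geometry step of the tutorial.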



SFT Doesn’t Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs

arXiv:2509.20758v3 Announce Type: replace Abstract: Supervised Fine-Tuning (SFT) on domain-specific datasets is a common approach to adapt Large Language Models (LLMs) to specialized tasks but is often believed to degrade their general capabilities. In this work, we revisit this trade-off and present both empirical and theoretical insights. First, we show that SFT does not always hurt: using a smaller learning rate can substantially mitigate general performance degradation while preserving comparable target-domain performance. We then provide a theoretical analysis that explains these phenomena and further motivates a new method, Token-Adaptive Loss Reweighting (TALR). Building on this, and recognizing that smaller learning rates alone do not fully eliminate general-performance degradation in all cases, we evaluate a range of strategies for reducing general capability loss, including L2 regularization, LoRA, model averaging, FLOW, and our proposed TALR. Experimental results demonstrate that while no method completely eliminates the trade-off, TALR consistently outperforms these baselines in balancing domain-specific gains and general capabilities. Finally, we distill our findings into practical guidelines for adapting LLMs to new domains: (i) using a small learning rate to achieve a favorable trade-off, and (ii) when a stronger balance is further desired, adopt TALR as an effective strategy.
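The abstract names Token-Adaptive Loss Reweighting (TALR) but does not spell out the weighting rule. As a purely illustrative PyTorch sketch of the general idea behind token-level loss reweighting during SFT, where the specific weight function used here is an assumption and not the paper's formula:

import torch
import torch.nn.functional as F

def token_reweighted_sft_loss(logits, labels, ignore_index=-100, temperature=1.0):
    # Per-token cross-entropy over the flattened batch, shape (batch * seq_len,).
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1),
        ignore_index=ignore_index, reduction="none",
    )
    mask = (labels.view(-1) != ignore_index).float()
    # Illustrative token-adaptive weights: softly down-weight unusually hard tokens.
    # This particular choice is hypothetical; the actual TALR scheme is defined in the paper.
    with torch.no_grad():
        w = torch.exp(-per_token / temperature) * mask
        w = w / (w.sum() + 1e-8)
    return (w * per_token).sum()

# Toy usage with random tensors standing in for model outputs and shifted labels.
logits = torch.randn(2, 5, 100)
labels = torch.randint(0, 100, (2, 5))
print(token_reweighted_sft_loss(logits, labels))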



Microsoft Unveils Maia 200, An FP4 and FP8 Optimized AI Inference Accelerator for Azure Datacenters

Maia 200 is Microsoft’s new in house AI accelerator designed for inference in Azure datacenters. It targets the cost of token generation for large language models and other reasoning workloads by combining narrow precision compute, a dense on chip memory hierarchy and an Ethernet based scale up fabric.

Why Microsoft built a dedicated inference chip?

Training and inference stress hardware in different ways. Training needs very large all to all communication and long running jobs. Inference cares about tokens per second, latency and tokens per dollar. Microsoft positions Maia 200 as its most efficient inference system, with about 30 percent better performance per dollar than the latest hardware in its fleet.

Maia 200 is part of a heterogeneous Azure stack. It will serve multiple models, including the latest GPT 5.2 models from OpenAI, and will power workloads in Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use the chip for synthetic data generation and reinforcement learning to improve in house models.

Core silicon and numeric specifications

Each Maia 200 die is fabricated on TSMC’s 3 nanometer process. The chip integrates more than 140 billion transistors. The compute pipeline is built around native FP8 and FP4 tensor cores. A single chip delivers more than 10 petaFLOPS in FP4 and more than 5 petaFLOPS in FP8, within a 750W SoC TDP envelope.

Memory is split between stacked HBM and on die SRAM. Maia 200 provides 216 GB of HBM3e with about 7TB per second of bandwidth and 272MB of on die SRAM. The SRAM is organized into tile level SRAM and cluster level SRAM and is fully software managed. Compilers and runtimes can place working sets explicitly to keep attention and GEMM kernels close to compute.

Tile based microarchitecture and memory hierarchy

The Maia 200 microarchitecture is hierarchical. The base unit is the tile. A tile is the smallest autonomous compute and storage unit on the chip. Each tile includes a Tile Tensor Unit for high throughput matrix operations and a Tile Vector Processor as a programmable SIMD engine. Tile SRAM feeds both units and tile DMA engines move data in and out of SRAM without stalling compute. A Tile Control Processor orchestrates the sequence of tensor and DMA work.

Multiple tiles form a cluster. Each cluster exposes a larger multi banked Cluster SRAM that is shared across tiles in that cluster. Cluster level DMA engines move data between Cluster SRAM and the co packaged HBM stacks. A cluster core coordinates multi tile execution and uses redundancy schemes for tiles and SRAM to improve yield while keeping the same programming model.

This hierarchy lets the software stack pin different parts of the model in different tiers. For example, attention kernels can keep Q, K, V tensors in tile SRAM, while collective communication kernels can stage payloads in cluster SRAM and reduce HBM pressure. The design goal is sustained high utilization when models grow in size and sequence length.

On chip data movement and Ethernet scale up fabric

Inference is often limited by data movement, not peak compute. Maia 200 uses a custom Network on Chip along with a hierarchy of DMA engines. The Network on Chip spans tiles, clusters, memory controllers and I/O units. It has separate planes for large tensor traffic and for small control messages. This separation keeps synchronization and small outputs from being blocked behind large transfers.
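A quick back-of-envelope calculation using only the figures quoted above shows why the software-managed SRAM tiers matter: the ratio of peak FP4 throughput to HBM bandwidth implies that kernels need on the order of a thousand operations per byte streamed from HBM to stay compute bound. A small illustrative Python sketch of that arithmetic:

# Back-of-envelope arithmetic intensity using the figures quoted in the article.
fp4_flops = 10e15        # > 10 petaFLOPS of FP4 compute
fp8_flops = 5e15         # > 5 petaFLOPS of FP8 compute
hbm_bw = 7e12            # ~7 TB/s of HBM3e bandwidth
sram_bytes = 272e6       # 272 MB of on-die SRAM
hbm_bytes = 216e9        # 216 GB of HBM3e capacity

# FLOPs the chip can issue per byte streamed from HBM. Kernels whose arithmetic
# intensity falls below this ratio are bandwidth bound rather than compute bound.
print("FP4 FLOPs per HBM byte:", round(fp4_flops / hbm_bw))   # ~1429
print("FP8 FLOPs per HBM byte:", round(fp8_flops / hbm_bw))   # ~714

# On-die SRAM covers only a small slice of HBM capacity, so placement is a software decision.
print("SRAM as a fraction of HBM capacity:", sram_bytes / hbm_bytes)  # ~0.0013

Attention and GEMM kernels that keep their working sets in tile or cluster SRAM avoid paying that HBM cost on every access, which is the stated motivation for the explicit, software-managed hierarchy.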
Beyond the chip boundary, Maia 200 integrates its own NIC and an Ethernet based scale up network that runs the AI Transport Layer protocol. The on-die NIC exposes about 1.4 TB per second in each direction, or 2.8 TB per second bidirectional bandwidth, and scales to 6,144 accelerators in a two tier domain.

Within each tray, four Maia accelerators form a Fully Connected Quad. These four devices have direct non switched links to each other. Most tensor parallel traffic stays inside this group, while only lighter collective traffic goes out to switches. This improves latency and reduces switch port count for typical inference collectives.

Azure system integration and cooling

At system level, Maia 200 follows the same rack, power and mechanical standards as Azure GPU servers. It supports air cooled and liquid cooled configurations and uses a second generation closed loop liquid cooling Heat Exchanger Unit for high density racks. This allows mixed deployments of GPUs and Maia accelerators in the same datacenter footprint.

The accelerator integrates with the Azure control plane. Firmware management, health monitoring and telemetry use the same workflows as other Azure compute services. This enables fleet wide rollouts and maintenance without disrupting running AI workloads.

Key Takeaways

Inference first design: Maia 200 is Microsoft’s first silicon and system platform built only for AI inference, optimized for large scale token generation in modern reasoning models and large language models.

Numeric specs and memory hierarchy: The chip is fabricated on TSMC’s 3nm process, integrates about 140 billion transistors and delivers more than 10 PFLOPS FP4 and more than 5 PFLOPS FP8, with 216 GB HBM3e at 7TB per second along with 272 MB on chip SRAM split into tile SRAM and cluster SRAM and managed in software.

Performance versus other cloud accelerators: Microsoft reports about 30 percent better performance per dollar than the latest Azure inference systems and claims 3 times the FP4 performance of third generation Amazon Trainium and higher FP8 performance than Google TPU v7 at the accelerator level.

Tile based architecture and Ethernet fabric: Maia 200 organizes compute into tiles and clusters with local SRAM, DMA engines and a Network on Chip, and exposes an integrated NIC with about 1.4 TB per second per direction Ethernet bandwidth that scales to 6,144 accelerators using Fully Connected Quad groups as the local tensor parallel domain.

The post Microsoft Unveils Maia 200, An FP4 and FP8 Optimized AI Inference Accelerator for Azure Datacenters appeared first on MarkTechPost.
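The scale-up numbers above are easy to sanity check; as a small illustrative sketch, a Fully Connected Quad of four accelerators needs six direct links, and the quoted per-direction NIC bandwidth matches the bidirectional figure:

# Sanity-check arithmetic for the scale-up fabric, using figures from the article.
quad = 4
direct_links = quad * (quad - 1) // 2           # fully connected quad -> 6 non-switched links
per_direction_tb_s = 1.4                        # on-die NIC bandwidth per direction
max_accelerators = 6144                         # two-tier scale-up domain

print("Direct links inside one quad:", direct_links)                      # 6
print("Bidirectional NIC bandwidth (TB/s):", 2 * per_direction_tb_s)      # 2.8, as quoted
print("Quads in a full scale-up domain:", max_accelerators // quad)       # 1536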



DeepSeek AI Releases DeepSeek-OCR 2 with Causal Visual Flow Encoder for Layout Aware Document Understanding

DeepSeek AI released DeepSeek-OCR 2, an open source document OCR and understanding system that restructures its vision encoder to read pages in a causal order that is closer to how humans scan complex documents. The key component is DeepEncoder V2, a language model style transformer that converts a 2D page into a 1D sequence of visual tokens that already follow a learned reading flow before text decoding starts.

https://github.com/deepseek-ai/DeepSeek-OCR-2

From raster order to causal visual flow

Most multimodal models still flatten images into a fixed raster sequence, top left to bottom right, and apply a transformer with static positional encodings. This is a poor match for documents with multi column layouts, nested tables, and mixed language regions. Human readers instead follow a semantic order that jumps between regions.

DeepSeek-OCR 2 keeps the encoder and decoder structure of DeepSeek-OCR, but replaces the original CLIP ViT based visual encoder with DeepEncoder V2. The decoder remains DeepSeek-3B-A500M, a MoE language model with about 3B total parameters and about 500M active parameters per token. The goal is to let the encoder perform causal reasoning over visual tokens and to hand the decoder a sequence that is already aligned with a likely reading order.

Vision tokenizer and token budget

The vision tokenizer is inherited from DeepSeek-OCR. It uses an 80M parameter SAM base backbone followed by 2 convolution layers. This stage downsamples the image so that the visual token count is reduced by a factor of 16 and compresses features into an embedding dimension of 896.

DeepSeek-OCR 2 uses a global and local multi crop strategy to cover dense pages without letting the token count explode. A global view at 1024 × 1024 resolution produces 256 tokens. Up to 6 local crops at 768 × 768 resolution add 144 tokens each. As a result, the visual token count ranges from 256 to 1120 per page. This upper bound is slightly smaller than the 1156 token budget used in the original DeepSeek-OCR’s Gundam mode, and it is comparable to the budget used by Gemini-3 Pro on OmniDocBench.

DeepEncoder-V2, language model as vision encoder

DeepEncoder-V2 is built by instantiating a Qwen2-0.5B style transformer as the vision encoder. The input sequence is constructed as follows. First, all visual tokens from the tokenizer form the prefix. Then a set of learnable query tokens, called causal flow tokens, is appended as the suffix. The number of causal flow tokens equals the number of visual tokens.

The attention pattern is asymmetric. Visual tokens use bidirectional attention and see all other visual tokens. Causal flow tokens use causal attention and can see all visual tokens and only previous causal flow tokens. Only the outputs at causal flow positions are passed to the decoder. In effect, the encoder learns a mapping from a 2D grid of visual tokens into a 1D causal sequence of flow tokens that encode a proposed reading order and local context.

This design decomposes the problem into 2 stages. DeepEncoder-V2 performs causal reasoning over visual structure and reading order. DeepSeek-3B-A500M then performs causal decoding over text conditioned on this reordered visual input.

https://github.com/deepseek-ai/DeepSeek-OCR-2

Training pipeline

The training data pipeline follows DeepSeek-OCR and focuses on OCR intensive content. OCR data accounts for 80 percent of the mixture.
The research team rebalances the sampling across text, formulas, and tables using a 3:1:1 ratio so that the model sees enough structure heavy examples.

Training runs in 3 stages:

In stage 1, encoder pretraining couples DeepEncoder-V2 to a small decoder and uses a standard language modeling objective. The model is trained at 768×768 and 1024×1024 resolutions with multi scale sampling. The vision tokenizer is initialized from the original DeepEncoder. The LLM style encoder is initialized from Qwen2-0.5B base. The optimizer is AdamW with cosine learning rate decay from 1e-4 to 1e-6 over 40k iterations. Training uses about 160 A100 GPUs, sequence length 8k with packing, and a large mixture of document image text samples.

In stage 2, query enhancement attaches DeepEncoder-V2 to DeepSeek-3B-A500M and introduces multi crop views. The tokenizer is frozen. The encoder and decoder are jointly trained with 4 stage pipeline parallelism and 40 data parallel replicas. The global batch size is 1280 and the schedule runs for 15k iterations with learning rate decay from 5e-5 to 1e-6.

In stage 3, all encoder parameters are frozen. Only the DeepSeek decoder is trained to better adapt to the reordered visual tokens. This stage uses the same batch size but a shorter schedule and a lower learning rate that decays from 1e-6 to 5e-8 over 20k iterations. Freezing the encoder more than doubles training throughput at this stage.

Benchmark results on OmniDocBench

The main evaluation uses OmniDocBench-v1.5. This benchmark contains 1355 pages in 9 document categories in Chinese and English, including books, academic papers, forms, presentations, and newspapers. Each page is annotated with layout elements such as text spans, equations, tables, and figures.

DeepSeek-OCR 2 achieves an overall OmniDocBench score of 91.09 with a visual token maximum of 1120. The original DeepSeek-OCR baseline scores 87.36 with a token maximum of 1156. DeepSeek-OCR 2 therefore gains 3.73 points while using a slightly smaller token budget. Reading order (R-order) edit distance, which measures the difference between predicted and ground truth reading sequences, drops from 0.085 to 0.057. Text edit distance falls from 0.073 to 0.048. Formula and table edit distances also decrease, which indicates better parsing of math and structured regions.

Viewed as a document parser, DeepSeek-OCR-2 achieves an overall element level edit distance of 0.100. The original DeepSeek-OCR reaches 0.129 and Gemini-3 Pro reaches 0.115 under similar visual token constraints. This suggests that the causal visual flow encoder improves structural fidelity without expanding the token budget.

Category wise, DeepSeek-OCR-2 improves text edit distance for most document types, such as academic papers and books. Performance is weaker on very dense newspapers, where text edit distance remains above 0.13. The research team links this to limited training data for newspapers and heavy compression on extreme text density. Reading order metrics, however, improve across all categories.

https://github.com/deepseek-ai/DeepSeek-OCR-2
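The asymmetric attention pattern described above for DeepEncoder-V2, bidirectional among visual tokens, causal over flow tokens, with every flow token seeing all visual tokens, can be written down as a simple boolean mask. The sketch below is an illustration of that pattern, not DeepSeek's implementation, and it also checks the 256 + 6 × 144 = 1120 visual token budget quoted above:

import torch

def deepencoder_v2_style_mask(n_visual: int) -> torch.Tensor:
    """True = attention allowed. Prefix: n_visual visual tokens (bidirectional among
    themselves). Suffix: n_visual causal flow tokens (see all visual tokens plus
    earlier flow tokens). Purely illustrative of the described pattern."""
    n = 2 * n_visual
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:n_visual, :n_visual] = True                                    # visual <-> visual, bidirectional
    mask[n_visual:, :n_visual] = True                                    # flow -> all visual tokens
    causal = torch.tril(torch.ones(n_visual, n_visual, dtype=torch.bool))
    mask[n_visual:, n_visual:] = causal                                  # flow -> own and earlier flow tokens
    return mask

# Token budget from the article: one 1024x1024 global view plus up to six 768x768 crops.
global_tokens, crop_tokens, n_crops = 256, 144, 6
print("max visual tokens per page:", global_tokens + n_crops * crop_tokens)   # 1120

print(deepencoder_v2_style_mask(4).int())   # small example showing the mask structure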



How the sometimes-weird world of lifespan extension is gaining influence

For the last couple of years, I’ve been following the progress of a group of individuals who believe death is humanity’s “core problem.” Put simply, they say death is wrong—for everyone. They’ve even said it’s morally wrong. They established what they consider a new philosophy, and they called it Vitalism. Vitalism is more than a philosophy, though—it’s a movement for hardcore longevity enthusiasts who want to make real progress in finding treatments that slow or reverse aging. Not just through scientific advances, but by persuading influential people to support their movement, and by changing laws and policies to open up access to experimental drugs. And they’re starting to make progress. Vitalism was founded by Adam Gries and Nathan Cheng—two men who united over their shared desire to find ways to extend human lifespan. I first saw Cheng speak back in 2023, at Zuzalu, a pop-up city in Montenegro for people who were interested in life extension and some other technologies. (It was an interesting experience—you can read more about it here.) Zuzalu was where Gries and Cheng officially launched Vitalism. But I’ve been closely following the longevity scene since 2022. That journey took me to Switzerland, Honduras, and a compound in Berkeley, California, where like-minded longevity enthusiasts shared their dreams of life extension. It also took me to Washington, DC, where, last year, supporters of lifespan extension presented politicians including Mehmet Oz, who currently leads the Centers for Medicare & Medicaid Services, with their case for changes to laws and policies. The journey has been fascinating, and at times weird and even surreal. I’ve heard biohacking stories that ended with smoking legs. I’ve been told about a multi-partner relationship that might be made possible through the cryopreservation—and subsequent reanimation—of a man and the multiple wives he’s had throughout his life. I’ve had people tell me to my face that they consider themselves eugenicists, and that they believe that parents should select IVF embryos for their propensity for a long life. I’ve seen people draw blood during dinner in an upscale hotel restaurant to test their biological age. I’ve heard wild plans to preserve human consciousness and resurrect it in machines. Others have told me their plans to inject men’s penises with multiple doses of an experimental gene therapy in order to treat erectile dysfunction and ultimately achieve “radical longevity.” I’ve been shouted at and threatened with legal action. I’ve received barefoot hugs. One interviewee told me I needed Botox. It’s been a ride. My reporting has also made me realize that the current interest in longevity reaches beyond social media influencers and wellness centers. Longevity clinics are growing in number, and there’s been a glut of documentaries about living longer or even forever. At the same time, powerful people who influence state laws, giant federal funding budgets, and even national health policy are prioritizing the search for treatments that slow or reverse aging. The longevity community was thrilled when longtime supporter Jim O’Neill was made deputy secretary of health and human services last year. Other members of Trump’s administration, including Oz, have spoken about longevity too. “It seems that now there is the most pro-longevity administration in American history,” Gries told me. I recently spoke to Alicia Jackson, the new director of ARPA-H. 
The agency, established in 2022 under Joe Biden’s presidency, funds “breakthrough” biomedical research. And it appears to have a new focus on longevity. Jackson previously founded and led Evernow, a company focused on “health and longevity for every woman.” “There’s a lot of interesting technologies, but they all kind of come back to the same thing: Could we extend life years?” she told me over a Zoom call a few weeks ago. She added that her agency had “incredible support” from “the very top of HHS.” I asked if she was referring to Jim O’Neill. “Yeah,” she said. She wouldn’t go into the specifics. Gries is right: There is a lot of support for advances in longevity treatments, and some of it is coming from influential people in positions of power. Perhaps the field really is poised for a breakthrough. And that’s what makes this field so fascinating to cover. Despite the occasional weirdness. This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.



Google DeepMind Unveils AlphaGenome: A Unified Sequence-to-Function Model Using Hybrid Transformers and U-Nets to Decode the Human Genome

Google DeepMind is expanding its biological toolkit beyond the world of protein folding. After the success of AlphaFold, Google’s research team has introduced AlphaGenome. This is a unified deep learning model designed for sequence to function genomics. This represents a major shift in how we model the human genome. AlphaGenome does not treat DNA as simple text. Instead, it processes 1,000,000 base pair windows of raw DNA to predict the functional state of a cell.

Bridging the Scale Gap with Hybrid Architectures

The complexity of the human genome comes from its scale. Most existing models struggle to see the big picture while keeping track of fine details. AlphaGenome solves this by using a hybrid architecture. It combines a U-Net backbone with Transformer blocks. This allows the model to capture long range interactions across 1 Megabase of sequence while maintaining base pair resolution. This is like building a system that can read a thousand page book and still remember the exact location of a single comma.

Mapping Sequences to Functional Biological Modalities

AlphaGenome is a sequence to function model. This means its primary goal is to map DNA sequences directly to biological activities. These activities are measured in genomic tracks. The research team trained AlphaGenome to predict 11 different genomic modalities. These modalities include RNA-seq, CAGE, and ATAC-seq. They also include ChIP-seq for various transcription factors and chromatin contact maps. By predicting all these tracks at once, the model gains a holistic understanding of how DNA regulates the cell.

The Power of Multi-Task Learning in Genomics

The technical advancement of AlphaGenome lies in its ability to handle 11 distinct types of data simultaneously. In the past, researchers often built separate models for each task. AlphaGenome uses a multi-task learning approach. This helps the model learn shared features across different biological processes. If the model understands how a protein binds to DNA, it can better predict how that DNA will be expressed as RNA. This unified approach reduces the need for multiple specialized models.

Advancing Variant Effect Prediction via Distillation

One of the most critical applications for AlphaGenome is Variant Effect Prediction, or VEP. This process determines how a single mutation in DNA affects the body. Mutations can lead to diseases like cancer or heart disease. AlphaGenome excels at this by using a specific training method called Teacher Student distillation. The research team first created an ensemble of ‘all folds’ teacher models. These teachers were trained on vast amounts of genomic data. Then, they distilled that knowledge into a single student model.

Compressing Knowledge for Precision Medicine

This distillation process makes the model both faster and more robust. This is a standard way to compress knowledge. However, applying it to genomics at this scale is a new milestone. The student model learns to replicate the high quality predictions of the teacher ensemble. This allows it to identify harmful mutations with high accuracy. The model can even predict how a mutation in a distant regulatory element might impact a gene far away on the DNA strand.

High-Performance Computing with JAX and TPUs

The architecture is implemented using JAX. JAX is a high performance numerical computing library. It is often used for high scale machine learning at Google. Using JAX allows AlphaGenome to run efficiently on Tensor Processing Units, or TPUs.
The research team used sequence parallelism to handle the massive 1 Megabase input windows. This ensures that the memory requirements do not explode as the sequence length increases. This shows the importance of selecting the right framework for large scale biological data.

Transfer Learning for Data-Scarce Cell Types

AlphaGenome also addresses the challenge of data scarcity in certain cell types. Because it is a foundation model, it can be fine tuned for specific tasks. The model learns general biological rules from large public datasets. These rules can then be applied to rare diseases or specific tissues where data is hard to find. This transfer learning capability is one of the reasons why AlphaGenome is so versatile. It can predict how a gene will behave in a brain cell even if it was primarily trained on liver cell data.

Toward a New Era of Personalized Care

In the future, AlphaGenome could lead to a new era of personalized medicine. Doctors could use the model to scan a patient’s entire genome in 1,000,000 base pair chunks. They could identify exactly which variants are likely to cause health issues. This would allow for treatments that are tailored to a person’s specific genetic code. AlphaGenome moves us closer to this reality by providing a clear and accurate map of the functional genome.

Setting the Standard for Biological AI

AlphaGenome also marks a turning point for AI in genomics. It proves that we can model the most complex biological systems using the same principles used in modern AI. By combining U-Net structures with Transformers and using teacher student distillation, the Google DeepMind team has set a new standard.

Key Takeaways

Hybrid Sequence Architecture: AlphaGenome uses a specialized hybrid design that combines a U-Net backbone with Transformer blocks. This allows the model to process massive windows of 1,000,000 base pairs while maintaining the high resolution needed to identify single mutations.

Multi-Modal Functional Prediction: The model is trained to predict 11 different genomic modalities simultaneously, which include RNA-seq, CAGE, and ATAC-seq. By learning these various biological tracks together, the system gains a holistic understanding of how DNA regulates cellular activity across different tissues.

Teacher-Student Distillation: To achieve industry leading accuracy in Variant Effect Prediction (VEP), researchers used a distillation method. They transferred the knowledge from an ensemble of high performing ‘teacher’ models into a single, efficient ‘student’ model that is faster and more robust for identifying disease-causing mutations.

Built for High Performance Computing: The framework is implemented in JAX and optimized for TPUs. By using sequence parallelism, AlphaGenome can handle the computational load of analyzing megabase scale DNA sequences without exceeding memory limits, making it a powerful tool for large scale research.

Check out the Paper and Repo.
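To make the architectural description concrete, here is a deliberately tiny PyTorch sketch of a sequence-to-function model in the same spirit: convolutional down- and up-sampling around a transformer bottleneck, with one output head per modality. The layer counts, widths, losses, and everything else about the real AlphaGenome differ; this only illustrates the hybrid U-Net plus Transformer, multi-head idea:

import torch
import torch.nn as nn

class ToySeqToFunction(nn.Module):
    """Toy sketch: conv down/up-sampling around a transformer bottleneck, with one
    prediction head per genomic modality. Purely illustrative, not AlphaGenome."""
    def __init__(self, n_modalities=11, d=128, n_layers=2, pool=16):
        super().__init__()
        self.embed = nn.Conv1d(4, d, kernel_size=15, padding=7)       # one-hot A/C/G/T in, d channels out
        self.down = nn.Conv1d(d, d, kernel_size=pool, stride=pool)    # coarsen so attention spans long range
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=n_layers)
        self.up = nn.ConvTranspose1d(d, d, kernel_size=pool, stride=pool)  # back to base-pair resolution
        self.heads = nn.ModuleDict({f"track_{i}": nn.Conv1d(d, 1, 1) for i in range(n_modalities)})

    def forward(self, x_onehot):                   # x_onehot: (batch, 4, seq_len)
        h = torch.relu(self.embed(x_onehot))
        skip = h                                   # U-Net style skip at fine resolution
        z = self.down(h).transpose(1, 2)           # (batch, seq_len/pool, d)
        z = self.transformer(z).transpose(1, 2)
        h = self.up(z) + skip                      # fuse coarse context with fine detail
        return {name: head(h) for name, head in self.heads.items()}

model = ToySeqToFunction()
dna = torch.randn(1, 4, 4096)                      # stand-in for a one-hot DNA window (shortened here)
out = model(dna)
print(len(out), out["track_0"].shape)              # 11 heads, each (1, 1, 4096)

The teacher-student distillation the article describes would then amount to training such a student to match the averaged track predictions of a teacher ensemble in addition to, or instead of, the experimental data.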



TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning

arXiv:2510.07118v2 Announce Type: replace Abstract: Instruction tuning is essential for aligning large language models (LLMs) to downstream tasks and commonly relies on large, diverse corpora. However, small, high-quality subsets, known as coresets, can deliver comparable or superior results, though curating them remains challenging. Existing methods often rely on coarse, sample-level signals like gradients, an approach that is computationally expensive and overlooks fine-grained features. To address this, we introduce TRIM (Token Relevance via Interpretable Multi-layer Attention), a forward-only, token-centric framework. Instead of using gradients, TRIM operates by matching underlying representational patterns identified via attention-based “fingerprints” from a handful of target samples. Such an approach makes TRIM highly efficient and uniquely sensitive to the structural features that define a task. Coresets selected by our method consistently outperform state-of-the-art baselines by up to 9% on downstream tasks and even surpass the performance of full-data fine-tuning in some settings. By avoiding expensive backward passes, TRIM achieves this at a fraction of the computational cost. These findings establish TRIM as a scalable and efficient alternative for building high-quality instruction-tuning datasets.
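The abstract describes forward-only, attention-derived "fingerprints" without giving their exact construction. As a heavily simplified, purely illustrative sketch of the general recipe, per-sample features computed from attention maps and compared against a handful of target samples, where the feature choice here (per-head attention entropy) is an assumption rather than TRIM's actual saliency:

import torch

def attention_fingerprint(attentions, token_mask):
    # attentions: list of (heads, seq, seq) tensors, one per layer, for a single sample.
    # token_mask: (seq,) bool tensor marking real (non-padding) tokens.
    feats = []
    for attn in attentions:
        a = attn[:, token_mask][:, :, token_mask]        # drop padding rows and columns
        ent = -(a.clamp_min(1e-9) * a.clamp_min(1e-9).log()).sum(-1).mean(-1)   # (heads,)
        feats.append(ent)
    return torch.cat(feats)                              # (num_layers * heads,) fingerprint vector

def score_candidates(target_fps, candidate_fps):
    # Rank candidate training samples by cosine similarity to the mean target fingerprint.
    target = torch.stack(target_fps).mean(0)
    cands = torch.stack(candidate_fps)
    return torch.nn.functional.cosine_similarity(cands, target.unsqueeze(0), dim=-1)

# Demo with random tensors standing in for real attention maps from a forward pass.
layers, heads, seq = 4, 8, 16
fake_attn = [torch.softmax(torch.randn(heads, seq, seq), dim=-1) for _ in range(layers)]
mask = torch.ones(seq, dtype=torch.bool)
print(attention_fingerprint(fake_attn, mask).shape)      # torch.Size([32])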



HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization

arXiv:2506.07972v2 Announce Type: replace-cross Abstract: While Large Language Models (LLMs) have demonstrated significant advancements in reasoning and agent-based problem-solving, current evaluation methodologies fail to adequately assess their capabilities: existing benchmarks either rely on closed-ended questions prone to saturation and memorization, or subjective comparisons that lack consistency and rigor. In this work, we introduce HeuriGym, an agentic framework designed for evaluating heuristic algorithms generated by LLMs for combinatorial optimization problems, characterized by clearly defined objectives and expansive solution spaces. HeuriGym empowers LLMs to propose heuristics, receive evaluative feedback via code execution, and iteratively refine their solutions. We evaluate nine state-of-the-art models on nine problems across domains such as computer systems, logistics, and biology, exposing persistent limitations in tool use, planning, and adaptive reasoning. To quantify performance, we propose the Quality-Yield Index (QYI), a metric that captures both solution pass rate and quality. Even top models like GPT-o4-mini-high and Gemini-2.5-Pro attain QYI scores of only 0.6, well below the expert baseline of 1. Our open-source benchmark aims to guide the development of LLMs toward more effective and realistic problem-solving in scientific and engineering domains.
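As a schematic of the propose-execute-refine loop the abstract describes, here is a toy, self-contained sketch with a stubbed "LLM" proposing greedy scoring rules for a small knapsack instance; the Quality-Yield Index at the end is one plausible combination of pass rate and expert-normalized quality, not the paper's exact formula:

import random

def llm_propose(history):
    # Stub standing in for an LLM call: returns a heuristic (a random greedy scoring
    # rule for a toy knapsack). In HeuriGym the model proposes and refines real code.
    w = random.uniform(0.0, 2.0)
    return lambda item: item["value"] - w * item["weight"]

def run_heuristic(heuristic, items, capacity):
    load, value = 0, 0
    for it in sorted(items, key=heuristic, reverse=True):
        if load + it["weight"] <= capacity:
            load += it["weight"]
            value += it["value"]
    return value

random.seed(0)
items = [{"value": random.randint(1, 20), "weight": random.randint(1, 10)} for _ in range(30)]
capacity, expert_value = 50, 120     # expert_value is an assumed reference score

history, best = [], 0
for _ in range(5):                   # propose -> execute -> feedback -> refine
    h = llm_propose(history)
    score = run_heuristic(h, items, capacity)
    history.append(score)            # feedback the next proposal could condition on
    best = max(best, score)

# One plausible Quality-Yield Index: pass rate times quality relative to the expert
# baseline (illustrative stand-in; the paper defines the exact metric).
yield_rate = sum(s > 0 for s in history) / len(history)
quality = min(1.0, best / expert_value)
print("QYI (illustrative):", yield_rate * quality)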



LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

arXiv:2510.13907v2 Announce Type: replace Abstract: Large language models (LLMs) are highly sensitive to prompts, but most automatic prompt optimization (APO) methods assume access to ground-truth references (e.g., labeled validation data) that are costly to obtain. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimization based on pairwise preference feedback from an LLM judge. PDO casts prompt selection as a dueling-bandit problem and combines (i) Double Thompson Sampling to prioritize informative comparisons under a fixed judge budget, with (ii) top-performer guided mutation to expand the candidate pool while pruning weak prompts. Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently identifies stronger prompts than label-free baselines, while offering favorable quality–cost trade-offs under constrained comparison budgets.
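As a compact, illustrative sketch of the dueling-bandit core, a simplified Double Thompson Sampling loop with a stubbed LLM judge; the top-performer guided mutation step is omitted and details differ from the paper:

import numpy as np

rng = np.random.default_rng(0)
prompts = [f"prompt_{i}" for i in range(6)]
true_quality = rng.uniform(0, 1, len(prompts))    # hidden ground truth, used only by the stub judge

def judge(i, j):
    # Stub for the LLM judge: prefers the prompt with higher hidden quality, noisily.
    p_i_wins = 1 / (1 + np.exp(-4 * (true_quality[i] - true_quality[j])))
    return rng.random() < p_i_wins

n = len(prompts)
wins = np.zeros((n, n))
for _ in range(200):                               # fixed judge budget
    theta = rng.beta(wins + 1, wins.T + 1)         # sample pairwise win probabilities
    copeland = (theta > 0.5).sum(axis=1)
    first = int(np.argmax(copeland))               # candidate champion
    theta2 = rng.beta(wins[:, first] + 1, wins[first, :] + 1)
    theta2[first] = -np.inf                        # second arm must differ from the first
    second = int(np.argmax(theta2))                # strongest sampled challenger
    if judge(first, second):
        wins[first, second] += 1
    else:
        wins[second, first] += 1

win_rate = wins / np.maximum(wins + wins.T, 1)
print("Estimated best prompt:", prompts[int(np.argmax(win_rate.mean(axis=1)))])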



Meet the Vitalists: the hardcore longevity enthusiasts who believe death is “wrong”

“Who here believes involuntary death is a good thing?”  Nathan Cheng has been delivering similar versions of this speech over the last couple of years, so I knew what was coming. He was about to try to convince the 80 or so people in the audience that death is bad. And that defeating it should be humanity’s number one priority—quite literally, that it should come above all else in the social and political hierarchy. “If you believe that life is good and there’s inherent moral value to life,” he told them, “it stands to reason that the ultimate logical conclusion here is that we should try to extend lifespan indefinitely.”  Solving aging, he added, is “a problem that has an incredible moral duty for all of us to get involved in.” It was the end of April, and the crowd—with its whoops and yeahs—certainly seemed convinced. They’d gathered at a compound in Berkeley, California, for a three-day event called the Vitalist Bay Summit. It was part of a longer, two-month residency (simply called Vitalist Bay) that hosted various events to explore tools—from drug regulation to cryonics—that might be deployed in the fight against death. One of the main goals, though, was to spread the word of Vitalism, a somewhat radical movement established by Cheng and his colleague Adam Gries a few years ago. No relation to the lowercase vitalism of old, this Vitalism has a foundational philosophy that’s deceptively simple: to acknowledge that death is bad and life is good. The strategy for executing it, though, is far more obviously complicated: to launch a longevity revolution.  Interest in longevity has certainly taken off in recent years, but as the Vitalists see it, it has a branding problem. The term “longevity” has been used to sell supplements with no evidence behind them, “anti-aging” has been used by clinics to sell treatments, and “transhumanism” relates to ideas that go well beyond the scope of defeating death. Not everyone in the broader longevity space shares Vitalists’ commitment to actually making death obsolete. As Gries, a longtime longevity devotee who has largely become the enthusiastic public face of Vitalism, said in an online presentation about the movement in 2024, “We needed some new word.” “Vitalism” became a clean slate: They would start a movement to defeat death, and make that goal the driving force behind the actions of individuals, societies, and nations. Longevity could no longer be a sideshow. For Vitalism to succeed, budgets would need to change. Policy would need to change. Culture would need to change. Consider it longevity for the most hardcore adherents—a sweeping mission to which nothing short of total devotion will do. “The idea is to change the systems and the priorities of society at the highest levels,” Gries said in the presentation. To be clear, the effective anti-aging treatments the Vitalists are after don’t yet exist. But that’s sort of the point: They believe they could exist if Vitalists are able to spread their gospel, influence science, gain followers, get cash, and ultimately reshape government policies and priorities.  
For the past few years, Gries and Cheng have been working to recruit lobbyists, academics, biotech CEOs, high-net-worth individuals, and even politicians into the movement, and they’ve formally established a nonprofit foundation “to accelerate Vitalism.” Today, there’s a growing number of Vitalists (some paying foundation members, others more informal followers, and still others who support the cause but won’t publicly admit as much), and the foundation has started “certifying” qualifying biotech companies as Vitalist organizations. Perhaps most consequentially, Gries, Cheng, and their peers are also getting involved in shaping US state laws that make unproven, experimental treatments more accessible. They hope to be able to do the same at the national level.

Vitalism cofounders Nathan Cheng and Adam Gries want to launch a longevity revolution. (VITALISMFOUNDATION.ORG)

All this is helping Vitalists grow in prominence, if not also power. In the past, people who have spoken of living forever or making death “optional” have been dismissed by their academic colleagues. I’ve been covering the broader field of aging science for a decade, and I’ve seen scientists roll their eyes, shrug their shoulders, and turn their backs on people who have talked this way. That’s not the case for the Vitalists.

Even the scientists who think that Vitalist ideas of defeating death are wacky, unattainable ones, with the potential to discredit their field, have shown up on stage with Vitalism’s founders, and these serious researchers provide a platform for them at more traditionally academic events. I saw this collegiality firsthand at Vitalist Bay. Faculty members from Harvard, Stanford, and the University of California, Berkeley, all spoke at events. Eric Verdin, the prominent researcher who directs the Buck Institute for Research on Aging in Novato, California, had also planned to speak, although a scheduling clash meant he couldn’t make it in the end. “I have very different ideas in terms of what’s doable,” he told me. “But that’s part of the [longevity] movement—there’s freedom for people to say whatever they want.”

Many other well-respected scientists attended, including representatives of ARPA-H, the US federal agency for health research and breakthrough technologies. And as I left for a different event on longevity in Washington, DC, just after the Vitalist Bay Summit, a sizable group of Vitalist Bay attendees headed that way too, to make the case for longevity to US lawmakers. The Vitalists feel that momentum is building, not just for the science of aging and the development of lifespan-extending therapies, but for the acceptance of their philosophy that defeating death should be humanity’s top concern.

This, of course, sparks some pretty profound questions. What would a society without death look like—and would we even want it? After all, death has become an important part of human culture the world over. And even if Vitalists aren’t destined to realize their lofty goal, their growing influence could still have implications for us all. As they run more labs and companies, and insert themselves into the

