YouZum

News

Thinking Broad, Acting Fast: Latent Reasoning Distillation from Multi-Perspective Chain-of-Thought for E-Commerce Relevance

arXiv:2601.21611v1 Announce Type: cross Abstract: Effective relevance modeling is crucial for e-commerce search, as it aligns search results with user intent and enhances customer experience. Recent work has leveraged large language models (LLMs) to address the limitations of traditional relevance models, especially for long-tail and ambiguous queries. By incorporating Chain-of-Thought (CoT) reasoning, these approaches improve both accuracy and interpretability through multi-step reasoning. However, two key limitations remain: (1) most existing approaches rely on single-perspective CoT reasoning, which fails to capture the multifaceted nature of e-commerce relevance (e.g., user intent vs. attribute-level matching vs. business-specific rules); and (2) although CoT-enhanced LLMs offer rich reasoning capabilities, their high inference latency necessitates knowledge distillation for real-time deployment, yet current distillation methods discard the CoT rationale structure at inference, using it as a transient auxiliary signal and forfeiting its reasoning utility. To address these challenges, we propose a novel framework that better exploits CoT semantics throughout the optimization pipeline. Specifically, the teacher model leverages Multi-Perspective CoT (MPCoT) to generate diverse rationales and combines Supervised Fine-Tuning (SFT) with Direct Preference Optimization (DPO) to construct a more robust reasoner. For distillation, we introduce Latent Reasoning Knowledge Distillation (LRKD), which endows a student model with a lightweight inference-time latent reasoning extractor, allowing efficient and low-latency internalization of the LLM’s sophisticated reasoning capabilities. Evaluated in offline experiments and online A/B tests on an e-commerce search advertising platform serving tens of millions of users daily, our method delivers significant offline gains and shows clear online benefits in both commercial performance and user experience.
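The abstract describes LRKD only at a high level. As a rough illustration of the idea, a student relevance model carrying a lightweight latent reasoning head whose output is aligned to the teacher's CoT representations, the sketch below shows one possible formulation in PyTorch. The module shapes, the cosine alignment loss, and the 0.5 weighting are assumptions for illustration, not details from the paper.

# Minimal, illustrative sketch of latent reasoning distillation for a relevance
# student. The abstract does not specify LRKD's architecture or losses, so the
# module names, the alignment loss, and the alpha weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelevanceStudent(nn.Module):
    def __init__(self, hidden=256, teacher_dim=768):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2 * teacher_dim, hidden), nn.ReLU())
        # Lightweight inference-time "latent reasoning extractor" (assumed form).
        self.latent_extractor = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(),
                                              nn.Linear(hidden, teacher_dim))
        self.classifier = nn.Linear(hidden + teacher_dim, 2)  # relevant / not relevant

    def forward(self, query_emb, item_emb):
        h = self.encoder(torch.cat([query_emb, item_emb], dim=-1))
        z = self.latent_extractor(h)              # latent "reasoning" vector
        logits = self.classifier(torch.cat([h, z], dim=-1))
        return logits, z

def lrkd_loss(logits, labels, z_student, teacher_cot_emb, alpha=0.5):
    # Supervised relevance loss plus alignment of the student's latent reasoning
    # vector to a pooled embedding of the teacher's multi-perspective CoT rationales.
    ce = F.cross_entropy(logits, labels)
    align = 1.0 - F.cosine_similarity(z_student, teacher_cot_emb, dim=-1).mean()
    return ce + alpha * align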


Probing Neural Topology of Large Language Models

arXiv:2506.01042v3 Announce Type: replace Abstract: Probing large language models (LLMs) has yielded valuable insights into their internal mechanisms by linking neural activations to interpretable semantics. However, the complex mechanisms that link neurons’ functional co-activation with emergent model capabilities remain largely unknown, hindering a deeper understanding and safer development of LLMs. In this work, we introduce graph probing, a method for uncovering the functional connectivity of LLM neurons and relating it to language generation performance. By probing models across diverse LLM families and scales, we discover a universal predictability of language generation and understanding performance using only neural topology, which persists even when retaining just 1% of neuron connections. Strikingly, probing on topology outperforms probing on activation by up to 130.4% and 67.7% on perplexity and space/time semantic regression, respectively, suggesting that neural topology carries orders of magnitude richer information about LLM performance than neural activation, and that this information can be easily extracted with simple linear or MLP probes. To explain the dependence between neural topology and language performance, we identify default networks and hub neurons in LLMs and provide causal evidence through interventional experiments on multiple benchmarks, showing that LLMs actually exploit this topological information. Further analyses suggest that graph probing can be effectively leveraged to improve the efficiency and reliability of LLMs through proof-of-concept applications in model pruning and hallucination detection. Code and data for the graph probing toolbox are available at https://github.com/DavyMorgan/llm-graph-probing.
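To make the graph probing idea concrete, here is a minimal sketch, under assumptions, of the general recipe the abstract describes: estimate functional connectivity from neuron co-activations, keep only the strongest roughly 1% of connections, and fit a simple linear probe from that topology to a performance score such as perplexity. The feature construction and the Ridge probe below are generic placeholders; the paper's exact pipeline is in the linked repository.

# Generic reconstruction of "graph probing" from the abstract: co-activation graph,
# sparsified to ~1% of edges, fed to a linear probe that predicts perplexity.
import numpy as np
from sklearn.linear_model import Ridge

def connectivity_topology(acts: np.ndarray, keep_frac: float = 0.01) -> np.ndarray:
    # acts: (num_inputs, num_neurons) activations collected from one model/layer.
    corr = np.corrcoef(acts, rowvar=False)                 # neuron-neuron co-activation
    corr = np.nan_to_num(corr)
    triu = corr[np.triu_indices_from(corr, k=1)]           # unique edges
    thresh = np.quantile(np.abs(triu), 1.0 - keep_frac)
    return np.where(np.abs(triu) >= thresh, triu, 0.0)     # retain ~1% of connections

# One topology feature vector per probed model, one perplexity value per model (toy data).
rng = np.random.default_rng(0)
feats = np.stack([connectivity_topology(rng.normal(size=(64, 32))) for _ in range(20)])
ppl = rng.uniform(5, 30, size=20)
probe = Ridge(alpha=1.0).fit(feats, ppl)                   # simple linear probe on topology
print("train R^2:", probe.score(feats, ppl))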


A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic

arXiv:2507.04746v2 Announce Type: replace Abstract: Judeo-Arabic refers to Arabic variants historically spoken by Jewish communities across the Arab world, primarily during the Middle Ages. Unlike standard Arabic, it is written in Hebrew script by Jewish writers and for Jewish audiences. Transliterating Judeo-Arabic into Arabic script is challenging due to ambiguous letter mappings, inconsistent orthographic conventions, and frequent code-switching into Hebrew. In this paper, we introduce a two-step approach to automatically transliterate Judeo-Arabic into Arabic script: a simple character-level mapping followed by post-correction to address grammatical and orthographic errors. We also present the first benchmark evaluation of LLMs on this task. Finally, we show that transliteration enables Arabic NLP tools to perform morphosyntactic tagging and machine translation, which would not have been feasible on the original texts. We make our code and data publicly available.
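As a concrete illustration of the two-step pipeline, the sketch below applies a character-level Hebrew-to-Arabic mapping and then hands the draft to a post-correction model. The mapping covers only a handful of relatively unambiguous letters, and the corrector callable is a placeholder for whatever seq2seq or LLM-based corrector is used; neither reflects the paper's actual mapping table or model.

# Step 1: deterministic character mapping (tiny, assumed subset of letters).
# Step 2: post-correction by a model that fixes orthography, grammar, and the
# ambiguous cases a plain character map cannot resolve.
CHAR_MAP = {
    "א": "ا", "ב": "ب", "ד": "د", "ה": "ه", "ו": "و",
    "כ": "ك", "ך": "ك", "ל": "ل", "מ": "م", "ם": "م",
    "נ": "ن", "ר": "ر", "ת": "ت",
}

def char_level_transliterate(judeo_arabic: str) -> str:
    # Unknown characters pass through unchanged for the corrector to handle.
    return "".join(CHAR_MAP.get(ch, ch) for ch in judeo_arabic)

def post_correct(draft: str, corrector) -> str:
    # `corrector` is any callable str -> str (placeholder for the actual model).
    return corrector(draft)

draft = char_level_transliterate("אלכתאב")   # rough Arabic-script draft
print(draft)                                 # ambiguous letters still need post-correction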


RaZeR: Pushing the Limits of NVFP4 Quantization with Redundant Zero Remapping

arXiv:2501.04052v2 Announce Type: replace-cross Abstract: The recently introduced NVFP4 format demonstrates remarkable performance and memory benefits for quantized large language model (LLM) inference. However, we observe two types of redundancy in NVFP4 encoding: (1) The FP4 element format naturally exposes an unused quantization value due to its sign-magnitude representation that contains both positive and negative zeros. (2) The FP8 block scaling factor has an unused sign bit because it is always positive. Additionally, we find that LLM weights are more tolerant to a lower-precision block scaling factor. Based on these observations, we propose Redundant Zero Remapping (RaZeR), an enhanced numerical format that pushes the limits of NVFP4 for more accurate LLM quantization under the same memory footprint. RaZeR leverages the redundant bits of the block scaling factor to adaptively remap the redundant FP4 zero to additional quantization values with improved accuracy. To demonstrate the practicality of RaZeR, we design efficient GPU kernels for RaZeR-quantized LLM inference and propose novel hardware to natively support this. Extensive experiments validate RaZeR’s superior performance for 4-bit LLM quantization. For example, relative to native NVFP4, RaZeR reduces the average perplexity loss by 34.6% and 31.2% under weight-only and weight-activation quantization, respectively.
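The redundancy that RaZeR exploits can be seen directly from the FP4 (E2M1) code space, which is sign-magnitude and therefore spends two of its 16 codes on +0 and -0. The sketch below decodes FP4 codes and shows how the redundant -0 code could be remapped to an extra quantization value; the candidate remap value (0.25) and the per-block selection mechanism are illustrative assumptions, not the paper's actual value set.

# FP4 E2M1 magnitudes per the OCP MX-style format: sign bit + 3 magnitude bits.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(code, remap_value=None):
    sign = -1.0 if (code & 0b1000) else 1.0
    mag = FP4_MAGNITUDES[code & 0b0111]
    if code == 0b1000 and remap_value is not None:
        return remap_value          # redundant "-0" code reused as an extra level
    return sign * mag

# Plain NVFP4: codes 0b0000 and 0b1000 both decode to zero (a wasted code point).
print(decode_fp4(0b0000), decode_fp4(0b1000))        # 0.0 -0.0 (same number twice)
# RaZeR-style remapping: the freed code becomes an additional value, e.g. 0.25,
# selected per block via the otherwise-unused sign bit of the FP8 scale factor.
print(decode_fp4(0b1000, remap_value=0.25))          # 0.25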


FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

arXiv:2601.21682v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate impressive capabilities across diverse tasks but raise concerns about privacy, copyright, and harmful materials. Existing LLM unlearning methods rarely consider the continual and high-volume nature of real-world deletion requests, which can cause utility degradation and catastrophic forgetting as requests accumulate. To address this challenge, we introduce FIT, a framework for continual unlearning that handles large numbers of deletion requests while maintaining robustness against both catastrophic forgetting and post-unlearning recovery. FIT mitigates degradation through rigorous data Filtering, Importance-aware updates, and Targeted layer attribution, enabling stable performance across long sequences of unlearning operations and achieving a favorable balance between forgetting effectiveness and utility retention. To support realistic evaluation, we present PCH, a benchmark covering Personal information, Copyright, and Harmful content in sequential deletion scenarios, along with two symmetric metrics, Forget Degree (F.D.) and Retain Utility (R.U.), which jointly assess forgetting quality and utility preservation. Extensive experiments on four open-source LLMs with hundreds of deletion requests show that FIT achieves the strongest trade-off between F.D. and R.U., surpasses existing methods on MMLU, CommonsenseQA, and GSM8K, and remains resistant against both relearning and quantization recovery attacks.
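Since the abstract describes FIT only at the level of its three components, the sketch below shows a generic continual-unlearning step that combines gradient ascent on a forget batch, a retain-set regularizer, updates restricted to attributed layers, and importance-weighted damping. The filtering step, the importance scores, and the layer attribution are all assumed inputs here; FIT's actual procedures are defined in the paper.

# Generic, assumed sketch of one importance-aware, layer-targeted unlearning step.
import torch
import torch.nn.functional as F

def unlearning_step(model, forget_batch, retain_batch, target_layers, importance,
                    lr=1e-5, beta=1.0):
    model.zero_grad()
    forget_loss = F.cross_entropy(model(forget_batch["input"]), forget_batch["labels"])
    retain_loss = F.cross_entropy(model(retain_batch["input"]), retain_batch["labels"])
    # Ascend on the forget loss, descend on the retain loss.
    loss = -forget_loss + beta * retain_loss
    loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None or not any(t in name for t in target_layers):
                continue  # only attributed layers are updated
            # Importance-aware damping: shrink updates to parameters deemed important
            # for general capability (importance in [0, 1], assumed precomputed).
            p -= lr * (1.0 - importance.get(name, 0.0)) * p.grad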


A Coding Deep Dive into Differentiable Computer Vision with Kornia Using Geometry Optimization, LoFTR Matching, and GPU Augmentations

We implement an advanced, end-to-end Kornia tutorial and demonstrate how modern, differentiable computer vision can be built entirely in PyTorch. We start by constructing GPU-accelerated, synchronized augmentation pipelines for images, masks, and keypoints, then move into differentiable geometry by optimizing a homography directly through gradient descent. We also show how learned feature matching with LoFTR integrates with Kornia’s RANSAC to estimate robust homographies and produce a simple stitched output, even under constrained or offline-safe conditions. Finally, we ground these ideas in practice by training a lightweight CNN on CIFAR-10 using Kornia’s GPU augmentations, highlighting how research-grade vision pipelines translate naturally into learning systems. Check out the FULL CODES here.

import os, math, time, random, urllib.request
from dataclasses import dataclass
from typing import Tuple
import sys, subprocess

def pip_install(pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

pip_install([
    "kornia==0.8.2",
    "torch",
    "torchvision",
    "matplotlib",
    "numpy",
    "opencv-python-headless",
])

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms.functional as TF
import matplotlib.pyplot as plt
import cv2

import kornia
import kornia.augmentation as K
import kornia.geometry.transform as KG
from kornia.geometry.ransac import RANSAC
from kornia.feature import LoFTR

torch.manual_seed(0)
np.random.seed(0)
random.seed(0)

# Select the available compute device (CUDA if present, otherwise CPU); used by all
# subsequent tensors and the print below.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Torch:", torch.__version__)
print("Kornia:", kornia.__version__)
print("Device:", device)

We begin by setting up a fully reproducible environment, installing Kornia and its core dependencies to ensure GPU-accelerated, differentiable computer vision runs smoothly in Google Colab. We then import and organize PyTorch, Kornia, and supporting libraries, establishing a clean foundation for geometry, augmentation, and feature-matching workflows. We set the random seed and select the available compute device so that all subsequent experiments remain deterministic, debuggable, and performance-aware. Check out the FULL CODES here.
def to_tensor_img_uint8(img_bgr_uint8: np.ndarray) -> torch.Tensor:
    img_rgb = cv2.cvtColor(img_bgr_uint8, cv2.COLOR_BGR2RGB)
    t = torch.from_numpy(img_rgb).permute(2, 0, 1).float() / 255.0
    return t.unsqueeze(0)

def show(img_t: torch.Tensor, title: str = "", max_size: int = 900):
    x = img_t.detach().float().cpu().clamp(0, 1)
    if x.shape[1] == 1:
        x = x.repeat(1, 3, 1, 1)
    x = x[0].permute(1, 2, 0).numpy()
    h, w = x.shape[:2]
    scale = min(1.0, max_size / max(h, w))
    if scale < 1.0:
        x = cv2.resize(x, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_AREA)
    plt.figure(figsize=(7, 5))
    plt.imshow(x)
    plt.axis("off")
    plt.title(title)
    plt.show()

def show_mask(mask_t: torch.Tensor, title: str = ""):
    x = mask_t.detach().float().cpu().clamp(0, 1)[0, 0].numpy()
    plt.figure(figsize=(6, 4))
    plt.imshow(x)
    plt.axis("off")
    plt.title(title)
    plt.show()

def download(url: str, path: str):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)

def safe_download(url: str, path: str) -> bool:
    try:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        return True
    except Exception as e:
        print("Download failed:", e)
        return False

def make_grid_mask(h: int, w: int, cell: int = 32) -> torch.Tensor:
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    m = (((yy // cell) % 2) ^ ((xx // cell) % 2)).float()
    return m.unsqueeze(0).unsqueeze(0)

def draw_matches(img0_rgb: np.ndarray, img1_rgb: np.ndarray,
                 pts0: np.ndarray, pts1: np.ndarray, max_draw: int = 200) -> np.ndarray:
    h0, w0 = img0_rgb.shape[:2]
    h1, w1 = img1_rgb.shape[:2]
    out = np.zeros((max(h0, h1), w0 + w1, 3), dtype=np.uint8)
    out[:h0, :w0] = img0_rgb
    out[:h1, w0:w0 + w1] = img1_rgb
    n = min(len(pts0), len(pts1), max_draw)
    if n == 0:
        return out
    idx = np.random.choice(len(pts0), size=n, replace=False) if len(pts0) > n else np.arange(n)
    for i in idx:
        x0, y0 = pts0[i]
        x1, y1 = pts1[i]
        x1_shift = x1 + w0
        p0 = (int(round(x0)), int(round(y0)))
        p1 = (int(round(x1_shift)), int(round(y1)))
        cv2.circle(out, p0, 2, (255, 255, 255), -1, lineType=cv2.LINE_AA)
        cv2.circle(out, p1, 2, (255, 255, 255), -1, lineType=cv2.LINE_AA)
        cv2.line(out, p0, p1, (255, 255, 255), 1, lineType=cv2.LINE_AA)
    return out

def normalize_img_for_loftr(img_rgb01: torch.Tensor) -> torch.Tensor:
    if img_rgb01.shape[1] == 3:
        return kornia.color.rgb_to_grayscale(img_rgb01)
    return img_rgb01

We define a set of reusable helper utilities for image conversion, visualization, safe data downloading, and synthetic mask generation, keeping the vision pipeline clean and modular. We also implement robust visualization and matching helpers that allow us to inspect augmented images, masks, and LoFTR correspondences directly during experimentation. We normalize image inputs to the exact tensor formats expected by Kornia and LoFTR, ensuring that all downstream geometry and feature-matching components operate consistently and correctly. Check out the FULL CODES here.
print("\n[1] Differentiable augmentations: image + mask + keypoints")

B, C, H, W = 1, 3, 256, 384
img = torch.rand(B, C, H, W, device=device)
mask = make_grid_mask(H, W, cell=24).to(device)
kps = torch.tensor([[
    [40.0, 40.0],
    [W - 50.0, 50.0],
    [W * 0.6, H * 0.8],
    [W * 0.25, H * 0.65],
]], device=device)

aug = K.AugmentationSequential(
    K.RandomResizedCrop((224, 224), scale=(0.6, 1.0), ratio=(0.8, 1.25), p=1.0),
    K.RandomHorizontalFlip(p=0.5),
    K.RandomRotation(degrees=18.0, p=0.7),
    K.ColorJiggle(0.2, 0.2, 0.2, 0.1, p=0.8),
    data_keys=["input", "mask", "keypoints"],
    same_on_batch=True,
).to(device)

img_aug, mask_aug, kps_aug = aug(img, mask, kps)

print("image:", tuple(img.shape), "->", tuple(img_aug.shape))
print("mask :", tuple(mask.shape), "->", tuple(mask_aug.shape))
print("kps :", tuple(kps.shape), "->", tuple(kps_aug.shape))
print("Example keypoints (before -> after):")
print(torch.cat([kps[0], kps_aug[0]], dim=1))

show(img, "Original (synthetic)")
show_mask(mask, "Original mask (synthetic)")
show(img_aug, "Augmented (synced)")
show_mask(mask_aug, "Augmented mask (synced)")

We construct a synchronized, fully differentiable augmentation pipeline that applies the same geometric transformations to images, masks, and keypoints on the GPU. We generate synthetic data to clearly demonstrate how spatial consistency is preserved across modalities while still introducing realistic variability through cropping, rotation, flipping, and color jitter. We visualize the before-and-after results to verify that the augmented images, segmentation masks, and keypoints remain perfectly aligned after transformation. Check out the FULL CODES here.

print("\n[2] Differentiable homography alignment by optimization")

base = torch.rand(1, 1, 240, 320, device=device)
show(base, "Base image (grayscale)")

true_H_px = torch.eye(3, device=device).unsqueeze(0)
true_H_px[:, 0, 2] = 18.0
true_H_px[:, 1, 2] = -12.0
true_H_px[:, 0, 1] = 0.03
true_H_px[:, 1, 0] = -0.02
true_H_px[:, 2, 0] = 1e-4
true_H_px[:, 2, 1] = -8e-5

target = KG.warp_perspective(base, true_H_px, dsize=(base.shape[-2], base.shape[-1]), align_corners=True)
show(target, "Target (base warped by true homography)")

p = torch.zeros(1, 8, device=device, requires_grad=True)

def params_to_H(p8: torch.Tensor) -> torch.Tensor:
    Bp = p8.shape[0]
    Hm = torch.eye(3, device=p8.device).unsqueeze(0).repeat(Bp, 1, 1)
    Hm[:, 0, 0] = 1.0 + p8[:, 0]
    Hm[:, 0, 1] = p8[:, 1]
    Hm[:, 0, 2] = p8[:, 2]
    Hm[:, 1, 0] = p8[:, 3]
    Hm[:, 1, 1] = 1.0 + p8[:, 4]
    Hm[:, 1, 2] = p8[:, 5]
    Hm[:, 2, 0] = p8[:, 6]
    Hm[:, 2, 1] = p8[:, 7]
    return Hm

opt = torch.optim.Adam([p], lr=0.08)
losses = []
for step in
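The optimization loop is cut off in the excerpt above. A typical completion, consistent with the variables already defined (p, params_to_H, opt, losses, base, target), might look like the following; the step count, the L1 photometric loss, and the logging cadence are assumptions rather than the article's exact code.

# Hedged completion of the truncated loop above; not the article's exact code.
for step in range(200):
    opt.zero_grad()
    H_est = params_to_H(p)
    warped = KG.warp_perspective(base, H_est, dsize=(base.shape[-2], base.shape[-1]), align_corners=True)
    loss = F.l1_loss(warped, target)          # photometric alignment loss
    loss.backward()
    opt.step()
    losses.append(loss.item())
    if step % 50 == 0:
        print(f"step {step:03d} | loss {loss.item():.5f}")

show(warped, "Warped base after optimization (should match target)")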


SFT Doesn’t Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs

arXiv:2509.20758v3 Announce Type: replace Abstract: Supervised Fine-Tuning (SFT) on domain-specific datasets is a common approach to adapt Large Language Models (LLMs) to specialized tasks but is often believed to degrade their general capabilities. In this work, we revisit this trade-off and present both empirical and theoretical insights. First, we show that SFT does not always hurt: using a smaller learning rate can substantially mitigate general performance degradation while preserving comparable target-domain performance. We then provide a theoretical analysis that explains these phenomena and further motivates a new method, Token-Adaptive Loss Reweighting (TALR). Building on this, and recognizing that smaller learning rates alone do not fully eliminate general-performance degradation in all cases, we evaluate a range of strategies for reducing general capability loss, including L2 regularization, LoRA, model averaging, FLOW, and our proposed TALR. Experimental results demonstrate that while no method completely eliminates the trade-off, TALR consistently outperforms these baselines in balancing domain-specific gains and general capabilities. Finally, we distill our findings into practical guidelines for adapting LLMs to new domains: (i) use a small learning rate to achieve a favorable trade-off, and (ii) when a stronger balance is desired, adopt TALR as an effective strategy.
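The abstract does not spell out TALR's weighting rule, but the mechanism of token-adaptive loss reweighting can be sketched generically: compute per-token cross-entropy during SFT and rescale each token's contribution by an adaptive weight before averaging. The confidence-based weight below is only an assumed placeholder to show where such a rule plugs in.

# Generic per-token reweighted SFT loss; the weighting rule is an assumption,
# not TALR's actual formulation.
import torch
import torch.nn.functional as F

def token_adaptive_sft_loss(logits, labels, ignore_index=-100, gamma=1.0):
    # logits: (batch, seq, vocab), labels: (batch, seq)
    vocab = logits.size(-1)
    per_token = F.cross_entropy(logits.reshape(-1, vocab), labels.reshape(-1),
                                ignore_index=ignore_index, reduction="none")
    mask = (labels.reshape(-1) != ignore_index).float()
    with torch.no_grad():
        p_correct = torch.exp(-per_token)         # model confidence on each token
        weights = (1.0 - p_correct) ** gamma      # assumed adaptive weighting rule
    return (weights * per_token * mask).sum() / mask.sum().clamp(min=1.0)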


Microsoft Unveils Maia 200, An FP4 and FP8 Optimized AI Inference Accelerator for Azure Datacenters

Maia 200 is Microsoft’s new in-house AI accelerator designed for inference in Azure datacenters. It targets the cost of token generation for large language models and other reasoning workloads by combining narrow-precision compute, a dense on-chip memory hierarchy, and an Ethernet-based scale-up fabric.

Why Microsoft built a dedicated inference chip

Training and inference stress hardware in different ways. Training needs very large all-to-all communication and long-running jobs. Inference cares about tokens per second, latency, and tokens per dollar. Microsoft positions Maia 200 as its most efficient inference system, with about 30 percent better performance per dollar than the latest hardware in its fleet.

Maia 200 is part of a heterogeneous Azure stack. It will serve multiple models, including the latest GPT 5.2 models from OpenAI, and will power workloads in Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use the chip for synthetic data generation and reinforcement learning to improve in-house models.

Core silicon and numeric specifications

Each Maia 200 die is fabricated on TSMC’s 3 nanometer process. The chip integrates more than 140 billion transistors. The compute pipeline is built around native FP8 and FP4 tensor cores. A single chip delivers more than 10 petaFLOPS in FP4 and more than 5 petaFLOPS in FP8, within a 750 W SoC TDP envelope.

Memory is split between stacked HBM and on-die SRAM. Maia 200 provides 216 GB of HBM3e with about 7 TB per second of bandwidth and 272 MB of on-die SRAM. The SRAM is organized into tile-level SRAM and cluster-level SRAM and is fully software managed. Compilers and runtimes can place working sets explicitly to keep attention and GEMM kernels close to compute.

Tile-based microarchitecture and memory hierarchy

The Maia 200 microarchitecture is hierarchical. The base unit is the tile, the smallest autonomous compute and storage unit on the chip. Each tile includes a Tile Tensor Unit for high-throughput matrix operations and a Tile Vector Processor as a programmable SIMD engine. Tile SRAM feeds both units, and tile DMA engines move data in and out of SRAM without stalling compute. A Tile Control Processor orchestrates the sequence of tensor and DMA work.

Multiple tiles form a cluster. Each cluster exposes a larger, multi-banked Cluster SRAM that is shared across tiles in that cluster. Cluster-level DMA engines move data between Cluster SRAM and the co-packaged HBM stacks. A cluster core coordinates multi-tile execution and uses redundancy schemes for tiles and SRAM to improve yield while keeping the same programming model.

This hierarchy lets the software stack pin different parts of the model in different tiers. For example, attention kernels can keep Q, K, V tensors in tile SRAM, while collective communication kernels can stage payloads in cluster SRAM and reduce HBM pressure. The design goal is sustained high utilization as models grow in size and sequence length.

On-chip data movement and Ethernet scale-up fabric

Inference is often limited by data movement, not peak compute. Maia 200 uses a custom Network on Chip along with a hierarchy of DMA engines. The Network on Chip spans tiles, clusters, memory controllers, and I/O units. It has separate planes for large tensor traffic and for small control messages. This separation keeps synchronization and small outputs from being blocked behind large transfers.
Beyond the chip boundary, Maia 200 integrates its own NIC and an Ethernet-based scale-up network that runs the AI Transport Layer protocol. The on-die NIC exposes about 1.4 TB per second in each direction, or 2.8 TB per second of bidirectional bandwidth, and scales to 6,144 accelerators in a two-tier domain.

Within each tray, four Maia accelerators form a Fully Connected Quad. These four devices have direct, non-switched links to each other. Most tensor-parallel traffic stays inside this group, while only lighter collective traffic goes out to switches. This improves latency and reduces switch port count for typical inference collectives.

Azure system integration and cooling

At the system level, Maia 200 follows the same rack, power, and mechanical standards as Azure GPU servers. It supports air-cooled and liquid-cooled configurations and uses a second-generation, closed-loop liquid-cooling Heat Exchanger Unit for high-density racks. This allows mixed deployments of GPUs and Maia accelerators in the same datacenter footprint.

The accelerator integrates with the Azure control plane. Firmware management, health monitoring, and telemetry use the same workflows as other Azure compute services. This enables fleet-wide rollouts and maintenance without disrupting running AI workloads.

Key Takeaways

Here are the key technical takeaways:

Inference-first design: Maia 200 is Microsoft’s first silicon and system platform built only for AI inference, optimized for large-scale token generation in modern reasoning models and large language models.

Numeric specs and memory hierarchy: The chip is fabricated on TSMC’s 3 nm process, integrates about 140 billion transistors, and delivers more than 10 PFLOPS FP4 and more than 5 PFLOPS FP8, with 216 GB HBM3e at 7 TB per second along with 272 MB of on-chip SRAM split into tile SRAM and cluster SRAM and managed in software.

Performance versus other cloud accelerators: Microsoft reports about 30 percent better performance per dollar than the latest Azure inference systems and claims 3 times the FP4 performance of third-generation Amazon Trainium and higher FP8 performance than Google TPU v7 at the accelerator level.

Tile-based architecture and Ethernet fabric: Maia 200 organizes compute into tiles and clusters with local SRAM, DMA engines, and a Network on Chip, and exposes an integrated NIC with about 1.4 TB per second per direction of Ethernet bandwidth that scales to 6,144 accelerators, using Fully Connected Quad groups as the local tensor-parallel domain.

The post Microsoft Unveils Maia 200, An FP4 and FP8 Optimized AI Inference Accelerator for Azure Datacenters appeared first on MarkTechPost.


DeepSeek AI Releases DeepSeek-OCR 2 with Causal Visual Flow Encoder for Layout Aware Document Understanding

DeepSeek AI released DeepSeek-OCR 2, an open-source document OCR and understanding system that restructures its vision encoder to read pages in a causal order that is closer to how humans scan complex documents. The key component is DeepEncoder-V2, a language-model-style transformer that converts a 2D page into a 1D sequence of visual tokens that already follow a learned reading flow before text decoding starts.

https://github.com/deepseek-ai/DeepSeek-OCR-2

From raster order to causal visual flow

Most multimodal models still flatten images into a fixed raster sequence, top left to bottom right, and apply a transformer with static positional encodings. This is a poor match for documents with multi-column layouts, nested tables, and mixed-language regions. Human readers instead follow a semantic order that jumps between regions.

DeepSeek-OCR 2 keeps the encoder and decoder structure of DeepSeek-OCR, but replaces the original CLIP-ViT-based visual encoder with DeepEncoder-V2. The decoder remains DeepSeek-3B-A500M, a MoE language model with about 3B total parameters and about 500M active parameters per token. The goal is to let the encoder perform causal reasoning over visual tokens and to hand the decoder a sequence that is already aligned with a likely reading order.

Vision tokenizer and token budget

The vision tokenizer is inherited from DeepSeek-OCR. It uses an 80M-parameter SAM-base backbone followed by 2 convolution layers. This stage downsamples the image so that the visual token count is reduced by a factor of 16 and compresses features into an embedding dimension of 896.

DeepSeek-OCR 2 uses a global and local multi-crop strategy to cover dense pages without letting the token count explode. A global view at 1024 × 1024 resolution produces 256 tokens. Up to 6 local crops at 768 × 768 resolution add 144 tokens each. As a result, the visual token count ranges from 256 to 1120 per page. This upper bound is slightly smaller than the 1156-token budget used in the original DeepSeek-OCR’s Gundam mode, and it is comparable to the budget used by Gemini-3 Pro on OmniDocBench.

DeepEncoder-V2: a language model as vision encoder

DeepEncoder-V2 is built by instantiating a Qwen2-0.5B-style transformer as the vision encoder. The input sequence is constructed as follows. First, all visual tokens from the tokenizer form the prefix. Then a set of learnable query tokens, called causal flow tokens, is appended as the suffix. The number of causal flow tokens equals the number of visual tokens.

The attention pattern is asymmetric. Visual tokens use bidirectional attention and see all other visual tokens. Causal flow tokens use causal attention and can see all visual tokens and only previous causal flow tokens. Only the outputs at causal flow positions are passed to the decoder. In effect, the encoder learns a mapping from a 2D grid of visual tokens into a 1D causal sequence of flow tokens that encode a proposed reading order and local context.

This design decomposes the problem into 2 stages. DeepEncoder-V2 performs causal reasoning over visual structure and reading order. DeepSeek-3B-A500M then performs causal decoding over text conditioned on this reordered visual input.

https://github.com/deepseek-ai/DeepSeek-OCR-2

Training pipeline

The training data pipeline follows DeepSeek-OCR and focuses on OCR-intensive content. OCR data accounts for 80 percent of the mixture.
The research team rebalances the sampling across text, formulas, and tables using a 3:1:1 ratio so that the model sees enough structure-heavy examples. Training runs in 3 stages.

In stage 1, encoder pretraining couples DeepEncoder-V2 to a small decoder and uses a standard language modeling objective. The model is trained at 768×768 and 1024×1024 resolutions with multi-scale sampling. The vision tokenizer is initialized from the original DeepEncoder. The LLM-style encoder is initialized from the Qwen2-0.5B base. The optimizer is AdamW with cosine learning rate decay from 1e-4 to 1e-6 over 40k iterations. Training uses about 160 A100 GPUs, a sequence length of 8k with packing, and a large mixture of document image-text samples.

In stage 2, query enhancement attaches DeepEncoder-V2 to DeepSeek-3B-A500M and introduces multi-crop views. The tokenizer is frozen. The encoder and decoder are jointly trained with 4-stage pipeline parallelism and 40 data-parallel replicas. The global batch size is 1280 and the schedule runs for 15k iterations with learning rate decay from 5e-5 to 1e-6.

In stage 3, all encoder parameters are frozen. Only the DeepSeek decoder is trained to better adapt to the reordered visual tokens. This stage uses the same batch size but a shorter schedule and a lower learning rate that decays from 1e-6 to 5e-8 over 20k iterations. Freezing the encoder more than doubles training throughput at this stage.

Benchmark results on OmniDocBench

The main evaluation uses OmniDocBench-v1.5. This benchmark contains 1355 pages in 9 document categories in Chinese and English, including books, academic papers, forms, presentations, and newspapers. Each page is annotated with layout elements such as text spans, equations, tables, and figures.

DeepSeek-OCR 2 achieves an overall OmniDocBench score of 91.09 with a visual token maximum of 1120. The original DeepSeek-OCR baseline scores 87.36 with a token maximum of 1156. DeepSeek-OCR 2 therefore gains 3.73 points while using a slightly smaller token budget.

Reading order (R-order) edit distance, which measures the difference between predicted and ground-truth reading sequences, drops from 0.085 to 0.057. Text edit distance falls from 0.073 to 0.048. Formula and table edit distances also decrease, which indicates better parsing of math and structured regions.

Viewed as a document parser, DeepSeek-OCR-2 achieves an overall element-level edit distance of 0.100. The original DeepSeek-OCR reaches 0.129 and Gemini-3 Pro reaches 0.115 under similar visual token constraints. This suggests that the causal visual flow encoder improves structural fidelity without expanding the token budget.

Category-wise, DeepSeek-OCR-2 improves text edit distance for most document types, such as academic papers and books. Performance is weaker on very dense newspapers, where text edit distance remains above 0.13. The research team links this to limited training data for newspapers and heavy compression at extreme text density. Reading order metrics, however, improve across all categories.

https://github.com/deepseek-ai/DeepSeek-OCR-2
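To make the asymmetric attention pattern of DeepEncoder-V2 concrete, the sketch below builds the boolean attention mask implied by the description above: visual tokens attend bidirectionally among themselves, while causal flow tokens attend to all visual tokens plus only the preceding flow tokens (self-attention assumed allowed). This is an illustration of the described pattern, not DeepSeek's implementation.

import torch

def deepencoder_v2_attention_mask(num_visual: int, num_flow: int) -> torch.Tensor:
    n = num_visual + num_flow
    mask = torch.zeros(n, n, dtype=torch.bool)        # True = attention allowed
    # Visual tokens (prefix): full bidirectional attention over visual tokens only.
    mask[:num_visual, :num_visual] = True
    # Causal flow tokens (suffix): see every visual token...
    mask[num_visual:, :num_visual] = True
    # ...and only the preceding flow tokens (causal, self included by assumption).
    mask[num_visual:, num_visual:] = torch.tril(torch.ones(num_flow, num_flow, dtype=torch.bool))
    return mask

m = deepencoder_v2_attention_mask(num_visual=4, num_flow=4)
print(m.int())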


How the sometimes-weird world of lifespan extension is gaining influence

For the last couple of years, I’ve been following the progress of a group of individuals who believe death is humanity’s “core problem.” Put simply, they say death is wrong—for everyone. They’ve even said it’s morally wrong. They established what they consider a new philosophy, and they called it Vitalism.

Vitalism is more than a philosophy, though—it’s a movement for hardcore longevity enthusiasts who want to make real progress in finding treatments that slow or reverse aging. Not just through scientific advances, but by persuading influential people to support their movement, and by changing laws and policies to open up access to experimental drugs. And they’re starting to make progress.

Vitalism was founded by Adam Gries and Nathan Cheng—two men who united over their shared desire to find ways to extend human lifespan. I first saw Cheng speak back in 2023, at Zuzalu, a pop-up city in Montenegro for people who were interested in life extension and some other technologies. (It was an interesting experience—you can read more about it here.) Zuzalu was where Gries and Cheng officially launched Vitalism.

But I’ve been closely following the longevity scene since 2022. That journey took me to Switzerland, Honduras, and a compound in Berkeley, California, where like-minded longevity enthusiasts shared their dreams of life extension. It also took me to Washington, DC, where, last year, supporters of lifespan extension presented politicians including Mehmet Oz, who currently leads the Centers for Medicare & Medicaid Services, with their case for changes to laws and policies.

The journey has been fascinating, and at times weird and even surreal. I’ve heard biohacking stories that ended with smoking legs. I’ve been told about a multi-partner relationship that might be made possible through the cryopreservation—and subsequent reanimation—of a man and the multiple wives he’s had throughout his life. I’ve had people tell me to my face that they consider themselves eugenicists, and that they believe that parents should select IVF embryos for their propensity for a long life.

I’ve seen people draw blood during dinner in an upscale hotel restaurant to test their biological age. I’ve heard wild plans to preserve human consciousness and resurrect it in machines. Others have told me their plans to inject men’s penises with multiple doses of an experimental gene therapy in order to treat erectile dysfunction and ultimately achieve “radical longevity.” I’ve been shouted at and threatened with legal action. I’ve received barefoot hugs. One interviewee told me I needed Botox. It’s been a ride.

My reporting has also made me realize that the current interest in longevity reaches beyond social media influencers and wellness centers. Longevity clinics are growing in number, and there’s been a glut of documentaries about living longer or even forever. At the same time, powerful people who influence state laws, giant federal funding budgets, and even national health policy are prioritizing the search for treatments that slow or reverse aging.

The longevity community was thrilled when longtime supporter Jim O’Neill was made deputy secretary of health and human services last year. Other members of Trump’s administration, including Oz, have spoken about longevity too. “It seems that now there is the most pro-longevity administration in American history,” Gries told me.

I recently spoke to Alicia Jackson, the new director of ARPA-H.
The agency, established in 2022 under Joe Biden’s presidency, funds “breakthrough” biomedical research. And it appears to have a new focus on longevity. Jackson previously founded and led Evernow, a company focused on “health and longevity for every woman.”

“There’s a lot of interesting technologies, but they all kind of come back to the same thing: Could we extend life years?” she told me over a Zoom call a few weeks ago. She added that her agency had “incredible support” from “the very top of HHS.” I asked if she was referring to Jim O’Neill. “Yeah,” she said. She wouldn’t go into the specifics.

Gries is right: There is a lot of support for advances in longevity treatments, and some of it is coming from influential people in positions of power. Perhaps the field really is poised for a breakthrough. And that’s what makes this field so fascinating to cover. Despite the occasional weirdness.

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.
