YouZum

Committee

AI, Committee, News, Uncategorized

OCRmyPDF Tutorial: Convert Scanned Documents into Searchable PDF/A Files with Sidecar Text Extraction and Batch Processing

In this tutorial, we build an advanced, self-contained OCRmyPDF workflow. We start by installing the required system and Python dependencies, then create a synthetic image-only PDF that stands in for a scan, so we can test OCR without relying on external files. From there, we use OCRmyPDF's public API to convert scanned documents into searchable PDFs, generate PDF/A outputs, extract sidecar text, validate the results, compare file sizes, tune Tesseract settings, clean noisy scans, handle already-OCRed files, process images with DPI hints, run OCR in memory, and batch-process multiple PDFs. Through this workflow, we see how OCRmyPDF can serve as a practical document digitization pipeline for archival, search, extraction, and automated processing tasks.

Installing OCRmyPDF System Dependencies

    import io
    import os
    import re
    import sys
    import time
    import shutil
    import logging
    import textwrap
    import subprocess
    from pathlib import Path

    # Set to False to skip the optional jbig2enc source build.
    INSTALL_JBIG2 = True


    def sh(cmd: str, check: bool = True) -> int:
        """Run a shell command, echo it, and show the tail of its output."""
        print(f"  $ {cmd}")
        r = subprocess.run(cmd, shell=True, text=True,
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        if r.stdout and r.stdout.strip():
            # Only the last 12 lines are shown to keep notebook output short.
            for ln in r.stdout.strip().splitlines()[-12:]:
                print("    " + ln)
        if check and r.returncode != 0:
            raise RuntimeError(f"Command failed ({r.returncode}): {cmd}")
        return r.returncode


    def install_dependencies() -> None:
        """Install OCRmyPDF's system + Python dependencies for Colab/Ubuntu."""
        apt_pkgs = (
            "tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd "
            "tesseract-ocr-deu tesseract-ocr-fra "
            "ghostscript unpaper pngquant poppler-utils qpdf"
        )
        sh("apt-get update -qq", check=False)
        sh(f"DEBIAN_FRONTEND=noninteractive apt-get install -y -qq {apt_pkgs}")
        sh(f'"{sys.executable}" -m pip install -q --upgrade ocrmypdf img2pdf "pillow<12"')

        # jbig2enc is optional: build it from source only if it is not on PATH.
        if INSTALL_JBIG2 and shutil.which("jbig2") is None:
            try:
                build_pkgs = ("autoconf automake libtool pkg-config "
                              "libleptonica-dev zlib1g-dev build-essential git")
                sh(f"DEBIAN_FRONTEND=noninteractive apt-get install -y -qq {build_pkgs}")
                sh("rm -rf /tmp/jbig2enc && "
                   "git clone -q https://github.com/agl/jbig2enc.git /tmp/jbig2enc")
                sh("cd /tmp/jbig2enc && ./autogen.sh >/dev/null 2>&1 && "
                   "./configure >/dev/null 2>&1 && make -j2 >/dev/null 2>&1 && "
                   "make install >/dev/null 2>&1 && ldconfig")
                print("  jbig2enc:", "installed" if shutil.which("jbig2")
                      else "built, but binary not on PATH")
            except Exception as e:
                print("  jbig2enc build skipped (optional):", e)


    def ensure_installed() -> None:
        """Install only when a required tool or Python package is missing."""
        have_tools = bool(shutil.which("tesseract") and shutil.which("gs"))
        try:
            import ocrmypdf
            import img2pdf
            from PIL import Image
            have_py = True
        except Exception:
            have_py = False
        if have_tools and have_py:
            print("Dependencies already present; skipping installation.")
        else:
            print("Installing dependencies (first run can take a few minutes)…")
            install_dependencies()


    ensure_installed()

We set up the complete OCRmyPDF environment for Google Colab by importing the required standard libraries and defining the installation workflow. We install system tools such as Tesseract, Ghostscript, unpaper, pngquant, poppler, and qpdf, along with Python packages like OCRmyPDF, img2pdf, and Pillow. We also optionally build jbig2enc so that advanced PDF optimization can produce smaller outputs for scanned documents.
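Before running the installer, it can be useful to see which external tools are already on the PATH. The following is a minimal sketch using only the standard library. The helper names `report_tools` and `missing_tools` are hypothetical, and the executable names in `TOOLS` are assumptions about what the apt packages above provide (for example, `gs` for Ghostscript).

```python
import shutil

# Executable names assumed to correspond to the packages installed above.
# "jbig2" is the optional encoder that the tutorial builds from source.
TOOLS = ("tesseract", "gs", "unpaper", "pngquant", "qpdf", "jbig2")


def report_tools(names=TOOLS, which=shutil.which):
    """Return {tool: resolved path or None} for each executable name."""
    return {name: which(name) for name in names}


def missing_tools(report):
    """List the tools that were not found on PATH, sorted by name."""
    return sorted(name for name, path in report.items() if path is None)


if __name__ == "__main__":
    status = report_tools()
    for name, path in status.items():
        print(f"{name:<10} {path or 'MISSING'}")
    print("missing:", missing_tools(status))
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is what you want.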
Loading OCRmyPDF and Building Synthetic Scans

    def _purge(*prefixes):
        for name in [m for m in list(sys.modules)
                     if any(m == p or m.startswith(p + ".") for p in prefixes)]:
            del sys.modules[name]


    def _load_ocrmypdf():
        _purge("PIL", "ocrmypdf")
        import ocrmypdf
        return ocrmypdf


    try:
        ocrmypdf = _load_ocrmypdf()
    except ImportError as e:
        if "_Ink" in str(e) or "PIL" in str(e):
            print("Repairing an incompatible Pillow (reinstalling pillow<12)…")
            sh(f'"{sys.executable}" -m pip install -q --force-reinstall "pillow<12"')
            try:
                ocrmypdf = _load_ocrmypdf()
                print("Pillow repaired; continuing without a restart.")
            except Exception:
                raise RuntimeError(
                    "Pillow is still incompatible in this session. Use the Colab menu: "
                    "Runtime > Restart session, then run this cell again."
                )
        else:
            raise

    from ocrmypdf.exceptions import (
        ExitCode, PriorOcrFoundError, EncryptedPdfError, MissingDependencyError,
        TaggedPDFError, DigitalSignatureError, DpiError, InputFileError,
        UnsupportedImageFormatError,
    )
    from ocrmypdf.helpers import check_pdf
    from ocrmypdf.pdfa import file_claims_pdfa
    import img2pdf
    from PIL import Image, ImageDraw, ImageFont, ImageFilter

    logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
    logging.getLogger("ocrmypdf").setLevel(logging.WARNING)
    logging.getLogger("pdfminer").setLevel(logging.ERROR)
    logging.getLogger("PIL").setLevel(logging.WARNING)

    SAMPLE_TEXT_PAGES = [
        "Optical Character Recognition, commonly abbreviated as OCR, is the "
        "process of converting images of typed or printed text into machine "
        "encoded text. This page was generated as a synthetic scan so that the "
        "OCRmyPDF pipeline has something realistic to recognize and search.",
        "On 14 March 2026 the archive contained 1,482 pages across 37 folders. "
        "Roughly 92 percent of those pages were scanned at 200 to 300 dots per "
        "inch. The remaining 8 percent were skewed and required deskewing before "
        "any reliable recognition was possible.",
        "After OCRmyPDF finishes, the output is a searchable PDF/A file. You can "
        "select text, copy it, and run full text search across thousands of "
        "documents. The original image resolution is preserved while a hidden "
        "text layer is placed accurately underneath the page image.",
    ]


    def _find_font():
        for cand in (
            "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
            "/usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf",
        ):
            if os.path.exists(cand):
                return cand
        return None


    _FONT_PATH = _find_font()
    FONT = ImageFont.truetype(_FONT_PATH, 40) if _FONT_PATH else ImageFont.load_default()


    def _add_speckle(img, n=6000, dark=60):
        """Sprinkle light dark specks to imitate scanner noise (motivates --clean)."""
        import random
        px = img.load()
        w, h = img.size
        for _ in range(n):
            px[random.randint(0, w - 1), random.randint(0, h - 1)] = random.randint(0, dark)
        return img


    def render_page(text, skew=False):
        """Render one A4 page (1654×2339 px ≈ 200 DPI) of dark text on white."""
        W, H = 1654, 2339
        img = Image.new("L", (W, H), 255)
        draw = ImageDraw.Draw(img)
        draw.multiline_text((150, 180), textwrap.fill(text, width=58),
                            fill=25, font=FONT, spacing=18)
        if skew:
            img = img.rotate(6, resample=Image.BICUBIC, expand=False, fillcolor=255)
        img = img.filter(ImageFilter.GaussianBlur(0.6))
        img = _add_speckle(img)
        return img


    def build_scanned_pdf(pdf_path: Path, pages_text, skew_index=1):
        """Render pages to PNGs and wrap them losslessly into an image-only PDF."""
        pngs = []
        for i, text in enumerate(pages_text):
            img = render_page(text, skew=(i == skew_index))
            p = pdf_path.parent / f"_pg_{pdf_path.stem}_{i}.png"
            img.save(p, format="PNG", dpi=(200, 200))
            pngs.append(str(p))
        with open(pdf_path, "wb") as f:
            f.write(img2pdf.convert(pngs))
        for p in pngs:
            os.remove(p)
        return pdf_path


    def do_ocr(input_file, output_file, **kw):
        """Wrapper around ocrmypdf.ocr() that disables the progress bar and times it."""
        kw.setdefault("progress_bar", False)
        t0 = time.perf_counter()
        rc = ocrmypdf.ocr(input_file, output_file, **kw)
        return rc, time.perf_counter() - t0


    def tokens(s: str):
        return re.findall(r"[a-z0-9]+", s.lower())


    def kb(path) -> str:
        return f"{Path(path).stat().st_size / 1024:,.1f} KB"


    def banner(title: str):
        line = "─" * 74
        print(f"\n{line}\n  {title}\n{line}")

We safely load OCRmyPDF and repair Pillow compatibility issues if they appear in the Colab runtime. We import OCRmyPDF exceptions, PDF validation helpers, img2pdf, and Pillow utilities used throughout the tutorial. We also define the sample document text and helper functions for rendering synthetic scanned pages,



Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference

Liquid AI has shipped LFM2.5-230M, the company's smallest model to date. The release targets a specific job: running agentic tasks on phones, robots, and automation devices. Both the base and instruction-tuned checkpoints are open-weight on Hugging Face.

The pitch is narrow on purpose. This is not a general reasoning model. It is built for data extraction and tool use on edge hardware.

TL;DR

- Liquid AI's LFM2.5-230M is its smallest model yet: 230M params, open-weight, built on LFM2.
- Runs on-device at 213 tok/s on a Galaxy S25 Ultra and 42 tok/s on a Raspberry Pi 5.
- Beats larger models (Qwen3.5-0.8B, Gemma 3 1B) on instruction following and data extraction.
- Tuned for tool use and extraction; not for math, code generation, or creative writing.
- Day-one support across llama.cpp, MLX, vLLM, SGLang, and ONNX, with a 293–375 MB footprint.

What is LFM2.5-230M?

LFM2.5-230M is a 230-million-parameter, text-only model built on the LFM2 architecture. The model has 14 layers in total: eight are double-gated LIV convolution blocks, and the remaining six are grouped-query attention (GQA) blocks. The hybrid layout targets fast CPU inference.

The context length is 32,768 tokens, the vocabulary size is 65,536, and the knowledge cutoff is mid-2024. It supports ten languages, including English, Chinese, Arabic, and Japanese.

Liquid AI ships two checkpoints. LFM2.5-230M-Base is the pre-trained model for fine-tuning. LFM2.5-230M is the general-purpose instruction-tuned version. The license is lfm1.0.

Training and Post-Training

The model was pre-trained on 19 trillion tokens. That total includes a 32K context extension phase. The post-training recipe then runs in three stages:

1. Supervised fine-tuning with distillation from the larger LFM2.5-350M.
2. Direct preference optimization (DPO).
3. Multi-domain reinforcement learning.

This preserves flexibility for downstream specialization. The distillation step is what keeps a 230M model competitive with larger checkpoints.
It inherits behavior from the bigger LFM2.5-350M on targeted tasks.

Benchmark

Liquid AI evaluated LFM2.5-230M across ten benchmarks. They span knowledge, instruction following, data extraction, and tool use. The instruction-following results support the claim that it beats larger models. On IFEval, LFM2.5-230M scores 71.71. That beats Qwen3.5-0.8B (59.94) and Gemma 3 1B IT (63.49). On IFBench it scores 38.40, ahead of both. On CaseReportBench, a clinical data-extraction test, it scores 22.51.

| Model | Params | IFEval | IFBench | CaseReportBench | BFCLv4 | MMLU-Pro |
|---|---|---|---|---|---|---|
| LFM2.5-230M | 230M | 71.71 | 38.40 | 22.51 | 21.03 | 20.25 |
| LFM2.5-350M | 350M | 76.96 | 40.69 | 32.45 | 21.86 | 20.01 |
| Granite 4.0-H-350M | 350M | 61.27 | 17.22 | 12.44 | 13.28 | 13.14 |
| Qwen3.5-0.8B (Instruct) | 800M | 59.94 | 22.87 | 13.83 | 18.70 | 37.42 |
| Gemma 3 1B IT | 1B | 63.49 | 20.33 | 2.28 | 7.17 | 14.04 |

On instruction following and data extraction, LFM2.5-230M leads every model in the table except its larger sibling, LFM2.5-350M. It trails on broad knowledge: MMLU-Pro is 20.25, behind Qwen3.5-0.8B's 37.42. It is also weak on some agentic tool use. On τ²-Bench Telecom it scores just 5.26.

Liquid AI is direct about the limits. It does not recommend the model for reasoning-heavy workloads, meaning advanced math, code generation, and creative writing.

Use Cases With Examples

The model fits two jobs well. The first is large-scale data extraction pipelines. Picture a pipeline parsing 100,000 clinical reports into structured fields. A 4-bit build with a 293–375 MB memory footprint runs that on commodity CPUs. You extract locally, with no per-token API bill.

The second job is lightweight on-device agentic workloads. Think of a home automation hub that turns speech into tool calls, or a phone assistant that routes a request to the right function.

As an early signal, Liquid AI deployed the model on a Unitree G1 humanoid robot. It ran entirely on the robot's onboard NVIDIA Jetson Orin. There the model acted as a skill-selection layer. It turned one natural-language instruction into a sequence of tool calls.
Those calls invoked low-level skills from NVIDIA's SONIC framework.

Tool Use: How It Works

LFM2.5 supports function calling in four steps:

1. You define tools as JSON in the system prompt.
2. The model writes a Pythonic function call between special tokens.
3. You execute the call and return the result.
4. The model then writes a plain-text answer.

By default the call is a Python list. It sits between the <|tool_call_start|> and <|tool_call_end|> tokens. Here is the documented pattern, with the tool JSON abbreviated:

    <|im_start|>system
    List of tools: [{"name": "get_candidate_status", "parameters": {"candidate_id": {"type": "string"}}}]<|im_end|>
    <|im_start|>user
    What is the current status of candidate ID 12345?<|im_end|>
    <|im_start|>assistant
    <|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>

You can also force JSON-formatted calls through the system prompt.

Running It: A Minimal Example

The model works with Transformers 5.0.0 and up. The recommended generation settings are temperature 0.1, top_k 50, and repetition_penalty 1.05. Note the do_sample=True flag, which is required for those sampling settings to apply.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "LiquidAI/LFM2.5-230M"
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        dtype="bfloat16",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": "What is C. elegans?"}],
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.1,
        top_k=50,
        repetition_penalty=1.05,
        max_new_tokens=512,
    )
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Liquid AI also publishes fine-tuning recipes.
They cover SFT, DPO, and GRPO with LoRA, via Unsloth and TRL. Each ships as a Colab notebook.

The post Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference appeared first on MarkTechPost.
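The tool-use section of this post describes Pythonic calls wrapped in <|tool_call_start|> and <|tool_call_end|> tokens, which the caller must parse before executing anything. Here is a minimal sketch of that parsing step using the standard-library `ast` module. The function name `parse_tool_calls` is hypothetical, and the sketch assumes keyword-only arguments with literal values, as in the documented example; it is not Liquid AI's reference parser.

```python
import ast
import re

# Matches the text between the two special tokens shown in the article.
TOOL_CALL_RE = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_tool_calls(text):
    """Extract (function_name, kwargs) pairs from Pythonic tool calls."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        tree = ast.parse(block.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of calls")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                raise ValueError("expected simple function calls")
            if node.args or any(kw.arg is None for kw in node.keywords):
                raise ValueError("this sketch supports keyword arguments only")
            # literal_eval accepts literals only, so no model-written code runs.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


if __name__ == "__main__":
    output = (
        '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
        "<|tool_call_end|>Checking the current status of candidate ID 12345."
    )
    print(parse_tool_calls(output))
```

Parsing to an AST instead of calling `eval` means the model's output is treated as data: the caller looks up the function name in its own registry and decides whether to run it.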



OpenAI Previews GPT-5.6 With Sol, Terra, and Luna: Tiered Models, New Reasoning Modes, Limited Access

OpenAI has begun a limited preview of GPT-5.6, its next-generation model series. The lineup splits into three named tiers: Sol, Terra, and Luna. Sol is the flagship. Terra targets everyday production work. Luna is the fast, low-cost option.

OpenAI is starting with a small group of trusted partners through the API and Codex. According to OpenAI's post, the company shared the models and plans with the U.S. government first. Broader access in ChatGPT, Codex, and the API is planned in the coming weeks.

The change is mostly structural. GPT-5.6 introduces tiered models, two new reasoning modes, and a heavier safety stack.

What is GPT-5.6?

GPT-5.6 is a family, not a single model. OpenAI also changed how it names releases. The number now marks the generation. The names mark durable capability tiers. Each tier can advance on its own schedule. That gives developers a clearer choice across intelligence, speed, and cost.

OpenAI calls Sol its strongest model yet. It cites gains in coding, biology, and cybersecurity. Terra matches GPT-5.5 performance while costing roughly half as much. Luna brings strong capability at OpenAI's lowest price.

New Reasoning Modes: max and ultra

GPT-5.6 adds two reasoning controls. The first is a new max reasoning effort. It gives Sol the most time to reason deeply. The second is ultra mode. Instead of one model working alone, ultra uses subagents that split complex work to accelerate it.

Think of it this way. The max setting deepens a single chain of reasoning. The ultra mode coordinates several workers on one task. Both trade latency and cost for accuracy on long-horizon problems.

Benchmark

OpenAI shared a preview set of evaluations. Sol sets a new state of the art on Terminal-Bench 2.1. The benchmark tests command-line workflows that need planning, iteration, and tool coordination.
| Model / mode | Terminal-Bench 2.1 |
|---|---|
| GPT-5.6 Sol (ultra) | 91.91% |
| GPT-5.6 Sol (max) | 88.76% |
| Claude Mythos 5 | 88% |
| GPT-5.5 | 83.4% |

Source: VentureBeat

On Agent's Last Exam, Sol was the only model past the halfway mark. It reached 50.9% in 'code mode'. On GeneBench v1, Sol beat GPT-5.5 on long-horizon genomics analysis while using fewer tokens. On ExploitBench, OpenAI reports Sol was competitive with Mythos Preview using about one-third of the output tokens.

Pricing and Access

GPT-5.6 is priced per one million tokens. Caching behavior also changes.

| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| Sol | $5 | $30 | Long-horizon coding, security, agents |
| Terra | $2.50 | $15 | High-volume production work |
| Luna | $1 | $6 | Fast, routine, low-cost tasks |

Sol's $5/$30 matches GPT-5.5's pricing. Terra costs about half as much as GPT-5.5. Prompt caching now supports explicit cache breakpoints and a 30-minute minimum cache life. Cache writes cost 1.25x the uncached input rate. Cache reads keep the 90% discount.

OpenAI also plans to run Sol on Cerebras hardware. It targets up to 750 tokens per second in July.

Use Cases With Examples

- Long-horizon coding agents: Sol's Terminal-Bench gains suit multi-step CLI automation. Example: an agent that plans, edits files, runs tests, then iterates.
- High-volume production: Terra fits chat features and document processing at scale. Example: summarizing thousands of support tickets each day at lower cost.
- Latency-sensitive apps: Luna suits autocomplete, routing, and simple extraction. Example: classifying inbound emails before a heavier model handles edge cases.
- Defensive security work: Sol targets vulnerability research and patching. Example: reviewing a codebase to find and fix a memory bug.
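The pricing and caching rules reported above can be turned into a small cost estimate. The following is a minimal sketch, not an official calculator: the `Usage` class and `estimate_cost` function are hypothetical names, and it assumes the 90% cache-read discount is applied to the tier's uncached input rate.

```python
from dataclasses import dataclass

# USD per one million tokens (input, output), as reported in the article.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost 1.25x the uncached input rate
CACHE_READ_MULTIPLIER = 0.10   # assumption: 90% discount off the input rate


@dataclass
class Usage:
    """Token counts for one workload, split by how input tokens are billed."""
    uncached_input: int = 0
    cache_write: int = 0
    cache_read: int = 0
    output: int = 0


def estimate_cost(tier, usage):
    """Return the estimated USD cost of `usage` on the given tier."""
    input_rate, output_rate = PRICES[tier]
    # Express all input tokens in "uncached-input-equivalent" tokens.
    weighted_input = (
        usage.uncached_input
        + usage.cache_write * CACHE_WRITE_MULTIPLIER
        + usage.cache_read * CACHE_READ_MULTIPLIER
    )
    return (weighted_input * input_rate + usage.output * output_rate) / 1_000_000


if __name__ == "__main__":
    workload = Usage(uncached_input=200_000, cache_write=50_000,
                     cache_read=750_000, output=100_000)
    for tier in PRICES:
        print(f"{tier:<6} ${estimate_cost(tier, workload):.4f}")
```

Keeping the multipliers as named constants makes it easy to adjust the sketch if the final published caching terms differ from the preview figures.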
Strengths and Open Questions

Strengths

- Clear tiering across cost, speed, and intelligence
- New ultra subagent mode for complex, parallel work
- Reported state-of-the-art on Terminal-Bench 2.1
- Token-efficiency gains on biology and cyber benchmarks
- A documented, layered safety stack

Open questions

- Access is limited to about 20 partners at preview
- Public benchmark detail is partial until general availability
- Safeguards may block some legitimate dual-use security work
- Pricing sits above some open-weight competitors like GLM-5.2
- Real-world latency for max and ultra is not yet public

The post OpenAI Previews GPT-5.6 With Sol, Terra, and Luna: Tiered Models, New Reasoning Modes, Limited Access appeared first on MarkTechPost.



DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1

DeepSeek released DSpark, a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark reuse the existing V4 weights, with a draft module attached. The DeepSeek research team also open-sourced DeepSpec, an MIT-licensed codebase for training and evaluating speculative decoding drafters. The work targets one problem: faster large-model inference in busy production serving.

TL;DR

- DSpark pairs a parallel draft backbone with a tiny sequential head to cut suffix decay.
- A confidence head and load-aware scheduler verify more tokens when GPUs are idle, fewer when busy.
- Offline, accepted length rises 26–31% over Eagle3 and 16–18% over DFlash.
- In production on DeepSeek-V4, per-user generation runs 60–85% faster than the MTP-1 baseline.
- Output stays lossless, and the checkpoints plus DeepSpec training code are open-source.

What is DSpark?

Speculative decoding splits generation into two roles. A small draft model proposes a block of tokens. The full target model then verifies that block in one forward pass. Rejection sampling accepts the longest valid prefix and appends one bonus token. Because the rule preserves the target distribution exactly, there is no quality loss.

DSpark keeps this guarantee. It changes how tokens are drafted and how many get verified.

The Latency Math It Optimizes

Per-token latency follows one equation from the paper: L = (T_draft + T_verify) / τ. Here τ is the number of tokens accepted per cycle. Speedup comes from three levers only. You can draft faster, lowering T_draft. You can draft better, raising τ. Or you can verify smarter, reducing wasted T_verify. DSpark pulls all three levers at once.

How It Works: Semi-Autoregressive Generation

Earlier drafters force a trade-off. Autoregressive drafters like Eagle3 condition each token on prior ones. That gives strong acceptance, but drafting cost grows with block size.
Parallel drafters like DFlash produce the whole block in one pass. Drafting stays cheap, but each position ignores its neighbors. The result is 'multi-modal collision' and rapid acceptance decay along the suffix.

DSpark splits drafting into two stages. A heavy parallel backbone, DFlash in their setup, produces base logits for every position. Then a lightweight sequential head adds a prefix-dependent bias before sampling each token.

The default sequential head is a Markov head. It only looks at the immediately preceding token. A low-rank factorization (rank 256) keeps it cheap, even with large vocabularies. Once position one samples 'of', the head boosts 'course' and suppresses 'problem'. An optional RNN head tracks the full block prefix. It adds only marginal gains, so the Markov head ships as the default.

The payoff shows up position by position. DSpark inherits the parallel backbone's high first-token accuracy. The sequential head then holds acceptance steady deep into the block.

Training freezes the target model and reuses its embedding and output head. A total-variation loss is the key term. Minimizing that distance directly maximizes the draft's acceptance rate.

How It Works: Confidence-Scheduled Verification

More draft tokens do not always mean more speed. Verifying tokens that will be rejected wastes batch capacity under heavy load. DSpark adds two parts to fix this.

A confidence head outputs a score for each draft position. The score estimates the chance that the token survives verification, given accepted predecessors. It is supervised by the analytical per-step acceptance rate.

Raw neural confidence is usually overconfident. So the research team applies Sequential Temperature Scaling, a post-hoc calibration step. It cuts expected calibration error from 3–8% down to about 1%.

A hardware-aware prefix scheduler then sets the verification length per request. It uses a profiled throughput curve, SPS(B), measured once at startup.
When GPUs are idle, it verifies more tokens. When GPUs are busy, it verifies fewer. The scheduler uses an early-stopping rule to stay lossless. The paper's appendix gives a counterexample showing why a naive global search would leak information.

Metrics

Offline tests cover math, code, and daily chat. Targets include Qwen3-4B, 8B, 14B, and Gemma4-12B. DSpark beats both baselines on accepted length across every domain.

Against Eagle3, macro-average accepted length rises 30.9%, 26.7%, and 30.0% on the three Qwen3 sizes. Against DFlash, gains are 16.3%, 18.4%, and 18.3%. A 2-layer DSpark even beats a 5-layer DFlash.

The sequential head adds little cost. Scaling draft length from 4 to 16 adds only 0.2–1.3% per-round latency. In return, accepted length improves by up to 30%.

Production results come from DeepSeek-V4-Flash and V4-Pro under live traffic. The baseline is MTP-1, the prior single-token setup. At matched throughput, per-user speed rises 60–85% on Flash and 57–78% on Pro. The shipped configuration is DSpark-5, a five-token draft block with the Markov head.

| Drafter | Drafting style | Block cost | Suffix acceptance | Verification length |
|---|---|---|---|---|
| Eagle3 | Autoregressive | Grows with block size | High, stable | Fixed |
| DFlash | Parallel | Near-constant | Decays fast | Fixed (full block) |
| MTP-1 | Single-token (MTP) | Low | n/a | Static 2 tokens |
| DSpark | Parallel + sequential head | Near-constant | High, stable | Dynamic, load-aware |

Use Cases With Examples

Structured workloads gain the most from longer verification. In code generation, acceptance is naturally high. The scheduler can verify long prefixes with little waste, so coding agents stream output faster.

Open-ended chat behaves differently. A confidence-threshold sweep raised chat acceptance from 45.7% to 95.7%. The confidence head flags uncertain suffix tokens so they can be pruned.

Math reasoning sits between the two. Its acceptance rose from 76.9% to 92.5% in the same sweep. Long step-by-step traces benefit from steady deep-block acceptance.
High-concurrency serving is the headline case. At moderate load, the scheduler runs roughly 4–6 verified tokens per request. As concurrency rises, it trims that budget to protect throughput.

Try It

DeepSpec runs in three stages: data preparation, training, then evaluation. A config selects the algorithm and target model. Evaluation benchmarks a trained draft checkpoint across nine datasets.

    # Install dependencies
    python -m pip install -r requirements.txt

    # Train a DSpark draft against a Qwen3-4B target.
    # The algorithm and target are chosen by the config, e.g.
    # config/dspark/dspark_qwen3_4b.py
    bash scripts/train/train.sh

    # Evaluate the trained draft across the 9 benchmark datasets.
    # Set in
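The latency equation quoted earlier in this post, L = (T_draft + T_verify) / τ, is easy to work through numerically. The sketch below is an illustration only: the function names are hypothetical, the timings and confidence values are made up rather than taken from the paper, and `expected_tau` assumes each confidence is the probability that a draft token is accepted given that all earlier tokens in the block were accepted, plus the one bonus token per cycle that the post describes.

```python
def per_token_latency(t_draft_ms, t_verify_ms, tau):
    """L = (T_draft + T_verify) / tau, in milliseconds per generated token."""
    if tau <= 0:
        raise ValueError("tau (tokens accepted per cycle) must be positive")
    return (t_draft_ms + t_verify_ms) / tau


def expected_tau(confidences, bonus_tokens=1):
    """Expected tokens per cycle from per-position conditional acceptance.

    Position k survives only if every earlier position survived, so its
    survival probability is the running product of the confidences.
    """
    total, survive = 0.0, 1.0
    for c in confidences:
        survive *= c
        total += survive
    return total + bonus_tokens


if __name__ == "__main__":
    # Hypothetical numbers, NOT measurements from the DSpark paper.
    single = per_token_latency(1.0, 30.0, expected_tau([0.8]))
    block5 = per_token_latency(2.0, 31.0, expected_tau([0.9, 0.85, 0.8, 0.75, 0.7]))
    print(f"one-token draft : {single:.2f} ms/token")
    print(f"five-token draft: {block5:.2f} ms/token")
    print(f"speedup         : {single / block5:.2f}x")
```

The running product makes the suffix-decay problem visible: a block only pays off if acceptance stays high deep into the block, because each later position inherits every earlier position's rejection risk.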



Meet container: Apple’s Open-Source Swift Tool for Running Linux Containers as Lightweight VMs on Apple Silicon

Apple recently released the container project, an open-source command-line tool written in Swift. It creates and runs Linux containers as lightweight virtual machines on a Mac. The project ships under the Apache 2.0 license and targets Apple silicon.

Containers are how you ship reproducible environments from a laptop to a datacenter. Apple now offers a native path that avoids a single always-on Linux VM.

What is Apple's container?

container is a CLI tool for building images, running containers, and moving images to and from registries. It consumes and produces OCI-compatible container images, so you can pull from Docker Hub or GitHub Container Registry and run those images. You can also push images you build to any standard registry.

container uses the open-source Containerization Swift package. That package handles low-level container, image, and process management.

The tool requires a Mac with Apple silicon. Intel Macs are not supported. Apple supports container on macOS 26, which adds virtualization and networking enhancements. You can run it on macOS 15, but with networking limitations.

How container Runs Your Containers

Most macOS container tools run one shared Linux VM that hosts every container. Apple takes a different path. container runs a separate lightweight VM for each container you create.

Apple describes three properties of this design:

- Security: Each container has the isolation of a full VM. A minimal set of core utilities and dynamic libraries reduces resource use and attack surface.
- Privacy: You mount only the data each VM needs, instead of sharing everything.
- Performance: These containers use less memory than full VMs. Boot times are comparable to containers in a shared VM.

The runtime integrates several macOS frameworks. It uses the Virtualization framework for the VMs, and the vmnet framework for networking.
It uses XPC for interprocess communication, launchd for service management, and Keychain services for registry credentials.

The control plane has a few moving parts. container system start launches container-apiserver, a launch agent. The apiserver then starts an XPC helper container-core-images for image management and the local content store. It also starts container-network-vmnet for the virtual network. For each container, it launches container-runtime-linux, the per-container management helper.

Use Cases With Examples

Local backend development. Run a service in its own isolated VM, then forward a port to your loopback address.

    container run -d --rm -p 127.0.0.1:8080:8000 node:latest npx http-server -a :: -p 8000
    curl http://127.0.0.1:8080

Reproducible CI-style builds. container build starts a builder utility container that uses BuildKit. You can size the builder VM for heavy builds.

    container builder start --cpus 8 --memory 32g
    container build --tag web-test:latest --file Dockerfile

Cross-architecture images for datacenter deployment. Build one image for both Apple silicon and x86-64 servers. The amd64 variant runs under Rosetta translation.

    container build --arch arm64 --arch amd64 --tag registry.example.com/fido/web-test:latest

Mounting datasets for analysis. Share a host folder into the container with --volume. This is useful for feeding local data into a containerized job.

    container run --volume ${HOME}/Desktop/assets:/content/assets docker.io/python:alpine ls -l /content/assets

Isolating untrusted or generated code. Each container runs in its own VM, not a shared kernel. That boundary suits running code from an agent or an unknown image with less host exposure.

Hands-On: Core Commands

Default container resources are 1 GiB of RAM and 4 CPUs. You override them per run.
    container run --rm --cpus 8 --memory 32g big

Inspect live resource usage, similar to top for processes.

    container stats --no-stream my-web-server

Read virtual machine boot and init logs when debugging startup.

    container logs --boot my-web-server

On macOS 26, you can create isolated networks. Containers on different networks cannot reach each other.

    container network create foo --subnet 192.168.100.0/24
    container run -d --name web --network foo --rm web-test

By default, containers start with a restricted set of Linux capabilities. You tune them explicitly.

    container run --cap-drop ALL --cap-add SETUID --cap-add SETGID alpine id

Version 1.0.0 also adds container machines. These are persistent Linux environments built from OCI images. Your home directory is mounted in, and the login user matches your Mac account. The filesystem survives stop and start. Any image containing /sbin/init qualifies as a container machine.

Two other 1.0.0 changes affect upgraders. System settings moved to a TOML file at ~/.config/container/config.toml. The container system property get and set subcommands were removed. The tool also added structured JSON, YAML, and TOML output for list and inspect, easing automation.
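When these commands are driven from a script, it helps to assemble the argument list programmatically instead of concatenating strings. The following is a minimal Python sketch under stated assumptions: `build_run_command` is a hypothetical helper, and it emits only `container run` flags that appear in the examples above (-d, --rm, --name, --cpus, --memory, --network, -p, --volume). It builds the argv list and does not execute anything.

```python
import shlex


def build_run_command(image, command=(), *, cpus=None, memory=None,
                      detach=False, remove=False, name=None,
                      network=None, publish=(), volumes=()):
    """Compose an argv list for `container run` from the flags shown above."""
    argv = ["container", "run"]
    if detach:
        argv.append("-d")
    if remove:
        argv.append("--rm")
    if name:
        argv += ["--name", name]
    if cpus is not None:
        argv += ["--cpus", str(cpus)]
    if memory:
        argv += ["--memory", memory]
    if network:
        argv += ["--network", network]
    for spec in publish:      # e.g. "127.0.0.1:8080:8000"
        argv += ["-p", spec]
    for spec in volumes:      # e.g. "/host/path:/container/path"
        argv += ["--volume", spec]
    argv.append(image)
    argv += list(command)     # command and arguments run inside the container
    return argv


if __name__ == "__main__":
    argv = build_run_command(
        "node:latest", ["npx", "http-server", "-a", "::", "-p", "8000"],
        detach=True, remove=True, publish=["127.0.0.1:8080:8000"],
    )
    # shlex.join quotes each argument for safe display or logging.
    print(shlex.join(argv))
```

On a Mac with container installed, the resulting list could be passed to `subprocess.run(argv, check=True)`; passing a list avoids shell-quoting problems with paths that contain spaces.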
Apple container vs Docker Desktop

Property | Apple container | Docker Desktop
Isolation model | One lightweight VM per container | Shared Linux VM, shared kernel
Idle footprint | Near-zero when nothing runs | Always-on background VM
Image format | OCI-compatible | OCI-compatible
Build engine | BuildKit via builder VM | BuildKit
License | Apache 2.0 | Commercial terms for larger orgs
Hardware | Apple silicon only | Apple silicon and Intel
Compose / GUI | Not built in | Yes
Best fit | Single-container runs, native isolation | Compose workflows, mature ecosystem

Strengths and Limitations

Strengths: Per-container VM isolation reduces shared attack surface versus a shared kernel. Idle memory cost is low, since stopped containers free their footprint. OCI compatibility means your images run elsewhere without conversion. The Apache 2.0 license carries no feature paywall.

Limitations: The macOS Virtualization framework supports only partial memory ballooning. Pages freed inside a container are not always relinquished to the host. Heavy workloads may need occasional restarts to reduce memory use. There is no built-in Docker Compose. macOS 15 users face networking restrictions, and Intel Macs are unsupported.



Temporal Validity in Retrieval Memory: Eliminating Stale-Fact Errors for AI Agents over Evolving Knowledge

arXiv:2606.26511v1. Abstract: Retrieval-augmented generation (RAG) gives agents access to accumulated knowledge, but has no model of time. When a fact changes (e.g., a function is renamed or API restructured), RAG retrieves both the stale and current value with near-identical embedding similarity. The agent then either abstains or serves the superseded fact. We show this is a structural problem: on a calibrated dataset, cosine similarity distinguishes a contradicted fact from a duplicated one with AUROC 0.59 (near chance), as contradictions are often more embedding-similar to the original than rephrased duplicates. We present MemStrata, a retrieval memory maintaining temporal validity. It stores facts like RAG, preserving static recall, but when a fact’s value is contradicted, a deterministic (subject, relation, object) supersession rule retires the stale value in a bi-temporal ledger, with no similarity threshold and no LLM call. Across six benchmarks run locally with a 7B model, MemStrata ties RAG on static knowledge and reaches 0.95-1.00 accuracy on evolving knowledge (where RAG reaches 0.20-0.47). The central result is the stale-fact-error rate: when required to answer, RAG serves superseded values 15-40% of the time; MemStrata drives this to ~0%, a failure class RAG cannot avoid. MemStrata achieves this at retrieval latency (~2.1s) versus ~16-18s for LLM-reranking baselines. We release the harness, datasets, and a marker-free evaluation protocol for memory under knowledge evolution.
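The abstract gives no implementation details, so the following is only an illustrative Python sketch of what a deterministic (subject, relation, object) supersession rule over a bi-temporal ledger could look like. The class, method, and field names are hypothetical, and the sketch assumes each (subject, relation) pair has exactly one current object; it is not MemStrata's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FactRecord:
    subject: str
    relation: str
    obj: str
    valid_from: int                 # when the fact became true in the world
    recorded_at: int                # when the ledger learned about it
    valid_to: Optional[int] = None  # None while the fact is still current


class BiTemporalLedger:
    """Toy ledger: a new object for the same (subject, relation) retires the old one."""

    def __init__(self) -> None:
        self._records: List[FactRecord] = []
        self._current: Dict[Tuple[str, str], FactRecord] = {}

    def assert_fact(self, subject: str, relation: str, obj: str,
                    valid_from: int, recorded_at: int) -> None:
        key = (subject, relation)
        previous = self._current.get(key)
        if previous is not None:
            if previous.obj == obj:
                return  # exact duplicate: nothing to supersede
            # Contradiction: close the old record instead of deleting it.
            previous.valid_to = valid_from
        record = FactRecord(subject, relation, obj, valid_from, recorded_at)
        self._records.append(record)
        self._current[key] = record

    def current(self, subject: str, relation: str) -> Optional[str]:
        record = self._current.get((subject, relation))
        return record.obj if record is not None else None

    def as_of(self, subject: str, relation: str, when: int) -> Optional[str]:
        """Return the object that was valid at time `when`, if any."""
        for record in self._records:
            if (record.subject, record.relation) != (subject, relation):
                continue
            still_valid = record.valid_to is None or when < record.valid_to
            if record.valid_from <= when and still_valid:
                return record.obj
        return None


ledger = BiTemporalLedger()
ledger.assert_fact("billing_api", "charge_function", "create_charge", valid_from=1, recorded_at=1)
ledger.assert_fact("billing_api", "charge_function", "create_payment", valid_from=5, recorded_at=6)
print(ledger.current("billing_api", "charge_function"))
print(ledger.as_of("billing_api", "charge_function", when=3))
```

The rule here is an exact key match with no similarity score, which is the property the abstract emphasizes: the renamed function replaces the old name as the current answer, while the retired value stays queryable for past points in time.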


Heat waves mess with your brain. Scientists are trying to figure out why.

It’s been hot in London this week. Really hot. A dangerous heat wave has hit Western Europe. Yesterday, the UK recorded its highest ever June temperature at 36.1 °C (about 97 °F). But as the weather app on my phone confirmed, it felt like 39 °C.

It’s frightening that we are seeing such temperatures in the UK in June. According to the Met Office, the country’s national weather and climate service, June temperatures peaked at an average 19 °C (66 °F) in England between 1991 and 2020.

Across Europe, the heat wave is likely to cause thousands of deaths. There will be other awful consequences for agriculture, infrastructure, and the health system. But this week I want to look at what the heat does to our minds and brains.

Personally, I’ve found it almost impossible to think straight. The heat is distracting and my mind is foggy. I dread to think about the conditions of people who work outdoors, in even hotter regions.

It’s not just exhaustion and confusion. The effects of heat on the brain can be deadly. And researchers are still trying to figure out why.

Studies have confirmed that as temperatures rise, people seem to get more irritable and more violent. Most of these studies are based on associations, though. It’s difficult to directly study how a heat wave might affect our thinking, says Catherine Thompson, a cognitive psychologist at Liverpool Hope University.

She has been studying the effects of extreme heat on firefighters instead. It’s easier to measure people’s cognitive skills before and after they undergo scheduled training that involves entering a burning building.

It’s early days, but the team found that firefighters found it harder to focus and control their attention immediately after heat exposure, something people in heat waves can empathize with, I’m sure.

The firefighters’ skills returned to normal after 20 minutes or so of cooling down. But they’d experienced just 15 minutes of intense heat exposure. Thompson doesn’t know what the effects of living through a days-long heat wave might be, or how long they’ll last. Figuring that out might involve shipping cognitive test kits to thousands of people during the few days’ notice of an impending heat wave. “My guess [is] that no one’s done it because it’s just so difficult to do,” says Thompson.

Still, researchers can learn about some of the impacts of heat waves through studies after the fact. And those studies suggest that the heat seems to have more disastrous outcomes for people with mental-health disorders.

Those outcomes become apparent when temperatures rise above what is considered typical for a given region. “There seems to be a correlation where the hotter it gets, especially during the hottest times of the year, the worse the mental-health outcomes,” says Joshua Wortzel, who directs the Heat-Mind Lab at Hartford HealthCare in Connecticut.

In a study published in 2023, Emma Lawrence at the University of Oxford, who studies the effect of climate change on mental health, and her colleagues reviewed the evidence linking mental-health outcomes to ambient outdoor temperatures. They found that during heat waves, there was a 9.7% increase in the rate of hospital admissions for people with such conditions.

“People who live with mental-health conditions are among the most susceptible to the physical impacts of heat,” says Lawrence. People with schizophrenia were found to have been three times more likely to die during the record-breaking heat wave that affected Canada in 2021, for example.

In order to protect people, we need a better understanding of the mechanisms underlying these effects. After all, a lot of things change when it’s very, very hot. Some people may end up stuck indoors, avoiding outdoor play and exercise, and it can be difficult to get a good night of sleep, for example. Sleep, socializing, and exercise are all really important for our mental health.

But whether unusual heat does something specific to our brains is, as Wortzel puts it, “the million-dollar question.”

Research in lab animals suggests that excessive heat can alter the way chemical signals work in our brain. The levels of neurotransmitters like serotonin, for example, seem to increase when rats and mice are exposed to high temperatures, according to multiple studies.

The heat may also interfere with the way networks in our brains communicate with each other. It might affect the way oxygen reaches our brain cells. “There are so many biological reasons why brains may be negatively affected by heat,” says Wortzel.

Emerging research suggests that for whatever reason, children and young people are among the most vulnerable. In research published earlier this week, Wortzel and his colleagues saw a 2.97% increase in the suicide rate among people in the US aged 15 to 24 for every 1 °C increase in average monthly temperature. That’s more than double the increase seen in people over the age of 24 (which is concerning in its own right).

Other work hints that heat exposure might have long-term consequences for children’s brain development. Babies who were exposed to either extreme heat or cold appeared to have altered white matter by the time they were nine to 12 years old, although it’s not clear how these impacts might affect an individual child.

“It seems that extreme temperature exposure for very young children may affect their brain development,” says Lawrence, who spoke to me from Oxford. She was meant to be in London for Climate Action Week, but her event, which focused on extreme heat, ended up being canceled … owing to the extreme heat.

We are living through the effects of climate change. And that brings a new urgency to the question of how heat affects our brains. Children born in 2020 are predicted to experience around seven times the number of heat waves their grandparents did, says Lawrance. “[We] need to be serious about adapting to a warming world.”

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up


The Download: brain-melting heatwaves and unprecedented OpenAI restrictions

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Heat waves mess with your brain. Scientists are trying to figure out why.

By Jessica Hamzelou

It’s been hot in London this week. Really hot. A dangerous heat wave has hit Western Europe. On Wednesday, the UK recorded its highest ever June temperature at 36.1 °C (about 97 °F). But as the weather app on my phone confirmed, it felt like 39 °C.

Much of Western Europe is suffering, bringing awful consequences for agriculture, infrastructure, and the health system. But heat can also affect the brain.

Studies have confirmed that as temperatures rise, people seem to get more irritable and more violent. And they have shown that firefighters find it harder to focus immediately after heat exposure. Rising temperatures can also have particularly disastrous outcomes for children and people with mental health disorders.

Research on lab animals suggests that excessive heat can alter the function of chemical signals in our brains. But we still need a better understanding of the mechanisms behind these effects.

Here’s what scientists are learning about extreme heat’s impact on the brain.

This story is from The Checkup, our weekly biotech newsletter. Sign up to receive it in your inbox every Thursday. For more on Europe’s heat wave, read our stories on why soaring temperatures are shutting down power plants and what they mean for the grid.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The Trump administration has asked OpenAI to limit its next model release
It wants to vet the first GPT 5.6 users before a wider launch. (Bloomberg $)
+ OpenAI said each of the initial partners will be government-approved. (FT $)
+ It’s the first US firm to be told to restrict an AI model before release. (Axios)
+ Anthropic is also still feuding with Washington. (MIT Technology Review)

2 Apple and Xbox have hiked prices, blaming AI-driven chip costs
Some MacBooks, iPads, and Xboxes are going up in price by over 20%. (BBC)
+ Apple’s shares plummeted after the announcement. (NBC)
+ AI data center demand has pushed up memory and storage prices. (WSJ $)
+ The shortages have been dubbed “RAMaggedon.” (The Verge)

3 Colossal and the US are building an endangered species “biovault”
It aims to cryopreserve over 2,300 plant and animal samples. (Wired $)
+ It comes amid growing threats to endangered species protections. (NYT $)
+ Colossal is also growing chickens in artificial eggshells. (MIT Technology Review)

4 The US has banned Polestar from selling its EVs due to anti-China rules
The Sweden-based company is majority-owned by China’s Geely. (CNN)
+ The ban is because its connected-vehicle tech is linked to China. (Reuters $)
+ What happened to China’s overseas EV factory boom? (Rest of World)

5 China is betting on humanoids to beat its demographic decline
It wants the robots to narrow the labour gap. (FT $)
+ Gig workers are training humanoids at home. (MIT Technology Review)

6 The “fingerprints” of a black hole’s event horizon have been detected
The discovery was made by studying ripples in space-time. (AFP)

7 OpenAI is now expected to delay its IPO until next year
It’s been spooked by choppy global markets and SpaceX’s slump. (NYT $)

8 Data centers have moved to the forefront of environmental lawsuits
The litigation is linked to energy sources, water consumption, and air pollution. (Guardian)

9 A master gene that turns on human development has been uncovered
It results in cells forming a human body. (New Scientist $)

10 Grok’s most popular feature? Smut
It accounts for “well over half” of the chatbot’s traffic. (The Information $)

Quote of the day

“The most advanced AI is built by a handful of American companies, on American soil, under American law, and what the rest of us are permitted to do with it can change on a Friday afternoon.”

Nathan Benaich, AI investor at London-based venture firm Air Street Capital, tells the Financial Times about the geopolitical reality of US AI dominance.

One More Thing

How technology helped archaeologists dig deeper

In 1991, construction workers in Manhattan unearthed hundreds of coffins. Further investigation revealed that the remains were between 200 and 300 years old, and they were all African and African American.

This discovery came at an inflection point in scientific history. Breakthroughs in chemical and genetic analysis allowed researchers to figure out where many of these people were born, the physical challenges they faced, and even the routes they took from Africa to North America.

Today, archaeologists are using techniques they could only dream of then: lasers, 3D photography, lidar, satellite imagery, and more. These tools are revealing where people came from, how ancient cities were built, and the lives of those who built them.

Read the full story on how archaeology is changing our understanding of the past.

By Annalee Newitz

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Tantalise your taste buds with this culinary tour of the planet’s rarest fruits.
+ This Daft Punk and Justice mashup is the French EDM collab that fans never got.
+ Daredevils have delightfully transformed playground equipment into a series of terrifying oversized rides.
+ The gadget department of your childhood dreams comes to life in this rocket-powered pen disguised as a spy weapon.

Top image credit: Sarah Rogers/MITTR | Photos Getty

Please send your childhood dreams to hi@technologyreview.com. You can follow me on LinkedIn. Thanks for reading!

Thomas


IBM has unveiled chip technology that could help extend Moore’s Law another decade

IBM has built a new prototype chip with around 100 billion transistors on an area the size of a fingernail, which is twice the density of the company’s previous state-of-the-art technology announced in 2021. The design could pave the way for faster and more energy efficient computers for years to come.

For more than half a century, chipmakers have been able to make ever more powerful computers by following the key principle of Moore’s Law: Cram more transistors onto the chip. To do this, they shrank transistors (the tiny switches that perform computations) to incrementally smaller sizes. But in the last 15 years, transistors have gotten close to the point where quantum mechanics starts to interfere with their function: just a few dozen nanometers in size. They can’t get smaller. So to fit more transistors on a chip, engineers across the industry are eyeing a pivot to an approach familiar to urban planners: build up.

On Thursday, IBM announced it has created a chip that uses this strategy. The new architecture, known as a nanostack, vertically stacks transistors in two layers on a silicon chip. “It’s not just an incremental step,” Jay Gambetta, the director of IBM Research, said during a press conference on Tuesday. “It’s a meaningful leap forward.” Within a decade, Gambetta expects, chips with nanostacking will be widely used in data centers, where their improved efficiency could help the facilities better manage their energy consumption.

“Absolutely, it’s transformational,” says Dan Hutcheson, vice chair of TechInsights, a technology analysis company. “This puts another 10, 15 years on the roadmap.”

Compared with IBM’s previous state-of-the-art architecture, the company reports, chips built with this new approach can do as much as 50% more work in the same amount of time and be up to 70% more energy efficient.

The architecture offers a general way of laying out transistors, and IBM will partner with semiconductor manufacturers to make the actual chips. It anticipates that chip designers will deploy the design in many different types of chips, including GPUs and CPUs. “I expect to have many conversations with designers about how they can use this technology,” Huiming Bu, IBM’s vice president of global semiconductor R&D, said in the press conference announcing the new design.

A layer cake

Engineers created IBM’s new chip layer by layer, like a cake. They start by fabricating transistors on one layer of silicon. Then they place a silicon layer on top of these devices, and they fabricate another layer of transistors directly on top of that. Finally, they create the electrical connections between the two layers of transistors. This kind of vertical stack, which combines two types of transistors, is known as a complementary field-effect transistor, or CFET, explains Qing Cao, a professor of materials science and engineering at the University of Illinois at Urbana-Champaign, who was not involved with the work.

The company isn’t the only one pursuing this general approach. The biggest chip manufacturers (Intel, Samsung, and TSMC) and the competing research lab Imec in Belgium have been investigating CFETs. IBM says its design is distinguished by the fact that the transistors in the second layer do not sit directly on top of the first layer’s transistors; rather, they are staggered, which the company says simplifies wiring, among other advantages.

CFETs like those in IBM’s nanostack architecture contrast with another common approach to making two-tiered chips, such as AMD’s 3D V-Cache and Huawei’s forthcoming LogicFolding technology, Cao says. In those approaches, engineers fabricate the transistors on each layer of the chip independently before bonding the two together. IBM’s new method allows for more precise alignment of the layers, which is important for performance because transistors are so tiny, says Cao.

Nanostacking builds on an approach called nanosheet technology, which has been used to make current state-of-the-art transistors since around 2022. A transistor is essentially a hose through which electrons flow, with a valve that can turn the flow on or off. Inside the transistor, electrons move through a patch of the silicon called a channel. In IBM’s nanostack approach, the channel consists of three nanosheets that are each 15 atoms thick, spaced nine nanometers apart.

Every chip generation gets a name. IBM refers to its nanostack technology as “sub-nanometer” or “0.7 nanometer,” following a longtime industry convention where each generation is named for a smaller and smaller length. But “0.7 nanometer” is a marketing term and does not correspond to any physical characteristics of the chip. The distance between transistors “has been staying at about 40 nanometers for quite a long period of time,” says Cao.

Putting it into production

Looking ahead, chipmakers can try increasing transistor density by building on more tiers, as Bu suggested in the press conference. However, they will face practical challenges, according to Cao. Manufacturing introduces errors, which means a certain number of chips are faulty upon creation. “Here you’re building another layer on top, so if either top layer or bottom layer fail, your entire chip is going to fail,” says Cao. The resulting failure rate will be higher than for single-layer chips, and that will be costly.

Another central challenge is what Cao calls “the thermal budget.” Essentially, it means that engineers need to figure out how to build each layer without melting the connections to the one underneath. This means keeping manufacturing processes below 400 °C. IBM figured out how to make the second stack at a low enough temperature, although the company is mum about its methods.

Academics are also on the case. Cao’s group, for example, has created a method for stacking transistors layer by layer where the second layer is created with processes below 200 °C. They manage this by using a type of transistor known as the junctionless transistor, which can be created without a typically required step called doping, a process that injects non-silicon atoms into silicon to tune the material’s properties. Doping is usually the hottest part of fabricating transistors. Cao thinks from a thermal management perspective, his approach could be easier to scale up to
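The failure-rate concern Cao raises in the production discussion can be put in rough numbers. The following Python sketch assumes, purely for illustration, that each layer fails independently and that a chip works only if every layer works; the function name and the 90% per-layer yield are hypothetical and do not come from IBM or Cao.

```python
def stacked_yield(per_layer_yield: float, layers: int = 2) -> float:
    """Fraction of working chips when every layer must work (independence assumed)."""
    if not 0.0 <= per_layer_yield <= 1.0:
        raise ValueError("per_layer_yield must be between 0 and 1")
    if layers < 1:
        raise ValueError("layers must be at least 1")
    # A chip survives only if all layers survive, so the yields multiply.
    return per_layer_yield ** layers


# Illustrative per-layer yield of 90%, compared across 1, 2, and 3 tiers.
for tiers in (1, 2, 3):
    print(tiers, round(stacked_yield(0.9, tiers), 3))
```

Under these assumptions a 90% per-layer yield drops to about 81% for two tiers, which is why each added tier makes manufacturing defects more expensive.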

