YouZum

AI

AI, Committee, News, Uncategorized

Perplexity AI Introduces Hybrid Local-Server Inference Orchestrator for Personal Computer: Automatic On-Device and Cloud Task Routing

Perplexity AI announced what it calls the first hybrid local-server inference orchestrator at Computex 2026. The system is designed to automatically route AI tasks between a user’s local device and cloud-based frontier models without requiring the user to decide in advance. The feature is expected to come to Perplexity Computer in July 2026.

What is Hybrid Agentic Inference?

To understand what Perplexity built, it helps to understand the three-way tension that AI systems face. Accuracy demands the most capable models, which are expensive to run. Privacy demands that some data never leave the device. Cost and energy efficiency demand that you don’t spend a frontier model’s compute on tasks a smaller model can handle. A routing layer that balances these three demands is what Perplexity calls hybrid agentic inference.

A compact AI model runs locally on the user’s device. This local model evaluates each incoming task or subtask. It determines whether the task involves sensitive data, whether it requires heavy computation, or whether it can be handled entirely on-device. Based on that evaluation, work is either kept local or sent to a frontier model in the cloud.

Perplexity describes this local model as deciding “when sensitive data should also be kept locally.” The system is designed to ask for user permission before sending sensitive tasks to the cloud. That design addresses a specific concern enterprises have about agentic AI: data governance, meaning knowing where data goes and who controls that decision.

Examples of data the system is intended to keep local include financial records, health information, and personal files. Work that requires a frontier model’s full capability runs on the server. Most real tasks are a mix, so the system splits them and coordinates the parts.

How It Fits into Perplexity Computer

Perplexity Computer is the company’s cloud-based multi-model agentic product, launched in February 2026. It originally ran entirely in the cloud on the Perplexity Max subscription tier ($200/month).
Personal Computer is a separate, related product that brought Computer’s capabilities onto the local device, with access to local files, native Mac apps, the web, and Perplexity’s secure servers. Personal Computer launched on Mac in April 2026. Windows support is planned; a waitlist is open.

The new hybrid local-server inference orchestrator is the next step for Personal Computer. Previously, even within Personal Computer, the division was relatively fixed: local file access happened on-device, and heavy computation ran on Perplexity’s servers. The orchestrator changes that. The system now reasons about where each piece of a task should execute: not just which model to use, but which physical location should process it.

Perplexity Computer coordinates up to 20 AI models in a single workflow. It creates a team of agents and orchestrates across models, tools, and files in a single system. The hybrid orchestrator extends that orchestration to compute location itself.

Key Takeaways

Perplexity AI announced the first hybrid local-server inference orchestrator at Computex 2026, routing AI tasks automatically between on-device and cloud models.
A compact local model acts as the router, classifying each subtask by data sensitivity and compute requirements before dispatching it.
Sensitive data (financial records, health files) stays on-device; compute-heavy tasks go to frontier cloud models, with no manual configuration required.
The orchestration framework is model-agnostic and chip-agnostic, confirmed to run on Intel Core Ultra Series 3 and NVIDIA RTX Spark hardware.
The feature arrives in Perplexity Computer in July 2026, initially on Windows; Personal Computer is already available on Mac with a Windows waitlist open.
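The routing behaviour described in this article can be illustrated with a small sketch. This is not Perplexity's implementation: the `Subtask` fields, the `route` function, and the rule that a sensitive task stays local when the user declines are all hypothetical, chosen only to mirror the described policy (keep sensitive data on-device, ask permission before sending it to the cloud, send compute-heavy work to a frontier model).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass
class Subtask:
    description: str
    touches_sensitive_data: bool  # e.g. financial records, health information
    needs_frontier_model: bool    # heavier than the local model can handle


def route(task: Subtask, user_allows_cloud: Callable[[Subtask], bool]) -> Target:
    """Pick an execution location for one subtask (hypothetical policy)."""
    # Anything the local model can handle stays on the device.
    if not task.needs_frontier_model:
        return Target.LOCAL
    # Sensitive work only leaves the device with explicit user permission.
    if task.touches_sensitive_data and not user_allows_cloud(task):
        return Target.LOCAL
    return Target.CLOUD
```

In this sketch the permission check is a callback, so a real interface could prompt the user only when a sensitive subtask would otherwise be sent to the cloud.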
The post Perplexity AI Introduces Hybrid Local-Server Inference Orchestrator for Personal Computer: Automatic On-Device and Cloud Task Routing appeared first on MarkTechPost.



NVIDIA AI Releases Dynamo Snapshot: A CRIU-Based Fast Startup System for AI Inference on Kubernetes

In production inference deployments, demand fluctuates over time, requiring inference replicas to scale elastically. Cold-starting inference workloads on Kubernetes can take several minutes. During that time, GPUs are allocated but idle, generating no tokens and serving no requests. ‘Cold start’ means the full sequence a model server must complete before serving any request: pulling the container image, loading model weights into GPU memory, warming up CUDA kernels, compiling or capturing CUDA graphs, and registering with the service discovery layer. This delay increases the risk of SLA violations during traffic spikes, as the system cannot scale quickly enough to absorb sudden increases in demand. The cold-start latency for a single-GPU vLLM (v0.20.0) workload breaks into three segments: container/image pull, engine initialization (weight loading, kernel warmup, graph compilation), and distributed runtime startup.

To address this, NVIDIA’s AI research team has introduced NVIDIA Dynamo Snapshot, a checkpoint/restore approach for AI inference workloads on Kubernetes.

https://developer.nvidia.com/blog/nvidia-dynamo-snapshot-fast-startup-for-inference-workloads-on-kubernetes/?linkId=100000423964029

What are CRIU and cuda-checkpoint?

A running inference worker’s checkpointable state has two components. Device state (GPU-side) includes CUDA contexts, streams, device memory, and virtual address mappings, none of which is visible to the host. To serialize it, cuda-checkpoint uses the checkpointing capability of the CUDA driver to dump the device state into the CPU memory of the process that owns each CUDA context. Host state (CPU-side) includes CPU memory, threads, file descriptors, and namespaces. CRIU (Checkpoint/Restore in Userspace) walks the Linux kernel’s bookkeeping and serializes the process tree’s state to disk.
When checkpointing, the two tools run in order: cuda-checkpoint dumps all device state into CPU memory first, then CRIU dumps all host-side process tree state to a folder in storage. When restoring on the same or a different node, the order is reversed: CRIU first restores the process tree from distributed storage such as NFS or SMB, then cuda-checkpoint restores the GPU state from what is now in CPU memory onto the new GPUs.

CRIU is fundamentally a freeze-and-thaw mechanism. When a process is restored, execution resumes at the exact instruction where it was checkpointed, completely unaware that checkpointing or restoration occurred. Because of this, any coordination required before checkpointing (such as quiescing the workload) or after restoration (such as re-establishing external state) must be handled externally through an orchestrator or workload-specific hooks.

How Dynamo Snapshot Works on Kubernetes

In Kubernetes, workloads run inside containers inside pods. Because CRIU checkpoints contain references to the container’s writable filesystem layer, checkpointing is done at the container level so the process tree state and filesystem travel together.

NVIDIA provides a privileged DaemonSet, snapshot-agent, installable through a Helm chart. An agent runs on every node and handles checkpoint and restore for runc-managed containers without requiring modifications to runc itself. On checkpoint, the agent waits for the workload’s readiness probe, invokes cuda-checkpoint and CRIU from the host side, and writes the artifact to shared storage. The workload may have created or deleted files local to the container (the overlay filesystem), which the agent also checkpoints after the CRIU stage. On restore, the agent launches a lightweight placeholder pod, restores the overlay filesystem, and restores the CRIU/CUDA checkpoint into its namespaces. Each agent operates independently on its local node, allowing checkpoints and restores to parallelize naturally across the cluster.
This DaemonSet approach was chosen over Kubernetes-native checkpoint/restore support in runc for three reasons: it is fully portable without depending on cloud-provider feature gates, it gives tighter control over CRIU for performance tuning, and it allows checkpoint artifacts to live in flexible storage backends rather than being embedded into OCI images.

Quiesce/resume hooks

A Dynamo inference worker initializes in two ordered phases. First, engine initialization: communicators are initialized, weights are loaded, kernels are warmed up, and CUDA graphs are compiled. The worker is fully warm at this point but not yet discoverable outside its pod. Second, distributed runtime startup: the worker connects to the Dynamo control plane and registers with the discovery backend. Open TCP connections to the control plane exist from this point onward.

If a checkpoint were taken after distributed runtime startup, there would be active TCP connections that CRIU cannot capture. The solution is quiesce/resume hooks: the worker writes a ‘ready for checkpoint’ signal file after engine initialization but before distributed runtime startup. The worker then enters a polling loop waiting for a ‘restore complete’ signal file while the snapshot agent checkpoints it externally. Because CRIU restores execution at the exact instruction where checkpointing occurred, the worker resumes directly inside the polling loop, detects the signal file, and proceeds with distributed runtime initialization without requiring additional synchronization.

The quiesce/resume pattern is also important for multi-GPU and multi-node checkpoints (planned for a future release): outbound TCP connections used for RPC cannot be checkpointed in an established state because the pod IP changes between checkpoint and restore, and RDMA registrations and NIC state need to be recreated post-restore.
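The quiesce/resume pattern can be sketched from the worker's side as follows. This is a minimal illustration, not Dynamo's code: the function name, the two signal-file names, and the callback parameters are hypothetical, and the external agent that performs the actual checkpoint is outside the sketch.

```python
import time
from pathlib import Path
from typing import Callable


def quiesce_and_wait(signal_dir: str,
                     init_engine: Callable[[], None],
                     start_distributed_runtime: Callable[[], None],
                     poll_interval_s: float = 0.1) -> None:
    """Run the two startup phases with a checkpoint window in between."""
    directory = Path(signal_dir)
    ready = directory / "ready-for-checkpoint"   # hypothetical file name
    restored = directory / "restore-complete"    # hypothetical file name

    # Phase 1: load weights, warm up kernels, compile CUDA graphs.
    init_engine()

    # Signal that no external connections are open, so a checkpoint is safe.
    ready.touch()

    # The checkpoint is taken while the worker sits in this loop. A restored
    # process resumes at this point and keeps polling until the agent writes
    # the second signal file.
    while not restored.exists():
        time.sleep(poll_interval_s)

    # Phase 2: only now open TCP connections and register with discovery.
    start_distributed_runtime()
```

Keeping all network setup in the second phase means the frozen process image contains no established connections that would be invalid on the restored pod.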
Optimization 1: KV Cache Unmap and Release

After measuring peak GPU memory usage while weights, CUDA graphs, and other buffers are allocated, inference engines allocate the remaining GPU memory as a large KV cache buffer. Since the checkpoint is taken before the replica has served any requests, this KV cache buffer does not need to be checkpointed at all. However, its virtual address must remain stable because it is baked into the CUDA graph. The solution is to allocate the KV cache via the CUDA Virtual Memory Management API (cuMemCreate and cuMemMap), then free the underlying physical allocation with cuMemUnmap and cuMemRelease, but not cuMemAddressFree. This keeps the virtual address range intact while releasing the physical memory. This functionality is natively available in vLLM via sleep() and wake_up() and in SGLang via torch_memory_saver. For Qwen3-0.6B on a B200, this reduces the total artifact size from ~190 GiB to ~6 GiB. The wins are most pronounced for large KV cache sizes, that is, when model weights are small relative to GPU memory.

Optimization 2: Speeding Up CRIU Memory Restore

Even after the artifact is smaller, upstream CRIU restore time remains a bottleneck. For larger models, restore time actually exceeds



The Download: AI hacking beyond Mythos, and chatbots’ impact on our brains

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The Meta hack shows there’s more to AI security than Mythos

On Monday, reports emerged that attackers had used Meta’s AI customer support agent to steal Instagram accounts. Their approach was simple: they asked the agent to link the accounts to email addresses they controlled, and it complied. Since Anthropic announced that its Mythos model was too good at hacking for a general release, cybersecurity concerns have focused on the risk of superpowered AI systems overwhelming computer infrastructure. But the Instagram hack shows that far simpler exploits can still cause damage. As companies offload more work to AI, these comparatively unsophisticated attacks are becoming harder to ignore. Read the full story to understand why.
By Grace Huckins

Are AI chatbots making us lose control of our brains?

Gloria Mark, a psychologist at the University of California, Irvine, fears that digital technologies are weakening our cognitive abilities. Her research suggests attention spans have fallen sharply over time, leading to higher stress and lower performance. She now believes AI tools like ChatGPT and Claude may accelerate this shift. “You’re deferring your cognitive work to AI,” she said. “And it’s not good for us.” Mark argues this could weaken critical thinking and emotional intelligence. Luckily, she thinks we can course-correct by changing our relationship with these technologies. Find out how AI could reshape attention and thinking.
By Jessica Hamzelou

This story is from The Checkup, our weekly newsletter giving you the inside track on all things biotech. Sign up to receive it in your inbox every Thursday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Anthropic has called for a global slowdown in AI development
It flagged the risk of models “self-improving.” (WSJ $)
+ And wants a coordinated plan to stop them. (Reuters $)
+ Skeptics note that the timing is awfully convenient. (The Register)

2 In a first, scientists have precisely edited human embryo genes
They relied on a newer gene-editing technique. (NYT $)
+ Genetically-modified babies could be on their way. (Guardian)
+ Companies have big plans for the technology. (MIT Technology Review)

3 US officials have discussed taking financial stakes in AI firms
They’ve held talks about the government acquiring shares. (Reuters $)
+ Sam Altman pitched the idea to the White House last year. (WSJ $)

4 Bot web traffic has overtaken human web traffic
Cloudflare said 57.4% of traffic now comes from bots. (NBC News)
+ Its CEO expected the milestone at the end of 2027. (CNET)

5 The White House plans to bring AI doctors into American medicine
It wants chatbots to diagnose illness and prescribe medicine. (WSJ $)
+ But we don’t even know if healthcare AI actually helps patients. (MIT Technology Review)

6 Meta quietly added facial recognition code for smart glasses to its app
The exploratory feature would identify people via biometric data. (Wired $)
+ Smart glasses are also entering warfare. (MIT Technology Review)

7 South Korea’s labour minister wants tech firms to share AI profits
Kim Young wants staff and suppliers to get a share. (Reuters $)
+ He helped avert a huge strike over AI profit-sharing at Samsung. (NYT $)

8 Canada’s highly-anticipated AI strategy has launched
It promises over $2 billion in funding and aims to create 250,000 jobs. (BBC)
+ AI could strengthen democracy. (MIT Technology Review)

9 Investment in agricultural tech is booming
That’s good news at a time when we’re facing unprecedented levels of food market volatility. (The Economist $)

10 Bumblebees can use tools to solve problems, new research shows
Not just busy: they’re clever too! (Guardian)

Quote of the day

“Welp, that happened faster than I predicted.”
Matthew Prince, co-founder and CEO of Cloudflare, one of the largest internet hosting services, reacts on X to reports that bots have overtaken humans in driving web traffic.

One More Thing

Inside the machine that saved Moore’s Law

In a Connecticut clean room, the Dutch company ASML is developing the world’s most advanced machine for extreme ultraviolet (EUV) lithography, a crucial process for manufacturing microchips. The system has become vital to Moore’s Law, the observation that the number of transistors on a chip roughly doubles every two years as components shrink, driving gains in performance and efficiency. “Without this machine, it’s gone,” says Wayne Lam, a director of research at CCS Insight. “You can’t really make any leading-edge processors without EUV.” Discover how ASML’s EUV technology saved Moore’s Law.
By Clive Thompson

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)
+ Tech bosses love Tolkien. Here’s what the writer might think of them.
+ Rare footage captures an underwater volcano erupting beneath the Pacific Ocean.
+ Watch a tiny rescued cub grow into adulthood in this heartwarming tiger compilation.
+ This medieval version of “Take On Me” is like stepping into a tavern of synth-pop bards.



Meet OpenJarvis: A Local-First Framework for On-Device Personal AI Agents with Tools, Memory, and Learning

Researchers at Stanford University and Lambda Labs have published the research paper for OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device. The open-weight models configured through OpenJarvis land within 3.2 percentage points of the best cloud model on average, at roughly 800× lower marginal API cost per query and roughly 4× lower latency under the paper’s benchmark protocol. This work builds on the team’s earlier Intelligence Per Watt study, which reported that local models already handle 88.7% of single-turn chat and reasoning queries at interactive latency, with intelligence efficiency improving 5.3× from 2023 to 2025.

Model Overview & Access

OpenJarvis is not a single model. It is a framework that composes any supported model with a configurable agent stack, evaluated across 11 local models from four families.

Property: Value
License: Apache 2.0
Framework release: March 12, 2026
Paper: arXiv:2605.17172 (posted May 16, 2026)
Repository: github.com/open-jarvis/OpenJarvis
Stars / forks: ~5.4k / ~1.2k (June 2026)
Languages: Python (~83%), Rust (~9%), TypeScript (~7%)
Evaluated models: 11 local models across 4 families: Qwen3.5, Gemma4, Nemotron, Granite
Cloud baselines: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro
Supported engines: Ollama, vLLM, SGLang, llama.cpp, Apple Foundation Models, Exo (among others)
Context window: Model-dependent
Installation: Single command; ~3 minutes on broadband
Hardware: Tested on 7 platforms, from Mac Mini M4 to NVIDIA DGX Spark

Architecture: Five Primitives and a Spec

OpenJarvis decomposes a personal AI system into five typed primitives, composed through a single declarative configuration object called a spec.

Intelligence: the model, weights, generation parameters, and quantization format.
Engine: the inference runtime (Ollama, vLLM, SGLang, etc.), batching, KV-cache settings, and hardware path.
Agents: the reasoning loop (ReAct or CodeAct), system prompts, tool-use policy, and turn limits.
Tools & Memory: external interfaces, retrieval backends, 25+ data connectors, and 32+ messaging channels, with native MCP support and interchangeable memory backends.
Learning: the optimizer that updates the spec from traces. This slot accepts LoRA, DSPy, GEPA, or LLM-guided spec search.

Each primitive is independently swappable, and a spec serializes all five into a TOML file. Two specs can share the same agent and tool configuration and differ only in model and engine, so the same behavior runs on a Mac Mini and a workstation without rewriting prompts.

LLM-guided spec search is the second contribution. It is a local–cloud collaboration: a frontier cloud model acts as a teacher at search time, reading traces, diagnosing failure clusters, and proposing edits across Intelligence, Engine, Agents, and Tools & Memory. An edit is accepted only if it improves the target failure cluster without causing meaningful regressions elsewhere; the research team calls this the gate (default tolerance 1%). The optimized spec then runs entirely on-device at inference time, with zero cloud calls. The teacher is used only at search time; at 100 queries per day, the amortized teacher cost falls below $0.001 per query within six months.

Prior work (GEPA, DSPy, LoRA) optimizes one primitive at a time, and prompt optimizers alone recover only about 5 pp of the cloud–local gap. LLM-guided spec search recovers 13–32 pp because it edits across primitives jointly, at 7–11× lower optimization cost than single-primitive baselines. The four-primitive move space contributes 5.5–16.5 pp, and the LLM proposer adds about 10 pp on average over an evolutionary search at the same move space.
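The gate described above can be sketched as a simple acceptance check. This is an illustration only, not the OpenJarvis implementation: the function name, the per-cluster accuracy dictionaries, and the reading of the 1% default tolerance as one percentage point of accuracy are all assumptions.

```python
from typing import Dict


def passes_gate(before: Dict[str, float],
                after: Dict[str, float],
                target_cluster: str,
                tolerance_pp: float = 1.0) -> bool:
    """Accept an edit only if it helps the target and hurts nothing else much.

    `before` and `after` map a failure-cluster name to accuracy in percent.
    """
    # The edit must improve the cluster it was proposed for.
    improved = after[target_cluster] > before[target_cluster]
    # Every other cluster may drop by at most `tolerance_pp` points.
    no_regression = all(
        before[name] - after[name] <= tolerance_pp
        for name in before
        if name != target_cluster
    )
    return improved and no_regression
```

For example, an edit that lifts a hypothetical "tool_calls" cluster from 60% to 68% while "coding" slips from 70% to 69.5% would pass under this rule, whereas the same edit with "coding" falling to 67% would be rejected.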
https://arxiv.org/pdf/2605.17172v1

Capabilities & Performance

OpenJarvis was evaluated across 8 benchmarks spanning 508 tasks: tool calling (ToolCall-15), agentic workflows (PinchBench), coding (LiveCodeBench), customer service (τ-Bench V2, τ²-Bench Telecom), general assistance (GAIA), and deep research (LiveResearchBench, DeepResearchBench).

The swap test: Replacing the intended cloud model with Qwen3.5-9B in existing frameworks (OpenClaw, Hermes Agent) drops accuracy by 25–39 pp. With the same model under an OpenJarvis spec, the residual drop shrinks to 5.6–16.5 pp, recovering 56–77% of the portability loss.

The accuracy frontier: The best single local model, Qwen3.5-122B, reaches 80.3% average accuracy versus Claude Opus 4.6 at 83.5%, a 3.2 pp gap. Local specs match or exceed cloud on 4 of 8 benchmarks: ToolCall-15, PinchBench, LiveCodeBench, and τ-Bench V2.

Cost and latency: Local configurations form the accuracy–efficiency frontier. Qwen3.5-122B delivers its 80.3% at roughly a thousandth of a cent per query, versus $0.009 per query for Claude Opus 4.6, an approximately 800× marginal API-cost advantage. End-to-end latency drops by roughly 4× on the agentic workloads, though the paper notes single-shot prompts can favor cloud serving.

Search gains: LLM-guided spec search improves the Qwen3.5-9B student to 100% on PinchBench, 83% on LiveCodeBench, and 91% on LiveResearchBench. Across the full eight-benchmark suite, average gains per student model range from 13.1 to 31.5 pp. The authors report that these gains survive their robustness checks (reward-weight variants, search-seed variance, and random restarts).

How to Use It

Installation is one command. On macOS, Linux, or WSL2:

curl -fsSL https://open-jarvis.github.io/OpenJarvis/install.sh | bash

Windows users run an equivalent PowerShell script (irm … | iex).
The installer provisions uv, a Python virtual environment, Ollama, and a starter model in about three minutes on broadband. A desktop GUI ships as a .dmg, .exe, .deb, .rpm, or .AppImage from the releases page. After install, jarvis starts a chat session. Starter presets cover common workflows:

jarvis init --preset morning-digest-mac   # daily briefing with TTS
jarvis init --preset deep-research        # multi-hop research with citations
jarvis init --preset code-assistant       # agent with code execution and shell access
jarvis init --preset scheduled-monitor    # stateful agent on a schedule

The framework ships with eight built-in agents across three execution modes: on-demand, scheduled, and continuous. It connects to 25+ data sources (Gmail, Calendar, iMessage, Notion, Obsidian, Slack, GitHub, and others) and exposes agents over 32+ messaging channels (WhatsApp, Telegram, Discord, iMessage, Signal, and others). Skills can be imported from external catalogs (about 150 from Hermes Agent and about 13,700 community skills from OpenClaw), all following the agentskills.io specification. A jarvis optimize skills --policy dspy command refines them from local trace history.



Miso Labs Releases MisoTTS: An 8B Emotive Text-to-Speech Model with Open Weights

Miso Labs has released MisoTTS, an open-weights 8-billion-parameter text-to-speech model. It generates expressive speech from both text and audio context. The model uses residual vector quantization (RVQ) to widen its sonic range. This avoids scaling a single flat vocabulary while keeping parameter count fixed.

What is MisoTTS

MisoTTS is an 8B-parameter text-to-dialogue RVQ Transformer. It is inspired by the Sesame CSM architecture. It pairs a Llama 3.2-style backbone with a smaller audio decoder. It generates Mimi audio codes from text and optional audio context. The model conditions on both text and prior audio. That second input lets it respond to the speaker’s tone. The text vocabulary is 128,256 tokens, and there are 32 audio codebooks. Mimi is the audio tokenizer, and max sequence length is 2,048. Default inference runs in torch.bfloat16. Miso Labs claims 110ms latency. It lists ElevenLabs at 700ms and Sesame at 300ms.

The Vocabulary Size Problem

Standard transformers generate from a fixed vocabulary of discrete tokens. That works when a small vocabulary covers the target space. Human speech does not fit that assumption. It varies across pitch, rhythm, emphasis, emotion, and accent. Expanding the audio vocabulary is the obvious fix. But larger vocabularies need more parameters in a standard transformer. Each token must be represented and predicted by the model. Miso Labs calls this the vocabulary size problem. The second issue is conditioning. Most TTS models condition only on text. They ignore the interlocutor’s tone. Miso Labs argues this contributes to the “uncanny valley” effect.

Residual Vector Quantization: The Core Idea

MisoTTS addresses both problems with residual vector quantization (RVQ). Miso Labs traces RVQ to image-generation research and to Sesame’s CSM for audio. Instead of one token index, the model emits a vector of indices. Each audio token is 32 codebook indices over 2048-way codebooks.
The model keeps a separate codebook for each position in the vector. To recover the sound, it sums the looked-up vectors. Each codebook adds another refinement to the signal. This is what makes the scaling work. Addressable vocabulary equals codebook size raised to the depth. Growing the depth adds no parameters to the model. So MisoTTS reaches about 2048^32, or roughly 10^105, addressable tokens. Miso Labs notes naive scaling would require a far larger network.

https://www.misolabs.ai/blog/miso-tts-8b

The Two-Transformer Architecture

The model splits into a backbone and a decoder. The backbone is a 7.7B-parameter transformer, autoregressive over time. It predicts the first codebook index and a final hidden state. A 300M-parameter decoder then runs autoregressively over depth. It predicts the remaining codebook indices, one position at a time. Each prediction conditions on the indices already chosen in the frame. The same 300M parameters are reused for every position. Embeddings follow the same logic. Text tokens use a single lookup. An audio token’s embedding is the sum of per-position codebook lookups. Interleaving text and audio lets the backbone use conversation history. That is how it carries context across turns.

Strengths and Challenges

Strengths:
Open weights on day one, under a modified MIT license.
RVQ scales the sonic range without scaling parameter count.
Conditions on audio context, not text alone.
Local deployment keeps sensitive audio data in-house.
The architecture and math are documented in a public blog post.

Challenges:
Half-duplex only, with no turn-taking yet.
The large model needs a capable CUDA GPU.
API access is announced but not yet available.
Latency and quality claims still need third-party testing.

Marktechpost’s Visual Explainer

Marktechpost · Model Brief · Open-Weights Release · June 3, 2026
MisoTTS: An 8B emotive text-to-speech model from Miso Labs, built on residual vector quantization and conditioned on both text and audio.
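To make the residual vector quantization idea from the article concrete, here is a toy sketch in plain Python. It is a generic RVQ illustration, not MisoTTS or Mimi code: the function names, the random codebooks, and the tiny sizes (3 codebooks of 16 entries over 4-dimensional vectors) are assumptions made for readability. The last lines compute the addressable vocabulary for the reported configuration of 32 codebooks with 2048 entries each.

```python
import math
import random


def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))


def rvq_encode(codebooks, vec):
    """One index per codebook; each level quantizes the previous residual."""
    residual, indices = list(vec), []
    for book in codebooks:
        i = nearest(book, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, book[i])]
    return indices


def rvq_decode(codebooks, indices):
    """Reconstruct by summing the looked-up codewords, one per codebook."""
    out = [0.0] * len(codebooks[0][0])
    for book, i in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, book[i])]
    return out


# Toy random codebooks; a trained model would learn these entries.
rng = random.Random(0)
codebooks = [[[rng.gauss(0, 0.5 ** level) for _ in range(4)] for _ in range(16)]
             for level in range(3)]
indices = rvq_encode(codebooks, [0.3, -0.7, 0.1, 0.9])
approx = rvq_decode(codebooks, indices)

# Addressable vocabulary for 32 codebooks of 2048 entries: 2048 ** 32 = 2 ** 352.
vocab = 2048 ** 32
exponent = 32 * math.log10(2048)  # log10 of the vocabulary size
```

Because each added codebook multiplies the number of addressable combinations by the codebook size, depth grows the vocabulary exponentially while the decoder that predicts the indices can stay the same size.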
8B params · RVQ Transformer · Mimi codes · modified MIT

What MisoTTS Is: A text-to-dialogue RVQ Transformer
An 8B-parameter model inspired by the Sesame CSM architecture.
Pairs a Llama 3.2-style backbone with a smaller audio decoder.
Generates Mimi audio codes from text and optional audio context.
Conditions on prior audio, so output responds to speaker tone.

At a Glance: Published specifications
Parameters: 8B (7.7B + 300M)
Architecture: RVQ Transformer
Audio codebooks: 32 (2048-way)
Audio tokenizer: Mimi
Text vocabulary: 128,256
Max sequence length: 2,048
Default precision: torch.bfloat16
License: modified MIT

The Motivation: The vocabulary size problem
Transformers generate from a fixed vocabulary of discrete tokens.
Speech varies in pitch, rhythm, emphasis, emotion, and accent.
A bigger audio vocabulary needs more parameters in a standard transformer.
Most TTS models condition only on text, ignoring tone, which contributes to the “uncanny valley” effect.

The Core Idea: Residual vector quantization
The model emits a vector of indices, not a single token index.
Each token is 32 codebook indices over 2048-way codebooks.
Summing the looked-up vectors reconstructs the sound.
Depth scales addressable vocabulary to ~2048^32 (≈10^105) with no added parameters.

Architecture: Two transformers, one vector token
Backbone (7.7B): autoregressive over time; predicts codebook index k₁ and hidden state h₀.
Decoder (300M): autoregressive over depth; predicts k₂ through k₃₂.
The same 300M parameters are reused for every position.
Interleaved text and audio let the backbone use conversation history.

Run It Locally: Inference in a few lines

    from generator import load_miso_8b
    import torchaudio

    # Load the 8B model onto a CUDA GPU; weights download from Hugging Face.
    gen = load_miso_8b(device="cuda", model_path_or_repo_id="MisoLabs/MisoTTS")

    # Generate up to 10 seconds of speech with no prior audio context.
    audio = gen.generate(
        text="Hello from Miso.",
        speaker=0,
        context=[],
        max_audio_length_ms=10_000)

    # Save the waveform at the generator's sample rate.
    torchaudio.save("miso.wav", audio.unsqueeze(0).cpu(), gen.sample_rate)

Setup uses uv with Python 3.10.
Weights download from Hugging Face.
Audio is watermarked by default via SilentCipher.
One-shot voice cloning works from a ~10-second clip.

Limitations: Where it stops, for now
Handles individual turns only; no turn-taking yet.
Generates half-duplex audio: it cannot speak while the other party speaks.
Miso Labs frames full-duplex and turn-taking as future work.
API access is announced but not yet available.

Key Takeaways: The short version
Open-weights 8B TTS under a modified MIT license.
Conditions on text and audio, so output tracks speaker tone.
RVQ scales vocabulary to ~2048^32 without adding parameters.
7.7B backbone over time, 300M decoder over depth.
Half-duplex and single-turn today; API access pending.

Key Takeaways

Miso Labs open-sourced MisoTTS, an 8B text-to-speech



How courts are coping with a flood of AI-generated lawsuits

Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. Many of them can’t afford to hire a lawyer, and others have cases too weak or too small to interest one. She reads each one carefully, mindful of how daunting it is to walk into the courtroom alone.

Lately, like many judges across the US, she has seen a noticeable uptick in such filings. According to a new study that examined 4.5 million federal civil cases from 2005 to 2026, the share of lawsuits brought by self-represented people increased from 11% in 2022 to 16.8% in 2025. Within those cases, the number of filings made more than doubled from pre-2023 levels.

Judge Braswell puts that jump down to AI.

“I do correlate that to AI in part because I see AI use,” she says. As a tech-savvy judge who uses AI to vet court documents, she’s learned to recognize how large language models write. She can tell from the prose and, at times, from hallucinated cases and fabricated quotes.

“I’m also actually seeing better-drafted pleadings,” she says.

But while AI appears to be expanding access to justice, it doesn’t seem to be improving people’s chances of winning. Judges are also starting to question what kinds of rights and responsibilities large language models should bear as they step into lawyers’ shoes. For example, they ask whether a chatbot has a duty to provide good advice, as a human lawyer does. And a growing number of lawmakers across the US are starting to grapple with who should pay the price when chatbots dish out bad legal advice.

AI supercharges lawsuits

To test whether AI was driving the increase in lawsuits filed by people without a lawyer, the authors of the study, Anand Shah at MIT and Joshua Levy at the University of Southern California, ran 1,600 randomly sampled court documents through Pangram, a commercial AI-text detector.
The share flagged as containing AI-generated writing rose from 1% in 2023 to 18% in 2026.

To Judge Braswell, that’s not necessarily a cause for concern. While the surge of AI-assisted filings might be adding to judges’ workloads, she and many other judges find the cases easier to rule on because AI is helping people without legal training better articulate their arguments.

Court documents written by people without lawyers are notoriously hard to decipher. Some arrive as handwritten scrawls bordering on gibberish that judges take a while to decode. However cryptic the filings are, judges are required to read them charitably.

These days, Judge Braswell has been churning through motions drafted by AI faster than the ones written by the litigants themselves. “I have to be really careful because some of them contain hallucinations and errors, but I can generally understand what they’re arguing better with AI assistance from them than without it,” she says.

The clearer filings let Judge Braswell hear litigants better. “If I understand an argument a little bit better, I’m probably going to be able to help a little bit more,” she says.

Online communities are springing up to trade self-help guides on using AI to sue. In December 2024, a viral Reddit post walked immigration applicants through suing the United States Citizenship and Immigration Services over delayed review of their applications: draft a writ of mandamus with Microsoft Copilot, pay a lawyer $150 to polish it, and file in the expedient District of Vermont. Cases filed by people without lawyers in Vermont rose from about 45 a year before 2022 to more than 1,100 in 2024.

Even so, people without lawyers are far more likely to lose their case than people with lawyers, and that’s not changing even with the addition of AI, the study found.

“It turns out that mounting a lawsuit is a complex, multifaceted task. Not all of it is just drafting text,” says Levy.
Chatbot-client privilege

Judge William Garfinkel, a federal magistrate judge in Connecticut, has served on the bench for three decades, pondering all sorts of questions about lawyers’ relationship with their clients. Lately, he has been wondering whether people’s conversations with chatbots dispensing legal advice should be privileged, the way their conversations with lawyers are.

“You can make a good argument that … conversations with large language models like Claude or ChatGPT or Grok should deserve some protection,” he says.

Courts are starting to grapple with this question. In February, a federal court in Michigan ruled that a self-represented person’s conversations with ChatGPT to prepare her case were work product, meaning legal work that is shielded from the opposing side. The decision came on the same day a federal court in New York held that documents a criminal defendant had generated using Claude were not privileged attorney-client conversations or work product. The court argued that Claude is not an attorney and that a user has no “reasonable expectation of confidentiality in his communication” with it because AI companies can disclose user data to third parties.

In March, Judge Braswell ruled that a self-represented person’s use of a chatbot should stay off limits. “It is true that AI systems like ChatGPT, Claude, Gemini, and others … collect user data for training and other purposes. But … that does not eliminate all expectations of privacy,” she wrote. Courts have since remained split on the issue.

Malpractice without a pulse

Some judges are also wondering whether a chatbot, like a lawyer, has a duty to provide good legal advice. Judge Allison Goddard, a federal magistrate judge in California, has noticed that people without lawyers often get the wrong advice from ChatGPT when trying to assess the value of their case during settlement negotiations.
In one case, a plaintiff who slipped and fell in a store asked the store for $700,000, far more than the case was worth. “Where are you getting the idea that you’re getting $700,000? Did you go to ChatGPT?” Judge Goddard asked. “Well …” the plaintiff mumbled. She then walked the person through the


AI, Committee, News, Uncategorized

The Download: AI-generated lawsuits and virtual power plants for data centers

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How courts are coping with a flood of AI-generated lawsuits

Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. The number of these filings has more than doubled compared to before 2023. She puts that jump down to AI.

But while AI appears to be expanding access to justice, it doesn’t seem to be improving people’s chances of winning. Judges are starting to question what rights and duties chatbots should have as they stand in for lawyers. Lawmakers, meanwhile, are grappling with who should pay the price when chatbots produce bad legal advice.

Read the full story on how AI is reshaping access to the law.

By Michelle Kim

How virtual power plants could provide energy for data centers

Would you take a payment to ramp down your electricity use? Would it change anything if you were doing so to help power a local data center? A new project backed by Google will put those questions to the test.

The company has signed a deal to fund a virtual power plant in the largest power grid in the US. The system will group together devices like electric vehicles and smart thermostats, paying customers to adjust their usage when the grid is stretched.

The project could free up capacity for Google’s data centers, but there’s a catch: people might not play along. Find out what the future holds for these virtual power plants.

By Casey Crownhart

This story is from The Spark, our weekly newsletter giving you the inside track on all things climate. Sign up to receive it in your inbox every Wednesday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The EU has proposed new legislation to end its Big Tech dependence
The laws aim to boost domestic cloud, AI and semiconductors. (CNBC)
+ US firms would be blocked from critical public tenders. (Reuters $)
+ It also wants to make sure non-EU actors cannot disrupt tech services with a “kill switch.” (The Guardian)
+ But the proposal needs to be negotiated with EU member states. (Politico $)

2 Intelligence agencies warn Chinese spies are recruiting on LinkedIn
The Five Eyes alliance said Beijing is using job platforms for espionage. (BBC)
+ The spies are allegedly recruiting government and military staff. (Politico $)
+ The Chinese embassy in the UK condemned the accusations. (Bloomberg $)
+ Meet the man hunting the spies in your smartphone. (MIT Technology Review)

3 AI CEOs have called for a law protecting against biological weapons
They warn that synthetic DNA could be used for bioweapons. (Wired $)
+ Sam Altman, Dario Amodei, and Demis Hassabis joined the call. (WSJ $)
+ No one’s sure if synthetic mirror life will kill us all. (MIT Technology Review)

4 Firms are using Reddit to manipulate ChatGPT and Google AI search
They’re spamming subreddits to get posts scraped by chatbots. (404 Media)
+ What we’ve been getting wrong about AI’s truth crisis. (MIT Technology Review)

5 Meta keeps delaying the launch of its new AI model
The new Muse Spark AI model API still has no release date. (WSJ $)
+ Which is hampering Meta’s plans to monetize its AI investments. (Reuters $)

6 For the first time, a US city has voted to permanently ban data centers
Monterey Park, California, voted in favor of the move. (LA Times)
+ Should we be moving data centers to space? (MIT Technology Review)

7 China is betting on household chore training to advance robotics
Data harvested in homes and factories provides a scaling edge. (Rest of World)
+ Gig workers are training humanoids at home. (MIT Technology Review)

8 Sam Altman will urge US lawmakers not to require AI model approvals
He’s advocating against proposals for new AI rules. (Reuters $)
+ His move comes after President Trump signed a new AI order. (Wired $)

9 Quantinuum raised $1.68 billion in an IPO as quantum computing rises
Investors flocked to one of the fast-growing sector’s leaders. (Reuters $)

10 Someone finally wants to hire philosophers: Silicon Valley
Big tech hopes they will help build better machines. (The Atlantic $)

Quote of the day

“Historically, these companies have been very willing to play Russian roulette, and they’re playing another round.”

Connor Leahy, an AI researcher, former hacker and US director of ControlAI, tells the Financial Times why he’s concerned about Anthropic’s relentless race to the top.

One More Thing

What an octopus’s mind can teach us about AI’s ultimate mystery

Emily Bender, a linguist at the University of Washington, has developed a thought experiment she calls the octopus test. It involves an octopus learning to copy patterns in human writing and produce squiggles in response. But does the animal actually understand the language or are we merely projecting meaning onto it?

Bender’s octopus is a stand-in for AI systems like ChatGPT. The intelligence we see in these machines is also projected on them by us. The same applies to consciousness: we may claim to see it, but it remains unclear whether it is really there. Read the full story on the debate over machines with minds.

By Will Douglas Heaven

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Discover where iconic sound effects actually came from in this fabulous audio history.
+ Need a serotonin boost? Then tune into this live puppy cam from Denali National Park.
+ Linux lovers can try 570 extinct operating systems at a new virtual museum.
+ Beethoven’s “Moonlight Sonata” becomes something entirely different in this lightning-fast bass guitar performance.


AI, Committee, News, Uncategorized

MT-OSC: Path for LLMs that Get Lost in Multi-Turn Conversation

arXiv:2604.08782v3

Abstract: Large language models (LLMs) suffer significant performance degradation when user instructions and context are distributed over multiple conversational turns, yet multi-turn (MT) interactions dominate chat interfaces. The routine approach of appending full chat history to prompts rapidly exhausts context windows, leading to increased latency, higher computational costs, and diminishing returns as conversations extend. We introduce MT-OSC, a One-off Sequential Condensation framework that efficiently and automatically condenses chat history in the background without disrupting the user experience. MT-OSC employs a Condenser Agent that uses a few-shot inference-based Condenser and a lightweight Decider to selectively retain essential information, reducing token counts by up to 72% in 10-turn dialogues. Evaluated across 13 state-of-the-art LLMs and diverse multi-turn benchmarks, MT-OSC consistently narrows the multi-turn performance gap, yielding improved or preserved accuracy across datasets while remaining robust to distractors and irrelevant turns. Our results establish MT-OSC as a scalable solution for multi-turn chats, enabling richer context within constrained input spaces, reducing latency and operational cost, while balancing performance.
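The general idea of condensing chat history before it reaches the model can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify how the Condenser or Decider work, so `toy_condenser` (keep the first sentence of each user turn), `decider` (accept the condensed text only if it saves a minimum share of tokens), the whitespace-based `count_tokens`, and the `min_saving` threshold are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def count_tokens(text: str) -> int:
    """Crude token proxy: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def render(turns: List[Turn]) -> str:
    """Full-history baseline: every turn appended verbatim."""
    return "\n".join(f"{t.role}: {t.text}" for t in turns)


def toy_condenser(turns: List[Turn]) -> str:
    """Hypothetical stand-in for an LLM condenser: keep the first sentence of each user turn."""
    kept = []
    for t in turns:
        if t.role == "user":
            first = t.text.split(". ")[0].rstrip(".")
            kept.append(first + ".")
    return "Condensed user requirements: " + " ".join(kept)


def decider(full: str, condensed: str, min_saving: float = 0.2) -> bool:
    """Hypothetical rule: accept the condensed text only if it saves enough tokens."""
    full_n = count_tokens(full)
    if full_n == 0:
        return False
    saving = 1 - count_tokens(condensed) / full_n
    return saving >= min_saving


def build_context(
    turns: List[Turn],
    condenser: Callable[[List[Turn]], str] = toy_condenser,
) -> str:
    """Return the condensed history if the decider accepts it, else the full history."""
    full = render(turns)
    condensed = condenser(turns)
    return condensed if decider(full, condensed) else full


history = [
    Turn("user", "Write a function that sorts a list. It should be in Python and I care about readability a lot."),
    Turn("assistant", "Sure, here is a draft using sorted with a key function and some comments explaining each step."),
    Turn("user", "Make it stable. Also please keep the original list unchanged when you do this."),
]
context = build_context(history)
```

In a real system the condenser would be a model call and the decision rule would weigh information loss as well as length; the fallback to the full history shown here is only one possible design.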

