AI Archives - Page 6 of 203

NVIDIA AI Releases Dynamo Snapshot: A CRIU-Based Fast Startup System for AI Inference on Kubernetes

admin NU / June 5, 2026

In production inference deployments, demand fluctuates over time, requiring inference replicas to scale elastically. Cold-starting inference workloads on Kubernetes can take several minutes. During that time, GPUs are allocated but idle, generating no tokens and serving no requests. “Cold start” means the full sequence a model server must complete before serving any request: pulling the container image, loading model weights into GPU memory, warming up CUDA kernels, compiling or capturing CUDA graphs, and registering with the service discovery layer. This delay increases the risk of SLA violations during traffic spikes, as the system cannot scale quickly enough to absorb sudden increases in demand.

The cold-start latency for a single-GPU vLLM (v0.20.0) workload breaks into three segments: container/image pull, engine initialization (weight loading, kernel warmup, graph compilation), and distributed runtime startup. To address this, NVIDIA’s AI research team has introduced NVIDIA Dynamo Snapshot: a checkpoint/restore approach for AI inference workloads on Kubernetes.

Source: https://developer.nvidia.com/blog/nvidia-dynamo-snapshot-fast-startup-for-inference-workloads-on-kubernetes/?linkId=100000423964029

What are CRIU and cuda-checkpoint?

A running inference worker’s checkpointable state has two components. Device state (GPU-side) includes CUDA contexts, streams, device memory, and virtual address mappings, none of which is visible to the host. To serialize it, cuda-checkpoint uses the checkpointing capability of the CUDA driver to dump the device state into the CPU memory of the process that owns each CUDA context. Host state (CPU-side) includes CPU memory, threads, file descriptors, and namespaces. CRIU (Checkpoint/Restore in Userspace) walks the Linux kernel’s bookkeeping and serializes the process tree’s state to disk.
When checkpointing, the two tools run in order: cuda-checkpoint dumps all device state into CPU memory first, then CRIU dumps all host-side process tree state to a folder in storage. Restoring, on the same or a different node, reverses the order: CRIU first restores the process tree from distributed storage such as NFS or SMB, then cuda-checkpoint restores the GPU state from what is now in CPU memory onto the new GPUs.

CRIU is fundamentally a freeze-and-thaw mechanism. When a process is restored, execution resumes at the exact instruction where it was checkpointed, completely unaware that checkpointing or restoration occurred. Because of this, any coordination required before checkpointing (such as quiescing the workload) or after restoration (such as re-establishing external state) must be handled externally through an orchestrator or workload-specific hooks.

How Dynamo Snapshot Works on Kubernetes

In Kubernetes, workloads run inside containers inside pods. Because CRIU checkpoints contain references to the container’s writable filesystem layer, checkpointing is done at the container level so the process tree state and filesystem travel together. NVIDIA provides a privileged DaemonSet, snapshot-agent, installable through a Helm chart. An agent runs on every node and handles checkpoint and restore for runc-managed containers without requiring modifications to runc itself.

On checkpoint, the agent waits for the workload’s readiness probe, invokes cuda-checkpoint and CRIU from the host side, and writes the artifact to shared storage. The workload may have created or deleted files local to the container (the overlay filesystem), which the agent also checkpoints after the CRIU stage. On restore, the agent launches a lightweight placeholder pod, restores the overlay filesystem, and restores the CRIU/CUDA checkpoint into its namespaces. Each agent operates independently on its local node, allowing checkpoints and restores to parallelize naturally across the cluster.
This DaemonSet approach was chosen over Kubernetes native checkpoint/restore support in runc for three reasons: it is fully portable without depending on cloud-provider feature gates, it gives tighter control over CRIU for performance tuning, and it allows checkpoint artifacts to live in flexible storage backends rather than being embedded into OCI images.

Quiesce/Resume Hooks

A Dynamo inference worker initializes in two ordered phases. First, engine initialization: communicators are initialized, weights are loaded, kernels are warmed up, and CUDA graphs are compiled. The worker is fully warm at this point but not yet discoverable outside its pod. Second, distributed runtime startup: the worker connects to the Dynamo control plane and registers with the discovery backend. Open TCP connections to the control plane exist from this point onward.

If the checkpoint were taken after distributed runtime startup, there would be active TCP connections that CRIU cannot capture. The solution is quiesce/resume hooks: the worker writes a “ready for checkpoint” signal file after engine initialization but before distributed runtime startup. The worker then enters a polling loop waiting for a “restore complete” signal file while the snapshot agent checkpoints it externally. Because CRIU restores execution at the exact instruction where checkpointing occurred, the worker resumes directly inside the polling loop, detects the signal file, and proceeds with distributed runtime initialization without requiring additional synchronization.

The quiesce/resume pattern is also important for multi-GPU and multi-node checkpoints (planned for a future release): outbound TCP connections used for RPC cannot be checkpointed in an established state because the pod IP changes between checkpoint and restore, and RDMA registrations and NIC state need to be recreated post-restore.
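The quiesce/resume handshake can be illustrated with a small Python sketch. It is not Dynamo's code: the file names, function names, and the thread that stands in for snapshot-agent are all assumptions made for illustration, and no real checkpoint is taken. It only shows the ordering, in which the worker signals readiness after engine initialization, waits in a polling loop, and starts its distributed runtime only once the "restore complete" file appears.

```python
import tempfile
import threading
import time
from pathlib import Path

# Hypothetical file names; the article only speaks of "signal files".
READY_FILE = "ready-for-checkpoint"
RESTORED_FILE = "restore-complete"


def worker(signal_dir, events, poll_s=0.01, timeout_s=5.0):
    """Run phase 1, quiesce in a polling loop, then run phase 2."""
    events.append("engine_initialized")      # weights loaded, kernels warm, graphs ready
    (signal_dir / READY_FILE).touch()        # tell the agent it is safe to checkpoint
    deadline = time.monotonic() + timeout_s
    # A restored process would resume inside this loop, still waiting for the file.
    while not (signal_dir / RESTORED_FILE).exists():
        if time.monotonic() > deadline:
            raise TimeoutError("restore-complete signal never arrived")
        time.sleep(poll_s)
    events.append("distributed_runtime_started")  # open TCP connections only now


def stand_in_agent(signal_dir, events, poll_s=0.01):
    """Stand-in for the snapshot agent: wait for readiness, then signal resume."""
    while not (signal_dir / READY_FILE).exists():
        time.sleep(poll_s)
    events.append("checkpoint_taken")        # a real agent would checkpoint here
    (signal_dir / RESTORED_FILE).touch()


def run_demo():
    """Run worker and stand-in agent together and return the event order."""
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        signal_dir = Path(tmp)
        agent = threading.Thread(target=stand_in_agent, args=(signal_dir, events))
        agent.start()
        worker(signal_dir, events)
        agent.join()
    return events
```

Calling run_demo() returns the three events in the order the text requires: engine initialization, checkpoint, then distributed runtime startup. In a real deployment the checkpoint and the restore happen at different times and possibly on different nodes; the sketch collapses them into one step.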
Optimization 1: KV Cache Unmap and Release

After measuring peak GPU memory usage while weights, CUDA graphs, and other buffers are allocated, inference engines allocate the remaining GPU memory as a large KV cache buffer. Since the checkpoint is taken before the replica has served any requests, this KV cache buffer does not need to be checkpointed at all. However, its virtual address must remain stable because it is baked into the CUDA graph. The solution is to allocate the KV cache via the CUDA Virtual Memory Management API (cuMemCreate and cuMemMap), then free the underlying physical allocation with cuMemUnmap and cuMemRelease, but without calling cuMemAddressFree. This keeps the virtual address range intact while releasing the physical memory. This functionality is natively available in vLLM via sleep() and wake_up() and in SGLang via torch_memory_saver. For Qwen3-0.6B on a B200, this reduces the total artifact size from ~190 GiB to ~6 GiB. The wins are most pronounced for large KV cache sizes, that is, when model weights are small relative to GPU size.

Optimization 2: Speeding Up CRIU Memory Restore

Even after the artifact is smaller, upstream CRIU restore time remains a bottleneck. For larger models, restore time actually exceeds


AI, Committee, News, Uncategorized

Building Semantic Search with Transformers.js and Sentence Embeddings

admin NU / June 5, 2026

You’ve probably shipped this bug before: a user types “affordable laptop” into your search bar and gets zero results.



The Download: AI hacking beyond Mythos, and chatbots’ impact on our brains

admin NU / June 5, 2026

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The Meta hack shows there’s more to AI security than Mythos

On Monday, reports emerged that attackers had used Meta’s AI customer support agent to steal Instagram accounts. Their approach was simple: they asked the agent to link the accounts to email addresses they controlled, and it complied. Since Anthropic announced that its Mythos model was too good at hacking for a general release, cybersecurity concerns have focused on the risk of superpowered AI systems overwhelming computer infrastructure. But the Instagram hack shows that far simpler exploits can still cause damage. As companies offload more work to AI, these comparatively unsophisticated attacks are becoming harder to ignore. Read the full story to understand why. (Grace Huckins)

Are AI chatbots making us lose control of our brains?

Gloria Mark, a psychologist at the University of California, Irvine, fears that digital technologies are weakening our cognitive abilities. Her research suggests attention spans have fallen sharply over time, leading to higher stress and lower performance. She now believes AI tools like ChatGPT and Claude may accelerate this shift. “You’re deferring your cognitive work to AI,” she said. “And it’s not good for us.” Mark argues this could weaken critical thinking and emotional intelligence. Luckily, she thinks we can course-correct by changing our relationship with these technologies. Find out how AI could reshape attention and thinking. (Jessica Hamzelou)

This story is from The Checkup, our weekly newsletter giving you the inside track on all things biotech. Sign up to receive it in your inbox every Thursday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Anthropic has called for a global slowdown in AI development
It flagged the risk of models “self-improving.” (WSJ $)
+ And wants a coordinated plan to stop them. (Reuters $)
+ Skeptics note that the timing is awfully convenient. (The Register)

2 In a first, scientists have precisely edited human embryo genes
They relied on a newer gene-editing technique. (NYT $)
+ Genetically-modified babies could be on their way. (Guardian)
+ Companies have big plans for the technology. (MIT Technology Review)

3 US officials have discussed taking financial stakes in the AI firms
They’ve held talks about the government acquiring shares. (Reuters $)
+ Sam Altman pitched the idea to the White House last year. (WSJ $)

4 Bot web traffic has overtaken human web traffic
Cloudflare said 57.4% of traffic now comes from bots. (NBC News)
+ Its CEO expected the milestone at the end of 2027. (CNET)

5 The White House plans to bring AI doctors into American medicine
It wants chatbots to diagnose illness and prescribe medicine. (WSJ $)
+ But we don’t even know if healthcare AI actually helps patients. (MIT Technology Review)

6 Meta quietly added facial recognition code for smart glasses to its app
The exploratory feature would identify people via biometric data. (Wired $)
+ Smart glasses are also entering warfare. (MIT Technology Review)

7 South Korea’s labour minister wants tech firms to share AI profits
Kim Young wants staff and suppliers to get a share. (Reuters $)
+ He helped avert a huge strike over AI profit-sharing at Samsung. (NYT $)

8 Canada’s highly-anticipated AI strategy has launched
It promises over $2 billion in funding and aims to create 250,000 jobs. (BBC)
+ AI could strengthen democracy. (MIT Technology Review)

9 Investment in agricultural tech is booming
That’s good news at a time when we’re facing unprecedented levels of food market volatility. (The Economist $)

10 Bumblebees can use tools to solve problems, new research shows
Not just busy: they’re clever too! (Guardian)

Quote of the day

“Welp, that happened faster than I predicted.”

Matthew Prince, co-founder and CEO of Cloudflare, one of the largest internet hosting services, reacts on X to reports that bots have overtaken humans in driving web traffic.

One More Thing

Inside the machine that saved Moore’s Law

In a Connecticut clean room, the Dutch company ASML is developing the world’s most advanced machine for extreme ultraviolet (EUV) lithography, a crucial process for manufacturing microchips. The system has become vital to Moore’s Law, the observation that the number of transistors on a chip roughly doubles every two years as components shrink, driving gains in performance and efficiency. “Without this machine, it’s gone,” says Wayne Lam, a director of research at CCS Insight. “You can’t really make any leading-edge processors without EUV.” Discover how ASML’s EUV technology saved Moore’s Law. (Clive Thompson)

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)
+ Tech bosses love Tolkien. Here’s what the writer might think of them.
+ Rare footage captures an underwater volcano erupting beneath the Pacific Ocean.
+ Watch a tiny rescued cub grow into adulthood in this heartwarming tiger compilation.
+ This medieval version of “Take On Me” is like stepping into a tavern of synth-pop bards.



Meet OpenJarvis: A Local-First Framework for On-Device Personal AI Agents with Tools, Memory, and Learning

admin NU / June 4, 2026

Researchers at Stanford University and Lambda Labs have published the research paper for OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device. The open-weight models configured through OpenJarvis land within 3.2 percentage points of the best cloud model on average, at roughly 800× lower marginal API cost per query and roughly 4× lower latency under the paper’s benchmark protocol. This work builds on the team’s earlier Intelligence Per Watt study, which reported that local models already handle 88.7% of single-turn chat and reasoning queries at interactive latency, with intelligence efficiency improving 5.3× from 2023 to 2025.

Model Overview & Access

OpenJarvis is not a single model. It is a framework that composes any supported model with a configurable agent stack, evaluated across 11 local models from four families.

Property | Value
License | Apache 2.0
Framework release | March 12, 2026
Paper | arXiv:2605.17172 (posted May 16, 2026)
Repository | github.com/open-jarvis/OpenJarvis
Stars / forks | ~5.4k / ~1.2k (June 2026)
Languages | Python (~83%), Rust (~9%), TypeScript (~7%)
Evaluated models | 11 local models across 4 families: Qwen3.5, Gemma4, Nemotron, Granite
Cloud baselines | Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro
Supported engines | Ollama, vLLM, SGLang, llama.cpp, Apple Foundation Models, Exo (among others)
Context window | Model-dependent
Installation | Single command; ~3 minutes on broadband
Hardware | Tested on 7 platforms, from Mac Mini M4 to NVIDIA DGX Spark

Architecture: Five Primitives and a Spec

OpenJarvis decomposes a personal AI system into five typed primitives, composed through a single declarative configuration object called a spec.

Intelligence: the model, weights, generation parameters, and quantization format.
Engine: the inference runtime (Ollama, vLLM, SGLang, etc.), batching, KV-cache settings, and hardware path.
Agents: the reasoning loop (ReAct or CodeAct), system prompts, tool-use policy, and turn limits.
Tools & Memory: external interfaces, retrieval backends, 25+ data connectors, and 32+ messaging channels, with native MCP support and interchangeable memory backends.
Learning: the optimizer that updates the spec from traces. This slot accepts LoRA, DSPy, GEPA, or LLM-guided spec search.

Each primitive is independently swappable, and a spec serializes all five into a TOML file. Two specs can share the same agent and tool configuration and differ only in model and engine, so the same behavior runs on a Mac Mini and a workstation without rewriting prompts.

LLM-guided spec search is the second contribution. It is a local–cloud collaboration: a frontier cloud model acts as a teacher at search time, reading traces, diagnosing failure clusters, and proposing edits across Intelligence, Engine, Agents, and Tools & Memory. An edit is accepted only if it improves the target failure cluster without causing meaningful regressions elsewhere. The research team calls this acceptance rule the gate (default tolerance 1%). The optimized spec then runs entirely on-device at inference time, with zero cloud calls. The teacher is used only at search time; at 100 queries per day, the amortized teacher cost falls below $0.001 per query within six months.

Prior work (GEPA, DSPy, LoRA) optimizes one primitive at a time, and prompt optimizers alone recover only about 5 pp of the cloud–local gap. LLM-guided spec search recovers 13–32 pp because it edits across primitives jointly, at 7–11× lower optimization cost than single-primitive baselines. The four-primitive move space contributes 5.5–16.5 pp, and the LLM proposer adds about 10 pp on average over an evolutionary search at the same move space.
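The gate described above can be sketched as a small acceptance check. This is an illustration under stated assumptions, not OpenJarvis code: the function name, the per-cluster score dictionaries, and the reading of the 1% default tolerance as one percentage point of accuracy are all hypothetical.

```python
from typing import Dict


def gate_accepts(
    before: Dict[str, float],
    after: Dict[str, float],
    target: str,
    tolerance_pp: float = 1.0,  # assumed reading of the "default tolerance 1%"
) -> bool:
    """Accept an edit only if the target cluster improves and no other
    cluster regresses by more than tolerance_pp percentage points."""
    if after[target] <= before[target]:
        return False  # the edit did not help the cluster it was aimed at
    for cluster, old_score in before.items():
        if cluster == target:
            continue
        if old_score - after[cluster] > tolerance_pp:
            return False  # meaningful regression elsewhere
    return True


# Per-cluster accuracy in percent (made-up numbers for illustration only).
baseline = {"tool_calls": 60.0, "coding": 70.0, "research": 80.0}
small_side_effect = {"tool_calls": 68.0, "coding": 69.5, "research": 80.0}
large_side_effect = {"tool_calls": 68.0, "coding": 67.0, "research": 80.0}

accepted = gate_accepts(baseline, small_side_effect, target="tool_calls")
rejected = gate_accepts(baseline, large_side_effect, target="tool_calls")
```

With these made-up scores, the first edit passes because the only side effect is a 0.5 pp dip in coding, while the second is rejected because coding drops by 3 pp.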
Source: https://arxiv.org/pdf/2605.17172v1

Capabilities & Performance

OpenJarvis was evaluated across 8 benchmarks spanning 508 tasks: tool calling (ToolCall-15), agentic workflows (PinchBench), coding (LiveCodeBench), customer service (τ-Bench V2, τ²-Bench Telecom), general assistance (GAIA), and deep research (LiveResearchBench, DeepResearchBench).

The swap test: Replacing the intended cloud model with Qwen3.5-9B in existing frameworks (OpenClaw, Hermes Agent) drops accuracy by 25–39 pp. With the same model under an OpenJarvis spec, the residual drop shrinks to 5.6–16.5 pp, recovering 56–77% of the portability loss.

The accuracy frontier: The best single local model, Qwen3.5-122B, reaches 80.3% average accuracy versus Claude Opus 4.6 at 83.5%, a 3.2 pp gap. Local specs match or exceed cloud on 4 of 8 benchmarks: ToolCall-15, PinchBench, LiveCodeBench, and τ-Bench V2.

Cost and latency: Local configurations form the accuracy–efficiency frontier. Qwen3.5-122B delivers its 80.3% at roughly a thousandth of a cent per query, versus $0.009 per query for Claude Opus 4.6, an approximately 800× marginal API-cost advantage. End-to-end latency drops by roughly 4× on the agentic workloads, though the paper notes single-shot prompts can favor cloud serving.

Search gains: LLM-guided spec search improves the Qwen3.5-9B student to 100% on PinchBench, 83% on LiveCodeBench, and 91% on LiveResearchBench. Across the full eight-benchmark suite, average gains per student model range from 13.1 to 31.5 pp. The authors report that these gains survive their robustness checks (reward-weight variants, search-seed variance, and random restarts).

How to Use It

Installation is one command. On macOS, Linux, or WSL2:

    curl -fsSL https://open-jarvis.github.io/OpenJarvis/install.sh | bash

Windows users run an equivalent PowerShell script (irm … | iex).
The installer provisions uv, a Python virtual environment, Ollama, and a starter model in about three minutes on broadband. A desktop GUI ships as a .dmg, .exe, .deb, .rpm, or .AppImage from the releases page. After install, running jarvis starts a chat session. Starter presets cover common workflows:

    jarvis init --preset morning-digest-mac   # daily briefing with TTS
    jarvis init --preset deep-research        # multi-hop research with citations
    jarvis init --preset code-assistant       # agent with code execution and shell access
    jarvis init --preset scheduled-monitor    # stateful agent on a schedule

The framework ships with eight built-in agents across three execution modes: on-demand, scheduled, and continuous. It connects to 25+ data sources (Gmail, Calendar, iMessage, Notion, Obsidian, Slack, GitHub, and others) and exposes agents over 32+ messaging channels (WhatsApp, Telegram, Discord, iMessage, Signal, and others). Skills can be imported from external catalogs (about 150 from Hermes Agent and about 13,700 community skills from OpenClaw), all following the agentskills.io specification. The command jarvis optimize skills --policy dspy refines them from local trace history.



Miso Labs Releases MisoTTS: An 8B Emotive Text-to-Speech Model with Open Weights

admin NU / June 4, 2026

Miso Labs has released MisoTTS, an open-weights 8-billion-parameter text-to-speech model. It generates expressive speech from both text and audio context. The model uses residual vector quantization (RVQ) to widen its sonic range. This avoids scaling a single flat vocabulary while keeping parameter count fixed.

What is MisoTTS

MisoTTS is an 8B-parameter text-to-dialogue RVQ Transformer. It is inspired by the Sesame CSM architecture. It pairs a Llama 3.2-style backbone with a smaller audio decoder. It generates Mimi audio codes from text and optional audio context. The model conditions on both text and prior audio. That second input lets it respond to the speaker’s tone. The text vocabulary is 128,256 tokens, and there are 32 audio codebooks. Mimi is the audio tokenizer, and max sequence length is 2,048. Default inference runs in torch.bfloat16. Miso Labs claims 110 ms latency. It lists ElevenLabs at 700 ms and Sesame at 300 ms.

The Vocabulary Size Problem

Standard transformers generate from a fixed vocabulary of discrete tokens. That works when a small vocabulary covers the target space. Human speech does not fit that assumption. It varies across pitch, rhythm, emphasis, emotion, and accent. Expanding the audio vocabulary is the obvious fix. But larger vocabularies need more parameters in a standard transformer. Each token must be represented and predicted by the model. Miso Labs calls this the vocabulary size problem. The second issue is conditioning. Most TTS models condition only on text. They ignore the interlocutor’s tone. Miso Labs argues this contributes to the “uncanny valley” effect.

Residual Vector Quantization: The Core Idea

MisoTTS addresses both problems with residual vector quantization (RVQ). Miso Labs traces RVQ to image-generation research and to Sesame’s CSM for audio. Instead of one token index, the model emits a vector of indices. Each audio token is 32 codebook indices over 2048-way codebooks.
The model keeps a separate codebook for each position in the vector. To recover the sound, it sums the looked-up vectors. Each codebook adds another refinement to the signal. This is what makes the scaling work. Addressable vocabulary equals codebook size raised to the depth. Growing the depth adds no parameters to the model. So MisoTTS reaches about 2048^32, or roughly 10^105 addressable tokens. Miso Labs notes naive scaling would require a far larger network.

Source: https://www.misolabs.ai/blog/miso-tts-8b

The Two-Transformer Architecture

The model splits into a backbone and a decoder. The backbone is a 7.7B-parameter transformer, autoregressive over time. It predicts the first codebook index and a final hidden state. A 300M-parameter decoder then runs autoregressively over depth. It predicts the remaining codebook indices, one position at a time. Each prediction conditions on the indices already chosen in the frame. The same 300M parameters are reused for every position. Embeddings follow the same logic. Text tokens use a single lookup. An audio token’s embedding is the sum of per-position codebook lookups. Interleaving text and audio lets the backbone use conversation history. That is how it carries context across turns.

Strengths and Challenges

Strengths:
Open weights on day one, under a modified MIT license.
RVQ scales the sonic range without scaling parameter count.
Conditions on audio context, not text alone.
Local deployment keeps sensitive audio data in-house.
The architecture and math are documented in a public blog post.

Challenges:
Half-duplex only, with no turn-taking yet.
The large model needs a capable CUDA GPU.
API access is announced but not yet available.
Latency and quality claims still need third-party testing.

Marktechpost’s Visual Explainer

Model Brief · Open-Weights Release · June 3, 2026

MisoTTS: An 8B emotive text-to-speech model from Miso Labs, built on residual vector quantization and conditioned on both text and audio.
8B params · RVQ Transformer · Mimi codes · modified MIT

What MisoTTS Is: A text-to-dialogue RVQ Transformer
An 8B-parameter model inspired by the Sesame CSM architecture.
Pairs a Llama 3.2-style backbone with a smaller audio decoder.
Generates Mimi audio codes from text and optional audio context.
Conditions on prior audio, so output responds to speaker tone.

At a Glance: Published specifications
Parameters | 8B (7.7B + 300M)
Architecture | RVQ Transformer
Audio codebooks | 32 (2048-way)
Audio tokenizer | Mimi
Text vocabulary | 128,256
Max sequence length | 2,048
Default precision | torch.bfloat16
License | modified MIT

The Motivation: The vocabulary size problem
Transformers generate from a fixed vocabulary of discrete tokens.
Speech varies in pitch, rhythm, emphasis, emotion, and accent.
A bigger audio vocabulary needs more parameters in a standard transformer.
Most TTS models condition only on text, ignoring tone, which contributes to the “uncanny valley” effect.

The Core Idea: Residual vector quantization
The model emits a vector of indices, not a single token index.
Each token is 32 codebook indices over 2048-way codebooks.
Summing the looked-up vectors reconstructs the sound.
Depth scales addressable vocabulary to ~2048^32 (≈10^105) with no added parameters.

Architecture: Two transformers, one vector token
Backbone (7.7B): autoregressive over time; predicts codebook index k₁ and hidden state h₀.
Decoder (300M): autoregressive over depth; predicts k₂ through k₃₂.
The same 300M parameters are reused for every position.
Interleaved text and audio let the backbone use conversation history.

Run It Locally: Inference in a few lines

    from generator import load_miso_8b
    import torchaudio

    # Load the model onto a CUDA GPU from the Hugging Face repository.
    gen = load_miso_8b(device="cuda", model_path_or_repo_id="MisoLabs/MisoTTS")

    # Generate up to 10 seconds of speech with no prior audio context.
    audio = gen.generate(
        text="Hello from Miso.",
        speaker=0,
        context=[],
        max_audio_length_ms=10_000,
    )

    # Save the generated audio as a WAV file.
    torchaudio.save("miso.wav", audio.unsqueeze(0).cpu(), gen.sample_rate)

Setup uses uv with Python 3.10. Weights download from Hugging Face. Audio is watermarked by default via SilentCipher. One-shot voice cloning works from a ~10-second clip.

Limitations: Where it stops, for now
Handles individual turns only; no turn-taking yet.
Generates half-duplex audio: it cannot speak while the other party speaks.
Miso Labs frames full-duplex and turn-taking as future work.
API access is announced but not yet available.

Key Takeaways: The short version
Open-weights 8B TTS under a modified MIT license.
Conditions on text and audio, so output tracks speaker tone.
RVQ scales vocabulary to ~2048^32 without adding parameters.
7.7B backbone over time, 300M decoder over depth.
Half-duplex and single-turn today; API access pending.

Key Takeaways

Miso Labs open-sourced MisoTTS, an 8B text-to-speech
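The RVQ arithmetic in the MisoTTS write-up (one index per codebook, looked-up vectors summed, addressable vocabulary equal to codebook size raised to the depth) can be checked with a short Python sketch. This is a toy illustration rather than MisoTTS code: the function names and the tiny two-level codebooks are made up for the example, and only the 2048 and 32 figures come from the article.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = List[Vector]


def decode(indices: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Reconstruct one frame by summing one looked-up vector per codebook level."""
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for level, index in enumerate(indices):
        entry = codebooks[level][index]
        out = [acc + value for acc, value in zip(out, entry)]
    return out


def addressable_tokens(codebook_size: int, depth: int) -> int:
    """Number of distinct index vectors: codebook_size ** depth."""
    return codebook_size ** depth


# Toy codebooks (hypothetical values): level 0 is coarse, level 1 refines it.
toy_codebooks = [
    [[1.0, 0.0], [2.0, 0.0]],
    [[0.0, 0.5], [0.0, 0.25]],
]
frame = decode([1, 0], toy_codebooks)  # [2.0, 0.0] + [0.0, 0.5]

# Figures from the article: 32 codebooks, each 2048-way.
vocab = addressable_tokens(2048, 32)
digits = len(str(vocab))
```

Since 2048 is 2^11, the count is exactly 2^352, a 106-digit number (about 9.2 × 10^105), which matches the order of magnitude quoted in the article.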



How courts are coping with a flood of AI-generated lawsuits

admin NU / June 4, 2026

Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. Many of them can’t afford to hire a lawyer, and others have cases too weak or too small to interest one. She reads each one carefully, mindful of how daunting it is to walk into the courtroom alone.

Lately, like many judges across the US, she has seen a noticeable uptick in such filings. According to a new study that examined 4.5 million federal civil cases from 2005 to 2026, the share of lawsuits brought by self-represented people increased from 11% in 2022 to 16.8% in 2025. Within those cases, the number of filings made more than doubled from pre-2023 levels.

Judge Braswell puts that jump down to AI. “I do correlate that to AI in part because I see AI use,” she says. As a tech-savvy judge who uses AI to vet court documents, she’s learned to recognize how large language models write. She can tell from the prose and, at times, from hallucinated cases and fabricated quotes. “I’m also actually seeing better-drafted pleadings,” she says.

But while AI appears to be expanding access to justice, it doesn’t seem to be improving people’s chances of winning. Judges are also starting to question what kinds of rights and responsibilities large language models should bear as they step into lawyers’ shoes. For example, they ask whether a chatbot has a duty to provide good advice, as a human lawyer does. And a growing number of lawmakers across the US are starting to grapple with who should pay the price when chatbots dish out bad legal advice.

AI supercharges lawsuits

To test whether AI was driving the increase in lawsuits filed by people without a lawyer, the authors of the study, Anand Shah at MIT and Joshua Levy at the University of Southern California, ran 1,600 randomly sampled court documents through Pangram, a commercial AI-text detector.
The share flagged as containing AI-generated writing rose from 1% in 2023 to 18% in 2026. To Judge Braswell, that’s not necessarily a cause for concern. While the surge of AI-assisted filings might be adding to their workloads, she and many other judges find the cases easier to rule on because AI is helping people without legal training better articulate their arguments. Court documents written by people without lawyers are notoriously hard to decipher. Some arrive as handwritten scrawls bordering on gibberish that judges take a while to decode. However cryptic, judges are required to read them charitably. These days, Judge Braswell has been churning through motions drafted by AI faster than the ones written by the litigants. “I have to be really careful because some of them contain hallucinations and errors, but I can generally understand what they’re arguing better with AI assistance from them than without it,” she says. The clearer filings let Judge Braswell hear them better. “If I understand an argument a little bit better, I’m probably going to be able to help a little bit more,” she says. Online communities are springing up to trade self-help guides on using AI to sue. In December 2024, a viral Reddit post walked immigration applicants through suing the United States Citizenship and Immigration Services over delayed review of their applications: draft a writ of mandamus with Microsoft Copilot, pay a lawyer $150 to polish it, and file in the expedient District of Vermont. Cases filed by people without lawyers in Vermont rose from about 45 a year before 2022 to more than 1,100 in 2024. Even so, people without lawyers are far more likely to lose their case than people with lawyers, and that’s not changing even with the addition of AI, the study found. “It turns out that mounting a lawsuit is a complex, multifaceted task. Not all of it is just drafting text,” says Levy. 
Chatbot-client privilege

Judge William Garfinkel, a federal magistrate judge in Connecticut, has served on the bench for three decades, pondering all sorts of questions about lawyers’ relationship with their clients. Lately, he has been wondering whether people’s conversations with chatbots dispensing legal advice should be privileged, the way their conversations with lawyers are. “You can make a good argument that … conversations with large language models like Claude or ChatGPT or Grok should deserve some protection,” he says.

Courts are starting to grapple with this question. In February, a federal court in Michigan ruled that a self-represented person’s conversations with ChatGPT to prepare her case were work product, meaning legal work that is shielded from the opposing side. The decision came on the same day a federal court in New York held that documents a criminal defendant had generated using Claude were not privileged attorney-client conversations or work product. The court argued that Claude is not an attorney and that a user has no “reasonable expectation of confidentiality in his communication” with it because AI companies can disclose user data to third parties.

In March, Judge Braswell ruled that a self-represented person’s use of a chatbot should stay off limits. “It is true that AI systems like ChatGPT, Claude, Gemini, and others … collect user data for training and other purposes. But … that does not eliminate all expectations of privacy,” she wrote. Courts have since remained split on the issue.

Malpractice without a pulse

Some judges are also wondering whether a chatbot, like a lawyer, has a duty to provide good legal advice. Judge Allison Goddard, a federal magistrate judge in California, has noticed that people without lawyers often get the wrong advice from ChatGPT when trying to assess the value of their case during settlement negotiations.
In one case, a plaintiff who slipped and fell in a store asked for $700,000 from the store, which was wildly more than the case was worth. “Where are you getting the idea that you’re getting $700,000? Did you go to ChatGPT?” Judge Goddard asked. “Well …” the plaintiff mumbled. She then walked the person through the

How courts are coping with a flood of AI-generated lawsuits Read post »

AI, Committee, News, Uncategorized

The Download: AI-generated lawsuits and virtual power plants for data centers

admin NU / June 4, 2026

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How courts are coping with a flood of AI-generated lawsuits

Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. The number of these filings has more than doubled compared to before 2023. She puts that jump down to AI. But while AI appears to be expanding access to justice, it doesn’t seem to be improving people’s chances of winning. Judges are starting to question what rights and duties chatbots should have as they stand in for lawyers. Lawmakers, meanwhile, are grappling with who should pay the price when chatbots produce bad legal advice. Read the full story on how AI is reshaping access to the law.

By Michelle Kim

How virtual power plants could provide energy for data centers

Would you take a payment to ramp down your electricity use? Would it change anything if you were doing so to help power a local data center? A new project backed by Google will put those questions to the test. The company has signed a deal to fund a virtual power plant in the largest power grid in the US. The system will group together devices like electric vehicles and smart thermostats, paying customers to adjust their usage when the grid is stretched. The project could free up capacity for Google’s data centers, but there’s a catch: people might not play along. Find out what the future holds for these virtual power plants.

By Casey Crownhart

This story is from The Spark, our weekly newsletter giving you the inside track on all things climate. Sign up to receive it in your inbox every Wednesday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The EU has proposed new legislation to end its Big Tech dependence
The laws aim to boost domestic cloud, AI and semiconductors. (CNBC)
+ US firms would be blocked from critical public tenders. (Reuters $)
+ It also wants to make sure non-EU actors cannot disrupt tech services with a “kill switch.” (The Guardian)
+ But the proposal needs to be negotiated with EU member states. (Politico $)

2 Intelligence agencies warn Chinese spies are recruiting on LinkedIn
The Five Eyes alliance said Beijing is using job platforms for espionage. (BBC)
+ The spies are allegedly recruiting government and military staff. (Politico $)
+ The Chinese embassy in the UK condemned the accusations. (Bloomberg $)
+ Meet the man hunting the spies in your smartphone. (MIT Technology Review)

3 AI CEOs have called for a law protecting against biological weapons
They warn that synthetic DNA could be used for bioweapons. (Wired $)
+ Sam Altman, Dario Amodei, and Demis Hassabis joined the call. (WSJ $)
+ No one’s sure if synthetic mirror life will kill us all. (MIT Technology Review)

4 Firms are using Reddit to manipulate ChatGPT and Google AI search
They’re spamming subreddits to get posts scraped by chatbots. (404 Media)
+ What we’ve been getting wrong about AI’s truth crisis. (MIT Technology Review)

5 Meta keeps delaying the launch of its new AI model
The new Muse Spark AI model API still has no release date. (WSJ $)
+ Which is hampering Meta’s plans to monetize its AI investments. (Reuters $)

6 For the first time, a US city has voted to permanently ban data centers
Monterey Park, California, voted in favor of the move. (LA Times)
+ Should we be moving data centers to space? (MIT Technology Review)

7 China is betting on household chore training to advance robotics
Data harvested in homes and factories provides a scaling edge. (Rest of World)
+ Gig workers are training humanoids at home. (MIT Technology Review)

8 Sam Altman will urge US lawmakers not to require AI model approvals
He’s advocating against proposals for new AI rules. (Reuters $)
+ His move comes after President Trump signed a new AI order. (Wired $)

9 Quantinuum raised $1.68 billion in an IPO as quantum computing rises
Investors flocked to one of the fast-growing sector’s leaders. (Reuters $)

10 Someone finally wants to hire philosophers: Silicon Valley
Big tech hopes they will help build better machines. (The Atlantic $)

Quote of the day

“Historically, these companies have been very willing to play Russian roulette, and they’re playing another round.”

Connor Leahy, an AI researcher, former hacker and US director of ControlAI, tells the Financial Times why he’s concerned about Anthropic’s relentless race to the top.

One More Thing

What an octopus’s mind can teach us about AI’s ultimate mystery

Emily Bender, a linguist at the University of Washington, has developed a thought experiment she calls the octopus test. It involves an octopus learning to copy patterns in human writing and produce squiggles in response. But does the animal actually understand the language or are we merely projecting meaning onto it? Bender’s octopus is a stand-in for AI systems like ChatGPT. The intelligence we see in these machines is also projected on them by us. The same applies to consciousness: we may claim to see it, but it remains unclear whether it is really there. Read the full story on the debate over machines with minds.

By Will Douglas Heaven

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Discover where iconic sound effects actually came from in this fabulous audio history.
+ Need a serotonin boost? Then tune into this live puppy cam from Denali National Park.
+ Linux lovers can try 570 extinct operating systems at a new virtual museum.
+ Beethoven’s “Moonlight Sonata” becomes something entirely different in this lightning-fast bass guitar performance.

The Download: AI-generated lawsuits and virtual power plants for data centers Read post »

AI, Committee, News, Uncategorized

Using Scikit-LLM with Open-Source LLMs

admin NU / June 4, 2026

This article will teach you how to perform a language task like text classification by integrating locally hosted large language models (LLMs) of manageable size, like Mistral, Gemma, and Llama 3: all for free thanks to Ollama — a free repository for local LLMs — and the Scikit-LLM Python library.
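As a rough illustration of the zero-shot text classification task described above, the sketch below builds a classification prompt and maps a model's reply onto one of the candidate labels. It does not use Scikit-LLM's own API. It assumes an Ollama server running on its default local endpoint and a model named `llama3` that has already been pulled; the helper names (`build_prompt`, `parse_label`, `ollama_generate`, `classify`) are hypothetical and exist only for this example.

```python
import json
import urllib.request
from typing import Callable, Sequence

# Assumption: Ollama's default local REST endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(text: str, labels: Sequence[str]) -> str:
    """Build a zero-shot prompt that asks for exactly one of the labels."""
    options = ", ".join(labels)
    return (
        f"Classify the text into exactly one of these labels: {options}.\n"
        f"Text: {text}\n"
        "Answer with the label only."
    )


def parse_label(reply: str, labels: Sequence[str], default: str) -> str:
    """Return the candidate label that appears earliest in the reply, else the default."""
    lowered = reply.lower()
    hits = [(lowered.find(label.lower()), label) for label in labels
            if label.lower() in lowered]
    return min(hits)[1] if hits else default


def ollama_generate(prompt: str, model: str = "llama3") -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def classify(text: str, labels: Sequence[str],
             generate: Callable[[str], str] = ollama_generate) -> str:
    """Classify text with any prompt-to-reply callable (Ollama by default)."""
    reply = generate(build_prompt(text, labels))
    return parse_label(reply, labels, default=labels[0])
```

Passing a stub such as `generate=lambda prompt: "negative"` exercises the prompt and parsing logic without a running model. Falling back to the first label on an unparseable reply is a design choice of this sketch, not something the article prescribes.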

Using Scikit-LLM with Open-Source LLMs Read post »

AI, Committee, News, Uncategorized

MT-OSC: Path for LLMs that Get Lost in Multi-Turn Conversation

admin NU / June 3, 2026

arXiv:2604.08782v3

Abstract: Large language models (LLMs) suffer significant performance degradation when user instructions and context are distributed over multiple conversational turns, yet multi-turn (MT) interactions dominate chat interfaces. The routine approach of appending full chat history to prompts rapidly exhausts context windows, leading to increased latency, higher computational costs, and diminishing returns as conversations extend. We introduce MT-OSC, a One-off Sequential Condensation framework that efficiently and automatically condenses chat history in the background without disrupting the user experience. MT-OSC employs a Condenser Agent that uses a few-shot inference-based Condenser and a lightweight Decider to selectively retain essential information, reducing token counts by up to 72% in 10-turn dialogues. Evaluated across 13 state-of-the-art LLMs and diverse multi-turn benchmarks, MT-OSC consistently narrows the multi-turn performance gap, yielding improved or preserved accuracy across datasets while remaining robust to distractors and irrelevant turns. Our results establish MT-OSC as a scalable solution for multi-turn chats, enabling richer context within constrained input spaces, reducing latency and operational cost, while balancing performance.
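The abstract does not say how the Condenser or the Decider work internally, so the following is only a toy sketch of the general idea of condensing chat history before the next turn. Everything in it is an assumption: the `condense` callable stands in for the few-shot LLM Condenser, the Decider is replaced by a simple rule (accept the condensed history only if it is shorter and keeps caller-supplied key terms), and tokens are approximated by whitespace-separated words.

```python
from typing import Callable, List, Sequence


def count_tokens(text: str) -> int:
    """Crude token count: whitespace-separated words (an assumption)."""
    return len(text.split())


def decide(original: str, condensed: str, key_terms: Sequence[str]) -> bool:
    """Assumed Decider rule: accept only if shorter and no key term is lost."""
    shorter = count_tokens(condensed) < count_tokens(original)
    kept = all(
        term.lower() in condensed.lower()
        for term in key_terms
        if term.lower() in original.lower()
    )
    return shorter and kept


def condense_history(turns: List[str], condense: Callable[[str], str],
                     key_terms: Sequence[str]) -> List[str]:
    """Replace every turn before the latest with one condensed turn, if accepted."""
    if len(turns) < 2:
        return list(turns)
    history = "\n".join(turns[:-1])
    candidate = condense(history)  # hypothetical stand-in for the LLM Condenser
    if decide(history, candidate, key_terms):
        return [candidate, turns[-1]]
    return list(turns)  # fall back to the full history


def reduction(before: List[str], after: List[str]) -> float:
    """Fraction of tokens removed, e.g. 0.72 corresponds to a 72% reduction."""
    total_before = sum(count_tokens(turn) for turn in before)
    total_after = sum(count_tokens(turn) for turn in after)
    return 1 - total_after / total_before
```

With a three-turn toy conversation and a hand-written condenser, `condense_history` returns two turns when the key terms survive and the untouched history when they do not; `reduction` then reports the fraction of tokens saved.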

MT-OSC: Path for LLMs that Get Lost in Multi-Turn Conversation Read post »

AI, Committee, News, Uncategorized

Nous Research Releases Hermes Desktop: A Native Cross-Platform Front End for Hermes Agent v0.15.2 with Streaming Tool Output

admin NU / June 3, 2026

Nous Research has released Hermes Desktop in public preview. It is a native application for macOS, Windows, and Linux. It gives the open-source Hermes Agent a graphical interface. Until now, users ran Hermes through a CLI and messaging gateways. The current build is Hermes Agent v0.15.2. Per Nous Research’s documentation, the desktop reuses the same agent core. It shares configuration, API keys, sessions, skills, and memory with the CLI and gateway. The desktop is another surface over one agent, not a fork.

What Is Hermes Desktop

Hermes Agent is an autonomous AI agent. It is not a coding copilot tied to an editor. It runs tasks, calls tools, and keeps state across sessions. An agent here means a model that plans, acts, and observes in a loop.

Hermes Desktop is a GUI on top of that same agent core. It needs no terminal to use. The window shows streaming responses and live tool activity. A right-hand pane previews web pages, files, and tool outputs. It also includes a file browser, voice input and output, and a settings UI.

Sessions are shared across surfaces. A conversation started in the desktop resumes in the CLI or TUI. The reverse also works, because state is not duplicated.

macOS and Windows offer direct installers. Linux installs from the terminal on any distribution. An install script with an --include-desktop flag builds the app against an existing install.

The Closed Learning Loop

The Nous Research team describes Hermes as having a closed learning loop. This is what separates it from a simple chat wrapper. After a complex task, the agent writes a reusable skill. Those skills then self-improve during later use.

Memory is persistent and agent-curated, with periodic nudges to save knowledge. Cross-session recall uses FTS5 session search with LLM summarization. User modeling runs through Honcho dialectic user modeling. In practice, longer use means more retained context and reuse. Skills follow the agentskills.io open standard.
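The cross-session recall mentioned above relies on FTS5, SQLite's full-text search extension. The sketch below shows what an FTS5 search over stored session text can look like with Python's built-in `sqlite3` module. It is not Hermes code: the table name `session_index`, its columns, and the helper functions are made up for illustration, the LLM summarization step is left out, and the example assumes the local SQLite build was compiled with FTS5.

```python
import sqlite3
from typing import List, Tuple


def build_index(rows: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index of (session_id, content) pairs."""
    conn = sqlite3.connect(":memory:")
    # session_id is stored but not tokenized; only content is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE session_index "
        "USING fts5(session_id UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO session_index (session_id, content) VALUES (?, ?)", rows
    )
    return conn


def search_sessions(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[str]:
    """Return ids of matching sessions, best match first (FTS5 built-in rank)."""
    cursor = conn.execute(
        "SELECT session_id FROM session_index "
        "WHERE session_index MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in cursor]
```

The query string follows FTS5 query syntax, so operators such as `OR` work, and free-form user input may need quoting before it is passed to `MATCH`. In a recall pipeline of the kind described, the matched sessions would then be handed to a model for summarization.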
How It Connects, Schedules, and Sandboxes

Hermes runs across messaging platforms from one gateway. The desktop lists Telegram, Discord, Slack, WhatsApp, Signal, Email, and CLI. You can start a task on one platform and continue on another.

Scheduling uses natural language for reports, backups, and briefings. These run unattended through the gateway on a built-in cron scheduler. Delegation spawns isolated subagents with their own conversations and terminals. A subagent is a separate worker that handles one job. Python RPC scripts collapse multi-step pipelines into zero-context-cost turns.

Execution is sandboxed. The desktop lists five backends: local, Docker, SSH, Singularity, and Modal. It applies container hardening and namespace isolation. Namespace isolation limits what a running process can see or touch.

Built-in tools include web search, browser automation, vision, image generation, text-to-speech, and multi-model reasoning. Hermes also connects external tools through MCP. MCP is the Model Context Protocol, a standard for tool integration.

Nous Portal and the Tool Gateway

Hermes works with any provider, so API keys are optional. Nous Portal bundles them under one subscription instead. Portal tiers are Free, Plus, Super, and Ultra. Paid tiers include monthly credits and access to 300+ models. They also include built-in tool use.

The Tool Gateway routes several tools through one account. Web search uses Firecrawl and image generation uses FAL. Text-to-speech uses OpenAI and the cloud browser uses Browser Use.

“The next evolution of Hermes Agent is here! Introducing Hermes Desktop: everything you love about Hermes, now native on your machine. First demoed in Jensen’s GTC keynote, it’s now in public preview.” (Nous Research, @NousResearch, June 2, 2026)

Strengths and Questions

Strengths:
Native installers remove the terminal requirement for most users
Streaming output and previews make tool calls easier to inspect
Persistent memory and self-improving skills reduce repeated instructions
Model-agnostic design avoids lock-in to a single provider
The MIT license allows audit, self-hosting, and modification

Questions:
The product is in public preview, so expect rough edges
Autonomous memory and scheduling raise oversight and review questions
The Linux desktop still installs through the terminal
Broad capability means a steeper learning curve for beginners

Key Takeaways

Nous Research released Hermes Desktop in public preview, a native macOS, Windows, and Linux app for its open-source Hermes Agent.
The GUI shares one agent core, configuration, API keys, sessions, skills, and memory with the CLI and gateway; sessions resume across surfaces.
It runs without a terminal, with streaming tool output, a side-by-side preview pane, file browser, voice I/O, and a settings UI.
Hermes is model-agnostic and MIT-licensed, working with Nous Portal, OpenRouter, OpenAI, or any compatible endpoint.
The current build is Hermes Agent v0.15.2, backed by a closed learning loop, MCP tool support, and five sandbox backends.

The post Nous Research Releases Hermes Desktop: A Native Cross-Platform Front End for Hermes Agent v0.15.2 with Streaming Tool Output appeared first on MarkTechPost.

Nous Research Releases Hermes Desktop: A Native Cross-Platform Front End for Hermes Agent v0.15.2 with Streaming Tool Output Read post »

