YouZum


A Coding Implementation on Deepgram Python SDK for Transcription, Text-to-Speech, Async Audio Processing, and Text Intelligence

In this tutorial, we build an advanced hands-on workflow with the Deepgram Python SDK and explore how modern voice AI capabilities come together in a single Python environment. We set up authentication, connect both synchronous and asynchronous Deepgram clients, and work directly with real audio data to understand how the SDK handles transcription, speech generation, and text analysis in practice. We transcribe audio from both a URL and a local file, inspect confidence scores, word-level timestamps, speaker diarization, paragraph formatting, and AI-generated summaries, and then extend the pipeline to async processing for faster, more scalable execution. We also generate speech with multiple TTS voices, analyze text for sentiment, topics, and intents, and examine advanced transcription controls such as keyword search, replacement, boosting, raw response access, and structured error handling. Through this process, we create a practical, end-to-end Deepgram voice AI workflow that is both technically detailed and easy to adapt for real-world applications. 
```python
!pip install deepgram-sdk httpx --quiet

import os, asyncio, textwrap, urllib.request
from getpass import getpass
from deepgram import DeepgramClient, AsyncDeepgramClient
from deepgram.core.api_error import ApiError
from IPython.display import Audio, display

DEEPGRAM_API_KEY = getpass("Enter your Deepgram API key: ")
os.environ["DEEPGRAM_API_KEY"] = DEEPGRAM_API_KEY

client = DeepgramClient(api_key=DEEPGRAM_API_KEY)
async_client = AsyncDeepgramClient(api_key=DEEPGRAM_API_KEY)

AUDIO_URL = "https://dpgr.am/spacewalk.wav"
AUDIO_PATH = "/tmp/sample.wav"
urllib.request.urlretrieve(AUDIO_URL, AUDIO_PATH)

def read_audio(path=AUDIO_PATH):
    with open(path, "rb") as f:
        return f.read()

def _get(obj, key, default=None):
    """Get a field from either a dict or an object — v6 returns both."""
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)

def get_model_name(meta):
    mi = _get(meta, "model_info")
    if mi is None:
        return "n/a"
    return _get(mi, "name", "n/a")

def tts_to_bytes(response) -> bytes:
    """v6 generate() returns a generator of chunks or an object with .stream."""
    if hasattr(response, "stream"):
        return response.stream.getvalue()
    return b"".join(chunk for chunk in response if isinstance(chunk, bytes))

def save_tts(response, path: str) -> str:
    with open(path, "wb") as f:
        f.write(tts_to_bytes(response))
    return path

print("Deepgram client ready | sample audio downloaded")

print("\n" + "=" * 60)
print("SECTION 2: Pre-Recorded Transcription from URL")
print("=" * 60)

response = client.listen.v1.media.transcribe_url(
    url=AUDIO_URL,
    model="nova-3",
    smart_format=True,
    diarize=True,
    language="en",
    utterances=True,
    filler_words=True,
)

transcript = response.results.channels[0].alternatives[0].transcript
print(f"\nFull Transcript:\n{textwrap.fill(transcript, 80)}")

confidence = response.results.channels[0].alternatives[0].confidence
print(f"\nConfidence: {confidence:.2%}")

words = response.results.channels[0].alternatives[0].words
print("\nFirst 5 words with timing:")
for w in words[:5]:
    print(f"  '{w.word}' start={w.start:.2f}s end={w.end:.2f}s conf={w.confidence:.2f}")

print("\nSpeaker Diarization (first 5 words):")
for w in words[:5]:
    speaker = getattr(w, "speaker", None)
    if speaker is not None:
        print(f"  Speaker {int(speaker)}: '{w.word}'")

meta = response.metadata
print(f"\nMetadata: duration={meta.duration:.2f}s channels={int(meta.channels)} model={get_model_name(meta)}")
```

We install the Deepgram SDK and its dependencies, then securely set up authentication using our API key. We initialize both synchronous and asynchronous Deepgram clients, download a sample audio file, and define helper functions to make it easier to work with mixed response objects, audio bytes, model metadata, and streamed TTS outputs. We then run our first pre-recorded transcription from a URL and inspect the transcript, confidence score, word-level timestamps, speaker diarization, and metadata to understand the structure and richness of the response.
```python
print("\n" + "=" * 60)
print("SECTION 3: Pre-Recorded Transcription from File")
print("=" * 60)

file_response = client.listen.v1.media.transcribe_file(
    request=read_audio(),
    model="nova-3",
    smart_format=True,
    diarize=True,
    paragraphs=True,
    summarize="v2",
)

alt = file_response.results.channels[0].alternatives[0]
paragraphs = getattr(alt, "paragraphs", None)
if paragraphs and _get(paragraphs, "paragraphs"):
    print("\nParagraph-Formatted Transcript:")
    for para in _get(paragraphs, "paragraphs")[:2]:
        sentences = " ".join(_get(s, "text", "") for s in (_get(para, "sentences") or []))
        print(f"  [Speaker {int(_get(para, 'speaker', 0))}, "
              f"{_get(para, 'start', 0):.1f}s–{_get(para, 'end', 0):.1f}s] {sentences[:120]}...")
else:
    print(f"\nTranscript: {alt.transcript[:200]}...")

if getattr(file_response.results, "summary", None):
    short = _get(file_response.results.summary, "short", "")
    if short:
        print(f"\nAI Summary: {short}")

print(f"\nConfidence: {alt.confidence:.2%}")
print(f"Word count : {len(alt.words)}")

print("\n" + "=" * 60)
print("SECTION 4: Async Parallel Transcription")
print("=" * 60)

async def transcribe_async():
    audio_bytes = read_audio()

    async def from_url(label):
        r = await async_client.listen.v1.media.transcribe_url(
            url=AUDIO_URL, model="nova-3", smart_format=True,
        )
        print(f"  [{label}] {r.results.channels[0].alternatives[0].transcript[:100]}...")

    async def from_file(label):
        r = await async_client.listen.v1.media.transcribe_file(
            request=audio_bytes, model="nova-3", smart_format=True,
        )
        print(f"  [{label}] {r.results.channels[0].alternatives[0].transcript[:100]}...")

    await asyncio.gather(from_url("From URL"), from_file("From File"))

await transcribe_async()
```

We move from URL-based to file-based transcription by sending raw audio bytes directly to the Deepgram API, enabling richer options such as paragraphs and summarization.
We inspect the returned paragraph structure, speaker segmentation, summary output, confidence score, and word count to see how the SDK supports more readable and analysis-friendly transcription results. We also introduce asynchronous processing and run URL-based and file-based transcription in parallel, helping us understand how to build faster, more scalable voice AI pipelines.

```python
print("\n" + "=" * 60)
print("SECTION 5: Text-to-Speech")
print("=" * 60)

sample_text = (
    "Welcome to the Deepgram advanced tutorial. "
    "This SDK lets you transcribe audio, generate speech, "
    "and analyse text — all with a simple Python interface."
)

tts_path = save_tts(
    client.speak.v1.audio.generate(text=sample_text, model="aura-2-asteria-en"),
    "/tmp/tts_output.mp3",
)
size_kb = os.path.getsize(tts_path) / 1024
print(f"TTS audio saved → {tts_path} ({size_kb:.1f} KB)")
display(Audio(tts_path))

print("\n" + "=" * 60)
print("SECTION 6: Multiple TTS Voices Comparison")
print("=" * 60)

voices = {
    "aura-2-asteria-en": "Asteria (female, warm)",
    "aura-2-orion-en": "Orion (male, deep)",
    "aura-2-luna-en": "Luna (female, bright)",
}
for model_id, label in voices.items():
    try:
        path = save_tts(
            client.speak.v1.audio.generate(text="Hello! I am a Deepgram voice model.", model=model_id),
            f"/tmp/tts_{model_id}.mp3",
        )
        print(f"{label}")
        display(Audio(path))
    except Exception as e:
        print(f"{label} — {e}")

print("\n" + "=" * 60)
print("SECTION 7: Text Intelligence — Sentiment, Topics, Intents")
print("=" * 60)

review_text = (
    "I absolutely love this product! It arrived quickly, the quality is "
    "outstanding, and customer support was incredibly helpful when I had "
    "a question. I would definitely recommend it to anyone looking for "
    "a reliable solution. Five stars!"
)

read_response = client.read.v1.text.analyze(
    request={"text": review_text},
    language="en",
    sentiment=True,
    topics=True,
    intents=True,
    summarize=True,
)
results = read_response.results
```

We focus on speech generation by converting text to audio using Deepgram’s text-to-speech API and saving the resulting audio as an MP3 file. We then compare multiple TTS voices to hear how different voice models behave and how easily we can switch between them while keeping the same code pattern. After that, we begin working with the Read API by passing the review text into Deepgram’s text intelligence system to analyze language beyond simple transcription.

```python
if getattr(results, "sentiments", None):
    overall = results.sentiments.average
    print(f"Sentiment: {_get(overall, 'sentiment', '?').upper()} "
          f"(score={_get(overall, 'sentiment_score', 0):.3f})")
    for seg in (_get(results.sentiments, "segments") or [])[:2]:
        print(f"  •
```


A Coding Implementation on Microsoft’s OpenMementos with Trace Structure Analysis, Context Compression, and Fine-Tuning Data Preparation

In this tutorial, we work with Microsoft’s OpenMementos dataset and explore how reasoning traces are structured through blocks and mementos in a practical, Colab-ready workflow. We stream the dataset efficiently, parse its special-token format, inspect how reasoning and summaries are organized, and measure the compression provided by the memento representation across different domains. As we move through the analysis, we also visualize dataset patterns, align the streamed format with the richer full subset, simulate inference-time compression, and prepare the data for supervised fine-tuning. In this way, we build both an intuitive and technical understanding of how OpenMementos captures long-form reasoning while preserving compact summaries that can support efficient training and inference.

```python
!pip install -q -U datasets transformers matplotlib pandas

import re, itertools, textwrap
from collections import Counter
from typing import Dict

import pandas as pd
import matplotlib.pyplot as plt
from datasets import load_dataset

DATASET = "microsoft/OpenMementos"
ds_stream = load_dataset(DATASET, split="train", streaming=True)
first_row = next(iter(ds_stream))

print("Columns :", list(first_row.keys()))
print("Domain  :", first_row["domain"], "| Source:", first_row["source"])
print("Problem head:", first_row["problem"][:160].replace("\n", " "), "...")
```

We install the required libraries and import the core tools needed for dataset streaming, parsing, analysis, and visualization. We then connect to the Microsoft OpenMementos dataset in streaming mode to inspect it without downloading the entire dataset locally. By reading the first example, we begin understanding the dataset schema, the problem format, and the domain and source metadata attached to each reasoning trace.
```python
# Literal pipes must be escaped inside the regexes, since | is alternation
BLOCK_RE = re.compile(r"<\|block_start\|>(.*?)<\|block_end\|>", re.DOTALL)
SUMMARY_RE = re.compile(r"<\|summary_start\|>(.*?)<\|summary_end\|>", re.DOTALL)
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def parse_memento(response: str) -> Dict:
    blocks = [m.strip() for m in BLOCK_RE.findall(response)]
    summaries = [m.strip() for m in SUMMARY_RE.findall(response)]
    think_m = THINK_RE.search(response)
    final_ans = response.split("</think>")[-1].strip() if "</think>" in response else ""
    return {"blocks": blocks,
            "summaries": summaries,
            "reasoning": (think_m.group(1) if think_m else ""),
            "final_answer": final_ans}

parsed = parse_memento(first_row["response"])
print(f"\n→ {len(parsed['blocks'])} blocks, {len(parsed['summaries'])} mementos parsed")
print("First block   :", parsed["blocks"][0][:140].replace("\n", " "), "...")
print("First memento :", parsed["summaries"][0][:140].replace("\n", " "), "...")

N_SAMPLES = 500
rows = []
for i, ex in enumerate(itertools.islice(
        load_dataset(DATASET, split="train", streaming=True), N_SAMPLES)):
    p = parse_memento(ex["response"])
    if not p["blocks"] or len(p["blocks"]) != len(p["summaries"]):
        continue
    blk_c = sum(len(b) for b in p["blocks"])
    sum_c = sum(len(s) for s in p["summaries"])
    blk_w = sum(len(b.split()) for b in p["blocks"])
    sum_w = sum(len(s.split()) for s in p["summaries"])
    rows.append(dict(domain=ex["domain"], source=ex["source"],
                     n_blocks=len(p["blocks"]),
                     block_chars=blk_c, summ_chars=sum_c,
                     block_words=blk_w, summ_words=sum_w,
                     compress_char=sum_c / max(blk_c, 1),
                     compress_word=sum_w / max(blk_w, 1)))
    if (i + 1) % 100 == 0:
        print(f"  processed {i+1}/{N_SAMPLES}")

df = pd.DataFrame(rows)
print(f"\nAnalyzed {len(df)} rows. Domain counts:")
print(df["domain"].value_counts().to_string())

per_dom = df.groupby("domain").agg(
    n=("domain", "count"),
    median_blocks=("n_blocks", "median"),
    median_block_words=("block_words", "median"),
    median_summ_words=("summ_words", "median"),
    median_char_ratio=("compress_char", "median"),
    median_word_ratio=("compress_word", "median"),
).round(3)
print("\nPer-domain medians (ratio = mementos / blocks):")
print(per_dom.to_string())
```

We define the regex-based parser that extracts reasoning blocks, memento summaries, the main thinking section, and the final answer from each response. We test the parser on the first streamed example and confirm that the block-summary structure is being captured correctly. We then run a streaming analysis over multiple samples to compute block counts, word counts, character counts, and compression ratios, which helps us study how the dataset behaves across examples and domains.

We visualize the dataset’s structural patterns by plotting block counts, compression ratios, and the relationship between block size and memento size. We compare these distributions across domains to see how reasoning organization differs between math, code, and science examples. We also stream one example from the full subset and inspect its additional sentence-level and block-alignment fields, which helps us understand the richer internal annotation pipeline behind the dataset.
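The plots described above can be sketched directly from the columns computed in the analysis loop. This is an illustrative reconstruction, not the article's own plotting code, run here on a tiny synthetic stand-in for the real df (same column names):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside notebooks
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in with the same columns the streaming analysis produces
df = pd.DataFrame({
    "domain":        ["math", "math", "code", "science"],
    "n_blocks":      [6, 9, 4, 7],
    "block_words":   [1200, 2100, 800, 1500],
    "summ_words":    [180, 310, 140, 260],
    "compress_word": [0.15, 0.15, 0.17, 0.17],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
df["n_blocks"].plot.hist(ax=axes[0], bins=5, title="Blocks per trace")
df["compress_word"].plot.hist(ax=axes[1], bins=5, title="Word compression ratio")
axes[2].scatter(df["block_words"], df["summ_words"])
axes[2].set(title="Block vs memento size", xlabel="block words", ylabel="memento words")
fig.tight_layout()
fig.savefig("/tmp/openmementos_overview.png")
```

Swapping the synthetic frame for the real df from the loop above reproduces the per-domain comparison; a `df.groupby("domain")` boxplot is a natural extension.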
```python
def compress_trace(response: str, keep_last_k: int = 1) -> str:
    blocks, summaries = BLOCK_RE.findall(response), SUMMARY_RE.findall(response)
    if not blocks or len(blocks) != len(summaries):
        return response
    out, n = ["<think>"], len(blocks)
    for i, (b, s) in enumerate(zip(blocks, summaries)):
        if i >= n - keep_last_k:
            out.append(f"<|block_start|>{b}<|block_end|>")
            out.append(f"<|summary_start|>{s}<|summary_end|>")
        else:
            out.append(f"<|summary_start|>{s}<|summary_end|>")
    out.append("</think>")
    out.append(response.split("</think>")[-1])
    return "\n".join(out)

orig, comp = first_row["response"], compress_trace(first_row["response"], 1)
print(f"\nOriginal   : {len(orig):>8,} chars")
print(f"Compressed : {len(comp):>8,} chars ({len(comp)/len(orig)*100:.1f}% of original)")

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("gpt2")
MEM_TOKENS = ["<|block_start|>", "<|block_end|>",
              "<|summary_start|>", "<|summary_end|>",
              "<think>", "</think>"]
tok.add_special_tokens({"additional_special_tokens": MEM_TOKENS})

def tlen(s):
    return len(tok(s, add_special_tokens=False).input_ids)

blk_tok = sum(tlen(b) for b in parsed["blocks"])
sum_tok = sum(tlen(s) for s in parsed["summaries"])
print(f"\nTrace-level token compression for this example:")
print(f"  block tokens   = {blk_tok}")
print(f"  memento tokens = {sum_tok}")
print(f"  compression    = {blk_tok / max(sum_tok, 1):.2f}× (paper reports ~6×)")

def to_chat(ex):
    return {"messages": [
        {"role": "user", "content": ex["problem"]},
        {"role": "assistant", "content": ex["response"]},
    ]}

chat_stream = load_dataset(DATASET, split="train", streaming=True).map(to_chat)
chat_ex = next(iter(chat_stream))
print("\nSFT chat example (truncated):")
for m in chat_ex["messages"]:
    print(f"  [{m['role']:9s}] {m['content'][:130].replace(chr(10), ' ')}...")
```

We simulate inference-time compression by rewriting a reasoning trace so that older blocks are replaced by their mementos while the latest blocks remain intact. We then compare the original and compressed trace lengths to see how much context can be reduced in practice. After that, we integrate a tokenizer, add special memento tokens, measure token-level compression, and convert the dataset to an SFT-style chat format suitable for training workflows.

```python
def render_trace(response: str, width: int = 220) -> None:
    p = parse_memento(response)
    print("=" * 72)
    print(f"{len(p['blocks'])} blocks ·
```


Three reasons why DeepSeek’s new model matters

On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than its last generation, thanks to a new design that helps it handle large amounts of text more efficiently. Like DeepSeek’s previous models, V4 is open source, meaning it is available for anyone to download, use, and modify.

V4 marks DeepSeek’s most significant release since R1, the reasoning model it launched in January 2025. R1, which was trained on limited computing resources, stunned the global AI industry with its strong performance and efficiency, turning DeepSeek from a little-known research team into China’s best-known AI company almost overnight. It also helped set off a wave of open-weight model releases from other Chinese AI firms. DeepSeek has kept a relatively low profile since then—but earlier this month, it effectively teased V4’s release when it added “expert” and “flash” modes to the online version of its model, prompting speculation that the updates were tied to a bigger upcoming release. While the company has become a powerful symbol of China’s AI ambitions, its big return to cutting-edge frontier models comes after months of turbulence—including major personnel departures, delays to previous model launches, and growing scrutiny from both the US and Chinese governments.

So, will V4 shake the AI field the way R1 did? Almost certainly not, but here are three big reasons why this release matters.

1. It breaks new ground for an open-source model. As with R1 before it, DeepSeek claims that V4’s performance rivals the best models available at a fraction of the price. This is great news for developers and for companies using the tech, because it means they can access frontier AI capabilities on their own terms, and without worrying about skyrocketing costs. The new model comes in two versions, both of which are available on DeepSeek’s website and in its app, with API access also open to developers.
V4-Pro is a larger model built for coding and complex agent tasks, and V4-Flash is a smaller version designed to be faster and cheaper to run. Both versions offer reasoning modes, in which the model can carefully parse a user’s prompt and show each step as it works through the problem. For V4-Pro, DeepSeek charges $1.74 per million input tokens and $3.48 per million output tokens, a fraction of the cost of comparable models from OpenAI and Anthropic. V4-Flash is even cheaper, at about $0.14 per million input tokens and about $0.28 per million output tokens, making it one of the cheapest top-tier models available. This makes it a very appealing model to build applications on.

In terms of performance, V4 is, perhaps unsurprisingly, a huge jump from R1—and it seems to be a strong alternative to just about all the latest big AI models. On the major benchmarks, according to results shared by the company, DeepSeek V4-Pro competes with leading closed-source models, matching the performance of Anthropic’s Claude-Opus-4.6, OpenAI’s GPT-5.4, and Google’s Gemini-3.1. And compared with other open-source models, such as Alibaba’s Qwen-3.5 or Z.ai’s GLM-5.1, DeepSeek V4 exceeds them all on coding, math, and STEM problems, making it one of the strongest open-source models ever released.

DeepSeek also says that V4-Pro now ranks among the strongest open-source models on benchmarks for agentic coding tasks and performs well on other tests that measure the ability to carry out multistep problems. Its writing ability and world knowledge also lead the field, according to benchmarking results shared by the company. In a technical report released alongside the model, DeepSeek shared results from an internal survey of 85 experienced developers: more than 90% included V4-Pro among their top model choices for coding tasks. DeepSeek says it has specifically optimized V4 for popular agent frameworks such as Claude Code, OpenClaw, and CodeBuddy.

2. It delivers on a new approach to memory efficiency. One of the key innovations of V4 is its long context window—the amount of text the model can process at once. Both versions can handle 1 million tokens, which is large enough to fit all three volumes of The Lord of the Rings and The Hobbit combined. The company says this context window size is now the default across all DeepSeek services, and it matches what is offered by cutting-edge versions of models like Gemini and Claude.

But it’s important to know not just that DeepSeek has made this leap, but how it did so. V4 makes significant architectural changes to the company’s former models—especially in the attention mechanism, which is the feature of AI models that helps them understand each part of a prompt in relation to the rest. As the prompt text gets longer, these comparisons become much more costly, making attention one of the main bottlenecks for long-context models. DeepSeek’s innovation was to make the model more selective about what it pays attention to. Instead of treating all earlier text as equally important, V4 compresses older information and focuses on the parts most likely to matter in the present moment, while still keeping nearby text in full so it does not miss important details.

DeepSeek says this sharply reduces the cost of using long context. In a 1-million-token context, V4-Pro uses only 27% of the computing power required by its previous model, V3.2, while cutting memory use to 10%. The reduction in V4-Flash is even larger, using just 10% of the computing power and 7% of the memory. In practice, this could make it cheaper to build tools that need to work across huge amounts of material, such as an AI coding assistant that can read an entire codebase or a research agent that can analyze a long archive of documents without constantly forgetting what came before. DeepSeek’s interest in long context windows didn’t start with V4.
Over the past year and a half, the company has quietly published a series of papers on how AI models “remember” information, experimenting with compression and mathematical techniques to extend what AI models could realistically handle. 3.


Meet GitNexus: An Open-Source MCP-Native Knowledge Graph Engine That Gives Claude Code and Cursor Full Codebase Structural Awareness

There is a quiet failure mode that lives at the center of every AI-assisted coding workflow. You ask Claude Code, Cursor, or Windsurf to modify a function. The agent does it confidently, cleanly, and incorrectly — because it had no idea that 47 other functions depended on the return type it just changed. Breaking changes ship. The test suite screams. And you spend the next two hours untangling what the model should have known before it touched a single line.

An Indian computer science student built GitNexus to fix that. The open-source project, now sitting at 28,000+ stars and 3,000+ forks on GitHub with 45 contributors, describes itself as ‘the nervous system for agent context.’ That description undersells what it actually does.

What Actually is GitNexus?

GitNexus is a code intelligence layer, not a documentation tool. It indexes an entire repository into a structured knowledge graph — mapping every function call, import, class inheritance, interface implementation, and execution flow — and then exposes that graph to AI agents through a Model Context Protocol (MCP) server. The agents stop guessing. They query.

To understand why this is significant, you need to understand what AI coding agents currently operate on. Most tools like Cursor, Claude Code, and Windsurf rely on either file-based context windows (they read the files nearby and hope for the best) or traditional Graph RAG approaches (they query a graph with a series of prompts, hoping to discover what matters). Neither approach gives an agent a structural map of the repository before it acts. GitNexus pre-computes the entire dependency structure at index time. When an agent asks ‘what depends on this function?’, it gets a complete, confidence-scored answer in one query, instead of chaining 10 successive queries that each risk missing something.
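The difference is easy to see with a toy sketch (hypothetical symbol names, not GitNexus internals): once call edges are inverted at index time, the full blast radius of a symbol is a single graph lookup rather than a chain of searches.

```python
from collections import defaultdict

# Hypothetical call edges discovered at index time: (caller, callee)
calls = [
    ("handleLogin", "UserService.authenticate"),
    ("UserController.login", "UserService.authenticate"),
    ("UserService.authenticate", "hashPassword"),
]

# Invert the edges once, when the repository is indexed
callers = defaultdict(set)
for caller, callee in calls:
    callers[callee].add(caller)

def impact(symbol, seen=None):
    """All transitive upstream callers of `symbol` (its 'blast radius')."""
    seen = set() if seen is None else seen
    for c in callers[symbol]:
        if c not in seen:
            seen.add(c)
            impact(c, seen)
    return seen

# One query returns every function at risk if hashPassword's signature changes
print(sorted(impact("hashPassword")))
```

A real implementation adds confidence scores and depth grouping, but the pre-computation idea is the same.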
The Indexing Pipeline

Running npx gitnexus analyze from the root of a repository kicks off a multi-phase indexing pipeline that does the following: First, it walks the file tree and maps folder and file relationships (the Structure phase). Then it parses every function, class, method, and interface using Tree-sitter ASTs (Abstract Syntax Trees). Tree-sitter is a high-performance, incremental parser originally developed at GitHub that produces concrete syntax trees for any supported language. GitNexus uses it to extract symbols with precision that regex or simple text search cannot match.

After parsing, GitNexus performs cross-file resolution: it resolves imports, function calls, class inheritance, constructor inference, and self/this receiver types across the whole codebase. This is the step where it learns that UserController in src/controllers/user.ts calls into UserService, which authRouter imports, which handleLogin depends on. Next comes clustering — GitNexus groups related symbols into functional communities using Leiden community detection on the call graph, assigning each cluster a cohesion score. Then it traces execution flows from entry points through full call chains to build what it calls ‘processes.’ Finally, it indexes everything for hybrid search using BM25 (a keyword ranking algorithm), semantic vector embeddings, and RRF (Reciprocal Rank Fusion) to merge results. The graph is stored in LadybugDB, an embedded graph database with native vector support formerly known as KuzuDB. This entire pipeline runs locally — no code leaves your machine.

A particularly useful flag for teams: gitnexus analyze --skills takes the Leiden community detection one step further. Instead of only grouping symbols internally, it generates a custom SKILL.md file for each detected functional area of your codebase under .claude/skills/generated/.
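Reciprocal Rank Fusion, used in the hybrid-search step of the indexing pipeline above, is a standard formula: each result scores the sum of 1/(k + rank) across the rankings it appears in. A minimal sketch (k=60 is the common default in the literature; GitNexus's actual parameters are an assumption here):

```python
def rrf(rankings, k=60):
    """Fuse several ranked result lists (best first) into one ordering."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25   = ["auth.ts", "login.ts", "user.ts"]      # keyword ranking
vector = ["login.ts", "session.ts", "auth.ts"]   # embedding ranking
print(rrf([bm25, vector]))
# → ['login.ts', 'auth.ts', 'session.ts', 'user.ts']
```

Results that rank well in both lists rise to the top without either scorer's raw scores needing to be comparable.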
Each skill file describes that module’s key files, entry points, execution flows, and cross-area connections — so an AI agent working in the authentication module gets targeted architectural context for that specific area, not a generic overview of the entire repo. Skills are regenerated on each --skills run to stay current.

https://github.com/abhigyanpatwari/GitNexus

Seven Tools and Two Prompts Your Agent Gets

Once indexed, GitNexus registers an MCP server that exposes seven tools and two guided prompts to your AI agent. impact runs blast radius analysis. Given a target symbol, it returns every upstream caller grouped by depth with confidence scores — handleLogin [CALLS 90%], UserController [CALLS 85%] — so the agent knows what it risks breaking before it touches anything. context gives a 360-degree view of any symbol: its callers, its callees, every process it participates in, and which step of each process it occupies. query runs process-grouped hybrid search across the codebase, returning matching symbols alongside the execution flows they belong to. detect_changes performs git-diff impact analysis — it maps changed lines to affected processes and assigns a risk level before you commit. rename executes coordinated multi-file symbol renames using the graph for high-confidence edits and text search for the rest, with a dry-run mode to preview changes before applying them. cypher exposes raw Cypher graph queries for engineers who want to write custom traversals against the knowledge graph directly. list_repos handles the multi-repo case — GitNexus uses a global registry at ~/.gitnexus/ so one MCP server can serve multiple indexed repositories simultaneously.

Beyond the tools, GitNexus also exposes two MCP prompts for guided workflows. detect_impact runs a pre-commit change analysis that surfaces scope, affected processes, and an overall risk level — think of it as a structured checklist before any significant edit.
generate_map produces architecture documentation directly from the knowledge graph, complete with Mermaid diagrams, making it useful for onboarding engineers or documenting a codebase that has grown faster than its docs.

Editor Support and Deepest Integration with Claude Code

GitNexus supports Claude Code, Cursor, Codex, OpenCode, and Windsurf. Editor support varies by tier. Windsurf gets MCP only. Cursor, Codex, and OpenCode get MCP plus agent skills. Claude Code gets the full stack: MCP tools, agent skills (Exploring, Debugging, Impact Analysis, Refactoring), PreToolUse hooks that enrich every search with graph context before Claude acts, and PostToolUse hooks that auto-reindex after commits. For Claude Code users, GitNexus installs itself completely — hooks, skills, and an AGENTS.md / CLAUDE.md context file — in a single npx gitnexus analyze command.

The Model Democratization Angle

One of the less obvious implications of this architecture is what it does for smaller models. Because


Google DeepMind Introduces Vision Banana: An Instruction-Tuned Image Generator That Beats SAM 3 on Segmentation and Depth Anything V3 on Metric Depth Estimation

For years, the computer vision community has operated on two separate tracks: generative models (which produce images) and discriminative models (which understand them). The assumption was straightforward — models good at making pictures aren’t necessarily good at reading them. A new paper from Google, titled “Image Generators are Generalist Vision Learners” (arXiv:2604.20329), published April 22, 2026, blows that assumption apart. A team of Google DeepMind researchers introduced Vision Banana, a single unified model that surpasses or matches state-of-the-art specialist systems across a wide range of visual understanding tasks — including semantic segmentation, instance segmentation, monocular metric depth estimation, and surface normal estimation — while simultaneously retaining the original image generation capabilities of its base model.

https://arxiv.org/pdf/2604.20329

The LLM Analogy That Changes Everything

If you’ve worked with large language models, you already understand the two-phase playbook: first, pretrain a base model on massive text data using a generative objective, then apply instruction-tuning to align it for downstream tasks. The pretraining phase is where the model develops a rich internal representation of language that can be repurposed for almost anything. The Google team’s core claim is that image generation training plays the exact same foundational role for vision. Their base model, Nano Banana Pro (NBP), is Google’s state-of-the-art image generator. By performing a lightweight instruction-tuning pass — mixing a small proportion of computer vision task data at a very low ratio into NBP’s original training mixture — they created Vision Banana. The key insight: generating photorealistic images implicitly requires a model to understand geometry, semantics, depth, and object relationships. Vision Banana learns to express that latent knowledge in measurable, decodable formats.
Critically, no training data from any of the evaluation benchmarks is included in the instruction-tuning mixture — ensuring that all results reflect true generalist capability rather than in-domain memorization.

How It Works: Perception as Image Generation

Rather than adding specialized decoder heads or regression modules for each task, all vision task outputs are parameterized as RGB images. The model is instruction-tuned to produce visualizations that follow precise, invertible color schemes — meaning the generated images can be decoded back into quantitative outputs for benchmark evaluation. The research team identified three key advantages of this strategy. First, it supports a wide variety of tasks with a single unified model — after instruction-tuning, only the prompt changes, not the weights. Second, it requires relatively little new training data, since instruction-tuning only teaches the model how to format computer vision outputs as RGB. Third, it helps the model retain its original image generation capabilities, since the outputs are simply new RGB images.

For semantic segmentation, the model is prompted with instructions such as: “Generate a segmentation visualization of this image, using the color mapping: {‘cat’: ‘red’, ‘background’: ‘yellow’}.” Each pixel is colored by its predicted class, and because color assignments are specified in the prompt, no fixed label vocabulary is needed. For instance segmentation, since the number of instances is unknown in advance, Vision Banana uses a per-class inference strategy — running a separate pass per class and dynamically assigning unique colors to each instance. Masks are recovered by clustering pixels with similar colors under a distance threshold. Metric depth estimation uses a bijective mapping between unbounded metric depth values in [0, ∞) and bounded RGB values in [0, 1]³.
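Because the color scheme is specified in the prompt, decoding a segmentation visualization back into masks reduces to nearest-color matching. A minimal sketch, where the palette and the tiny test image are illustrative rather than taken from the paper:

```python
import numpy as np

def decode_segmentation(image, palette):
    """Map each pixel of an RGB visualization back to a class mask.

    `image` is an (H, W, 3) float array in [0, 1]; `palette` maps class
    names to RGB triples (the same mapping given in the prompt).
    """
    names = list(palette)
    colors = np.array([palette[n] for n in names])               # (C, 3)
    # Distance from every pixel to every palette color; argmin picks the class.
    dist = np.linalg.norm(image[:, :, None, :] - colors, axis=-1)  # (H, W, C)
    idx = dist.argmin(axis=-1)
    return {n: idx == i for i, n in enumerate(names)}

# Toy 2x2 image: three red ("cat") pixels and one yellow ("background") pixel.
palette = {"cat": (1.0, 0.0, 0.0), "background": (1.0, 1.0, 0.0)}
img = np.zeros((2, 2, 3))
img[..., 0] = 1.0
img[1, 1, 1] = 1.0
masks = decode_segmentation(img, palette)
```

The same argmin-over-palette idea extends to the per-class instance strategy: after a per-class pass, pixels are clustered by color similarity to separate individual instances.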
A power transform (shape parameter λ = −3, scale parameter c = 10/3) first “curves” metric depth values, which are then encoded as a false-color visualization that traverses the edges of the RGB cube, following the structure of a 3D Hilbert curve. This transform is strictly invertible, so the generated depth image decodes cleanly back to physical metric distances. Crucially, no camera parameters — neither intrinsics nor extrinsics — are required at training or inference time. The model infers absolute scale purely from visual cues and world knowledge embedded during pretraining. The depth training data is also entirely synthetic, generated from simulation rendering engines, with zero real-world depth data used. For surface normal estimation, the mapping is more direct: surface normals are unit vectors (x, y, z) ranging from −1.0 to 1.0, which map naturally to RGB channels. Facing-left normals encode as pinkish-red; facing-up normals encode as light green; normals pointing toward the camera encode as light blue/purple.

The Numbers: Beating Specialists at Their Own Game

Vision Banana’s results across benchmarks — all in zero-shot transfer settings, where the model has never seen any training data from the evaluated datasets — are significant:

- Semantic segmentation on Cityscapes val: mIoU of 0.699, compared to SAM 3’s 0.652 — a 4.7-point gain.
- Referring expression segmentation on RefCOCOg UMD val: cIoU of 0.738, edging out SAM 3 Agent’s 0.734.
- Reasoning segmentation on ReasonSeg val: gIoU of 0.793, beating SAM 3 Agent’s 0.770 — and notably surpassing even non-zero-shot methods trained on in-domain data, including X-SAM.
- Instance segmentation on SA-Co/Gold: pmF1 of 0.540, on par with DINO-X (0.552), and ahead of Gemini 2.5 (0.461), APE-D (0.369), and OWLv2 (0.420) under zero-shot transfer.
- Metric depth estimation: average δ1 of 0.882 across six major benchmarks; on the four datasets where Depth Anything V3 was evaluated (NYU, ETH3D, DIODE-Indoor, KITTI), Vision Banana scores 0.929 versus Depth Anything V3’s 0.918 — while using zero real-world training data and no camera parameters.
- Surface normal estimation: average mean angle error of 18.928° across four datasets, compared to Lotus-2’s 19.642°. On indoor datasets specifically, Vision Banana achieves the lowest mean angle error (15.549°) and lowest median angle error (9.300°) among all compared methods.

On generative benchmarks, Vision Banana holds its own against its base model: it achieves a 53.5% win rate against Nano Banana Pro on GenAI-Bench (text-to-image), and a 47.8% win rate on ImgEdit (image editing), where Nano Banana Pro scores 52.2%. Overall, the results confirm that lightweight instruction-tuning does not degrade the model’s generative capabilities.

Key Takeaways

Image generation pretraining is a generalist vision learner: just as LLM pretraining unlocks emergent language understanding, Google’s research shows that training on image generation naturally develops powerful internal visual representations that transfer to perception tasks
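The invertible encodings described earlier can be sketched numerically. The normal mapping below is the standard affine convention, and the depth bijection is one plausible form built from the quoted λ and c; the paper's exact formula and its Hilbert-curve color coding are not reproduced here:

```python
import numpy as np

LAM, C = -3.0, 10.0 / 3.0   # shape and scale parameters quoted in the article

def depth_to_unit(d):
    """One plausible bijection from metric depth [0, inf) to [0, 1).
    Illustrative assumption: the paper's exact power transform may differ."""
    return 1.0 - (1.0 + np.asarray(d) / C) ** LAM

def unit_to_depth(y):
    """Exact inverse, so a decoded depth map recovers metric distances."""
    return C * ((1.0 - np.asarray(y)) ** (1.0 / LAM) - 1.0)

def normal_to_rgb(n):
    """Affine map from unit normals in [-1, 1]^3 to RGB in [0, 1]^3.
    (The paper's sign/axis color conventions may differ from this sketch.)"""
    return (np.asarray(n) + 1.0) / 2.0

def rgb_to_normal(rgb):
    """Exact inverse of normal_to_rgb."""
    return np.asarray(rgb) * 2.0 - 1.0

depths = np.array([0.0, 1.0, 5.0, 50.0])
assert np.allclose(unit_to_depth(depth_to_unit(depths)), depths)  # lossless

normal = np.array([0.2, -0.5, 0.8])
assert np.allclose(rgb_to_normal(normal_to_rgb(normal)), normal)
```

Because both maps are strictly invertible, benchmark evaluation can decode the generated RGB images back into metric depths and unit normals without loss.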


Cross-Domain Data Selection and Augmentation for Automatic Compliance Detection

arXiv:2604.21469v1 Announce Type: new Abstract: Automating the detection of regulatory compliance remains a challenging task due to the complexity and variability of legal texts. Models trained on one regulation often fail to generalise to others. This limitation underscores the need for principled methods to improve cross-domain transfer. We study data selection as a strategy to mitigate negative transfer in compliance detection framed as a natural language inference (NLI) task. Specifically, we evaluate four approaches for selecting augmentation data from a larger source domain: random sampling, Moore-Lewis’s cross-entropy difference, importance weighting, and embedding-based retrieval. We systematically vary the proportion of selected data to analyse its effect on cross-domain adaptation. Our findings demonstrate that targeted data selection substantially reduces negative transfer, offering a practical path toward scalable and reliable compliance automation across heterogeneous regulations.
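Moore-Lewis selection can be sketched with two language models: score each candidate by its in-domain cross-entropy minus its general-domain cross-entropy and keep the lowest scores. The add-one-smoothed unigram models and tiny corpora below are toy stand-ins for the real LMs, not the paper's setup:

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Toy unigram LM with add-one smoothing; returns per-token cross-entropy."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda sent: -sum(
        math.log((counts[w] + 1) / (total + vocab)) for w in sent.split()
    ) / max(len(sent.split()), 1)

def moore_lewis_select(candidates, in_domain, general, k):
    """Keep the k candidates whose cross-entropy difference
    H_in(x) - H_gen(x) is lowest, i.e. the most in-domain-like."""
    h_in, h_gen = unigram_lm(in_domain), unigram_lm(general)
    return sorted(candidates, key=lambda s: h_in(s) - h_gen(s))[:k]

in_domain = ["the processor shall retain records", "records retention applies"]
general = ["the cat sat on the mat", "dogs run fast"]
pool = ["the controller shall retain records", "the cat runs fast"]
print(moore_lewis_select(pool, in_domain, general, k=1))
# selects the compliance-like sentence over the general-domain one
```

Varying k is the knob the paper studies: the proportion of selected augmentation data controls how much source-domain material is mixed in.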


Learning Dynamic Representations and Policies from Multimodal Clinical Time-Series with Informative Missingness

arXiv:2604.21235v1 Announce Type: cross Abstract: Multimodal clinical records contain structured measurements and clinical notes recorded over time, offering rich temporal information about the evolution of patient health. Yet these observations are sparse, and whether they are recorded depends on the patient’s latent condition. Observation patterns also differ across modalities, as structured measurements and clinical notes arise under distinct recording processes. While prior work has developed methods that accommodate missingness in clinical time series, how to extract and use the information carried by the observation process itself remains underexplored. We therefore propose a patient representation learning framework for multimodal clinical time series that explicitly leverages informative missingness. The framework combines (1) a multimodal encoder that captures signals from structured and textual data together with their observation patterns, (2) a Bayesian filtering module that updates a latent patient state over time from observed multimodal signals, and (3) downstream modules for offline treatment policy learning and patient outcome prediction based on the learned patient state. We evaluate the framework on ICU sepsis cohorts from MIMIC-III, MIMIC-IV, and eICU. It improves both offline treatment policy learning and adverse outcome prediction, achieving FQE 0.679 versus 0.528 for clinician behavior and AUROC 0.886 for post-72-hour mortality prediction on MIMIC-III.
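The Bayesian filtering idea — update a latent patient state only when a modality is observed, while gaps in observation leave the state increasingly uncertain — can be sketched with a scalar Kalman-style filter. The dynamics, noise levels, and scalar state below are illustrative assumptions, not the paper's model:

```python
def filter_state(observations, q=0.05, r=0.2):
    """Scalar Kalman-style filter over a sparse clinical time series.

    `observations` is a list where None marks a missing measurement.
    On missing steps we only propagate (predict), so the posterior
    variance grows, encoding increased uncertainty about the patient.
    q is process noise, r is measurement noise (illustrative values).
    """
    mean, var = 0.0, 1.0
    trajectory = []
    for z in observations:
        var += q                      # predict: uncertainty grows each step
        if z is not None:             # update only when a value was recorded
            gain = var / (var + r)
            mean += gain * (z - mean)
            var *= (1.0 - gain)
        trajectory.append((mean, var))
    return trajectory

traj = filter_state([1.0, None, None, 1.2])
# variance rises across the two unobserved steps, then drops at the update
```

A model that also treats *whether* a value was recorded as a signal, as the paper proposes, would additionally feed the missingness mask itself into the state update.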


Health-care AI is here. We don’t know if it actually helps patients.

I don’t need to tell you that AI is everywhere. Or that it is being used, increasingly, in hospitals. Doctors are using AI to help them with notetaking. AI-based tools are trawling through patient records, flagging people who may require certain support or treatments. They are also used to interpret medical exam results and X-rays. A growing number of studies suggest that many of these tools can deliver accurate results. But there’s a bigger question here: Does using them actually translate into better health outcomes for patients? We don’t yet have a good answer. That’s what Jenna Wiens, a computer scientist at the University of Michigan, and Anna Goldenberg of the University of Toronto, argue in a paper published in the journal Nature Medicine this week. Wiens tells me she has spent years investigating how AI might benefit health care. For the first decade of her career she tried to pitch the technology to clinicians. Over the last few years, she says, it’s as though “a switch flipped.” Health-care providers not only appear much more interested in the promise of these technologies, they have also begun rapidly deploying them. The problem is that many providers aren’t rigorously assessing how well they actually work. Take “ambient AI” tools, for example. Also known as AI scribes, they “listen” to conversations between doctors and patients, then transcribe and summarize them. Multiple tools are available, and they are already being widely adopted by health-care providers. A few months ago, a staffer at a major New York medical center who develops AI tools for doctors told me that, anecdotally, medics are “overjoyed” by the technology—it allows them to focus all their attention on their patients during appointments, and it saves them from a lot of time-consuming paperwork. Early studies support these anecdotes and suggest that the tools can reduce clinician burnout. That’s all well and good. But what about patient health outcomes? 
“[Researchers] have evaluated provider or clinician and patient satisfaction, but not really how these tools are affecting clinical decision-making,” says Wiens. “We just don’t know.” The same holds true for other AI-based technologies used in health-care settings. Some are used to predict patients’ health trajectories, others to recommend treatments. They are designed to make health care more effective and efficient. But even a tool that is “accurate” won’t necessarily improve health outcomes. AI might speed up the interpretation of a chest X-ray, for example. But how much will a doctor rely on its analysis? How will that tool affect the way a doctor interacts with patients or recommends treatment? And ultimately: What will this mean for those patients? The answers to those questions might vary between hospitals or departments and could depend on clinical workflows, says Wiens. They might also differ between doctors at various stages of their careers. Take the AI scribes, as another example. Some research on AI use in education suggests that such tools can impact the way people cognitively process information. Could they affect the way a doctor processes a patient’s information? Will the tools affect the way medical students think about patient data in a way that impacts care? These questions need to be explored, says Wiens. “We like things that save us time, but we have to think about the unintended consequences of this,” she says. In a study published in January 2025, Paige Nong at the University of Minnesota and her colleagues found that around 65% of US hospitals used AI-assisted predictive tools. Only two-thirds of those hospitals evaluated their accuracy. Even fewer assessed them for bias. The number of hospitals using these tools has probably increased since then, says Wiens. Those hospitals, or entities other than the companies developing the tools, need to evaluate how much they help in specific settings. 
There’s a possibility that they could leave patients worse off, although it’s more likely that AI tools just aren’t as beneficial as health-care providers might assume they are, says Wiens. “I do believe in the potential of AI to really improve clinical care,” says Wiens, who stresses that she doesn’t want to stop the adoption of AI tools in health care. She just wants more information about how they are affecting people. “I have to believe that in the future it’s not all AI or no AI,” she says. “It’s somewhere in between.” This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. 


Google DeepMind Introduces Decoupled DiLoCo: An Asynchronous Training Architecture Achieving 88% Goodput Under High Hardware Failure Rates

Training frontier AI models is, at its core, a coordination problem. Thousands of chips must communicate with each other continuously, synchronizing every gradient update across the network. When one chip fails or even slows down, the entire training run can stall. As models scale toward hundreds of billions of parameters, that fragility becomes increasingly untenable. Google DeepMind is now proposing a different model entirely: its researchers introduced Decoupled DiLoCo (Distributed Low-Communication), a distributed training architecture that decouples compute into asynchronous, fault-isolated ‘islands,’ enabling large language model pre-training across geographically distant data centers without the tight synchronization that makes conventional approaches brittle at scale.

The Problem with Traditional Distributed Training

To understand why Decoupled DiLoCo is important, it helps to understand how distributed training typically works. Standard data-parallel training replicates a model across many accelerators (GPUs or TPUs), each processing a different mini-batch of data. After each forward and backward pass, gradients must be averaged across every device — a process called AllReduce — before the next training step can begin. This blocking synchronization step means every device must wait for the slowest one. Across thousands of chips spanning multiple data centers, that bottleneck is not just inconvenient; it makes global-scale training effectively impractical. Bandwidth is another hard constraint: conventional data-parallel training requires approximately 198 Gbps of inter-datacenter bandwidth across eight data centers — far beyond what standard wide-area networking (WAN) can support between geographically distributed facilities.

How Decoupled DiLoCo Works

Decoupled DiLoCo builds on two prior systems from Google.
The first is Pathways, which introduced a distributed AI system based on asynchronous data flow, allowing different compute resources to work at their own pace without blocking on one another. The second is DiLoCo, which cut the inter-datacenter bandwidth required for distributed training by having each worker perform many local gradient steps before communicating with peers — dramatically reducing how much data needs to flow between data centers. Decoupled DiLoCo brings both ideas together. Built on top of Pathways, training is divided across separate clusters of accelerators called learner units — the ‘islands’ of compute. Each learner unit trains semi-independently, performing many local steps, before sharing a compressed gradient signal with an outer optimizer that aggregates updates across all learner units. Because this outer synchronization step is asynchronous, a chip failure or slow learner unit in one island does not block the others from continuing to train. The bandwidth savings are dramatic: Decoupled DiLoCo reduces required inter-datacenter bandwidth from 198 Gbps to just 0.84 Gbps across eight data centers — multiple orders of magnitude lower — making it compatible with standard internet-scale connectivity between facilities rather than requiring custom high-speed network infrastructure.

Self-Healing Through Chaos Engineering

One of the most technically significant properties of Decoupled DiLoCo is its fault tolerance. The research team used chaos engineering — deliberately injecting artificial hardware failures into a running system to test its robustness — during training runs. The system continued training after the loss of entire learner units, then seamlessly reintegrated those units when they came back online. This behavior is what the research team describes as ‘self-healing’.
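The inner/outer structure of local steps plus an aggregating outer optimizer can be sketched in a few lines of NumPy. Plain SGD stands in for the inner optimizer and heavy-ball momentum for the outer one; the toy regression task, step counts, and learning rates are illustrative assumptions, not Decoupled DiLoCo's production configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true                       # noiseless linear-regression task

def local_sgd(theta, Xs, ys, steps=50, lr=0.05):
    """Inner phase: one learner unit takes many local gradient steps
    on its own data shard before any cross-unit communication."""
    theta = theta.copy()
    for _ in range(steps):
        grad = 2 * Xs.T @ (Xs @ theta - ys) / len(ys)
        theta -= lr * grad
    return theta

theta = np.zeros(4)                  # shared outer parameters
momentum = np.zeros(4)
shards = np.array_split(np.arange(256), 4)   # 4 "learner units"

for _ in range(40):
    # Each unit communicates only a pseudo-gradient: start minus end params.
    deltas = [theta - local_sgd(theta, X[s], y[s]) for s in shards]
    avg_delta = np.mean(deltas, axis=0)       # the only cross-unit traffic
    momentum = 0.5 * momentum + avg_delta     # outer momentum on pseudo-grads
    theta -= 0.7 * momentum

assert np.allclose(theta, w_true, atol=1e-2)  # converges to the true weights
```

The bandwidth saving falls out of the structure: one small pseudo-gradient crosses the network per outer round instead of a full gradient exchange per inner step, and making that outer round asynchronous is what isolates failures to a single island.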
In simulations involving 1.2 million chips under high failure rates, Decoupled DiLoCo maintained a goodput (the fraction of time the system is performing useful training) of 88%, compared to just 27% for standard data-parallel methods. Goodput is the practical metric that matters here: a training run with high nominal compute but low goodput wastes significant resources.

https://deepmind.google/blog/decoupled-diloco/

Critically, these resilience gains come with minimal degradation in model quality. In real-world experiments using Gemma 4 models, Decoupled DiLoCo achieved an average ML benchmark accuracy of 64.1%, compared to 64.4% for the conventional baseline — a difference well within the noise of typical evaluation variance.

Training a 12B Model Across Four U.S. Regions

The research team validated Decoupled DiLoCo at production scale by training a 12 billion parameter model across four separate U.S. regions using just 2–5 Gbps of wide-area networking — a bandwidth level achievable with existing commercial internet infrastructure between data center facilities. The system accomplished this more than 20 times faster than conventional synchronization methods. The key reason: rather than forcing compute to pause and wait for communication to complete, Decoupled DiLoCo overlaps the required communication with longer periods of computation, eliminating the “blocking” bottlenecks that make conventional distributed training slow at global scale.

Mixing Hardware Generations

An underappreciated implication of the architecture is its support for heterogeneous hardware. Because learner units operate asynchronously, they do not need to run on identical hardware at the same clock speed. The research team demonstrated training runs that mixed TPU v6e and TPU v5p chips — different hardware generations with different performance characteristics — in a single training job, without degrading ML performance relative to homogeneous runs.
This has two practical consequences. First, it extends the useful life of existing hardware, allowing older accelerators to continue contributing meaningfully to large-scale training. Second, because new hardware generations do not arrive everywhere at once, training across generations can ease the recurring logistical and capacity bottlenecks that arise during hardware transitions — a real operational challenge for organizations running large training infrastructure.

Key Takeaways

- Decoupled DiLoCo eliminates the single-point-of-failure problem in large-scale AI training by dividing training across asynchronous, fault-isolated “islands” of compute called learner units — so a chip or cluster failure in one island does not stall the rest of the training run.
- The architecture reduces inter-datacenter bandwidth requirements by orders of magnitude — from 198 Gbps down to 0.84 Gbps across eight data centers — making globally distributed pre-training feasible over standard wide-area networking rather than custom high-speed infrastructure.
- Decoupled DiLoCo is self-healing: using chaos engineering to simulate real hardware failures, the system maintained 88% goodput compared to just 27% for standard data-parallel training under high failure rates, and seamlessly reintegrated offline learner units when they came back online.
- The approach was validated at production scale, successfully training a 12 billion parameter model across four


The Download: supercharged scams and studying AI healthcare

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

We’re in a new era of AI-driven scams

When ChatGPT was released in late 2022, it showed how easily generative AI could create human-like text. This quickly caught the eye of cybercriminals, who began using LLMs to compose malicious emails. Since then, they’ve adopted AI for everything from turbocharged phishing and hyperrealistic deepfakes to automated vulnerability scans. Many organizations are now struggling to cope with the sheer volume of cyberattacks. AI is making them faster, cheaper, and easier to carry out, a problem set to worsen as more cybercriminals adopt these tools — and their capabilities improve. Read the full story on how AI is reshaping cybercrime.
—Rhiannon Williams

“Supercharged scams” is one of the 10 Things That Matter in AI Right Now, our essential guide to what’s really worth your attention in the field. Subscribers can watch an exclusive roundtable unveiling the technologies and trends on the list, with analysis from MIT Technology Review’s AI reporter Grace Huckins and executive editors Amy Nordrum and Niall Firth.

Healthcare AI is here. We don’t know if it actually helps patients.

Doctors are using AI to help them with notetaking. AI-based tools are trawling through patient records, flagging people who may require certain support or treatments. They are also used to interpret medical exam results and X-rays. A growing number of studies suggest that many of these tools can deliver accurate results. But there’s a bigger question here: Does using them actually translate into better health outcomes for patients? We don’t yet have a good answer — here’s why.
—Jessica Hamzelou

The story is from The Checkup, our weekly newsletter that gives you the latest from the worlds of health and biotech. Sign up to receive it in your inbox every Thursday.
The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1. DeepSeek has unveiled its long-awaited new AI model
The Chinese company has just launched preview versions of DeepSeek-V4. (CNN)
+ It says V4 is the most powerful open-source platform. (Bloomberg $)
+ And rivals top closed-source models from OpenAI and DeepMind. (SCMP)
+ The model is adapted for Huawei chip technology. (Reuters $)

2. More countries are curbing children’s social media access
Norway is set to enforce the latest ban. (Reuters $)
+ The Philippines could follow soon. (Bloomberg $)
+ Americans are pushing to get AI out of schools. (The New Yorker)

3. The US has accused China of mass AI theft as tensions rise
A White House memo claims Chinese firms are exploiting American models. (BBC)
+ Beijing calls the accusations “slander.” (Ars Technica)

4. OpenAI set itself apart from Anthropic by widely releasing its new model
It’s releasing GPT-5.5 to all ChatGPT users, despite cybersecurity concerns. (NYT $)
+ OpenAI says the new model is better at coding and more efficient. (The Verge)

5. Meta is cutting 10% of jobs to offset AI spending
Roughly 8,000 layoffs are set to be announced on May 20. (QZ)
+ Anti-AI protests are growing. (MIT Technology Review)

6. Palantir is facing a backlash from employees
Thanks to its work with ICE and the Trump administration. (Wired $)
+ Surveillance tech is reshaping the fight for privacy. (MIT Technology Review)

7. The era of free access to advanced AI is coming to an end
AI labs are under mounting pressure to start turning profits. (The Verge)

8. Elon Musk’s feud with Sam Altman is heading to court
The case has already revealed several unflattering secrets. (WP $)

9. A new movement is encouraging people to ditch their smartphones for a month
“Month Offline” is like a Dry January for smartphones. (The Atlantic)

10. Spotify has revealed its most-streamed music of the last 20 years
Featuring Taylor Swift, Bad Bunny, and The Weeknd. (Gizmodo)

Quote of the day

“We want a childhood where children get to be children. Play, friendships, and everyday life must not be taken over by algorithms and screens.”
—Norwegian Prime Minister Jonas Gahr Store, announcing age restrictions for social media.

One More Thing

NASA/JPL-CALTECH VIA WIKIMEDIA COMMONS; CRAFT NASA/JPL-CALTECH/SWRI/MSSS; IMAGE PROCESSING: KEVIN M. GILL

The search for extraterrestrial life is targeting Jupiter’s icy moon Europa

As astronomers have discovered more about Europa over the past few decades, Jupiter’s fourth-largest moon has excited planetary scientists interested in the geophysics of alien worlds. All that water and energy — and hints of elements essential for building organic molecules — point to an extraordinary possibility. In the depths of its ocean, or perhaps crowded in subsurface lakes or below icy surface vents, Jupiter’s big, bright moon could host life. To find further evidence, NASA is now searching for signs of alien existence on Europa. Read the full story on the mission.
—Stephen Ornes

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Here’s a fun look at the secret collaborations of pop history.
+ Meet the mannequins showing how the “ideal” body has evolved.
+ A photographer has cataloged all 12,795 objects in her home into an archive of a life.
+ Slime molds are unexpectedly beautiful when viewed through these high-detail macro shots.
