AI, Committee, Noticias, Uncategorized

Building AI agents is 5% AI and 100% software engineering

Production-grade agents live or die on data plumbing, controls, and observability, not on model choice. The doc-to-chat pipeline below maps the concrete layers and why they matter.

What is a “doc-to-chat” pipeline?

A doc-to-chat pipeline ingests enterprise documents, standardizes them, enforces governance, indexes embeddings alongside relational features, and serves retrieval + generation behind authenticated APIs with human-in-the-loop (HITL) checkpoints. It is the reference architecture for agentic Q&A, copilots, and workflow automation where answers must respect permissions and be audit-ready. Production implementations are variations of RAG (retrieval-augmented generation) hardened with LLM guardrails, governance, and OpenTelemetry-backed tracing.

How do you integrate cleanly with the existing stack?

Use standard service boundaries (REST/JSON, gRPC) over a storage layer your org already trusts. For tables, Iceberg gives ACID, schema evolution, partition evolution, and snapshots, which are critical for reproducible retrieval and backfills. For vectors, use a system that coexists with SQL filters: pgvector collocates embeddings with business keys and ACL tags in PostgreSQL; dedicated engines like Milvus handle high-QPS ANN with disaggregated storage/compute. In practice, many teams run both: SQL + pgvector for transactional joins and Milvus for heavy retrieval.

Key properties:
- Iceberg tables: ACID, hidden partitioning, snapshot isolation; vendor support across warehouses.
- pgvector: SQL + vector similarity in one query plan for precise joins and policy enforcement.
- Milvus: layered, horizontally scalable architecture for large-scale similarity search.

How do agents, humans, and workflows coordinate on one “knowledge fabric”?

Production agents require explicit coordination points where humans approve, correct, or escalate. AWS A2I provides managed HITL loops (private workforces, flow definitions) and is a concrete blueprint for gating low-confidence outputs. Frameworks like LangGraph model these human checkpoints inside agent graphs so approvals are first-class steps in the DAG, not ad hoc callbacks. Use them to gate actions like publishing summaries, filing tickets, or committing code.

Pattern: LLM → confidence/guardrail checks → HITL gate → side effects (a minimal gating sketch appears at the end of this article). Persist every artifact (prompt, retrieval set, decision) for auditability and future re-runs.

How is reliability enforced before anything reaches the model?

Treat reliability as layered defenses:
- Language + content guardrails: pre-validate inputs and outputs for safety and policy. Options span managed (Bedrock Guardrails) and OSS (NeMo Guardrails, Guardrails AI, and Llama Guard). Independent comparisons and a position paper catalog the trade-offs.
- PII detection/redaction: run analyzers on both source docs and model I/O. Microsoft Presidio offers recognizers and masking, with explicit caveats to combine with additional controls.
- Access control and lineage: enforce row-/column-level ACLs and audit across catalogs (Unity Catalog) so retrieval respects permissions; unify lineage and access policies across workspaces.
- Retrieval quality gates: evaluate RAG with reference-free metrics (faithfulness, context precision/recall) using Ragas and related tooling; block or down-rank poor contexts.

How do you scale indexing and retrieval under real traffic?

Two axes matter: ingest throughput and query concurrency.
- Ingest: normalize at the lakehouse edge; write to Iceberg for versioned snapshots, then embed asynchronously. This enables deterministic rebuilds and point-in-time re-indexing.
- Vector serving: Milvus’s shared-storage, disaggregated-compute architecture supports horizontal scaling with independent failure domains; use HNSW/IVF/Flat hybrids and replica sets to balance recall and latency.
- SQL + vector: keep business joins server-side (pgvector), e.g., WHERE tenant_id = ? AND acl_tag @> … ORDER BY embedding <-> :q LIMIT k. This avoids N+1 trips and respects policies (a sketch of this pattern appears at the end of this article).
- Chunking/embedding strategy: tune chunk size/overlap and semantic boundaries; bad chunking is the silent killer of recall.

For structured + unstructured fusion, prefer hybrid retrieval (BM25 + ANN + reranker) and store structured features next to vectors to support filters and re-ranking features at query time.

How do you monitor beyond logs?

You need traces, metrics, and evaluations stitched together:
- Distributed tracing: emit OpenTelemetry spans across ingestion, retrieval, model calls, and tools; LangSmith natively ingests OTEL traces and interoperates with external APMs (Jaeger, Datadog, Elastic). This gives end-to-end timing, prompts, contexts, and costs per request (a small tracing sketch appears at the end of this article).
- LLM observability platforms: compare options (LangSmith, Arize Phoenix, Langfuse, Datadog) by tracing, evals, cost tracking, and enterprise readiness. Independent roundups and matrices are available.
- Continuous evaluation: schedule RAG evals (Ragas/DeepEval/MLflow) on canary sets and live-traffic replays; track faithfulness and grounding drift over time.

Add schema profiling/mapping on ingestion to keep observability attached to data-shape changes (e.g., new templates, table evolution) and to explain retrieval regressions when upstream sources shift.

Example: doc-to-chat reference flow (signals and gates)

1. Ingest: connectors → text extraction → normalization → Iceberg write (ACID, snapshots).
2. Govern: PII scan (Presidio) → redact/mask → catalog registration with ACL policies.
3. Index: embedding jobs → pgvector (policy-aware joins) and Milvus (high-QPS ANN).
4. Serve: REST/gRPC → hybrid retrieval → guardrails → LLM → tool use.
5. HITL: low-confidence paths route to A2I/LangGraph approval steps.
6. Observe: OTEL traces to LangSmith/APM + scheduled RAG evaluations.

Why is “5% AI, 100% software engineering” accurate in practice?

Most outages and trust failures in agent systems are not model regressions; they are data quality, permissioning, retrieval decay, or missing telemetry. The controls above (ACID tables, ACL catalogs, PII guardrails, hybrid retrieval, OTEL traces, and human gates) determine whether the same base model is safe, fast, and credibly correct for your users. Invest in these first; swap models later if needed.
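The HITL pattern above (LLM → confidence/guardrail checks → HITL gate → side effects) can be sketched in a few lines of framework-agnostic Python. This is a minimal illustration, not A2I or LangGraph code; the threshold and the guardrails_pass, queue_for_human_review, and execute_side_effect helpers are hypothetical placeholders for your policy checks, review queue, and gated action.

from dataclasses import dataclass

@dataclass
class DraftAnswer:
    text: str
    confidence: float
    retrieval_set: list
    prompt: str

def guardrails_pass(draft: DraftAnswer) -> bool:
    # Placeholder for policy/safety checks (e.g., a managed or OSS guardrail service).
    return "ssn" not in draft.text.lower()

def queue_for_human_review(draft: DraftAnswer) -> None:
    # Hypothetical hand-off to a review queue (an A2I flow, a LangGraph interrupt, a ticket, ...).
    print("Routed to human reviewer:", draft.text[:80])

def execute_side_effect(draft: DraftAnswer) -> None:
    # The gated action: publish a summary, file a ticket, commit code, ...
    print("Publishing:", draft.text[:80])

def hitl_gate(draft: DraftAnswer, threshold: float = 0.8) -> None:
    # Persist every artifact (prompt, retrieval set, decision) before acting, for audit and replay.
    audit_record = {"prompt": draft.prompt, "contexts": draft.retrieval_set, "answer": draft.text}
    if not guardrails_pass(draft):
        queue_for_human_review(draft)      # policy failure: always escalate
    elif draft.confidence < threshold:
        queue_for_human_review(draft)      # low confidence: gate on human approval
    else:
        execute_side_effect(draft)         # high confidence and policy-clean: act
    # audit_record would be written to durable storage here

draft = DraftAnswer(text="Summary of the Q3 report...", confidence=0.55,
                    retrieval_set=["chunk-1"], prompt="Summarize the Q3 report")
hitl_gate(draft)  # confidence is below the threshold, so this routes to review instead of publishing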
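The “SQL + vector” item quotes a pgvector filter-then-rank pattern. Below is a minimal sketch using psycopg and the pgvector Python adapter; the documents table, its doc_id/chunk_text/acl_tag columns, the connection string, and the ACL semantics are assumptions for illustration, and the elided value in the original snippet is left as a caller-supplied parameter.

import numpy as np
import psycopg  # pip install "psycopg[binary]" pgvector
from pgvector.psycopg import register_vector

def retrieve(query_embedding, tenant_id, user_acl_tags, k=5):
    # Policy-aware retrieval: tenant and ACL filters run server-side, next to the ANN ordering,
    # so a single round trip returns only rows the caller is allowed to see.
    with psycopg.connect("dbname=kb") as conn:      # assumption: adjust the connection string
        register_vector(conn)                        # adapt numpy arrays to the pgvector type
        rows = conn.execute(
            """
            SELECT doc_id, chunk_text
            FROM documents                           -- hypothetical table holding chunks and embeddings
            WHERE tenant_id = %s
              AND acl_tag @> %s                      -- ACL-containment filter as quoted in the article
            ORDER BY embedding <-> %s                -- pgvector distance operator from the article
            LIMIT %s
            """,
            (tenant_id, user_acl_tags, np.array(query_embedding, dtype=np.float32), k),
        ).fetchall()
    return rows

# Example call with hypothetical values:
# retrieve([0.01] * 1536, tenant_id=42, user_acl_tags=["finance"], k=5)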
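For the distributed-tracing item, here is a minimal OpenTelemetry sketch in Python. The span and attribute names are illustrative, and the console exporter stands in for an OTLP endpoint (LangSmith, Jaeger, Datadog) that would receive these spans in production.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Wire up a tracer; in production the exporter would point at an OTLP collector instead of the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("doc-to-chat")

def answer(question: str) -> str:
    with tracer.start_as_current_span("request") as req_span:
        req_span.set_attribute("question.length", len(question))

        with tracer.start_as_current_span("retrieval") as span:
            contexts = ["..."]                    # placeholder for hybrid retrieval results
            span.set_attribute("retrieval.k", len(contexts))

        with tracer.start_as_current_span("llm.call") as span:
            completion = "..."                    # placeholder for the model call
            span.set_attribute("llm.prompt_chars", sum(len(c) for c in contexts))

        return completion

print(answer("What is our refund policy?"))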
References:
https://iceberg.apache.org/docs/1.9.0/evolution/
https://iceberg.apache.org/docs/1.5.2/
https://docs.snowflake.com/en/user-guide/tables-iceberg
https://docs.dremio.com/current/developer/data-formats/apache-iceberg/
https://github.com/pgvector/pgvector
https://www.postgresql.org/about/news/pgvector-070-released-2852/
https://github.com/pgvector/pgvector-go
https://github.com/pgvector/pgvector-rust
https://github.com/pgvector/pgvector-java
https://milvus.io/docs/four_layers.md
https://milvus.io/docs/v2.3.x/architecture_overview.md
https://milvus.io/docs/v2.2.x/architecture.md
https://www.linkedin.com/posts/armand-ruiz_
https://docs.vespa.ai/en/tutorials/hybrid-search.html
https://www.elastic.co/what-is/hybrid-search
https://www.elastic.co/search-labs/blog/hybrid-search-elasticsearch
https://docs.cohere.com/reference/rerank
https://docs.cohere.com/docs/rerank
https://cohere.com/rerank
https://opentelemetry.io/docs/concepts/signals/traces/
https://opentelemetry.io/docs/specs/otel/logs/
https://docs.smith.langchain.com/evaluation
https://docs.smith.langchain.com/evaluation/concepts
https://docs.smith.langchain.com/reference/python/evaluation
https://docs.smith.langchain.com/observability
https://www.langchain.com/langsmith
https://arize.com/docs/phoenix
https://github.com/Arize-ai/phoenix
https://langfuse.com/docs/observability/get-started
https://langfuse.com/docs/observability/overview
https://docs.datadoghq.com/opentelemetry/
https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/
https://langchain-ai.github.io/langgraph/tutorials/get-started/4-human-in-the-loop/
https://docs.langchain.com/oss/python/langgraph/add-human-in-the-loop
https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-use-augmented-ai-a2i-human-review-loops.html
https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-start-human-loop.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-a2i-runtime.html
https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-monitor-humanloop-results.html
https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html
https://aws.amazon.com/bedrock/guardrails/
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateGuardrail.html
https://docs.aws.amazon.com/bedrock/latest/userguide/agents-guardrail.html
https://docs.nvidia.com/nemo-guardrails/index.html
https://developer.nvidia.com/nemo-guardrails
https://github.com/NVIDIA/NeMo-Guardrails
https://docs.nvidia.com/nemo/guardrails/latest/user-guides/guardrails-library.html
https://guardrailsai.com/docs/
https://github.com/guardrails-ai/guardrails
https://guardrailsai.com/docs/getting_started/quickstart
https://guardrailsai.com/docs/getting_started/guardrails_server
https://pypi.org/project/guardrails-ai/
https://github.com/guardrails-ai/guardrails_pii
https://huggingface.co/meta-llama/Llama-Guard-4-12B
https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/
https://microsoft.github.io/presidio/
https://github.com/microsoft/presidio
https://github.com/microsoft/presidio-research
https://docs.databricks.com/aws/en/data-governance/unity-catalog/access-control
https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/
https://docs.databricks.com/aws/en/data-governance/unity-catalog/abac/
https://docs.ragas.io/
https://docs.ragas.io/en/stable/references/evaluate/
https://docs.ragas.io/en/latest/tutorials/rag/
https://python.langchain.com/docs/concepts/text_splitters/
https://python.langchain.com/api_reference/text_splitters/index.html
https://pypi.org/project/langchain-text-splitters/
https://milvus.io/docs/evaluation_with_deepeval.md
https://mlflow.org/docs/latest/genai/eval-monitor/
https://mlflow.org/docs/2.10.1/llms/rag/notebooks/mlflow-e2e-evaluation.html

The post Building AI agents is 5% AI and 100% software engineering appeared first on MarkTechPost.


AI, Committee, Noticias, Uncategorized

MIT’s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators

Table of contents
- Hardware Generation without Templates
- Input IR: Affine, Relation-Centric Semantics (Deconstruct)
- Front End: FU Graph + Memory Co-Design (Architect)
- Back End: Compile & Optimize to RTL (Compile & Optimize)
- Outcome
- Importance for each segment
- How the “Compiler for AI Chips” Works, Step-by-Step
- Where It Lands in the Ecosystem
- Summary

MIT researchers (Han Lab) introduced LEGO, a compiler-like framework that takes tensor workloads (e.g., GEMM, Conv2D, attention, MTTKRP) and automatically generates synthesizable RTL for spatial accelerators, with no handwritten templates. LEGO’s front end expresses workloads and dataflows in a relation-centric affine representation, builds FU (functional unit) interconnects and on-chip memory layouts for reuse, and supports fusing multiple spatial dataflows in a single design. The back end lowers to a primitive-level graph and uses linear programming and graph transforms to insert pipeline registers, rewire broadcasts, extract reduction trees, and shrink area and power. Evaluated across foundation models and classic CNNs/Transformers, LEGO’s generated hardware shows a 3.2× speedup and 2.4× better energy efficiency over Gemmini under matched resources.

(Figure: https://hanlab.mit.edu/projects/lego)

Hardware Generation without Templates

Existing flows either (1) analyze dataflows without generating hardware, or (2) generate RTL from hand-tuned templates with fixed topologies. These approaches restrict the architecture space and struggle with modern workloads that need to switch dataflows dynamically across layers/ops (e.g., conv vs. depthwise vs. attention). LEGO directly targets any dataflow and any combination of dataflows, generating both the architecture and the RTL from a high-level description rather than configuring a few numeric parameters in a template.

(Figure: https://hanlab.mit.edu/projects/lego)

Input IR: Affine, Relation-Centric Semantics (Deconstruct)

LEGO models tensor programs as loop nests with three index classes: temporal (for-loops), spatial (par-for FUs), and computation (pre-tiling iteration domain). Two affine relations drive the compiler:
- Data mapping f_{I→D}: maps computation indices to tensor indices.
- Dataflow mapping f_{TS→I}: maps temporal/spatial indices to computation indices.

This affine-only representation eliminates modulo/division in the core analysis, making reuse detection and address generation a linear-algebra problem. LEGO also decouples control flow from dataflow (a vector c encodes control-signal propagation/delay), enabling shared control across FUs and substantially reducing control-logic overhead.

Front End: FU Graph + Memory Co-Design (Architect)

The main objective is to maximize reuse and on-chip bandwidth while minimizing interconnect/mux overhead.

Interconnection synthesis. LEGO formulates reuse as solving linear systems over the affine relations to discover direct and delay (FIFO) connections between FUs. It then computes minimum-spanning arborescences (Chu-Liu/Edmonds) to keep only necessary edges (cost = FIFO depth). A BFS-based heuristic rewrites direct interconnects when multiple dataflows must co-exist, prioritizing chain reuse and nodes already fed by delay connections to cut muxes and data nodes.

Banked memory synthesis. Given the set of FUs that must read/write a tensor in the same cycle, LEGO computes bank counts per tensor dimension from the maximum index deltas (optionally dividing by the GCD to reduce banks); a toy sketch of this rule follows. It then instantiates data-distribution switches to route between banks and FUs, leaving FU-to-FU reuse to the interconnect.
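The bank-count rule is described only at a high level above. Purely as a toy illustration (not LEGO’s actual implementation), the following Python sketch derives per-dimension bank counts from the maximum index deltas of the FUs touching one tensor in the same cycle, optionally dividing by the GCD of the deltas; the exact formula used by LEGO may differ.

from math import gcd
from functools import reduce

def bank_counts(fu_indices, use_gcd=True):
    # fu_indices: per-FU index tuples into one tensor for the same cycle,
    # e.g. [(0, 0), (0, 2), (1, 0), (1, 2)] for a 2x2 spatial tile with stride 2 on the second dim.
    dims = len(fu_indices[0])
    banks = []
    for d in range(dims):
        coords = [idx[d] for idx in fu_indices]
        delta = max(coords) - min(coords)                    # maximum index delta along this dimension
        diffs = [abs(a - b) for a in coords for b in coords if a != b]
        step = reduce(gcd, diffs, 0) if use_gcd else 1       # common stride between FU accesses
        step = step or 1
        banks.append(delta // step + 1)                      # banks needed to serve all FUs conflict-free
    return banks

print(bank_counts([(0, 0), (0, 2), (1, 0), (1, 2)]))  # -> [2, 2]: the GCD of 2 halves the bank count on dim 1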
Dataflow fusion. Interconnects for different spatial dataflows are combined into a single FU-level Architecture Description Graph (ADG); careful planning avoids naïve mux-heavy merges and yields up to ~20% energy gains compared to naïve fusion.

Back End: Compile & Optimize to RTL (Compile & Optimize)

The ADG is lowered to a Detailed Architecture Graph (DAG) of primitives (FIFOs, muxes, adders, address generators). LEGO applies several LP/graph passes:

Delay matching via LP. A linear program chooses output delays D_v to minimize the inserted pipeline registers, Σ over edges of (D_v − D_u − L_v) · bitwidth, meeting timing alignment with minimal storage (a sketch of this LP appears after the Importance section below).

Broadcast pin rewiring. A two-stage optimization (virtual cost shaping + MST-based rewiring among destinations) converts expensive broadcasts into forward chains, enabling register sharing and lower latency; a final LP re-balances delays.

Reduction tree extraction + pin reuse. Sequential adder chains become balanced trees; a 0-1 ILP remaps reducer inputs across dataflows so fewer physical pins are required (mux instead of add). This reduces both logic depth and register count.

These passes focus on the datapath, which dominates resources (e.g., FU-array registers ≈ 40% of area and 60% of power), and produce ~35% area savings versus naïve generation.

Outcome

Setup. LEGO is implemented in C++ with HiGHS as the LP solver and emits SpinalHDL → Verilog. Evaluation covers tensor kernels and end-to-end models (AlexNet, MobileNetV2, ResNet-50, EfficientNetV2, BERT, GPT-2, CoAtNet, DDPM, Stable Diffusion, LLaMA-7B). A single LEGO-MNICOC accelerator instance is used across models; a mapper picks per-layer tiling/dataflow. Gemmini is the main baseline under matched resources (256 MACs, 256 KB on-chip buffer, 128-bit bus @ 16 GB/s).

End-to-end speed/efficiency. LEGO achieves a 3.2× speedup and 2.4× better energy efficiency on average vs. Gemmini. Gains stem from (i) a fast, accurate performance model guiding mapping and (ii) dynamic spatial-dataflow switching enabled by the generated interconnects (e.g., depthwise conv layers choose OH–OW–IC–OC). Both designs are bandwidth-bound on GPT-2.

Resource breakdown. An example SoC-style configuration shows the FU array and NoC dominate area/power, with PPUs contributing ~2–5%. This supports the decision to aggressively optimize datapaths and control reuse.

Generative models. On a larger 1024-FU configuration, LEGO sustains >80% utilization for DDPM/Stable Diffusion; LLaMA-7B remains bandwidth-limited (expected for its low operational intensity).

(Figure: https://hanlab.mit.edu/projects/lego)

Importance for each segment

For researchers: LEGO provides a mathematically grounded path from loop-nest specifications to spatial hardware with provable LP-based optimizations. It abstracts away low-level RTL and exposes meaningful levers (tiling, spatialization, reuse patterns) for systematic exploration.

For practitioners: It is effectively hardware-as-code. You can target arbitrary dataflows and fuse them in one accelerator, letting a compiler derive interconnects, buffers, and controllers while shrinking mux/FIFO overheads. This improves energy efficiency and supports multi-op pipelines without manual template redesign.

For product leaders: By lowering the barrier to custom silicon, LEGO enables task-tuned, power-efficient edge accelerators (wearables, IoT) that keep pace with fast-moving AI stacks; the silicon adapts to the model, not the other way around. End-to-end results against a state-of-the-art generator (Gemmini) quantify the upside.
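As promised above, the delay-matching pass can be written as a small linear program. The following is a sketch consistent with the objective quoted in the Back End section, assuming L_v is the combinational latency associated with node v and b_{uv} is the bitwidth of edge (u, v); both symbols are notation assumptions here, and the actual LEGO formulation may add further constraints.

\begin{aligned}
\min_{D} \quad & \sum_{(u,v)\in E} \bigl(D_v - D_u - L_v\bigr)\, b_{uv} \\
\text{s.t.} \quad & D_v - D_u \ge L_v \quad \forall (u,v)\in E, \\
& D_v \ge 0 \quad \forall v \in V.
\end{aligned}

The objective counts (in bits) the pipeline registers inserted on each edge, and the constraint keeps every inserted register count non-negative while enforcing timing alignment.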
How the “Compiler for AI Chips” Works, Step-by-Step

1. Deconstruct (Affine IR). Write the tensor op as loop nests; supply the affine data mapping f_{I→D}, the dataflow mapping f_{TS→I}, and the control-flow vector c. This specifies what to compute and how it is spatialized, without templates.
2. Architect (Graph Synthesis). Solve reuse equations → FU interconnects (direct/delay) → MST/heuristics for minimal


AI, Committee, Noticias, Uncategorized

Top Computer Vision CV Blogs & News Websites (2025)

Computer vision moved fast in 2025: new multimodal backbones, larger open datasets, and tighter model–systems integration. Practitioners need sources that publish rigorously, link code and benchmarks, and track deployment patterns, not marketing posts. This list prioritizes primary research hubs, lab blogs, and production-oriented engineering outlets with a consistent update cadence. Use it to monitor SOTA shifts, grab reproducible code paths, and translate papers into deployable pipelines.

Google Research (AI Blog)
Primary source for advances from Google/DeepMind teams, including vision architectures (e.g., V-MoE) and periodic research year-in-review posts across CV and multimodal. Posts typically include method summaries, figures, and links to papers/code.

Marktechpost
Consistent reporting on new computer-vision models, datasets, and benchmarks with links to papers, code, and demos. Dedicated CV category plus frequent deep dives (e.g., DINOv3 releases and analysis). Useful for staying on top of weekly research drops without wading through raw feeds.

AI at Meta
High-signal posts with preprints and open-source drops. Recent examples include DINOv3 (scaled self-supervised backbones with SOTA across dense prediction tasks), which provide technical detail and artifacts.

NVIDIA Technical Blog
Production-oriented content on VLM-powered analytics, optimized inference, and GPU pipelines. The Computer Vision category feed includes blueprints, SDK usage, and performance guidance relevant to enterprise deployments.

arXiv cs.CV (the raw research firehose)
The canonical preprint feed for CV. Use the recent or new views for daily updates; the taxonomy confirms scope (image processing, pattern recognition, scene understanding). Best paired with RSS and custom filters.

CVF Open Access (CVPR/ICCV/ECCV)
Final versions of main-conference papers and workshops, searchable and citable. CVPR 2025 proceedings and workshop menus are already live, making this the authoritative archive post-acceptance.

BAIR Blog (UC Berkeley)
Occasional but deep posts on frontier topics (e.g., extremely large image modeling, robotics-vision crossovers). Good for conceptual clarity directly from authors.

Stanford Blog
Technical explainers and lab roundups (e.g., SAIL at CVPR 2025) with links to papers/talks. Useful for scanning emerging directions across perception, generative models, and embodied vision.

Roboflow Blog
High-frequency, implementation-focused posts (labeling, training, deployment, apps, and trend reports). Strong for practitioners who need working pipelines and edge deployments.

Hugging Face Blog
Hands-on guides (VLMs, FiftyOne integrations) and ecosystem notes across Transformers, Diffusers, and timm; good for rapid prototyping and fine-tuning CV/VLM stacks.

PyTorch Blog
Change logs, APIs, and recipes affecting CV training/inference (Transforms V2, multi-weight support, FX feature extraction). Read when upgrading training stacks.

The post Top Computer Vision CV Blogs & News Websites (2025) appeared first on MarkTechPost.


AI, Committee, Noticias, Uncategorized

Qwen3-ASR-Toolkit: An Advanced Open Source Python Command-Line Toolkit for Using the Qwen-ASR API Beyond the 3 Minutes/10 MB Limit

Qwen has released Qwen3-ASR-Toolkit, an MIT-licensed Python CLI that programmatically bypasses the Qwen3-ASR-Flash API’s 3-minute/10 MB per-request limit by performing VAD-aware chunking, parallel API calls, and automatic resampling/format normalization via FFmpeg. The result is stable, hour-scale transcription pipelines with configurable concurrency, context injection, and clean text post-processing. Python ≥3.8 is a prerequisite; install with:

pip install qwen3-asr-toolkit

What the toolkit adds on top of the API

- Long-audio handling. The toolkit slices input using voice activity detection (VAD) at natural pauses, keeping each chunk under the API’s hard duration/size caps, then merges outputs in order.
- Parallel throughput. A thread pool dispatches multiple chunks concurrently to DashScope endpoints, improving wall-clock latency for hour-long inputs. You control concurrency via -j/--num-threads.
- Format & rate normalization. Any common audio/video container (MP4/MOV/MKV/MP3/WAV/M4A, etc.) is converted to the API’s required mono 16 kHz before submission. Requires FFmpeg installed on PATH.
- Text cleanup & context. The tool includes post-processing to reduce repetitions/hallucinations and supports context injection to bias recognition toward domain terms; the underlying API also exposes language detection and inverse text normalization (ITN) toggles.

The official Qwen3-ASR-Flash API is single-turn and enforces ≤3 min duration and ≤10 MB payloads per call. That is reasonable for interactive requests but awkward for long media. The toolkit operationalizes best practices (VAD-aware segmentation plus concurrent calls) so teams can batch large archives or live capture dumps without writing orchestration from scratch.

Quick start

1. Install prerequisites (FFmpeg must be available on the system):

# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install -y ffmpeg

2. Install the CLI:

pip install qwen3-asr-toolkit

3. Configure credentials:

# International endpoint key
export DASHSCOPE_API_KEY="sk-..."

4. Run:

# Basic: local video, default 4 threads
qwen3-asr -i "/path/to/lecture.mp4"
# Faster: raise parallelism and pass the key explicitly (optional if the env var is set)
qwen3-asr -i "/path/to/podcast.wav" -j 8 -key "sk-..."
# Improve domain accuracy with context
qwen3-asr -i "/path/to/earnings_call.m4a" -c "tickers, CFO name, product names, Q3 revenue guidance"

Arguments you’ll actually use: -i/--input-file (file path or http/https URL), -j/--num-threads, -c/--context, -key/--dashscope-api-key, -t/--tmp-dir, -s/--silence. Output is printed and saved as <input_basename>.txt.

Minimal pipeline architecture

1) Load local file or URL → 2) VAD to find silence boundaries → 3) Chunk under API caps → 4) Resample to 16 kHz mono → 5) Parallel submit to DashScope → 6) Aggregate segments in order → 7) Post-process text (dedupe, repetitions) → 8) Emit .txt transcript.

Summary

Qwen3-ASR-Toolkit turns Qwen3-ASR-Flash into a practical long-audio pipeline by combining VAD-based segmentation, FFmpeg normalization (mono/16 kHz), and parallel API dispatch under the 3-minute/10 MB caps. Teams get deterministic chunking, configurable throughput, and optional context/LID/ITN controls without custom orchestration.
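For batch jobs, the CLI can be driven from a short script. The following is a minimal sketch that invokes only the documented flags (-i, -j, -c) once per file; the folder path, file-type filter, and context string are assumptions for illustration, and DASHSCOPE_API_KEY is expected to be set in the environment as shown above.

from pathlib import Path
import subprocess

AUDIO_DIR = Path("/data/recordings")          # assumption: adjust to your own layout
CONTEXT = "company names, product jargon"     # optional domain hints passed via -c

for media in sorted(AUDIO_DIR.glob("*")):
    if media.suffix.lower() not in {".mp4", ".mov", ".mkv", ".mp3", ".wav", ".m4a"}:
        continue
    # qwen3-asr prints the transcript and saves it as <input_basename>.txt, so one call per file suffices.
    subprocess.run(
        ["qwen3-asr", "-i", str(media), "-j", "8", "-c", CONTEXT],
        check=True,
    )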
For production, pin the package version, verify region endpoints and keys, and tune the thread count to your network and QPS; then pip install qwen3-asr-toolkit and ship. Check out the GitHub page for the code. The post Qwen3-ASR-Toolkit: An Advanced Open Source Python Command-Line Toolkit for Using the Qwen-ASR API Beyond the 3 Minutes/10 MB Limit appeared first on MarkTechPost.


AI, Committee, Noticias, Uncategorized

The Download: the CDC’s vaccine chaos

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

A pivotal meeting on vaccine guidance is underway, and former CDC leaders are alarmed

This week has been an eventful one for America’s public health agency. Two former leaders of the US Centers for Disease Control and Prevention explained why they suddenly departed in a Senate hearing. They also described how CDC employees are being instructed to turn their backs on scientific evidence.

They painted a picture of a health agency in turmoil, and at risk of harming the people it is meant to serve. And, just hours afterwards, a panel of CDC advisers voted to stop recommending the MMRV vaccine for children under four. Read the full story.

—Jessica Hamzelou

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

If you’re interested in reading more about US vaccine policy, check out:
+ Read our profile of Jim O’Neill, the deputy health secretary and current acting CDC director.
+ Why US federal health agencies are abandoning mRNA vaccines. Read the full story.
+ Why childhood vaccines are a public health success story. No vaccine is perfect, but these medicines are still saving millions of lives. Read the full story.
+ The FDA plans to limit access to covid vaccines. Here’s why that’s not all bad.

Meet Sneha Goenka: our 2025 Innovator of the Year

Every year, MIT Technology Review selects one individual whose work we admire to recognize as Innovator of the Year. For 2025, we chose Sneha Goenka, who designed the computations behind the world’s fastest whole-genome sequencing method.

Thanks to her work, physicians can now sequence a patient’s genome and diagnose a genetic condition in less than eight hours, an achievement that could transform medical care.

Register here to join an exclusive subscriber-only Roundtable conversation with Goenka, Leilani Battle, assistant professor at the University of Washington, and our editor in chief Mat Honan at 1pm ET next Tuesday, September 23.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The CDC voted against giving some children a combined vaccine
If accepted, the agency will stop recommending the MMRV vaccine for children under 4. (CNN)
+ Its vote on hepatitis B vaccines for newborns is expected today too. (The Atlantic $)
+ RFK Jr’s allies are closing ranks around him. (Politico)

2 Russia is using Charlie Kirk’s murder to sow division in the US
It’s using the momentum to push pro-Kremlin narratives and divide Americans. (WP $)
+ The complicated phenomenon of political violence. (Vox)
+ We don’t know what being ‘terminally online’ means any more. (Wired $)

3 Nvidia will invest $5 billion in Intel
The partnership allows Intel to develop custom CPUs to work with Nvidia’s chips. (WSJ $)
+ It’s a much-needed financial shot in the arm for Intel. (WP $)
+ It’s also great news for Intel’s Asian suppliers. (Bloomberg $)

4 Medical AI tools downplay symptoms in women and ethnic minorities
Experts fear that LLM-powered tools could lead to worse health outcomes. (FT $)
+ Artificial intelligence is infiltrating health care. We shouldn’t let it make all the decisions. (MIT Technology Review)

5 AI browsers have hit the mainstream
Where’s the off switch? (Wired $)
+ AI means the end of internet search as we’ve known it. (MIT Technology Review)

6 China has entered the global brain interface race
Its ambitious government-backed startups are primed to challenge Neuralink. (Bloomberg $)
+ This patient’s Neuralink brain implant gets a boost from generative AI. (MIT Technology Review)

7 What makes humans unique in the age of AI?
Defining the distinctions between us and machines isn’t as easy as it used to be. (New Yorker $)
+ How AI can help supercharge creativity. (MIT Technology Review)

8 This ship helps to reconnect Africa’s internet
AI needs high-speed internet, which needs undersea cables. (Rest of World)
+ What Africa needs to do to become a major AI player. (MIT Technology Review)

9 Hundreds of people queued in Beijing to buy Apple’s new iPhone
Desire for Apple products in the country appears to be alive and well. (Reuters)

10 San Francisco’s idea of a great night out? A robot cage fight
It’s certainly one way to have a good time. (NYT $)

Quote of the day

“Get off the iPad!”

—An irate air traffic controller tells the pilots of a Spirit Airlines flight to pay attention to avoid potentially colliding with Donald Trump’s Air Force One aircraft, Ars Technica reports.

One more thing

We used to get excited about technology. What happened?

As a philosopher who studies AI and data, Shannon Vallor’s Twitter feed is always filled with the latest tech news. Increasingly, she’s realized that the constant stream of information is no longer inspiring joy, but a sense of resignation.

Joy is missing from our lives, and from our technology. Its absence is feeding a growing unease being voiced by many who work in tech or study it. Fixing it depends on understanding how and why the priorities in our tech ecosystem have changed. Read the full story.

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Would you go about your daily business with a soft toy on your shoulder? This intrepid reporter gave it a go.
+ How dying dinosaurs shaped the landscapes around us.
+ I can’t believe I missed Pythagorean Theorem day earlier this week.
+ Inside the rise in popularity of the no-water yard.


AI, Committee, Noticias, Uncategorized

Fair-GPTQ: Bias-Aware Quantization for Large Language Models

arXiv:2509.15206v1 Announce Type: new Abstract: High memory demands of generative language models have drawn attention to quantization, which reduces computational cost, memory usage, and latency by mapping model weights to lower-precision integers. Approaches such as GPTQ effectively minimize input-weight product errors during quantization; however, recent empirical studies show that they can increase biased outputs and degrade performance on fairness benchmarks, and it remains unclear which specific weights cause this issue. In this work, we draw new links between quantization and model fairness by adding explicit group-fairness constraints to the quantization objective and introduce Fair-GPTQ, the first quantization method explicitly designed to reduce unfairness in large language models. The added constraints guide the learning of the rounding operation toward less-biased text generation for protected groups. Specifically, we focus on stereotype generation involving occupational bias and discriminatory language spanning gender, race, and religion. Fair-GPTQ has minimal impact on performance, preserving at least 90% of baseline accuracy on zero-shot benchmarks, reduces unfairness relative to a half-precision model, and retains the memory and speed benefits of 4-bit quantization. We also compare the performance of Fair-GPTQ with existing debiasing methods and find that it achieves performance on par with the iterative null-space projection debiasing approach on racial-stereotype benchmarks. Overall, the results validate our theoretical solution to the quantization problem with a group-bias term, highlight its applicability for reducing group bias at quantization time in generative models, and demonstrate that our approach can further be used to analyze channel- and weight-level contributions to fairness during quantization.
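The abstract describes adding a group-bias term to the GPTQ objective but does not spell it out. Purely as an illustration of what such an objective could look like (not the paper's actual formulation), one might write

\min_{\widehat{W}} \; \bigl\lVert (W - \widehat{W})\,X \bigr\rVert_F^2 \;+\; \lambda \,\bigl\lVert (W - \widehat{W})\,(X_a - X_b) \bigr\rVert_F^2

where the first term is the standard GPTQ input-weight reconstruction error on calibration activations X, X_a and X_b are activations from paired prompts that differ only in the protected group mentioned, and λ trades reconstruction error against the group-bias penalty; every symbol beyond the standard GPTQ term is an assumption here.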


AI, Committee, Noticias, Uncategorized

Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction

arXiv:2509.14504v1 Announce Type: new Abstract: In this paper, we introduce OmniGEC, a collection of multilingual silver-standard datasets for the task of Grammatical Error Correction (GEC), covering eleven languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Slovene, Swedish, and Ukrainian. These datasets facilitate the development of multilingual GEC solutions and help bridge the data gap in adapting English GEC solutions to multilingual GEC. The texts in the datasets originate from three sources: Wikipedia edits for the eleven target languages, subreddits from Reddit in the eleven target languages, and the Ukrainian-only UberText 2.0 social media corpus. While Wikipedia edits were derived from human-made corrections, the Reddit and UberText 2.0 data were automatically corrected with the GPT-4o-mini model. The quality of the corrections in the datasets was evaluated both automatically and manually. Finally, we fine-tune two open-source large language models – Aya-Expanse (8B) and Gemma-3 (12B) – on the multilingual OmniGEC corpora and achieve state-of-the-art (SOTA) results for paragraph-level multilingual GEC. The dataset collection and the best-performing models are available on Hugging Face.


AI, Committee, Noticias, Uncategorized

AgentCompass: Towards Reliable Evaluation of Agentic Workflows in Production

arXiv:2509.14647v1 Announce Type: cross Abstract: With the growing adoption of Large Language Models (LLMs) in automating complex, multi-agent workflows, organizations face mounting risks from errors, emergent behaviors, and systemic failures that current evaluation methods fail to capture. We present AgentCompass, the first evaluation framework designed specifically for post-deployment monitoring and debugging of agentic workflows. AgentCompass models the reasoning process of expert debuggers through a structured, multi-stage analytical pipeline: error identification and categorization, thematic clustering, quantitative scoring, and strategic summarization. The framework is further enhanced with a dual memory system-episodic and semantic-that enables continual learning across executions. Through collaborations with design partners, we demonstrate the framework’s practical utility on real-world deployments, before establishing its efficacy against the publicly available TRAIL benchmark. AgentCompass achieves state-of-the-art results on key metrics, while uncovering critical issues missed in human annotations, underscoring its role as a robust, developer-centric tool for reliable monitoring and improvement of agentic systems in production.


AI, Committee, Noticias, Uncategorized

Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey

arXiv:2503.01513v3 Announce Type: replace Abstract: We present a survey of methods for assessing and enhancing the quality of online discussions, focusing on the potential of LLMs. While online discourses aim, at least in theory, to foster mutual understanding, they often devolve into harmful exchanges, such as hate speech, threatening social cohesion and democratic values. Recent advancements in LLMs enable artificial facilitation agents to not only moderate content, but also actively improve the quality of interactions. Our survey synthesizes ideas from NLP and Social Sciences to provide (a) a new taxonomy on discussion quality evaluation, (b) an overview of intervention and facilitation strategies, (c) along with a new taxonomy of conversation facilitation datasets, (d) an LLM-oriented roadmap of good practices and future research directions, from technological and societal perspectives.


AI, Committee, Noticias, Uncategorized

Diverse, not Short: A Length-Controlled Data Selection Strategy for Improving Response Diversity of Language Models

arXiv:2505.16245v3 Announce Type: replace Abstract: Diverse language model responses are crucial for creative generation, open-ended tasks, and self-improvement training. We show that common diversity metrics, and even reward models used for preference optimization, systematically bias models toward shorter outputs, limiting expressiveness. To address this, we introduce Diverse, not Short (Diverse-NS), a length-controlled data selection strategy that improves response diversity while maintaining length parity. By generating and filtering preference data that balances diversity, quality, and length, Diverse-NS enables effective training using only 3,000 preference pairs. Applied to LLaMA-3.1-8B and the Olmo-2 family, Diverse-NS substantially enhances lexical and semantic diversity. We show consistent improvement in diversity with minor reduction or gains in response quality on four creative generation tasks: Divergent Associations, Persona Generation, Alternate Uses, and Creative Writing. Surprisingly, experiments with the Olmo-2 model family (7B, and 13B) show that smaller models like Olmo-2-7B can serve as effective “diversity teachers” for larger models. By explicitly addressing length bias, our method efficiently pushes models toward more diverse and expressive outputs.

