Localmax dynamics for attention in transformers and its asymptotic behavior

arXiv:2509.15958v1 Announce Type: new Abstract: We introduce a new discrete-time attention model, termed the localmax dynamics, which interpolates between the classic softmax dynamics and the hardmax dynamics, where only the tokens that maximize the influence toward a given token have a positive weight. As in hardmax, uniform weights are determined by a parameter controlling neighbor influence, but the key extension lies in relaxing neighborhood interactions through an alignment-sensitivity parameter, which allows controlled deviations from pure hardmax behavior. As we prove, while the convex hull of the token states still converges to a convex polytope, its structure can no longer be fully described by a maximal alignment set, prompting the introduction of quiescent sets to capture the invariant behavior of tokens near vertices. We show that these sets play a key role in understanding the asymptotic behavior of the system, even under time-varying alignment sensitivity parameters. We further show that localmax dynamics does not exhibit finite-time convergence and provide results for vanishing, nonzero, time-varying alignment-sensitivity parameters, recovering the limiting behavior of hardmax as a by-product. Finally, we adapt Lyapunov-based methods from classical opinion dynamics, highlighting their limitations in the asymmetric setting of localmax interactions and outlining directions for future research.
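The abstract does not spell out the update rule, so the following is a purely illustrative sketch under one plausible reading: each token places uniform weight on the tokens whose alignment with it lies within an alignment-sensitivity margin eps of the maximum (eps = 0 would recover hardmax-style selection). The function names and the convex-combination update are assumptions for illustration, not the paper's definition.

import numpy as np

def localmax_weights(scores: np.ndarray, eps: float) -> np.ndarray:
    # Uniform weight on tokens whose alignment is within eps of the maximum
    # (assumption: one way to "relax" the hardmax selection set).
    mask = scores >= scores.max() - eps
    return mask.astype(float) / mask.sum()

def localmax_step(x: np.ndarray, eps: float) -> np.ndarray:
    # One discrete-time update: every token moves to the average of its selected neighbors.
    scores = x @ x.T                                   # pairwise alignments
    w = np.stack([localmax_weights(scores[i], eps) for i in range(len(x))])
    return w @ x

x = np.random.default_rng(0).normal(size=(6, 2))       # 6 toy token states in 2D
for _ in range(100):
    x = localmax_step(x, eps=0.1)                      # convex hull contracts toward a polytope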

Tag&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack

arXiv:2501.08454v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have become essential tools for digital task assistance. Their training relies heavily on the collection of vast amounts of data, which may include copyright-protected or sensitive information. Recent studies on detecting pretraining data in LLMs have primarily focused on sentence- or paragraph-level membership inference attacks (MIAs), usually involving probability analysis of the target model’s predicted tokens. However, these methods often exhibit poor accuracy, failing to account for the semantic importance of textual content and word significance. To address these shortcomings, we propose Tag&Tab, a novel approach for detecting data used in LLM pretraining. Our method leverages established natural language processing (NLP) techniques to tag keywords in the input text, a process we term Tagging. Then, the LLM is used to obtain probabilities for these keywords and calculate their average log-likelihood to determine input text membership, a process we refer to as Tabbing. Our experiments on four benchmark datasets (BookMIA, MIMIR, PatentMIA, and the Pile) and several open-source LLMs of varying sizes demonstrate an average increase in AUC scores ranging from 5.3% to 17.6% over state-of-the-art methods. Tag&Tab not only sets a new standard for data leakage detection in LLMs, but its outstanding performance is a testament to the importance of words in MIAs on LLMs.
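A hedged sketch of the two-step recipe described in the abstract, using GPT-2 as a stand-in target model. The keyword "Tagging" here is approximated by taking the longest non-stopword word types; the paper's actual NLP-based tagging, thresholds, and scoring details may differ.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "that", "with"}

def tag_keywords(text: str, k: int = 8) -> set:
    # Proxy "Tagging": longest non-stopword words stand in for NLP keyword selection.
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    return set(sorted((w for w in words if w and w not in STOPWORDS), key=len, reverse=True)[:k])

def tag_and_tab_score(text: str, model_name: str = "gpt2") -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    keywords = tag_keywords(text)
    enc = tok(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    with torch.no_grad():
        logits = model(**enc).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = enc["input_ids"][0, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # "Tabbing": average log-likelihood over tokens that fall inside a tagged keyword.
    kept = []
    for i, (start, end) in enumerate(offsets[1:]):
        piece = text[int(start):int(end)].strip(" .,;:!?\"'()").lower()
        if piece and any(piece in kw for kw in keywords):
            kept.append(token_lp[i])
    score = torch.stack(kept).mean() if kept else token_lp.mean()
    return float(score)   # higher average log-likelihood suggests the text was seen in pretraining

print(tag_and_tab_score("The quick brown fox jumps over the lazy dog near the riverbank."))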

Xiaomi Released MiMo-Audio, a 7B Speech Language Model Trained on 100M+ Hours with High-Fidelity Discrete Tokens

Xiaomi's MiMo team released MiMo-Audio, a 7-billion-parameter audio-language model that runs a single next-token objective over interleaved text and discretized speech, scaling pretraining beyond 100 million hours of audio.

What's actually new?
Instead of relying on task-specific heads or lossy acoustic tokens, MiMo-Audio uses a bespoke RVQ (residual vector quantization) tokenizer that targets both semantic fidelity and high-quality reconstruction. The tokenizer runs at 25 Hz and outputs 8 RVQ layers (≈200 tokens/s), giving the LM access to "lossless" speech features it can model autoregressively alongside text.

Architecture: patch encoder → 7B LLM → patch decoder
To handle the audio/text rate mismatch, the system packs four timesteps per patch for LM consumption (downsampling 25 Hz → 6.25 Hz), then reconstructs full-rate RVQ streams with a causal patch decoder. A delayed multi-layer RVQ generation scheme staggers predictions per codebook to stabilize synthesis and respect inter-layer dependencies. All three parts—patch encoder, MiMo-7B backbone, and patch decoder—are trained under a single next-token objective.

https://xiaomimimo.github.io/MiMo-Audio-Demo/

Scale is the algorithm
Training proceeds in two big phases: (1) an "understanding" stage that optimizes text-token loss over interleaved speech-text corpora, and (2) a joint "understanding + generation" stage that turns on audio losses for speech continuation, S2T/T2S tasks, and instruction-style data. The report emphasizes a compute/data threshold where few-shot behavior appears to "switch on," echoing emergence curves seen in large text-only LMs.

Benchmarks: speech intelligence and general audio
MiMo-Audio is evaluated on speech-reasoning suites (e.g., SpeechMMLU) and broad audio understanding benchmarks (e.g., MMAU), reporting strong scores across speech, sound, and music and a reduced "modality gap" between text-only and speech-in/speech-out settings. Xiaomi also releases MiMo-Audio-Eval, a public toolkit to reproduce these results. Listen-and-respond demos (speech continuation, voice/emotion conversion, denoising, and speech translation) are available online.

Why this is important
The approach is intentionally simple—no multi-head task tower, no bespoke ASR/TTS objectives at pretraining time—just GPT-style next-token prediction over lossless audio tokens plus text. The key engineering ideas are (i) a tokenizer the LM can actually use without throwing away prosody and speaker identity; (ii) patchification to keep sequence lengths manageable; and (iii) delayed RVQ decoding to preserve quality at generation time. For teams building spoken agents, those design choices translate into few-shot speech-to-speech editing and robust speech continuation with minimal task-specific finetuning.
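The rate arithmetic in the architecture section above (25 Hz frames, 8 RVQ layers, 4 timesteps per patch → 6.25 Hz) is easy to sanity-check with a short sketch. This is illustrative only: the real patch encoder embeds the codes rather than concatenating raw token ids, and the shapes and names below are assumptions.

import torch

def patchify_rvq(codes_25hz: torch.Tensor, patch_size: int = 4) -> torch.Tensor:
    # codes_25hz: [T, L] integer RVQ codes, T frames at 25 Hz, L = 8 codebook layers.
    # Returns [T // patch_size, patch_size * L] patches at 25 / patch_size = 6.25 Hz.
    T, L = codes_25hz.shape
    T_trim = (T // patch_size) * patch_size          # drop a ragged tail for simplicity
    return codes_25hz[:T_trim].reshape(T_trim // patch_size, patch_size * L)

codes = torch.randint(0, 1024, (250, 8))             # ~10 s of audio: 250 frames x 8 layers
patches = patchify_rvq(codes)                        # -> [62, 32], i.e. ~6.25 patches per second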
6 Technical Takeaways:
1. High-Fidelity Tokenization: MiMo-Audio uses a custom RVQ tokenizer operating at 25 Hz with 8 active codebooks, ensuring speech tokens preserve prosody, timbre, and speaker identity while keeping them LM-friendly.
2. Patchified Sequence Modeling: The model reduces sequence length by grouping 4 timesteps into one patch (25 Hz → 6.25 Hz), letting the 7B LLM handle long speech efficiently without discarding detail.
3. Unified Next-Token Objective: Rather than separate heads for ASR, TTS, or dialogue, MiMo-Audio trains under a single next-token prediction loss across interleaved text and audio, simplifying architecture while supporting multi-task generalization.
4. Emergent Few-Shot Abilities: Few-shot behaviors such as speech continuation, voice conversion, emotion transfer, and speech translation emerge once training surpasses a large-scale data threshold (~100M hours, trillions of tokens).
5. Benchmark Leadership: MiMo-Audio sets state-of-the-art scores on SpeechMMLU (S2S 69.1, T2S 71.5) and MMAU (66.0 overall), while minimizing the text-to-speech modality gap to just 3.4 points.
6. Open Ecosystem Release: Xiaomi provides the tokenizer, 7B checkpoints (base and instruct), MiMo-Audio-Eval toolkit, and public demos, enabling researchers and developers to test and extend speech-to-speech intelligence in open-source pipelines.

Summary
MiMo-Audio demonstrates that high-fidelity, RVQ-based "lossless" tokenization combined with patchified next-token pretraining at scale is sufficient to unlock few-shot speech intelligence without task-specific heads. The 7B stack—tokenizer → patch encoder → LLM → patch decoder—bridges the audio/text rate gap (25 → 6.25 Hz) and preserves prosody and speaker identity via delayed multi-layer RVQ decoding. Empirically, the model narrows the text-speech modality gap, generalizes across speech/sound/music benchmarks, and supports in-context S2S editing and continuation.

Check out the Paper, Technical details and GitHub Page. The post Xiaomi Released MiMo-Audio, a 7B Speech Language Model Trained on 100M+ Hours with High-Fidelity Discrete Tokens appeared first on MarkTechPost.

LLM-as-a-Judge: Where Do Its Signals Break, When Do They Hold, and What Should “Evaluation” Mean?

What exactly is being measured when a judge LLM assigns a 1–5 (or pairwise) score?
Most "correctness/faithfulness/completeness" rubrics are project-specific. Without task-grounded definitions, a scalar score can drift from business outcomes (e.g., "useful marketing post" vs. "high completeness"). Surveys of LLM-as-a-judge (LAJ) note that rubric ambiguity and prompt template choices materially shift scores and human correlations.

How stable are judge decisions to prompt position and formatting?
Large controlled studies find position bias: identical candidates receive different preferences depending on order; list-wise and pairwise setups both show measurable drift (e.g., repetition stability, position consistency, preference fairness). Work cataloging verbosity bias shows longer responses are often favored independent of quality; several reports also describe self-preference (judges prefer text closer to their own style/policy).

Do judge scores consistently match human judgments of factuality?
Empirical results are mixed. For summary factuality, one study reported low or inconsistent correlations with humans for strong models (GPT-4, PaLM-2), with only partial signal from GPT-3.5 on certain error types. Conversely, domain-bounded setups (e.g., explanation quality for recommenders) have reported usable agreement with careful prompt design and ensembling across heterogeneous judges. Taken together, correlation seems task- and setup-dependent, not a general guarantee.

How robust are judge LLMs to strategic manipulation?
LLM-as-a-Judge (LAJ) pipelines are attackable. Studies show universal and transferable prompt attacks can inflate assessment scores; defenses (template hardening, sanitization, re-tokenization filters) mitigate but do not eliminate susceptibility. Newer evaluations differentiate content-author vs. system-prompt attacks and document degradation across several families (Gemma, Llama, GPT-4, Claude) under controlled perturbations.

Is pairwise preference safer than absolute scoring?
Preference learning often favors pairwise ranking, yet recent research finds protocol choice itself introduces artifacts: pairwise judges can be more vulnerable to distractors that generator models learn to exploit; absolute (pointwise) scores avoid order bias but suffer scale drift. Reliability therefore hinges on protocol, randomization, and controls rather than a single universally superior scheme.

Could "judging" encourage overconfident model behavior?
Recent reporting on evaluation incentives argues that test-centric scoring can reward guessing and penalize abstention, shaping models toward confident hallucinations; proposals suggest scoring schemes that explicitly value calibrated uncertainty. While this is a training-time concern, it feeds back into how evaluations are designed and interpreted.

Where do generic "judge" scores fall short for production systems?
When an application has deterministic sub-steps (retrieval, routing, ranking), component metrics offer crisp targets and regression tests. Common retrieval metrics include Precision@k, Recall@k, MRR, and nDCG; these are well-defined, auditable, and comparable across runs. Industry guides emphasize separating retrieval and generation and aligning subsystem metrics with end goals, independent of any judge LLM.
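For reference, the component metrics named above are straightforward to compute; the sketch below assumes binary relevance labels and a single query (illustrative only).

import math

def precision_at_k(retrieved, relevant, k):
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    return sum(1 for d in retrieved[:k] if d in relevant) / max(len(relevant), 1)

def mrr(retrieved, relevant):
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    dcg = sum(1.0 / math.log2(r + 1) for r, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(r + 1) for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

retrieved = ["d3", "d7", "d1", "d9"]       # ranked ids from the retriever
relevant = {"d1", "d3"}                    # binary ground-truth labels
print(precision_at_k(retrieved, relevant, 3), recall_at_k(retrieved, relevant, 3),
      mrr(retrieved, relevant), ndcg_at_k(retrieved, relevant, 4))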
If judge LLMs are fragile, what does "evaluation" look like in the wild?
Public engineering playbooks increasingly describe trace-first, outcome-linked evaluation: capture end-to-end traces (inputs, retrieved chunks, tool calls, prompts, responses) using OpenTelemetry GenAI semantic conventions and attach explicit outcome labels (resolved/unresolved, complaint/no-complaint). This supports longitudinal analysis, controlled experiments, and error clustering—regardless of whether any judge model is used for triage. Tooling ecosystems (e.g., LangSmith and others) document trace/eval wiring and OTel interoperability; these are descriptions of current practice rather than endorsements of a particular vendor.

Are there domains where LLM-as-a-Judge (LAJ) seems comparatively reliable?
Some constrained tasks with tight rubrics and short outputs report better reproducibility, especially when ensembles of judges and human-anchored calibration sets are used. But cross-domain generalization remains limited, and bias/attack vectors persist.

Does LLM-as-a-Judge (LAJ) performance drift with content style, domain, or "polish"?
Beyond length and order, studies and news coverage indicate LLMs sometimes over-simplify or over-generalize scientific claims compared to domain experts—useful context when using LAJ to score technical material or safety-critical text.

Key Technical Observations
- Biases are measurable (position, verbosity, self-preference) and can materially change rankings without content changes. Controls (randomization, de-biasing templates) reduce but do not eliminate effects.
- Adversarial pressure matters: prompt-level attacks can systematically inflate scores; current defenses are partial.
- Human agreement varies by task: factuality and long-form quality show mixed correlations; narrow domains with careful design and ensembling fare better.
- Component metrics remain well-posed for deterministic steps (retrieval/routing), enabling precise regression tracking independent of judge LLMs.
- Trace-based online evaluation described in industry literature (OTel GenAI) supports outcome-linked monitoring and experimentation.

Summary
In conclusion, this article does not argue against the existence of LLM-as-a-Judge but highlights the nuances, limitations, and ongoing debates around its reliability and robustness. The intention is not to dismiss its use but to frame open questions that need further exploration. Companies and research groups actively developing or deploying LLM-as-a-Judge (LAJ) pipelines are invited to share their perspectives, empirical findings, and mitigation strategies—adding valuable depth and balance to the broader conversation on evaluation in the GenAI era.

The post LLM-as-a-Judge: Where Do Its Signals Break, When Do They Hold, and What Should "Evaluation" Mean? appeared first on MarkTechPost.

xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context and Trained End-to-End with Tool-Use Reinforcement Learning (RL)

xAI introduced Grok-4-Fast, a cost-optimized successor to Grok-4 that merges "reasoning" and "non-reasoning" behaviors into a single set of weights controllable via system prompts. The model targets high-throughput search, coding, and Q&A with a 2M-token context window and native tool-use RL that decides when to browse the web, execute code, or call tools.

Architecture note
Previous Grok releases split long-chain "reasoning" and short "non-reasoning" responses across separate models. Grok-4-Fast's unified weight space reduces end-to-end latency and tokens by steering behavior via system prompts, which is relevant for real-time applications (search, assistive agents, and interactive coding) where switching models penalizes both latency and cost.

Search and agentic use
Grok-4-Fast was trained end-to-end with tool-use reinforcement learning and shows gains on search-centric agent benchmarks: BrowseComp 44.9%, SimpleQA 95.0%, Reka Research 66.0%, plus higher scores on Chinese variants (e.g., BrowseComp-zh 51.2%). xAI also cites private battle-testing on LMArena where grok-4-fast-search (codename "menlo") ranks #1 in the Search Arena with 1163 Elo, and the text variant (codename "tahoe") sits at #8 in the Text Arena, roughly on par with grok-4-0709.

Performance and efficiency deltas
On internal and public benchmarks, Grok-4-Fast posts frontier-class scores while cutting token usage. xAI reports pass@1 results of 92.0% (AIME 2025, no tools), 93.3% (HMMT 2025, no tools), 85.7% (GPQA Diamond), and 80.0% (LiveCodeBench Jan–May), approaching or matching Grok-4 but using ~40% fewer "thinking" tokens on average. The company frames this as "intelligence density," claiming a ~98% reduction in price to reach the same benchmark performance as Grok-4 when the lower token count and new per-token pricing are combined.

Deployment and price
The model is generally available to all users in Grok's Fast and Auto modes across web and mobile; Auto will select Grok-4-Fast for difficult queries to improve latency without losing quality, and—for the first time—free users access xAI's latest model tier. For developers, xAI exposes two SKUs—grok-4-fast-reasoning and grok-4-fast-non-reasoning—both with 2M context. Pricing (xAI API) is $0.20 / 1M input tokens (<128k), $0.40 / 1M input tokens (≥128k), $0.50 / 1M output tokens (<128k), $1.00 / 1M output tokens (≥128k), and $0.05 / 1M cached input tokens.

https://x.ai/news/grok-4-fast

5 Technical Takeaways:
1. Unified model + 2M context. Grok-4-Fast uses a single weight space for "reasoning" and "non-reasoning," prompt-steered, with a 2,000,000-token window across both SKUs.
2. Pricing for scale. API pricing starts at $0.20/M input, $0.50/M output, with cached input at $0.05/M and higher rates only beyond 128K context.
3. Efficiency claims. xAI reports ~40% fewer "thinking" tokens at comparable accuracy vs Grok-4, yielding a ~98% lower price to match Grok-4 performance on frontier benchmarks.
4. Benchmark profile. Reported pass@1: AIME-2025 92.0%, HMMT-2025 93.3%, GPQA-Diamond 85.7%, LiveCodeBench (Jan–May) 80.0%.
5. Agentic/search use. Post-training with tool-use RL; positioned for browsing/search workflows with documented search-agent metrics and live-search billing in docs.
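As a quick unit-economics check, the published per-token rates above can be plugged into a small estimator. This is a simplified sketch: it assumes an entire request is billed at either the <128k or ≥128k tier and ignores any separate live-search or tool billing.

RATES_PER_M = {                          # USD per 1M tokens, from the pricing listed above
    "input_lt_128k": 0.20, "input_ge_128k": 0.40,
    "output_lt_128k": 0.50, "output_ge_128k": 1.00,
    "cached_input": 0.05,
}

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0, context_ge_128k: bool = False) -> float:
    in_rate = RATES_PER_M["input_ge_128k" if context_ge_128k else "input_lt_128k"]
    out_rate = RATES_PER_M["output_ge_128k" if context_ge_128k else "output_lt_128k"]
    fresh_input = input_tokens - cached_tokens
    return (fresh_input * in_rate
            + cached_tokens * RATES_PER_M["cached_input"]
            + output_tokens * out_rate) / 1_000_000

# e.g. a 200k-token prompt with 50k cached tokens and 4k output, billed in the >=128k tier
print(f"${estimate_cost(200_000, 4_000, cached_tokens=50_000, context_ge_128k=True):.4f}")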
Summary
Grok-4-Fast packages Grok-4-level capability into a single, prompt-steerable model with a 2M-token window, tool-use RL, and pricing tuned for high-throughput search and agent workloads. Early public signals (LMArena #1 in Search, competitive Text placement) align with xAI's claim of similar accuracy using ~40% fewer "thinking" tokens, translating to lower latency and unit cost in production.

Check out the Technical details. The post xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context and Trained End-to-End with Tool-Use Reinforcement Learning (RL) appeared first on MarkTechPost.

Building AI agents is 5% AI and 100% software engineering

Production-grade agents live or die on data plumbing, controls, and observability—not on model choice. The doc-to-chat pipeline below maps the concrete layers and why they matter.

What is a "doc-to-chat" pipeline?
A doc-to-chat pipeline ingests enterprise documents, standardizes them, enforces governance, indexes embeddings alongside relational features, and serves retrieval + generation behind authenticated APIs with human-in-the-loop (HITL) checkpoints. It's the reference architecture for agentic Q&A, copilots, and workflow automation where answers must respect permissions and be audit-ready. Production implementations are variations of RAG (retrieval-augmented generation) hardened with LLM guardrails, governance, and OpenTelemetry-backed tracing.

How do you integrate cleanly with the existing stack?
Use standard service boundaries (REST/JSON, gRPC) over a storage layer your org already trusts. For tables, Iceberg gives ACID, schema evolution, partition evolution, and snapshots—critical for reproducible retrieval and backfills. For vectors, use a system that coexists with SQL filters: pgvector collocates embeddings with business keys and ACL tags in PostgreSQL; dedicated engines like Milvus handle high-QPS ANN with disaggregated storage/compute. In practice, many teams run both: SQL+pgvector for transactional joins and Milvus for heavy retrieval.
Key properties:
- Iceberg tables: ACID, hidden partitioning, snapshot isolation; vendor support across warehouses.
- pgvector: SQL + vector similarity in one query plan for precise joins and policy enforcement.
- Milvus: layered, horizontally scalable architecture for large-scale similarity search.

How do agents, humans, and workflows coordinate on one "knowledge fabric"?
Production agents require explicit coordination points where humans approve, correct, or escalate. AWS A2I provides managed HITL loops (private workforces, flow definitions) and is a concrete blueprint for gating low-confidence outputs. Frameworks like LangGraph model these human checkpoints inside agent graphs so approvals are first-class steps in the DAG, not ad hoc callbacks. Use them to gate actions like publishing summaries, filing tickets, or committing code. Pattern: LLM → confidence/guardrail checks → HITL gate → side-effects. Persist every artifact (prompt, retrieval set, decision) for auditability and future re-runs.

How is reliability enforced before anything reaches the model?
Treat reliability as layered defenses:
- Language + content guardrails: Pre-validate inputs/outputs for safety and policy. Options span managed (Bedrock Guardrails) and OSS (NeMo Guardrails, Guardrails AI; Llama Guard). Independent comparisons and a position paper catalog the trade-offs.
- PII detection/redaction: Run analyzers on both source docs and model I/O. Microsoft Presidio offers recognizers and masking, with explicit caveats to combine with additional controls.
- Access control and lineage: Enforce row-/column-level ACLs and audit across catalogs (Unity Catalog) so retrieval respects permissions; unify lineage and access policies across workspaces.
- Retrieval quality gates: Evaluate RAG with reference-free metrics (faithfulness, context precision/recall) using Ragas/related tooling; block or down-rank poor contexts.

How do you scale indexing and retrieval under real traffic?
Two axes matter: ingest throughput and query concurrency.
- Ingest: Normalize at the lakehouse edge; write to Iceberg for versioned snapshots, then embed asynchronously. This enables deterministic rebuilds and point-in-time re-indexing.
- Vector serving: Milvus's shared-storage, disaggregated compute architecture supports horizontal scaling with independent failure domains; use HNSW/IVF/Flat hybrids and replica sets to balance recall/latency.
- SQL + vector: Keep business joins server-side (pgvector), e.g., WHERE tenant_id = ? AND acl_tag @> … ORDER BY embedding <-> :q LIMIT k. This avoids N+1 trips and respects policies (a fuller sketch follows just below this list).
- Chunking/embedding strategy: Tune chunk size/overlap and semantic boundaries; bad chunking is the silent killer of recall.
For structured+unstructured fusion, prefer hybrid retrieval (BM25 + ANN + reranker) and store structured features next to vectors to support filters and re-ranking features at query time.
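Below is a minimal sketch of the policy-aware pgvector query pattern from the SQL + vector item above, written in Python with psycopg2. The table and column names (chunks, tenant_id, acl_tags, embedding), the array-containment predicate, and the embed() helper in the usage comment are assumptions for illustration, not a prescribed schema; it presumes the pgvector extension is installed.

import psycopg2

def retrieve_chunks(conn, tenant_id: str, required_tags: list, query_embedding: list, k: int = 8):
    # Keep the ACL filter and the ANN ordering in one server-side query plan.
    sql = """
        SELECT id, chunk_text, embedding <-> %s::vector AS distance
        FROM chunks
        WHERE tenant_id = %s
          AND acl_tags @> %s          -- row must carry all tags the caller is scoped to
        ORDER BY embedding <-> %s::vector
        LIMIT %s
    """
    vec = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(sql, (vec, tenant_id, required_tags, vec, k))
        return cur.fetchall()

# conn = psycopg2.connect("dbname=rag user=app")                      # connection details are assumptions
# rows = retrieve_chunks(conn, "tenant_42", ["hr_docs"], embed("vacation policy"), k=5)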
How do you monitor beyond logs?
You need traces, metrics, and evaluations stitched together:
- Distributed tracing: Emit OpenTelemetry spans across ingestion, retrieval, model calls, and tools; LangSmith natively ingests OTEL traces and interoperates with external APMs (Jaeger, Datadog, Elastic). This gives end-to-end timing, prompts, contexts, and costs per request.
- LLM observability platforms: Compare options (LangSmith, Arize Phoenix, LangFuse, Datadog) by tracing, evals, cost tracking, and enterprise readiness. Independent roundups and matrixes are available.
- Continuous evaluation: Schedule RAG evals (Ragas/DeepEval/MLflow) on canary sets and live traffic replays; track faithfulness and grounding drift over time.
Add schema profiling/mapping on ingestion to keep observability attached to data shape changes (e.g., new templates, table evolution) and to explain retrieval regressions when upstream sources shift.

Example: doc-to-chat reference flow (signals and gates)
- Ingest: connectors → text extraction → normalization → Iceberg write (ACID, snapshots).
- Govern: PII scan (Presidio) → redact/mask → catalog registration with ACL policies.
- Index: embedding jobs → pgvector (policy-aware joins) and Milvus (high-QPS ANN).
- Serve: REST/gRPC → hybrid retrieval → guardrails → LLM → tool use.
- HITL: low-confidence paths route to A2I/LangGraph approval steps.
- Observe: OTEL traces to LangSmith/APM + scheduled RAG evaluations.

Why "5% AI, 100% software engineering" is accurate in practice
Most outages and trust failures in agent systems are not model regressions; they're data quality, permissioning, retrieval decay, or missing telemetry. The controls above—ACID tables, ACL catalogs, PII guardrails, hybrid retrieval, OTEL traces, and human gates—determine whether the same base model is safe, fast, and credibly correct for your users. Invest in these first; swap models later if needed.
References: https://iceberg.apache.org/docs/1.9.0/evolution/ https://iceberg.apache.org/docs/1.5.2/ https://docs.snowflake.com/en/user-guide/tables-iceberg https://docs.dremio.com/current/developer/data-formats/apache-iceberg/ https://github.com/pgvector/pgvector https://www.postgresql.org/about/news/pgvector-070-released-2852/ https://github.com/pgvector/pgvector-go https://github.com/pgvector/pgvector-rust https://github.com/pgvector/pgvector-java https://milvus.io/docs/four_layers.md https://milvus.io/docs/v2.3.x/architecture_overview.md https://milvus.io/docs/v2.2.x/architecture.md https://www.linkedin.com/posts/armand-ruiz_ https://docs.vespa.ai/en/tutorials/hybrid-search.html https://www.elastic.co/what-is/hybrid-search https://www.elastic.co/search-labs/blog/hybrid-search-elasticsearch https://docs.cohere.com/reference/rerank https://docs.cohere.com/docs/rerank https://cohere.com/rerank https://opentelemetry.io/docs/concepts/signals/traces/ https://opentelemetry.io/docs/specs/otel/logs/ https://docs.smith.langchain.com/evaluation https://docs.smith.langchain.com/evaluation/concepts https://docs.smith.langchain.com/reference/python/evaluation https://docs.smith.langchain.com/observability https://www.langchain.com/langsmith https://arize.com/docs/phoenix https://github.com/Arize-ai/phoenix https://langfuse.com/docs/observability/get-started https://langfuse.com/docs/observability/overview https://docs.datadoghq.com/opentelemetry/ https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/ https://langchain-ai.github.io/langgraph/tutorials/get-started/4-human-in-the-loop/ https://docs.langchain.com/oss/python/langgraph/add-human-in-the-loop https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-use-augmented-ai-a2i-human-review-loops.html https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-start-human-loop.html https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-a2i-runtime.html https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-monitor-humanloop-results.html https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html https://aws.amazon.com/bedrock/guardrails/ https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateGuardrail.html https://docs.aws.amazon.com/bedrock/latest/userguide/agents-guardrail.html https://docs.nvidia.com/nemo-guardrails/index.html https://developer.nvidia.com/nemo-guardrails https://github.com/NVIDIA/NeMo-Guardrails https://docs.nvidia.com/nemo/guardrails/latest/user-guides/guardrails-library.html https://guardrailsai.com/docs/ https://github.com/guardrails-ai/guardrails https://guardrailsai.com/docs/getting_started/quickstart https://guardrailsai.com/docs/getting_started/guardrails_server https://pypi.org/project/guardrails-ai/ https://github.com/guardrails-ai/guardrails_pii https://huggingface.co/meta-llama/Llama-Guard-4-12B https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/ https://microsoft.github.io/presidio/ https://github.com/microsoft/presidio https://github.com/microsoft/presidio-research https://docs.databricks.com/aws/en/data-governance/unity-catalog/access-control https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/ https://docs.databricks.com/aws/en/data-governance/unity-catalog/abac/ https://docs.ragas.io/ https://docs.ragas.io/en/stable/references/evaluate/ https://docs.ragas.io/en/latest/tutorials/rag/ https://python.langchain.com/docs/concepts/text_splitters/ 
https://python.langchain.com/api_reference/text_splitters/index.html https://pypi.org/project/langchain-text-splitters/ https://milvus.io/docs/evaluation_with_deepeval.md https://mlflow.org/docs/latest/genai/eval-monitor/ https://mlflow.org/docs/2.10.1/llms/rag/notebooks/mlflow-e2e-evaluation.html The post Building AI agents is 5% AI and 100% software engineering appeared first on MarkTechPost.

MIT’s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators

Table of contents
- Hardware Generation without Templates
- Input IR: Affine, Relation-Centric Semantics (Deconstruct)
- Front End: FU Graph + Memory Co-Design (Architect)
- Back End: Compile & Optimize to RTL (Compile & Optimize)
- Outcome
- Importance for each segment
- How the "Compiler for AI Chips" Works—Step-by-Step
- Where It Lands in the Ecosystem
- Summary

MIT researchers (Han Lab) introduced LEGO, a compiler-like framework that takes tensor workloads (e.g., GEMM, Conv2D, attention, MTTKRP) and automatically generates synthesizable RTL for spatial accelerators—no handwritten templates. LEGO's front end expresses workloads and dataflows in a relation-centric affine representation, builds FU (functional unit) interconnects and on-chip memory layouts for reuse, and supports fusing multiple spatial dataflows in a single design. The back end lowers to a primitive-level graph and uses linear programming and graph transforms to insert pipeline registers, rewire broadcasts, extract reduction trees, and shrink area and power. Evaluated across foundation models and classic CNNs/Transformers, LEGO's generated hardware shows 3.2× speedup and 2.4× energy efficiency over Gemmini under matched resources.

https://hanlab.mit.edu/projects/lego

Hardware Generation without Templates
Existing flows either: (1) analyze dataflows without generating hardware, or (2) generate RTL from hand-tuned templates with fixed topologies. These approaches restrict the architecture space and struggle with modern workloads that need to switch dataflows dynamically across layers/ops (e.g., conv vs. depthwise vs. attention). LEGO directly targets any dataflow and combinations, generating both architecture and RTL from a high-level description rather than configuring a few numeric parameters in a template.

Input IR: Affine, Relation-Centric Semantics (Deconstruct)
LEGO models tensor programs as loop nests with three index classes: temporal (for-loops), spatial (par-for FUs), and computation (pre-tiling iteration domain). Two affine relations drive the compiler:
- Data mapping f_{I→D}: maps computation indices to tensor indices.
- Dataflow mapping f_{TS→I}: maps temporal/spatial indices to computation indices.
This affine-only representation eliminates modulo/division in the core analysis, making reuse detection and address generation a linear-algebra problem. LEGO also decouples control flow from dataflow (a vector c encodes control signal propagation/delay), enabling shared control across FUs and substantially reducing control logic overhead.

Front End: FU Graph + Memory Co-Design (Architect)
The main objective is to maximize reuse and on-chip bandwidth while minimizing interconnect/mux overhead.
- Interconnection synthesis. LEGO formulates reuse as solving linear systems over the affine relations to discover direct and delay (FIFO) connections between FUs. It then computes minimum-spanning arborescences (Chu-Liu/Edmonds) to keep only necessary edges (cost = FIFO depth). A BFS-based heuristic rewrites direct interconnects when multiple dataflows must co-exist, prioritizing chain reuse and nodes already fed by delay connections to cut muxes and data nodes.
- Banked memory synthesis. Given the set of FUs that must read/write a tensor in the same cycle, LEGO computes bank counts per tensor dimension from the maximum index deltas (optionally dividing by GCD to reduce banks). It then instantiates data-distribution switches to route between banks and FUs, leaving FU-to-FU reuse to the interconnect.
- Dataflow fusion. Interconnects for different spatial dataflows are combined into a single FU-level Architecture Description Graph (ADG); careful planning avoids naïve mux-heavy merges and yields up to ~20% energy gains compared to naïve fusion.

Back End: Compile & Optimize to RTL (Compile & Optimize)
The ADG is lowered to a Detailed Architecture Graph (DAG) of primitives (FIFOs, muxes, adders, address generators). LEGO applies several LP/graph passes:
- Delay matching via LP. A linear program chooses output delays D_v to minimize the inserted pipeline registers, i.e., the sum over edges of (D_v − D_u − L_v) · bitwidth, meeting timing alignment with minimal storage (a toy sketch follows this list).
- Broadcast pin rewiring. A two-stage optimization (virtual cost shaping + MST-based rewiring among destinations) converts expensive broadcasts into forward chains, enabling register sharing and lower latency; a final LP re-balances delays.
- Reduction tree extraction + pin reuse. Sequential adder chains become balanced trees; a 0-1 ILP remaps reducer inputs across dataflows so fewer physical pins are required (mux instead of add). This reduces both logic depth and register count.
These passes focus on the datapath, which dominates resources (e.g., FU-array registers ≈ 40% area, 60% power), and produce ~35% area savings versus naïve generation.
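The delay-matching pass above is a small linear program, and scipy's linprog(method="highs") calls the same HiGHS solver the article mentions. The sketch below is a toy illustration (made-up 4-node graph, latencies, and bitwidths), not LEGO's implementation: minimize the register bits, the sum over edges of (D_v − D_u − L_v)·bitwidth, subject to D_v ≥ D_u + L_v.

import numpy as np
from scipy.optimize import linprog

nodes = ["a", "b", "c", "d"]
# (u, v, L_v, bitwidth): edge latency and datapath width -- toy values, purely illustrative
edges = [("a", "b", 1, 16), ("a", "c", 2, 16), ("b", "d", 1, 32), ("c", "d", 1, 32)]
idx = {n: i for i, n in enumerate(nodes)}

# Objective: sum_e bitwidth_e * (D_v - D_u); the constant -L_v * bitwidth_e term is dropped.
c = np.zeros(len(nodes))
for u, v, L, b in edges:
    c[idx[v]] += b
    c[idx[u]] -= b

# Constraints D_v - D_u >= L_v, rewritten as D_u - D_v <= -L_v for linprog's A_ub form.
A_ub, b_ub = [], []
for u, v, L, b in edges:
    row = np.zeros(len(nodes))
    row[idx[u]], row[idx[v]] = 1.0, -1.0
    A_ub.append(row)
    b_ub.append(-float(L))

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * len(nodes), method="highs")
print(dict(zip(nodes, res.x)))   # slack is pushed onto the narrower (16-bit) edge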
Outcome
Setup. LEGO is implemented in C++ with HiGHS as the LP solver and emits SpinalHDL → Verilog. Evaluation covers tensor kernels and end-to-end models (AlexNet, MobileNetV2, ResNet-50, EfficientNetV2, BERT, GPT-2, CoAtNet, DDPM, Stable Diffusion, LLaMA-7B). A single LEGO-MNICOC accelerator instance is used across models; a mapper picks per-layer tiling/dataflow. Gemmini is the main baseline under matched resources (256 MACs, 256 KB on-chip buffer, 128-bit bus @ 16 GB/s).
End-to-end speed/efficiency. LEGO achieves 3.2× speedup and 2.4× energy efficiency on average vs. Gemmini. Gains stem from: (i) a fast, accurate performance model guiding mapping; (ii) dynamic spatial dataflow switching enabled by generated interconnects (e.g., depthwise conv layers choose OH–OW–IC–OC). Both designs are bandwidth-bound on GPT-2.
Resource breakdown. Example SoC-style configuration shows FU array and NoC dominate area/power, with PPUs contributing ~2–5%. This supports the decision to aggressively optimize datapaths and control reuse.
Generative models. On a larger 1024-FU configuration, LEGO sustains >80% utilization for DDPM/Stable Diffusion; LLaMA-7B remains bandwidth-limited (expected for low operational intensity).

Importance for each segment
- For researchers: LEGO provides a mathematically grounded path from loop-nest specifications to spatial hardware with provable LP-based optimizations. It abstracts away low-level RTL and exposes meaningful levers (tiling, spatialization, reuse patterns) for systematic exploration.
- For practitioners: It is effectively hardware-as-code. You can target arbitrary dataflows and fuse them in one accelerator, letting a compiler derive interconnects, buffers, and controllers while shrinking mux/FIFO overheads. This improves energy and supports multi-op pipelines without manual template redesign.
- For product leaders: By lowering the barrier to custom silicon, LEGO enables task-tuned, power-efficient edge accelerators (wearables, IoT) that keep pace with fast-moving AI stacks—the silicon adapts to the model, not the other way around. End-to-end results against a state-of-the-art generator (Gemmini) quantify the upside.
How the "Compiler for AI Chips" Works—Step-by-Step
1. Deconstruct (Affine IR). Write the tensor op as loop nests; supply affine f_{I→D} (data mapping), f_{TS→I} (dataflow), and control flow vector c. This specifies what to compute and how it is spatialized, without templates.
2. Architect (Graph Synthesis). Solve reuse equations → FU interconnects (direct/delay) → MST/heuristics for minimal interconnect cost (edge cost = FIFO depth).

Top Computer Vision CV Blogs & News Websites (2025)

Computer vision moved fast in 2025: new multimodal backbones, larger open datasets, and tighter model–systems integration. Practitioners need sources that publish rigorously, link code and benchmarks, and track deployment patterns—not marketing posts. This list prioritizes primary research hubs, lab blogs, and production-oriented engineering outlets with consistent update cadence. Use it to monitor SOTA shifts, grab reproducible code paths, and translate papers into deployable pipelines.

Google Research (AI Blog)
Primary source for advances from Google/DeepMind teams, including vision architectures (e.g., V-MoE) and periodic research year-in-review posts across CV and multimodal. Posts typically include method summaries, figures, and links to papers/code.

Marktechpost
Consistent reporting on new computer-vision models, datasets, and benchmarks with links to papers, code, and demos. Dedicated CV category plus frequent deep-dives (e.g., DINOv3 releases and analysis). Useful for staying on top of weekly research drops without wading through raw feeds.

AI at Meta
High-signal posts with preprints and open-source drops. Recent examples include DINOv3—scaled self-supervised backbones with SOTA across dense prediction tasks—which provide technical detail and artifacts.

NVIDIA Technical Blog
Production-oriented content on VLM-powered analytics, optimized inference, and GPU pipelines. Category feed for Computer Vision includes blueprints, SDK usage, and performance guidance relevant to enterprise deployments.

arXiv cs.CV — raw research firehose
The canonical preprint feed for CV. Use the recent or new views for daily updates; the taxonomy confirms scope (image processing, pattern recognition, scene understanding). Best paired with RSS + custom filters.

CVF Open Access (CVPR/ICCV/ECCV)
Final versions of main-conference papers and workshops, searchable and citable. CVPR 2025 proceedings and workshop menus are already live, making this the authoritative archive post-acceptance.

BAIR Blog (UC Berkeley)
Occasional but deep posts on frontier topics (e.g., extremely large image modeling, robotics-vision crossovers). Good for conceptual clarity directly from authors.

Stanford Blog
Technical explainers and lab roundups (e.g., SAIL at CVPR 2025) with links to papers/talks. Useful to scan emerging directions across perception, generative models, and embodied vision.

Roboflow Blog
High-frequency, implementation-focused posts (labeling, training, deployment, apps, and trend reports). Strong for practitioners who need working pipelines and edge deployments.

Hugging Face Blog
Hands-on guides (VLMs, FiftyOne integrations) and ecosystem notes across Transformers, Diffusers, and timm; good for rapid prototyping and fine-tuning CV/VLM stacks.

PyTorch Blog
Change logs, APIs, and recipes affecting CV training/inference (Transforms V2, multi-weight support, FX feature extraction). Read when upgrading training stacks.

The post Top Computer Vision CV Blogs & News Websites (2025) appeared first on MarkTechPost.

Qwen3-ASR-Toolkit: An Advanced Open Source Python Command-Line Toolkit for Using the Qwen-ASR API Beyond the 3 Minutes/10 MB Limit

Qwen has released Qwen3-ASR-Toolkit, an MIT-licensed Python CLI that programmatically bypasses the Qwen3-ASR-Flash API's 3-minute/10 MB per-request limit by performing VAD-aware chunking, parallel API calls, and automatic resampling/format normalization via FFmpeg. The result is stable, hour-scale transcription pipelines with configurable concurrency, context injection, and clean text post-processing.

Python ≥3.8 is a prerequisite. Install with:

pip install qwen3-asr-toolkit

What the toolkit adds on top of the API
- Long-audio handling. The toolkit slices input using voice activity detection (VAD) at natural pauses, keeping each chunk under the API's hard duration/size caps, then merges outputs in order.
- Parallel throughput. A thread pool dispatches multiple chunks concurrently to DashScope endpoints, improving wall-clock latency for hour-long inputs. You control concurrency via -j/--num-threads.
- Format & rate normalization. Any common audio/video container (MP4/MOV/MKV/MP3/WAV/M4A, etc.) is converted to the API's required mono 16 kHz before submission. Requires FFmpeg installed on PATH.
- Text cleanup & context. The tool includes post-processing to reduce repetitions/hallucinations and supports context injection to bias recognition toward domain terms; the underlying API also exposes language detection and inverse text normalization (ITN) toggles.

The official Qwen3-ASR-Flash API is single-turn and enforces ≤3 min duration and ≤10 MB payloads per call. That is reasonable for interactive requests but awkward for long media. The toolkit operationalizes best practices—VAD-aware segmentation + concurrent calls—so teams can batch large archives or live capture dumps without writing orchestration from scratch.

Quick start

1. Install prerequisites

# System: FFmpeg must be available
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install -y ffmpeg

2. Install the CLI

pip install qwen3-asr-toolkit

3. Configure credentials

# International endpoint key
export DASHSCOPE_API_KEY="sk-…"

4. Run

# Basic: local video, default 4 threads
qwen3-asr -i "/path/to/lecture.mp4"

# Faster: raise parallelism and pass key explicitly (optional if env var set)
qwen3-asr -i "/path/to/podcast.wav" -j 8 -key "sk-…"

# Improve domain accuracy with context
qwen3-asr -i "/path/to/earnings_call.m4a" -c "tickers, CFO name, product names, Q3 revenue guidance"

Arguments you'll actually use: -i/--input-file (file path or http/https URL), -j/--num-threads, -c/--context, -key/--dashscope-api-key, -t/--tmp-dir, -s/--silence. Output is printed and saved as <input_basename>.txt.

Minimal pipeline architecture
1) Load local file or URL → 2) VAD to find silence boundaries → 3) Chunk under API caps → 4) Resample to 16 kHz mono → 5) Parallel submit to DashScope → 6) Aggregate segments in order → 7) Post-process text (dedupe, repetitions) → 8) Emit .txt transcript.
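For intuition, the chunking in steps 2–3 of the pipeline above can be approximated with a crude energy-based VAD. This is a hedged sketch under simplified assumptions (16 kHz mono float input, a fixed dB threshold, hypothetical load_mono_16k helper), not the toolkit's actual implementation.

import numpy as np

def chunk_at_silences(audio: np.ndarray, sr: int = 16000, max_chunk_s: float = 170.0,
                      frame_s: float = 0.03, silence_db: float = -40.0):
    # Crude energy-based "VAD": mark 30 ms frames whose RMS falls below a dB threshold.
    frame = int(sr * frame_s)
    n_frames = len(audio) // frame
    rms = np.array([np.sqrt(np.mean(audio[i * frame:(i + 1) * frame] ** 2) + 1e-12)
                    for i in range(n_frames)])
    silent_samples = np.flatnonzero(20 * np.log10(rms) < silence_db) * frame
    chunks, start, max_len = [], 0, int(max_chunk_s * sr)
    while len(audio) - start > max_len:
        window = silent_samples[(silent_samples > start) & (silent_samples <= start + max_len)]
        cut = int(window[-1]) if len(window) else start + max_len   # fall back to a hard cut
        chunks.append(audio[start:cut])
        start = cut
    chunks.append(audio[start:])
    return chunks   # every chunk stays under the per-request duration cap

# chunks = chunk_at_silences(load_mono_16k("talk.wav"))   # load_mono_16k is a hypothetical loader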
Summary
Qwen3-ASR-Toolkit turns Qwen3-ASR-Flash into a practical long-audio pipeline by combining VAD-based segmentation, FFmpeg normalization (mono/16 kHz), and parallel API dispatch under the 3-minute/10 MB caps. Teams get deterministic chunking, configurable throughput, and optional context/LID/ITN controls without custom orchestration. For production, pin the package version, verify region endpoints/keys, and tune thread count to your network and QPS—then pip install qwen3-asr-toolkit and ship.

Check out the GitHub Page for Codes. The post Qwen3-ASR-Toolkit: An Advanced Open Source Python Command-Line Toolkit for Using the Qwen-ASR API Beyond the 3 Minutes/10 MB Limit appeared first on MarkTechPost.

The Download: the CDC’s vaccine chaos

This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology.

A pivotal meeting on vaccine guidance is underway—and former CDC leaders are alarmed
This week has been an eventful one for America's public health agency. Two former leaders of the US Centers for Disease Control and Prevention explained why they suddenly departed in a Senate hearing. They also described how CDC employees are being instructed to turn their backs on scientific evidence. They painted a picture of a health agency in turmoil—and at risk of harming the people it is meant to serve. And, just hours afterwards, a panel of CDC advisers voted to stop recommending the MMRV vaccine for children under four. Read the full story.
—Jessica Hamzelou
This article first appeared in The Checkup, MIT Technology Review's weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.
If you're interested in reading more about US vaccine policy, check out:
+ Read our profile of Jim O'Neill, the deputy health secretary and current acting CDC director.
+ Why US federal health agencies are abandoning mRNA vaccines. Read the full story.
+ Why childhood vaccines are a public health success story. No vaccine is perfect, but these medicines are still saving millions of lives. Read the full story.
+ The FDA plans to limit access to covid vaccines. Here's why that's not all bad.

Meet Sneha Goenka: our 2025 Innovator of the Year
Every year, MIT Technology Review selects one individual whose work we admire to recognize as Innovator of the Year. For 2025, we chose Sneha Goenka, who designed the computations behind the world's fastest whole-genome sequencing method. Thanks to her work, physicians can now sequence a patient's genome and diagnose a genetic condition in less than eight hours—an achievement that could transform medical care.
Register here to join an exclusive subscriber-only Roundtable conversation with Goenka, Leilani Battle, assistant professor at the University of Washington, and our editor in chief Mat Honan at 1pm ET next Tuesday September 23.

The must-reads
I've combed the internet to find you today's most fun/important/scary/fascinating stories about technology.
1 The CDC voted against giving some children a combined vaccine
If accepted, the agency will stop recommending the MMRV vaccine for children under 4. (CNN)
+ Its vote on hepatitis B vaccines for newborns is expected today too. (The Atlantic $)
+ RFK Jr.'s allies are closing ranks around him. (Politico)
2 Russia is using Charlie Kirk's murder to sow division in the US
It's using the momentum to push pro-Kremlin narratives and divide Americans. (WP $)
+ The complicated phenomenon of political violence. (Vox)
+ We don't know what being 'terminally online' means any more. (Wired $)
3 Nvidia will invest $5 billion in Intel
The partnership allows Intel to develop custom CPUs to work with Nvidia's chips. (WSJ $)
+ It's a much-needed financial shot in the arm for Intel. (WP $)
+ It's also great news for Intel's Asian suppliers. (Bloomberg $)
4 Medical AI tools downplay symptoms in women and ethnic minorities
Experts fear that LLM-powered tools could lead to worse health outcomes. (FT $)
+ Artificial intelligence is infiltrating health care. We shouldn't let it make all the decisions. (MIT Technology Review)
5 AI browsers have hit the mainstream
Where's the off switch? (Wired $)
+ AI means the end of internet search as we've known it. (MIT Technology Review)
6 China has entered the global brain interface race
Its ambitious government-backed startups are primed to challenge Neuralink. (Bloomberg $)
+ This patient's Neuralink brain implant gets a boost from generative AI. (MIT Technology Review)
7 What makes humans unique in the age of AI?
Defining the distinctions between us and machines isn't as easy as it used to be. (New Yorker $)
+ How AI can help supercharge creativity. (MIT Technology Review)
8 This ship helps to reconnect Africa's internet
AI needs high speed internet, which needs undersea cables. (Rest of World)
+ What Africa needs to do to become a major AI player. (MIT Technology Review)
9 Hundreds of people queued in Beijing to buy Apple's new iPhone
Desire for Apple products in the country appears to be alive and well. (Reuters)
10 San Francisco's idea of a great night out? A robot cage fight
It's certainly one way to have a good time. (NYT $)

Quote of the day
"Get off the iPad!"
—An irate air traffic controller tells the pilots of a Spirit Airlines flight to pay attention to avoid potentially colliding with Donald Trump's Air Force One aircraft, Ars Technica reports.

One more thing
We used to get excited about technology. What happened?
As a philosopher who studies AI and data, Shannon Vallor's Twitter feed is always filled with the latest tech news. Increasingly, she's realized that the constant stream of information is no longer inspiring joy, but a sense of resignation.
Joy is missing from our lives, and from our technology. Its absence is feeding a growing unease being voiced by many who work in tech or study it. Fixing it depends on understanding how and why the priorities in our tech ecosystem have changed. Read the full story.

We can still have nice things
A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet 'em at me.)
+ Would you go about your daily business with a soft toy on your shoulder? This intrepid reporter gave it a go.
+ How dying dinosaurs shaped the landscapes around us.
+ I can't believe I missed Pythagorean Theorem day earlier this week.
+ Inside the rise in popularity of the no-water yard.
