
What is AI Inference? A Technical Deep Dive and Top 9 AI Inference Providers (2025 Edition)

Artificial Intelligence (AI) has evolved rapidly, especially in how models are deployed and operated in real-world systems. The core function that connects model training to practical applications is "inference". This article offers a technical deep dive into AI inference as of 2025, covering its distinction from training, latency challenges for modern models, and optimization strategies such as quantization, pruning, and hardware acceleration.

Inference vs. Training: The Critical Difference

AI model deployment consists of two primary phases:

- Training is the process where a model learns patterns from massive, labeled datasets, using iterative algorithms (typically backpropagation on neural networks). This phase is computation-heavy and generally done offline, leveraging accelerators like GPUs.
- Inference is the model's "in action" phase: making predictions on new, unseen data. The trained network is fed input and the output is produced via a forward pass only. Inference happens in production environments, often requiring rapid responses and lower resource use.

Aspect | Training | Inference
Purpose | Learn patterns, optimize weights | Make predictions on new data
Computation | Heavy, iterative, uses backpropagation | Lighter, forward pass only
Time sensitivity | Offline, can take hours/days/weeks | Real-time or near-real-time
Hardware | GPUs/TPUs, datacenter-scale | CPUs, GPUs, FPGAs, edge devices

Inference Latency: Challenges for 2025

Latency, the time from input to output, is one of the top technical challenges in deploying AI, especially for large language models (LLMs) and real-time applications (autonomous vehicles, conversational bots, etc.).

Key sources of latency:

- Computational complexity: Modern architectures such as transformers have quadratic computational cost in sequence length due to self-attention, roughly O(n^2 d) for sequence length n and embedding dimension d.
- Memory bandwidth: Large models (with billions of parameters) require tremendous data movement, which often bottlenecks on memory speed and system I/O.
- Network overhead: For cloud inference, network latency and bandwidth become critical, especially for distributed and edge deployments.
- Predictable vs. unpredictable latency: Some delays can be designed for (e.g., batch inference), while others, such as hardware contention and network jitter, cause unpredictable delays.

Real-world impact: Latency directly affects user experience (voice assistants, fraud detection), system safety (driverless cars), and operational cost (cloud compute resources). As models grow, optimizing latency becomes increasingly complex and essential.

Quantization: Lightening the Load

Quantization reduces model size and computational requirements by lowering numerical precision, for example converting 32-bit floats to 8-bit integers.

- How it works: Quantization replaces high-precision parameters with lower-precision approximations, decreasing memory and compute needs.
- Types: uniform and non-uniform quantization, post-training quantization (PTQ), and quantization-aware training (QAT).
- Trade-offs: While quantization can dramatically speed up inference, it may slightly reduce model accuracy; careful application keeps performance within acceptable bounds.
- LLMs and edge devices: Quantization is especially valuable for LLMs and battery-powered devices, allowing fast, low-cost inference.
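
To make post-training quantization concrete, here is a minimal sketch using PyTorch's dynamic quantization API. The three-layer toy model, its dimensions, and the input tensor are illustrative assumptions rather than details from the article; the same call applies to any model containing nn.Linear layers.

import torch
import torch.nn as nn

# A small example network standing in for a trained model (illustrative only).
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
)
model.eval()

# Post-training dynamic quantization: weights of nn.Linear layers are stored
# as 8-bit integers and dequantized on the fly during the forward pass.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    # Same output shape, but the quantized model stores far smaller weights.
    print(model(x).shape, quantized(x).shape)

In practice, dynamic quantization like this is applied after training (PTQ); quantization-aware training (QAT) instead simulates low precision during training to recover more accuracy.
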
Pruning: Model Simplification

Pruning is the process of removing redundant or non-essential model components, such as neural network weights or decision tree branches.

Techniques:
- L1 regularization: penalizes large weights, shrinking less useful ones toward zero.
- Magnitude pruning: removes the lowest-magnitude weights or neurons.
- Taylor expansion: estimates the least impactful weights and prunes them.
- SVM pruning: reduces support vectors to simplify decision boundaries.

Benefits: lower memory use, faster inference, reduced overfitting, and easier deployment to resource-constrained environments.

Risks: aggressive pruning may degrade accuracy; balancing efficiency and accuracy is key.

Hardware Acceleration: Speeding Up Inference

Specialized hardware is transforming AI inference in 2025:

- GPUs: offer massive parallelism, ideal for matrix and vector operations.
- NPUs (Neural Processing Units): custom processors optimized for neural network workloads.
- FPGAs (Field-Programmable Gate Arrays): configurable chips for targeted, low-latency inference in embedded and edge devices.
- ASICs (Application-Specific Integrated Circuits): purpose-built for the highest efficiency and speed in large-scale deployments.

Trends: real-time, energy-efficient processing is essential for autonomous systems, mobile devices, and IoT; hardware accelerators now span cloud servers to edge devices; and emerging accelerator architectures cut operational costs and carbon footprints.

Top 9 AI Inference Providers in 2025

1. Together AI: specializes in scalable LLM deployments, offering fast inference APIs and multi-model routing for hybrid cloud setups.
2. Fireworks AI: known for fast multi-modal inference and privacy-oriented deployments, leveraging optimized hardware and proprietary engines for low latency.
3. Hyperbolic: delivers serverless inference for generative AI, integrating automated scaling and cost optimization for high-volume workloads.
4. Replicate: focuses on model hosting and deployment, allowing developers to run and share AI models rapidly in production with easy integrations.
5. Hugging Face: a go-to platform for transformer and LLM inference, providing robust APIs, customization options, and community-backed open-source models.
6. Groq: known for custom Language Processing Unit (LPU) hardware that delivers very low-latency, high-throughput inference for large models.
7. DeepInfra: offers a dedicated cloud for high-performance inference, catering especially to startups and enterprise teams with customizable infrastructure.
8. OpenRouter: aggregates multiple LLM engines, providing dynamic model routing and cost transparency for enterprise-grade inference orchestration.
9. Lepton (acquired by NVIDIA): specializes in compliance-focused, secure AI inference with real-time monitoring and scalable edge/cloud deployment options.

Conclusion

Inference is where AI meets the real world, turning data-driven learning into actionable predictions. Its technical challenges, latency and resource constraints among them, are being met by innovations in quantization, pruning, and hardware acceleration. As AI models scale and diversify, mastering inference efficiency is the frontier for competitive, impactful deployment in 2025. Whether deploying conversational LLMs, real-time computer vision systems, or on-device diagnostics, understanding and optimizing inference will be central for technologists and enterprises aiming to lead in the AI era.

The post What is AI Inference? A Technical Deep Dive and Top 9 AI Inference Providers (2025 Edition) appeared first on MarkTechPost.


Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing

dots.ocr is an open-source vision-language transformer model developed for multilingual document layout parsing and optical character recognition (OCR). It performs both layout detection and content recognition within a single architecture, supporting over 100 languages and a wide variety of structured and unstructured document types.

Architecture

- Unified model: dots.ocr combines layout detection and content recognition into a single transformer-based neural network. This eliminates the complexity of separate detection and OCR pipelines, allowing users to switch tasks by adjusting input prompts.
- Parameters: the model contains 1.7 billion parameters, balancing computational efficiency with performance for most practical scenarios.
- Input flexibility: inputs can be image files or PDF documents. The model offers preprocessing options (such as fitz_preprocess) to improve quality on low-resolution or dense multi-page files.

Capabilities

- Multilingual: dots.ocr is trained on datasets spanning more than 100 languages, including major world languages and less common scripts.
- Content extraction: the model extracts plain text, tabular data, and mathematical formulas (in LaTeX), and preserves reading order within documents. Output formats include structured JSON, Markdown, and HTML, depending on layout and content type.
- Structure preservation: dots.ocr maintains document structure, including table boundaries, formula regions, and image placements, so extracted data remains faithful to the original document.

Benchmark Performance

dots.ocr has been evaluated against modern document AI systems, with results summarized below:

Benchmark | dots.ocr | Gemini2.5-Pro
Table TEDS accuracy | 88.6% | 85.8%
Text edit distance | 0.032 | 0.055

- Tables: outperforms Gemini2.5-Pro in table parsing accuracy.
- Text: demonstrates a lower text edit distance, indicating higher precision.
- Formulas and layout: matches or exceeds leading models in formula recognition and document structure reconstruction.

https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md

Deployment and Integration

- Open source: released under the MIT license, with source, documentation, and pre-trained models available on GitHub. The repository provides installation instructions for pip, Conda, and Docker-based deployments.
- API and scripting: supports flexible task configuration via prompt templates. The model can be used interactively or within automated pipelines for batch document processing.
- Output formats: extracted results are supplied in structured JSON for programmatic use, with options for Markdown and HTML where appropriate. Visualization scripts enable inspection of detected layouts.

Conclusion

dots.ocr provides a technical solution for high-accuracy, multilingual document parsing by unifying layout detection and content recognition in a single, open-source model. It is particularly suited for scenarios requiring robust, language-agnostic document analysis and structured information extraction in resource-constrained or production environments.

Check out the GitHub Page.

The post Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing appeared first on MarkTechPost.


A Coding Guide to Build and Validate End-to-End Partitioned Data Pipelines in Dagster with Machine Learning Integration

In this tutorial, we implement an advanced data pipeline using Dagster. We set up a custom CSV-based IOManager to persist assets, define partitioned daily data generation, and process synthetic sales data through cleaning, feature engineering, and model training. Along the way, we add a data-quality asset check to validate nulls, ranges, and categorical values, and we ensure that metadata and outputs are stored in a structured way. The focus throughout is on hands-on implementation, showing how to integrate raw data ingestion, transformations, quality checks, and machine learning into a single reproducible workflow.

import sys, subprocess, json, os
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "dagster", "pandas", "scikit-learn"])

import numpy as np, pandas as pd
from pathlib import Path
from dagster import (
    asset, AssetCheckResult, asset_check, Definitions, materialize, Output,
    DailyPartitionsDefinition, IOManager, io_manager
)
from sklearn.linear_model import LinearRegression

BASE = Path("/content/dagstore"); BASE.mkdir(parents=True, exist_ok=True)
START = "2025-08-01"

We begin by installing the required libraries, Dagster, Pandas, and scikit-learn, so that we have the full toolset available in Colab. We then import essential modules, set up NumPy and Pandas for data handling, and define a base directory along with a start date to organize our pipeline outputs.

class CSVIOManager(IOManager):
    def __init__(self, base: Path):
        self.base = base

    def _path(self, key, ext):
        return self.base / f"{'_'.join(key.path)}.{ext}"

    def handle_output(self, context, obj):
        if isinstance(obj, pd.DataFrame):
            p = self._path(context.asset_key, "csv"); obj.to_csv(p, index=False)
            context.log.info(f"Saved {context.asset_key} -> {p}")
        else:
            p = self._path(context.asset_key, "json"); p.write_text(json.dumps(obj, indent=2))
            context.log.info(f"Saved {context.asset_key} -> {p}")

    def load_input(self, context):
        k = context.upstream_output.asset_key; p = self._path(k, "csv")
        df = pd.read_csv(p); context.log.info(f"Loaded {k} <- {p} ({len(df)} rows)")
        return df

@io_manager
def csv_io_manager(_):
    return CSVIOManager(BASE)

daily = DailyPartitionsDefinition(start_date=START)

We define a custom CSVIOManager to save asset outputs as CSV or JSON files and reload them when needed. We then register it with Dagster as csv_io_manager and set up a daily partitioning scheme so that our pipeline can process data for each date independently.
@asset(partitions_def=daily, description="Synthetic raw sales with noise & occasional nulls.")
def raw_sales(context) -> Output[pd.DataFrame]:
    rng = np.random.default_rng(42)
    n = 200; day = context.partition_key
    x = rng.normal(100, 20, n); promo = rng.integers(0, 2, n); noise = rng.normal(0, 10, n)
    sales = 2.5 * x + 30 * promo + noise + 50
    x[rng.choice(n, size=max(1, n // 50), replace=False)] = np.nan
    df = pd.DataFrame({"date": day, "units": x, "promo": promo, "sales": sales})
    meta = {"rows": n, "null_units": int(df["units"].isna().sum()), "head": df.head().to_markdown()}
    return Output(df, metadata=meta)

@asset(description="Clean nulls, clip outliers for robust downstream modeling.")
def clean_sales(context, raw_sales: pd.DataFrame) -> Output[pd.DataFrame]:
    df = raw_sales.dropna(subset=["units"]).copy()
    lo, hi = df["units"].quantile([0.01, 0.99]); df["units"] = df["units"].clip(lo, hi)
    meta = {"rows": len(df), "units_min": float(df.units.min()), "units_max": float(df.units.max())}
    return Output(df, metadata=meta)

@asset(description="Feature engineering: interactions & standardized columns.")
def features(context, clean_sales: pd.DataFrame) -> Output[pd.DataFrame]:
    df = clean_sales.copy()
    df["units_sq"] = df["units"] ** 2; df["units_promo"] = df["units"] * df["promo"]
    for c in ["units", "units_sq", "units_promo"]:
        mu, sigma = df[c].mean(), df[c].std(ddof=0) or 1.0
        df[f"z_{c}"] = (df[c] - mu) / sigma
    return Output(df, metadata={"rows": len(df), "cols": list(df.columns)})

We create three core assets for the pipeline. First, raw_sales generates synthetic daily sales data with noise and occasional missing values, simulating real-world imperfections. Next, clean_sales removes nulls and clips outliers to stabilize the dataset, while logging metadata about ranges and row counts. Finally, features performs feature engineering by adding interaction and standardized variables, preparing the data for downstream modeling.

@asset_check(asset=clean_sales, description="No nulls; promo in {0,1}; units within clipped bounds.")
def clean_sales_quality(clean_sales: pd.DataFrame) -> AssetCheckResult:
    nulls = int(clean_sales.isna().sum().sum())
    promo_ok = bool(set(clean_sales["promo"].unique()).issubset({0, 1}))
    units_ok = bool(clean_sales["units"].between(clean_sales["units"].min(), clean_sales["units"].max()).all())
    passed = bool((nulls == 0) and promo_ok and units_ok)
    return AssetCheckResult(
        passed=passed,
        metadata={"nulls": nulls, "promo_ok": promo_ok, "units_ok": units_ok},
    )

@asset(description="Train a tiny linear regressor; emit R^2 and coefficients.")
def tiny_model_metrics(context, features: pd.DataFrame) -> dict:
    X = features[["z_units", "z_units_sq", "z_units_promo", "promo"]].values
    y = features["sales"].values
    model = LinearRegression().fit(X, y)
    return {"r2_train": float(model.score(X, y)),
            **{n: float(c) for n, c in zip(["z_units", "z_units_sq", "z_units_promo", "promo"], model.coef_)}}

We strengthen the pipeline with validation and modeling. The clean_sales_quality asset check enforces data integrity by verifying that there are no nulls, the promo field only has 0/1 values, and the cleaned units remain within valid bounds. After that, tiny_model_metrics trains a simple linear regression on the engineered features and outputs key metrics such as the training R^2 and the learned coefficients, giving us a lightweight but complete modeling step within the Dagster workflow.
defs = Definitions(
    assets=[raw_sales, clean_sales, features, tiny_model_metrics, clean_sales_quality],
    resources={"io_manager": csv_io_manager}
)

if __name__ == "__main__":
    run_day = os.environ.get("RUN_DATE") or START
    print("Materializing everything for:", run_day)
    result = materialize(
        [raw_sales, clean_sales, features, tiny_model_metrics, clean_sales_quality],
        partition_key=run_day,
        resources={"io_manager": csv_io_manager},
    )
    print("Run success:", result.success)
    for fname in ["raw_sales.csv", "clean_sales.csv", "features.csv", "tiny_model_metrics.json"]:
        f = BASE / fname
        if f.exists():
            print(fname, "->", f.stat().st_size, "bytes")
            if fname.endswith(".json"):
                print("Metrics:", json.loads(f.read_text()))

We register our assets and the IO manager in Definitions, then materialize the entire DAG for a selected partition key in one run. We persist CSV/JSON artifacts to /content/dagstore and print a quick success flag, plus saved file sizes and model metrics for immediate verification.

In conclusion, we materialize all assets and checks in a single Dagster run, confirm data quality, and train a regression model whose metrics are stored for inspection. We keep the pipeline modular, with each asset producing and persisting its outputs in CSV or JSON, and ensure compatibility by explicitly converting metadata values to supported types. This tutorial demonstrates how we can combine partitioning, asset definitions, and checks to build a technically robust and reproducible workflow, giving us a practical framework to extend toward more complex real-world pipelines.

Check out the FULL CODES here.

The post A Coding Guide to Build and Validate End-to-End Partitioned Data Pipelines in Dagster with Machine Learning Integration appeared first on MarkTechPost.


The Download: Taiwan’s silicon shield, and ChatGPT’s personality misstep

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Taiwan’s “silicon shield” could be weakening

Taiwanese politics increasingly revolves around one crucial question: Will China invade? China’s ruling party has wanted to seize Taiwan for more than half a century. But in recent years, China’s leader, Xi Jinping, has placed greater emphasis on the idea of “taking back” the island (which the Chinese Communist Party, or CCP, has never controlled).

Many in Taiwan and elsewhere think one major deterrent has to do with the island’s critical role in semiconductor manufacturing. Taiwan produces the majority of the world’s semiconductors and more than 90% of the most advanced chips needed for AI applications. But now some Taiwan specialists and some of the island’s citizens are worried that this “silicon shield”, if it ever existed, is cracking. Read the full story.

—Johanna M. Costigan

This story is from our forthcoming print issue, which is all about security. If you haven’t already, subscribe now to receive future issues once they land.

Why there’s a big backlash against ChatGPT’s new ‘personality’

When OpenAI made the switch to its new GPT-5 model last week, a number of people reacted with shock, frustration, sadness, or anger to previous model 4o’s sudden disappearance from ChatGPT. Despite its awareness that people are developing emotional bonds with the model, OpenAI appears to have been caught flat-footed by the fervor of users’ pleas for its return. Within a day, the company made 4o available again to its paying customers (free users are stuck with GPT-5).

MIT Technology Review spoke with several ChatGPT users who were deeply affected by the loss of 4o. All are women between the ages of 20 and 40, and all bar one considered 4o to be a romantic partner. Read the full story.

—Grace Huckins

Why US federal health agencies are abandoning mRNA vaccines

This time five years ago, we were in the throes of the covid-19 pandemic. Then came the vaccines. The first mRNA vaccines for covid were authorized for use in December 2020. The US government played an important role in the introduction of these vaccines, providing $18 billion to support their development.

But now, that government is turning its back on the technology. Funding is being withdrawn. Partnerships are being canceled. Leaders of US health agencies are casting doubt on the vaccines’ effectiveness and safety. And this week, the director of the National Institutes of Health implied that the reversal was due to a lack of public trust in the technology. Plenty of claims are being thrown about. So let’s consider the evidence. Read the full story.

—Jessica Hamzelou

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The Trump administration is in talks to buy a stake in Intel
Just weeks after Trump called for the CEO to step down. (Bloomberg $)
+ It’s part of its plan to increase US market share in chip manufacturing. (WSJ $)
+ Intel is probably hoping such a deal could help its beleaguered Ohio factory. (TechCrunch)

2 Meta’s AI rules allowed its chatbots to flirt with children
And it only recently amended the guidelines after being questioned about it. (Reuters)
+ We don’t know how long the policies were in place. (The Verge)
+ An AI companion site is hosting sexually charged conversations with underage celebrity bots. (MIT Technology Review)

3 Erin is America’s first real test of hurricane readiness under Trump
It looks like it’ll become the season’s first hurricane. (Vox)
+ Trackers are uncertain about where the storm will head. (NYT $)
+ Here’s what we know about hurricanes and climate change. (MIT Technology Review)

4 xAI lost a major US government contract after Grok praised Hitler
Leaving the government to partner with OpenAI, Anthropic, and Gemini instead. (Wired $)
+ xAI’s ‘Grok for Government’ site doesn’t appear to reflect this. (Ars Technica)

5 Tech leaders are upping their security
As public hostility towards corporate executives deepens. (FT $)

6 These TikTokers are documenting their lives after deportation
They’re sharing their realities and creating new communities. (NY Mag $)
+ ICE added a random person to a highly sensitive group chat. (404 Media)

7 We may soon be able to hear some patients’ inner voices
New research has successfully guessed words imagined by people unable to speak. (NYT $)
+ Motor neuron diseases took their voices. AI is bringing them back. (MIT Technology Review)

8 China’s plug-in hybrids are everywhere
And they’re likely to dominate exports for the next three years at least. (Rest of World)
+ China’s EV giants are betting big on humanoid robots. (MIT Technology Review)

9 The UK is working with TikTok influencers to tackle medical tourism
It’s a bid to raise awareness of the risks of undertaking cosmetic surgery abroad. (BBC)

10 AI may experience the passage of time differently to us
What does this mean for our future? (IEEE Spectrum)
+ What is AI? (MIT Technology Review)

Quote of the day

“We’ve realized the best way to get them is when they’re scrolling social media.”

—Ryan Odendahl, president and CEO of construction company Kwest Group, tells the Washington Post how his company is getting young people interested in learning traditional trades.

One more thing

The next generation of neural networks could live in hardware

Networks programmed directly into computer chip hardware can identify images faster, and use much less energy, than the traditional neural networks that underpin most modern AI systems. Neural networks, from GPT-4 to Stable Diffusion, are built by wiring together perceptrons, which are highly simplified simulations of the neurons in our brains. In very large numbers, perceptrons are powerful, but they also consume enormous volumes of energy. Part of the trouble is that perceptrons are just software abstractions: running a perceptron network on a GPU requires translating that network into the language of hardware, which takes time and energy.


Salesforce AI Releases Moirai 2.0: Salesforce’s Latest Time Series Foundation Model Built on a Decoder‑only Transformer Architecture

Salesforce AI Research has unveiled Moirai 2.0, the latest advancement in time series foundation models. Built on a decoder-only transformer architecture, Moirai 2.0 sets a new bar for performance and efficiency, claiming the #1 spot on the GIFT-Eval benchmark, the gold standard for evaluating time-series forecasting models. It is 44% faster at inference and 96% smaller than its predecessor, and this leap comes without sacrificing accuracy, making it relevant for both research and enterprise environments.

What Makes Moirai 2.0 Special?

Architecture innovations:

- Decoder-only transformer: the switch from a masked encoder to a decoder-only transformer lets Moirai 2.0 better model autoregressive forecast generation, improving scalability and performance on larger, more complex datasets.
- Efficient multi-token prediction: by predicting multiple tokens at a time (rather than just one), the model achieves greater efficiency and stability during forecasting.
- Advanced data filtering: low-quality, non-forecastable time series are automatically filtered out during training, improving robustness.
- Patch token embedding and random masking: new techniques for encoding missing-value information and for robustness to incomplete data during inference.

Expanded dataset for pretraining. Moirai 2.0 leverages a richer mix of training data:

- Real-world sets such as GIFT-Eval Pretrain and Train
- Chronos mixup: synthetic time series blending for diversity
- KernelSynth procedures from the Chronos research
- Internal operational data from Salesforce IT systems

This broad data foundation enables Moirai 2.0 to generalize across countless forecasting tasks and domains.

Performance: Breaking New Ground

Moirai 2.0 is a leap beyond its predecessors:

- Best MASE score on GIFT-Eval among non-data-leaking models (MASE is an industry-accepted metric for forecast accuracy)
- CRPS performance matches the previous state of the art
- Compared to Moirai_large: 16% better on MASE, 13% better on CRPS, 44% faster inference, and a 96% smaller parameter count

https://www.salesforce.com/blog/moirai-2-0/

These results make high-performance, scalable forecasting accessible to a broader audience.

Why Moirai 2.0 Matters for Practitioners

Moirai 2.0's capabilities extend beyond academic benchmarks into enterprise-critical domains such as:

- IT operations: proactive capacity scaling, anomaly detection
- Sales forecasting: accurate, scalable revenue predictions
- Demand forecasting: optimized inventory management
- Supply chain planning: better scheduling, reduced waste
- And many more data-driven business processes

With a dramatically reduced model size and improved speed, high-quality forecasting can now be applied at scale, empowering businesses to make smarter, faster decisions regardless of their data infrastructure.

Getting Started: Moirai 2.0 in Practice

Integration is seamless for developers and data scientists.
Here is a typical workflow, leveraging the open-source modules available on Hugging Face.

Import libraries:

import matplotlib.pyplot as plt
from gluonts.dataset.repository import dataset_recipes
from uni2ts.eval_util.data import get_gluonts_test_dataset
from uni2ts.model.moirai2 import Moirai2Forecast, Moirai2Module

Load Moirai 2.0:

model = Moirai2Forecast(
    module=Moirai2Module.from_pretrained("Salesforce/moirai-2.0-R-small"),
    prediction_length=100,
    context_length=1680,
    target_dim=1,
    feat_dynamic_real_dim=0,
    past_feat_dynamic_real_dim=0,
)

Load the dataset and generate forecasts:

test_data, metadata = get_gluonts_test_dataset("electricity", prediction_length=None, regenerate=False)
predictor = model.create_predictor(batch_size=32)
forecasts = predictor.predict(test_data.input)

Visualize results:

# Example visualization
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(25, 10))
# Use the Moirai plotting utility to display forecasts

Full examples and notebook links are provided by Salesforce for deeper experimentation.

Universal, Scalable, Robust

By democratizing access to cutting-edge, general-purpose forecasting technology, Moirai 2.0 is poised to reshape the landscape of time series modeling. With flexibility across domains, better robustness, faster inference, and lower computational demands, Salesforce AI Research's model paves the way for businesses and researchers to harness forecasting for transformative decision making.

Check out the Technical details and Hugging Face (Model).

The post Salesforce AI Releases Moirai 2.0: Salesforce's Latest Time Series Foundation Model Built on a Decoder-only Transformer Architecture appeared first on MarkTechPost.


R-Zero: A Fully Autonomous AI Framework that Generates Its Own Training Data from Scratch

Large Language Models (LLMs) have revolutionized fields from natural language understanding to reasoning and code generation. However, pushing their reasoning ability to truly superhuman levels has been limited by the need for massive, high-quality, human-annotated datasets. A team of researchers from Tencent AI Seattle Lab, Washington University, the University of Maryland, and the University of Texas has proposed R-Zero, a framework designed to train reasoning LLMs that can self-evolve without relying on external data labels.

Beyond Human-Curated Data

Most progress in LLM reasoning is tethered to datasets laboriously curated by humans, an approach that is resource-intensive and fundamentally limited by human knowledge. Even label-free methods that use LLMs' own outputs for reward signals still depend on existing collections of unsolved tasks or problems. These dependencies bottleneck scalability and hinder the goal of open-ended AI reasoning beyond human capabilities.

R-Zero: Self-Evolution from Zero Data

R-Zero forges a novel path by entirely removing the reliance on external tasks and labels. Instead, it introduces a co-evolutionary dynamic between two instances of a base model:

- Challenger: responsible for creating new, challenging reasoning tasks near the edge of the Solver's capability.
- Solver: trained to solve increasingly difficult problems posed by the Challenger, improving iteratively.

This synergy enables the curriculum, the set of training data, to be self-generated and adapted continuously to the model's evolving strengths and weaknesses. The process works as follows:

1. Challenger training: trained via reinforcement learning (specifically Group Relative Policy Optimization, GRPO), the Challenger generates diverse, hard-to-solve questions. The reward signal for each question is based on the Solver's uncertainty and is highest when the Solver's answers are maximally inconsistent (empirical accuracy approaches 50%).
2. Solver training: the Solver is fine-tuned on the Challenger's curated problems. Pseudo-labels (answers) are determined by majority vote among the Solver's own responses. Only questions whose answers are neither too consistent nor too scattered (i.e., in an informative band) are used for training.
3. Iterative loop: Challenger and Solver alternate roles, co-evolving over several rounds and progressively improving reasoning abilities without human intervention.

Key Technical Innovations

- Group Relative Policy Optimization (GRPO): a reinforcement learning algorithm that normalizes the reward for each generated answer relative to the group of responses for the same prompt. This method efficiently fine-tunes policy LLMs without a separate value function.
- Uncertainty-driven curriculum: the Challenger is rewarded for generating problems at the Solver's frontier, neither too easy nor impossible. The reward function peaks for tasks where the Solver achieves 50% accuracy, maximizing learning efficiency per the theoretical analysis (see the sketch below).
- Repetition penalty and format checks: to guarantee diverse and well-structured training data, a repetition penalty discourages similar questions within a batch, and strict format checks ensure data quality.
- Pseudo-label quality control: only question-answer pairs with intermediate answer consistency are used for training, filtering out ambiguous or ill-posed problems and calibrating label accuracy.
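
To make the uncertainty-driven reward concrete, here is a minimal sketch of a reward that behaves as described above: it is maximal when the Solver's empirical accuracy on a generated question is 50% and falls off toward 0% or 100%. The linear functional form and the helper name are illustrative assumptions, not necessarily the paper's exact formulation.

from collections import Counter

def challenger_uncertainty_reward(solver_answers):
    """Score one generated question by how uncertain the Solver is about it.

    solver_answers: list of answers sampled from the Solver for this question.
    The pseudo-label is the majority-vote answer; p_hat is the fraction of
    samples agreeing with it. The reward is 1.0 when p_hat is 0.5 and decreases
    linearly toward 0 as p_hat approaches 0 or 1.
    """
    counts = Counter(solver_answers)
    pseudo_label, votes = counts.most_common(1)[0]
    p_hat = votes / len(solver_answers)
    reward = 1.0 - 2.0 * abs(p_hat - 0.5)  # peaks at p_hat = 0.5
    return pseudo_label, p_hat, reward

# Example: 8 Solver samples split 5/3 between two candidate answers.
print(challenger_uncertainty_reward(["42", "42", "42", "42", "42", "17", "17", "17"]))
# -> ('42', 0.625, 0.75)

Questions whose p_hat sits near 0 or 1 contribute little reward and are also filtered out of the Solver's training set, which is the "informative band" idea described above.
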
Empirical Performance

Mathematical reasoning benchmarks: R-Zero was evaluated on seven rigorous mathematical benchmarks, including AMC, Minerva, MATH-500, GSM8K, Olympiad-Bench, and the AIME competitions. Compared with the base model and a non-trained Challenger baseline, three iterations of R-Zero led to substantial improvements in reasoning accuracy across all model sizes and architectures (for example, Qwen3-8B-Base improved from a 49.18 to a 54.69 average score after three iterations).

General reasoning benchmarks: Crucially, R-Zero's improvements generalize beyond math. Benchmarks including MMLU-Pro, SuperGPQA, and BIG-Bench Extra Hard (BBEH) show significant gains in general-domain reasoning accuracy (for example, Qwen3-8B-Base's overall average jumps from 34.49 to 38.73), demonstrating strong transfer effects.

Conclusion

R-Zero marks a major milestone toward self-sufficient, superhuman reasoning LLMs. Its fully autonomous co-evolutionary training pipeline offers not only strong empirical gains in reasoning but a new lens through which to view scalable, data-free AI development. Researchers and practitioners can experiment with this framework today, leveraging open-source tools to pioneer the next era of reasoning-centric language models.

Check out the Paper and GitHub Page.

The post R-Zero: A Fully Autonomous AI Framework that Generates Its Own Training Data from Scratch appeared first on MarkTechPost.


NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages

Nvidia has taken a major leap in the development of multilingual speech AI, unveiling Granary, the largest open-source speech dataset for European languages, along with two state-of-the-art models: Canary-1b-v2 and Parakeet-tdt-0.6b-v3. This release sets a new standard for accessible, high-quality resources in automatic speech recognition (ASR) and speech translation (AST), especially for underrepresented European languages.

Granary: The Foundation of Multilingual Speech AI

Granary is a massive multilingual corpus developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler. It delivers around one million hours of audio, with 650,000 hours for speech recognition and 350,000 for speech translation. The dataset covers 25 European languages, representing nearly all official EU languages plus Russian and Ukrainian, with a particular focus on languages with limited annotated data, such as Croatian, Estonian, and Maltese.

Key features:

- Largest open-source speech dataset for 25 European languages.
- Pseudo-labeling pipeline: unlabeled public audio is processed with Nvidia NeMo's Speech Data Processor, which adds structure and enhances quality, reducing the need for resource-intensive manual annotation.
- Supports both ASR and AST: designed for transcription and translation tasks.
- Open access: available to the global developer community for flexible, production-scale model training.

By leveraging clean, high-quality data, Granary enables significantly faster model convergence. Research shows that developers need about half as much Granary data to reach target accuracies compared with competing datasets, making it especially valuable for resource-constrained languages and rapid prototyping.

Canary-1b-v2: Multilingual ASR and Translation (English <-> 24 Languages)

Canary-1b-v2 is a billion-parameter encoder-decoder model trained on Granary, delivering high-quality transcription and translation between English and 24 supported European languages. It is architected for accuracy and multitask capability:

- Languages supported: 25 European languages, expanding Canary's coverage from 4.
- State-of-the-art performance: accuracy comparable to models three times larger, with up to 10x faster inference.
- Multitask capability: robust across both ASR and AST tasks.
- Features: automatic punctuation, capitalization, and word- and segment-level timestamps, including timestamped translated outputs.
- Architecture: FastConformer encoder with a Transformer decoder; a unified vocabulary for all languages via a SentencePiece tokenizer.
- Robustness: maintains strong performance under noisy conditions and resists output hallucinations.

Evaluation highlights:

- ASR word error rate (WER): 7.15% (AMI dataset), 10.82% (LibriSpeech Clean).
- AST COMET scores: 79.3 (X→English), 84.56 (English→X).

Deployment: available under a CC BY 4.0 license and optimized for Nvidia GPU-accelerated systems, enabling fast training and inference for scalable production use.
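
As a rough usage sketch, Canary-family checkpoints are typically loaded through NVIDIA NeMo's from_pretrained/transcribe workflow. The model identifier "nvidia/canary-1b-v2" and the audio file name below are assumptions for illustration; check the official model card for the exact name and any task-specific arguments (for example, source and target language).

# Minimal sketch: load a Canary-style checkpoint with NVIDIA NeMo and transcribe audio.
# Assumption: NeMo is installed and the model id below matches the published checkpoint.
import nemo.collections.asr as nemo_asr

# from_pretrained resolves the concrete model class from the checkpoint metadata.
model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")

# Transcribe one or more 16 kHz mono audio files; one hypothesis is returned per file.
hypotheses = model.transcribe(["sample_utterance.wav"])
print(hypotheses[0])
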
Parakeet-tdt-0.6b-v3: Real-Time Multilingual ASR

Parakeet-tdt-0.6b-v3 is a 600-million-parameter multilingual ASR model designed for high-throughput, large-volume transcription across all 25 supported languages. It extends the Parakeet family (previously English-centric) to full European coverage.

- Automatic language detection: transcribes input audio without needing extra prompts.
- Real-time capability: efficiently transcribes up to 24-minute audio segments in a single inference pass.
- Fast, scalable, and commercial-ready: prioritizes low latency, batch processing, and accurate outputs, with word-level timestamps, punctuation, and capitalization.
- Robustness: reliable even on complex content (numbers, lyrics) and challenging audio conditions.

Impact on Speech AI Development

Nvidia's Granary dataset and model suite accelerate the democratization of speech AI for Europe, enabling scalable development of:

- Multilingual chatbots
- Customer-service voice agents
- Near-real-time translation services

Developers, researchers, and businesses can now build inclusive, high-quality applications that support linguistic diversity, with open access to these models and datasets.

Check out Granary, NVIDIA Canary-1b-v2, and NVIDIA Parakeet-tdt-0.6b-v3.

The post NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages appeared first on MarkTechPost.


Large Language Models Show Signs of Alignment with Human Neurocognition During Abstract Reasoning

arXiv:2508.10057v1 Announce Type: cross

Abstract: This study investigates whether large language models (LLMs) mirror human neurocognition during abstract reasoning. We compared the performance and neural representations of human participants with those of eight open-source LLMs on an abstract-pattern-completion task. We leveraged pattern type differences in task performance and in fixation-related potentials (FRPs) as recorded by electroencephalography (EEG) during the task. Our findings indicate that only the largest tested LLMs (~70 billion parameters) achieve human-comparable accuracy, with Qwen-2.5-72B and DeepSeek-R1-70B also showing similarities with the human pattern-specific difficulty profile. Critically, every LLM tested forms representations that distinctly cluster the abstract pattern categories within their intermediate layers, although the strength of this clustering scales with their performance on the task. Moderate positive correlations were observed between the representational geometries of task-optimal LLM layers and human frontal FRPs. These results consistently diverged from comparisons with other EEG measures (response-locked ERPs and resting EEG), suggesting a potential shared representational space for abstract patterns. This indicates that LLMs might mirror human brain mechanisms in abstract reasoning, offering preliminary evidence of shared principles between biological and artificial intelligence.
