YouZum

Mistral AI Introduces Mistral Code: A Customizable AI Coding Assistant for Enterprise Workflows

Mistral AI announced the release of Mistral Code, an AI-powered coding assistant tailored for enterprise software development environments. The release signals Mistral's move toward addressing long-standing requirements in professional development pipelines: control, security, and model adaptability.

Addressing Enterprise-Grade Requirements

Mistral Code targets several key limitations observed in traditional AI coding tools:

Data Sovereignty and Control: Organizations maintain full control over their code and infrastructure. Mistral Code offers options for on-premises deployment, enabling compliance with internal data governance policies.

Customizability: Unlike off-the-shelf assistants, Mistral Code can be tuned to an enterprise's internal codebase, allowing the assistant to reflect project-specific conventions and logic structures.

Beyond Completion: The tool supports end-to-end workflows including debugging, test generation, and code transformation, moving beyond standard autocomplete functionality.

Unified Vendor Management: Mistral provides a single-vendor solution with full visibility across the development stack, simplifying integration and support.

Initial deployments have been conducted with partners such as Capgemini, Abanca, and SNCF, suggesting the assistant's applicability across both regulated and large-scale environments.

System Architecture and Capabilities

Mistral Code integrates four foundational models, each designed for a distinct set of development tasks:

Codestral: Specializes in code completion and in-filling, optimized for latency and multi-language support.

Codestral Embed: Powers semantic search and code retrieval through dense vector embeddings.

Devstral: Designed for longer-horizon tasks such as multi-step problem solving and refactoring.

Mistral Medium: Enables conversational interactions and contextual Q&A inside the IDE.

The assistant supports over 80 programming languages and interfaces with development artifacts such as file structures, Git diffs, and terminal outputs. Developers can use natural language to initiate refactors, generate unit tests, or receive in-line explanations, all within their IDE.

Deployment Models

Mistral Code offers flexible deployment modes to meet diverse IT policies and performance needs:

Cloud: For teams working in managed cloud environments.

Reserved Cloud Capacity: Dedicated infrastructure to meet latency, throughput, or compliance requirements.

On-Premises: For enterprises with strict infrastructure control needs, especially in regulated sectors.

The assistant is currently in private beta for JetBrains IDEs and Visual Studio Code, with broader IDE support expected as adoption grows.

Administrative Features for IT Oversight

To align with enterprise security and operational practices, Mistral Code includes a comprehensive management layer:

Role-Based Access Control (RBAC): Configurable access policies to manage user permissions at scale.

Audit Logs: Full traceability of actions and interactions with the assistant for compliance auditing.

Usage Analytics: Detailed reporting dashboards to monitor adoption, performance, and optimization opportunities.

These features support internal security reviews, cost accountability, and usage governance.

Conclusion

Mistral Code introduces a modular and enterprise-aligned approach to AI-assisted development. By prioritizing adaptability, transparency, and data integrity, Mistral AI offers an alternative to generalized coding assistants that often fall short in production-grade environments. The tool's architecture and deployment flexibility position it as a viable solution for organizations seeking to integrate AI without compromising internal controls or development rigor.

The post Mistral AI Introduces Mistral Code: A Customizable AI Coding Assistant for Enterprise Workflows appeared first on MarkTechPost.
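To ground the completion component described above, the following is a minimal sketch of a fill-in-the-middle request against Codestral, the model behind Mistral Code's completions. This is not Mistral Code itself, and the endpoint URL, model identifier, and response shape are assumptions based on Mistral's public API documentation; enterprise or on-premises deployments may differ.

```python
# Illustrative only: shows the prompt/suffix structure of a fill-in-the-middle
# (FIM) completion of the kind Codestral performs inside Mistral Code.
# Endpoint, model name, and response shape are assumed from the public docs.
import os
import requests

API_URL = "https://api.mistral.ai/v1/fim/completions"   # assumed public FIM endpoint
headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

payload = {
    "model": "codestral-latest",                          # assumed model identifier
    "prompt": "def fibonacci(n: int) -> int:\n    ",      # code before the cursor
    "suffix": "\n\nprint(fibonacci(10))",                 # code after the cursor
    "max_tokens": 128,
    "temperature": 0.0,
}

resp = requests.post(API_URL, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
# Response shape is assumed to mirror the chat-completions format.
print(resp.json()["choices"][0]["message"]["content"])
```

In Mistral Code itself, equivalent requests are issued by the IDE plugin rather than by user code; the sketch only illustrates how the text around the cursor maps to the prompt and suffix fields.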


Unique Hard Attention: A Tale of Two Sides

arXiv:2503.14615v2 Announce Type: replace-cross Abstract: Understanding the expressive power of transformers has recently attracted attention, as it offers insights into their abilities and limitations. Many studies analyze unique hard attention transformers, where attention selects a single position that maximizes the attention scores. When multiple positions achieve the maximum score, either the rightmost or the leftmost of those is chosen. In this paper, we highlight the importance of this seemingly trivial choice. Recently, finite-precision transformers with both leftmost- and rightmost-hard attention were shown to be equivalent to Linear Temporal Logic (LTL). We show that this no longer holds with only leftmost-hard attention: in that case, they correspond to a strictly weaker fragment of LTL. Furthermore, we show that models with leftmost-hard attention are equivalent to soft attention, suggesting they may better approximate real-world transformers than rightmost-hard-attention models. These findings refine the landscape of transformer expressivity and underscore the role of attention directionality.
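As a concrete illustration of the tie-breaking rule the abstract refers to, the toy sketch below (ours, not from the paper) contrasts leftmost and rightmost unique hard attention on a score vector in which two positions share the maximum.

```python
# Toy illustration: unique hard attention keeps exactly one position; when
# several positions tie for the maximal score, either the leftmost or the
# rightmost of them is selected, which is the distinction the paper studies.
import numpy as np

def unique_hard_attention(scores: np.ndarray, values: np.ndarray, side: str = "left"):
    """Return the value at the single attended position."""
    tied = np.flatnonzero(scores == scores.max())    # all positions achieving the max
    idx = tied[0] if side == "left" else tied[-1]    # leftmost vs. rightmost tie-break
    return values[idx]

scores = np.array([0.2, 0.9, 0.5, 0.9])              # positions 1 and 3 are tied
values = np.array(["a", "b", "c", "d"])
print(unique_hard_attention(scores, values, side="left"))   # -> 'b' (leftmost maximum)
print(unique_hard_attention(scores, values, side="right"))  # -> 'd' (rightmost maximum)
```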


KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG

arXiv:2506.02503v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access broader knowledge sources, yet factual inconsistencies persist due to noise in retrieved documents, even with advanced retrieval methods. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. In this paper, we present KARE-RAG (Knowledge-Aware Refinement and Enhancement for RAG), which improves knowledge utilization through three key innovations: (1) structured knowledge representations that facilitate error detection during training, (2) Dense Direct Preference Optimization (DDPO), a refined training objective that prioritizes correction of critical errors, and (3) a contrastive data generation pipeline that maintains semantic consistency while rectifying factual inaccuracies. Experiments show our method significantly enhances standard RAG pipelines across model scales, improving both in-domain and out-of-domain task performance without compromising general capabilities. Notably, these gains are achieved with modest training data, suggesting data-efficient optimization is possible through targeted learning strategies. Our findings establish a new direction for RAG improvement: by improving how models learn to process retrieved content, we can enhance performance across diverse inference paradigms. All data and code will be publicly available on GitHub.
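The abstract describes Dense Direct Preference Optimization (DDPO) only as a refinement of DPO that prioritizes the correction of critical errors. As a reference point, the sketch below implements the standard DPO objective with a hypothetical per-example weight standing in for that prioritization; it is an assumption for illustration, not the paper's implementation.

```python
# Standard DPO loss with an added per-example weight (hypothetical stand-in for
# DDPO's emphasis on critical errors). All tensors hold log-probabilities of the
# full chosen/rejected responses under the policy and the frozen reference model.
import torch
import torch.nn.functional as F

def weighted_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      error_weights, beta: float = 0.1):
    chosen_ratio = policy_chosen_logps - ref_chosen_logps         # log pi/pi_ref, preferred
    rejected_ratio = policy_rejected_logps - ref_rejected_logps   # log pi/pi_ref, dispreferred
    per_example = -F.logsigmoid(beta * (chosen_ratio - rejected_ratio))  # vanilla DPO term
    return (error_weights * per_example).mean()                   # up-weight critical errors

# Dummy batch of three preference pairs.
t = torch.tensor
loss = weighted_dpo_loss(t([-4.0, -3.5, -5.0]), t([-4.5, -4.0, -4.8]),
                         t([-4.2, -3.6, -5.1]), t([-4.3, -3.9, -4.9]),
                         error_weights=t([1.0, 2.0, 1.0]))
print(loss.item())
```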


M$^3$FinMeeting: A Multilingual, Multi-Sector, and Multi-Task Financial Meeting Understanding Evaluation Dataset

arXiv:2506.02510v1 Announce Type: new Abstract: Recent breakthroughs in large language models (LLMs) have led to the development of new benchmarks for evaluating their performance in the financial domain. However, current financial benchmarks often rely on news articles, earnings reports, or announcements, making it challenging to capture the real-world dynamics of financial meetings. To address this gap, we propose a novel benchmark called M$^3$FinMeeting, a multilingual, multi-sector, and multi-task dataset designed for financial meeting understanding. First, M$^3$FinMeeting supports English, Chinese, and Japanese, enhancing comprehension of financial discussions in diverse linguistic contexts. Second, it encompasses various industry sectors defined by the Global Industry Classification Standard (GICS), ensuring that the benchmark spans a broad range of financial activities. Finally, M$^3$FinMeeting includes three tasks: summarization, question-answer (QA) pair extraction, and question answering, facilitating a more realistic and comprehensive evaluation of understanding. Experimental results with seven popular LLMs reveal that even the most advanced long-context models have significant room for improvement, demonstrating the effectiveness of M$^3$FinMeeting as a benchmark for assessing LLMs' financial meeting comprehension skills.
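Since the abstract specifies only the covered languages, the GICS sector labels, and the three tasks, the sketch below is a purely hypothetical view of what a single example could contain; the actual schema is defined by the dataset release.

```python
# Hypothetical example record built only from fields the abstract mentions.
example = {
    "language": "en",                      # English, Chinese, or Japanese
    "gics_sector": "Financials",           # sector label from the GICS taxonomy
    "transcript": "...",                   # financial meeting transcript (elided)
    "summary": "...",                      # reference summary (summarization task)
    "qa_pairs": [                          # question-answer pair extraction task
        {"question": "...", "answer": "..."},
    ],
}
```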


Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective

arXiv:2506.02553v1 Announce Type: cross Abstract: We study a common challenge in reinforcement learning for large language models (LLMs): the Zero-Reward Assumption, where non-terminal actions (i.e., intermediate token generations) receive zero task-specific immediate reward, while only the final token receives a reward for the entire response. This assumption arises frequently in practice, as precise token-level rewards are often difficult or infeasible to obtain in LLM applications. In this work, we provide a unifying theoretical perspective. We introduce the Trajectory Policy Gradient Theorem, which shows that the policy gradient based on true, unknown token-level rewards can be unbiasedly estimated using only a response-level reward model, regardless of whether the Zero-Reward Assumption holds or not, for algorithms in the REINFORCE and Actor-Critic families. This result reveals that widely used methods such as PPO, GRPO, ReMax, and RLOO inherently possess the capacity to model token-level reward signals, offering a theoretical justification for response-level reward approaches. Our findings pave the way for more practical, efficient LLM fine-tuning, allowing developers to treat training algorithms as black boxes and focus on improving the response-level reward model with auxiliary sub-models. We also offer a detailed analysis of popular RL and non-RL methods, comparing their theoretical foundations and practical advantages across common LLM tasks. Finally, we propose a new algorithm: Token-Reinforced Policy Optimization (TRePO), a theoretically grounded method that is simpler than PPO, matches GRPO in memory efficiency, and holds promise for broad applicability.
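A minimal sketch of the setting the theorem formalizes: only a scalar response-level reward is observed, and a REINFORCE-style update credits every generated token with that single scalar. The code is illustrative (dummy log-probabilities, no baseline or KL penalty) and is not the proposed TRePO algorithm.

```python
# REINFORCE with a response-level reward: the gradient of
#   -R(y) * sum_t log pi(y_t | y_<t, x)
# spreads the single scalar R(y) across all token log-probabilities.
import torch

def reinforce_loss(token_logprobs: torch.Tensor, response_reward: torch.Tensor):
    """token_logprobs: (T,) log pi(y_t | y_<t, x); response_reward: scalar R(y)."""
    return -(response_reward * token_logprobs.sum())

# Dummy example: five generated tokens, one scalar reward for the whole response.
token_logprobs = torch.tensor([-1.2, -0.8, -2.1, -0.5, -1.0], requires_grad=True)
loss = reinforce_loss(token_logprobs, torch.tensor(1.0))
loss.backward()
print(token_logprobs.grad)   # each token receives the same response-level credit (-R)
```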


CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought

arXiv:2502.17214v2 Announce Type: replace Abstract: Large language models (LLMs) excel in many tasks but struggle to accurately quantify uncertainty in their generated responses. This limitation makes it challenging to detect misinformation and ensure reliable decision-making. Existing uncertainty quantification (UQ) methods for LLMs are primarily prompt-wise rather than response-wise, often requiring multiple response samples, which incurs high computational costs. Moreover, LLMs have been shown to be overconfident, particularly when using reasoning steps to derive their answers. In this work, we propose CoT-UQ, a response-wise UQ framework that integrates LLMs' inherent reasoning capabilities through Chain-of-Thought (CoT) into the UQ process. CoT-UQ captures critical information during inference by extracting keywords from each reasoning step and assessing their importance to the final answer. This key reasoning information is then aggregated to produce a final uncertainty estimate. We conduct extensive experiments on the Llama family, with model sizes varying from 8B to 13B, across logical and mathematical reasoning tasks. Experimental results demonstrate that CoT-UQ significantly outperforms existing UQ methods, achieving an average improvement of 5.9% AUROC over current baselines. The code is available at: https://github.com/ZBox1005/CoT-UQ.
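A hedged sketch of the aggregation idea the abstract describes: keywords extracted from each chain-of-thought step carry an importance weight with respect to the final answer, and their confidences are pooled into one response-level uncertainty score. The weighting and pooling below are assumptions for illustration; the exact CoT-UQ formulation is in the linked repository.

```python
# Illustrative aggregation: importance-weighted average of per-keyword confidences,
# reported as uncertainty (1 - confidence). Not the exact CoT-UQ estimator.
from dataclasses import dataclass

@dataclass
class Keyword:
    text: str
    importance: float   # relevance of this keyword to the final answer, in [0, 1]
    confidence: float   # model confidence attached to this keyword, in [0, 1]

def aggregate_uncertainty(steps: list[list[Keyword]]) -> float:
    keywords = [kw for step in steps for kw in step]
    total = sum(kw.importance for kw in keywords) or 1.0
    weighted_conf = sum(kw.importance * kw.confidence for kw in keywords) / total
    return 1.0 - weighted_conf   # higher value = more uncertain response

steps = [
    [Keyword("prime factorization", importance=0.9, confidence=0.8)],
    [Keyword("greatest common divisor", 0.7, 0.6), Keyword("12", 0.4, 0.9)],
]
print(aggregate_uncertainty(steps))   # 0.25 for this toy input
```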


Hugging Face Releases SmolVLA: A Compact Vision-Language-Action Model for Affordable and Efficient Robotics

Despite recent progress in robotic control via large-scale vision-language-action (VLA) models, real-world deployment remains constrained by hardware and data requirements. Most VLA models depend on transformer-based backbones with billions of parameters, resulting in significant memory and compute costs. This limits experimentation to well-resourced labs and clouds, excluding practitioners working with lower-cost hardware. Additionally, much of the current progress in VLA research remains either proprietary or based on non-reproducible methodologies, impeding open research. Finally, data heterogeneity across robotic platforms (differences in morphology, sensors, and control modes) poses a further challenge to generalizability and cross-platform learning.

Hugging Face Introduces SmolVLA: A Lightweight, Open VLA Framework

Hugging Face presents SmolVLA, a compact vision-language-action model developed for affordability and deployment efficiency. Unlike conventional VLAs, SmolVLA is trained entirely on community-collected datasets and is optimized to run on single-GPU or CPU environments. The model architecture integrates a trimmed version of a pretrained vision-language model (SmolVLM-2) and a transformer-based action expert. This structure enables efficient low-level control from natural language instructions and RGB camera inputs. A distinguishing feature of SmolVLA is its asynchronous inference stack, which decouples action prediction from execution. This design enables low-latency control suitable for real-time applications, even in resource-constrained settings. SmolVLA is released under an open license with accompanying code, training data, and deployment tools.

Architectural Overview and Design Trade-Offs

The SmolVLA model is structured into two primary components:

Perception Module (SmolVLM-2): A pretrained compact vision-language encoder processes sequences of RGB images, sensorimotor states, and language instructions. For efficiency, the model limits visual tokens through downsampling and uses only the lower half of the transformer layers, based on empirical findings that earlier layers often yield more transferable features.

Action Expert: A lightweight transformer, trained with flow matching, predicts sequences of continuous control actions. The action expert alternates between self-attention and cross-attention layers, balancing internal action coherence and conditioning on perception inputs. Causal masking is applied to enforce temporal consistency.

To reduce computational overhead, linear projections are used to align the modalities' token dimensions. Action chunks are generated instead of single-step predictions, reducing the frequency of inference calls. The model is trained using bfloat16 precision and Torch's JIT compilation for runtime optimization.

Empirical Evaluation: Simulation and Real-World Performance

SmolVLA is evaluated across both simulation benchmarks (LIBERO and Meta-World) and real-world robotic tasks using low-cost SO100 and SO101 platforms. The model is trained from scratch on ~23K episodes drawn from 481 community datasets, with task labels auto-generated using a VLM. Evaluation metrics include task-level success rates under both in-distribution and out-of-distribution conditions.

In the LIBERO benchmark, SmolVLA (0.45B) achieves an average success rate of 87.3%, closely matching or surpassing larger models such as π₀ (3.3B). In Meta-World, the model outperforms diffusion policies and smaller-scale VLAs across task difficulty levels. These results are notable given SmolVLA's smaller training footprint and the absence of robotics-specific pretraining. In real-world settings, SmolVLA achieves an average success rate of 78.3% across pick-place, stacking, and sorting tasks, outperforming both ACT (trained from scratch) and π₀ (finetuned). Moreover, SmolVLA generalizes across robotic embodiments, maintaining performance on SO101 despite training exclusively on SO100 data.

Performance Implications of Asynchronous Inference

SmolVLA's asynchronous inference stack improves control efficiency by overlapping prediction and execution. Compared to traditional synchronous inference, this approach reduces average task time by roughly 30% and doubles the number of completed actions in fixed-time scenarios. This is particularly beneficial for edge deployments where inference delays degrade real-time performance.

Conclusion

SmolVLA demonstrates that compact, reproducible, and open-source VLA models can support competent robotic control on low-cost hardware. Through careful architectural choices (layer pruning, chunked action prediction, and asynchronous execution), SmolVLA maintains performance while significantly reducing computational demands. The model's open training and deployment stack, paired with real-world evaluations, offers a practical foundation for further research in efficient and accessible robot learning. Future directions include expanding cross-embodiment datasets, scaling model capacity without sacrificing latency, and exploring joint training on multimodal corpora beyond robotics data.

The paper and model are available on Hugging Face. The post Hugging Face Releases SmolVLA: A Compact Vision-Language-Action Model for Affordable and Efficient Robotics appeared first on MarkTechPost.
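To illustrate the asynchronous inference stack described above, the sketch below overlaps chunk prediction with execution using a background thread and two queues. It is a generic producer-consumer pattern with stubbed policy and robot interfaces, not the SmolVLA/lerobot implementation.

```python
# Generic async-inference pattern: while the robot executes the current action
# chunk, a worker thread already predicts the next one, hiding model latency.
import queue
import threading
import time

def predict_action_chunk(observation):
    """Stand-in for a SmolVLA-style forward pass returning a chunk of actions."""
    time.sleep(0.05)                        # simulated inference latency
    return [f"action_{observation}_{i}" for i in range(4)]

def inference_worker(obs_q: queue.Queue, act_q: queue.Queue):
    while True:
        obs = obs_q.get()
        if obs is None:                     # shutdown signal
            break
        act_q.put(predict_action_chunk(obs))

obs_q, act_q = queue.Queue(), queue.Queue()
threading.Thread(target=inference_worker, args=(obs_q, act_q), daemon=True).start()

obs_q.put(0)                                # request the first chunk
for step in range(3):
    chunk = act_q.get()                     # chunk for the current step is ready
    obs_q.put(step + 1)                     # request the next chunk before executing
    for action in chunk:                    # execution overlaps with prediction
        time.sleep(0.01)                    # simulated actuation
obs_q.put(None)                             # stop the worker
```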
