AI Archives - Page 8 of 203

Rehumanizing global health care with agentic AI

admin NU / June 2, 2026

The global health care sector is under increasing strain. Decades of chronic underinvestment and constraints in recruitment have coincided with a surge in demand for services for aging populations. Gaps in provision are already taking a toll, with fragmented access to care and high rates of stress and burnout among staff. And it’s getting worse. The World Health Organization has warned that the current shortfall will grow to 11 million workers by 2030.

In their urgent hunt for a solution, many health-care providers are now pinning their hopes on agentic AI, with more than two-thirds (68%) having already adopted AI agents into their workforce, according to KPMG. The technology is being deployed to automate complex back-office processes, collaborate with medical teams, and even triage patients, all in a bid to reduce the cognitive load on clinicians and improve quality of care for patients as the supply of human health-care workers dwindles.

A different type of digitalization

Until now, the benefits of digitalization within health care have been limited. Many staff have blamed slow or outdated technology for adding to the administrative burden rather than alleviating it. For example, U.S. patient data was migrated to electronic health records (EHRs) in the early 2000s, but this data remains fragmented and reliant on manual inputs.

New telehealth services and digital care tools, like remote monitors, have had similar shortcomings, says Ashis Barad, MD, chief digital and technology officer at Hospital for Special Surgery (HSS), an academic medical center in New York that focuses on musculoskeletal health. Both technologies have helped improve access to health care by removing geographical barriers, he says, but they’ve failed to replicate the quality of in-person care or to win trust from patients. Agentic AI is different from these existing technologies, he insists.
Rather than relying on manual inputs or defaulting to human workers for any case that sits slightly outside a rigid framework, AI agents can handle nuanced, complex scenarios. They can make autonomous decisions, retrieve information from expert clinical sources, and iterate over time, freeing clinicians to focus on higher-level patient care. As Dr. Barad puts it: “Agentic AI takes your workflow and collapses it, augments it, supercharges it, and makes it more performant.”

At HSS, AI agents have already been deployed in multiple areas. They handle complex backend processes, such as insurance claims, which previously took several weeks to complete and required both HSS staff and a third-party contractor to handle the volume. Now, says Dr. Barad, AI agents complete 1,100 claims per month. In the nine months since implementation, they’ve cut the appeals stage from 45 minutes to five and raised the success rate of those appeals from 65% to 100%. HSS now handles all claims in-house.

Building on that success, HSS is now deploying AI agents in non-clinical, patient-facing settings with an AI scheduling and triage service, as part of a collaboration with enterprise agentic AI developer Ema Unlimited. The service is accessible 24/7 via web, text, or phone. It uses conversational AI to ask patients clarifying questions about their condition and then books appointments with the most appropriate clinician, factoring in location, insurance coverage, and physician availability.

“It completes the whole loop,” says Dr. Barad. The AI agent is trained on “all of our context, all of our rules, and all of our knowledge base,” he adds, giving patients streamlined access to the highly specialized knowledge of world-leading surgeons. Given the high-stakes decisions delegated to AI agents, the triage service has built-in safeguards: sensitive, complex, or uncertain scenarios are escalated to human specialists.
Every decision made by the AI agent is auditable, and human staff can step in at any point. Patient data is kept secure, and the system is trained on all HSS protocols, policies, and care pathways. By keeping humans in the loop, Ema says, its technology strikes a balance between efficient automation, patient-first safety, and human-informed decision making.

As the technology becomes more widespread, it will be incumbent on providers to ensure they have these sorts of guardrails embedded into their systems, says Dr. Barad. At HSS, all decisions around the technology are filtered through an AI subcommittee that Dr. Barad co-chairs alongside a senior nursing executive. AI agents that may touch on patient care will be scrutinized with far more rigor than, say, backend processes, he explains.

AI agents prompt systems-level change

For example, Dr. Barad has plans to create a dedicated AI lab at the HSS main campus in New York City, a move that aims to democratize access to the technology across the organization. It will be open to all staff looking to understand or build AI agents, he explains, with informative classes and one-on-one training. “We’re getting agentic AI into everybody’s hands,” he says.

This echoes research by Deloitte, which found that leading agentic AI adopters in health care were far more likely to have opted for multiagent solutions, redesigning end-to-end workflows rather than sticking to narrow solutions or individual use cases. The key, it appears, is to integrate AI agents across the entire enterprise, treating them as a general-purpose technology. As Dr. Barad puts it: “It’s wrong to think of agentic AI in use cases… It’s a general-purpose technology, analogous to electricity.”

In practice, this means health-care providers need to set the right foundation to achieve value with agentic AI. This includes creating a unified data strategy, one that integrates fragmented data sources across an organization to create a single, comprehensive source of truth.
In health care, data is often split across multiple departments and providers, each with its own legacy IT system. In systems that rely on fragmented data sources, metrics often lack standardized definitions too. For example, Dr. Barad says that each hospital he’s worked in has had a slightly different definition for “time to start surgery,” a metric commonly used to gauge operating room efficiency. This level of fragmentation prevents AI agents from retrieving information from different sources or applications and from assimilating the tacit knowledge that differentiates them from other technologies. By creating greater interoperability of

AI, Committee, News, Uncategorized

Scikit-LLM vs. Traditional Text Classifiers: When Should You Use an LLM?

admin NU / June 2, 2026

In recent years, generative AI models like LLMs (large language models) have gradually overtaken classical machine learning models for certain tasks, for instance, text classification.

AI, Committee, News, Uncategorized

The Download: AI can run your admin department now

admin NU / June 2, 2026

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How small businesses can leverage AI

From accounting to design to market research and product development, there’s a staggering breadth of skills needed to run a business. Large companies can hire experts to handle these tasks, but small businesses don’t always have that luxury. That’s where AI comes in. Today’s models can already take on a range of basic administrative work, from organizing notes and summarizing meetings to invoicing, goal-setting, and social media planning. Find out how small-business owners can put AI to work.

—Peter Hall

This article is from Making AI Work, MIT Technology Review’s limited-run newsletter examining how to apply LLMs across industries. To receive it in your inbox, sign up here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Anthropic has confidentially filed for IPO ahead of OpenAI
It aims to go public as early as this fall. (CNN)
+ The company did not disclose its target valuation. (Guardian)
+ It’s expected to list shortly after a trillion-dollar IPO by SpaceX. (BBC)
+ Beating OpenAI in the IPO race could have a big impact. (WSJ $)

2 The EU may exclude US cloud giants from critical contracts
The likes of Amazon, Microsoft, and Google could be shut out. (Reuters $)
+ The EU aims to reduce its dependence on US tech. (FT $)
+ Trump supercharged this sovereignty push. (Politico $)

3 Florida has become the first state to sue OpenAI
The lawsuit targets ChatGPT’s alleged child safety risks. (NPR)
+ Florida says OpenAI put profit ahead of safety. (Reuters $)
+ Chatbots are now starting to check user ages. (MIT Technology Review)

4 Hackers stole Instagram accounts just by asking Meta AI for them
They easily broke into a host of celebrity profiles. (404 Media)
+ The exploit shows the risk of offloading support to AI. (TechCrunch)
+ AI is making online crimes easier. (MIT Technology Review)

5 Chinese universities with military ties are seeking Nvidia chips
Two are blacklisted by the US Commerce Department. (Bloomberg $)
+ The Chinese military has sought restricted Nvidia chips for years. (NYT $)
+ US senators have slammed a loophole in chip export rules. (Reuters $)

6 Blue Origin and NASA disagree on a crucial rocket’s next flight
+ Blue Origin says the rocket will fly again this year. (Engadget)
+ But NASA is less optimistic. (CNBC)
+ The rocket’s failure cast doubt on NASA’s moon plans. (BBC)

7 Moderna has won funding to develop an Ebola mRNA vaccine
CEPI has pledged over $60 million to the effort. (Ars Technica)
+ To fight an outbreak raging out of control. (MIT Technology Review)

8 China is using AI to predict future political dissent
A company called Geedge Networks is developing the tech. (NYT $)

9 Geoengineering can thicken Arctic ice, but melt results are mixed
Trials show the tech has had a limited impact. (New Scientist $)

10 Top AI labs are expanding research into machine ‘consciousness’
Meta, Anthropic, and DeepMind are increasing their investments. (FT $)
+ A new tool could show how consciousness works. (MIT Technology Review)

Quote of the day

“Sam Altman and ChatGPT have chosen the AI race over the safety and security of our kids. They have chosen profit over public safety, and we’re not going to stand for it in here in Florida.”

—Florida Attorney General James Uthmeier tells reporters why his state is suing OpenAI, the LA Times reports.

One More Thing

The entrance to the Moscow storage facility of KrioRus, which was until recently the only cryonics company in Eurasia. (Photo: Alessandro Gandolfi)

Why the sci-fi dream of cryonics never died

Cryonics is best known for its appearance in sci-fi films like 2001: A Space Odyssey. But its adherents have held on to a dream that advances in medicine will one day allow for resuscitation and additional years on Earth. Around 500 people are preserved in liquid nitrogen globally, while another 4,000 are on waiting lists. Despite scant evidence that cryonics can work, believers remain optimistic that future science could eventually revive them. Discover why the hope of human reanimation refuses to die.

—Laurie Clarke

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Hear Dolly Parton reimagined through this spot-on Dire Straits-style cover of “Jolene”.
+ Find out which birds people search for most in this interactive visualization of bird popularity.
+ Explore thousands of Q&As between students and astronauts on the ISS at this interactive site.
+ Paris’s oldest bridge disappeared beneath a giant inflatable cave in this surreal public art installation.

AI, Committee, News, Uncategorized

China has approved the world’s first invasive brain-computer chip—here’s what’s next

admin NU / June 1, 2026

One day last October, sitting in the courtyard of his house in China’s Henan province, Dong Hui decided to see if he could hold a pen to write. Dong, 39, had sustained spinal cord injuries in a car accident six years earlier that left him paralyzed from the neck down. Slowly but determinedly, he wrote his name, “Thank you,” and then the date. This was the result of an 11-month-long rehabilitation enabled by an implant in his brain. Before that process, Dong could move his arms slightly but wasn’t able to use his fingers. “I couldn’t believe I was able to write again. I was so excited I even missed a stroke in my name,” he told MIT Technology Review on a video call.

In November 2024, Dong became one of the first people in China to be given an invasive brain-computer interface (BCI) through brain surgery. He had signed up for a clinical trial with the device’s developer one month after seeing on TV how a BCI had apparently enabled another paralyzed Chinese man to hold his granddaughter.

This March, the implant Dong uses became the first invasive BCI product in the world to be approved for use beyond clinical trials. It’s now available to some patients with paralysis in their limbs due to spinal cord injuries. We spoke to a range of experts to understand why the device was able to reach this global milestone, what makes this moment so significant, and what to expect next.

A world first

Dong’s brain implant is a coin-size device called NEO. It was developed by Neuracle Technology, a Shanghai-based startup, together with researchers at Tsinghua University in Beijing. During a procedure that took just over an hour and a half, the device’s sensors, which collect Dong’s brain signals, were placed on his dura mater, the tough outer layer of tissue that covers and protects the brain. The signals are transmitted to a computer by an implant placed on Dong’s skull.
The computer then translates the signals into commands for a soft robotic glove that Dong wears during the 2.5-hour training sessions he completes each day to help him learn to grab. Dong started his rehabilitation around a week after surgery. “On the ninth day of my training, my right hand successfully grabbed a ball without the glove,” he says. “That was a miraculous moment.” Now he continues with his training at home. He wants to be able to control his hands better in order to put on clothes, eat, and do other daily tasks without troubling his aging parents.

A growing number of people with traumatic injuries in China are now poised to tread a similar path thanks to NEO’s recent approval. According to China’s National Medical Products Administration, the regulator responsible for drugs and medical devices, the product is suitable for patients between 18 and 60 who have paralysis in all limbs due to spinal cord injuries but still have some residual function in their arms.

NEO beat several other BCIs to approval, including one from Neuralink, a California-based company founded by Elon Musk. Since October 2023, Neuracle has conducted 36 clinical trials using NEO, including the one on Dong. Thirty-two of them took place within a few months in 2025, and details of one of the first four in-person trials were published in a preprint paper last July. Neuracle did not reply to a request for comment from MIT Technology Review.

One reason for NEO’s fast approval could be that it has a “relatively less invasive” design than counterparts such as Neuralink’s N1 brain chip, says Avinash Singh, a BCI researcher at the University of Technology Sydney. NEO’s eight sensors sit on top of the brain’s protective membrane, while Neuralink’s N1 chip directly penetrates the cortex, the outermost layer of the brain itself. Neuracle’s device faces fewer regulatory constraints because it presents a lower risk of hemorrhage, glial scarring, and long-term signal degradation, Singh says.
China’s strong support for its BCI industry also means that NEO was put on an expedited regulatory pathway; by comparison, the approval process of the US Food and Drug Administration can take several years, Singh adds.

A big boost for BCIs

NEO’s approval is hugely important for the global BCI industry, says Wang Shouyan, a neuroscientist at Fudan University in Shanghai who was not involved in research or trialing for NEO. Even though research and development on BCIs has taken place for several decades, most of it happened in the lab. The news means that BCIs are now ready for large-scale manufacturing and clinical use in China, Wang says.

For Dong, however, it means something much more personal. “Now, it will be able to help not only me, but also thousands and thousands of other patients suffering from spinal cord injuries in China who are tortured by despair each day,” he says of NEO. “It will bring them hope and change their lives.”

Days after NEO was approved, China started incorporating it into the country’s health insurance system by assigning it a unique code. This is one of the first steps toward a future where eligible Chinese patients pay only a certain percentage of the BCI’s price if they need it during their treatment.

The growth of China’s BCI industry is expected to accelerate thanks to the government’s policy support and financial backing. The country’s latest five-year plan, published on the same day Neuracle received its approval, lists BCI as one of six key industries important to China’s future tech competitiveness, alongside quantum technology, humanoid robots, and others. Several Chinese startups, including NeuroXess and StairMed, have already worked in the field for many years. “China’s decision to double down on becoming a global leader in the field owes in part to what these companies have already accomplished,” says Meicen Sun, an information scientist at the University of Illinois Urbana-Champaign who studies information and technology policy.
But, Sun says, the biggest advantage China may have is that Chinese people, particularly patients like Dong, tend to welcome

AI, Committee, News, Uncategorized

Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch

admin NU / June 1, 2026

The Transformer’s attention mechanism has barely changed since 2017. Most efficiency work has tried to replace softmax attention outright. A new paper takes a different route: it keeps softmax attention and bolts on a correction branch. A team of researchers from Northwestern University, Tilde Research, and the University of Washington introduces a parameterized Local Linear Attention called ‘Parallax’ that scales to LLM pretraining and is codesigned with the Muon optimizer. Parallax does not chase efficiency by cutting compute. It adds compute deliberately, then makes that compute cheaper to run on modern GPUs.

What is Parallax

Parallax builds on Local Linear Attention (LLA), which comes from the test-time regression framework. That framework reads attention as a regression solver over key-value pairs: keys are training data points, values are labels, and the query is the test point. Softmax attention is the Nadaraya-Watson nonparametric estimator, which fits a local constant function for each query. LLA upgrades that local constant estimate to a local linear estimate, and the research team proves this yields a strictly smaller integrated mean squared error. The benefit is a better bias-variance tradeoff for associative memory.

But LLA has a problem at scale. Its exact forward pass requires solving a linear system for every query, which it does with a parallel conjugate gradient (CG) solver. The CG solver creates three issues: intensive I/O, a hard regularization-expressiveness tradeoff, and incompatibility with low-precision arithmetic.

Parallax removes the solver. Instead, it learns an extra projection matrix, which the research team writes as ρ_i = W_R x_i. Here W_R is a learnable matrix that probes the KV covariance directly from the layer input x_i. So Parallax keeps the local linear principle; it just replaces the per-query solve with a learned, query-like projector. That makes it simpler, more efficient, and easier to implement.
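To make the local-constant versus local-linear distinction concrete, here is a minimal pure-Python sketch for a single query. It is not the paper's code. It assumes that the "KV covariance" is the softmax-weighted cross-covariance between values and keys, so the output takes the form the paper describes (softmax output minus that covariance applied to a probe ρ), and it omits LLA's regularization and boundary amplification factor. For one-dimensional keys, textbook kernel-weighted local linear regression gives the exact probe ρ* = (k̄ − q) / Var(k); Parallax would instead predict ρ from the layer input with a learned W_R. All function names are hypothetical.

```python
import math


def softmax_weights(q, keys):
    """Softmax attention weights: p_j is proportional to exp(<q, k_j>)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean(weights, vectors):
    """Weighted average of equal-length vectors (the Nadaraya-Watson estimate)."""
    return [sum(w * v[d] for w, v in zip(weights, vectors))
            for d in range(len(vectors[0]))]


def kv_cross_covariance(weights, keys, values):
    """C[a][b] = sum_j p_j * (v_j[a] - vbar[a]) * (k_j[b] - kbar[b])."""
    kbar = weighted_mean(weights, keys)
    vbar = weighted_mean(weights, values)
    return [[sum(w * (v[a] - vbar[a]) * (k[b] - kbar[b])
                 for w, k, v in zip(weights, keys, values))
             for b in range(len(kbar))]
            for a in range(len(vbar))]


def parallax_style_output(q, rho, keys, values):
    """Softmax output minus the KV covariance applied to the probe rho."""
    p = softmax_weights(q, keys)
    base = weighted_mean(p, values)
    cov = kv_cross_covariance(p, keys, values)
    correction = [sum(c * r for c, r in zip(row, rho)) for row in cov]
    return [b - c for b, c in zip(base, correction)]


def exact_local_linear_1d(q, keys, values):
    """Exact local linear estimate for 1-D keys, using the probe (kbar - q) / Var(k).

    With d-dimensional keys this probe needs a per-query linear solve,
    which is the kind of step LLA performs with conjugate gradient.
    """
    p = softmax_weights(q, keys)
    kbar = weighted_mean(p, keys)[0]
    var_k = sum(w * (k[0] - kbar) ** 2 for w, k in zip(p, keys))
    return parallax_style_output(q, [(kbar - q[0]) / var_k], keys, values)


def learned_probe(W_R, x):
    """Hypothetical Parallax-style probe rho = W_R @ x (W_R would be trained)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_R]


# Exactly linear data v = 2k + 1, queried at the edge of the key range (q = 3).
keys = [[0.0], [1.0], [2.0], [3.0]]
values = [[2 * k[0] + 1] for k in keys]
q = [3.0]
softmax_out = weighted_mean(softmax_weights(q, keys), values)[0]   # biased below 7
local_linear = exact_local_linear_1d(q, keys, values)[0]           # recovers 7
zero_probe = learned_probe([[0.0, 0.0]], [0.3, -1.2])              # W_R = 0
print(softmax_out, local_linear, parallax_style_output(q, zero_probe, keys, values)[0])
```

The example shows why the linear correction matters near the boundary of the data: the local-constant (softmax) estimate is pulled toward the interior keys, while the local linear estimate recovers the true value on linear data. It also shows the degenerate case the paper highlights: with W_R = 0 the probe vanishes and the output is exactly the softmax output.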
How the Mechanism Works

Parallax reformulates LLA as softmax attention plus an additive correction. The output equals the softmax attention output minus a projected covariance term; in the paper’s notation, that term is the KV covariance multiplied by the learned probe ρ_i. The research team also drops one piece of LLA, the boundary amplification factor, by setting it to zero. This is necessary for stability: once the probe is parametric, the original geometric interpretation breaks, and leaving the factor in could cause the scaling to diverge or flip sign.

Parallax sits inside a family of attention mechanisms, which the paper organizes along three axes: the bandwidth, the probe construction, and the affine structure. At one extreme, Parallax degenerates exactly to softmax attention when the probe norm goes to zero: setting W_R = 0 makes a Parallax layer behave identically to softmax attention. Because of this, a pretrained Transformer checkpoint can be converted by adding W_R and fine-tuning.

The Hardware Argument

Parallax inherits the streaming structure of FlashAttention. It adds one covariance branch that reuses the same key-value stream. The research team expands the forward pass into two parallel scoring branches, which share the online maximum, the rescaling factor, and the K and V tiles. As a result, Parallax needs no extra I/O per iteration.

The key property is higher arithmetic intensity (AI), the ratio of floating-point operations to high-bandwidth memory traffic. In the regime where KV work dominates, Parallax roughly doubles the arithmetic intensity: it adds compute while reusing the same memory stream. This shifts attention toward a more compute-bound regime, which is exactly where kernel optimization pays off on modern hardware.

The research team prototyped a decode kernel in CuTeDSL on NVIDIA Hopper GPUs. Hopper’s tensor core matmul instructions operate on tiles of at least 64 rows. A decode step supplies only one query row.
That leaves most of each tile unused, so the QK and RK products can be computed jointly, within instructions standard attention already issues. The team profiled the kernel against FlashAttention 2 and 3 on H200 GPUs at BF16 precision, sweeping batch sizes from 1 to 2,048 and context lengths from 128 to 32,768. The prototype kernel matches or outperforms FlashAttention across all configurations. The paper’s benchmark figure annotates speedups of 1.54× in the compute-matched setting and 1.14× in the I/O-matched setting (https://arxiv.org/pdf/2605.29157).

What the Experiments Show

The research team validated Parallax on synthetic tasks and on LLM pretraining at the 0.6B and 1.7B scales. Models used the Qwen-3 architecture in the torchtitan repository and were trained on the Ultra-FineWeb dataset with a context length of 4,096 tokens. Baselines included softmax attention (Transformer), Mamba, Gated DeltaNet, MesaNet, and Kimi DeltaAttention.

On the MAD-Benchmark, Parallax attained the highest overall average accuracy, 0.716. It consistently improved recall-oriented tasks like In-Context-Recall and Selective-Copying, and stayed competitive on compression and memorization tasks. On language modeling, Parallax with Muon achieved the best perplexity at both scales. It also posted the highest average downstream accuracy: at 1.7B, Parallax scored 62.45 on average against the Transformer’s 61.43.

Two controls test where the gain comes from. A parameter-matched Transformer closed only a small fraction of the gap, and a compute-matched Parallax still beat both baselines. The paper argues this points to the mechanism itself, not to extra parameters or compute.

The Optimizer Twist

A core finding is an optimizer-architecture interaction. Parallax shows a large advantage under Muon; under AdamW, the advantage shrinks markedly or even disappears. Muon is a recent optimizer for matrix parameters in hidden layers. It uses the polar factor of the momentum buffer, so updates have a condition number of exactly one. Prior work shows this produces better-conditioned weight matrices.
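The "polar factor" idea can be illustrated with a short pure-Python sketch. It orthogonalizes a matrix with the classic cubic Newton-Schulz iteration X ← 1.5X − 0.5XXᵀX, which maps every singular value s to 1.5s − 0.5s³ and drives them all to 1, so the result has condition number one. This is a stand-in under stated assumptions: reference Muon implementations use a tuned quintic Newton-Schulz variant that only approximately orthogonalizes, and the hypothetical `muon_like_step` omits Nesterov momentum and any learning-rate scaling. The `stable_rank` helper uses the standard definition ‖W‖_F² / ‖W‖₂², which we assume matches the stable rank the paper tracks for W_R.

```python
import math


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frobenius_sq(A):
    return sum(x * x for row in A for x in row)


def polar_factor(G, steps=40):
    """Approximate U V^T for G = U S V^T with cubic Newton-Schulz iterations."""
    norm = math.sqrt(frobenius_sq(G))
    X = [[x / norm for x in row] for row in G]  # singular values now in (0, 1]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        # Each singular value s maps to 1.5 s - 0.5 s^3, which converges to 1.
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


def muon_like_step(W, momentum, grad, lr=0.02, beta=0.95):
    """Hypothetical Muon-style update: accumulate momentum, step along its polar factor."""
    momentum = [[beta * m + g for m, g in zip(rm, rg)] for rm, rg in zip(momentum, grad)]
    update = polar_factor(momentum)
    W = [[w - lr * u for w, u in zip(rw, ru)] for rw, ru in zip(W, update)]
    return W, momentum


def spectral_norm_sq(W, iters=200):
    """Largest eigenvalue of W^T W (= ||W||_2^2) via power iteration.

    Assumes the all-ones start vector is not orthogonal to the top eigenvector.
    """
    WtW = matmul(transpose(W), W)
    v = [1.0 / math.sqrt(len(WtW))] * len(WtW)
    lam = 0.0
    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, v)) for row in WtW]
        lam = math.sqrt(sum(x * x for x in u))
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in u]
    return lam


def stable_rank(W):
    """||W||_F^2 / ||W||_2^2: about 1 for a collapsed matrix, up to min(m, n) otherwise."""
    return frobenius_sq(W) / spectral_norm_sq(W)


G = [[1.0, 2.0], [3.0, 4.0]]
Q = polar_factor(G)
print(matmul(Q, transpose(Q)))         # close to the 2x2 identity
print(stable_rank(G), stable_rank(Q))  # G is dominated by one direction; Q is close to 2
```

The design choice worth noting is that the orthogonalized update has every singular value equal to 1, so no single direction dominates the step; a low stable rank is the opposite situation, where one direction carries almost all of a matrix's energy.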
The paper traces the gap to the correction branch. It defines a correction-to-output ratio (COR). Under Muon, COR exceeds 8 in the deepest layers; under AdamW, it stays below 4. The W_R projection is disproportionately affected: its stable rank collapses under AdamW but stays high under Muon. A gating experiment confirms the pattern: under AdamW, the model learns to suppress the correction branch rather than use it. The research team calls this the first empirical demonstration of strong architecture-optimizer codesign for attention mechanisms. They do not claim that Muon with a WSD (warmup-stable-decay) learning-rate schedule is the optimal recipe; an appendix ablation shows the advantage shrinks during the decay phase.

How the Scores Differ

Parallax also produces different

AI, Committee, News, Uncategorized

The Roadmap for Mastering LLMOps in 2026

admin NU / June 1, 2026

The LLMOps market is projected to grow from

AI, Committee, News, Uncategorized

The Download: China’s brain implant ambitions

admin NU / June 1, 2026

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

China has approved the world’s first invasive brain-computer chip—here’s what’s next

Sitting in the courtyard of his house in China’s Henan province last October, Dong Hui decided to try holding a pen. Six years after a car accident left him paralyzed from the neck down, he slowly wrote his name, “Thank you,” and the date. The breakthrough was made possible by a brain implant called NEO. In March, it became the world’s first invasive brain-computer interface approved for use beyond clinical trials. The approval is expected to accelerate China’s push to become a global leader in brain implants. Read the full story on how China reached this milestone—and what it means for the future of brain-computer interfaces.

—You Xiaoying

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Nvidia is launching its first AI chip for personal computers
The RTX Spark will power laptops from Dell, HP, Microsoft, and others. (BBC)
+ They’re being designed specifically to run AI agents. (WSJ $)
+ The first devices are set to launch on Windows PCs in the fall. (CNBC)
+ The move marks a challenge to Apple and Intel. (FT $)

2 The US is stopping exports of AI chips to Chinese firms abroad
It’s closed a loophole allowing exports to Chinese subsidiaries. (Reuters $)
+ Which may have enabled unlicensed access to Nvidia chips. (Al Jazeera)
+ Export curbs have led China to redesign its chip industry. (MIT Technology Review)

3 Surgeons have transplanted pig liver and kidneys into a living person
The clinically dead recipient’s organs worked for almost five days. (Nature)
+ Pig organs could ease transplant shortages. (Guardian)
+ Putin says organ transplants could grant immortality. (MIT Technology Review)

4 The US, Australia, and UK will defend seabed cables with underwater drones
They’re developing the vehicles via the trilateral AUKUS defense pact. (CNN)
+ Undersea internet cables face growing threats. (BBC)

5 A new study has revealed chatbots’ manipulative ‘dark patterns’
It found they prey on emotions to encourage harmful behavior. (404 Media)
+ They can also sway voters better than political ads. (MIT Technology Review)

6 Apple plans to disrupt the traditional glasses market
Its smart glasses target the broader spectacles industry. (Bloomberg $)
+ Smart glasses are also gaining traction in warfare. (MIT Technology Review)

7 AI super PACs are dueling over the midterms
Split between Anthropic and OpenAI, they’re fighting to shape AI regulation. (NYT $)

8 SoftBank has overtaken Toyota as Japan’s most valuable company
The AI boom pushed SoftBank’s market value above $305 billion. (Bloomberg $)

9 A botnet of more than 17 million devices has been dismantled in Europe
Dutch authorities linked the network to a Russian proxy service. (Ars Technica)

10 Tech leaders are uniting around a transhuman vision for AI
They’re working toward a post-human agenda. (Guardian)

Quote of the day

“It’s just been shoved down their throats in secrecy. And that makes them upset.”

—Legendary environmental activist Erin Brockovich tells “The Jim Acosta Show” why citizens are angry about data centers expanding into their communities.

One More Thing

(Photo: Mike Belleme)

What happens when you donate your body to science

Rebecca George doesn’t mind the vultures. At Western Carolina University’s body farm, forensic anthropologists monitor donors—sometimes for years—as they become nothing but bones. Around 20,000 people donate their cadavers to scientific research and education each year. At anatomy labs and body farms, they help train doctors, advance research, and teach scientists more about the human body long after death. But what actually happens after a body is donated? Read the full story to find out.

—A.W. Ohlheiser

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ This map of moments turns the planet into a shared diary.
+ Let editors curate your ideal podcast moments with this app.
+ Architecture lovers will enjoy this encyclopedia of famous buildings.
+ Get in touch with your emotions through this map exploring more than 100 feelings.

AI, Committee, News, Uncategorized

Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent

admin NU / June 1, 2026

Hermes Agent already remembers across sessions. The open-source agent from Nous Research ships with curated memory files and full-text session search. But a new community project argues that built-in memory is too shallow for serious work. A new library named ‘Memory OS’ has been released under an MIT license by a developer (ClaudioDrews). It stacks six memory layers onto Hermes, adding a vector database, structured facts, and an auto-curated knowledge wiki. The project is new, but it shows promise, and its architecture illustrates how agent memory can be layered.

Memory OS

Memory OS is not a Hermes plugin you toggle on. It is a layered system that sits beside Hermes Agent’s own memory. Hermes already provides workspace files and a session database; Memory OS keeps those and adds four more layers above them. The full stack runs locally using Docker, Qdrant, Redis, and Python 3.11+. It works with any LLM provider Hermes supports, including OpenRouter, OpenAI, Anthropic, and Ollama. The README frames it as a “memory operating system,” not a single feature.

The Six Layers, From Files to Vectors

Layer 1 is Workspace. It holds MEMORY.md, USER.md, and CREATIVE.md, which are injected into the system prompt each turn.

Layer 2 is Sessions. It uses state.db, a SQLite database with FTS5 full-text search across conversation history.

Layer 3 is Structured Facts. It stores durable facts in memory_store.db, using SQLite, HRR, FTS5, and trust scoring. A feedback loop adjusts those trust scores over time, alongside entity resolution.

Layer 4 is Fabric, a heavily forked version of the Icarus Plugin. The fork adds LLM-powered session extraction on top of the upstream esaradev/icarus-plugin. It handles cross-session recall through 16 tools, including fabric_recall, fabric_write, and fabric_brief.

Layer 5 is the Vector Database, built on Qdrant. It uses 4096-dimensional dense vectors compared by cosine similarity, plus BM25 sparse search, a keyword-style ranking method.
Layer 6 is an LLM Wiki, an auto-curated vault of concepts, entities, and comparisons. That wiki is continuously ingested back into Qdrant through a process called wiki-continuous-ingest.

How the Retrieval Flow Works

The flow centers on when memory is read and when it is written. On pre_llm_call, Memory OS runs what it calls surgical recall. It pulls from four sources at once: Fabric, Qdrant, Sessions, and Facts. Each source is gated by a relevance threshold before anything reaches the model. Per-session deduplication stops the same context from appearing twice, and a social-closer filter skips trivial messages, such as a plain “thanks.” On post_llm_call and on_session_end, the system automatically extracts and captures new learnings. The stated goal is token efficiency, not stuffing the context window.

The Fallback Cascade and Cleanup

Layer 5’s retrieval uses a four-level fallback: it tries hybrid search first, then dense vectors, then lexical search, then SQLite. If one method fails or returns nothing, the next takes over. This design keeps recall working even when the vector database struggles. Memory OS also runs a weekly decay scanner to age out stale entries, and semantic deduplication merges near-identical memories when their cosine similarity exceeds 0.92. These housekeeping steps aim to keep memory from bloating over months of use.

Local-First, and Deliberately So

Memory OS positions itself against cloud memory services like mem0, Zep, and Letta. Its pitch is that memory infrastructure should run on your own machine: memory data stays local, and there is no memory subscription. LLM calls still go to whichever provider you choose. Hermes itself already supports eight external memory providers, including mem0 and Honcho, but Memory OS is not one of those official providers. It is a separate, community-built stack layered on Hermes directly. For teams with data-residency rules, a local memory store can matter.
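The read path described above (per-source relevance gates, per-session deduplication, and the social-closer filter), the four-level fallback cascade, and the 0.92 cosine deduplication rule can be sketched in plain Python. This is a hypothetical illustration, not Memory OS’s implementation: the function names, the closer word list, the per-source thresholds, and the keep-first deduplication policy (a simplification of the project’s merge step) are all assumptions.

```python
import math
from typing import Callable

Hit = tuple[str, float]  # (memory text, relevance score); assumed shape
Search = Callable[[str], list[Hit]]

# Hypothetical closer list; the real filter's rules are not documented here.
SOCIAL_CLOSERS = {"thanks", "thank you", "ok", "cool", "got it"}


def surgical_recall(message: str, sources: dict[str, Search],
                    thresholds: dict[str, float], seen: set[str]) -> list[str]:
    """Gated, deduplicated recall run before an LLM call (pre_llm_call-style)."""
    if message.strip().lower().rstrip("!.") in SOCIAL_CLOSERS:
        return []  # trivial message: inject no memory at all
    context = []
    for name, search in sources.items():
        for text, score in search(message):
            # Per-source relevance gate, then per-session deduplication.
            if score >= thresholds[name] and text not in seen:
                seen.add(text)
                context.append(f"[{name}] {text}")
    return context


def cascade(query: str, retrievers: list[tuple[str, Callable[[str], list]]]):
    """Try retrievers in order (e.g. hybrid, dense, lexical, SQLite).

    An exception or an empty result falls through to the next level.
    """
    for name, retrieve in retrievers:
        try:
            hits = retrieve(query)
        except Exception:
            continue
        if hits:
            return name, hits
    return None, []


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_dedup(memories: list[tuple[str, list[float]]],
                   threshold: float = 0.92) -> list[tuple[str, list[float]]]:
    """Keep a memory only if no already-kept memory exceeds the cosine threshold."""
    kept: list[tuple[str, list[float]]] = []
    for text, vec in memories:
        if all(cosine(vec, other) <= threshold for _, other in kept):
            kept.append((text, vec))
    return kept


# Usage with stub sources standing in for Fabric, Qdrant, Sessions, and Facts.
seen: set[str] = set()
sources = {
    "facts": lambda q: [("User prefers Python 3.11", 0.81)],
    "sessions": lambda q: [("Discussed Qdrant setup last week", 0.40)],
}
thresholds = {"facts": 0.70, "sessions": 0.50}
print(surgical_recall("Which Python should I use?", sources, thresholds, seen))
print(surgical_recall("Thanks!", sources, thresholds, seen))
```

Passing the same `seen` set for every call in a session is what makes the deduplication per-session: a fact injected once is not injected again until a new set is created.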
Claudio Drews (@ClaudioDrews25) announced the release on May 31, 2026, describing Memory OS as “a complete hierarchical persistent memory architecture for the Hermes Agent” with “6 layers, fully local,” including structured facts with trust scoring and a feedback loop, hybrid vector search (Qdrant + BM25), and a self-curating LLM Wiki.

Strengths and Limitations

Strengths:
- Clear layered design separating files, sessions, facts, vectors, and a wiki
- Fully local infrastructure with no cloud memory subscription
- Provider-agnostic, matching Hermes Agent’s own flexibility
- Token-efficient retrieval by design, via gated sources and per-session deduplication

Limitations:
- Brand new, with few commits
- A forked Icarus Plugin that the author says is not upstream-compatible
- Heavier setup: Docker, Qdrant, Redis, and an ARQ Worker are all required
- No published benchmarks on recall quality, latency, or token savings

Key Takeaways
- Memory OS is a community-built, MIT-licensed stack that organizes Hermes Agent’s memory into six layers, four of them new.
- It combines workspace files, FTS5 session search, trust-scored facts, a forked Icarus fabric, Qdrant vectors, and an auto-curated LLM wiki.
- Retrieval runs on pre_llm_call with gated, deduplicated recall from four sources; capture runs on post_llm_call and on_session_end.
- Memory infrastructure is fully local and provider-agnostic, but LLM calls still go to your chosen provider.

The post Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent appeared first on MarkTechPost.


AI, Committee, News, Uncategorized

Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughput Gain

admin NU / May 31, 2026

Trajectory’s concurrent multi-LoRA stack reports a 2.81× experiment-throughput gain over single-tenant RL, with all code in the NovaSky-AI/SkyRL GitHub repository.

Most language models improve in discontinuous jumps: a team collects data, trains, and ships a new version. Each cycle takes months, and each release can shift behavior abruptly, with remarkable or catastrophic results for users. Trajectory wants to replace that cycle with continual learning.

The Trajectory team published a field report describing how. It built a concurrent, multi-LoRA training platform for continuously learning workloads, in collaboration with UC Berkeley Sky Lab and Anyscale. All training code is open-sourced in the NovaSky-AI/SkyRL repository. The result is a 2.81× end-to-end improvement in experiment throughput compared with a single-tenant training framework, and Trajectory reports no regression on any training reward.

What Multi-LoRA Training Actually Is

Continual learning requires models to update from live feedback and production interactions. A coding agent could learn engineering patterns as developers correct its work; a support agent could improve on hard tickets as operators intervene on difficult cases. Most training infrastructure, however, still assumes a linear lifecycle: teams allocate GPUs, initialize the model, run a job, then spin down. Continual learning changes that relationship. When production interactions become training inputs, training becomes part of a live system.

Modern RL training reduces to three core primitives:
- The Sampler generates trajectories from the current policy model.
- The Trainer computes gradients and updates the policy weights.
- Parameter synchronization broadcasts updated weights back to inference workers.

Trajectory calls its approach Continuous Multi-LoRA Training, or C-LoRA. Each experiment maps to a dedicated LoRA adapter on a warm, multi-tenant engine.
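To make “one adapter per experiment on a shared engine” concrete, the sketch below computes a batch in which every row is routed to a different experiment’s LoRA adapter: all rows share the frozen base weight W, and each adds its own low-rank update. It is a plain-Python reference computation with tiny hypothetical matrices and names, not SkyRL or vLLM code; fused GPU kernels such as the SGMV kernel discussed in the architecture section perform this per-segment work in a single launch.

```python
from collections import defaultdict

Matrix = list[list[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Naive product of an n x k matrix x and a k x m matrix w."""
    return [
        [sum(row[t] * w[t][j] for t in range(len(w))) for j in range(len(w[0]))]
        for row in x
    ]


def multi_lora_forward(x: Matrix, base_w: Matrix,
                       adapters: dict[str, tuple[Matrix, Matrix]],
                       adapter_ids: list[str], scale: float = 1.0) -> Matrix:
    """Compute y_i = x_i W + scale * (x_i A_k) B_k with k = adapter_ids[i].

    W is the frozen base weight shared by every row; each experiment owns a
    low-rank pair (A_k: d_in x r, B_k: r x d_out). Rows are grouped into
    per-adapter segments so each adapter's update is one matmul over its rows.
    """
    out = matmul(x, base_w)  # shared base projection for the whole batch
    segments: dict[str, list[int]] = defaultdict(list)
    for i, k in enumerate(adapter_ids):
        segments[k].append(i)
    for k, rows in segments.items():
        a, b = adapters[k]
        delta = matmul(matmul([x[i] for i in rows], a), b)
        for i, d in zip(rows, delta):
            out[i] = [y + scale * dy for y, dy in zip(out[i], d)]
    return out


# Two hypothetical experiments with rank-1 adapters on a 2x2 identity base.
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "exp-a": ([[1.0], [0.0]], [[0.0, 1.0]]),
    "exp-b": ([[0.0], [1.0]], [[1.0, 0.0]]),
}
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(multi_lora_forward(batch, base, adapters, ["exp-a", "exp-b", "exp-a"]))
# [[1.0, 3.0], [7.0, 4.0], [5.0, 11.0]]
```

The base projection is computed once for the whole mixed batch, which is why many tenants can share one warm engine; only the small per-adapter terms differ by row.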
The Problems It Targets

The Trajectory team identifies four inefficiencies in traditional stacks:
(1) Cold starts are slow. Every serial job reloads checkpoints, initializes the distributed runtime, and warms up inference engines. For large models, this step alone can exceed 30 minutes per run.
(2) RL is memory intensive. Frontier models often exceed 100B parameters, and Qwen3.5-397B can require up to eight H200 nodes just to fit into memory. LoRA cuts memory usage by an order of magnitude because it freezes the base model and trains only small adapter weights.
(3) Traditional stacks are single-tenant. They run one experiment at a time. Multi-LoRA maps each experiment to its own adapter, multiplexing throughput by a factor of N for N concurrent experiments.
(4) Job utilization is low. Trainers and inference engines stall while waiting for each other. Multi-LoRA load-balances across jobs to fill idle capacity.

Inside the Architecture

Most throughput wins come from inference. In vLLM, all adapters stay hot-loaded in GPU memory, so a single decode step can mix tokens from different adapters in the same batch. The key enabler is the SGMV (segmented gather matrix-vector multiplication) decode kernel, which fuses per-adapter matrix-vector work into one GPU launch per decode step. After each optimization step, updated LoRA weights are loaded in place into the inference engine. The scheduler does not pause, so other tenants keep decoding.

Training works differently. Only one LoRA adapter trains on the GPU at a time; the rest sit in pinned CPU memory. Each tenant’s state lives in an AdapterStore that holds its LoRA parameters, FP32 master weights, optimizer moments, and gradient buffers. The engine swaps one tenant’s state onto the GPU, runs a single forward_backward pass, then swaps it back out. This training path is still single-adapter, so the inference concurrency gains do not yet apply to training.

The Numbers

Trajectory tested on a single H200 node with Qwen3-4B-Instruct-2507, running synchronous RL on GSM8K in an agentic setting. The Trajectory team reframed GSM8K as a tool-use learning task: the model decides when to call a Calculator tool and a Final Answer tool, and the reward is 1.0 only when Final Answer is called with the correct answer. The policy starts near 40% accuracy at step 0 and, with the right learning algorithm, climbs past 90% by step 9.

The Trajectory team scaled to eight concurrent multi-LoRA runs. Final Experiment Time reached 5433 s at N=8, a 2.81× speedup: eight concurrent experiments finished before three serial runs could finish back-to-back. Mean Experiment Time also improved, peaking at N=4 with a 1.88× speedup. Every concurrency level reached reward_accuracy above 90% by step 9.

The Tradeoffs

Higher throughput comes at the cost of per-step latency. As N grows, First Experiment Time and Step Time degrade. At N=8, the first experiment of the serial baseline finishes 1.97× sooner than the first concurrent one. Mean step time rises from 191 s to 500 s, only 2.62× slower despite eight times the load. Most of that increase is rollout time, which grows from 162 s to 401 s, roughly 77% of the increase. At N=2, doubling the load adds only 15% to rollout time, which is the ideal case for multi-LoRA.

The pattern held on a harder workload. On τ-bench retail with the NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 MoE model, N=2 finished 10 steps 1.28× faster, while per-tenant step time rose 1.57×.

Strengths and Weaknesses

Strengths:
- 2.81× end-to-end experiment-throughput gain at eight concurrent runs
- No accuracy regression; runs tracked the serial baseline within ±1σ in the final steps
- LoRA cuts memory by an order of magnitude versus full fine-tuning
- Fully open-sourced in NovaSky-AI/SkyRL for the community to build on

Weaknesses:
- Per-step latency and First Experiment Time degrade as N grows
- Training remains serialized across tenants; only inference is multiplexed
- Tested mainly on mid-sized models, not at frontier-scale parameter counts
- Setup requires an 8× H100/H200 node and a Megatron build

Key Takeaways
- Trajectory built a concurrent, multi-LoRA RL training stack for continual learning, open-sourced in NovaSky-AI/SkyRL.
- It reports a 2.81× end-to-end experiment-throughput gain over a single-tenant baseline, with no reward regression.
- Each experiment maps to a dedicated LoRA adapter on an always-hot engine, multiplexing throughput by N.
- Most gains come from vLLM multi-LoRA inference via the SGMV decode kernel; training stays single-adapter.
- The tradeoff is per-step latency: at N=8, step time rises from 191 s to 500 s.

Field report: Continuous Multi-LoRA Training for Continual Learning, by Trajectory with UC Berkeley Sky Lab and Anyscale (May 27, 2026).
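The headline figures for the GSM8K run can be cross-checked with simple arithmetic. The snippet uses only the reported numbers; the implied serial baseline is derived from the 2.81× speedup rather than reported, and the ten-steps-per-run figure is an assumption based on the step 0 to step 9 accuracy curve.

```python
# Reported figures (seconds); derived values are marked as such.
step_serial, step_n8 = 191, 500        # mean step time at N=1 and N=8
rollout_serial, rollout_n8 = 162, 401  # rollout portion of each step
final_n8, speedup_n8 = 5433, 2.81      # Final Experiment Time at N=8 and its speedup
steps_per_run = 10                     # assumption: steps 0 through 9

step_slowdown = step_n8 / step_serial
rollout_share = (rollout_n8 - rollout_serial) / (step_n8 - step_serial)
implied_serial_total = final_n8 * speedup_n8  # derived: eight runs back-to-back
implied_run_time = implied_serial_total / 8   # derived: one serial run

print(f"step slowdown: {step_slowdown:.2f}x")                 # 2.62x
print(f"rollout share of the increase: {rollout_share:.0%}")  # 77%
print(f"implied serial total: {implied_serial_total:,.0f} s")  # 15,267 s
print(f"implied serial run: {implied_run_time:,.0f} s, "
      f"about {implied_run_time / steps_per_run:.0f} s per step")  # 1,908 s, about 191 s per step
print("N=8 finishes before 3 serial runs:", final_n8 < 3 * implied_run_time)  # True
```

The implied per-step time of a serial run lands at about 191 s, matching the reported serial step time, so the derived baseline is consistent with the other published numbers.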


AI, Committee, News, Uncategorized

Best Text-to-Speech TTS Models in 2026: A Benchmark-Based Comparison

admin NU / May 31, 2026

Text-to-speech (TTS) moved fast over the past year. The line between synthetic and human speech narrowed, latency dropped below 100 milliseconds for some real-time systems, and emotional control became a standard feature rather than a research demo. This guide reviews the models that matter most in 2026 and is written for AI professionals choosing a model for production.

How to read TTS benchmarks in 2026

Two benchmarks dominate most community discussions. The first is the Artificial Analysis Speech Arena Leaderboard, which ranks models by blind human preference using an Elo rating; as of 2026 it evaluates dozens of production APIs. The second is the community-run TTS Arena on Hugging Face, which uses the same blind A/B voting method.

These leaderboards measure perceived quality, not accuracy, and they change continuously. As of May 30, 2026, the Artificial Analysis Speech Arena lists Gemini 3.1 Flash TTS, Realtime TTS-2 (Research Preview), Sonic 3.5, Realtime TTS 1.5 Max, and Fun-Realtime-TTS-Preview as its top five by Elo. Those positions shifted within the prior weeks, and they will shift again. Treat any single number as a point-in-time reading, not a fixed truth.

Accuracy needs separate measurement. Trelis Research tested ten models using round-trip character error rate (CER): the method transcribes generated audio with an ASR model, then compares the transcript to the input text. Mean opinion score (MOS) captures perceived naturalness. Both metrics have limits. Round-trip CER depends on the ASR model’s own accuracy, and the UTMOS quality estimator was trained on clips of up to ten seconds, so longer samples show less score spread.

Latency is the third axis. The relevant figure for voice agents is time-to-first-audio (TTFA). Time-to-first-byte (TTFB) can be misleading, since the first bytes may be container headers that carry no audio. Consistency matters as much as the median: a Gradium benchmark from May 2026 measured the interquartile range of latency across providers.
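The measurement ideas above can be sketched with the Python standard library: round-trip CER as character edit distance over reference length, a single Elo update from one blind A/B vote, and latency percentiles that expose the tail. This is an illustrative sketch, not the methodology of Artificial Analysis, Trelis Research, or Gradium. The text normalization, the K-factor of 32, and the inclusive quantile method are assumptions, and production leaderboards may fit ratings over all votes at once rather than updating them one vote at a time.

```python
import statistics


def levenshtein(a: str, b: str) -> int:
    """Character edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def round_trip_cer(input_text: str, asr_transcript: str) -> float:
    """Edit distance between TTS input and ASR transcript, over input length."""
    # Assumed normalization: lowercase and collapse whitespace.
    ref = " ".join(input_text.lower().split())
    hyp = " ".join(asr_transcript.lower().split())
    return levenshtein(ref, hyp) / max(len(ref), 1)


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One classic Elo update after a blind A/B vote (1 = A wins, 0.5 = tie)."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    return r_a + k * (score_a - expected_a), r_b - k * (score_a - expected_a)


def latency_summary(ttfa_ms: list[float]) -> dict[str, float]:
    """Median, interquartile range, and P90 of time-to-first-audio samples."""
    q1, median, q3 = statistics.quantiles(ttfa_ms, n=4, method="inclusive")
    p90 = statistics.quantiles(ttfa_ms, n=10, method="inclusive")[8]
    return {"median": median, "iqr": q3 - q1, "p90": p90}


print(round_trip_cer("Hello world", "hello wrld"))  # one deletion over 11 characters
print(elo_update(1000, 1000, 1.0))                  # (1016.0, 984.0)
print(latency_summary([100, 110, 120, 130, 400]))   # {'median': 120.0, 'iqr': 20.0, 'p90': 292.0}
```

With five samples and one 400 ms outlier, the median stays at 120 ms and the IQR at 20 ms, while P90 reaches 292 ms.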
Tail latency, not the average, determines user experience at scale.

In short, no benchmark is complete. Quality, accuracy, latency, language coverage, and price all trade off, and the right model depends on which axis your application cannot compromise on.

Commercial leaders

#1 Inworld TTS-1.5 and Realtime TTS-2

Inworld AI is a research lab founded by a team from Google and DeepMind. It released TTS-1.5 on January 21, 2026. The model targets real-time, consumer-scale applications. Inworld reports roughly 30 percent more expressive range than TTS-1 and about 40 percent better stability, measured through word error rate and output consistency.

TTS-1.5 ships in two tiers. The Mini tier is tuned for latency-sensitive workloads such as voice agents and gaming; the Max tier balances higher stability with low latency. Inworld reports P90 time-to-first-audio under 130 milliseconds for Mini and under 250 milliseconds for Max. The model supports 15 languages and offers both instant and professional voice cloning.

Pricing is tiered by plan rather than a single rate. On the On-Demand and Creator plans, Inworld lists $25 per million characters for TTS 1.5 Mini and $35 for Realtime TTS-2 and TTS 1.5 Max. The Developer and Growth plans cut those rates; Growth reaches $15 for Mini and $25 for Max and TTS-2. Enterprise pricing goes as low as $5 for Mini and $10 for Max and TTS-2. Note that TTS 1.5 covers 15 languages, while TTS-2 covers over 100.

Inworld added Realtime TTS-2 later in 2026. It is described as a closed-loop voice model with stronger steering and expressiveness. Across several leaderboard snapshots, Inworld reported holding three of the top five spots on the Artificial Analysis Speech Arena. Inworld suits developers building voice agents at consumer scale; the combination of low latency and aggressive pricing is its main draw.

#2 Google Gemini 3.1 Flash TTS

Google DeepMind released Gemini 3.1 Flash TTS on April 15, 2026. It is a preview model available through the Gemini API, Google AI Studio, Vertex AI, and Google Vids. The model introduces more than 200 audio tags that steer style, tone, pacing, accent, and scene direction. According to Google, the model reached an Elo score of 1,211 on the Artificial Analysis leaderboard. It supports more than 70 languages and native multi-speaker dialogue. Google built it on the Gemini family rather than a standalone speech stack, and the model treats generation as a language task: it decides not only what to say, but how to say it.

The model has documented limitations that matter for deployment. A TTS session has a 32,000-token context window, and Google’s docs state that Gemini TTS does not support streaming. It is built for controlled text recitation, not interactive voice agents; the separate Live API is Google’s real-time path. Output quality can drift on generations longer than a few minutes, so Google recommends chunking. The model offers 30 prebuilt voices, and all generated audio carries a SynthID watermark for AI-content identification.

Gemini 3.1 Flash TTS fits podcast and audiobook generation with fine-grained control. It is a strong default for teams already on Google Cloud.

#3 ElevenLabs v3

ElevenLabs released Eleven v3 in alpha on June 5, 2025, and it reached general availability in early 2026, per the company’s announcement. ElevenLabs describes it as its most expressive model. It introduced inline audio tags written in lowercase inside square brackets, such as [whispers], [laughs], [sighs], and scene cues like [interrupting]. The model supports more than 70 languages.

The GA release refined the alpha. ElevenLabs reports that users preferred the new version about 72 percent of the time, and it improved how the model handles numbers, symbols, and specialized notation.

A key feature is Text to Dialogue, which weaves multiple voices into one generation pass. The model matches prosody and emotional range across speakers and can handle interruptions and shifting moods with limited prompting.

Eleven v3 still requires more prompt engineering than earlier models, and it is not built for real-time use: ElevenLabs states that the larger model and higher-fidelity codec take longer to run. For real-time and conversational use, the company recommends Flash v2.5 instead, which streams with low latency, around 75 milliseconds in vendor figures.

ElevenLabs v3 fits narrative content, audiobooks, and character work where quality outweighs speed. It remains a common…


