YouZum

AI

AI, Committee, News, Uncategorized

Nous Research Releases Hermes Desktop: A Native Cross-Platform Front End for Hermes Agent v0.15.2 with Streaming Tool Output

Nous Research has released Hermes Desktop in public preview. It is a native application for macOS, Windows, and Linux that gives the open-source Hermes Agent a graphical interface. Until now, users ran Hermes through a CLI and messaging gateways. The current build is Hermes Agent v0.15.2. Per Nous Research’s documentation, the desktop reuses the same agent core and shares configuration, API keys, sessions, skills, and memory with the CLI and gateway. The desktop is another surface over one agent, not a fork.

What is Hermes Desktop

Hermes Agent is an autonomous AI agent. It is not a coding copilot tied to an editor. It runs tasks, calls tools, and keeps state across sessions. An agent here means a model that plans, acts, and observes in a loop.

Hermes Desktop is a GUI on top of that same agent core, and it needs no terminal to use. The window shows streaming responses and live tool activity. A right-hand pane previews web pages, files, and tool outputs. It also includes a file browser, voice input and output, and a settings UI.

Sessions are shared across surfaces. A conversation started in the desktop resumes in the CLI or TUI. The reverse also works, because state is not duplicated.

macOS and Windows offer direct installers. Linux installs from the terminal on any distribution. An install script with an --include-desktop flag builds the app against an existing install.

The Closed Learning Loop

The Nous Research team describes Hermes as having a closed learning loop. This is what separates it from a simple chat wrapper. After a complex task, the agent writes a reusable skill. Those skills then self-improve during later use. Memory is persistent and agent-curated, with periodic nudges to save knowledge. Cross-session recall uses FTS5 session search with LLM summarization. User modeling is handled through Honcho’s dialectic user modeling. In practice, longer use means more retained context and reuse. Skills follow the agentskills.io open standard.
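To make the "FTS5 session search" idea concrete, here is a minimal sketch of full-text recall over past session messages using SQLite's FTS5 extension from Python. It is an illustration only, not Hermes's actual schema or code: the table name `session_fts`, its columns, the helper names, and the sample messages are all assumptions, and the LLM summarization step the article mentions is left out. It also assumes the local SQLite build has FTS5 compiled in, which is common but not guaranteed.

```python
import sqlite3

# Hypothetical sample data: (session_id, message text) pairs.
SAMPLE_MESSAGES = [
    ("s1", "Set up nightly backup of the photos folder"),
    ("s2", "Drafted the weekly sales report"),
    ("s3", "Backup failed because the disk was full"),
]


def build_session_index(messages):
    """Create an in-memory FTS5 index over (session_id, text) rows."""
    conn = sqlite3.connect(":memory:")
    # session_id is stored but not tokenized; only `text` is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE session_fts USING fts5(session_id UNINDEXED, text)"
    )
    conn.executemany(
        "INSERT INTO session_fts (session_id, text) VALUES (?, ?)", messages
    )
    return conn


def recall(conn, query, limit=3):
    """Return up to `limit` (session_id, text) rows matching an FTS5 query.

    `query` is passed to MATCH as-is, so it must be valid FTS5 query syntax
    (plain words work; punctuation may need quoting).
    """
    return conn.execute(
        "SELECT session_id, text FROM session_fts "
        "WHERE session_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Calling `recall(build_session_index(SAMPLE_MESSAGES), "backup")` returns the rows from sessions `s1` and `s3`. In a real agent, the matched rows would presumably be handed to a model for summarization before being placed back into context.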
How It Connects, Schedules, and Sandboxes

Hermes runs across messaging platforms from one gateway. The desktop lists Telegram, Discord, Slack, WhatsApp, Signal, Email, and CLI. You can start a task on one platform and continue on another.

Scheduling uses natural language for reports, backups, and briefings. These run unattended through the gateway on a built-in cron scheduler.

Delegation spawns isolated subagents with their own conversations and terminals. A subagent is a separate worker that handles one job. Python RPC scripts collapse multi-step pipelines into zero-context-cost turns.

Execution is sandboxed. The desktop lists five backends: local, Docker, SSH, Singularity, and Modal. It applies container hardening and namespace isolation, which limits what a running process can see or touch.

Built-in tools include web search, browser automation, vision, image generation, text-to-speech, and multi-model reasoning. Hermes also connects external tools through MCP (the Model Context Protocol), a standard for tool integration.

Nous Portal and the Tool Gateway

Hermes works with any provider, and bringing your own API keys is optional: Nous Portal bundles provider access under one subscription instead. Portal tiers are Free, Plus, Super, and Ultra. Paid tiers include monthly credits, access to 300+ models, and built-in tool use.

The Tool Gateway routes several tools through one account. Web search uses Firecrawl, image generation uses FAL, text-to-speech uses OpenAI, and the cloud browser uses Browser Use.

Nous Research posted:

“The next evolution of Hermes Agent is here! Introducing Hermes Desktop: everything you love about Hermes, now native on your machine. First demoed in Jensen’s GTC keynote, it’s now in public preview.” pic.twitter.com/8ND1k8hyaz
Nous Research (@NousResearch), June 2, 2026

Strengths and Questions

Strengths:
- Native installers remove the terminal requirement for most users
- Streaming output and previews make tool calls easier to inspect
- Persistent memory and self-improving skills reduce repeated instructions
- Model-agnostic design avoids lock-in to a single provider
- The MIT license allows audit, self-hosting, and modification

Questions:
- The product is in public preview, so expect rough edges
- Autonomous memory and scheduling raise oversight and review questions
- The Linux desktop still installs through the terminal
- Broad capability means a steeper learning curve for beginners

Key Takeaways

- Nous Research released Hermes Desktop in public preview, a native macOS, Windows, and Linux app for its open-source Hermes Agent.
- The GUI shares one agent core, configuration, API keys, sessions, skills, and memory with the CLI and gateway; sessions resume across surfaces.
- It runs without a terminal and offers streaming tool output, a side-by-side preview pane, a file browser, voice I/O, and a settings UI.
- Hermes is model-agnostic and MIT-licensed, working with Nous Portal, OpenRouter, OpenAI, or any compatible endpoint.
- The current build is Hermes Agent v0.15.2, backed by a closed learning loop, MCP tool support, and five sandbox backends.

The post Nous Research Releases Hermes Desktop: A Native Cross-Platform Front End for Hermes Agent v0.15.2 with Streaming Tool Output appeared first on MarkTechPost.



NVIDIA Releases Cosmos 3: A Two-Tower Mixture-of-Transformers Foundation Model Unifying Physical Reasoning, World Generation, and Action Generation

NVIDIA’s AI team has released Cosmos 3, a family of omnimodal world models for physical AI. The models combine physical reasoning, world generation, and action generation, and all three capabilities live inside one open model. NVIDIA open sourced the checkpoints, training scripts, deployment tools, and datasets. The Cosmos 3 release targets robotics, autonomous vehicles, and warehouse monitoring teams.

NVIDIA Cosmos 3

Physical AI systems must understand the world before acting in it. Robots and vehicles need to perceive, predict, and then act. Earlier Cosmos releases split these jobs across separate models. Cosmos 3 unifies them with a Mixture-of-Transformers (MoT) architecture built around two towers.

The reasoner tower is a vision-language model (VLM). It interprets images, videos, and text using an autoregressive architecture. It understands motion, object interactions, and other physical context. The NVIDIA team describes this tower as the model’s brain.

The generator tower produces future observations and action sequences. It uses a diffusion-based process for physics-aware video and actions. These outputs are conditioned on the reasoner tower’s understanding.

Information flows one way, from reasoner to generator. The reasoner can run alone. The generator always activates both towers for guided generation. A single model can therefore handle reasoning and generation together.

Source: https://developer.nvidia.com/blog/develop-physical-ai-reasoning-world-and-action-models-with-nvidia-cosmos-3

The Model Family

The NVIDIA team describes three model scales: Edge, Nano, and Super. Each uses the dual-tower Mixture-of-Transformers design. The two towers are initialized from pre-trained Qwen3-VL weights, so the total parameter count is roughly double that of the backbone transformer.

Cosmos3-Nano is a 16B model built on a dense 8B transformer. It adapts the Qwen3-VL 8B architecture. Nano targets efficient inference on workstation GPUs and runs on hardware like the NVIDIA RTX PRO 6000. That suits real-time robotics and on-device physical AI.

Cosmos3-Super is a 64B model built on a dense 32B transformer. It adapts the Qwen3-VL 32B architecture. Super targets datacenter GPUs, including NVIDIA Hopper and Blackwell. It fits large-scale synthetic data generation and advanced reasoning.

This release ships Nano and Super, along with task-specific variants. These include Super Text2Image, Super Image2Video, and Nano-Policy-DROID.

How the Unified Design Works

Both towers share one transformer architecture and a joint attention operator. They use a 3D multimodal rotary position embedding (mRoPE), which aligns video, audio, and action tokens on one temporal axis.

In Reasoner Mode, tokens pass through causal self-attention. This enables next-token prediction for perception, planning, and reasoning. In Generator Mode, noisy tokens are denoised through full attention. The autoregressive tokens are never updated by the diffusion tokens.

The model treats action as a core modality with dedicated action tokens. Supported inputs include text, image, video, and JSON action arrays. Outputs include images, video, synchronized sound, action states, and text. The reasoner follows Qwen3-VL-compatible message conventions for vision inputs.

Generation supports 256p, 480p, and 720p resolution tiers. Frame counts range from 5 to 300, defaulting to 189, which equals about 7.9 seconds of video at 24 FPS. Sound is generated as stereo AAC at 48 kHz.

Action conditioning spans camera, vehicle, egocentric, single-arm, dual-arm, and humanoid embodiments. Each embodiment uses a fixed action dimension, such as 9D for cameras.

The Benchmark Case

The NVIDIA team evaluated Cosmos 3 across reasoning and generation suites. On reasoning, Super and Nano lead VANTAGE-Bench at their respective tiers. VANTAGE-Bench tests VLMs on real-world fixed-camera footage covering warehouses, transportation, and smart spaces. Cosmos 3 also tops the Traffic Anomaly Reasoning (TAR) leaderboard, the official leaderboard for AI City Challenge 2026 Track 3.

On generation, NVIDIA reports open-source state-of-the-art results. Cosmos 3 is the open-source SOTA on R-Bench. It also leads PAI-Bench, Physics-IQ, and RoboLab on public leaderboards. On Artificial Analysis, it leads two open-source leaderboards, covering text-to-image and image-to-video without audio.

The NVIDIA team also introduced its Cosmos Human Evaluation framework, called HUE. HUE decomposes each generated video into yes/no fact questions. It scores four dimensions across seven physical AI domains. The dimensions are semantic alignment, physical laws, geometric reasoning, and visual integrity. A VLM pipeline drafts the questions, and human experts refine them.

Marktechpost’s Visual Explainer: NVIDIA Cosmos 3 (Developer Guide · Physical AI)

Open omnimodal world models for physical AI. Released May 31, 2026. One model for physical reasoning, world generation, and action generation. Mixture-of-Transformers, open weights, OpenMDW-1.1.

01 · WHAT IT IS: A unified model for understanding and generation
Cosmos 3 is a family of omnimodal world models for physical AI. Earlier Cosmos releases split jobs across separate models. Cosmos 3 unifies them in a single open model.
- Physical reasoning over images, video, and text.
- World generation of physics-aware video and sound.
- Action generation for robots and autonomous systems.
- Subsumes VLMs, video generators, world simulators, and world-action models.

02 · ARCHITECTURE: Two towers, one transformer
- Reasoner tower: an autoregressive vision-language model (VLM). It interprets motion, object interactions, and physical context. NVIDIA calls it the model’s brain.
- Generator tower: a diffusion-based path for physics-aware video and actions. It is conditioned on the reasoner’s understanding.
- Information flows one way, reasoner → generator. Both towers share a 3D multimodal RoPE (mRoPE).

03 · MODEL FAMILY: Pick a size for your hardware
- Cosmos3-Nano: 16B total (dense 8B, Qwen3-VL 8B). Workstation GPUs like RTX PRO 6000. Real-time robotics.
- Cosmos3-Super: 64B total (dense 32B, Qwen3-VL 32B). Datacenter Hopper and Blackwell GPUs. Large-scale SDG.
- Cosmos3-Edge: 4B total (dense 2B). On-device scale. Planned for a later release.
- Plus variants: Super-Text2Image, Super-Image2Video, and Nano-Policy-DROID.

04 · MODALITIES: Inputs, outputs, and generation settings
- Inputs: text, image, video, and JSON action arrays.
- Outputs: image, video, synchronized sound, action states, text.
- Resolution: 256p, 480p, 720p. Sound: stereo AAC at 48 kHz.
- Length: 5 to 300 frames; default 189 (about 7.9s at 24 FPS).
- Embodiments: camera, vehicle, egocentric, single-arm, dual-arm, humanoid.

05 · BENCHMARKS: What NVIDIA reports
- Reasoning: Nano and Super lead VANTAGE-Bench at their tiers. Cosmos 3 tops TAR, the AI City Challenge 2026 Track 3 leaderboard.
- Generation: Open-source SOTA on R-Bench. Leads PAI-Bench, Physics-IQ, and RoboLab. Top open-source on Artificial Analysis text-to-image and image-to-video.
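The sizing and clip-length figures quoted for Cosmos 3 can be sanity-checked with a few lines of arithmetic. The sketch below is not NVIDIA code: the function names and the `BACKBONES_B` table are made up for illustration, and the "two towers means twice the backbone" rule is the article's rough approximation, not an exact parameter count.

```python
FPS = 24  # frame rate quoted in the article

# Dense backbone sizes in billions of parameters, as quoted in the article.
BACKBONES_B = {"Cosmos3-Edge": 2, "Cosmos3-Nano": 8, "Cosmos3-Super": 32}


def clip_seconds(frames, fps=FPS):
    """Duration of a generated clip, given its frame count."""
    if not 5 <= frames <= 300:
        raise ValueError("frame count outside the quoted 5-300 range")
    return frames / fps


def approx_total_params_b(backbone_b, towers=2):
    """Approximate total size when each tower is a copy of the backbone."""
    return backbone_b * towers


default_clip = clip_seconds(189)  # 7.875 s, i.e. "about 7.9 seconds"
longest_clip = clip_seconds(300)  # 12.5 s at the top of the range
totals = {name: approx_total_params_b(b) for name, b in BACKBONES_B.items()}
# totals == {"Cosmos3-Edge": 4, "Cosmos3-Nano": 16, "Cosmos3-Super": 64}
```

The results line up with the stated 4B, 16B, and 64B totals and with the 7.9-second default clip, which suggests the quoted numbers are internally consistent.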



The Download: Trump’s new AI order, and smart glasses for warfare

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

5 key points in Trump’s new AI order

Less than two weeks after scrapping an executive order on AI, President Donald Trump signed a new one on Tuesday. Promising to promote innovation and security, the policy represents a turning point in the White House’s AI governance, but it is likely to attract criticism from both opponents and supporters of stricter regulation.

Here are five key points from the order:

1. It’s created a voluntary review system: tech companies will be asked to share frontier models with the government for review 30 days before they plan to release them.
2. There’s no mandatory licensing: the government will not require permits before software can be deployed.
3. It establishes a dedicated AI cybersecurity clearinghouse: the new hub will coordinate security checks with the private sector.
4. It’s a watered-down version of the order Trump shelved last month: the earlier version requested models 90 days before their release.
5. But it’s still a move towards stronger AI oversight: the policy marks a clear departure from the White House’s previous hands-off approach.

Plus: here’s why a previous Trump administration’s AI policy was a distraction and how AI is already making online crimes easier.

MIT Technology Review Narrated: inside Anduril and Meta’s quest to make smart glasses for warfare

The defense-tech company Anduril has shared new details about the augmented-reality headset for the military it’s prototyping with Meta, including a vision for ordering drone strikes via eye-tracking and voice commands.

Quay Barnett, who leads the effort at Anduril following a career in the Army’s Special Operations Command, aims to optimize “the human as a weapons system.” His vision is cyborg-inspired: drones and soldiers will see together, share information seamlessly, and make decisions as one.

By James O’Donnell

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we publish each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1. President Trump has signed an AI order that expands model oversight
The long-awaited executive order aims to mitigate security threats. (NYT $)
+ It asks companies to submit models voluntarily for tests before release. (NPR)
+ It’s a slimmed-down version of the order Trump shelved in May. (WSJ $)
+ And marks a strategic shift in his AI strategy. (Reuters $)
+ A war over AI regulation is coming to the US. (MIT Technology Review)

2. SpaceX plans to raise $75 billion in IPO at $135 per share
The company intends to sell 555.6 million shares. (Reuters $)
+ The fixed price breaks from the traditional IPO process. (Bloomberg $)
+ Morningstar says the valuation should be nearly 50% lower. (BI)

3. Meta has scaled back plans to track workers’ clicks and keystrokes to train AI
All staff can pause it for 30 minutes, with some fully exempt. (The Information $)
+ The changes follow a fierce backlash to the tracking plans. (Reuters $)
+ AI is supercharging surveillance. (MIT Technology Review)

4. Microsoft wants to ‘make users addicted’ to its new AI assistant
According to internal documents for the “Scout” tool. (404 Media)
+ Microsoft launched the assistant on Tuesday. (TechCrunch)

5. Mathematicians fear that AI threatens their field
A new declaration raises concerns about AI’s trustworthiness. (Ars Technica)
+ It arrives a week after OpenAI said it solved a famous math problem. (WSJ $)
+ A startup wants to change how mathematicians do math. (MIT Technology Review)

6. Scientists have found a way to supercharge computer worms with AI
The worm could target any known flaw in the world’s computers. (NYT $)
+ AI is supercharging scams. (MIT Technology Review)

7. Google must let UK publishers opt out of AI search features
Online publishers can choose not to appear in the AI Overviews. (BBC)
+ Google is now testing features for sites to exit AI search. (Reuters $)

8. America’s data center build-out is falling way behind schedule
60% of those planned for completion in 2027 aren’t yet under construction. (WSJ $)
+ Nobody wants a data center in their backyard. (MIT Technology Review)

9. EVs are getting cheaper worldwide, except in the US
The US is short on supportive policies and affordable Chinese EVs. (Rest of World)

10. The European Parliament is ditching Google for… Qwant
The French search engine is the new default on in-house computers. (Politico)
+ The switch comes amid a broader push to wean the EU off US tech. (FT $)

Quote of the day

“SpaceX’s valuation could be richer than a plate of dauphinoise potatoes.”

Dan Coatsworth, head of markets at AJ Bell, tells CNBC that SpaceX’s IPO price looks overloaded with expectations.

One More Thing

Marseille’s battle against the surveillance state

Heading toward Marseille’s central train station, Eda Nano points out what looks like a streetlamp on the Rue des Abeilles. But this sleek piece of urban furniture is not a lamp. It’s a video camera, with a 360-degree view of the narrow street.

Nano, a 39-year-old developer, wants to make Marseille residents more aware that they’re being watched. She’s part of a growing group of activists resisting the rise of policing cameras in their hometown. Find out how the rebellious port city of Marseille is fighting the surveillance state.

By Fleur Macdonald

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ These aerial photos of solar farms transform renewable energy into abstract art.
+ Open a window over Earth’s water with this hypnotic 4K atmospheric film made from satellite imagery.
+ Spend three relaxing hours with David Attenborough narrating this collection of extraordinary wildlife moments.
+ Radiohead sounds beautiful on traditional Japanese instruments in this koto performance of “Weird Fishes/Arpeggi”.



How virtual power plants could provide energy for data centers

Would you take a payment to ramp down your electricity use? Would it change anything if you were doing so to help power a local data center?

Google just signed a new deal to help pay for a virtual power plant (VPP) in the largest power grid in the US. The agreement is with Voltus, a leading VPP and distributed energy resources platform. Voltus will set up the virtual power plant, grouping together devices like electric vehicles and smart thermostats. It’ll pay customers to participate, and the company will dial back power or use the stored energy during times when the grid is stressed. Google will foot the bill for setting it up, and the extra capacity generated by the project will help run its data centers in the region.

This is one of the most concrete examples so far of a tech giant using a VPP to help meet energy demand for data centers. But there are still some lingering questions about just how far this sort of program can go, and what the limits are.

Last year, it felt as if everyone was talking about data center flexibility. A high-profile study from Duke University found that if data centers agreed to decrease their energy demand for roughly 40 hours per year, a whole bunch of them (about 100 gigawatts’ worth) could come online without making new power plants or transmission equipment necessary.

The underlying reason is that our power grid is designed not for our average energy use, but for the absolute maximum: the brutally hot July evening when everyone is blasting their air conditioners, watching Love Island, and microwaving popcorn. If a data center is willing to refrain from pulling so much power during those high-stress times, the grid can happily support it the rest of the year.

One lingering question here is about incentives: How would you get data centers to agree to this? After all, they might not have a very flexible load, especially now that AI use is more widespread. Training a model can easily be delayed or shifted, but customer demand is more immediate. Giving up computing capacity could mean losing revenue.

Regulation is one approach that could work here. One proposal in the US would allow new data centers to come online years sooner if they agree to lower demand when the grid is nearing its max. And a new Texas law requires large users to switch to backup power or curtail their demand in emergency situations.

Another approach is for data center operators to pay for other people to be flexible. Voltus announced a new program in September that allows data centers to finance flexibility on their local grid. The company calls it “Bring your own capacity.” Google is now the first named customer taking advantage of this program.

In the new agreement, Voltus will pay people who agree to participate in the virtual power plant. The plant will be part of PJM, the grid that covers much of the US East Coast. The company says it will be able to aggregate up to 100 megawatts of distributed energy resources each year. The plant should be operational in 2027, according to Voltus.

This isn’t Google’s first foray into flexibility; the company has agreements with utilities across the US to limit or shift its own energy demand, which can help free up grid capacity. As the company pointed out in a blog post earlier this year, though, there are limits on how flexible a data center can be, and not every facility will be able to ramp down its power demand.

“There is no one solution for expanding grid capacity and we’re continuing to explore all options, including the many avenues for load flexibility,” said Michael Terrell, Google’s global head of advanced energy, in an emailed statement in response to written questions.

Once again, I’m wondering about incentives here. These companies are asking homes and businesses to be flexible. Will they agree?

A recent study in California looked at local people’s willingness to participate in managed electric-vehicle charging. Essentially, the program pays people to give up control of when they charge their EVs. This is another way to help smooth out electricity demand and ease the burden on the grid.

The problem? Not many people signed up. With no economic incentive, only 1% of EV owners enrolled in managed charging. At $40 per month (about 15% of their power bill), only 4.6% did.

This is a different situation and a different region from the one in which Google is working with Voltus. (It’s worth noting that the companies aren’t sharing how much they plan to pay the participants, which will obviously be a big determinant in participation for this kind of project.)

But this study shows that even with money on the table, people may not always jump at the chance to cede control of their electricity demand. And it certainly feels relevant that about 70% of Americans oppose AI data centers in their area, according to recent Gallup polling.

Being flexible sounds like a great idea in theory, and these financed VPPs could provide an immediate route to meeting energy demand. But as we move from idea to implementation, it’ll be interesting to see whether trial runs work as intended.

This article is from The Spark, MIT Technology Review’s weekly climate newsletter.



How small businesses can leverage AI

This article is from Making AI Work, MIT Technology Review’s limited-run newsletter examining how to apply LLMs across industries.

From accounting to design to market research and product development, there’s a staggering breadth of skills needed to run a business. A large company can hire experts to handle these tasks, but small businesses don’t always have this luxury. That’s where AI comes in. Today’s AI models do a decent job at these tasks. The trick for small businesses is to understand where AI is good enough and where it’s not.

One place where a “good enough” AI can already be quite valuable to small business owners is in providing secretarial skills and handling basic administrative matters. Let’s take a look at how one private tutor is using it to improve his recordkeeping and free up his time.

Case study

Sam Finnegan-Dehn works in fundraising for a charity, but he moonlights as a math and philosophy tutor for university students from his home in London. Through this part-time business, he can leverage his degrees in philosophy and share his love of the subject with clients.

But meeting with students is only a fraction of the work it takes to be a good tutor. He also plans lessons and finds fresh reading materials, creates assignments, sends invoices, and keeps up with new research, all on top of his regular job.

Given these demands, Finnegan-Dehn doesn’t have as much time as he’d like to grow his tutoring roster. So he’s turned to AI for some help in managing the day-to-day aspects of his business. He says AI has taken on a secretarial role across all of his digital notebooks, where he jots down reminders about his clients’ progress and new readings to keep himself up-to-date. He describes using AI as kind of like having a second memory that helps him connect ideas he’s written down in various places.

While he has experimented with different tools like Claude and ChatGPT, he’s now landed on Notion AI because it integrates better with his tutoring notes, which live across his notebook tabs in the Notion app.

Finnegan-Dehn doesn’t use AI to create teaching materials, but he does let Notion AI record meetings with his clients (after getting their consent), and then uses its automated summaries to refine his teaching strategy. For example, if he notices from the AI’s summary that a certain technique was not helping a student, he may change how he approaches the subject next time.

Beyond this, Notion AI also helps him with goal-setting, drafting lesson notes, invoicing, and generating and syncing social media posts. For goal-setting, for example, Finnegan-Dehn says he understands his long-term goals for his business but not always the concrete steps to build to them. He uses AI to help fill in these gaps. He starts by writing down a “North Star” goal, such as having a certain number of clients by the end of the year. Next, he asks his AI to generate the steps that he needs to take to get there, given the profile he has built up in the app. Then, he can reflect on the results and choose which tasks to tackle first.

The tool

Notion has been a big player in note-taking software for many years. Its AI add-on, released in late 2023, now has tools that enable it to interact with many other online productivity platforms. There’s an email client, calendar integrations, and a newly released agent. And while this level of access has raised privacy concerns, it can also make for a pretty powerful virtual assistant.

Many of the tasks targeted by Notion AI are less creative and more rote: syncing information across documents or searching through old scribbles, for example. This makes the tool especially appealing to small business owners, who have limited bandwidth, particularly for menial work.

Other companies are developing tools targeted at specific industries. For example, Grandma’s Quilt Shop in Yuma, Arizona, uses Rain, which has a software suite tailored to craft companies, to generate inventory descriptions and pricing for its stock of fabric designs. The owners claim this AI tool cuts the time it takes to list items by 60 to 80%.

There are drawbacks, though: Finnegan-Dehn described some of Notion AI’s idiosyncrasies as “clunky” at times. And the AI add-on for Notion costs $20 per month. As with all new tools, small business owners should carefully assess how the potential gains and headaches measure up against the cost of just doing the job themselves.

User tips

Consider these points when thinking about whether AI might be able to help you run a business, or make any part of your work life just a little bit easier.

Look before you leap. Since LLMs feed on the data you input to answer your queries or complete tasks, you want to give them information in a way that’s convenient for you and for the model. For many of these notebook AI services, this means, for example, using their platform for notetaking so you don’t have to input or upload notes later. Because of this, it’s a good idea to weigh your options carefully before committing to an AI-powered ecosystem.

Work to your strengths. Think about what skills you lack in-house, and see if AI can either help with training or take these tasks on for you. Just be aware: AI hallucinates and makes mistakes, so think about where accuracy is needed and keep humans in charge there.

AI isn’t always the best tool. It’s okay to use something off the shelf when that’s the better choice. It’s going to be safer, for example, to use existing payment processing platforms like Shopify or Square than to vibe-code one using AI.

Consider using local models for any sensitive information. Our reporting has covered the risks that online AI models have in leaking sensitive data, and there have been many reports about how AI companies collect your data when you



Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

Alibaba’s Qwen team has released Qwen3.7-Plus. The model is now available through Alibaba Cloud’s Bailian platform, the console that international users access as Model Studio. It offers API services to external developers. The release follows Alibaba’s May unveiling of the Qwen3.7 generation.

Qwen3.7-Plus

Qwen3.7-Plus is a multimodal large language model. It understands images and video alongside written prompts. Its sibling, Qwen3.7-Max, is text-only. This is visual understanding, not generation: the model reads images and video; it does not create them. Alibaba’s image and video generation work sits in separate model families.

The Alibaba team describes the release as a step in multimodal hybrid agent technology. An agent is a model that plans and acts across steps. Building on image and video understanding, Qwen3.7-Plus adds five abilities: deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration. Self-programming means the model writes and revises its own code. Tool invocation means it calls external functions or APIs. Verification and testing means it runs outputs and checks results. Autonomous iteration means it loops until the task is done. Together, they describe a model built to act, not just answer.

The Vision Case

Qwen3.7-Plus is the multimodal half of the 3.7 family. Its preview already posted measurable vision results. In Vision Arena, Qwen3.7-Plus-Preview ranked #16 overall. That placed Alibaba as the #5 lab in vision. The model rank and the lab rank are separate figures.

Vision Arena is a neutral leaderboard run by LM Arena. Users vote on image-understanding answers in blind matchups. The #16 result sits behind the top US labs but within the field. For image-heavy work, this is the signal that matters. Think OCR at scale, chart reading, or video-frame analysis.

The text-only Max sibling anchors the generation’s reasoning. Max scored 56.6 on the Artificial Analysis Intelligence Index. That was the highest placement for a Chinese model at release.

https://qwen.ai/blog?id=qwen3.7-plus

The Agentic Loop

The clear shift in Qwen3.7 is its agentic focus. The Alibaba team is positioning the models for long-running tasks. Bailian, the host platform, adds two relevant pieces. The first is an Agentic RL (reinforcement learning) mechanism: the platform uses real-world execution feedback to refine model accuracy over time. The second is a set of built-in safety guardrails, which keep autonomous tools inside preset operational limits. That detail matters when an agent runs commands or edits files.

Marktechpost’s Visual Explainer

AI Models · Field Guide
Alibaba Qwen · June 2, 2026
Qwen3.7-Plus: Alibaba’s multimodal agent model, now on Bailian

A multimodal large language model with image and video understanding, deep reasoning, and agentic features. Available via API on Alibaba Cloud’s Bailian platform, accessed internationally as Model Studio.

01 · What it is: A multimodal large language model
  • Multimodal: it reads images and video, alongside text input.
  • Visual understanding, not generation: it reads media, it does not create it.
  • The multimodal sibling to the text-only Qwen3.7-Max.
  • Alibaba describes it as multimodal hybrid agent technology.

02 · Capabilities: Five abilities beyond seeing
  • Deep reasoning: works through problems step by step.
  • Self-programming: writes and revises its own code.
  • Tool invocation: calls external functions or APIs.
  • Verification and testing: runs outputs and checks results.
  • Autonomous iteration: loops until the task is done.

03 · Vision benchmarks: Where it stands on vision
  • The preview ranked #16 overall in Vision Arena (LM Arena).
  • That placed Alibaba as the #5 lab in vision.
  • Model rank and lab rank are separate figures.
  • Relevant for OCR, chart reading, and video-frame analysis.
  • For reference, the text-only Max sibling scored 56.6 on the Artificial Analysis Intelligence Index, the highest for a Chinese model at release.

04 · The agentic loop: Built for long-running tasks
  • Bailian adds an Agentic RL (reinforcement learning) mechanism.
  • It uses real-world execution feedback to refine accuracy.
  • Built-in safety guardrails keep autonomous tools within limits.
  • That matters when an agent runs commands or edits files.

05 · Confirmed vs unconfirmed: What we know today
  • Confirmed: image and video understanding; agentic feature set; Bailian API access; proprietary, API-only.
  • Not yet published: public price sheet; context window size; output token limits; open weights.

06 · Why it matters: The practical read
  • A vision-capable agent backend through one API.
  • Suits workloads mixing images, video, and tool use.
  • A leaderboard rank shows promise, not a guarantee.
  • Validate accuracy on your own data before committing.

Key Takeaways
  • Alibaba released Qwen3.7-Plus, a multimodal model now available via API on its Bailian platform (Model Studio).
  • It understands images and video as input (understanding, not generation) and adds agentic features.
  • Capabilities include deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration.
  • Its preview ranked #16 in Vision Arena, making Alibaba the #5 lab in vision.

The post Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform appeared first on MarkTechPost.
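
As a rough illustration of how tool invocation, verification and testing, and autonomous iteration fit together, here is a minimal Python sketch of an agent-style loop. It is a toy under stated assumptions, not Qwen3.7-Plus or the Bailian API: `TOOLS`, `propose_step`, `verify`, and `run_agent` are hypothetical names, a simple rule stands in for the model's reasoning, and the `max_steps` budget plays the role of a preset operational limit.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical tool registry: plain functions stand in for external APIs.
TOOLS: Dict[str, Callable[[int], int]] = {
    "double": lambda x: x * 2,
    "increment": lambda x: x + 1,
}


def propose_step(current: int, target: int) -> str:
    """Stand-in for the model's reasoning: choose the next tool by a fixed rule."""
    if current > 0 and current * 2 <= target:
        return "double"
    return "increment"


def verify(current: int, target: int) -> bool:
    """Verification and testing: check the current result against the goal."""
    return current == target


def run_agent(start: int, target: int, max_steps: int = 20) -> Optional[List[str]]:
    """Autonomous iteration: loop until verified or the step budget is spent."""
    value = start
    trace: List[str] = []
    for _ in range(max_steps):
        if verify(value, target):
            return trace
        tool = propose_step(value, target)
        value = TOOLS[tool](value)  # tool invocation
        trace.append(tool)
    # Budget exhausted: report success only if the last step reached the goal.
    return trace if verify(value, target) else None


if __name__ == "__main__":
    print(run_agent(1, 10))
    print(run_agent(5, 3))
```

Here `run_agent(1, 10)` returns the list of tool calls that reached the goal, while `run_agent(5, 3)` returns `None` because the step budget stops a run that can never succeed. A production agent would replace the fixed rule with model calls and the toy tools with real functions or APIs.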


AI, Committee, News, Uncategorized

Rehumanizing global health care with agentic AI

The global health care sector is under increasing strain. Decades of chronic underinvestment and constraints in recruitment have coincided with a surge in demand for services for aging populations. Gaps in provision are already taking a toll, with fragmented access to care and high rates of stress and burnout among staff. And it’s getting worse. The World Health Organization has warned that current shortfalls will increase to 11 million workers by 2030.

In their urgent hunt for a solution, many health-care providers are now pinning their hopes on agentic AI, with more than two-thirds (68%) having already adopted AI agents into their workforce, according to KPMG.

The technology is being deployed to automate complex back-office processes, collaborate with medical teams, and even triage patients, all in a bid to reduce the cognitive load on clinicians and improve quality of care for patients as the supply of human health-care workers dwindles.

A different type of digitalization

Until now, the benefits of digitalization within health care have been limited. Many staff have blamed slow or outdated technology for adding to the administrative burden rather than alleviating it. For example, U.S. patient data was migrated to electronic health records (EHRs) in the early 2000s, but this data remains fragmented and reliant on manual inputs.

New telehealth services and digital care tools, like remote monitors, have had similar shortcomings, says Ashis Barad, MD, chief digital and technology officer at Hospital for Special Surgery (HSS), an academic medical center in New York that focuses on musculoskeletal health. Both technologies have helped improve access to health care by removing geographical barriers, he says, but they’ve failed to replicate the quality of in-person care or win trust from patients.

Agentic AI is different from these existing technologies, he insists. Rather than relying on manual inputs or defaulting to human workers for any case that sits slightly outside a rigid framework, AI agents can handle nuanced, complex scenarios. They can make autonomous decisions, retrieve information from expert clinical sources, and iterate over time, freeing clinicians to focus on higher-level patient care. As Dr. Barad puts it: “Agentic AI takes your workflow and collapses it, augments it, supercharges it, and makes it more performant.”

At HSS, AI agents have already been deployed in multiple areas. They handle complex backend processes, such as insurance claims that previously took several weeks to complete and involved both HSS staff and a third-party contractor to handle the volume. Now, says Dr. Barad, AI agents complete 1,100 claims per month. They’ve reduced the appeals stage from 45 minutes to five and improved the success rate of those appeals from 65% to 100% in the nine months since implementation. HSS now handles all claims in-house.

Building on that success, HSS is now deploying AI agents in non-clinical patient-facing settings with an AI scheduling and triage service, as part of a collaboration with enterprise agentic AI developer Ema Unlimited. The service is accessible 24/7 via web, text, or phone. It uses conversational AI to ask patients clarifying questions about their condition and then books appointments with the most appropriate clinician, factoring in location, insurance coverage, and physician availability. “It completes the whole loop,” says Dr. Barad. The AI agent is trained on “all of our context, all of our rules, and all of our knowledge base,” he adds, providing patients with streamlined access to highly specialist knowledge from world-leading surgeons.

Given the high-stakes decisions delegated to AI agents, the triage service has built-in safeguards—sensitive, complex, or uncertain scenarios are escalated to human specialists. Every decision made by the AI agent is auditable and human staff can step in at any point. Patient data is kept secure and the system is trained on all HSS protocols, policies, and care pathways. By keeping humans in the loop, Ema says its technology strikes the balance between efficient automation, patient-first safety, and human-informed decision making.

As the technology becomes more widespread, it will be incumbent on providers to ensure they have these sorts of guardrails embedded into systems, says Dr. Barad. At HSS all decisions around the technology are filtered through an AI subcommittee that Dr. Barad co-chairs alongside a senior nursing executive. AI agents that may touch on patient care will be scrutinized with far more rigor than, say, backend processes, he explains.

AI agents prompt systems-level change

For example, Dr. Barad has plans to create a dedicated AI lab at the HSS main campus in New York City—a move that aims to democratize access to the technology across the organization. It will be open to all staff looking to understand or build AI agents, he explains, with informative classes and one-on-one training. “We’re getting agentic AI into everybody’s hands,” he says.

This echoes research by Deloitte, which found that leading agentic AI adopters in health care were far more likely to have opted for multiagent solutions, redesigning end-to-end workflows rather than sticking to narrow solutions or individual use cases. The key, it appears, is to integrate AI agents across the entire enterprise, treating them as a general-purpose technology. As Dr. Barad puts it: “It’s wrong to think of agentic AI in use cases… It’s a general-purpose technology, analogous to electricity.”

In practice, this means health-care providers need to set the right foundation to achieve value with agentic AI. This includes creating a unified data strategy, one that integrates fragmented data sources across an organization to create a single, comprehensive source of truth. In health care, data is often split across multiple departments and providers, each with their own legacy IT system.

In systems that rely on fragmented data sources, metrics often lack standardized definitions too. For example, Dr. Barad says that each hospital he’s worked in has had a slightly different definition for “time to start surgery,” a metric commonly used to gauge operating room efficiency. This level of fragmentation impedes AI agents’ ability to retrieve information from different sources or applications and to assimilate the tacit knowledge that differentiates them from other technologies. By creating greater interoperability of


AI, Committee, News, Uncategorized

The Download: AI can run your admin department now

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How small businesses can leverage AI

From accounting to design to market research and product development, there’s a staggering breadth of skills needed to run a business. Large companies can hire experts to handle these tasks, but small businesses don’t always have that luxury.

That’s where AI comes in. Today’s models can already take on a range of basic administrative work, from organizing notes and summarizing meetings to invoicing, goal-setting, and social media planning.

Find out how small-business owners can put AI to work.

—Peter Hall

This article is from Making AI Work, MIT Technology Review’s limited-run newsletter examining how to apply LLMs across industries. To receive it in your inbox, sign up here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Anthropic has confidentially filed for IPO ahead of OpenAI
It aims to go public as early as this fall. (CNN)
+ The company did not disclose its target valuation. (Guardian)
+ It’s expected to list shortly after a trillion-dollar IPO by SpaceX. (BBC)
+ Beating OpenAI in the IPO race could have a big impact. (WSJ $)

2 The EU may exclude US cloud giants from critical contracts
The likes of Amazon, Microsoft, and Google could be shut out. (Reuters $)
+ The EU aims to reduce its dependence on US tech. (FT $)
+ Trump supercharged this sovereignty push. (Politico $)

3 Florida has become the first state to sue OpenAI
The lawsuit targets ChatGPT’s alleged child safety risks. (NPR)
+ Florida says OpenAI put profit ahead of safety. (Reuters $)
+ Chatbots are now starting to check user ages. (MIT Technology Review)

4 Hackers stole Instagram accounts just by asking Meta AI for them
They easily broke into a host of celebrity profiles. (404 Media)
+ The exploit shows the risk of offloading support to AI. (TechCrunch)
+ AI is making online crimes easier. (MIT Technology Review)

5 Chinese universities with military ties are seeking Nvidia chips
Two are blacklisted by the US Commerce Department. (Bloomberg $)
+ The Chinese military has sought restricted Nvidia chips for years. (NYT $)
+ US senators have slammed a loophole in chip export rules. (Reuters $)

6 Blue Origin and NASA disagree on a crucial rocket’s next flight
+ Blue Origin says the rocket will fly again this year. (Engadget)
+ But NASA is less optimistic. (CNBC)
+ The rocket’s failure cast doubt on NASA’s moon plans. (BBC)

7 Moderna has won funding to develop an Ebola mRNA vaccine
The CEPI has pledged over $60 million to the effort. (Ars Technica)
+ To fight an outbreak raging out of control. (MIT Technology Review)

8 China is using AI to predict future political dissent
A company called Geedge Networks is developing the tech. (NYT $)

9 Geoengineering can thicken Arctic ice, but melt results are mixed
Trials show the tech has had a limited impact. (New Scientist $)

10 Top AI labs are expanding research into machine ‘consciousness’
Meta, Anthropic, and DeepMind are increasing their investments. (FT $)
+ A new tool could show how consciousness works. (MIT Technology Review)

Quote of the day

“Sam Altman and ChatGPT have chosen the AI race over the safety and security of our kids. They have chosen profit over public safety, and we’re not going to stand for it in here in Florida.”

—Florida Attorney General James Uthmeier tells reporters why his state is suing OpenAI, the LA Times reports.

One More Thing

The entrance to the Moscow storage facility of KrioRus, which was until recently the only cryonics company in Eurasia. (Image: Alessandro Gandolfi)

Why the sci-fi dream of cryonics never died

Cryonics is best known for its appearance in sci-fi films like 2001: A Space Odyssey. But its adherents have held on to a dream that advances in medicine will one day allow for resuscitation and additional years on Earth.

Around 500 people are preserved in liquid nitrogen globally, while another 4,000 are on waiting lists. Despite scant evidence that cryonics can work, believers remain optimistic that future science could eventually revive them.

Discover why the hope of human reanimation refuses to die.

—Laurie Clarke

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Hear Dolly Parton reimagined through this spot-on Dire Straits-style cover of “Jolene”.
+ Find out which birds people search for most in this interactive visualization of bird popularity.
+ Explore thousands of Q&As between students and astronauts on the ISS at this interactive site.
+ Paris’s oldest bridge disappeared beneath a giant inflatable cave in this surreal public art installation.


AI, Committee, News, Uncategorized

China has approved the world’s first invasive brain-computer chip—here’s what’s next

One day last October, sitting in the courtyard of his house in China’s Henan province, Dong Hui decided to see if he could hold a pen to write.

Dong, 39, had sustained spinal cord injuries in a car accident six years earlier that left him paralyzed from the neck down. Slowly but determinedly, he wrote his name, “Thank you,” and then the date. This was the result of an 11-month-long rehabilitation enabled by an implant in his brain. Before that process, Dong could move his arms slightly but wasn’t able to use his fingers. “I couldn’t believe I was able to write again. I was so excited I even missed a stroke in my name,” he told MIT Technology Review on a video call.

In November 2024, Dong became one of the first people in China to be given an invasive brain-computer interface (BCI) through brain surgery. He had signed up for a clinical trial with the device’s developer one month after seeing on TV how a BCI had apparently enabled another paralyzed Chinese man to hold his granddaughter.

This March, the implant Dong uses became the first invasive BCI product in the world to be approved for use beyond clinical trials. It’s now available to some patients with paralysis in their limbs due to spinal cord injuries. We spoke to a range of experts to understand why the device was able to reach this global milestone, what makes this moment so significant, and what to expect next.

A world first

Dong’s brain implant is a coin-size device called NEO. It was developed by Neuracle Technology, a Shanghai-based startup, together with researchers at Tsinghua University in Beijing.

During a procedure that took just over an hour and a half, the device’s sensors, which collect Dong’s brain signals, were placed on his dura mater, the tough outer layer of tissue that covers and protects the brain. The signals are transmitted to a computer by an implant placed on Dong’s skull. The computer then translates the signals into commands for a soft robotic glove Dong wears during the 2.5-hour training sessions he completes each day to help him learn to grab.

Dong started his rehabilitation around a week after surgery. “On the ninth day of my training, my right hand successfully grabbed a ball without the glove,” he says. “That was a miraculous moment.”

Now he continues with his training at home. He wants to be able to control his hands better in order to put on clothes, eat, and do other daily tasks without troubling his aging parents.

A growing number of people with traumatic injuries in China are now poised to tread a similar path thanks to NEO’s recent approval. According to China’s National Medical Products Administration, the bureau responsible for drug supervision, the product is suitable for patients between 18 and 60 who have paralysis in all limbs due to spinal cord injuries but still have some residual function in their arms.

NEO beat several other BCIs to approval, including one from Neuralink, a California-based company founded by Elon Musk. Since October 2023, Neuracle has conducted 36 clinical trials using NEO, including the one on Dong. Thirty-two of them took place in the space of a few months in 2025, with the details about one of the four first in-person trials published in a preprint paper last July. Neuracle did not reply to a request for comment from MIT Technology Review.

One reason for NEO’s fast approval could be that it has a “relatively less invasive” design than counterparts such as Neuralink’s N1 brain chip, says Avinash Singh, a BCI researcher at the University of Technology Sydney. NEO’s eight sensors sit on top of the brain’s protective membrane while Neuralink’s N1 chip directly penetrates the cortex, the outermost layer of the brain itself. Neuracle’s device faces fewer regulatory constraints because it presents a lower risk of hemorrhage, glial scarring, and long-term signal degradation, Singh says. China’s strong support for its BCI industry also means that NEO was put on an expedited regulatory pathway; in comparison, the approval process of the US Food and Drug Administration can take several years, Singh adds.

A big boost for BCIs

NEO’s approval is hugely important for the global BCI industry, says Wang Shouyan, a neuroscientist at Fudan University in Shanghai who was not involved in research or trialing for NEO. Even though research and development on BCIs has taken place for several decades, most of it happened in the lab. The news means that BCIs are now ready for large-scale manufacturing and clinical use in China, Wang says.

For Dong, however, it means something much more personal. “Now, it will be able to help not only me, but also thousands and thousands of other patients suffering from spinal cord injuries in China who are tortured by despair each day,” he says of NEO. “It will bring them hope and change their lives.”

Days after NEO was approved, China started incorporating it into the country’s health insurance system by assigning it a unique code. This is one of the first steps toward a future where eligible Chinese patients pay a certain percentage of the BCI’s price if they need it during their treatment.

The growth of China’s BCI industry is expected to accelerate thanks to the government’s policy support and financial backing. The country’s latest five-year plan, published on the same day Neuracle received its approval, lists BCI as one of six key industries important to China’s future tech competitiveness, alongside quantum technology, humanoid robots, and others. Several Chinese startups, including NeuroXess and StairMed, have already worked in the field for many years.

“China’s decision to double down on becoming a global leader in the field owes in part to what these companies have already accomplished,” says Meicen Sun, an information scientist at the University of Illinois Urbana-Champaign who studies information and technology policy.

But, Sun says, the biggest advantage China may have is that Chinese people, particularly patients like Dong, tend to welcome

