YouZum

News

AI, Committee, News, Uncategorized

HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution

arXiv:2604.17745v1 Announce Type: new Abstract: Recent advances in large language models have highlighted their potential to automate computational research, particularly reproducing experimental results. However, existing approaches still use fixed sequential agent pipelines with weak global coordination, which limits their robustness and overall performance. In this work, we propose the Hierarchical Research Agent System (HiRAS), a hierarchical multi-agent framework for end-to-end experiment reproduction that employs supervisory manager agents to coordinate specialised agents across fine-grained stages. We also identify limitations in the reference-free evaluation of the Paper2Code benchmark and introduce Paper2Code-Extra (P2C-Ex), a refined protocol that incorporates repository-level information and aligns more closely with the original reference-based metric. Extensive evaluation validates the effectiveness and robustness of the proposed methods, showing improvements that include a >10% relative performance gain over the previous state of the art with open-source backbone models and significantly reduced hallucination during evaluation. Our work is available on GitHub: https://github.com/KOU-199024/HiRAS.



A Coding Implementation on Qwen 3.6-35B-A3B Covering Multimodal Inference, Thinking Control, Tool Calling, MoE Routing, RAG, and Session Persistence

In this tutorial, we build an end-to-end implementation around Qwen 3.6-35B-A3B and explore how a modern multimodal MoE model can be used in practical workflows. We begin by setting up the environment, loading the model adaptively based on available GPU memory, and creating a reusable chat framework that supports both standard responses and explicit thinking traces. From there, we work through important capabilities such as thinking-budget control, streamed generation with separated reasoning and answers, vision input handling, tool calling, structured JSON generation, MoE routing inspection, benchmarking, retrieval-augmented generation, and session persistence. Through this process, we run the model for inference and also examine how to design a robust application layer on top of Qwen 3.6 for real experimentation and advanced prototyping.

```python
import subprocess, sys

def _pip(*a):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *a])

_pip("--upgrade", "pip")
_pip("--upgrade", "transformers>=4.48.0", "accelerate>=1.2.0", "bitsandbytes>=0.44.0",
     "pillow", "requests", "sentencepiece", "qwen-vl-utils[decord]",
     "sentence-transformers", "jsonschema")

import torch, os, json, time, re, gc, io, threading, textwrap, warnings
from collections import Counter
from typing import Any, Optional

warnings.filterwarnings("ignore")
assert torch.cuda.is_available(), "GPU required. Switch runtime to A100 / L4."
p = torch.cuda.get_device_properties(0)
VRAM_GB = p.total_memory / 1e9
print(f"GPU: {p.name} | VRAM: {VRAM_GB:.1f} GB | CUDA {torch.version.cuda} | torch {torch.__version__}")

# Pick a loading mode based on available VRAM.
if VRAM_GB >= 75:
    LOAD_MODE = "bf16"
elif VRAM_GB >= 40:
    LOAD_MODE = "int8"
else:
    LOAD_MODE = "int4"

try:
    import flash_attn
    ATTN_IMPL = "flash_attention_2"
except Exception:
    ATTN_IMPL = "sdpa"
print(f"-> mode={LOAD_MODE} attn={ATTN_IMPL}")

from transformers import (
    AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig,
    TextIteratorStreamer, StoppingCriteria, StoppingCriteriaList,
)

MODEL_ID = "Qwen/Qwen3.6-35B-A3B"
kwargs = dict(device_map="auto", trust_remote_code=True, low_cpu_mem_usage=True,
              attn_implementation=ATTN_IMPL, torch_dtype=torch.bfloat16)
if LOAD_MODE == "int8":
    kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
elif LOAD_MODE == "int4":
    kwargs["quantization_config"] = BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True)

print("Loading processor...")
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
print(f"Loading model in {LOAD_MODE} (first run downloads ~70GB) ...")
t0 = time.time()
model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, **kwargs); model.eval()
print(f"Loaded in {time.time()-t0:.0f}s | VRAM used: {torch.cuda.memory_allocated()/1e9:.1f} GB")

# Sampling presets for thinking vs. instruct-style generation.
SAMPLING = {
    "thinking_general": dict(temperature=1.0, top_p=0.95, top_k=20, presence_penalty=1.5),
    "thinking_coding":  dict(temperature=0.6, top_p=0.95, top_k=20, presence_penalty=0.0),
    "instruct_general": dict(temperature=0.7, top_p=0.80, top_k=20, presence_penalty=1.5),
    "instruct_reason":  dict(temperature=1.0, top_p=1.00, top_k=40, presence_penalty=2.0),
}

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def split_thinking(text: str):
    """Split model output into (thinking_trace, answer)."""
    if THINK_OPEN in text and THINK_CLOSE in text:
        a = text.index(THINK_OPEN) + len(THINK_OPEN); b = text.index(THINK_CLOSE)
        return text[a:b].strip(), text[b + len(THINK_CLOSE):].strip()
    if THINK_CLOSE in text:
        b = text.index(THINK_CLOSE)
        return text[:b].strip(), text[b + len(THINK_CLOSE):].strip()
    return "", text.strip()
```

We set up the full environment required to run Qwen 3.6-35B-A3B in Google Colab and install all supporting libraries for quantization, multimodal processing, retrieval, and schema validation. We then probe the available GPU, dynamically select the loading mode based on VRAM, and configure the attention backend so the model runs as efficiently as possible on the given hardware. After that, we load the processor and model from Hugging Face and define the core sampling presets and the thinking-splitting utility, which lay the foundation for all later interactions.

```python
class QwenChat:
    """Conversation manager: history, tool messages, generation, streaming, persistence."""

    def __init__(self, model, processor, system=None, tools=None):
        self.model, self.processor = model, processor
        self.tokenizer = processor.tokenizer
        self.history: list[dict] = []
        if system:
            self.history.append({"role": "system", "content": system})
        self.tools = tools

    def user(self, content):
        self.history.append({"role": "user", "content": content}); return self

    def assistant(self, content, reasoning=""):
        m = {"role": "assistant", "content": content}
        if reasoning:
            m["reasoning_content"] = reasoning
        self.history.append(m); return self

    def tool_result(self, name, result):
        self.history.append({"role": "tool", "name": name,
                             "content": result if isinstance(result, str) else json.dumps(result)})
        return self

    def _inputs(self, enable_thinking, preserve_thinking):
        return self.processor.apply_chat_template(
            self.history, tools=self.tools, tokenize=True, add_generation_prompt=True,
            return_dict=True, return_tensors="pt",
            enable_thinking=enable_thinking, preserve_thinking=preserve_thinking,
        ).to(self.model.device)

    def generate(self, *, enable_thinking=True, preserve_thinking=False,
                 max_new_tokens=2048, preset="thinking_general",
                 stopping_criteria=None, append_to_history=True):
        inp = self._inputs(enable_thinking, preserve_thinking)
        cfg = SAMPLING[preset]
        gk = dict(**inp, max_new_tokens=max_new_tokens, do_sample=True,
                  temperature=cfg["temperature"], top_p=cfg["top_p"], top_k=cfg["top_k"],
                  repetition_penalty=1.0,
                  pad_token_id=self.tokenizer.pad_token_id or self.tokenizer.eos_token_id)
        if stopping_criteria is not None:
            gk["stopping_criteria"] = stopping_criteria
        with torch.inference_mode():
            out = self.model.generate(**gk)
        raw = self.tokenizer.decode(out[0, inp["input_ids"].shape[-1]:], skip_special_tokens=True)
        think, ans = split_thinking(raw)
        if append_to_history:
            self.assistant(ans, reasoning=think)
        return think, ans

    def stream(self, *, enable_thinking=True, preserve_thinking=False,
               max_new_tokens=2048, preset="thinking_general",
               on_thinking=None, on_answer=None):
        inp = self._inputs(enable_thinking, preserve_thinking)
        cfg = SAMPLING[preset]
        streamer = TextIteratorStreamer(self.tokenizer, skip_prompt=True, skip_special_tokens=True)
        gk = dict(**inp, streamer=streamer, max_new_tokens=max_new_tokens, do_sample=True,
                  temperature=cfg["temperature"], top_p=cfg["top_p"], top_k=cfg["top_k"],
                  pad_token_id=self.tokenizer.pad_token_id or self.tokenizer.eos_token_id)
        t = threading.Thread(target=self.model.generate, kwargs=gk); t.start()
        buf, in_think = "", enable_thinking
        think_text, answer_text = "", ""
        for piece in streamer:
            buf += piece
            if in_think:
                if THINK_CLOSE in buf:
                    close_at = buf.index(THINK_CLOSE)
                    resid = buf[:close_at]
                    if on_thinking: on_thinking(resid[len(think_text):])
                    think_text = resid
                    buf = buf[close_at + len(THINK_CLOSE):]
                    in_think = False
                    if buf and on_answer: on_answer(buf)
                    answer_text = buf; buf = ""
                else:
                    if on_thinking: on_thinking(piece)
                    think_text += piece
            else:
                if on_answer: on_answer(piece)
                answer_text += piece
        t.join()
        self.assistant(answer_text.strip(), reasoning=think_text.strip())
        return think_text.strip(), answer_text.strip()

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"history": self.history, "tools": self.tools}, f, indent=2)

    @classmethod
    def load(cls, model, processor, path):
        with open(path) as f:
            data = json.load(f)
        c = cls(model, processor, tools=data.get("tools"))
        c.history = data["history"]; return c


class ThinkingBudget(StoppingCriteria):
    """Stop generation once the thinking trace exceeds a token budget."""

    def __init__(self, tokenizer, budget: int):
        self.budget = budget
        self.open_ids = tokenizer.encode(THINK_OPEN, add_special_tokens=False)
        self.close_ids = tokenizer.encode(THINK_CLOSE, add_special_tokens=False)
        self.start = None

    def _find(self, seq, needle):
        n = len(needle)
        for i in range(len(seq) - n + 1):
            if seq[i:i + n] == needle:
                return i
        return None

    def __call__(self, input_ids, scores, **kwargs):
        seq = input_ids[0].tolist()
        if self.start is None:
            idx = self._find(seq, self.open_ids)
            if idx is not None:
                self.start = idx + len(self.open_ids)
            return False
        if self._find(seq[self.start:], self.close_ids) is not None:
            return False
        return (len(seq) - self.start) >= self.budget


TOOL_CALL_RE = re.compile(r"<tool_call>\s*({.*?})\s*</tool_call>", re.S)

def run_calculate(expr: str) -> str:
    # Whitelist arithmetic characters before eval.
    if any(c not in "0123456789+-*/().% " for c in expr):
        return json.dumps({"error": "illegal chars"})
    try:
        return json.dumps({"result": eval(expr, {"__builtins__": {}}, {})})
    except Exception as e:
        return json.dumps({"error": str(e)})

_DOCS = {
    "qwen3.6": "Qwen3.6-35B-A3B is a 35B MoE with 3B active params and 262k native context.",
    "deltanet": "Gated DeltaNet is a linear-attention variant used in Qwen3.6's hybrid layers.",
    "moe": "Qwen3.6 uses 256 experts with 8 routed + 1 shared per token.",
}

def run_search_docs(q):
    hits = [v for k, v in _DOCS.items() if k in q.lower()]
    return json.dumps({"results": hits or ["no hits"]})

def run_get_time():
    import datetime as dt
    return json.dumps({"iso": dt.datetime.utcnow().isoformat() + "Z"})

TOOL_FNS = {
    "calculate": lambda a: run_calculate(a["expression"]),
    "search_docs": lambda a: run_search_docs(a["query"]),
    "get_time": lambda a: run_get_time(),
}

TOOLS_SCHEMA = [
    {"type": "function", "function": {"name": "calculate", "description": "Evaluate arithmetic.",
        "parameters": {"type": "object", "properties": {"expression": {"type": "string"}},
                       "required": ["expression"]}}},
    {"type": "function", "function": {"name": "search_docs", "description": "Search internal docs.",
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {"name": "get_time", "description": "Get current UTC time.",
        "parameters": {"type": "object", "properties": {}}}},
]
```

We build the main QwenChat conversation manager, which handles message history, tool messages, chat template formatting, standard generation, streaming generation, and session persistence. We also define the ThinkingBudget stopping criterion to cap how many tokens the model can spend inside its thinking trace, along with sandboxed tool functions and the JSON schema that advertises them to the model.
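The listing above defines TOOL_CALL_RE and TOOL_FNS but stops before showing how they come together. A minimal dispatch loop, sketched here standalone (re-declaring the regex and the calculate tool from the listing so it runs without the model; `dispatch_tool_calls` is our illustrative name, not part of the tutorial), might look like:

```python
import json, re

# Same pattern the tutorial compiles: a JSON object wrapped in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*({.*?})\s*</tool_call>", re.S)

def run_calculate(expr: str) -> str:
    # Whitelist arithmetic characters before eval, as in the tutorial.
    if any(c not in "0123456789+-*/().% " for c in expr):
        return json.dumps({"error": "illegal chars"})
    try:
        return json.dumps({"result": eval(expr, {"__builtins__": {}}, {})})
    except Exception as e:
        return json.dumps({"error": str(e)})

TOOL_FNS = {"calculate": lambda a: run_calculate(a["expression"])}

def dispatch_tool_calls(model_output: str) -> list[tuple[str, str]]:
    """Extract every <tool_call> block, run the named tool, return (name, result) pairs."""
    results = []
    for match in TOOL_CALL_RE.finditer(model_output):
        call = json.loads(match.group(1))
        name, args = call["name"], call.get("arguments", {})
        results.append((name, TOOL_FNS[name](args)))
    return results

demo = 'Let me compute that.\n<tool_call>{"name": "calculate", "arguments": {"expression": "2+3*4"}}</tool_call>'
print(dispatch_tool_calls(demo))  # [('calculate', '{"result": 14}')]
```

In a full loop, each (name, result) pair would be appended back to the chat via `tool_result(name, result)` before generating again, so the model can read its tools' outputs.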



Digging for clues about the North Pole’s past

In the past, even with an icebreaker and during peak melt season, getting to the North Pole wasn’t a sure bet. It took favorable winds to crack the frozen ocean surface, and ships had to fight through ice that had grown many meters thick over several winters. In the summer of 2025, though, Jochen Knies from the Arctic University of Norway, Tromsø, and his team met little resistance on their way to 90 degrees North with the research vessel Kronprins Haakon. The geologist “didn’t hear the usual grinding of ice” against the hull that he remembered from 1996, when he first reached the pole by ship. Instead, thin floes and large stretches of open water made for an easy, quiet passage. To him, it was “a reminder of how quickly the Arctic is changing.” Since the late 1970s, when satellite observations of the polar seas began, summer ice cover of the Arctic Ocean has declined by more than 40%. In less than half a century, a frozen area the size of the Mediterranean Sea has turned into blue open water with the rapid warming of the high northern latitudes. If this trend continues, there could soon be summers at the North Pole with no sea ice whatsoever. The last time this happened may have been some 120,000 years ago. But no one knows for certain. That’s why Knies and his colleagues, a team of researchers from Norway and Germany, set out from Svalbard to the central Arctic last August. The aim of their five-week mission was to determine whether this region had been ice-free in recent Earth history—and if so, when. As part of a €12.5 million project financed by the European Union, they also came to answer some questions about the future of the Arctic and beyond: How does the loss of sea ice affect the marine ecosystem? What are the consequences for ocean circulation and global climate? In search of clues, the expedition collected sediment cores up to 22 meters in length at different locations across the Arctic seafloor. 
Marine sediments are valuable climate archives that give scientists a window into bygone eras. Like diligent record keepers, they can log past water temperatures, sea-ice coverage, and the strength of ocean currents. These data are encrypted in the chemical and physical properties of the plankton remains and weathered rock deposited on the seabed.

The ship’s crew and researchers recover the sediment corer, a 25-meter-long steel pipe that is driven into the seafloor using a top weight of more than three metric tons. TIM KALVELAGE

Together, the scientists pull out long plastic pipes filled with precious deep-sea mud. TIM KALVELAGE

The pipes are cut into shorter pieces and split in half before being processed in the ship’s laboratories. Each of these one-meter sections covers several tens of thousands of years of Earth’s history. TIM KALVELAGE

While sediment cores several meters long had been recovered on earlier expeditions in the central Arctic, there is no scientific consensus on how old the deposits actually are or whether sea ice ever completely disappeared in summer. To decode the Arctic’s climate archive, Knies brought a team of experts from various disciplines onboard the Kronprins Haakon to dig deeper and obtain fresh samples they could subject to the latest analytical techniques.

Samples await paleomagnetic dating. Like tiny compass needles, iron-rich particles align with Earth’s shifting magnetic field as they settle on the seabed. By measuring their orientation, researchers can estimate the age of the different sediment layers. TIM KALVELAGE

Under the microscope, PhD student Paulina Romel picks shells of unicellular foraminifera from a sample. The chemical composition of these microfossils can give clues about the age of the sediment and the surface water temperature when the organisms were still alive. “These are really cool creatures!” says Romel. TIM KALVELAGE

Agathe Ollive, a geochemist from the Alfred Wegener Institute in Germany, takes water samples from a CTD rosette, an instrument package that measures conductivity (salinity) and temperature at various depths. She uses certain elements to trace the inflow of fresh water and seawater from rivers and adjacent ocean basins into the Arctic. “I didn’t expect there to be so little ice up here,” Ollive says. She is worried about how the Arctic will look 20 years from now. TIM KALVELAGE

Some of this work was done while the researchers were still at sea. Now, at their home laboratories, they are finalizing their analysis of the seafloor samples. One important task is dating the sediments, which may be up to 2 million years old. The team uses a combination of methods to do this, including measuring magnetization, the decay of radioactive elements, and the exposure of mineral grains to sunlight before sinking to the depths. Once they can place them on a timeline, the materials in the cores will help researchers paint a picture of what the Arctic Ocean looked like in times that were warmer than today. For example, the presence or absence of the molecule IP25, which is produced exclusively by ice algae, could tell them how far the sea ice receded at a given time.

Toward the end of the expedition, the Kronprins Haakon passes this iceberg near the northeast coast of Greenland. TIM KALVELAGE

At the end of the study, the team hopes to have data that could improve climate projections for a future ice-free “blue Arctic,” helping us understand how it could affect marine life and carbon storage, Atlantic Ocean circulation, or extreme weather events in Europe and North America.

Tim Kalvelage is a freelance science reporter based in Bremen, Germany, who focuses on climate, ocean, and polar research. He has been to the North Pole twice.



The Download: turning down human noise, and LA’s stunning subway upgrade

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The noise we make is hurting animals. Can we learn to shut up?

As human society has expanded, animals have started struggling to hear one another. For many birds, the noise has grown so loud that they’ve begun to sing with faster trills. Now, their mating calls aren’t as effective. The growing hubbub can also increase bird-on-bird conflict, and entire species that can’t handle urban clamor simply leave town for good. But there are technological solutions to the noises hurting animals—and they could help humans, too. Read the full story.

—Clive Thompson

Los Angeles is finally going underground

In May, a new subway segment will connect downtown Los Angeles to the Pacific Ocean. What today can be an hours-long drive through a busy, museum-packed stretch of the city will be, if all goes well, a 25-minute train ride. The existence of subway stops in this part of town—known as Miracle Mile—is a technological triumph over geography and geology. Find out why.

—Adam Rogers

Both of these stories are from the next issue of our print magazine, which is all about nature. Subscribe now to read it when it lands tomorrow.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Apple’s Tim Cook is stepping down as CEO
Hardware chief John Ternus will take over from him in September. (CNN)
+ Ternus’ defining challenge may be fixing Apple’s AI strategy. (CNBC)
+ How does Cook compare with Apple’s other CEOs through the years? (NYT $)

2 Anthropic’s new Amazon deal escalates the compute war with OpenAI
Anthropic will spend more than $100 billion on Amazon compute. (Axios $)
+ OpenAI touted its compute advantage over Anthropic two weeks ago. (Bloomberg $)
+ Here’s why the AI compute explosion has only just begun. (MIT Technology Review)

3 Silicon Valley is trying to get into the news business
The latest addition is Andreessen Horowitz’s MTS. (The Information $)
+ OpenAI recently bought a business talk show. (NPR)
+ They join Elon Musk’s X and a new Peter Thiel-backed startup. (Axios)

4 The banking industry is scrambling to get access to Anthropic’s Mythos
As regulators review the risks to financial services. (Reuters $)
+ Germany’s central bank has called for wider access to Mythos. (Bloomberg $)

5 War memes are turning conflict into content
Fueled by recommendation systems designed to keep you hooked. (Wired $)
+ AI is turning the Iran conflict into theater. (MIT Technology Review)

6 AI is boosting worker productivity, but not their paychecks
Employees aren’t financially benefiting from their extra efficiency. (Quartz)
+ New data sheds light on the current state of AI. (MIT Technology Review)

7 Amazon’s ambition to rival Starlink has hit a setback
After a Blue Origin rocket was grounded. (FT $)

8 Jeff Bezos’s AI lab has neared a $38 billion valuation
In an imminent $10 billion fundraising deal from investors. (FT $)
+ The startup focuses on AI for engineering and manufacturing. (Reuters $)

9 Scientific AI agents have got their own social network
Where they share, debate, and discuss research papers. (Nature)

10 A Mars rover has discovered new “origin-of-life” molecules
They suggest Mars wasn’t always a lifeless red desert. (Gizmodo)

Quote of the day

“He’s been a transformational Apple CEO that’s always had a steady hand at the wheel. I think that will be his legacy. He had massive shoes to step into, and he was the right person for the job. That’s the way he’ll be remembered.”

One More Thing

MIKE MCQUADE

The race to save our online lives from a digital dark age

There is more stuff being created now than at any time in history, but our data is more fragile than ever. One day in the future, YouTube’s videos may permanently disappear. Facebook—and your uncle’s holiday posts—will vanish.

For many archivists, alarm bells are ringing. Across the world, they’re scraping up defunct websites, saving at-risk data collections, and developing data storage technologies that could last thousands of years.

Their work raises complex questions. What is important to us? How do we decide what to keep—and what do we let go? Read our story on the thorny problems of digital preservation.

—Niall Firth

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)
+ Apple’s forgotten co-founder recently shared his story of the company’s early days.
+ Witness a rare underwater volcanic eruption in the Solomon Islands.
+ Learn what makes Shakespeare’s writing so effective in this masterful analysis.
+ An Artemis II astronaut shared a stunning iPhone video showing Earth disappear behind the Moon at 8x zoom.



Chinese tech workers are starting to train their AI doubles–and pushing back

Tech workers in China are being instructed by their bosses to train AI agents to replace them—and it’s prompting a wave of soul-searching among otherwise enthusiastic early adopters.  Earlier this month a GitHub project called Colleague Skill, which claimed workers could use it to “distill” their colleagues’ skills and personality traits and replicate them with an AI agent, went viral on Chinese social media. Though the project was created as a spoof, it struck a nerve among tech workers, a number of whom told MIT Technology Review that their bosses are encouraging them to document their workflows in order to automate specific tasks and processes using AI agent tools like OpenClaw or Claude Code.  To set up Colleague Skill, a user names the coworker whose tasks they want to replicate and adds basic profile details. The tool then automatically imports chat history and files from Lark and DingTalk, both popular workplace apps in China, and generates reusable manuals describing that coworker’s duties—and even their unique quirks—for an AI agent to replicate.  Colleague Skill was created by Tianyi Zhou, who works as an engineer at the Shanghai Artificial Intelligence Laboratory. Earlier this week he told Chinese outlet Southern Metropolis Daily that the project was started as a stunt, prompted by AI-related layoffs and by the growing tendency of companies to ask employees to automate themselves. He didn’t respond to requests for further comment. Internet users have found humor in the idea behind the tool, joking about automating their coworkers before themselves. However, Colleague Skill’s virality has sparked a lot of debate about workers’ dignity and individuality in the age of AI. After seeing Colleague Skill on social media, Amber Li, 27, a tech worker in Shanghai, used it to recreate a former coworker as a personal experiment. Within minutes, the tool created a file detailing how that person did their job. “It is surprisingly good,” Li says. 
“It even captures the person’s little quirks, like how they react and their punctuation habits.” With this skill, Li can use an AI agent as a new “coworker” that helps debug her code and replies instantly. It felt uncanny and uncomfortable, Li says.  Even so,  replacing coworkers with agents could become a norm. Since OpenClaw became a national craze, bosses in China have been pushing tech workers to experiment with agents.  Although AI agents can take control of your computer, read and summarize news, reply to emails, and book restaurant reservations for you, tech workers on the ground say their utility has so far proven to be limited in business contexts. Asking employees to make manuals describing the minutiae of their day-to-day jobs the way Colleague Skill does is one way to help bridge that gap.  Hancheng Cao, an assistant professor at Emory University who studies AI and work, believes that companies have good reasons to push employees to create work blueprints like these, beyond simply following a trend. “Firms gain not only internal experience with the tools, but also richer data on employee know-how, workflows, and decision patterns. That helps companies see which parts of work can be standardized or codified into systems, and which still depend on human judgment,” he says. To employees, though, making agents or even blueprints for them can feel strange and alienating. One software engineer, who spoke with MIT Technology Review anonymously because of concerns about their job security, trained an AI (not Colleague Skill) on their workflow and found that the process felt reductive—as if their work had been flattened into modules in a way that made them easier to replace. On social media, workers have turned to bleak humor to express similar feelings. 
In one comment on Rednote, a user wrote that “a cold farewell can be turned into warm tokens,” quipping that if they use Colleague Skill to distill their coworkers into tasks first, they themselves might survive a little longer. The push for creating agents has also spurred clever countermeasures. Irritated by the idea of reducing a person to a skill, Koki Xu, 26, an AI product manager in Beijing, published an “anti-distillation” skill on GitHub on April 4. The tool, which took Xu about an hour to build, is designed to sabotage the process of creating workflows for agents. Users can choose between light, medium, and heavy sabotage modes depending on how closely their boss is observing the process, and the agent rewrites the material into generic, non-actionable language that would produce a less useful AI stand-in. A video Xu posted about the project went viral, drawing more than 5 million likes across platforms. Xu told MIT Technology Review that she has been following the Colleague Skill trend from the start and that it has made her think about alienation, disempowerment, and broader implications for labor. “I originally wanted to write an op-ed, but decided it would be more useful to make something that pushes back against it,” she says. Xu, who has undergraduate and master’s degrees in law, said the trend also raises legal questions. While a company may be able to argue that work chat histories and materials created on a work laptop are corporate property, a skill like this can also capture elements of personality, tone, and judgment, making ownership much less clear. She said she hopes Colleague Skill prompts more discussion about how to protect workers’ dignity and identity in the age of AI. “I believe it’s important to keep up with these trends so we (employees) can participate in shaping how they are used,” she says. Xu herself is an avid AI adopter, with seven OpenClaw agents set up across her personal and work devices.
Li, the tech worker in Shanghai, says her company has not yet found a way to replace actual workers with AI tools, largely because they remain unreliable and require constant supervision. “I don’t feel like my job is immediately at risk,” she says. “But I do feel that my value is being cheapened, and I don’t know what to do about it.”



OpenAI Scales Trusted Access for Cyber Defense With GPT-5.4-Cyber: a Fine-Tuned Model Built for Verified Security Defenders

Cybersecurity has always had a dual-use problem: the same technical knowledge that helps defenders find vulnerabilities can also help attackers exploit them. For AI systems, that tension is sharper than ever. Restrictions intended to prevent harm have historically created friction for good-faith security work, and it can be genuinely difficult to tell whether any particular cyber action is intended for defensive usage or to cause harm. OpenAI is now proposing a concrete structural solution to that problem: verified identity, tiered access, and a purpose-built model for defenders. OpenAI team announced that it is scaling up its Trusted Access for Cyber (TAC) program to thousands of verified individual defenders and hundreds of teams responsible for defending critical software. The main focus of this expansion is the introduction of GPT-5.4-Cyber, a variant of GPT-5.4 fine-tuned specifically for defensive cybersecurity use cases. What Is GPT-5.4-Cyber and How Does It Differ From Standard Models? If you’re an AI engineer or data scientist who has worked with large language models on security tasks, you’re likely familiar with the frustrating experience of a model refusing to analyze a piece of malware or explain how a buffer overflow works — even in a clearly research-oriented context. GPT-5.4-Cyber is designed to eliminate that friction for verified users. Unlike standard GPT-5.4, which applies blanket refusals to many dual-use security queries, GPT-5.4-Cyber is described by OpenAI as ‘cyber-permissive’ — meaning it has a deliberately lower refusal threshold for prompts that serve a legitimate defensive purpose. That includes binary reverse engineering, enabling security professionals to analyze compiled software for malware potential, vulnerabilities, and security robustness without access to the source code. Binary reverse engineering without source code is a significant capability unlock. 
In practice, defenders routinely need to analyze closed-source binaries — firmware on embedded devices, third-party libraries, or suspected malware samples — without having access to the original code. That model was described as a GPT-5.4 variant purposely fine-tuned for additional cyber capabilities, with fewer capability restrictions and support for advanced defensive workflows including binary reverse engineering without source code. There are also hard limits. Users with trusted access must still abide by OpenAI’s Usage Policies and Terms of Use. The approach is designed to reduce friction for defenders while preventing prohibited behavior, including data exfiltration, malware creation or deployment, and destructive or unauthorized testing. This distinction matters: TAC lowers the refusal boundary for legitimate work, but does not suspend policy for any user. There are also deployment constraints. Use in zero-data-retention environments is limited, given that OpenAI has less visibility into the user, environment, and intent in those configurations — a tradeoff the company frames as a necessary control surface in a tiered-access model. For dev teams accustomed to running API calls in Zero-Data-Retention mode, this is an important implementation constraint to plan around before building pipelines on top of GPT-5.4-Cyber. The Tiered Access Framework: How TAC Actually Works TAC is not a checkbox feature — it is an identity-and-trust-based access framework with multiple tiers. Understanding the structure matters if you or your organization plans to integrate these capabilities. The access process runs through two paths. Individual users can verify their identity at chatgpt.com/cyber. Enterprises can request trusted access for their team through an OpenAI representative. Customers approved through either path gain access to model versions with reduced friction around safeguards that might otherwise trigger on dual-use cyber activity. 
Approved uses include security education, defensive programming, and responsible vulnerability research. TAC customers who want to go further and authenticate as cyber defenders can express interest in additional access tiers, including GPT-5.4-Cyber. Deployment of the more permissive model is starting with a limited, iterative rollout to vetted security vendors, organizations, and researchers.

That means OpenAI is now drawing at least three practical lines instead of one: there is baseline access to general models; there is trusted access to existing models with less accidental friction for legitimate security work; and there is a higher tier of more permissive, more specialized access for vetted defenders who can justify it.

The framework is grounded in three explicit principles. The first is democratized access: using objective criteria and methods, including strong KYC and identity verification, to determine who can access more advanced capabilities, with the goal of making those capabilities available to legitimate actors of all sizes, including those protecting critical infrastructure and public services. The second is iterative deployment — OpenAI updates models and safety systems as it learns more about the benefits and risks of specific versions, including improving resilience to jailbreaks and adversarial attacks. The third is ecosystem resilience, which includes targeted grants, contributions to open-source security initiatives, and tools like Codex Security.

How the Safety Stack Is Built: From GPT-5.2 to GPT-5.4-Cyber

It’s worth understanding how OpenAI has structured its safety architecture across model versions — because TAC is built on top of that architecture, not instead of it. OpenAI began cyber-specific safety training with GPT-5.2, then expanded it with additional safeguards through GPT-5.3-Codex and GPT-5.4.
A critical milestone in that progression: GPT-5.3-Codex is the first model OpenAI treats as having High cybersecurity capability under its Preparedness Framework, which requires additional safeguards. These safeguards include training the model to refuse clearly malicious requests like stealing credentials. The Preparedness Framework is OpenAI’s internal evaluation rubric for classifying how dangerous a given capability level could be. Reaching ‘High’ under that framework is what triggered the full cybersecurity safety stack being deployed — not just model-level training, but an additional automated monitoring layer.

In addition to safety training, automated classifier-based monitors detect signals of suspicious cyber activity and route high-risk traffic to a less cyber-capable model, GPT-5.2. In other words, if a request looks suspicious enough to exceed a threshold, the platform doesn’t just refuse — it silently reroutes the traffic to a safer fallback model. This is a key architectural detail: safety is enforced not only inside model weights, but also at the infrastructure routing layer. GPT-5.4-Cyber extends this stack further upward — more permissive for verified defenders, but wrapped in the same monitoring and routing safeguards.
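The classifier-based routing described above can be sketched as follows. The risk score would come from an upstream abuse classifier; the threshold value and model identifiers are placeholders for exposition, not OpenAI’s published internals.

```python
def route_request(risk_score: float, threshold: float = 0.8) -> str:
    """Pick a serving model from an upstream abuse classifier's risk score.

    In the architecture described, suspicious traffic is not simply refused:
    it is rerouted to a less cyber-capable fallback model. The threshold and
    model names here are illustrative assumptions.
    """
    if risk_score >= threshold:
        return "gpt-5.2"        # high-risk traffic falls back to the safer model
    return "gpt-5.4-cyber"      # low-risk traffic reaches the permissive model
```

The design choice worth noting is that the decision happens at the serving layer, before the permissive model ever sees the request, so a jailbreak that fools the model still has to get past the router first.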

OpenAI Scales Trusted Access for Cyber Defense With GPT-5.4-Cyber: a Fine-Tuned Model Built for Verified Security Defenders Read Post »


Colossal Biosciences said it cloned red wolves. Is it for real?

If you want to capture something wolflike, it’s best to embark before dawn. So on a morning this January, with the eastern horizon still pink-hued, I drove with two young scientists into a blanket of fog. Forty miles to the west, the industrial sprawl of Houston spawned a golden glow. Tanner Broussard’s old Toyota Tacoma bumped over the levee-top roads as killdeer, flushed from their rest, flew across the beams of his headlights.

Broussard peered into the darkness, looking for traps. “I have one over here,” he said, slowing slightly. A master’s student at McNeese State University, he was quiet and contemplative, his bearded face half-hidden under a black ball cap. “Nothing on it,” he said, blandly. The truck rolled on.

Wolves and their relations—dogs, jackals, coyotes, and so on—are classed in the family Canidae, and the canid that dominated this landscape in eastern Texas was once the red wolf. But as soon as white settlers arrived on the continent, Canis rufus found itself under siege. The war on wolves “lasted 200 years,” federal researchers once put it, in a surprisingly evocative report. “The wolf lost.” By 1980, the red wolf was declared extinct in the wild, its survivors reduced to a small captive breeding population. Still, for decades afterward, people noted that strange wolflike creatures persisted along the Gulf Coast. Finally, in 2018, scientists confirmed that some local coyotes were more than coyotes: They were taller, long-legged, their coats shaded with hints of cinnamon. These animals contained relict red wolf genes. They became known as the ghost wolves.

Broussard grew up in southwest Louisiana, watching coyotes trot across his parents’ ranch. The thrilling fact that these might have been not just coyotes but something more? That reset a rambling academic career. In 2023, Broussard had recently returned to college after a seven-year pause, and his budding obsession with wolves narrowed his focus.
Before he finished his bachelor’s degree, he began to supply field data to a prominent conservation nonprofit.

The American red wolf, Canis rufus, is the most endangered wolf species in the world. This pup is one of four animals said to be clones of this native North American species. COURTESY OF COLOSSAL BIOSCIENCES

Then, last year, just before he began his master’s studies, he woke to disconcerting news. A startup called Colossal Biosciences claimed to have resuscitated the dire wolf, a large canid that went extinct more than 10,000 years ago. Pundits debated the utility of the project and whether the clones—technically, gray wolves with some genetic tweaks—could really be called dire wolves. But what mattered to Broussard was Colossal’s simultaneous announcement that it had cloned four red wolves.

“That surprised pretty much everybody in the wolf community,” Broussard said as we toured the wildlife refuge where he’d set his traps. The Association of Zoos and Aquariums runs a program that sustains red wolves through captive breeding; its leadership had no idea a cloning project was underway. Nor did ecologist Joey Hinton, one of Broussard’s advisors, who had trapped the canids Colossal used to source the DNA for its clones. Some of Hinton’s former partners were collaborating with the company, but he didn’t know that clones were on the table. There was already disagreement among scientists about the entire idea of de-extinction. Now Colossal had made these mystery clones, whose location was kept secret. Even the purpose of the clones was murky to some scientists; just how they might restore red wolf populations was unclear.

Red wolves had always been a contentious species, hard for scientists to pin down. The red wolf research community was already marked by the inevitable interpersonal tensions of a small and passionate group. Now Colossal’s clones became one more lightning rod.
Perhaps the most curious question, though, was whether the company had cloned red wolves at all.

You can think of the red wolf as the wolf of the East—an apex predator that once roamed the forests and grasslands and marshes everywhere from Texas to Illinois to New York. Smaller than a gray wolf (though a good bit larger than a coyote), this was a sleek beast, with, according to one old field guide, a “cunning fox-like appearance”: long body, long legs; clearly built to run across long distances. Its coat was smooth and flat and came in many colors: a reddish tone that comes out in the right light, yes, but also, despite the name, white and gray and, in certain regions and populations, an ominous all black. We know these details thanks to a few notes from early naturalists. As writer Andrew Moore recounts in his new book, The Beasts of the East, by the time a mammalogist decided to class these eastern wolves as a standalone species in the 1930s, the red wolf had been extirpated from the East Coast and was rapidly dwindling across its range. Working with remnant skulls and other specimens, the mammalogist chose the name red wolf—which was later enshrined with the Latinate Canis rufus—because that’s what these wolves were called in the last place they survived.

The looming extinction of the red wolf turned out to be a good thing for coyotes. Canis latrans is a distant relative of wolves that split away from a common ancestor thousands of years ago and might be considered, as one canid biologist put it to me, the “wolf of the Anthropocene.” Their smaller size means they need less food and can survive in smaller and more fragmented territory, the kind that modern humans tend to build. Red wolves had kept coyotes out of eastern America, outcompeting them for prey. Now, as the wolves declined, the coyotes began to slip in.
The last red wolves, which lived in Louisiana and Texas, decided a strange and smaller mate was preferable to no mate at all. Soon the territory became a genetic jumble, home to both wolves and coyotes.



The Download: murderous ‘mirror’ bacteria, and Chinese workers fighting AI doubles

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

No one’s sure if synthetic mirror life will kill us all

In February 2019, a group of scientists proposed a high-risk, cutting-edge, irresistibly exciting idea that the National Science Foundation should fund: making “mirror” bacteria. These lab-created microbes would be organized like ordinary bacteria, but their proteins and sugars would be mirror images of those found in nature. Researchers believed they could reveal new insights into building cells, designing drugs, and even the origins of life. But now, many of them have reversed course. They’ve become convinced that mirror organisms could trigger a catastrophic event threatening every form of life on Earth. Find out why they’re ringing alarm bells. —Stephen Ornes

This story is from the next issue of our print magazine, which is all about nature. Subscribe now to read it when it lands this Wednesday.

Chinese tech workers are starting to train their AI doubles—and pushing back

Earlier this month, a GitHub project called Colleague Skill struck a nerve by claiming to “distill” a worker’s skills and personality—and replicate them with an AI agent. Though the project was a spoof, it prompted a wave of soul-searching among otherwise enthusiastic early adopters. A number of tech workers told MIT Technology Review that their bosses are already encouraging them to document their workflows for automation via tools like OpenClaw. Many now fear that they are being flattened into code and losing their professional identity. In response, some are fighting back with tools designed to sabotage the automation process. Read the full story. —Caiwei Chen

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 The White House and Anthropic are working toward a compromise
The Trump administration says they had a “productive meeting.” (Reuters $)
+ Trump had ordered US agencies to phase out Anthropic’s tech. (Guardian)
+ Despite the blacklist, the NSA is using Anthropic’s new Mythos model. (Axios)

2 Palantir has unveiled a manifesto calling for universal national service
While denouncing inclusivity and “regressive” cultures. (TechCrunch)
+ It’s a summary of CEO Alex Karp’s book “The Technological Republic.” (Engadget)
+ One critic called the book “a piece of corporate sales material.” (Bloomberg $)

3 Germany’s chancellor and largest company want looser AI rules
Chancellor Merz said industrial AI needs more regulatory freedom. (Reuters $)
+ Siemens says it plans to shift investments to the US if EU rules don’t change. (Bloomberg $)
+ Fractures over AI regulation are also emerging in the US. (MIT Technology Review)

4 Nvidia’s once-tight bond with gamers is cracking over AI
Consumer graphics cards are no longer the priority. (CNBC)
+ But generative AI could reinvent what it means to play. (MIT Technology Review)

5 Insurers are trying to exclude AI-related harms from their coverage
And escape legal liability for AI’s mistakes. (FT $)
+ AI images are being used in insurance scams. (BBC)

6 AI is about to make the global e-waste crisis much worse
And most of the trash will end up in non-Western countries. (Rest of World)
+ Here’s what we can do about it. (MIT Technology Review)

7 Tinder and Zoom have partnered with Sam Altman’s eye-scanning firm
To offer a “proof of humanity” badge to users. (BBC)

8 Islamist insurgents in West Africa are driving surging demand for drones
A Nigerian UAV startup is opening its first factory abroad in Ghana. (Bloomberg $)

9 Hundreds of fake pro-Trump AI influencers are flooding social media
In an apparent bid to hook conservative voters. (NYT)

10 A Chinese humanoid has smashed the human half-marathon record
Despite crashing into a railing near the end of the race. (NBC News)
+ Chinese tech firm Honor swept the podium spots. (Engadget)
+ Last year, humans won the race by a mile. (CNN)

Quote of the day

“This is the only issue where you’ve got Steve Bannon and Ralph Nader, Glenn Beck and Bernie Sanders fighting for the same thing.”

—Ben Cumming, head of communications at the AI safety nonprofit Future of Life Institute, tells the Washington Post that diverse public figures are endorsing a declaration of AI policy priorities.

One More Thing

The great commercial takeover of low Earth orbit

The International Space Station will be decommissioned as soon as 2030, but the story of America in low Earth orbit (LEO) will continue. Using lessons from the ISS, NASA has partnered with private companies to develop new commercial space stations for research, manufacturing, and tourism. If they are successful, these businesses will bring about a new era of space exploration: private rockets flying to private destinations. They will also demonstrate a new model in which NASA builds infrastructure and the private sector takes it from there—freeing the agency to explore deeper and deeper into space. Read the full story. —David W. Brown

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Bask in this adorable test of a dog’s devotion.
+ This vocal pitch trainer improves your singing straight from your browser.
+ Master international etiquette with this interactive guide to the world’s cultures.
+ Explore the networks of public figures with this intriguing interactive graph.

