AI Archives - 第2页共223页

Prompt Engineering vs Loop Engineering vs Graph Engineering: What Changes at Each Layer

admin NU / 7 月 29, 2026

Three terms now compete for the same line in AI engineering job descriptions. Prompt engineering is the established one. Loop engineering entered the AI vocabulary in late 2025 and dominated developer discussion through June 2026. Graph engineering followed roughly six weeks later. They get used interchangeably. Should they be? The three are not competing techniques. They are three different units of control, stacked. A prompt controls one model response. A loop controls one agent’s behavior cycle. A graph controls the organization of many agents. Each layer preserves the layer beneath it. A prompt does not disappear once a loop is built around it. it stops being the thing typed by hand. This article separates the three: what gets designed at each layer, what the published claim says about when the higher layers pay for themselves, and where the skepticism is warranted. One task, three layers Interactive explainer · 01 One task, three layers — watch what changes The three terms are not competing techniques. They are three different units of control. Here is the same job handled at each layer, step by step. Orange dots mark the moments a human is required. Task Fix the failing tests in the auth module, then open a pull request. Run all three Reset Layer 1 Prompt engineering You control one model response. You are the loop. Your turns 0 · Model calls 0 Layer 2 Loop engineering You control one agent’s cycle. The loop does the prompting. Your turns 0 · Model calls 0 Layer 3 Graph engineering You control how many agents are organised. Your turns 0 · Nodes 0 · Parallel 0 What actually differs Prompt Loop Graph Unit of control One model response One agent’s behaviour cycle An organisation of agents What you write Instructions, examples, output format Trigger, tools, stop condition, retry budget Nodes, edges, shared state, failure routes Who says “again?” A human, every turn A verifier the loop calls itself A routing rule written in advance Where it breaks Ambiguous or overstuffed instruction It cannot tell done from stuck Context never crossed an edge you forgot to draw Enough when One shot, a person reads the result Repetitive, machine-checkable, one domain Cross-domain work with parallel branches Illustrative walkthrough · step counts are not benchmarks Built by Marktechpost The stack, in order Each step in the progression was named in practice before it appeared in vendor documentation. Prompt engineering covers writing and structuring the instruction for a single call. Anthropic’s guidance is to separate a system prompt into labeled sections — background information, instructions, tool guidance, output description — delineated with XML tags or Markdown headers. The recommendation is to supply the minimal set of information that fully specifies the expected behavior. Minimal does not mean short. Context engineering came next. Anthropic describes it as the natural progression of prompt engineering. The question moves from finding the right words to deciding what configuration of tokens belongs in the window at all. Context is a finite resource, and the engineering problem is optimizing the utility of those tokens against model constraints. Harness engineering covers the environment a single agent runs inside: files, tools, memory, feedback. Loop engineering sits one floor above the harness. A June 2026 arXiv paper on agentic AI in building engineering, Buildrix, sets out the same four-step progression explicitly — prompt, then context, then harness, then loop — with the final layer defining how a system repeatedly observes, acts, verifies and recovers. Graph engineering is the newest label and the least settled. One enterprise writeup notes that the term’s provenance is unresolved and that it collides with an older knowledge-graph usage of the same word. The underlying practice, graph-based orchestration, has a documented lineage in multi-agent systems research. Layer 1: Prompt Engineering The defining assumption is that a human is present at every iteration. A prompt is written, the model responds, the output is judged, the prompt is revised. That assumption is what breaks. High volume. Multi-step tasks. No human available to grade the output. Results that feed the next step automatically. Any one of these, and the prompt alone stops being sufficient. Nothing about the prompt got worse. The surrounding conditions changed. Prompt engineering also does not vanish inside the higher layers. Anthropic’s multi-agent research writeup reports that prompt engineering was the primary lever for fixing coordination failures. Early versions spawned 50 subagents for simple queries, and the fix was prompting rather than topology. Prompt engineering, explained Interactive explainer · 02 · Layer 1 What prompt engineering is Prompt engineering is designing the text of a single call to a model. One input, one forward pass, one output — and a human who reads the result and decides whether to run it again. That last part is the assumption everything above this layer exists to remove. Prompt what you write Model one forward pass Response one output you you judge it, then rewrite the prompt yourself Swipe the diagram sideways → The cycle exists here too — but you are the one closing it, every single turn. Anatomy of a prompt Tap a section. Anthropic’s guidance is to delineate parts with XML tags or Markdown headers rather than writing one undifferentiated block. <background_information>What the model needs to know before it starts <instructions>The task itself, and the rules for doing it ## Tool guidanceWhich tool to reach for, and when ## Output descriptionThe exact shape the answer should take <examples>A few cases that cover the edges you actually hit What this section does Background information The assumption that breaks Prompt engineering assumes a human is present at every iteration. Tap a condition to see what happens when that stops being true. high volume multi-step task nobody available to grade it output feeds the next step Layer 1 of 3 · the layer most tasks stop at Built by Marktechpost Layer 2: loop engineering The framing is that a coding agent is a brute-force tool for finding solutions. The craft is designing the goal, the

Prompt Engineering vs Loop Engineering vs Graph Engineering: What Changes at Each Layer Read Post »

AI, Committee, 新闻, Uncategorized

Deploying a 1-Bit Bonsai-27B Model with PrismML llama.cpp and OpenAI-Compatible Local Inference Workflows

admin NU / 7 月 28, 2026

In this tutorial, we deploy the 1-bit Bonsai-27B language model using the PrismML fork of llama.cpp, which provides the specialized CUDA kernels required to decode the model’s Q1_0_g128 GGUF quantization format. We begin by validating the GPU runtime, installing the required Python dependencies, compiling the CUDA-enabled inference binaries, and downloading the compressed model weights from Hugging Face. We then test the model through llama-cli, launch an OpenAI-compatible local inference server, and interact with it through a reusable Python client that supports standard completions, streamed responses, multi-turn conversations, and code generation. We also examine optional configurations for throughput benchmarking, quantized key-value caching, long-context inference, speculative decoding, and multimodal extensions. Copy CodeCopiedUse a different Browser import os import sys import time import json import shutil import subprocess import multiprocessing WORK_DIR = “/content” REPO_URL = “https://github.com/PrismML-Eng/llama.cpp” REPO_DIR = os.path.join(WORK_DIR, “llama.cpp”) BUILD_DIR = os.path.join(REPO_DIR, “build”) BIN_DIR = os.path.join(BUILD_DIR, “bin”) HF_REPO = “prism-ml/Bonsai-27B-gguf” MODEL_FILE = “Bonsai-27B-Q1_0.gguf” MODEL_PATH = os.path.join(WORK_DIR, MODEL_FILE) SERVER_HOST = “127.0.0.1” SERVER_PORT = 8080 SERVER_URL = f”http://{SERVER_HOST}:{SERVER_PORT}” GEN_PARAMS = {“temperature”: 0.7, “top_p”: 0.95, “top_k”: 20} CTX_SIZE = 8192 N_GPU_LAYERS = 99 USE_KV_Q4 = False def sh(cmd, check=True, **kw): “””Run a shell command, streaming output to the notebook.””” print(f”n$ {cmd}”) return subprocess.run(cmd, shell=True, check=check, **kw) print(“=” * 70) print(“[1/7] Checking environment”) print(“=” * 70) gpu = subprocess.run(“nvidia-smi –query-gpu=name,memory.total –format=csv,noheader”, shell=True, capture_output=True, text=True) if gpu.returncode != 0: sys.exit(“No GPU detected. In Colab: Runtime -> Change runtime type -> GPU (T4).”) print(f”GPU detected: {gpu.stdout.strip()}”) print(“Bonsai-27B needs only ~5.2 GB peak at 4K context — any Colab GPU works.”) sh(“pip -q install huggingface_hub requests”) We configure the Colab workspace, model repository, server endpoint, inference parameters, context size, and GPU offloading settings required throughout the tutorial. We define a reusable shell-command function and verify that the runtime exposes a compatible NVIDIA GPU before continuing. We then install the Hugging Face Hub and HTTP client dependencies needed for model retrieval and API communication. Copy CodeCopiedUse a different Browser print(“=” * 70) print(“[2/7] Building PrismML llama.cpp fork with CUDA (cached after 1st run)”) print(“=” * 70) if not os.path.isdir(REPO_DIR): sh(f”git clone –depth 1 {REPO_URL} {REPO_DIR}”) else: print(“Repo already cloned — skipping.”) cli_bin = os.path.join(BIN_DIR, “llama-cli”) server_bin = os.path.join(BIN_DIR, “llama-server”) bench_bin = os.path.join(BIN_DIR, “llama-bench”) if not (os.path.exists(cli_bin) and os.path.exists(server_bin)): jobs = multiprocessing.cpu_count() sh(f”cmake -S {REPO_DIR} -B {BUILD_DIR} -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release”) sh(f”cmake –build {BUILD_DIR} -j{jobs} –target llama-cli llama-server llama-bench”) else: print(“Binaries already built — skipping.”) We clone the PrismML fork of llama.cpp, which provides the specialized kernels required for the model’s Q1_0_g128 quantization format. We configure a CUDA-enabled release build with CMake and compile the command-line, server, and benchmarking executables. We also reuse previously generated binaries when they already exist, reducing repeated setup time in the same Colab session. Copy CodeCopiedUse a different Browser print(“=” * 70) print(“[3/7] Downloading weights from Hugging Face”) print(“=” * 70) from huggingface_hub import hf_hub_download if not os.path.exists(MODEL_PATH): downloaded = hf_hub_download(repo_id=HF_REPO, filename=MODEL_FILE, local_dir=WORK_DIR) print(f”Downloaded to: {downloaded}”) else: print(“Model already on disk — skipping.”) print(f”Model size on disk: {os.path.getsize(MODEL_PATH) / 1e9:.2f} GB”) We connect to the Hugging Face Hub and download the Bonsai-27B GGUF model into the Colab workspace. We skip the transfer when the model file is already available locally, allowing subsequent runs to proceed more efficiently. We then calculate and display the deployed model size to confirm that the compressed weights are stored correctly. Copy CodeCopiedUse a different Browser print(“=” * 70) print(“[4/7] Smoke test with llama-cli”) print(“=” * 70) sh( f'{cli_bin} -m {MODEL_PATH} ‘ f’-p “Explain in two sentences why 1-bit quantization saves memory.” ‘ f’-n 128 -ngl {N_GPU_LAYERS} ‘ f’–temp {GEN_PARAMS[“temperature”]} ‘ f’–top-p {GEN_PARAMS[“top_p”]} –top-k {GEN_PARAMS[“top_k”]} ‘ f’-no-cnv 2>/dev/null’, check=False, ) print(“=” * 70) print(“[5/7] Starting llama-server (OpenAI-compatible API)”) print(“=” * 70) import requests kv_flags = “-ctk q4_0 -ctv q4_0” if USE_KV_Q4 else “” server_cmd = ( f”{server_bin} -m {MODEL_PATH} ” f”–host {SERVER_HOST} –port {SERVER_PORT} ” f”-ngl {N_GPU_LAYERS} -c {CTX_SIZE} {kv_flags}” ) print(f”$ {server_cmd} (background)”) server_log = open(os.path.join(WORK_DIR, “server.log”), “w”) server_proc = subprocess.Popen(server_cmd, shell=True, stdout=server_log, stderr=server_log) for _ in range(120): try: if requests.get(f”{SERVER_URL}/health”, timeout=2).status_code == 200: print(“Server is up.”) break except requests.exceptions.RequestException: pass time.sleep(2) else: server_proc.kill() sys.exit(“Server failed to start — check /content/server.log”) We perform a command-line smoke test to verify that the compiled runtime can load the quantized model and generate a valid response. We then start llama-server with full GPU layer offloading, the selected context window, and optional quantized KV-cache settings. We repeatedly query the health endpoint until the OpenAI-compatible inference service becomes ready for client requests. Copy CodeCopiedUse a different Browser print(“=” * 70) print(“[6/7] Talking to Bonsai-27B via the OpenAI-compatible API”) print(“=” * 70) def chat(messages, stream=False, max_tokens=512, **overrides): “””Minimal OpenAI-compatible chat client for the local llama-server.””” payload = { “model”: “bonsai-27b”, “messages”: messages, “max_tokens”: max_tokens, “stream”: stream, **GEN_PARAMS, **overrides, } if not stream: r = requests.post(f”{SERVER_URL}/v1/chat/completions”, json=payload) r.raise_for_status() return r.json()[“choices”][0][“message”][“content”] r = requests.post(f”{SERVER_URL}/v1/chat/completions”, json=payload, stream=True) r.raise_for_status() full = [] for line in r.iter_lines(): if not line or not line.startswith(b”data: “): continue chunk = line[len(b”data: “):] if chunk == b”[DONE]”: break delta = json.loads(chunk)[“choices”][0][“delta”].get(“content”, “”) full.append(delta) print(delta, end=””, flush=True) print() return “”.join(full) SYSTEM = {“role”: “system”, “content”: “You are a helpful assistant”} print(“n— 6a: basic completion —“) answer = chat([SYSTEM, {“role”: “user”, “content”: “What is the capital of France? One sentence.”}]) print(answer) print(“n— 6b: math reasoning, streamed token-by-token —“) chat([SYSTEM, {“role”: “user”, “content”: “A train travels 120 km at 80 km/h, then 90 km at ” “60 km/h. What is its average speed for the whole ” “trip? Show your reasoning briefly.”}], stream=True, max_tokens=700) print(“n— 6c: multi-turn chat —“) history = [SYSTEM] for user_msg in [“My name is Priya and I love graph algorithms.”, “Suggest one project idea that combines my interest with LLMs.”, “What was my name again?”]: history.append({“role”: “user”, “content”: user_msg}) reply = chat(history, max_tokens=300) history.append({“role”: “assistant”, “content”: reply}) print(f”nUSER: {user_msg}nBONSAI: {reply}”) print(“n— 6d: code generation —“) print(chat([SYSTEM, {“role”: “user”, “content”: “Write a Python function that returns the n-th ” “Fibonacci number using memoization. Code only.”}], max_tokens=400)) We define a reusable Python chat client that sends OpenAI-compatible requests to the locally hosted Bonsai-27B server. We support both standard

Deploying a 1-Bit Bonsai-27B Model with PrismML llama.cpp and OpenAI-Compatible Local Inference Workflows Read Post »

AI, Committee, 新闻, Uncategorized

Procedural Knowledge at Scale Improves Reasoning

admin NU / 7 月 28, 2026

arXiv:2604.01348v3 Announce Type: replace Abstract: Test-time scaling has emerged as an effective way to improve language models on challenging reasoning tasks. However, most existing methods treat each problem in isolation and do not systematically reuse knowledge from prior reasoning trajectories. In particular, they underutilize procedural knowledge: how to reframe a problem, choose an approach, and verify or backtrack when needed. We introduce textbf{Reasoning Memory}, a retrieval-augmented generation (RAG) framework for reasoning models that explicitly retrieves and reuses procedural knowledge at scale. Starting from existing corpora of step-by-step reasoning trajectories, we decompose each trajectory into self-contained subquestion-subroutine pairs, yielding a datastore of 32 million compact procedural knowledge entries. At inference time, a lightweight in-thought prompt lets the model verbalize the core subquestion, retrieve relevant subroutines within its reasoning trace, and reason under diverse retrieved subroutines as implicit procedural priors. Across six math, science, and coding benchmarks, Reasoning Memory consistently outperforms RAG with document, trajectory, and template knowledge, as well as a compute-matched test-time scaling baseline. With a higher inference budget, averaged across models and tasks, it improves over no retrieval by 19.5% and over the compute-matched baseline by 8.9%. Ablation studies show that these gains come from two key factors: the broad procedural coverage of the source trajectories and our decomposition and retrieval design, which together enable effective extraction and reuse of procedural knowledge. Our experiment code is available at https://github.com/facebookresearch/reasoning-memory.

Procedural Knowledge at Scale Improves Reasoning Read Post »

AI, Committee, 新闻, Uncategorized

Samsung’s chip workers are jumping ship to rival SK Hynix

admin NU / 7 月 28, 2026

Lee, an engineer at Samsung’s semiconductor division, clocks out when his shift ends. He used to work longer hours, going the extra mile to excel at his projects. But lately, he’s been coming straight home to work on his job application for the chipmaker’s South Korean rival SK Hynix, sharing tips with his coworkers on how to draft a stellar personal statement. Even his boss encourages him to make the move. “My team lead tells us all to jump ship to SK Hynix,” says Lee. He and his coworkers are feeling demoralized by the $476,000 bonus that SK Hynix is set to pay its employees, flush with record profits from making the high-bandwidth memory (HBM) chips that power Nvidia’s AI accelerators. The figure dwarfs what chip workers at Samsung are set to receive and is sparking an exodus. As the AI boom heats up, the semiconductor titans are waging a fierce talent war with flashy bonuses, aggressive recruiting, and even a courtroom injunction. Who wins could tilt the race to dominate the next generation of the HBM chips at the heart of the AI boom. “Except for our two team leads, my entire team [of 30 people] just applied to SK Hynix,” Lee says, referring to a job posting the company published in July. Lee, who has worked at Samsung for three years, even applied for an entry-level position at its rival. A coworker, who has worked at Samsung for eight years, applied for the same one. They both got rejected. But they’re hopeful that they’ll get a callback for a posting seeking a more experienced engineer. All employees at Samsung and SK Hynix that MIT Technology Review spoke with asked to be identified by just their last name or a pseudonym because they feared retaliation from their employer. Samsung declined to comment, and SK Hynix did not respond to requests for comment. After prolonged negotiations with its labor union, Samsung struck a deal in May to pay out 10.5% of the semiconductor division’s operating profits to employees as bonuses annually for 10 years, mostly in company stock that vests over three years. The move came after SK Hynix agreed last year to pay out 10% of operating profits to employees, which translates to $476,000 per employee this year—mostly in cash. But at Samsung, each division’s bonus is tied to its own bottom line. Chip workers in its memory division, which is also reaping a windfall from making HBM chips, are getting paid a bonus of roughly $400,000 per employee this year. But those who, like Lee, work in Samsung’s foundry division, which manufactures logic chips that companies like Tesla and Google design and has been operating at a loss, are getting a bonus of roughly $135,000. Employees told MIT Technology Review that Samsung said it can’t give as many bonuses to divisions that aren’t performing well. Lee, after watching the labor union wrestle with the company for months, says he has felt disappointed by what he ended up with: “Even if Samsung does well in the future, I don’t think any of it will trickle down to me.” The workers’ lagging bonuses are making SK Hynix suddenly look appealing. According to a survey by the Samsung labor union in June, 81.5% of employees in the company’s foundry division, and nearly half of employees in the semiconductor division as a whole, said they wanted to go to another company in the next two years. In April, Samsung labor union chief Choi Seung-ho said more than 200 members of the union had left for SK Hynix over the past four months. On Blind, an anonymous workplace forum, a chorus of disgruntled engineers at Samsung confess that they want to defect to SK Hynix for the bigger bonuses. For decades, SK Hynix lived in Samsung’s shadow. It was the smaller, scrappier memory maker that elite engineering students at universities looked past when applying for jobs. But in 2019, Samsung downsized its HBM team, betting the market would stay niche, while SK Hynix doubled down on the technology. Then the AI boom supercharged the demand for HBMs, which feed AI chips the enormous amounts of data they need at ultra-high speed, driving prices to unprecedented levels. SK Hynix now leads the global market for HBMs, while Samsung is playing catch-up. Both companies topped $1 trillion in market value in May, and SK Hynix briefly dethroned Samsung as South Korea’s most valuable company in June. Predicting that demand for memory chips will continue to surge, the semiconductor titans are making aggressive investments to expand their business. Last month, the companies unveiled plans to invest more than $2 trillion by 2040, including a semiconductor “mega-cluster” in Yongin, a city south of Seoul. To staff the expansion, SK Hynix added 2,152 employees in the first half of 2026 alone and aims to double its manufacturing capacity in five years. Samsung plans to hire 60,000 employees over the next five years, especially for its semiconductor division. Even so, the pipeline will fall short: South Korea’s semiconductor industry will need about 304,000 workers by 2031 and faces a shortage of roughly 54,000, according to the Korea Semiconductor Industry Association. Now the longtime rivals are showering workers with big bonuses to keep—and poach—talent. “[SK Hynix] seems to target Samsung engineers when hiring because it’s a rival,” says Baek, a manager at SK Hynix. “From what I heard internally, the big performance bonuses we got were aimed at luring away talent from our competitor.” Courts are starting to weigh in. In July, Samsung won an injunction barring two former chip workers from working at SK Hynix for 18 months, on the grounds that chips are a national core technology deserving protection. “With competition in the semiconductor industry fierce, it’s necessary to establish a fair market order,” the court ruled. The talent exodus threatens a crucial advantage that Samsung still holds in the HBM race. “Samsung is the only memory maker in the world that owns a foundry business,”

Samsung’s chip workers are jumping ship to rival SK Hynix Read Post »

AI, Committee, 新闻, Uncategorized

Microsoft AI Releases MAI-Cyber-1-Flash: A 5B-Active-Parameter Cyber Model That Pushes MDASH to 95.95% on CyberGym

admin NU / 7 月 28, 2026

Microsoft AI has released MAI-Cyber-1-Flash, its first model built specifically for cyber defense. The model does not ship as a standalone endpoint. It runs inside MDASH, Microsoft’s multi-model agentic scanning harness. MAI-Cyber-1-Flash MAI-Cyber-1-Flash is a transformer with self-attention and sparse Mixture-of-Experts layers. It carries 137B total parameters with 5B active, and a 256k context length. Inputs and outputs are text only. It is a cybersecurity-specialized fine-tune of MAI-Code-1-Flash, the lightweight agentic coding model already embedded in GitHub Copilot and VS Code. The release describes it as derived from the MAI-Thinking-1 lineage. Benchmarks CyberGym is a public suite of 1,507 real-world vulnerability reproduction tasks drawn from 188 OSS-Fuzz projects. Microsoft evaluated at CyberGym’s default level 1 configuration, which supplies vulnerable source and a high-level description. MDASH running MAI-Cyber-1-Flash alongside GPT-5.4 scores 95.95%. Microsoft frames this as roughly 12 points above Anthropic’s Mythos, and the launch chart places the four competing systems between 83.2% and 85.6%. When Microsoft first detailed MDASH in May 2026, the harness scored 88.45% on CyberGym using only generally available models. That was already the top public leaderboard score, about five points ahead of the next entry at 83.1%. The research team states the improvement plainly: replacing 80% of the existing models in MDASH moved the harness from 88.4% to 95.95%. Why the routing is the real product MDASH manages over 100 specialized agents through five stages: Prepare, Scan, Validate, Dedupe, and Prove. Auditor agents flag findings, debater agents argue exploitability (using disagreement as signal), and the Prove stage executes triggering inputs with ASan for C/C++ targets. To control frontier model costs at scale, MAI-Cyber-1-Flash handles up to 90% of MDASH tasks, escalating the hardest 10% to GPT-5.4. This routing yields a 50% cost saving over the previous configuration of GPT-5.4, 5.4 mini, and 5.3 codex. MDASH was developed by Microsoft’s Autonomous Code Security (ACS) team, featuring members from the DARPA AI Cyber Challenge-winning Team Atlanta. In May, MDASH-assisted work generated 16 CVEs (including four Critical remote code execution flaws) in the Windows networking and authentication stack. Retrospectively, it recovered 96% of 28 MSRC cases in clfs.sys and 100% of 7 cases in tcpip.sys over a five-year window. Performance The research team present standalone results from a lightweight terminal harness: Benchmark MAI-Cyber-1-Flash CVEBench 0.314 CyberSecEval4 — Threat Intel 0.553 CyberSecEval4 — Malware Analysis 0.33 CRSBench 0.651 (POV=1200) ExploitGym — Kernel / Userspace / Browser 0 / 0 / 0 The straight zeros on ExploitGym are deliberate, not a defect. Microsoft team states the model was trained to perform defensive tasks such as patching bugs, and not offensive tasks such as deploying malware. A 5B-active model that cannot generate exploits but can drive a 95.95% discovery pipeline is exactly the artifact a defender-only product needs. How to use it Key Takeaways MAI-Cyber-1-Flash is 137B total / 5B active, a sparse MoE fine-tune of MAI-Code-1-Flash with 256k context. 95.95% on CyberGym is a system score — MDASH plus the new model plus GPT-5.4, up from 88.45% in May 2026. It handles up to 90% of MDASH tasks, escalating the hard 10% to GPT-5.4 for a claimed 50% cost cut. ExploitGym scores are 0/0/0 by design — the model patches bugs, it does not write exploits. Access is gated The post Microsoft AI Releases MAI-Cyber-1-Flash: A 5B-Active-Parameter Cyber Model That Pushes MDASH to 95.95% on CyberGym appeared first on MarkTechPost.

Microsoft AI Releases MAI-Cyber-1-Flash: A 5B-Active-Parameter Cyber Model That Pushes MDASH to 95.95% on CyberGym Read Post »

AI, Committee, 新闻, Uncategorized

The Download: OpenAI’s predictable hack, and an AI stock sell-off

admin NU / 7 月 28, 2026

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. OpenAI called the Hugging Face attack unprecedented. But we’ve been here before. —Will Douglas Heaven, senior AI editor Reading OpenAI’s account last week of how some of its models broke their containment and hacked into the computer systems of Hugging Face, another AI company, was the first time I got genuine chills about what large language models are now able to do. But this is a case of human hubris, not rogue AI. I am not an alarmist. In fact, I have been pushing back against AI scare stories for years. Even so, this incident crossed a line. I think it’s the clearest illustration yet of how the people building and testing this technology do not fully understand what they’re doing. Here’s why OpenAI could—and should—have seen this coming. This story is from The Algorithm, our weekly AI newsletter. Sign up to receive it in your inbox every Monday. The must-reads I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology. 1 There’s a growing global AI stock sell-off underway Chip and memory stocks are bearing the brunt so far. (FT $)+ It was partly sparked by a report that a Chinese company has started making a key piece of chip equipment for the first time. (The Information $)+ Like their US rivals, Chinese AI firms are struggling to find a path to profitability. (NYT $)+ What even is the AI bubble? (MIT Technology Review) 2 Some people’s chats with Claude were open to anyone onlineOpenAI had a near-identical issue with ChatGPT last year. (BBC)+ Is a secure AI assistant even possible? (MIT Technology Review) 3 France just witnessed its first ever “fire cloud”It’s a scary sign of the intensity of the blazes burning there. (Wired $) 4 Meta is screwing up its smart glasses rolloutPrivacy issues keep cropping up, and its response is invariably too little, and too late. (The Verge $)+ They even have a nickname: “pervert glasses.” (Vox $) 5 Efforts are underway to measure Spotify’s AI slop problemPeople are desperate for it to start labelling AI-generated music. (404 Media)+ We’re in a weird era regarding AI’s role in literature. (The Atlantic $) 6 People are renting out their faces to AI in ChinaOften for microdramas, which are big business and increasingly rely on AI. (Rest of World)+ How Chinese short dramas became AI content machines. (MIT Technology Review) 7 Microsoft is having a torrid yearRivals building better AI tools are threatening its business on multiple fronts. (Business Insider $) 8 How bots took over the internetThey now outnumber humans, in terms of overall web traffic. (New Yorker $) 9 Robotaxis are hitting London’s streets this summer Just for testing for now, though there are plans to launch rides to the public next year. (Engadget) 10 Why video games are obsessed with BackroomsThese spaces, which players aren’t supposed to access, can provoke a strange, liminal sort of discomfort. (Guardian) Quote of the day “It’s going to be a wild ride.” —Christine Peterson, co-founder of the grantmaking group Foresight Institute, tells Wired charities anticipate a huge philanthropy windfall from Anthropic and OpenAI IPOs. One More Thing PHOTOGRAPHS BY ELENA SUBACH On the ground in Ukraine’s largest Starlink repair shop Starlink is critical to Ukraine’s ability to continue in the fight against Russia, but Donald Trump’s fickle foreign policy and reports suggesting Elon Musk might remove Ukraine’s access to the services have cast the technology’s future in the country into doubt. For now Starlink access largely comes down to the unofficial community of users and engineers, including the expert “Dr. Starlink”—famous for his creative ways of customizing the systems—who have kept Ukraine in the fight, both on and off the front line. He gave MIT Technology Review exclusive access to his unofficial Starlink repair workshop in the city of Lviv. Read the full story. —Charlie Metcalfe We can still have nice things A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.) + As these retirees have discovered, dancing is incredibly good for you.+ Do you feel lucky, punk? Life feels a lot easier if you do!+ 100% would visit this cat every day if I lived in Istanbul. + Watch this to understand how the movie Sinners had such impressive visual effects.

The Download: OpenAI’s predictable hack, and an AI stock sell-off Read Post »

AI, Committee, 新闻, Uncategorized

The Download: lasers for nuclear fuel, and organ preservation advances

admin NU / 7 月 27, 2026

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. How lasers could help provide fuel for nuclear reactors Nuclear power provides about 9% of global electricity today, and that fraction could tick up as countries look to build new reactors. New, cheaper methods to obtain fuel could help ensure that those nuclear projects stay on track. One of those methods is called laser enrichment. It allows you to separate out the material you want (in this case, uranium) from others in a mixture of old waste. A company called Global Laser Enrichment (GLE) is about to start testing whether the technology works at commercial scale. Read our story about their efforts. —Casey Crownhart The quest to keep organs alive outside the body It’s super difficult to freeze organs. Once ice forms in them, they’re done. The ice crystals create all kinds of damage and render the organs unusable. That hasn’t stopped many researchers from trying. In new research, one team has been able to supercool the kidneys of pigs and preserve them for days. The kidneys survived being stored at −4 °C (25 °F) and eventually reimplanted back into pigs. And that’s just the latest development in a field that is positively buzzing. Read about why it’s such an exciting time for organ preservation—and what could be coming next. —Jessica Hamzelou This story is from The Checkup, our weekly biotech newsletter. Sign upto receive it in your inbox every Thursday. The must-reads I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology. 1 Silicon Valley is split over how to respond to Chinese AIIt boils down to whether AI models should be open or closed. (NYT $)+ Nvidia, Microsoft and Meta warn that restricting open models would backfire. (CNBC)+ AI companies are spending record sums on lobbying Washington. (FT $)+ China’s AI models have Trump’s AI world at war with itself. (MIT Technology Review) 2 Trump can’t post his way out of this war Iran has revealed hard limits to his ability to bend reality to his will. (Atlantic $)+ Trump has been forced to abandon further escalation due to dwindling munitions stockpiles. (NYT $)+ An Iranian strike on CIA facilities has raised questions about Russian involvement. (Reuters $) 3 Wildfires are surging across EuropeRepeated heat waves have turned parts of the continent into a tinder box. (BBC)+ One of the fires forced NASA to evacuate a tracking station in Spain. (Ars Technica)+ Americans are increasingly grappling with smoky skies too. (Atlantic $) 4 OpenAI didn’t notice its agent going on a days-long hacking spreeIt only cottoned on after the threat was contained and the FBI had been alerted, sources say. (Reuters $) 5 The AI jobs wipeout still hasn’t arrivedIn fact, a lot of companies are now embarking on hiring sprees. (WSJ $)+ AI’s impact is increasingly falling short of expectations. (The Guardian) + Here’s a much-needed reality check on the AI jobs hysteria. (MIT Technology Review) 6 A six-year-old girl died in a Chinese gene-editing trialExperts say it should have never been allowed to go ahead. (Science)+ This baby boy was treated with the first personalized gene-editing drug. (MIT Technology Review) 7 What it’s like to use a North Korean smartphoneThey’re growing in popularity—but represent another avenue for government control. (WP $) 8 The FCC’s ban on foreign-made drones is not workingYou can’t change global supply chains at the stroke of a pen. (The Verge $) 9 The “summer of Ludd” shows it’s fun to be a Luddite A growing anti-tech movement is all about raw, anarchic joy. (404 Media)+ We’re in the era of AI malaise. (MIT Technology Review) 10 Why Jimothy the racoon is the internet’s latest obsession It’s his irresistible combination of chaos and cuteness. (BBC) Quote of the day “I think that [AI] should stand for artificial idiot.” —Marian Agnew, a nine-year-old, from Norman, Oklahoma, tells Wired she’s not impressed by AI models’ tendency to make up facts. One More Thing KATHERINE LAM Inside a romance scam compound—and how people get tricked into being there Gavesh’s journey started, seemingly innocently, with a job ad on Facebook promising work he desperately needed. Instead, he found himself trafficked into a business commonly known as “pig butchering”—a form of fraud in which scammers form close relationships with targets online and extract money from them. The Chinese crime syndicates behind the scams have netted billions of dollars, and they have used violence and coercion to force their workers to carry out the frauds from large compounds, several of which operate openly in the quasi-lawless borderlands of Myanmar. Read our story about these scam syndicates and how they could be broken up. — Peter Guest and Emily Fishbein We can still have nice things A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.) + I want to make every single one of these delicious-looking Korean dishes. + If you like origami, you’ll love this guy’s tutorials.+ How to deal with those old gadgets that are collecting dust in your drawer.+ Enjoy these old art deco public transport posters from London.

The Download: lasers for nuclear fuel, and organ preservation advances Read Post »

AI, Committee, 新闻, Uncategorized

5 Architectural Patterns for Persistent Memory and State in AI Agents

admin NU / 7 月 27, 2026

Memory & State For AI Agents Building an AI agent can be tricky. Keeping it on track over a six-month deployment is incredibly hard. LLMs…

5 Architectural Patterns for Persistent Memory and State in AI Agents Read Post »

AI, Committee, 新闻, Uncategorized

Perplexity Releases pplx, a Single-Binary CLI That Puts Its Search API in the Terminal for Coding Agents

admin NU / 7 月 27, 2026

Perplexity has released pplx, an official command line client for its Search API. The tool returns grounded search results and extracted page text, all as JSON. According to its docs, it targets humans and coding agents equally. It is not a chat client. There is no conversational mode, no model selection and no synthesized answer. Two surfaces, one output contract The tool exposes exactly two working surfaces. pplx search web runs a live web search. pplx content fetch pulls a URL and returns cleaned page text. The contract around them is the interesting part. Per the official pplx-cli Agent Skill, success means exit code 0 and exactly one JSON object on stdout. Search returns {hits: [{url, title, domain, snippet, …}], total, saved_to?}. Every failure exits 1 with an empty stdout. One JSON error object goes to stderr, shaped {“error”:{“code”,”message”,”command”,”hint”?}}. Documented codes include AUTHENTICATION, UNKNOWN_ARGUMENT, ARGUMENT_ERROR and BAD_REQUEST. The skill notes that list is not exhaustive, so callers should branch on error.code. Install path and platform support Installation is a single shell command that pipes a script into sh: Copy CodeCopiedUse a different Browser curl -fsSL https://github.com/perplexityai/perplexity-cli/releases/latest/download/install.sh | sh Reading install.sh shows what it actually does. It downloads manifest.json from the latest release and extracts the tag and version. It then pins every remaining download to that tag, explicitly to avoid racing a concurrent publish. It fetches SHA256SUMS and the platform binary, verifies the checksum, and installs to ~/.local/bin/pplx. No sudo is required. A receipt is written to ~/.config/pplx/pplx-receipt.json, but only after the installed binary runs successfully. Platform coverage is limited to three targets: macOS on Apple Silicon, Linux x86_64 and Linux arm64. Anything else exits with an error. There is no Windows build and no Intel macOS build. Context-window economics are a first-class design concern The most agent-specific feature is token budgeting. –output-dir writes the full result set to a JSON file. –stdout-preview[=<CHARS>] truncates long string fields in stdout, adding …<truncated> markers. The skill is blunt about the trap: –stdout-preview is a no-op without a save directory. It truncates only when the result is also saved via –output-dir or $PPLX_OUTPUT_DIR. Used alone it returns full-size output, and hits can be multiple KB each. Saved search results land at {dir}/web/{rand}.json and fetches at {dir}/fetch/{rand}.json. Files are written only after a successful request. PPLX_OUTPUT_DIR sets a workspace default so the flag need not be repeated. For content fetch, the skill adds a correctness check rather than a cost one. Verify error and is_paywall in the output before trusting content. –html adds a raw_html field fetched live via crawler, and –no-cache forces a live fetch. Key Takeaways pplx is a single verified binary exposing two commands: pplx search web and pplx content fetch. Success is exit 0 plus one JSON object on stdout; failures put one JSON error object on stderr. pplx auth login is TTY-only, so agents and CI must export PERPLEXITY_API_KEY. –stdout-preview only truncates when paired with –output-dir or $PPLX_OUTPUT_DIR. Search API billing is $5.00 per 1,000 requests, capped at 50 QPS on every usage tier. Sources: perplexityai/perplexity-cli, pplx-cli SKILL.md, api-platform-developers, Perplexity API pricing, Rate limits and usage tiers, and Search API quickstart The post Perplexity Releases pplx, a Single-Binary CLI That Puts Its Search API in the Terminal for Coding Agents appeared first on MarkTechPost.

Perplexity Releases pplx, a Single-Binary CLI That Puts Its Search API in the Terminal for Coding Agents Read Post »

AI, Committee, 新闻, Uncategorized

How lasers could help provide fuel for nuclear reactors

admin NU / 7 月 27, 2026

Outside the small town of Paducah, Kentucky, a wealth of uranium is locked away in thousands of storage cylinders filled with waste material from a now-closed nuclear enrichment facility. Lasers could help get it out. A company called Global Laser Enrichment (GLE) is looking to reprocess this old material with a new technology called laser enrichment. It could be more efficient than conventional enrichment methods, allowing the company to refresh the material and produce feedstock at the same concentration as a natural mined source. And in the future, the company claims, laser enrichment could be used to make material for nuclear fuel, including the kind used in advanced reactors. Nuclear power provides about 9% of global electricity today, and that fraction could tick up as major world powers like the US and China look to build new reactors, including some based on next-generation technology. New, cheaper methods to obtain fuel could help ensure that those nuclear projects stay on track. Naturally occurring uranium is largely made up of uranium-238 (over 99%) and uranium-235 (about 0.7%). Uranium-235 is the fissile type, meaning that, when hit with slow low-energy neutrons, it can sustain a chain reaction that generates electricity. So reactors generally use material with a higher concentration of U-235 than what’s pulled from the ground. Today’s conventional reactors usually use low-enriched uranium, typically is about 5% U-235, though some advanced reactor designs will use fuel that’s up to 20% U-235. Today, centrifuges are the dominant tech used to enrich uranium. The equipment essentially takes uranium-containing material and spins it around incredibly quickly, so the heavier material (which contains U-238) spins out to the edge, while the lighter material (which has U-235) stays closer to the center. (If you’ve ever swung a mustard bottle to get the last of it out, you’ve used the same basic idea behind a centrifuge.) Then the material that has a higher concentration of U-235 can go on to be made into nuclear fuel. Laser enrichment, on the other hand, takes advantage of the fact that all molecules vibrate and rotate at an atomic scale in ways that depend on their specific material. Even different uranium isotopes have distinct fingerprints. Lasers are so precise they can target one particular material (like molecules that contain U-235, for example). If you shine a laser at a mixture, you can selectively excite just the material you’re targeting, giving it a bit more energy. This changes the way it behaves, which can make it easier to separate out the material you want using chemical or physical methods. A wide range of separation approaches have been developed in research and industry. Some aim to electrically charge U-235 atoms, allowing them to be moved with electrostatic or magnetic fields. Others change how the material reacts chemically. The details of GLE’s specific technology are classified, and company officials declined to share how the process works. There’s been interest in using lasers for uranium enrichment for decades, says Charles Forsberg, a principal research scientist in nuclear science and engineering at MIT. However, in their early days lasers tended to be high-maintenance, unstable and difficult to operate. They’ve improved dramatically, making laser enrichment a more attractive prospect than it was during the early research. Even more than technological improvements, a recent geopolitical shift could boost new enrichment technology. Russia has the largest uranium enrichment ecosystem in the world, and the country has historically dominated the market. “Nobody in the West was going to build a new enrichment plant while the Russians flooded the world with enriched uranium,” says Forsberg. Since the start of the Ukraine war, however, countries including the US and UK have taken steps to limit or ban imports of Russian uranium. That’s opened the door for companies to set up new enrichment operations, including some that use new technologies, Forsberg says. Demand for fuel is increasing as countries look beyond Russia for uranium supply. “The gap is just becoming bigger and bigger, and this technology is right in the middle,” says Christo Liebenberg, president of LIS Technologies, one of the companies aiming to build laser enrichment capacity in the US. LIS Technologies was founded in 2023, and the company recently purchased a 200-acre site in Oak Ridge, Tennessee. It’s currently in the pre-application process with the US Nuclear Regulatory Commission for its facility. The company plans to take in natural-grade uranium and make a product that’s roughly 5% U-235, though it hopes to eventually make more concentrated material that can be used as fuel for next-generation reactors. GLE is taking a different approach: Rather than using its technology to enrich freshly mined material to the 5% concentration that can be used in fuels, it’s hoping to start by rehabilitating old waste. The company has a contract with the US Department of Energy to reprocess waste material at the enrichment site in Paducah. The facility could enrich up to 200,000 metric tons of material that contains small amounts of uranium leftover from an older enrichment process. GLE is taking the material that’s at least 0.25% U-235 and enriching it to about 0.7%. That material can then be further processed and slotted into the uranium supply chain in place of freshly mined material. “It’s kind of like a large aboveground uranium mine for us,” says Nima Ashkeboussi, vice president of government relations and communications at GLE. While each one of its units is more complex and expensive than a centrifuge, far fewer are needed to do the same work. A similar centrifugation plant would have many thousands of centrifuges working together, but a full-scale plant using GLE’s laser enrichment process would have fewer than a thousand of its units, says Stephen Long, the company’s CEO. Up-front investment should be smaller, and operating costs are also expected to be lower, partly because the process uses less energy than centrifuges, Long says. GLE has a testing facility in Wilmington, North Carolina. In fall 2025, the company completed a demonstration pilot, processing several hundred kilograms of uranium.

How lasers could help provide fuel for nuclear reactors Read Post »

AI

Prompt Engineering vs Loop Engineering vs Graph Engineering: What Changes at Each Layer

Deploying a 1-Bit Bonsai-27B Model with PrismML llama.cpp and OpenAI-Compatible Local Inference Workflows

Procedural Knowledge at Scale Improves Reasoning

Samsung’s chip workers are jumping ship to rival SK Hynix

Microsoft AI Releases MAI-Cyber-1-Flash: A 5B-Active-Parameter Cyber Model That Pushes MDASH to 95.95% on CyberGym

The Download: OpenAI’s predictable hack, and an AI stock sell-off

The Download: lasers for nuclear fuel, and organ preservation advances

5 Architectural Patterns for Persistent Memory and State in AI Agents

Perplexity Releases pplx, a Single-Binary CLI That Puts Its Search API in the Terminal for Coding Agents

How lasers could help provide fuel for nuclear reactors

我们的服务

首页

工作原理

新闻

定价

支持

幫助中心

报告问题

提供反馈

隱私權政策

用户账户

关注我们