Job titles of the future: Wildlife first responder

Grizzly bears have made such a comeback across eastern Montana that in 2017, the state hired its first-ever prairie-based grizzly manager: wildlife biologist Wesley Sarmento.

For some seven years, Sarmento worked to keep both the bears, which are still listed as threatened under the Endangered Species Act, and the humans, who are sprawling into once-wild spaces, out of trouble. Based in the small city of Conrad, population 2,553, he acted sort of like a first responder, trying to defuse potentially dangerous situations. He even got caught in some himself—which is why, before he left the role to pursue a PhD, he turned to drones to get the job done.

The bear necessities

Sarmento was studying mountain goats in Glacier National Park when he first started working with bears. To better understand how goats responded to the apex predator, he dressed up in a bear costume once a week for over three years.

When he later started as grizzly manager, he often drove long distances to push bears away from farms. Bears are drawn to spilled or leaking grain, and an open silo quickly turns into a buffet. Sarmento would typically arrive armed with a shotgun, cracker shells, and bear spray, but after he narrowly escaped being mauled one day, he knew he had to pivot. “In that moment,” he says, “I was like, I am gonna get myself killed.”

A bird’s-eye view

Sarmento first turned to two Airedale dogs, a breed known for deterring bears on farms, but the dogs were easily sidetracked. Meanwhile, drones were slowly becoming more common tools for biologists in a range of activities, including counting birds and mapping habitats. He first took one into the field in 2022, when a grizzly mom and two cubs were found rummaging around in a silo outside of town. The drone’s infrared sensors helped him quickly find their location, and he used the aircraft’s sound to drive them away from the property. (Researchers suspect bears instinctively dislike the whir of blades because it sounds like a swarm of bees.) “The whole thing was so clean and controlled,” he says. “And I did it all from the safety of my truck.”

Since then, the flying machine Sarmento bought for $4,000—a fairly simple model with a thermal camera and 30 minutes of battery life—has shown its potential for detecting grizzlies in perilous terrain he’d otherwise have to approach on foot, like dense brush or hard-to-reach river bottoms.

A new technological foundation

Now studying wildlife ecology at the University of Montana, Sarmento hopes to design a drone that campus police can use to deter black bears from school grounds. In the future, he hopes, AI image recognition might be broadly integrated into his wildlife management work—maybe even helping drones identify bears and autonomously divert them from high-traffic areas. All this helps keep bears from learning behaviors that lead to conflict with people—which typically ends badly for the bear and is occasionally fatal for humans.

“The out-of-the-box technology doesn’t exist yet, but the hope is to keep exploring applications,” he says. “Drones are the next frontier.”

Emily Senkosky is a writer with a master’s degree in environmental science journalism from the University of Montana.


You have no choice in reading this article—maybe

Uri Maoz loved doing his human research, back when he was getting his PhD. He was studying a very specific topic in computational neuroscience: how the brain instructs our arms to move and how our gray matter in turn perceives that motion.

Then his professor asked him to deliver an undergrad lecture. Maoz assumed his boss was going to tell him exactly what to do, or at least throw some PowerPoint slides his way. But no. Maoz had free rein to teach anything, as long as it was relevant to the students. “I could have gone to human brain augmentation,” he says. “Cyborgs or whatever.” Yet that admittedly fun and borderline sci-fi topic wasn’t what popped, unbidden, into his mind. His idea, he recalls with excitement: “What neuroscience has to say about the question of free will!”

How—or whether—humans make decisions (like, say, about what to discuss in an undergrad lecture) had been on his mind since he’d read an article in his early twenties suggesting that … maybe they didn’t. This question might naturally beget others: Had he even had a choice about whether to read that article in the first place? How would he ever know if he was responsible for making decisions in his life or if he just had the illusion of control?

“After that, there was no turning back,” says Maoz, now a professor at Chapman University, in California. He finished his PhD work in human movement, but afterward he scooted further up the neural chain to find out how desires and beliefs turn into actions—from raising an arm to choosing someone to ask out to dinner on a Friday night.

Today, Maoz is a central figure in the attempt to (sort of, maybe) answer how that neural chain functions. His research has since overturned and reinterpreted canonical neuroscience studies and united the straight-scientific and philosophical sides of the free-will question. More than anything, though, he’s succeeded in uncovering new wrinkles in the debate.

Machines and magic tricks

The concept of free will seems straightforward, but it doesn’t have a universally accepted definition. One intuitive notion is that it’s the ability to make our own decisions and take our own actions on purpose—that we control our lives. But physicists might ask whether the universe is deterministic, following a preordained path, and whether human choices can still happen in such a universe.

That’s a question for them, Maoz says. What neuroscientists can do is figure out what’s going on in the brain when people make decisions. “And that’s what we’re trying to do: to understand how our wishes, desires, beliefs, turn into actions,” he says.

By the time Maoz had finished his PhD, in 2008, neuroscientific research into the question had been going on for decades. One foundational study from the 1960s showed that a hand movement—something a person seemingly decides to do—was preceded by the appearance in the brain of an electrical signal called the “readiness potential.”

Building on that result, in the 1980s a neuroscientist named Benjamin Libet did the experiment that had first piqued Maoz’s interest in the topic—one that many, until recently, interpreted as a death knell for the concept of free will.

An electrical impulse in our brains can shed only so much light on whether we truly are the architects of our own fates.

“He just had people sit there, and whenever they feel like it, they would go like this,” says Maoz, wiggling his wrist. Libet would then ask where a rotating dot was on a screen when they first had the urge to flick. He found that the readiness potential appeared not only before they moved their hand but before they reported having the urge to move—or, in Libet’s interpretation, before they knew they were going to move.

Studies since have confirmed the observation and shown that the readiness potential appears a second or two—and maybe, fMRI implies, up to 10 seconds—before participants report making a conscious decision. “It suggests we are essentially passengers in a self-driving car,” says Maoz. “The unconscious biological machine does all the steering, but our conscious mind sits in the driver’s seat and takes the credit.”

Maoz initially approached his own research with variations on Libet’s experiments. He worked with epilepsy patients who already had electrodes in their brains for clinical purposes, and was able to predict which hand they would raise before they raised it.

Still, some of the Libet-inspired studies people were doing nagged at him. “All these results were about completely arbitrary decisions. Raise your hand whenever you feel like it,” he says. “Why? No reason.” A decision like that is quite different from, say, choosing to break up with your partner. Try telling someone they weren’t in the driver’s seat for that. The field wasn’t looking at meaningful decisions, he says—the ones that actually set the course of lives.

Maoz began pulling in philosophers to help guide his approach. They would challenge him to confront the semantic differences between things like intention, desire, and urge. Neuroscientists have tended to lump those concepts together, but philosophers tease them apart: desire is a want that doesn’t necessarily progress toward an action; urge carries implications of immediacy and compulsion; and intention involves committing to a plan. (Maoz has come to focus specifically on intention—including, recently, the potential intentions of AI.)

In 2017, he organized the first in a series of free-will conferences, drawing many autonomy-interested philosophers. “Thank you so much for coming,” he recalls saying at the opening of the meeting. “As if you had a choice.”

One day, the crew took an excursion out on a lake. As the group munched on shrimp, someone joked that they hoped the boat didn’t sink, because everybody in the field would die. The comment didn’t make Maoz feel existential dread. Instead, he figured that if the whole field was already there, why not lasso them all into writing a research grant? “He just thinks what should be the next step and just has


The Download: how humans make decisions, and Moderna’s “vaccine” word games

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

You have no choice in reading this article—maybe

How do humans make decisions? The question has been on Uri Maoz’s mind since he read an article in his early twenties suggesting that… maybe they didn’t.

Had he even had a choice about whether to read that article in the first place? How would he ever know if he was truly responsible for making any decisions? “After that, there was no turning back,” says Maoz, now a professor of computational neuroscience at Chapman University.

Today, Maoz is a central figure in efforts to understand how desires and beliefs turn into actions. He’s also uncovered new wrinkles in the debate. Read the full story on his discoveries.

—Sarah Scoles

This article is from the next issue of our print magazine, packed with stories all about nature. Subscribe now to read the full thing when it lands on Wednesday, April 22.

What’s in a name? Moderna’s “vaccine” vs. “therapy” dilemma

Moderna, the covid-19 shot maker, is using its mRNA technology to destroy tumors through a very, very promising technique known as a cancer vacc—

“It’s not a vaccine,” a spokesperson for Merck said before the V-word could be uttered. “It’s an individualized neoantigen therapy.”

Oh, but it is a vaccine, and it looks like a possible breakthrough. But it’s been rebranded to avoid vaccine fearmongering—and not everyone is happy about the word game. Read the full story.

—Antonio Regalado

This article is from The Checkup, our weekly newsletter covering the latest in biotech. Sign up to receive it in your inbox every Thursday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Sam Altman’s home has been attacked twice in two days
A driver reportedly fired a gun at his property on Sunday. (SF Standard)
+ A Molotov cocktail was thrown at his home on Friday. (NBC News)
+ The suspect wrote essays warning AI would end humanity. (SF Chronicle)
+ The attacks expose growing divides in opinion on AI. (Axios)

2 AI weapons are ushering in a new kind of arms race
Countries are racing to deploy AI in military systems. (NYT $)
+ The Pentagon wants AI firms to train on classified data. (MIT Technology Review)
+ Where OpenAI’s technology could show up in Iran. (MIT Technology Review)

3 Artemis II was a success
Astronauts did an array of experiments that will be crucial to the future of both the program itself and deep-space missions. (Guardian)
+ But next steps for the Artemis missions are uncertain. (Ars Technica)

4 OpenAI and Elon Musk are heading toward a massive courtroom clash
The company has accused Musk of a “legal ambush.” (Engadget)
+ He’s lost a streak of cases ahead of the showdown. (FT $)

5 AI job fears in China are fueling a viral “ability harvester” project
It claims to turn human skills into AI tools. (SCMP)
+ Hustlers are cashing in on China’s OpenClaw AI craze. (MIT Technology Review)

6 Governments are hiding information about the Iran war online
Through restrictions on internet access and satellite imagery. (NPR)

7 Apple is testing four smart glasses that could rival Meta Ray-Bans
They’re part of a broader wearables strategy. (Bloomberg $)

8 Meta is building an AI version of Mark Zuckerberg to interact with staff
It’s being trained on his mannerisms, voice, and statements. (FT $)

9 Anthropic is asking Christian leaders for guidance
It’s seeking advice on building moral machines. (WP $)
+ AI agents have spread their own religions. (MIT Technology Review)

10 A dancer with MND is performing again through an avatar
Her brainwaves powered the digital dancer. (BBC)

Quote of the day

“Earth was this lifeboat hanging in the universe.”

—Artemis II astronaut Christina Koch, describing her view of Earth from space, the Guardian reports.

One more thing

How AI and Wikipedia have sent vulnerable languages into a doom spiral

When Kenneth Wehr started managing the Greenlandic-language version of Wikipedia, he discovered that almost every article had been written by people who didn’t speak the language.

A growing number of them had been copy-pasted into Wikipedia from machine translators—and were riddled with elementary mistakes. This is beginning to cause a wicked problem. AI systems, from Google Translate to ChatGPT, learn new languages by scraping text from Wikipedia. This could push the most vulnerable languages on Earth toward the precipice.

Read the full story on what happens when AI gets trained on junk pages.

—Jacob Judah

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ Hungary’s next health minister can throw some serious shapes.
+ Here’s a welcome route to an AI-free Google search.
+ Movievia eschews endless scrolling to find the right film for your needs.
+ A photography trick has turned a giant glacier into a tiny, living diorama.


Want to understand the current state of AI? Check out these charts.

If you’re following AI news, you’re probably getting whiplash. AI is a gold rush. AI is a bubble. AI is taking your job. AI can’t even read a clock. The 2026 AI Index from Stanford University’s Institute for Human-Centered Artificial Intelligence, AI’s annual report card, comes out today and cuts through some of that noise.

Despite predictions that AI development may hit a wall, the report says that the top models just keep getting better. People are adopting AI faster than they picked up the personal computer or the internet. AI companies are generating revenue faster than companies in any previous technology boom, but they’re also spending hundreds of billions of dollars on data centers and chips. The benchmarks designed to measure AI, the policies meant to govern it, and the job market are struggling to keep up. AI is sprinting, and the rest of us are trying to find our shoes.

All that speed comes at a cost. AI data centers around the world can now draw 29.6 gigawatts of power, enough to run the entire state of New York at peak demand. Annual water use from running OpenAI’s GPT-4o alone may exceed the drinking water needs of 12 million people. At the same time, the supply chain for chips is alarmingly fragile. The US hosts most of the world’s AI data centers, and one company in Taiwan, TSMC, fabricates almost every leading AI chip.

The data reveals a technology evolving faster than we can manage. Here’s a look at some of the key points from this year’s report.

The US and China are nearly tied

In a long, heated race with immense geopolitical stakes, the US and China are almost neck and neck on AI model performance, according to Arena, a community-driven ranking platform that lets users compare the outputs of large language models on identical prompts. In early 2023, OpenAI had a lead with ChatGPT, but the gap narrowed in 2024 as Google and Anthropic released their own models. In February 2025, R1, an AI model built by the Chinese lab DeepSeek, briefly matched the top US model, ChatGPT. As of March 2026, Anthropic leads, trailed closely by xAI, Google, and OpenAI. Models from Chinese labs like DeepSeek and Alibaba lag only modestly. With the best AI models separated in the rankings by razor-thin margins, they’re now competing on cost, reliability, and real-world usefulness.

The index notes that the US and China have different AI advantages. While the US has more powerful AI models, more capital, and an estimated 5,427 data centers (more than 10 times as many as any other country), China leads in AI research publications, patents, and robotics.

As competition intensifies, companies like OpenAI, Anthropic, and Google no longer disclose their training code, parameter counts, or data-set sizes. “We don’t know a lot of things about predicting model behaviors,” says Yolanda Gil, a computer scientist at the University of Southern California who coauthored the report. This lack of transparency makes it difficult for independent researchers to study how to make AI models safer, she says.

AI models are advancing super fast

Despite predictions that development will plateau, AI models keep getting better. By some measures, they now meet or exceed the performance of human experts on tests that aim to measure PhD-level science, math, and language understanding. SWE-bench Verified, a software engineering benchmark for AI models, saw top scores jump from around 60% in 2024 to almost 100% in 2025. In 2025, an AI system produced a weather forecast on its own.

“I am stunned that this technology continues to improve, and it’s just not plateauing in any way,” says Gil.

However, AI still struggles in plenty of other areas. Because the models learn by processing enormous amounts of text and images rather than by experiencing the physical world, AI exhibits “jagged intelligence.” Robots are still in their early days and succeed in only 12% of household tasks. Self-driving cars are farther along: Waymos are now roaming across five US cities, and Baidu’s Apollo Go vehicles are shuttling riders around in China. AI is also expanding into professional domains like law and finance, but no model dominates the field yet.

But the way we test AI is broken

These reports of progress should be taken with a grain of salt. The benchmarks designed to track AI progress are struggling to keep up as models quickly blow past their ceilings, the Stanford report says. Some are poorly constructed—a popular benchmark that tests a model’s math abilities has a 42% error rate. Others can be gamed: when models are trained on benchmark test data, for example, they can learn to score well without getting smarter.

Because AI is rarely used the same way it’s tested, strong benchmark performance doesn’t always translate to real-world performance. And for complex, interactive technologies such as AI agents and robots, benchmarks barely exist yet.

AI companies are also sharing less about how their models are trained, and independent testing sometimes tells a different story from what they report. “A lot of companies are not releasing how their models do in certain benchmarks, particularly the responsible-AI benchmarks,” says Gil. “The absence of how your model is doing on a benchmark maybe says something.”

AI is starting to affect jobs

Within three years of going mainstream, AI is now used by more than half of people around the world, a rate of adoption faster than that of the personal computer or the internet. An estimated 88% of organizations now use AI, and four in five university students use it.

It’s early days for deployment, and AI’s impact on jobs is hard to measure. Still, some studies suggest AI is beginning to affect young workers in certain professions. According to a 2025 study by economists at Stanford, employment for software developers aged 22 to 25 has fallen nearly 20% since 2022. The decline can’t be pinned on AI alone, as broader macroeconomic conditions could be to blame, but AI appears to be playing


Symbiotic-MoE: Unlocking the Synergy between Generation and Understanding

arXiv:2604.07753v1 Announce Type: cross Abstract: Empowering Large Multimodal Models (LMMs) with image generation often leads to catastrophic forgetting in understanding tasks due to severe gradient conflicts. While existing paradigms like Mixture-of-Transformers (MoT) mitigate this conflict through structural isolation, they fundamentally sever cross-modal synergy and suffer from capacity fragmentation. In this work, we present Symbiotic-MoE, a unified pre-training framework that resolves task interference within a native multimodal Mixture-of-Experts (MoE) Transformer architecture with zero parameter overhead. We first identify that standard MoE tuning leads to routing collapse, where generative gradients dominate expert utilization. To address this, we introduce Modality-Aware Expert Disentanglement, which partitions experts into task-specific groups while using shared experts as a multimodal semantic bridge. Crucially, this design allows the shared experts to absorb fine-grained visual semantics from generative tasks to enrich textual representations. To stabilize optimization, we propose a Progressive Training Strategy featuring differential learning rates and early-stage gradient shielding. This mechanism not only shields pre-trained knowledge from early volatility but eventually transforms generative signals into constructive feedback for understanding. Extensive experiments demonstrate that Symbiotic-MoE achieves rapid generative convergence while unlocking cross-modal synergy, boosting inherent understanding with remarkable gains on MMLU and OCRBench.
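The abstract describes the mechanism only in prose. Below is a minimal PyTorch sketch of what modality-aware expert disentanglement could look like: experts are split into an understanding group and a generation group, the router is masked so each task can reach only its own group, and a few shared experts stay reachable from both tasks to act as the semantic bridge. The expert counts, layer sizes, and top-k routing scheme are assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class DisentangledMoELayer(nn.Module):
    # Experts are laid out as [understanding | generation | shared];
    # boolean masks restrict which group each task may route into.
    def __init__(self, d_model=512, n_und=4, n_gen=4, n_shared=2, top_k=2):
        super().__init__()
        n_experts = n_und + n_gen + n_shared
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)
        und = torch.zeros(n_experts, dtype=torch.bool)
        und[:n_und] = True
        gen = torch.zeros(n_experts, dtype=torch.bool)
        gen[n_und:n_und + n_gen] = True
        shared = torch.zeros(n_experts, dtype=torch.bool)
        shared[n_und + n_gen:] = True
        # Shared experts are visible to both tasks: the multimodal bridge.
        self.register_buffer("und_mask", und | shared)
        self.register_buffer("gen_mask", gen | shared)
        self.top_k = top_k

    def forward(self, x, is_generation: bool):
        # Mask out the other task's experts before routing, so generative
        # gradients cannot dominate the understanding experts' utilization.
        mask = self.gen_mask if is_generation else self.und_mask
        logits = self.router(x).masked_fill(~mask, float("-inf"))
        weights, idx = logits.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[..., k].unique().tolist():
                sel = idx[..., k] == e
                out[sel] += weights[..., k][sel].unsqueeze(-1) * self.experts[e](x[sel])
        return out

An understanding batch would call layer(x, is_generation=False) and can never be routed into the generation group, while the shared experts receive gradients from both tasks.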


HyperMem: Hypergraph Memory for Long-Term Conversations

arXiv:2604.08256v1 Announce Type: new Abstract: Long-term memory is essential for conversational agents to maintain coherence, track persistent tasks, and provide personalized interactions across extended dialogues. However, existing approaches such as Retrieval-Augmented Generation (RAG) and graph-based memory mostly rely on pairwise relations, which can hardly capture high-order associations, i.e., joint dependencies among multiple elements, causing fragmented retrieval. To this end, we propose HyperMem, a hypergraph-based hierarchical memory architecture that explicitly models such associations using hyperedges. Specifically, HyperMem structures memory into three levels: topics, episodes, and facts, and groups related episodes and their facts via hyperedges, unifying scattered content into coherent units. Leveraging this structure, we design a hybrid lexical-semantic index and a coarse-to-fine retrieval strategy, supporting accurate and efficient retrieval of high-order associations. Experiments on the LoCoMo benchmark show that HyperMem achieves state-of-the-art performance with 92.73% LLM-as-a-judge accuracy, demonstrating its effectiveness for long-term conversations.
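The abstract likewise stays at the level of prose. As a rough sketch of the data layout it describes, the snippet below models the three memory levels (topics, episodes, facts) and uses hyperedges to group an arbitrary number of related episodes and facts into one unit, with a toy coarse-to-fine lookup on top. The field names and the keyword matching are illustrative assumptions; HyperMem's actual hybrid lexical-semantic index is more sophisticated.

from dataclasses import dataclass, field

@dataclass
class Fact:
    text: str

@dataclass
class Episode:
    summary: str
    facts: list[Fact] = field(default_factory=list)

@dataclass
class Hyperedge:
    # One hyperedge joins many episodes and facts at once, capturing a
    # high-order association that pairwise graph edges would fragment
    # across many separate binary links.
    label: str
    episodes: list[Episode] = field(default_factory=list)
    facts: list[Fact] = field(default_factory=list)

@dataclass
class Topic:
    name: str
    hyperedges: list[Hyperedge] = field(default_factory=list)

def retrieve(topics: list[Topic], query: str) -> list[Fact]:
    # Coarse-to-fine: match topics first, then return every fact grouped
    # under the matching topics' hyperedges as one coherent unit.
    matched = [t for t in topics if query.lower() in t.name.lower()]
    return [f for t in matched for e in t.hyperedges for f in e.facts]

trip = Hyperedge(
    label="weekend trip planning",
    episodes=[Episode("discussed itinerary")],
    facts=[Fact("user prefers window seats"), Fact("hotel booked for May 3")],
)
memory = [Topic("travel", [trip])]
print([f.text for f in retrieve(memory, "travel")])

The point of the hyperedge is that one retrieval hit returns the whole associated unit, rather than forcing the agent to reassemble it from scattered pairwise edges.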


Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning

arXiv:2604.07941v1 Announce Type: new Abstract: Post-training has become central to turning pretrained large language models (LLMs) into aligned and deployable systems. Recent progress spans supervised fine-tuning (SFT), preference optimization, reinforcement learning (RL), process supervision, verifier-guided methods, distillation, and multi-stage pipelines. Yet these methods are often discussed in fragmented ways, organized by labels or objective families rather than by the behavioral bottlenecks they address. This survey argues that LLM post-training is best understood as structured intervention on model behavior. We organize the field first by trajectory provenance, which defines two primary learning regimes: off-policy learning on externally supplied trajectories, and on-policy learning on learner-generated rollouts. We then interpret methods through two recurring roles — effective support expansion, which makes useful behaviors more reachable, and policy reshaping, which improves behavior within already reachable regions — together with a complementary systems-level role, behavioral consolidation, which preserves, transfers, and amortizes behavior across stages and model transitions. This perspective yields a unified reading of major paradigms. SFT may serve either support expansion or policy reshaping, whereas preference-based methods are usually off-policy reshaping. On-policy RL often improves behavior on learner-generated states, though under stronger guidance it can also make hard-to-reach reasoning paths reachable. Distillation is often best understood as consolidation rather than only compression, and hybrid pipelines emerge as coordinated multi-stage compositions. Overall, the framework helps diagnose post-training bottlenecks and reason about stage composition, suggesting that progress in LLM post-training increasingly depends on coordinated system design rather than any single dominant objective.
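Since the survey organizes methods by trajectory provenance, a toy pair of training steps makes the two regimes concrete. This is a schematic sketch rather than any specific method from the survey; the Hugging Face-style .logits interface and the bandit-style REINFORCE objective are simplifying assumptions.

import torch.nn.functional as F

def off_policy_step(model, input_ids, target_ids):
    # Off-policy: the trajectory (target_ids) is supplied externally,
    # e.g. by human annotators or a stronger teacher model, and the
    # learner imitates it with next-token cross-entropy (plain SFT).
    logits = model(input_ids).logits
    return F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

def on_policy_step(model, prompt_ids, sample_fn, reward_fn):
    # On-policy: the learner generates its own rollout, a reward is
    # computed on that rollout, and a REINFORCE-style loss reweights
    # the log-probabilities of the learner's own tokens by the reward.
    rollout_ids, log_probs = sample_fn(model, prompt_ids)
    reward = reward_fn(rollout_ids)
    return -(reward * log_probs.sum())

In the survey's terms, the first step can expand support or reshape policy using trajectories the learner could not have produced itself, while the second improves behavior on the states the learner actually visits.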


How Knowledge Distillation Compresses Ensemble Intelligence into a Single Deployable AI Model

Complex prediction problems often lead to ensembles because combining multiple models improves accuracy by reducing variance and capturing diverse patterns. However, these ensembles are impractical in production due to latency constraints and operational complexity. Instead of discarding them, knowledge distillation offers a smarter approach: keep the ensemble as a teacher and train a smaller student model using its soft probability outputs. This allows the student to inherit much of the ensemble’s performance while being lightweight and fast enough for deployment. In this article, we build this pipeline from scratch — training a 12-model teacher ensemble, generating soft targets with temperature scaling, and distilling it into a student that recovers 53.8% of the ensemble’s accuracy edge at 160× compression.

What is Knowledge Distillation?

Knowledge distillation is a model compression technique in which a large, pre-trained “teacher” model transfers its learned behavior to a smaller “student” model. Instead of training solely on ground-truth labels, the student is trained to mimic the teacher’s predictions—capturing not just final outputs but the richer patterns embedded in its probability distributions. This approach enables the student to approximate the performance of complex models while remaining significantly smaller and faster. Originating from early work on compressing large ensemble models into single networks, knowledge distillation is now widely used across domains like NLP, speech, and computer vision, and has become especially important in scaling down massive generative AI models into efficient, deployable systems.

Knowledge Distillation: From Ensemble Teacher to Lean Student

Setting up the dependencies

pip install torch scikit-learn numpy

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

torch.manual_seed(42)
np.random.seed(42)

Creating the dataset

This block creates and prepares a synthetic dataset for a binary classification task (like predicting whether a user clicks an ad). First, make_classification generates 5,000 samples with 20 features, some informative and some redundant, to simulate real-world data complexity. The dataset is then split into training and testing sets so performance can be evaluated on unseen data. Next, StandardScaler normalizes the features to a consistent scale, which helps neural networks train more efficiently. The data is then converted into PyTorch tensors for model training. Finally, a DataLoader feeds the data in mini-batches (size 64) during training, improving efficiency and enabling stochastic gradient descent.

X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=10,
    n_redundant=5,
    random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.long)

train_loader = DataLoader(
    TensorDataset(X_train_t, y_train_t), batch_size=64, shuffle=True
)

Model Architecture

This section defines two neural network architectures: a TeacherModel and a StudentModel. The teacher represents one of the large models in the ensemble—it has multiple layers, wider dimensions, and dropout for regularization, making it highly expressive but computationally expensive during inference. The student model, on the other hand, is a smaller and more efficient network with fewer layers and parameters. Its goal is not to match the teacher’s complexity but to learn its behavior through distillation. Importantly, the student still retains enough capacity to approximate the teacher’s decision boundaries—too small, and it won’t be able to capture the richer patterns learned by the ensemble.

class TeacherModel(nn.Module):
    """Represents one heavy model inside the ensemble."""
    def __init__(self, input_dim=20, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes)
        )

    def forward(self, x):
        return self.net(x)


class StudentModel(nn.Module):
    """
    The lean production model that learns from the ensemble.
    Two hidden layers — enough capacity to absorb distilled knowledge,
    still ~30x smaller than the full ensemble.
    """
    def __init__(self, input_dim=20, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes)
        )

    def forward(self, x):
        return self.net(x)

Helpers

This section defines two utility functions for training and evaluation. train_one_epoch handles one full pass over the training data: it puts the model in training mode, iterates through mini-batches, computes the loss, performs backpropagation, and updates the model weights using the optimizer. It also tracks and returns the average loss across all batches to monitor training progress. evaluate measures model performance: it switches the model to evaluation mode (disabling dropout and gradients), makes predictions on the input data, and computes accuracy by comparing predicted labels with true labels.

def train_one_epoch(model, loader, optimizer, criterion):
    model.train()
    total_loss = 0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)


def evaluate(model, X, y):
    model.eval()
    with torch.no_grad():
        preds = model(X).argmax(dim=1)
    return (preds == y).float().mean().item()

Training the Ensemble

This section trains the teacher ensemble, which serves as the source of knowledge for distillation. Instead of a single model, 12 teacher models are trained independently with different random initializations, allowing each one to learn slightly different patterns from the data. This diversity is what makes ensembles powerful. Each teacher is trained for multiple epochs until convergence, and its test accuracy is printed. Once all models are trained, their predictions are combined using soft voting—averaging their output logits rather than taking a simple majority vote. This produces a stronger, more stable final prediction, giving you a high-performing ensemble that will act as the “teacher” in the next step.

print("=" * 55)
print("STEP 1: Training the 12-model Teacher Ensemble")
print("        (this happens offline, not in production)")
print("=" * 55)

NUM_TEACHERS = 12
teachers = []

for i in range(NUM_TEACHERS):
    torch.manual_seed(i)  # different init per teacher
    model = TeacherModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(30):  # train until convergence
        train_one_epoch(model, train_loader, optimizer, criterion)
    # The captured article cuts off mid-line below; the call is completed
    # with the test tensors defined above, and each trained teacher is
    # kept so the ensemble can soft-vote later.
    acc = evaluate(model, X_test_t, y_test_t)
    print(f"Teacher {i + 1:2d} test accuracy: {acc:.4f}")
    teachers.append(model)
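The captured article breaks off here, during ensemble training. The distillation step it promises (soft targets with temperature scaling and a student trained against them) might look like the sketch below. This is a minimal sketch, not the article's original code: the temperature T, the loss weighting alpha, and the epoch count are assumed values.

# Hedged sketch of the distillation step; T, alpha, and the epoch count
# are assumptions, not the article's original hyperparameters.
def ensemble_soft_targets(teachers, X, T=3.0):
    # Soft voting: average the teachers' logits, then soften with
    # temperature T so near-miss classes still carry signal.
    with torch.no_grad():
        logits = torch.stack([t(X) for t in teachers]).mean(dim=0)
    return F.softmax(logits / T, dim=1)

def distillation_loss(student_logits, soft_targets, hard_labels, T=3.0, alpha=0.7):
    # The KL term pulls the student toward the ensemble's soft
    # distribution (scaled by T*T, as in standard distillation); the
    # cross-entropy term keeps the student anchored to the true labels.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        soft_targets,
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, hard_labels)
    return alpha * kl + (1 - alpha) * ce

student = StudentModel()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
for t in teachers:
    t.eval()  # freeze dropout so the soft targets are deterministic
for epoch in range(30):
    student.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        soft = ensemble_soft_targets(teachers, xb)
        loss = distillation_loss(student(xb), soft, yb)
        loss.backward()
        optimizer.step()

print("Student test accuracy:", evaluate(student, X_test_t, y_test_t))

With this setup, only the small student ships to production; the 12-teacher ensemble exists purely as an offline training signal.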
