
How Machine Learning and Semantic Embeddings Reorder CVE Vulnerabilities Beyond Raw CVSS Scores

In this tutorial, we build an AI-assisted vulnerability scanner that goes beyond static CVSS scoring and instead learns to prioritize vulnerabilities using semantic understanding and machine learning. We treat vulnerability descriptions as rich linguistic artifacts, embed them using modern sentence transformers, and combine these representations with structural metadata to produce a data-driven priority score. We also demonstrate how security teams can shift from rule-based triage to adaptive, explainable, ML-driven risk assessment. Check out the FULL CODES here.

print("Installing required packages...")
import subprocess
import sys

packages = [
    'sentence-transformers',
    'scikit-learn',
    'pandas',
    'numpy',
    'matplotlib',
    'seaborn',
    'requests'
]
for package in packages:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

import requests
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import json
import re
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns

print("✓ All packages installed successfully!\n")

We install and load all required NLP, machine learning, and visualization libraries for the end-to-end pipeline. We ensure the runtime is fully self-contained and ready to execute in Colab or similar notebook environments. This establishes a reproducible foundation for the scanner. Check out the FULL CODES here.
class CVEDataFetcher:
    def __init__(self):
        self.base_url = "https://services.nvd.nist.gov/rest/json/cves/2.0"

    def fetch_recent_cves(self, days=30, max_results=100):
        print(f"Fetching CVEs from last {days} days...")
        end_date = datetime.now()
        start_date = end_date - timedelta(days=days)
        params = {
            'pubStartDate': start_date.strftime('%Y-%m-%dT00:00:00.000'),
            'pubEndDate': end_date.strftime('%Y-%m-%dT23:59:59.999'),
            'resultsPerPage': min(max_results, 2000)
        }
        try:
            response = requests.get(self.base_url, params=params, timeout=30)
            response.raise_for_status()
            data = response.json()
            cves = []
            for item in data.get('vulnerabilities', [])[:max_results]:
                cve = item.get('cve', {})
                cve_id = cve.get('id', 'Unknown')
                descriptions = cve.get('descriptions', [])
                description = next((d['value'] for d in descriptions if d['lang'] == 'en'), 'No description')
                metrics = cve.get('metrics', {})
                cvss_v3 = metrics.get('cvssMetricV31', [{}])[0].get('cvssData', {})
                cvss_v2 = metrics.get('cvssMetricV2', [{}])[0].get('cvssData', {})
                base_score = cvss_v3.get('baseScore') or cvss_v2.get('baseScore') or 0.0
                severity = cvss_v3.get('baseSeverity') or 'UNKNOWN'
                published = cve.get('published', '')
                references = cve.get('references', [])
                cves.append({
                    'cve_id': cve_id,
                    'description': description,
                    'cvss_score': float(base_score),
                    'severity': severity,
                    'published': published,
                    'reference_count': len(references),
                    'attack_vector': cvss_v3.get('attackVector', 'UNKNOWN'),
                    'attack_complexity': cvss_v3.get('attackComplexity', 'UNKNOWN'),
                    'privileges_required': cvss_v3.get('privilegesRequired', 'UNKNOWN'),
                    'user_interaction': cvss_v3.get('userInteraction', 'UNKNOWN')
                })
            print(f"✓ Fetched {len(cves)} CVEs\n")
            return pd.DataFrame(cves)
        except Exception as e:
            print(f"Error fetching CVEs: {e}")
            return self._generate_sample_data(max_results)

    def _generate_sample_data(self, n=50):
        print("Using sample CVE data for demonstration...\n")
        sample_descriptions = [
            "A buffer overflow vulnerability in the network driver allows remote code execution",
            "SQL injection vulnerability in web application login form enables unauthorized access",
            "Cross-site scripting (XSS) vulnerability in user input validation",
            "Authentication bypass in admin panel due to weak session management",
            "Remote code execution via deserialization of untrusted data",
            "Path traversal vulnerability allows reading arbitrary files",
            "Privilege escalation through improper input validation",
            "Denial of service through resource exhaustion in API endpoint",
            "Information disclosure via error messages exposing sensitive data",
            "Memory corruption vulnerability in image processing library",
            "Command injection in file upload functionality",
            "Integer overflow leading to heap buffer overflow",
            "Use-after-free vulnerability in memory management",
            "Race condition in multi-threaded application",
            "Cryptographic weakness in password storage mechanism"
        ]
        severities = ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL']
        attack_vectors = ['NETWORK', 'ADJACENT', 'LOCAL', 'PHYSICAL']
        complexities = ['LOW', 'HIGH']
        data = []
        for i in range(n):
            severity = np.random.choice(severities, p=[0.1, 0.3, 0.4, 0.2])
            score_ranges = {'LOW': (0.1, 3.9), 'MEDIUM': (4.0, 6.9), 'HIGH': (7.0, 8.9), 'CRITICAL': (9.0, 10.0)}
            data.append({
                'cve_id': f'CVE-2024-{10000+i}',
                'description': np.random.choice(sample_descriptions),
                'cvss_score': np.random.uniform(*score_ranges[severity]),
                'severity': severity,
                'published': (datetime.now() - timedelta(days=np.random.randint(1, 30))).isoformat(),
                'reference_count': np.random.randint(1, 10),
                'attack_vector': np.random.choice(attack_vectors),
                'attack_complexity': np.random.choice(complexities),
                'privileges_required': np.random.choice(['NONE', 'LOW', 'HIGH']),
                'user_interaction': np.random.choice(['NONE', 'REQUIRED'])
            })
        return pd.DataFrame(data)

We implement a robust CVE ingestion component that pulls recent vulnerabilities directly from the NVD API. We normalize raw CVE records into structured features while gracefully falling back to synthetic data when API access fails. This allows the tutorial to remain runnable while reflecting real-world challenges in data ingestion. Check out the FULL CODES here.

class VulnerabilityFeatureExtractor:
    def __init__(self):
        print("Loading sentence transformer model...")
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        print("✓ Model loaded\n")
        self.critical_keywords = {
            'execution': ['remote code execution', 'rce', 'execute', 'arbitrary code'],
            'injection': ['sql injection', 'command injection', 'code injection'],
            'authentication': ['bypass', 'authentication', 'authorization'],
            'overflow': ['buffer overflow', 'heap overflow', 'stack overflow'],
            'exposure': ['information disclosure', 'data leak', 'exposure'],
        }

    def extract_semantic_features(self, descriptions):
        print("Generating semantic embeddings...")
        embeddings = self.model.encode(descriptions, show_progress_bar=True)
        return embeddings

    def extract_keyword_features(self, df):
        print("Extracting keyword features...")
        for category, keywords in self.critical_keywords.items():
            df[f'has_{category}'] = df['description'].apply(
                lambda x: any(kw in x.lower() for kw in keywords)
            ).astype(int)
        df['desc_length'] = df['description'].apply(len)
        df['word_count'] = df['description'].apply(lambda x: len(x.split()))
        return df

    def encode_categorical_features(self, df):
        print("Encoding categorical features...")
        categorical_cols = ['attack_vector', 'attack_complexity', 'privileges_required', 'user_interaction']
        for col in categorical_cols:
            dummies = pd.get_dummies(df[col], prefix=col)
            df = pd.concat([df, dummies], axis=1)
        return df

We transform unstructured vulnerability descriptions into dense semantic embeddings using a sentence-transformer model. We also extract keyword-based risk indicators and textual statistics that capture exploit intent and complexity. Together, these features bridge linguistic context with quantitative ML inputs. Check out the FULL CODES here.
class VulnerabilityPrioritizer:
    def __init__(self):
        self.severity_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
        self.score_predictor = GradientBoostingRegressor(n_estimators=100, random_state=42)
        self.scaler = StandardScaler()
        self.feature_cols = None

    def prepare_features(self, df, embeddings):
        numeric_features = ['reference_count', 'desc_length', 'word_count']
        keyword_features = [col for col in df.columns if col.startswith('has_')]
        categorical_features = [col for col in df.columns if any(col.startswith(prefix) for prefix in
            ['attack_vector_', 'attack_complexity_', 'privileges_required_', 'user_interaction_'])]
        self.feature_cols = numeric_features + keyword_features + categorical_features
        X_structured = df[self.feature_cols].values
        X_embeddings = embeddings
        X_combined = np.hstack([X_structured, X_embeddings])
        return X_combined

    def train_models(self, X, y_severity, y_score):
        print("\nTraining ML models...")
        X_scaled = self.scaler.fit_transform(X)
        X_train, X_test, y_sev_train, y_sev_test, y_score_train, y_score_test = train_test_split(
            X_scaled, y_severity, y_score, test_size=0.2, random_state=42
        )
        self.severity_classifier.fit(X_train, y_sev_train)
        sev_pred = self.severity_classifier.predict(X_test)
        self.score_predictor.fit(X_train, y_score_train)
        score_pred = self.score_predictor.predict(X_test)
        print("\n— Severity Classification Report —")
        print(classification_report(y_sev_test, sev_pred))
        print(f"\n— CVSS Score Prediction —")
        print(f"RMSE: {np.sqrt(mean_squared_error(y_score_test, score_pred)):.2f}")
        return X_scaled

    def predict_priority(self, X):
        X_scaled = self.scaler.transform(X)
        severity_pred = self.severity_classifier.predict_proba(X_scaled)
        score_pred = self.score_predictor.predict(X_scaled)
        severity_weight = severity_pred[:, -1] * 0.4
        score_weight = (score_pred / 10.0) * 0.6
        priority_score = severity_weight + score_weight
        return priority_score, severity_pred, score_pred

    def get_feature_importance(self):
        importance = self.score_predictor.feature_importances_
        n_structured = len(self.feature_cols)
        structured_importance = importance[:n_structured]
        embedding_importance = importance[n_structured:]
        feature_imp_df = pd.DataFrame({
            'feature': self.feature_cols,
            'importance': structured_importance
        }).sort_values('importance', ascending=False)
        return feature_imp_df, embedding_importance.mean()

We train supervised models on the combined feature matrix: a random-forest classifier for severity and a gradient-boosting regressor for the CVSS score. We then blend their predictions into a single priority score and expose feature importances so the ranking remains explainable.
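To make the pipeline concrete, here is a minimal usage sketch showing how the three components above can be chained end to end. This is illustrative glue code rather than the tutorial's own driver script, and it assumes the classes defined earlier are already available in the session.

# Hedged end-to-end sketch; column names mirror the fields produced by CVEDataFetcher.
fetcher = CVEDataFetcher()
df = fetcher.fetch_recent_cves(days=30, max_results=100)

extractor = VulnerabilityFeatureExtractor()
df = extractor.extract_keyword_features(df)
df = extractor.encode_categorical_features(df)
embeddings = extractor.extract_semantic_features(df['description'].tolist())

prioritizer = VulnerabilityPrioritizer()
X = prioritizer.prepare_features(df, embeddings)
prioritizer.train_models(X, df['severity'].values, df['cvss_score'].values)

# Rank every CVE by the learned priority score instead of the raw CVSS value.
priority, sev_proba, score_pred = prioritizer.predict_priority(X)
df['priority_score'] = priority
print(df.sort_values('priority_score', ascending=False)[['cve_id', 'cvss_score', 'priority_score']].head(10))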


How an AI Agent Chooses What to Do Under Tokens, Latency, and Tool-Call Budget Constraints?

In this tutorial, we build a cost-aware planning agent that deliberately balances output quality against real-world constraints such as token usage, latency, and tool-call budgets. We design the agent to generate multiple candidate actions, estimate their expected costs and benefits, and then select an execution plan that maximizes value while staying within strict budgets. With this, we demonstrate how agentic systems can move beyond “always use the LLM” behavior and instead reason explicitly about trade-offs, efficiency, and resource awareness, which is critical for deploying agents reliably in constrained environments. Check out the FULL CODES here.

import os, time, math, json, random
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Tuple, Any
from getpass import getpass

USE_OPENAI = True
if USE_OPENAI:
    if not os.getenv("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY (hidden): ").strip()
    try:
        from openai import OpenAI
        client = OpenAI()
    except Exception as e:
        print("OpenAI SDK import failed. Falling back to offline mode.\nError:", e)
        USE_OPENAI = False

We set up the execution environment and securely load the OpenAI API key at runtime without hardcoding it. We also initialize the client so the agent gracefully falls back to offline mode if the API is unavailable. Check out the FULL CODES here.

def approx_tokens(text: str) -> int:
    return max(1, math.ceil(len(text) / 4))

@dataclass
class Budget:
    max_tokens: int
    max_latency_ms: int
    max_tool_calls: int

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

    def within(self, b: Budget) -> bool:
        return (self.tokens <= b.max_tokens and
                self.latency_ms <= b.max_latency_ms and
                self.tool_calls <= b.max_tool_calls)

    def add(self, other: "Spend") -> "Spend":
        return Spend(
            tokens=self.tokens + other.tokens,
            latency_ms=self.latency_ms + other.latency_ms,
            tool_calls=self.tool_calls + other.tool_calls
        )

We define the core budgeting abstractions that enable the agent to reason explicitly about costs. We model token usage, latency, and tool calls as first-class quantities and provide utility methods to accumulate and validate spend. This gives us a clean foundation for enforcing constraints throughout planning and execution. Check out the FULL CODES here.

@dataclass
class StepOption:
    name: str
    description: str
    est_spend: Spend
    est_value: float
    executor: str
    payload: Dict[str, Any] = field(default_factory=dict)

@dataclass
class PlanCandidate:
    steps: List[StepOption]
    spend: Spend
    value: float
    rationale: str = ""

def llm_text(prompt: str, *, model: str = "gpt-5", effort: str = "low") -> str:
    if not USE_OPENAI:
        return ""
    t0 = time.time()
    resp = client.responses.create(
        model=model,
        reasoning={"effort": effort},
        input=prompt,
    )
    _ = (time.time() - t0)
    return resp.output_text or ""

We introduce the data structures that represent individual action choices and full plan candidates. We also define a lightweight LLM wrapper that standardizes how text is generated and measured. This separation allows the planner to reason about actions abstractly without being tightly coupled to execution details. Check out the FULL CODES here.
def generate_step_options(task: str) -> List[StepOption]:
    base = [
        StepOption(
            name="Clarify deliverables (local)",
            description="Extract deliverable checklist + acceptance criteria from the task.",
            est_spend=Spend(tokens=60, latency_ms=20, tool_calls=0),
            est_value=6.0,
            executor="local",
        ),
        StepOption(
            name="Outline plan (LLM)",
            description="Create a structured outline with sections, constraints, and assumptions.",
            est_spend=Spend(tokens=600, latency_ms=1200, tool_calls=1),
            est_value=10.0,
            executor="llm",
            payload={"prompt_kind": "outline"}
        ),
        StepOption(
            name="Outline plan (local)",
            description="Create a rough outline using templates (no LLM).",
            est_spend=Spend(tokens=120, latency_ms=40, tool_calls=0),
            est_value=5.5,
            executor="local",
        ),
        StepOption(
            name="Risk register (LLM)",
            description="Generate risks, mitigations, owners, and severity.",
            est_spend=Spend(tokens=700, latency_ms=1400, tool_calls=1),
            est_value=9.0,
            executor="llm",
            payload={"prompt_kind": "risks"}
        ),
        StepOption(
            name="Risk register (local)",
            description="Generate a standard risk register from a reusable template.",
            est_spend=Spend(tokens=160, latency_ms=60, tool_calls=0),
            est_value=5.0,
            executor="local",
        ),
        StepOption(
            name="Timeline (LLM)",
            description="Draft a realistic milestone timeline with dependencies.",
            est_spend=Spend(tokens=650, latency_ms=1300, tool_calls=1),
            est_value=8.5,
            executor="llm",
            payload={"prompt_kind": "timeline"}
        ),
        StepOption(
            name="Timeline (local)",
            description="Draft a simple timeline from a generic milestone template.",
            est_spend=Spend(tokens=150, latency_ms=60, tool_calls=0),
            est_value=4.8,
            executor="local",
        ),
        StepOption(
            name="Quality pass (LLM)",
            description="Rewrite for clarity, consistency, and formatting.",
            est_spend=Spend(tokens=900, latency_ms=1600, tool_calls=1),
            est_value=8.0,
            executor="llm",
            payload={"prompt_kind": "polish"}
        ),
        StepOption(
            name="Quality pass (local)",
            description="Light formatting + consistency checks without LLM.",
            est_spend=Spend(tokens=120, latency_ms=50, tool_calls=0),
            est_value=3.5,
            executor="local",
        ),
    ]
    if USE_OPENAI:
        meta_prompt = f"""
You are a planning assistant. For the task below, propose 3-5 OPTIONAL extra steps
that improve quality, like checks, validations, or stakeholder tailoring.
Keep each step short.

TASK: {task}

Return JSON list with fields: name, description, est_value(1-10).
"""
        txt = llm_text(meta_prompt, model="gpt-5", effort="low")
        try:
            items = json.loads(txt.strip())
            for it in items[:5]:
                base.append(
                    StepOption(
                        name=str(it.get("name", "Extra step (local)"))[:60],
                        description=str(it.get("description", ""))[:200],
                        est_spend=Spend(tokens=120, latency_ms=60, tool_calls=0),
                        est_value=float(it.get("est_value", 5.0)),
                        executor="local",
                    )
                )
        except Exception:
            pass
    return base

We focus on generating a diverse set of candidate steps, including both LLM-based and local alternatives with different cost-quality trade-offs. We optionally use the model itself to suggest additional low-cost improvements while still controlling their impact on the budget. By doing so, we enrich the action space without losing efficiency. Check out the FULL CODES here.
def plan_under_budget(
    options: List[StepOption],
    budget: Budget,
    *,
    max_steps: int = 6,
    beam_width: int = 12,
    diversity_penalty: float = 0.2
) -> PlanCandidate:
    def redundancy_cost(chosen: List[StepOption], new: StepOption) -> float:
        key_new = new.name.split("(")[0].strip().lower()
        overlap = 0
        for s in chosen:
            key_s = s.name.split("(")[0].strip().lower()
            if key_s == key_new:
                overlap += 1
        return overlap * diversity_penalty

    beams: List[PlanCandidate] = [PlanCandidate(steps=[], spend=Spend(), value=0.0, rationale="")]
    for _ in range(max_steps):
        expanded: List[PlanCandidate] = []
        for cand in beams:
            for opt in options:
                if opt in cand.steps:
                    continue
                new_spend = cand.spend.add(opt.est_spend)
                if not new_spend.within(budget):
                    continue
                new_value = cand.value + opt.est_value - redundancy_cost(cand.steps, opt)
                expanded.append(
                    PlanCandidate(
                        steps=cand.steps + [opt],
                        spend=new_spend,
                        value=new_value,
                        rationale=cand.rationale
                    )
                )
        if not expanded:
            break
        expanded.sort(key=lambda c: c.value, reverse=True)
        beams = expanded[:beam_width]
    best = max(beams, key=lambda c: c.value)
    return best

We implement the budget-constrained planning logic that searches for the highest-value combination of steps under strict limits. We apply a beam-style search with redundancy penalties to avoid wasteful action overlap. This is where the agent truly becomes cost-aware by optimizing value subject to constraints. Check out the FULL CODES here.

def run_local_step(task: str, step: StepOption, working: Dict[str, Any]) -> str:
    name = step.name.lower()
    if "clarify deliverables" in name:
        return (
            "Deliverables checklist:\n"
            "- Executive summary\n- Scope & assumptions\n- Workplan + milestones\n"
            "- Risk register (risk, impact, likelihood, mitigation, owner)\n"
            "- Next steps + data needed\n"
        )
    if "outline plan" in name:
        return
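As a small illustration of how the budgeting, option generation, and planning pieces above fit together, here is a hedged usage sketch. The task text and budget limits are arbitrary example values, and this is not the tutorial's own driver code.

# Hedged usage sketch for the planner defined above; values are illustrative only.
task = "Prepare a project kickoff brief for migrating our analytics stack to the cloud."
budget = Budget(max_tokens=2000, max_latency_ms=4000, max_tool_calls=3)

options = generate_step_options(task)
plan = plan_under_budget(options, budget, max_steps=5)

print(f"Planned value: {plan.value:.1f}")
print(f"Estimated spend: {plan.spend.tokens} tokens, "
      f"{plan.spend.latency_ms} ms, {plan.spend.tool_calls} tool calls")
for step in plan.steps:
    print(f"- {step.name} [{step.executor}] (est. value {step.est_value})")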


The Download: chatbots for health, and US fights over AI regulation

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

“Dr. Google” had its issues. Can ChatGPT Health do better?

For the past two decades, there’s been a clear first step for anyone who starts experiencing new medical symptoms: Look them up online. The practice was so common that it gained the pejorative moniker “Dr. Google.” But times are changing, and many medical-information seekers are now using LLMs. According to OpenAI, 230 million people ask ChatGPT health-related queries each week.

That’s the context around the launch of OpenAI’s new ChatGPT Health product, which debuted earlier this month. The big question is: can the obvious risks of using AI for health-related queries be mitigated enough for them to be a net benefit? Read the full story.

—Grace Huckins

America’s coming war over AI regulation

In the final weeks of 2025, the battle over regulating artificial intelligence in the US reached boiling point. On December 11, after Congress failed twice to pass a law banning state AI laws, President Donald Trump signed a sweeping executive order seeking to handcuff states from regulating the booming industry.

Instead, he vowed to work with Congress to establish a “minimally burdensome” national AI policy. The move marked a victory for tech titans, who have been marshaling multimillion-dollar war chests to oppose AI regulations, arguing that a patchwork of state laws would stifle innovation. In 2026, the battleground will shift to the courts. While some states might back down from passing AI laws, others will charge ahead. Read our story about what’s on the horizon.

—Michelle Kim

This story is from MIT Technology Review’s What’s Next series of stories that look across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

Measles is surging in the US. Wastewater tracking could help.

This week marked a rather unpleasant anniversary: It’s a year since Texas reported a case of measles—the start of a significant outbreak that ended up spreading across multiple states. Since the start of January 2025, there have been over 2,500 confirmed cases of measles in the US. Three people have died.

As vaccination rates drop and outbreaks continue, scientists have been experimenting with new ways to quickly identify new cases and prevent the disease from spreading. And they are starting to see some success with wastewater surveillance. Read the full story.

—Jessica Hamzelou

This story is from The Checkup, our weekly newsletter giving you the inside track on all things health and biotech. Sign up to receive it in your inbox every Thursday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 The US is dismantling itself
A foreign enemy could not invent a better chain of events to wreck its standing in the world. (Wired $)
+ We need to talk about whether Donald Trump might be losing it. (New Yorker $)

2 Big Tech is taking on more debt to fund its AI aspirations
And the bubble just keeps growing. (WP $)
+ Forget unicorns. 2026 is shaping up to be the year of the “hectocorn.” (The Guardian)
+ Everyone in tech agrees we’re in a bubble. They just can’t agree on what happens when it pops. (MIT Technology Review)

3 DOGE accessed even more personal data than we thought
Even now, the Trump administration still can’t say how much data is at risk, or what it was used for. (NPR)

4 TikTok has finalized a deal to create a new US entity
Ending years of uncertainty about its fate in America. (CNN)
+ Why China is the big winner out of all of this. (FT $)

5 The US is now officially out of the World Health Organization
And it’s leaving behind nearly $300 million in bills unpaid. (Ars Technica)
+ The US withdrawal from the WHO will hurt us all. (MIT Technology Review)

6 AI-powered disinformation swarms pose a threat to democracy
A would-be autocrat could use them to persuade populations to accept cancelled elections or overturn results. (The Guardian)
+ The era of AI persuasion in elections is about to begin. (MIT Technology Review)

7 We’re about to start seeing more robots everywhere
But exactly what they’ll look like remains up for debate. (Vox $)
+ Chinese companies are starting to dominate entire sectors of AI and robotics. (MIT Technology Review)

8 Some people seem to be especially vulnerable to loneliness
If you’re ‘other-directed’, you could particularly benefit from less screentime. (New Scientist $)

9 This academic lost two years of work with a single click
TL;DR: Don’t rely on ChatGPT to store your data. (Nature)

10 How animals develop a sense of direction
Their ‘internal compass’ seems to be informed by landmarks that help them form a mental map. (Quanta $)

Quote of the day

“The rate at which AI is progressing, I think we have AI that is smarter than any human this year, and no later than next year.”

—Elon Musk simply cannot resist the urge to make wild predictions at Davos, Wired reports.

One more thing

Africa fights rising hunger by looking to foods of the past

After falling steadily for decades, the prevalence of global hunger is now on the rise—nowhere more so than in sub-Saharan Africa. Africa’s indigenous crops are often more nutritious and better suited to the hot and dry conditions that are becoming more prevalent, yet many have been neglected by science, which means they tend to be more vulnerable to diseases and pests and yield well below their theoretical potential.

Now the question is whether researchers, governments, and farmers can work together in a way that gets these crops onto plates and provides Africans from all walks of life with the energy and nutrition that they need to thrive, whatever climate change throws their way. Read the full story.

—Jonathan W. Rosen

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ The only thing I fancy dry this January is a martini. Here’s how to make one.
+ If you absolutely adore the


LLM or Human? Perceptions of Trust and Information Quality in Research Summaries

arXiv:2601.15556v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used to generate and edit scientific abstracts, yet their integration into academic writing raises questions about trust, quality, and disclosure. Despite growing adoption, little is known about how readers perceive LLM-generated summaries and how these perceptions influence evaluations of scientific work. This paper presents a mixed-methods survey experiment investigating whether readers with ML expertise can distinguish between human- and LLM-generated abstracts, how actual and perceived LLM involvement affects judgments of quality and trustworthiness, and what orientations readers adopt toward AI-assisted writing. Our findings show that participants struggle to reliably identify LLM-generated content, yet their beliefs about LLM involvement significantly shape their evaluations. Notably, abstracts edited by LLMs are rated more favorably than those written solely by humans or LLMs. We also identify three distinct reader orientations toward LLM-assisted writing, offering insights into evolving norms and informing policy around disclosure and acceptable use in scientific communication.


ToxiTwitch: Toward Emote-Aware Hybrid Moderation for Live Streaming Platforms

arXiv:2601.15605v1 Announce Type: new Abstract: The rapid growth of live-streaming platforms such as Twitch has introduced complex challenges in moderating toxic behavior. Traditional moderation approaches, such as human annotation and keyword-based filtering, have demonstrated utility, but human moderators on Twitch constantly struggle to scale effectively in the fast-paced, high-volume, and context-rich chat environment of the platform while also facing harassment themselves. Recent advances in large language models (LLMs), such as DeepSeek-R1-Distill and Llama-3-8B-Instruct, offer new opportunities for toxicity detection, especially in understanding nuanced, multimodal communication involving emotes. In this work, we present an exploratory comparison of toxicity detection approaches tailored to Twitch. Our analysis reveals that incorporating emotes improves the detection of toxic behavior. To this end, we introduce ToxiTwitch, a hybrid model that combines LLM-generated embeddings of text and emotes with traditional machine learning classifiers, including Random Forest and SVM. In our case study, the proposed hybrid approach reaches up to 80 percent accuracy under channel-specific training (with 13 percent improvement over BERT and F1-score of 76 percent). This work is an exploratory study intended to surface challenges and limits of emote-aware toxicity detection on Twitch.
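The hybrid design described above, dense embeddings of chat text (with emote tokens left in place) fed to a classical classifier, is easy to picture with a small sketch. The snippet below is a generic illustration of that pattern, not the paper's implementation: the embedding model, the toy messages, and the labels are placeholder assumptions.

# Generic "embeddings + classical classifier" sketch in the spirit of the abstract.
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier

messages = [
    "great play, love this streamer LUL",       # emote names stay in the text
    "you are garbage, leave the stream",
    "gg everyone, see you tomorrow PogChamp",
    "nobody wants you here, log off",
]
labels = [0, 1, 0, 1]  # 1 = toxic, 0 = non-toxic (toy labels)

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
X = encoder.encode(messages)                       # emotes are embedded together with the words

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
print(clf.predict(encoder.encode([
    "what a clutch round PogChamp",
    "uninstall the game, you are useless",
])))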


Qwen Researchers Release Qwen3-TTS: an Open Multilingual TTS Suite with Real-Time Latency and Fine-Grained Voice Control

Alibaba Cloud’s Qwen team has open-sourced Qwen3-TTS, a family of multilingual text-to-speech models that target three core tasks in one stack: voice cloning, voice design, and high-quality speech generation. Paper: https://arxiv.org/pdf/2601.15621v1

Model family and capabilities

Qwen3-TTS uses a 12Hz speech tokenizer and two language model sizes, 0.6B and 1.7B, packaged into three main tasks. The open release exposes five models, Qwen3-TTS-12Hz-0.6B-Base and Qwen3-TTS-12Hz-1.7B-Base for voice cloning and generic TTS, Qwen3-TTS-12Hz-0.6B-CustomVoice and Qwen3-TTS-12Hz-1.7B-CustomVoice for promptable preset speakers, and Qwen3-TTS-12Hz-1.7B-VoiceDesign for free-form voice creation from natural language descriptions, along with the Qwen3-TTS-Tokenizer-12Hz codec.

All models support 10 languages: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian. CustomVoice variants ship with 9 curated timbres, such as Vivian, a bright young Chinese female voice, Ryan, a dynamic English male voice, and Ono_Anna, a playful Japanese female voice, each with a short description that encodes timbre and speaking style. The VoiceDesign model maps text instructions directly to new voices, for example ‘speak in a nervous teenage male voice with rising intonation’, and can then be combined with the Base model by first generating a short reference clip and reusing it via create_voice_clone_prompt.

Architecture, tokenizer, and streaming path

Qwen3-TTS is a dual-track language model: one track predicts discrete acoustic tokens from text, the other handles alignment and control signals. The system is trained on more than 5 million hours of multilingual speech in three pre-training stages that move from general mapping, to high-quality data, to long-context support up to 32,768 tokens.

A key component is the Qwen3-TTS-Tokenizer-12Hz codec. It operates at 12.5 frames per second, about 80 ms per token, and uses 16 quantizers with a 2048-entry codebook. On LibriSpeech test-clean it reaches PESQ wideband 3.21, STOI 0.96, and UTMOS 4.16, outperforming SpeechTokenizer, XCodec, Mimi, FireredTTS 2 and other recent semantic tokenizers, while using a similar or lower frame rate.

The tokenizer is implemented as a pure left-context streaming decoder, so it can emit waveforms as soon as enough tokens are available. With 4 tokens per packet, each streaming packet carries 320 ms of audio. The non-DiT decoder and BigVGAN-free design reduce decode cost and simplify batching.

On the language model side, the research team reports end-to-end streaming measurements on a single vLLM backend with torch.compile and CUDA Graph optimizations. For Qwen3-TTS-12Hz-0.6B-Base and Qwen3-TTS-12Hz-1.7B-Base at concurrency 1, the first packet latency is around 97 ms and 101 ms, with real-time factors of 0.288 and 0.313 respectively. Even at concurrency 6, first packet latency stays around 299 ms and 333 ms.
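To make the streaming figures above concrete, a quick back-of-the-envelope calculation helps. The numbers below are taken from the article, except the 10-second utterance length, which is an assumed example.

# Back-of-the-envelope check of the streaming numbers quoted above.
frame_rate_hz = 12.5                   # acoustic tokens per second of audio
ms_per_token = 1000 / frame_rate_hz    # = 80 ms of audio per token
tokens_per_packet = 4
packet_audio_ms = tokens_per_packet * ms_per_token  # = 320 ms per streamed packet

first_packet_latency_ms = 101          # reported for the 1.7B Base model at concurrency 1
rtf = 0.313                            # real-time factor reported for the same setup

audio_seconds = 10                                   # assumed utterance length
tokens_needed = audio_seconds * frame_rate_hz        # 125 acoustic tokens
generation_time_s = audio_seconds * rtf              # roughly 3.1 s of compute for 10 s of speech

print(f"{packet_audio_ms:.0f} ms of audio per packet, "
      f"~{tokens_needed:.0f} tokens and ~{generation_time_s:.1f} s compute for {audio_seconds} s of speech, "
      f"first audio after ~{first_packet_latency_ms} ms")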
Alignment and control

Post-training uses a staged alignment pipeline. First, Direct Preference Optimization aligns generated speech with human preferences on multilingual data. Then GSPO with rule-based rewards improves stability and prosody. A final speaker fine-tuning stage on the Base model yields target-speaker variants while preserving the core capabilities of the general model. Instruction following is implemented in a ChatML-style format, where text instructions about style, emotion, or tempo are prepended to the input. This same interface powers VoiceDesign, CustomVoice style prompts, and fine-grained edits for cloned speakers.

Benchmarks, zero-shot cloning, and multilingual speech

On the Seed-TTS test set, Qwen3-TTS is evaluated as a zero-shot voice cloning system. The Qwen3-TTS-12Hz-1.7B-Base model reaches a Word Error Rate of 0.77 on test-zh and 1.24 on test-en. The research team highlights the 1.24 WER on test-en as state of the art among the compared systems, while the Chinese WER is close to, but not lower than, the best CosyVoice 3 score.

On a multilingual TTS test set covering 10 languages, Qwen3-TTS achieves the lowest WER in 6 languages, Chinese, English, Italian, French, Korean, and Russian, and competitive performance on the remaining 4 languages, while also obtaining the highest speaker similarity in all 10 languages compared to MiniMax-Speech and ElevenLabs Multilingual v2. Cross-lingual evaluations show that Qwen3-TTS-12Hz-1.7B-Base reduces the mixed error rate for several language pairs, such as zh-to-ko, where the error drops from 14.4 for CosyVoice 3 to 4.82, about a 66 percent relative reduction.

On InstructTTSEval, the Qwen3-TTS-12Hz-1.7B-VD VoiceDesign model sets new state-of-the-art scores among open-source models on Description-Speech Consistency and Response Precision in both Chinese and English, and is competitive with commercial systems like Hume and Gemini on several metrics.

Key Takeaways

- Full open-source multilingual TTS stack: Qwen3-TTS is an Apache 2.0 licensed suite that covers three tasks in one stack, high-quality TTS, 3-second voice cloning, and instruction-based voice design, across 10 languages using the 12Hz tokenizer family.
- Efficient discrete codec and real-time streaming: The Qwen3-TTS-Tokenizer-12Hz uses 16 codebooks at 12.5 frames per second, reaches strong PESQ, STOI, and UTMOS scores, and supports packetized streaming with about 320 ms of audio per packet and sub-120 ms first-packet latency for the 0.6B and 1.7B models in the reported setup.
- Task-specific model variants: The release offers Base models for cloning and generic TTS, CustomVoice models with 9 predefined speakers and style prompts, and a VoiceDesign model that generates new voices directly from natural language descriptions, which can then be reused by the Base model.
- Strong alignment and multilingual quality: A multi-stage alignment pipeline with DPO, GSPO, and speaker fine-tuning gives Qwen3-TTS low word error rates and high speaker similarity, with the lowest WER in 6 of 10 languages and the best speaker similarity in all 10 languages among the evaluated systems, plus state-of-the-art zero-shot English cloning on Seed-TTS.

Check out the Model Weights, Repo and Playground.


Measles is surging in the US. Wastewater tracking could help.

This week marked a rather unpleasant anniversary: It’s a year since Texas reported a case of measles—the start of a significant outbreak that ended up spreading across multiple states. Since the start of January 2025, there have been over 2,500 confirmed cases of measles in the US. Three people have died. As vaccination rates drop and outbreaks continue, scientists have been experimenting with new ways to quickly identify new cases and prevent the disease from spreading. And they are starting to see some success with wastewater surveillance. After all, wastewater contains saliva, urine, feces, shed skin, and more. You could consider it a rich biological sample. Wastewater analysis helped scientists understand how covid was spreading during the pandemic. It’s early days, but it is starting to help us get a handle on measles. Globally, there has been some progress toward eliminating measles, largely thanks to vaccination efforts. Such efforts led to an 88% drop in measles deaths between 2000 and 2024, according to the World Health Organization. It estimates that “nearly 59 million lives have been saved by the measles vaccine” since 2000. Still, an estimated 95,000 people died from measles in 2024 alone—most of them young children. And cases are surging in Europe, Southeast Asia, and the Eastern Mediterranean region. Last year, the US saw the highest levels of measles in decades. The country is on track to lose its measles elimination status—a sorry fate that met Canada in November after the country recorded over 5,000 cases in a little over a year. Public health efforts to contain the spread of measles—which is incredibly contagious—typically involve clinical monitoring in health-care settings, along with vaccination campaigns. But scientists have started looking to wastewater, too. Along with various bodily fluids, we all shed viruses and bacteria into wastewater, whether that’s through brushing our teeth, showering, or using the toilet. The idea of looking for these pathogens in wastewater to track diseases has been around for a while, but things really kicked into gear during the covid-19 pandemic, when scientists found that the coronavirus responsible for the disease was shed in feces. This led Marlene Wolfe of Emory University and Alexandria Boehm of Stanford University to establish WastewaterSCAN, an academic-led program developed to analyze wastewater samples across the US. Covid was just the beginning, says Wolfe. “Over the years we have worked to expand what can be monitored,” she says. Two years ago, for a previous edition of the Checkup, Wolfe told Cassandra Willyard that wastewater surveillance of measles was “absolutely possible,” as the virus is shed in urine. The hope was that this approach could shed light on measles outbreaks in a community, even if members of that community weren’t able to access health care and receive an official diagnosis. And that it could highlight when and where public health officials needed to act to prevent measles from spreading. Evidence that it worked as an effective public health measure was, at the time, scant. Since then, she and her colleagues have developed a test to identify measles RNA. They trialed it at two wastewater treatment plants in Texas between December 2024 and May 2025. At each site, the team collected samples two or three times a week and tested them for measles RNA. 
Over that period, the team found measles RNA in 10.5% of the samples they collected, as reported in a preprint paper published at medRxiv in July and currently under review at a peer-reviewed journal. The first detection came a week before the first case of measles was officially confirmed in the area. That’s promising—it suggests that wastewater surveillance might pick up measles cases early, giving public health officials a head start in efforts to limit any outbreaks. There are more promising results from a team in Canada. Mike McKay and Ryland Corchis-Scott at the University of Windsor in Ontario and their colleagues have also been testing wastewater samples for measles RNA. Between February and November 2025, the team collected samples from a wastewater treatment facility serving over 30,000 people in Leamington, Ontario.  These wastewater tests are somewhat limited—even if they do pick up measles, they won’t tell you who has measles, where exactly infections are occurring, or even how many people are infected. McKay and his colleagues have begun to make some progress here. In addition to monitoring the large wastewater plant, the team used tampons to soak up wastewater from a hospital lateral sewer. They then compared their measles test results with the number of clinical cases in that hospital. This gave them some idea of the virus’s “shedding rate.” When they applied this to the data collected from the Leamington wastewater treatment facility, the team got estimates of measles cases that were much higher than the figures officially reported.  Their findings track with the opinions of local health officials (who estimate that the true number of cases during the outbreak was around five to 10 times higher than the confirmed case count), the team members wrote in a paper published on medRxiv a couple of weeks ago. There will always be limits to wastewater surveillance. “We’re looking at the pool of waste of an entire community, so it’s very hard to pull in information about individual infections,” says Corchis-Scott. Wolfe also acknowledges that “we have a lot to learn about how we can best use the tools so they are useful.” But her team at WastewaterSCAN has been testing wastewater across the US for measles since May last year. And their findings are published online and shared with public health officials. In some cases, the findings are already helping inform the response to measles. “We’ve seen public health departments act on this data,” says Wolfe. Some have issued alerts, or increased vaccination efforts in those areas, for example. “[We’re at] a point now where we really see public health departments, clinicians, [and] families using that information to help keep themselves and their communities safe,” she says. McKay says his team has stopped testing for measles


America’s coming war over AI regulation

MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here. In the final weeks of 2025, the battle over regulating artificial intelligence in the US reached a boiling point. On December 11, after Congress failed twice to pass a law banning state AI laws, President Donald Trump signed a sweeping executive order seeking to handcuff states from regulating the booming industry. Instead, he vowed to work with Congress to establish a “minimally burdensome” national AI policy, one that would position the US to win the global AI race. The move marked a qualified victory for tech titans, who have been marshaling multimillion-dollar war chests to oppose AI regulations, arguing that a patchwork of state laws would stifle innovation. In 2026, the battleground will shift to the courts. While some states might back down from passing AI laws, others will charge ahead, buoyed by mounting public pressure to protect children from chatbots and rein in power-hungry data centers. Meanwhile, dueling super PACs bankrolled by tech moguls and AI-safety advocates will pour tens of millions into congressional and state elections to seat lawmakers who champion their competing visions for AI regulation.  Trump’s executive order directs the Department of Justice to establish a task force that sues states whose AI laws clash with his vision for light-touch regulation. It also directs the Department of Commerce to starve states of federal broadband funding if their AI laws are “onerous.” In practice, the order may target a handful of laws in Democratic states, says James Grimmelmann, a law professor at Cornell Law School. “The executive order will be used to challenge a smaller number of provisions, mostly relating to transparency and bias in AI, which tend to be more liberal issues,” Grimmelmann says. For now, many states aren’t flinching. On December 19, New York’s governor, Kathy Hochul, signed the Responsible AI Safety and Education (RAISE) Act, a landmark law requiring AI companies to publish the protocols used to ensure the safe development of their AI models and report critical safety incidents. On January 1, California debuted the nation’s first frontier AI safety law, SB 53—which the RAISE Act was modeled on—aimed at preventing catastrophic harms such as biological weapons or cyberattacks. While both laws were watered down from earlier iterations to survive bruising industry lobbying, they struck a rare, if fragile, compromise between tech giants and AI safety advocates. If Trump targets these hard-won laws, Democratic states like California and New York will likely take the fight to court. Republican states like Florida with vocal champions for AI regulation might follow suit. Trump could face an uphill battle. “The Trump administration is stretching itself thin with some of its attempts to effectively preempt [legislation] via executive action,” says Margot Kaminski, a law professor at the University of Colorado Law School. “It’s on thin ice.” But Republican states that are anxious to stay off Trump’s radar or can’t afford to lose federal broadband funding for their sprawling rural communities might retreat from passing or enforcing AI laws. Win or lose in court, the chaos and uncertainty could chill state lawmaking. Paradoxically, the Democratic states that Trump wants to rein in—armed with big budgets and emboldened by the optics of battling the administration—may be the least likely to budge. 
In lieu of state laws, Trump promises to create a federal AI policy with Congress. But the gridlocked and polarized body won’t be delivering a bill this year. In July, the Senate killed a moratorium on state AI laws that had been inserted into a tax bill, and in November, the House scrapped an encore attempt in a defense bill. In fact, Trump’s bid to strong-arm Congress with an executive order may sour any appetite for a bipartisan deal.  The executive order “has made it harder to pass responsible AI policy by hardening a lot of positions, making it a much more partisan issue,” says Brad Carson, a former Democratic congressman from Oklahoma who is building a network of super PACs backing candidates who support AI regulation. “It hardened Democrats and created incredible fault lines among Republicans,” he says.  While AI accelerationists in Trump’s orbit—AI and crypto czar David Sacks among them—champion deregulation, populist MAGA firebrands like Steve Bannon warn of rogue superintelligence and mass unemployment. In response to Trump’s executive order, Republican state attorneys general signed a bipartisan letter urging the FCC not to supersede state AI laws. With Americans increasingly anxious about how AI could harm mental health, jobs, and the environment, public demand for regulation is growing. If Congress stays paralyzed, states will be the only ones acting to keep the AI industry in check. In 2025, state legislators introduced more than 1,000 AI bills, and nearly 40 states enacted over 100 laws, according to the National Conference of State Legislatures. Efforts to protect children from chatbots may inspire rare consensus. On January 7, Google and Character Technologies, a startup behind the companion chatbot Character.AI, settled several lawsuits with families of teenagers who killed themselves after interacting with the bot. Just a day later, the Kentucky attorney general sued Character Technologies, alleging that the chatbots drove children to suicide and other forms of self-harm. OpenAI and Meta face a barrage of similar suits. Expect more to pile up this year. Without AI laws on the books, it remains to be seen how product liability laws and free speech doctrines apply to these novel dangers. “It’s an open question what the courts will do,” says Grimmelmann.  While litigation brews, states will move to pass child safety laws, which are exempt from Trump’s proposed ban on state AI laws. On January 9, OpenAI inked a deal with a former foe, the child-safety advocacy group Common Sense Media, to back a ballot initiative in California called the Parents & Kids Safe AI Act, setting guardrails around how chatbots interact with children. The measure proposes requiring AI companies to


Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation

arXiv:2510.06605v2 Announce Type: replace-cross Abstract: The substantial investment required to develop Large Language Models (LLMs) makes them valuable intellectual property, raising significant concerns about copyright protection. LLM fingerprinting has emerged as a key technique to address this, which aims to verify a model’s origin by extracting an intrinsic, unique signature (a “fingerprint”) and comparing it to that of a source model to identify illicit copies. However, existing black-box fingerprinting methods often fail to generate distinctive LLM fingerprints. This ineffectiveness arises because black-box methods typically rely on model outputs, which lose critical information about the model’s unique parameters due to the usage of non-linear functions. To address this, we first leverage Fisher Information Theory to formally demonstrate that the gradient of the model’s input is a more informative feature for fingerprinting than the output. Based on this insight, we propose ZeroPrint, a novel method that approximates these information-rich gradients in a black-box setting using zeroth-order estimation. ZeroPrint overcomes the challenge of applying this to discrete text by simulating input perturbations via semantic-preserving word substitutions. This operation allows ZeroPrint to estimate the model’s Jacobian matrix as a unique fingerprint. Experiments on the standard benchmark show ZeroPrint achieves a state-of-the-art effectiveness and robustness, significantly outperforming existing black-box methods.
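The core idea, probing a black-box model with semantic-preserving substitutions and treating the output differences as a Jacobian-like fingerprint, can be sketched generically. This is not the paper's code: query_model and the substitution table below are hypothetical placeholders, and the fake output vector exists only so the sketch runs end to end.

# Generic sketch of perturbation-based (zeroth-order) fingerprinting.
import numpy as np

def query_model(text: str) -> np.ndarray:
    # Stand-in for a real black-box API call; returns a pseudo-output vector derived
    # from the text so the sketch is runnable. A real fingerprint would use the target
    # model's observable outputs, e.g. next-token log probabilities.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=8)

substitutions = {  # semantic-preserving swaps (illustrative only)
    "quick": "fast",
    "large": "big",
    "model": "network",
}

def estimate_fingerprint(prompt: str) -> np.ndarray:
    base = query_model(prompt)
    rows = []
    for word, synonym in substitutions.items():
        perturbed = prompt.replace(word, synonym)
        rows.append(query_model(perturbed) - base)  # finite-difference row per perturbation
    return np.stack(rows)  # Jacobian-like matrix used as the fingerprint

fp = estimate_fingerprint("the quick large model answered the question")
print(fp.shape)  # (3, 8): one row per substitution, one column per output dimension
# Two models would then be compared by the similarity of their fingerprint matrices,
# for example cosine similarity of the flattened matrices.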


Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling

arXiv:2511.09345v2 Announce Type: replace Abstract: Test-time scaling improves the inference performance of Large Language Models (LLMs) but also incurs substantial computational costs. Although recent studies have reduced token consumption through dynamic self-consistency, they remain constrained by the high latency of sequential requests. In this paper, we propose SeerSC, a dynamic self-consistency framework that simultaneously improves token efficiency and latency by integrating System 1 and System 2 reasoning. Specifically, we utilize the rapid System 1 to compute the answer entropy for given queries. This score is then used to evaluate the potential of samples for scaling, enabling dynamic self-consistency under System 2. Benefiting from the advance and accurate estimation provided by System 1, the proposed method can reduce token usage while simultaneously achieving a significant decrease in latency through parallel generation. It outperforms existing methods, achieving up to a 47% reduction in token consumption and a 43% reduction in inference latency without significant performance loss.
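The gating idea, using a cheap pass to estimate answer entropy and letting that estimate set the number of expensive self-consistency samples, can be illustrated with a small sketch. This is a generic illustration in the spirit of the abstract, not the paper's implementation; the cheap answers and the budget mapping are placeholder assumptions.

# Generic entropy-gated self-consistency sketch.
import math
from collections import Counter

def answer_entropy(answers):
    # Shannon entropy (in bits) of the empirical answer distribution.
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def sample_budget(entropy, low=1, high=16, threshold=0.5):
    # Near-zero entropy: the cheap model already agrees with itself, so one sample suffices.
    # High entropy: scale the number of expensive samples toward the maximum budget.
    if entropy <= threshold:
        return low
    return min(high, low + int(round(entropy * 8)))

cheap_answers = ["42", "42", "42", "41"]  # fast "System 1" draws (placeholder values)
H = answer_entropy(cheap_answers)
k = sample_budget(H)
print(f"entropy={H:.2f} -> draw {k} System-2 samples in parallel, then majority-vote the answers")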

