News Archives - Page 30 of 101

Tilde AI Releases TildeOpen LLM: An Open-Source Large Language Model with Over 30 Billion Parameters and Support Most European Languages

admin NU / September 7, 2025

Latvian language-tech firm Tilde has released TildeOpen LLM, an open-source foundational large language model (LLM) purpose-built for European languages, with a sharp focus on under-represented and smaller national and regional languages. It is a strategic leap toward linguistic equity and digital sovereignty within the EU.

Under the Hood: Architecture, Training and Governance

The public release took place on September 3, 2025, when Tilde made the model freely available via Hugging Face. Built as a 30-billion-parameter dense decoder-only transformer, the model is released under a permissive license (CC-BY-4.0) and covers a broad set of languages, from Latvian and Lithuanian to Ukrainian, Turkish, and beyond.

Training ran on two EU supercomputers, LUMI (Finland) and JUPITER (Germany), using 2 million GPU hours awarded through the European Commission's Large AI Grand Challenge. The model was trained with EleutherAI-inspired GPT-NeoX scripts over 450K updates, consuming roughly 2 trillion tokens. Training used a three-stage sampling schedule: uniform sampling across languages, then the natural data distribution (which boosts high-data-volume languages), and finally another uniform sweep for balance.

Key hyperparameters: 60 layers, embedding size 6144, 48 attention heads, an 8192-token context window, SwiGLU activations, RoPE positional encoding, and RMSNorm layer normalization.

Language Equity and Data Sovereignty

Mainstream models lean heavily on English and other major languages, which skews performance on Baltic, Slavic, and other smaller European languages. This under-representation leads to poor grammar, awkward phrasing, and hallucinations. TildeOpen addresses this with an "equitable tokenizer" designed to represent text comparably regardless of language, reducing token counts and improving inference efficiency for less-represented languages. Crucially, organizations can self-host the model, in local data centers or secure EU-compliant clouds, to ensure adherence to GDPR and other data-protection mandates.
This addresses sovereignty concerns tied to US- or Asia-hosted models.

Strategic Horizon: From Prototype to European AI Infrastructure

TildeOpen is a foundational "base" model. Upcoming versions are expected to be more specialized models (e.g., instruction-tuned translation models) built on top of this core. It is also a flag-planting moment: Latvia, via Tilde, positions itself as a tech exporter, with aspirations to scale European AI infrastructure while preserving linguistic diversity.

For research, the move mirrors broader findings on multilingual model behavior: gaps still exist. Evaluations show that even strong open LLMs can hallucinate or lag in lexical accuracy for Baltic languages, reinforcing the need for localized development.

Summary

TildeOpen LLM reframes EU AI not just as regulatory compliance but as technical stewardship. It is a grounded, high-capacity model with a transparent architecture, scalable deployment options, and a strong commitment to linguistic equity. It doesn't indulge hype; it delivers substance.

FAQs

Q1: What is TildeOpen LLM?
TildeOpen is a 30B-parameter multilingual large language model trained on EU supercomputers and optimized for European languages, especially under-represented ones.

Q2: How is it different from mainstream LLMs?
Unlike global models that prioritize English, TildeOpen uses an equitable tokenizer and balanced training to promote fair representation and accuracy across smaller European languages.

Q3: Can organizations self-host the model?
Yes. TildeOpen is open-source under CC-BY-4.0 and can be deployed in local data centers or EU-compliant clouds to meet GDPR and data-sovereignty requirements.

Q4: What are the main use cases?
Government services, translation, education, AI assistants, speech technologies, and multilingual customer support: any domain requiring accurate European language processing.

Check out the Model on Hugging Face and Technical details here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks.
Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. The post Tilde AI Releases TildeOpen LLM: An Open-Source Large Language Model with Over 30 Billion Parameters and Support Most European Languages appeared first on MarkTechPost.
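The three-stage language sampling schedule described in the TildeOpen article above (uniform, then natural, then uniform again) can be illustrated with a small sketch. This is not Tilde's training code: the per-language token counts are hypothetical, and we assume "natural" means sampling each language in proportion to its share of the available tokens.

```python
import random

# Hypothetical per-language token counts (illustrative only, not TildeOpen's data).
TOKEN_COUNTS = {"en": 600, "de": 250, "lv": 100, "lt": 50}

# Stage order as described in the article: uniform -> natural -> uniform.
SCHEDULE = ("uniform", "natural", "uniform")


def stage_weights(token_counts, stage):
    """Return a sampling probability for each language under one stage."""
    if stage == "uniform":
        # Every language is equally likely, regardless of how much data it has.
        n = len(token_counts)
        return {lang: 1.0 / n for lang in token_counts}
    if stage == "natural":
        # Languages are sampled in proportion to their share of the data.
        total = sum(token_counts.values())
        return {lang: count / total for lang, count in token_counts.items()}
    raise ValueError(f"unknown stage: {stage!r}")


def sample_languages(token_counts, stage, k, seed=0):
    """Draw k language labels (e.g. one per training sequence) for a stage."""
    weights = stage_weights(token_counts, stage)
    rng = random.Random(seed)
    langs = list(weights)
    return rng.choices(langs, weights=[weights[lang] for lang in langs], k=k)


if __name__ == "__main__":
    for stage in SCHEDULE:
        w = stage_weights(TOKEN_COUNTS, stage)
        print(stage, {lang: round(p, 3) for lang, p in w.items()})
```

In the uniform stages a small language such as Latvian gets the same share as English; in the natural stage its share drops to its fraction of the data, which is why the schedule ends with a second uniform sweep.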

AI, Committee, News, Uncategorized

From Pretraining to Post-Training: Why Language Models Hallucinate and How Evaluation Methods Reinforce the Problem

admin NU / September 7, 2025

Large language models (LLMs) frequently generate "hallucinations": confident yet incorrect outputs that appear plausible. Despite improvements in training methods and architectures, hallucinations persist. New research from OpenAI offers a rigorous explanation: hallucinations stem from the statistical properties of supervised versus self-supervised learning, and their persistence is reinforced by misaligned evaluation benchmarks.

What Makes Hallucinations Statistically Inevitable?

The research team explains hallucinations as errors inherent to generative modeling. Even with perfectly clean training data, the cross-entropy objective used in pretraining creates statistical pressure that produces errors. The team reduces the problem to a supervised binary classification task called Is-It-Valid (IIV): determining whether a model's output is valid or erroneous. They prove that the generative error rate of an LLM is at least roughly twice its IIV misclassification rate (up to small correction terms). In other words, hallucinations arise for the same reasons misclassifications appear in supervised learning: epistemic uncertainty, poor models, distribution shift, or noisy data.

Why Do Rare Facts Trigger More Hallucinations?

One major driver is the singleton rate: the fraction of facts that appear only once in the training data. By analogy to Good–Turing missing-mass estimation, if 20% of birthday facts appear exactly once in the training data, a base model can be expected to hallucinate on at least roughly 20% of birthday facts. This explains why LLMs answer reliably about widely repeated facts (e.g., Einstein's birthday) but fail on obscure or rarely mentioned ones.

Can Poor Model Families Lead to Hallucinations?

Yes. Hallucinations also emerge when the model class cannot adequately represent a pattern. Classic examples include n-gram models generating ungrammatical sentences, or modern tokenized models miscounting letters because characters are hidden inside subword tokens. These representational limits cause systematic errors even when the data itself is sufficient.
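The singleton-rate argument can be made concrete with a short sketch. In the Good–Turing view, the estimated probability mass of unseen facts is the number of facts observed exactly once divided by the total number of observations. The toy list of fact mentions below is hypothetical, and the function is our own illustration of the estimator, not code from the paper.

```python
from collections import Counter


def singleton_rate(observations):
    """Good-Turing style estimate: (# items seen exactly once) / (# observations)."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)


# Hypothetical corpus mentions of birthday facts: "einstein" is repeated,
# while "alice" and "bob" each appear only once.
mentions = ["einstein", "einstein", "einstein", "curie", "curie", "alice", "bob"]
rate = singleton_rate(mentions)
print(f"singleton rate = {rate:.3f}")
```

Under the argument summarized above, a rate like this acts as an approximate lower bound on how often a base model hallucinates on this category of facts: for a fact seen once, the model has almost no signal to separate the true value from a plausible fabrication.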
Why Doesn't Post-Training Eliminate Hallucinations?

Post-training methods such as RLHF (reinforcement learning from human feedback), DPO, and RLAIF reduce some errors, especially harmful or conspiratorial outputs. But overconfident hallucinations remain because evaluation incentives are misaligned. Like students guessing on multiple-choice exams, LLMs are rewarded for bluffing when unsure. Most benchmarks, such as MMLU, GPQA, and SWE-bench, apply binary scoring: correct answers get credit, abstentions ("I don't know") get none, and incorrect answers are penalized no more harshly than abstentions. Under this scheme, guessing maximizes benchmark scores, even if it fosters hallucinations.

How Do Leaderboards Reinforce Hallucinations?

A review of popular benchmarks shows that nearly all use binary grading with no partial credit for uncertainty. As a result, models that truthfully express uncertainty score worse than those that always guess. This creates systemic pressure for developers to optimize models for confident answers rather than calibrated ones.

What Changes Could Reduce Hallucinations?

The research team argues that fixing hallucinations requires socio-technical change: modifying the scoring of existing mainstream benchmarks, not just adding new evaluation suites. They propose explicit confidence targets: benchmarks should clearly state the penalty for wrong answers and the credit for abstentions. For example: "Answer only if you are >75% confident. Mistakes lose 3 points; correct answers earn 1; 'I don't know' earns 0." (For a threshold t, a penalty of t/(1−t) points makes answering worthwhile exactly when confidence exceeds t; for t = 0.75 that is 3 points.) This design mirrors real-world exams such as earlier SAT and GRE formats, where guessing carried penalties. It encourages behavioral calibration: models abstain when their confidence is below the threshold, producing fewer overconfident hallucinations while still optimizing for benchmark performance.

What Are the Broader Implications?

This work reframes hallucinations as predictable outcomes of training objectives and evaluation misalignment rather than inexplicable quirks.
The findings highlight:
Pretraining inevitability: hallucinations parallel misclassification errors in supervised learning.
Post-training reinforcement: binary grading schemes incentivize guessing.
Evaluation reform: adjusting mainstream benchmarks to reward expressed uncertainty can realign incentives and improve trustworthiness.

By connecting hallucinations to established learning theory, the research demystifies their origin and suggests practical mitigation strategies that shift responsibility from model architectures to evaluation design.

Check out the PAPER and Technical details here. The post From Pretraining to Post-Training: Why Language Models Hallucinate and How Evaluation Methods Reinforce the Problem appeared first on MarkTechPost.
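The confidence-target scoring rule from the hallucination article above can be checked with a few lines of arithmetic. If a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs k points, then answering with confidence p has expected score p − (1 − p)·k, which is positive exactly when p > k/(1 + k). Choosing k = t/(1 − t) therefore places the break-even point at the stated threshold t. The sketch below is our own illustration of that arithmetic, not code from the paper.

```python
def expected_score(p, penalty):
    """Expected points for answering when the answer is correct with probability p."""
    return p * 1.0 - (1.0 - p) * penalty


def penalty_for_threshold(t):
    """Wrong-answer penalty that makes answering break even exactly at confidence t."""
    if not 0.0 <= t < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    return t / (1.0 - t)


def should_answer(p, t):
    """Answer only if the expected score beats abstaining, which scores 0."""
    return expected_score(p, penalty_for_threshold(t)) > 0.0


def binary_expected_score(p):
    """Binary grading: wrong answers and abstentions both score 0."""
    return p


if __name__ == "__main__":
    t = 0.75
    print("penalty for t=0.75:", penalty_for_threshold(t))
    for p in (0.5, 0.7, 0.8, 0.9):
        score = expected_score(p, penalty_for_threshold(t))
        print(p, round(score, 3), should_answer(p, t))
```

Under binary grading, any nonzero chance of being right beats abstaining, which is the incentive to guess that the article describes. With a 3-point penalty, a model that is only 70% sure is better off answering "I don't know".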

AI, Committee, News, Uncategorized

Putin says organ transplants could grant immortality. Not quite.

admin NU / September 6, 2025

This week I’m writing from Manchester, where I’ve been attending a conference on aging. Wednesday was full of talks and presentations by scientists who are trying to understand the nitty-gritty of aging—all the way down to the molecular level. Once we can understand the complex biology of aging, we should be able to slow or prevent the onset of age-related diseases, they hope.

Then my editor forwarded me a video of the leaders of Russia and China talking about immortality. “These days at 70 years old you are still a child,” China’s Xi Jinping, 72, was translated as saying, according to footage livestreamed by CCTV to multiple media outlets. “With the developments of biotechnology, human organs can be continuously transplanted, and people can live younger and younger, and even achieve immortality,” Russia’s Vladimir Putin, also 72, is reported to have replied.

There’s a striking contrast between that radical vision and the incremental longevity science presented at the meeting. Repeated rounds of organ transplantation surgery aren’t likely to help anyone radically extend their lifespan anytime soon.

First, back to Putin’s proposal: the idea of continually replacing aged organs to stay young. It’s a simplistic way to think about aging. After all, aging is so complicated that researchers can’t agree on what causes it, why it occurs, or even how to define it, let alone “treat” it.

Having said that, there may be some merit to the idea of repairing worn-out body parts with biological or synthetic replacements. Replacement therapies—including bioengineered organs—are being developed by multiple research teams. Some have already been tested in people. This week, let’s take a look at the idea of replacement therapies.

No one fully understands why our organs start to fail with age. On the face of it, replacing them seems like a good idea. After all, we already know how to do organ transplants.
They’ve been a part of medicine since the 1950s and have been used to save hundreds of thousands of lives in the US alone. And replacing old organs with young ones might have more broadly beneficial effects. When a young mouse is stitched to an old one, the older mouse benefits from the arrangement, and its health seems to improve.

The problem is that we don’t really know why. We don’t know what it is about young body tissues that makes them health-promoting. We don’t know how long these effects might last in a person. We don’t know how different organ transplants will compare, either. Might a young heart be more beneficial than a young liver? No one knows.

And that’s before you consider the practicalities of organ transplantation. There is already a shortage of donor organs—thousands of people die on waiting lists. Transplantation requires major surgery and, typically, a lifetime of prescription drugs that damp down the immune system, leaving a person more susceptible to certain infections and diseases. So the idea of repeated organ transplantations shouldn’t really be a particularly appealing one.

“I don’t think that’s going to happen anytime soon,” says Jesse Poganik, who studies aging at Brigham and Women’s Hospital in Boston and is also in Manchester for the meeting. Poganik has been collaborating with transplant surgeons in his own research. “The surgeries are good, but they’re not simple,” he tells me. And they come with real risks. His own 24-year-old cousin developed a form of cancer after a liver and heart transplant. She died a few weeks ago, he says.

So when it comes to replacing worn-out organs, scientists are looking for both biological and synthetic alternatives. We’ve been replacing body parts for centuries. Wooden toes were used as far back as the 15th century. Joint replacements have been around for more than a hundred years.
And major innovations over the last 70 years have given us devices like pacemakers, hearing aids, brain implants, and artificial hearts.

Scientists are exploring other ways to make tissues and organs, too. There are different approaches here, but they include everything from injecting stem cells to seeding “scaffolds” with cells in a lab. In 1999, researchers used volunteers’ own cells to seed bladder-shaped collagen scaffolds. The resulting bioengineered bladders went on to be transplanted into seven people in an initial trial.

Now scientists are working on more complicated organs. Jean Hébert, a program manager at the US government’s Advanced Research Projects Agency for Health, has been exploring ways to gradually replace the cells in a person’s brain. The idea is that, eventually, the recipient will end up with a young brain. Hébert showed my colleague Antonio Regalado how, in his early experiments, he removed parts of mice’s brains and replaced them with embryonic stem cells.

That work seems a world away from the biochemical studies being presented at the British Society for Research on Ageing annual meeting in Manchester, where I am now. On Wednesday, one scientist described how he’d been testing potential longevity drugs on the tiny nematode worm C. elegans. These worms live for only about 15 to 40 days, and his team can perform tens of thousands of experiments with them. About 40% of the drugs that extend lifespan in C. elegans also help mice live longer, he told us. To me, that’s not an amazing hit rate. And we don’t know how many of those drugs will work in people. Probably less than 40% of that 40%.

Other scientists presented work on chemical reactions happening at the cellular level. It was deep, basic science, and my takeaway was that there’s a lot aging researchers still don’t fully understand. It will take years—if not decades—to get the full picture of aging at the molecular level.
And if we rely on a series of experiments in worms, and then mice, and then humans, we’re unlikely to make progress for a really long time. In that context, the idea of replacement therapy feels like a shortcut. “Replacement is a really exciting avenue because you don’t have to understand the biology of aging as much,” says Sierra Lore,

AI, Committee, News, Uncategorized

A Gentle Introduction to Batch Normalization

admin NU / September 6, 2025

Deep neural networks have drastically evolved over the years, overcoming common challenges that arise when training these complex models.
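The core batch normalization transform is short enough to sketch: for each feature, subtract the mini-batch mean, divide by the mini-batch standard deviation (plus a small epsilon for numerical stability), then apply a learnable scale (gamma) and shift (beta). Below is a minimal NumPy sketch of the training-time forward pass; the function name, the default momentum, and the running-average update used for inference are our own illustrative choices (one common convention), not code from the post.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, running_mean, running_var,
                       momentum=0.1, eps=1e-5):
    """Training-mode batch norm over a (batch, features) array.

    Returns the normalized output plus updated running statistics,
    which would replace the batch statistics at inference time.
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature (biased) mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, ~unit variance per feature
    y = gamma * x_hat + beta               # learnable scale and shift

    # Exponential moving averages kept for inference (one common convention).
    new_mean = (1 - momentum) * running_mean + momentum * mu
    new_var = (1 - momentum) * running_var + momentum * var
    return y, new_mean, new_var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
    y, rm, rv = batch_norm_forward(x, np.ones(4), np.zeros(4), np.zeros(4), np.ones(4))
    print(y.mean(axis=0).round(6), y.std(axis=0).round(3))
```

With gamma = 1 and beta = 0 the output of each feature has roughly zero mean and unit variance whatever the input scale; during training, gamma and beta let the network undo that normalization where it helps.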

AI, Committee, News, Uncategorized

How to Build a Complete End-to-End NLP Pipeline with Gensim: Topic Modeling, Word Embeddings, Semantic Search, and Advanced Text Analysis

admin NU / September 6, 2025

In this tutorial, we present a complete end-to-end Natural Language Processing (NLP) pipeline built with Gensim and supporting libraries, designed to run seamlessly in Google Colab. It integrates multiple core techniques in modern NLP, including preprocessing, topic modeling with Latent Dirichlet Allocation (LDA), word embeddings with Word2Vec, TF-IDF-based similarity analysis, and semantic search. The pipeline not only demonstrates how to train and evaluate these models but also showcases practical visualizations, advanced topic analysis, and document classification workflows. By combining statistical methods with machine learning approaches, the tutorial provides a comprehensive framework for understanding and experimenting with text data at scale. Check out the FULL CODES here.

!pip install --upgrade scipy==1.11.4
!pip install gensim==4.3.2 nltk wordcloud matplotlib seaborn pandas numpy scikit-learn
!pip install --upgrade setuptools

print("Please restart runtime after installation!")
print("Go to Runtime > Restart runtime, then run the next cell")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
import warnings
warnings.filterwarnings('ignore')

from gensim import corpora, models, similarities
from gensim.models import Word2Vec, LdaModel, TfidfModel, CoherenceModel
from gensim.parsing.preprocessing import (
    preprocess_string, strip_tags, strip_punctuation, strip_multiple_whitespaces,
    strip_numeric, remove_stopwords, strip_short,
)

import nltk
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

We install and upgrade the necessary libraries, such as SciPy, Gensim, NLTK, and visualization tools, to ensure compatibility. We then import all required modules for preprocessing, modeling, and analysis.
We also download NLTK resources to tokenize and handle stopwords efficiently, thereby setting up the environment for our NLP pipeline.

class AdvancedGensimPipeline:
    def __init__(self):
        self.dictionary = None
        self.corpus = None
        self.lda_model = None
        self.word2vec_model = None
        self.tfidf_model = None
        self.similarity_index = None
        self.processed_docs = None

    def create_sample_corpus(self):
        """Create a diverse sample corpus for demonstration"""
        documents = [
            "Data science combines statistics, programming, and domain expertise to extract insights",
            "Big data analytics helps organizations make data-driven decisions at scale",
            "Cloud computing provides scalable infrastructure for modern applications and services",
            "Cybersecurity protects digital systems from threats and unauthorized access attempts",
            "Software engineering practices ensure reliable and maintainable code development",
            "Database management systems store and organize large amounts of structured information",
            "Python programming language is widely used for data analysis and machine learning",
            "Statistical modeling helps identify patterns and relationships in complex datasets",
            "Cross-validation techniques ensure robust model performance evaluation and selection",
            "Recommendation systems suggest relevant items based on user preferences and behavior",
            "Text mining extracts valuable insights from unstructured textual data sources",
            "Image classification assigns predefined categories to visual content automatically",
            "Reinforcement learning trains agents through interaction with dynamic environments",
        ]
        return documents

    def preprocess_documents(self, documents):
        """Advanced document preprocessing using Gensim filters"""
        print("Preprocessing documents...")
        CUSTOM_FILTERS = [
            strip_tags, strip_punctuation, strip_multiple_whitespaces,
            strip_numeric, remove_stopwords, strip_short, lambda x: x.lower(),
        ]
        stop_words = set(stopwords.words('english'))  # build once, not once per document
        processed_docs = []
        for doc in documents:
            processed = preprocess_string(doc, CUSTOM_FILTERS)
            processed = [word for word in processed if word not in stop_words and len(word) > 2]
            processed_docs.append(processed)
        self.processed_docs = processed_docs
        print(f"Processed {len(processed_docs)} documents")
        return processed_docs

    def create_dictionary_and_corpus(self):
        """Create Gensim dictionary and corpus"""
        print("Creating dictionary and corpus...")
        self.dictionary = corpora.Dictionary(self.processed_docs)
        self.dictionary.filter_extremes(no_below=2, no_above=0.8)
        self.corpus = [self.dictionary.doc2bow(doc) for doc in self.processed_docs]
        print(f"Dictionary size: {len(self.dictionary)}")
        print(f"Corpus size: {len(self.corpus)}")

    def train_word2vec_model(self):
        """Train Word2Vec model for word embeddings"""
        print("Training Word2Vec model...")
        self.word2vec_model = Word2Vec(
            sentences=self.processed_docs, vector_size=100, window=5,
            min_count=2, workers=4, epochs=50,
        )
        print("Word2Vec model trained successfully")

    def analyze_word_similarities(self):
        """Analyze word similarities using Word2Vec"""
        print("\n=== Word2Vec Similarity Analysis ===")
        test_words = ['machine', 'data', 'learning', 'computer']
        for word in test_words:
            if word in self.word2vec_model.wv:
                similar_words = self.word2vec_model.wv.most_similar(word, topn=3)
                print(f"Words similar to '{word}': {similar_words}")
        try:
            if all(w in self.word2vec_model.wv for w in ['machine', 'computer', 'data']):
                analogy = self.word2vec_model.wv.most_similar(
                    positive=['computer', 'data'], negative=['machine'], topn=1
                )
                print(f"Analogy result: {analogy}")
        except KeyError:  # a word is missing from the Word2Vec vocabulary
            print("Not enough vocabulary for complex analogies")

    def train_lda_model(self, num_topics=5):
        """Train LDA topic model"""
        print(f"Training LDA model with {num_topics} topics...")
        self.lda_model = LdaModel(
            corpus=self.corpus, id2word=self.dictionary, num_topics=num_topics,
            random_state=42, passes=10, alpha='auto', per_word_topics=True, eval_every=None,
        )
        print("LDA model trained successfully")

    def evaluate_topic_coherence(self):
        """Evaluate topic model coherence"""
        print("Evaluating topic coherence...")
        coherence_model = CoherenceModel(
            model=self.lda_model, texts=self.processed_docs,
            dictionary=self.dictionary, coherence='c_v',
        )
        coherence_score = coherence_model.get_coherence()
        print(f"Topic Coherence Score: {coherence_score:.4f}")
        return coherence_score

    def display_topics(self):
        """Display discovered topics"""
        print("\n=== Discovered Topics ===")
        topics = self.lda_model.print_topics(num_words=8)
        for idx, topic in enumerate(topics):
            print(f"Topic {idx}: {topic[1]}")

    def create_tfidf_model(self):
        """Create TF-IDF model for document similarity"""
        print("Creating TF-IDF model...")
        self.tfidf_model = TfidfModel(self.corpus)
        corpus_tfidf = self.tfidf_model[self.corpus]
        self.similarity_index = similarities.MatrixSimilarity(corpus_tfidf)
        print("TF-IDF model and similarity index created")

    def find_similar_documents(self, query_doc_idx=0):
        """Find documents similar to a query document"""
        print("\n=== Document Similarity Analysis ===")
        query_doc_tfidf = self.tfidf_model[self.corpus[query_doc_idx]]
        similarities_scores = self.similarity_index[query_doc_tfidf]
        sorted_similarities = sorted(enumerate(similarities_scores), key=lambda x: x[1], reverse=True)
        print(f"Documents most similar to document {query_doc_idx}:")
        for doc_idx, similarity in sorted_similarities[:5]:
            print(f"Doc {doc_idx}: {similarity:.4f}")

    def visualize_topics(self):
        """Create visualizations for topic analysis"""
        print("Creating topic visualizations...")
        doc_topic_matrix = []
        for doc_bow in self.corpus:
            doc_topics = dict(self.lda_model.get_document_topics(doc_bow, minimum_probability=0))
            topic_vec = [doc_topics.get(i, 0) for i in range(self.lda_model.num_topics)]
            doc_topic_matrix.append(topic_vec)
        doc_topic_df = pd.DataFrame(
            doc_topic_matrix, columns=[f'Topic_{i}' for i in range(self.lda_model.num_topics)]
        )
        plt.figure(figsize=(12, 8))
        sns.heatmap(doc_topic_df.T, annot=True, cmap='Blues', fmt='.2f')
        plt.title('Document-Topic Distribution Heatmap')
        plt.xlabel('Documents')
        plt.ylabel('Topics')
        plt.tight_layout()
        plt.show()

        fig, axes = plt.subplots(2, 3, figsize=(15, 10))
        axes = axes.flatten()
        for topic_id in range(min(6, self.lda_model.num_topics)):
            topic_words = dict(self.lda_model.show_topic(topic_id, topn=20))
            wordcloud = WordCloud(
                width=300, height=200, background_color='white', colormap='viridis'
            ).generate_from_frequencies(topic_words)
            axes[topic_id].imshow(wordcloud, interpolation='bilinear')
            axes[topic_id].set_title(f'Topic {topic_id}')
            axes[topic_id].axis('off')
        for i in range(self.lda_model.num_topics, 6):  # hide unused subplot panels
            axes[i].axis('off')
        plt.tight_layout()
        plt.show()

    def advanced_topic_analysis(self):
        """Perform advanced topic analysis"""
        print("\n=== Advanced Topic Analysis ===")
        topic_distributions = []
        for i, doc_bow in enumerate(self.corpus):
            doc_topics = self.lda_model.get_document_topics(doc_bow)
            dominant_topic = max(doc_topics, key=lambda x: x[1]) if doc_topics else (0, 0)
            topic_distributions.append({
                'doc_id': i,
                'dominant_topic': dominant_topic[0],
                'topic_probability': dominant_topic[1],
            })
        topic_df = pd.DataFrame(topic_distributions)
        plt.figure(figsize=(10, 6))
        topic_counts = topic_df['dominant_topic'].value_counts().sort_index()
        plt.bar(range(len(topic_counts)), topic_counts.values)
        plt.xlabel('Topic ID')
        plt.ylabel('Number of Documents')
        plt.title('Distribution of Dominant Topics Across Documents')
        plt.xticks(range(len(topic_counts)), [f'Topic {i}' for i in topic_counts.index])
        plt.show()
        return topic_df

    def document_classification_demo(self, new_document):
        """Classify a new document using trained models"""
        print("\n=== Document Classification Demo ===")
        print(f"Classifying: '{new_document[:50]}...'")
        processed_new = preprocess_string(new_document, [
            strip_tags, strip_punctuation, strip_multiple_whitespaces,
            strip_numeric, remove_stopwords, strip_short, lambda x: x.lower(),
        ])
        new_doc_bow = self.dictionary.doc2bow(processed_new)
        doc_topics = self.lda_model.get_document_topics(new_doc_bow)
        print("Topic probabilities:")
        for topic_id, prob in doc_topics:
            print(f"  Topic {topic_id}: {prob:.4f}")
        new_doc_tfidf = self.tfidf_model[new_doc_bow]
        similarities_scores = self.similarity_index[new_doc_tfidf]
        most_similar = np.argmax(similarities_scores)
        print(f"Most similar document: {most_similar} (similarity: {similarities_scores[most_similar]:.4f})")
        return doc_topics, most_similar

    def run_complete_pipeline(self):
        """Execute the complete NLP pipeline"""
        print("=== Advanced Gensim NLP Pipeline
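To see what the TF-IDF similarity step (TfidfModel plus MatrixSimilarity) computes, here is a dependency-free sketch. It assumes Gensim's default weighting as we understand it (raw term count times log2(N / document frequency), with each document vector L2-normalized) and cosine similarity for ranking. The helper names and the three toy documents are our own, and exact scores may differ from Gensim's, for example after filter_extremes prunes the vocabulary.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return one sparse {term: weight} dict per document, L2-normalized."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        vec = {t: c * math.log2(n_docs / df[t]) for t, c in tf.items()}
        vec = {t: w for t, w in vec.items() if w != 0.0}  # terms in every doc get weight 0
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a, b):
    """Dot product of two already-normalized sparse vectors, i.e. cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def rank_similar(vectors, query_idx, top_n=3):
    """Rank all documents by cosine similarity to document `query_idx`."""
    q = vectors[query_idx]
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


docs = [
    ["data", "science", "statistics", "programming"],
    ["python", "programming", "data", "analysis"],
    ["cloud", "computing", "infrastructure"],
]
vecs = tfidf_vectors(docs)
print(rank_similar(vecs, query_idx=0))
```

As in the tutorial's find_similar_documents, the query document itself ranks first with a similarity of about 1.0; the second document scores above zero only because it shares "data" and "programming", and the third shares no terms and scores 0.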

AI, Committee, News, Uncategorized

The Download: longevity myths, and sewer-cleaning robots

admin NU / September 6, 2025

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Putin says organ transplants could grant immortality. Not quite. —Jessica Hamzelou Earlier this week, my editor forwarded me a video of the leaders of Russia and China talking about immortality. “These days at 70 years old you are still a child,” China’s Xi Jinping, 72, was translated as saying. “With the developments of biotechnology, human organs can be continuously transplanted, and people can live younger and younger, and even achieve immortality,” Russia’s Vladimir Putin, also 72, is reported to have replied. In reality, rounds of organ transplantation surgery aren’t likely to help anyone radically extend their lifespan anytime soon. And it’s a simplistic way to think about aging—a process so complicated that researchers can’t agree on what causes it, why it occurs, or even how to define it, let alone “treat” it. Read the full story. This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. India is using robots to clean sewer pipes so humans no longer have to When Jitender was a child in New Delhi, both his parents worked as manual scavengers—a job that involved clearing the city’s sewers by hand. Now, he is among almost 200 contractors involved in the Delhi government’s effort to shift from this manual process to safer mechanical methods. Although it has been outlawed since 1993, manual scavenging—the practice of extracting human excreta from toilets, sewers, or septic tanks—is still practiced widely in India. And not only is the job undignified, but it can be extremely dangerous. Now, several companies have emerged to offer alternatives at a wide range of technical complexity. Read the full story. 
—Hamaad Habibullah This story is from our new print edition, which is all about the future of security. Subscribe here to catch future copies when they land. The must-reads I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology. 1 RFK Jr buried a major study linking alcohol and cancerClearly, the alcohol industry’s intense lobbying of the Trump administration is working. (Vox)+ RFK Jr repeated health untruths during a marathon Senate hearing yesterday. (Mother Jones)+ His anti-vaccine stance alarmed Democrats and Republicans alike. (The Atlantic $) 2 US tech giants want to embed AI in educationThey’re backing a vaguely worded initiative to that effect launched by Melania Trump. (Rolling Stone $)+ Tech leaders took it in turns to praise Trump during dinner. (WSJ $)+ Elon Musk was nowhere to be seen. (The Guardian)+ AI’s giants want to take over the classroom. (MIT Technology Review) 3 The FTC will probe AI companies over their impact on children In a bid to evaluate whether chatbots are harming their mental health. (WSJ $)+ An AI companion site is hosting sexually charged conversations with underage celebrity bots. (MIT Technology Review) 4 Podcasting giant Joe Rogan has been spreading climate misinformationHe’s grossly misinterpreted scientists’ research—and they’re exasperated. (The Guardian)+ Rogan claims the Earth’s temperature is plummeting. It isn’t. (Forbes)+ Why climate researchers are taking the temperature of mountain snow. (MIT Technology Review) 5 DeepSeek is working on its own advanced AI agentWatch out, OpenAI. (Bloomberg $) 6 OpenAI will start making its own AI chips next yearIn a bid to lessen its reliance on Nvidia. (FT $) 7 Warner Bros is suing MidjourneyThe AI startup used the likenesses of characters including Superman without permission, it alleges. (Bloomberg $)+ What comes next for AI copyright lawsuits? 
(MIT Technology Review) 8 Rivers and lakes are being used to cool down buildingsBut networks in Paris, Toronto, the US are facing a looming problem. (Wired $)+ The future of urban housing is energy-efficient refrigerators. (MIT Technology Review) 9 How high school reunions survive in the age of social mediaCuriosity is a powerful driving force, it seems. (The Atlantic $) 10 Facebook’s poke feature is back If I still used Facebook, I’d be thrilled. (TechCrunch) Quote of the day “Even if it doesn’t turn you into the alien if you eat this stuff, I guarantee you’ll grow an extra ear.” —Senator John Kennedy, a Republican from Louisiana, warns of dire consequences if Americans eat shrimp from countries other than the US, Gizmodo reports. One more thing Why one developer won’t quit fighting to connect the US’s gridsMichael Skelly hasn’t learned to take no for an answer. For much of the last 15 years, the energy entrepreneur has worked to develop long-haul transmission lines to carry wind power across the Great Plains, Midwest, and Southwest. But so far, he has little to show for the effort. Skelly has long argued that building such lines and linking together the nation’s grids would accelerate the shift from coal- and natural-gas-fueled power plants to the renewables needed to cut the pollution driving climate change. But his previous business shut down in 2019, after halting two of its projects and selling off interests in three more. Skelly contends he was early, not wrong. And he has a point: market and policymakers are increasingly coming around to his perspective. Read the full story. —James Temple We can still have nice things A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.) 
+ The Paper, the new mockumentary from the makers of the American version of The Office, looks interesting.
+ Giorgio Armani was a true maestro of menswear.
+ The phases of the moon are pretty fascinating.
+ The Damien Hirst-directed video for Blur’s classic Country House has been given a 4K makeover.


AI, Committee, News, Uncategorized

Google AI Introduces Personal Health Agent (PHA): A Multi-Agent Framework that Enables Personalized Interactions to Address Individual Health Needs

admin NU / September 6, 2025

Paper: https://arxiv.org/abs/2508.20148v1

What is a Personal Health Agent?

Large language models (LLMs) have demonstrated strong performance across domains such as clinical reasoning, decision support, and consumer health applications. However, most existing platforms are designed as single-purpose tools, such as symptom checkers, digital coaches, or health information assistants. These approaches often fail to address the complexity of real-world health needs, where individuals require integrated reasoning over wearable streams, personal health records, and laboratory test results.

A team of researchers from Google has proposed a Personal Health Agent (PHA) framework. The PHA is designed as a multi-agent system that unifies three complementary roles: data analysis, medical knowledge reasoning, and health coaching. Instead of returning isolated outputs from a single model, the PHA uses a central orchestrator to coordinate specialized sub-agents, iteratively synthesize their outputs, and deliver coherent, personalized guidance.

How does the PHA framework operate?

The PHA is built on the Gemini 2.0 model family. It follows a modular architecture consisting of three sub-agents and one orchestrator:

Data Science Agent (DS): The DS agent interprets and analyzes time-series data from wearables (e.g., step counts, heart rate variability, sleep metrics) and structured health records.
It can decompose open-ended user questions into formal analysis plans, carry out statistical analyses, and compare results against population-level reference data. For example, it can quantify whether physical activity over the past month is associated with improvements in sleep quality.

Domain Expert Agent (DE): The DE agent provides medically contextualized information. It integrates personal health records, demographic information, and wearable signals to generate explanations grounded in medical knowledge. Unlike general-purpose LLMs, which may produce plausible but unreliable outputs, the DE agent follows an iterative reasoning-investigation-examination loop that combines authoritative medical resources with personal data. This allows it to provide evidence-based interpretations, such as whether a specific blood pressure reading is within a safe range for an individual with a particular condition.

Health Coach Agent (HC): The HC agent addresses behavior change and long-term goal setting. Drawing on established coaching strategies such as motivational interviewing, it conducts multi-turn conversations, identifies user goals, clarifies constraints, and generates structured, personalized plans. For example, it may guide a user through setting a weekly exercise schedule, adapting to individual barriers and incorporating feedback from progress tracking.

Orchestrator: The orchestrator coordinates the three agents. When a query arrives, it assigns a primary agent responsible for generating the main output, plus supporting agents that provide contextual data or domain knowledge. After collecting the results, the orchestrator runs an iterative reflection loop, checking outputs for coherence and accuracy before synthesizing them into a single response. As a result, the final output is an integrated recommendation rather than a mere aggregation of agent responses.

How was the PHA evaluated?
The research team conducted one of the most comprehensive evaluations of a health AI system to date. Their evaluation framework involved 10 benchmark tasks, more than 7,000 human annotations, and 1,100 hours of assessment by health experts and end users.

Evaluation of the Data Science Agent

The DS agent was assessed on its ability to generate structured analysis plans and produce correct, executable code. Compared with baseline Gemini models, it demonstrated:
A significant increase in analysis plan quality, with mean expert-rated scores rising from 53.7% to 75.6%.
A reduction in critical data-handling errors from 25.4% to 11.0%.
An improvement in first-attempt code pass rates from 58.4% to 75.5%, with further gains under iterative self-correction.

Evaluation of the Domain Expert Agent

The DE agent was benchmarked on four capabilities: factual accuracy, diagnostic reasoning, contextual personalization, and multimodal data synthesis. Results include:
Factual knowledge: On over 2,000 board-style exam questions across endocrinology, cardiology, sleep medicine, and fitness, the DE agent achieved 83.6% accuracy, outperforming baseline Gemini (81.8%).
Diagnostic reasoning: On 2,000 self-reported symptom cases, it achieved 46.1% top-1 diagnostic accuracy, compared with 41.4% for a state-of-the-art Gemini baseline.
Personalization: In user studies, 72% of participants preferred DE agent responses to baseline outputs, citing greater trustworthiness and contextual relevance.
Multimodal synthesis: In expert clinician reviews of health summaries generated from wearable, lab, and survey data, the DE agent’s outputs were rated more clinically significant, comprehensive, and trustworthy than baseline outputs.

Evaluation of the Health Coach Agent

The HC agent was designed and assessed through expert interviews and user studies.
Experts emphasized the need for six coaching capabilities: goal identification, active listening, context clarification, empowerment, SMART (Specific, Measurable, Attainable, Relevant, Time-bound) recommendations, and iterative feedback incorporation. In evaluations, the HC agent showed improved conversation flow and user engagement compared with baseline models. It avoided premature recommendations, instead balancing information gathering with actionable advice, and produced outputs more consistent with expert coaching practice.

Evaluation of the Integrated PHA System

At the system level, the orchestrator and three agents were tested together in open-ended, multimodal conversations reflecting realistic health scenarios. Both experts and end users rated the integrated PHA significantly higher than baseline Gemini systems on accuracy, coherence, personalization, and trustworthiness.

How does the PHA contribute to health AI?

The multi-agent PHA addresses several limitations of existing health AI systems:
Integration of heterogeneous data: Wearable signals, medical records, and lab test results are analyzed jointly rather than in isolation.
Division of labor: Each sub-agent specializes in a domain where single monolithic models often underperform, e.g., numerical reasoning for DS, clinical grounding for DE, and behavioral engagement for HC.
Iterative reflection: The orchestrator’s review cycle reduces the inconsistencies that often arise when multiple outputs are simply concatenated.
Systematic evaluation: Unlike most prior work, which relied on small-scale case studies, the PHA was validated with a large multimodal dataset (the WEAR-ME study).
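To make the orchestrator’s division of labor concrete, here is a minimal toy sketch in Python of a primary agent, supporting agents, and a reflection check before the final response. It rests on loud assumptions: the three agents are plain functions rather than Gemini-backed agents, routing uses hypothetical keyword rules, and the "reflection" only checks that every supporting finding made it into the answer. All names (`Query`, `route`, `orchestrate`, and so on) are illustrative, not Google’s implementation. The DS stub computes a Pearson correlation between daily steps and sleep hours, echoing the activity-versus-sleep example above.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Query:
    text: str
    steps: List[float]        # hypothetical daily step counts from a wearable
    sleep_hours: List[float]  # hypothetical nightly sleep durations


# An agent receives the query plus findings from other agents and returns text.
Agent = Callable[[Query, List[str]], str]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def data_science_agent(q: Query, findings: List[str]) -> str:
    r = pearson(q.steps, q.sleep_hours)
    return f"DS: Pearson r between daily steps and sleep hours = {r:.2f}"


def domain_expert_agent(q: Query, findings: List[str]) -> str:
    # Placeholder: a real DE agent would ground its answer in medical sources.
    return "DE: medical context from authoritative sources would go here"


def health_coach_agent(q: Query, findings: List[str]) -> str:
    plan = "HC: weekly activity plan drafted"
    return plan + (" using: " + "; ".join(findings) if findings else "")


AGENTS: Dict[str, Agent] = {
    "DS": data_science_agent,
    "DE": domain_expert_agent,
    "HC": health_coach_agent,
}


def route(q: Query) -> str:
    """Choose the primary agent with hypothetical keyword rules."""
    text = q.text.lower()
    if "plan" in text or "goal" in text:
        return "HC"
    if "associated" in text or "correlat" in text:
        return "DS"
    return "DE"


def orchestrate(q: Query, agents: Dict[str, Agent] = AGENTS, max_rounds: int = 2) -> str:
    primary = route(q)
    # Supporting agents run first so the primary agent can build on their findings.
    findings = [agents[name](q, []) for name in agents if name != primary]
    answer = agents[primary](q, findings)
    for _ in range(max_rounds):
        # Toy reflection check: is every supporting finding reflected in the answer?
        missing = [f for f in findings if f not in answer]
        if not missing:
            return answer
        # Revision round: re-invoke the primary agent with omissions flagged.
        # (An LLM-backed agent could act on this; the stubs here are fixed.)
        answer = agents[primary](q, findings + [f"Please address: {m}" for m in missing])
    # Fallback synthesis: append whatever the primary agent still omitted.
    missing = [f for f in findings if f not in answer]
    return " | ".join([answer] + missing)


query = Query(
    text="Help me plan more exercise. Is my activity associated with my sleep?",
    steps=[4000, 6000, 8000, 10000],
    sleep_hours=[6.0, 6.5, 7.0, 7.5],
)
print(orchestrate(query))
```

Running supporting agents before the primary one mirrors the article’s description of supporting agents supplying context; the fallback join is only a stand-in for the real synthesis step, which in the PHA is itself model-driven.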


AI, Committee, News, Uncategorized

FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response

admin NU / September 5, 2025

arXiv:2502.18452v3 Abstract: During human-robot interaction in disaster relief scenarios, large language models (LLMs) have the potential to provide substantial physical reasoning to assist in mission objectives. However, these reasoning capabilities are often found only in larger models, which are currently impractical to deploy on robotic systems because of size constraints. To meet the requirements of our problem space, we introduce a dataset and pipeline to create Field Reasoning and Instruction Decoding Agent (FRIDA) models. In our pipeline, domain experts and linguists combine their knowledge to write high-quality few-shot prompts that are used to generate synthetic data for fine-tuning. We hand-curate datasets for this few-shot prompting and for evaluation to improve LLM reasoning about both general and disaster-specific objects. We concurrently run an ablation study to understand which kinds of synthetic data most affect performance. We fine-tune several small instruction-tuned models and find that ablated FRIDA models trained only on objects’ physical state and function data outperformed both the FRIDA models trained on all synthetic data and the base models in our evaluation. We demonstrate that the FRIDA pipeline is capable of instilling physical common sense with minimal data.
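Two steps of the pipeline described in the abstract are easy to picture in code: expert-written few-shot examples are assembled into a prompt that asks an LLM to generate more synthetic training data, and the ablation keeps only some categories of that data (the abstract singles out objects’ physical state and function). Below is a minimal sketch of just those two steps, assuming a simple question/answer format. The category labels, example questions, prompt wording, and function names are all hypothetical, and no LLM call or fine-tuning is shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Example:
    category: str  # hypothetical labels, e.g. "physical_state", "function"
    question: str
    answer: str


def build_few_shot_prompt(shots: List[Example], target_object: str) -> str:
    """Assemble expert-written examples into a prompt for synthetic data generation."""
    parts = ["Write a new question and answer about the object, following the examples."]
    for ex in shots:
        parts.append(f"Q: {ex.question}\nA: {ex.answer}")
    parts.append(f"Object: {target_object}\nQ:")
    return "\n\n".join(parts)


def ablation_subset(data: Iterable[Example], keep: Set[str]) -> List[Example]:
    """Keep only examples whose category is in `keep` (one ablation condition)."""
    return [ex for ex in data if ex.category in keep]


# Invented seed examples standing in for hand-curated few-shot data.
seed = [
    Example("physical_state", "Is a brick heavier than a pillow of the same size?", "Yes."),
    Example("function", "What is a crowbar used for?", "Prying things apart, such as jammed boards."),
    Example("spatial", "Would a bicycle fit inside a mailbox?", "No."),
]

prompt = build_few_shot_prompt(seed[:2], "fire extinguisher")
subset = ablation_subset(seed, {"physical_state", "function"})
print(prompt)
print(len(subset))
```

In a real run, each category would have its own expert prompts and the filtered subsets would feed separate fine-tuning jobs, so that models trained on different data mixtures can be compared.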


AI, Committee, News, Uncategorized

NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

admin NU / September 5, 2025

arXiv:2509.04011v1 Abstract: We present NER Retriever, a zero-shot retrieval framework for ad-hoc Named Entity Retrieval, a variant of Named Entity Recognition (NER) in which the types of interest are not provided in advance and a user-defined type description is used to retrieve documents mentioning entities of that type. Instead of relying on fixed schemas or fine-tuned models, our method builds on internal representations of large language models (LLMs) to embed both entity mentions and user-provided open-ended type descriptions into a shared semantic space. We show that internal representations, specifically the value vectors from mid-layer transformer blocks, encode fine-grained type information more effectively than commonly used top-layer embeddings. To refine these representations, we train a lightweight contrastive projection network that aligns type-compatible entities while separating unrelated types. The resulting entity embeddings are compact, type-aware, and well-suited for nearest-neighbor search. Evaluated on three benchmarks, NER Retriever significantly outperforms both lexical and dense sentence-level retrieval baselines. Our findings provide empirical support for representation selection within LLMs and demonstrate a practical solution for scalable, schema-free entity retrieval. The NER Retriever codebase is publicly available at https://github.com/ShacharOr100/ner_retriever
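The retrieval idea in the abstract (project mention embeddings and a type-description embedding into a shared space, then rank by nearest neighbor) can be illustrated with a small pure-Python sketch. Assumptions, stated loudly: the three-dimensional "mid-layer" vectors are invented toy values (real value vectors have far more dimensions), the projection is a fixed matrix `W` standing in for the trained lightweight network, and the InfoNCE-style loss is one common contrastive objective shown for illustration; the paper’s exact loss and architecture may differ. All function names are hypothetical; the authors’ code is in the linked repository.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def project(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Linear map y = W x, standing in for the learned projection network."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """Contrastive loss: low when the anchor is closer to the positive than to the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def retrieve(type_vec: Vector, mentions: Dict[str, Vector],
             W: Sequence[Sequence[float]], k: int = 2) -> List[str]:
    """Rank entity mentions by cosine similarity to the type description in projected space."""
    query = project(W, type_vec)
    ranked = sorted(mentions, key=lambda name: cosine(query, project(W, mentions[name])),
                    reverse=True)
    return ranked[:k]


# Toy data: the third coordinate plays the role of type-irrelevant signal.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
mentions = {
    "Paris": [0.9, 0.1, 5.0],
    "Berlin": [0.8, 0.2, -4.0],
    "Einstein": [0.1, 0.9, 5.0],
}
city_type = [1.0, 0.0, 0.0]  # hypothetical embedding of the description "a city"

print(retrieve(city_type, mentions, W))  # ['Paris', 'Berlin']
```

During training, a loss like `info_nce` would be minimized over many (mention, same-type mention, other-type mentions) triples to learn `W`; at query time only `project` and nearest-neighbor search are needed, which is what keeps the embeddings cheap to index.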


AI, Committee, News, Uncategorized

False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

admin NU / September 5, 2025

arXiv:2509.03888v1 Abstract: Large Language Models (LLMs) can comply with harmful instructions, raising serious safety concerns despite their impressive capabilities. Recent work has leveraged probing-based approaches to study the separability of malicious and benign inputs in LLMs’ internal representations, and researchers have proposed using such probing methods for safety detection. We systematically re-examine this paradigm. Motivated by poor out-of-distribution performance, we hypothesize that probes learn superficial patterns rather than semantic harmfulness. Through controlled experiments, we confirm this hypothesis and identify the specific patterns learned: instructional patterns and trigger words. Our investigation follows a systematic approach, progressing from demonstrating comparable performance of simple n-gram methods, to controlled experiments with semantically cleaned datasets, to detailed analysis of pattern dependencies. These results reveal a false sense of security around current probing-based approaches and highlight the need to redesign both models and evaluation protocols, for which we provide further discussions in the hope of suggesting responsible further research in this direction. We have open-sourced the project at https://github.com/WangCheng0116/Why-Probe-Fails.
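The abstract’s observation that simple n-gram methods can perform comparably to probes, by keying on instructional patterns and trigger words, is easy to reproduce in miniature. The sketch below trains a unigram-plus-bigram naive Bayes classifier on six invented prompts. It is a toy stand-in for the kind of n-gram baseline the authors mention, not their setup or data (those are in the linked repository), and the class and function names are hypothetical. It shows how surface patterns such as "how to" end up doing the classification.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def ngrams(text: str, n_max: int = 2) -> List[str]:
    """Lower-cased word unigrams and bigrams."""
    tokens = text.lower().split()
    grams: List[str] = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


class NGramNaiveBayes:
    """Multinomial naive Bayes over n-grams with add-one (Laplace) smoothing."""

    def fit(self, data: List[Tuple[str, int]]) -> "NGramNaiveBayes":
        self.counts: Dict[int, Counter] = {0: Counter(), 1: Counter()}
        self.docs: Counter = Counter()
        for text, label in data:
            self.counts[label].update(ngrams(text))
            self.docs[label] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, text: str) -> int:
        n_docs = sum(self.docs.values())
        scores = {}
        for label in (0, 1):
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / n_docs)  # class prior
            for g in ngrams(text):
                score += math.log((self.counts[label][g] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy prompts: 1 = malicious, 0 = benign.
train = [
    ("how to build a bomb", 1),
    ("how to make a weapon at home", 1),
    ("explain how to hack a bank account", 1),
    ("what is the capital of france", 0),
    ("summarize this article about birds", 0),
    ("translate this sentence into spanish", 0),
]
clf = NGramNaiveBayes().fit(train)

print(clf.predict("how to bake a cake"))                       # 1: benign, flagged via "how to"-style n-grams
print(clf.predict("describe the steps of making explosives"))  # 0: no malicious n-grams seen in training
```

Both errors come from the same cause the abstract describes: the decision rests on which surface n-grams happen to co-occur with each label, not on whether the request is actually harmful.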

