YouZum


DeepReinforce Team Introduces CUDA-L1: An Automated Reinforcement Learning (RL) Framework for CUDA Optimization Unlocking 3x More Power from GPUs

AI has just unlocked triple the power from GPUs, without human intervention. The DeepReinforce team introduced a new framework called CUDA-L1 that delivers an average 3.12× speedup and up to 120× peak acceleration across 250 real-world GPU tasks. This is not mere academic promise: every result can be reproduced with open-source code on widely used NVIDIA hardware.

The Breakthrough: Contrastive Reinforcement Learning (Contrastive-RL)

At the heart of CUDA-L1 lies a major shift in AI learning strategy: Contrastive Reinforcement Learning (Contrastive-RL). Unlike traditional RL, where an AI simply generates solutions, receives numerical rewards, and updates its model parameters blindly, Contrastive-RL feeds the performance scores and prior code variants directly back into the next generation prompt.

In each optimization round, the model is shown previous code variants together with their measured performance scores. It must then write a "Performance Analysis" in natural language, reflecting on which variant was fastest, why, and what strategies led to that speedup. Each step forces structured reasoning, guiding the model to build not just a new code variant but a more general, data-driven mental model of what makes CUDA code fast. (A minimal illustrative sketch of this loop appears after the benchmark numbers below.) As a result, the AI discovers not only well-known optimizations but also non-obvious tricks that even human experts often overlook, including mathematical shortcuts that bypass computation entirely and memory strategies tuned to specific hardware quirks.

The training pipeline has three stages:

- Stage 1: The LLM is fine-tuned on validated CUDA code, collected by sampling from leading foundation models (DeepSeek-R1, GPT-4o, Claude, etc.) and retaining only correct, executable outputs.
- Stage 2: The model enters a self-training loop: it generates large amounts of CUDA code, keeps only the functional variants, and learns from those. The result is rapid improvement in code correctness and coverage without any manual labeling.
- Stage 3: In the Contrastive-RL phase, the system samples multiple code variants, shows each with its measured speed, and challenges the AI to analyze, compare, and out-reason previous generations before producing the next round of optimizations. This reflection-and-improvement loop is the flywheel that delivers the largest speedups.

How Good Is CUDA-L1? Hard Data

Speedups Across the Board

KernelBench, the gold-standard benchmark for GPU code generation (250 real-world PyTorch workloads), was used to measure CUDA-L1:

| Model/Stage | Avg. Speedup | Max Speedup | Median | Success Rate |
|---|---|---|---|---|
| Vanilla Llama-3.1-405B | 0.23× | 3.14× | 0× | 68/250 |
| DeepSeek-R1 (RL-tuned) | 1.41× | 44.2× | 1.17× | 248/250 |
| CUDA-L1 (All Stages) | 3.12× | 120× | 1.42× | 249/250 |

- 3.12× average speedup: the AI found improvements in virtually every task.
- 120× maximum speedup: some computational bottlenecks and inefficient code (such as diagonal matrix multiplications) were replaced with fundamentally superior solutions.
- Works across hardware: code optimized on NVIDIA A100 GPUs retained substantial gains when ported to other architectures (L40, H100, RTX 3090, H20), with mean speedups from 2.37× to 3.12× and median gains consistently above 1.1× across all devices.
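The paper's exact prompting format is not reproduced here; the following is a minimal, hypothetical sketch of the Contrastive-RL round described above. The function names (`generate`, `benchmark`) and the prompt template are illustrative placeholders, not the CUDA-L1 codebase:

```python
# Hypothetical sketch of one Contrastive-RL optimization round (not the official CUDA-L1 code).
# Assumes an LLM client with a generate(prompt) method and a benchmark(code) function that
# returns the measured speedup of a candidate CUDA kernel over a reference implementation.

def contrastive_rl_round(llm, benchmark, task_description, history):
    """history: list of (code, speedup) pairs from previous rounds."""
    # Show prior variants together with their measured speedups, so the model can
    # compare them and reason about why the fastest one wins.
    ranked = sorted(history, key=lambda item: item[1], reverse=True)
    exemplars = "\n\n".join(
        f"# Variant (measured speedup: {speedup:.2f}x)\n{code}"
        for code, speedup in ranked[:3]
    )
    prompt = (
        f"Task: {task_description}\n\n"
        f"Previous CUDA kernels and their measured speedups:\n{exemplars}\n\n"
        "First write a Performance Analysis explaining which variant is fastest and why. "
        "Then produce a new kernel that you expect to be faster."
    )
    new_code = llm.generate(prompt)
    speedup = benchmark(new_code)   # only functionally correct kernels receive a score
    history.append((new_code, speedup))
    return new_code, speedup
```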
Case Study: Discovering Hidden 64× and 120× Speedups

diag(A) * B: matrix multiplication with a diagonal matrix

- Reference (inefficient): torch.diag(A) @ B constructs a full diagonal matrix, requiring O(N²M) compute and memory.
- CUDA-L1 optimized: A.unsqueeze(1) * B uses broadcasting, touching only O(NM) elements and yielding a 64× speedup.
- Why it matters: the AI reasoned that materializing a full diagonal matrix was needless. This insight was unreachable via brute-force mutation but surfaced through comparative reflection across generated solutions. (A runnable sketch of this comparison appears just before the techniques table below.)

3D transposed convolution: 120× faster

- Original code: performed the full convolution, pooling, and activation even when the input or hyperparameters mathematically guaranteed an all-zero output.
- Optimized code: used a "mathematical short-circuit". Having detected that, given min_value=0, the output must be zero, it set the result to zero immediately and bypassed all computation and memory allocation.
- This single insight delivered orders of magnitude more speedup than hardware-level micro-optimizations.

Business Impact: Why This Matters

For business leaders

- Direct cost savings: every 1% speedup in GPU workloads translates to 1% fewer cloud GPU-seconds, lower energy costs, and more model throughput. Here, the AI delivered, on average, over 200% extra compute from the same hardware investment.
- Faster product cycles: automated optimization reduces the need for CUDA experts. Teams can unlock performance gains in hours rather than months and focus on features and research velocity instead of low-level tuning.

For AI practitioners

- Verifiable and open source: all 250 optimized CUDA kernels are open-sourced. You can test the speed gains yourself across A100, H100, L40, or RTX 3090 GPUs; no trust required.
- No CUDA black magic required: the process relies on no secret sauce, proprietary compilers, or human-in-the-loop tuning.

For AI researchers

- A domain-reasoning blueprint: Contrastive-RL offers a new approach to training AI in domains where correctness and performance, not just natural language, matter.
- Reward hacking: the authors take a deep dive into how the AI discovered subtle exploits and "cheats" (such as asynchronous stream manipulation that produces false speedups) and outline robust procedures to detect and prevent such behavior.

Technical Insights: Why Contrastive-RL Wins

- Performance feedback is now in-context: unlike vanilla RL, the AI learns not just by trial and error but by reasoned self-critique.
- Self-improvement flywheel: the reflection loop makes the model robust to reward gaming and outperforms both evolutionary approaches (fixed parameters with in-context contrastive learning) and traditional RL (blind policy gradients).
- Generalizes and discovers fundamental principles: the AI can combine, rank, and apply key optimization strategies such as memory coalescing, thread block configuration, operation fusion, shared memory reuse, warp-level reductions, and mathematical equivalence transformations.
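To make the diagonal-matrix case study above concrete, here is a minimal PyTorch sketch comparing the two formulations. The tensor sizes are illustrative and the snippet runs on CPU for portability; the reported 64× gain comes from the corresponding GPU kernels.

```python
import torch

N, M = 1024, 1024
A = torch.randn(N)        # the diagonal entries
B = torch.randn(N, M)

# Reference formulation: materializes an N x N diagonal matrix, then does a full
# matrix multiply -- O(N^2 * M) work and O(N^2) extra memory.
reference = torch.diag(A) @ B

# Formulation found by CUDA-L1: scale row i of B by A[i] via broadcasting -- O(N * M) work.
optimized = A.unsqueeze(1) * B

# Both formulations compute the same result.
assert torch.allclose(reference, optimized, atol=1e-5)
```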
Table: Top Techniques Discovered by CUDA-L1

| Optimization Technique | Typical Speedup | Example Insight |
|---|---|---|
| Memory Layout Optimization | Consistent boosts | Contiguous memory/storage for cache efficiency |
| Memory Access (Coalescing, Shared) | Moderate to high | Avoids bank conflicts, maximizes bandwidth |
| Operation Fusion | High with pipelined ops | Fused multi-op kernels reduce memory reads/writes |
| Mathematical Short-circuiting | Extremely high (10-100×) | Detects when computation can be skipped entirely |
| Thread Block/Parallel Config | Moderate | Adapts block sizes/shapes to hardware and task |
| Warp-Level/Branchless Reductions | Moderate | Lowers divergence and sync overhead |
| Register/Shared Memory Optimization | Moderate to high | Caches frequently used data close to computation |
| Async Execution, Minimal Sync | Varies | Overlaps I/O, enables pipelined computation |

Conclusion: AI Is Now Its Own Optimization Engineer

With CUDA-L1, AI has become its own performance engineer, accelerating research productivity and hardware returns without relying on rare human expertise.
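To illustrate the "Mathematical Short-circuiting" row above (the pattern behind the 120× transposed-convolution case study), here is a simplified, hypothetical PyTorch sketch. The module below is not the actual benchmark kernel: it uses a plain Conv3d with stride 1 and no padding to keep the shape arithmetic short, and the class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class ShortCircuitBlock(nn.Module):
    """Computes min(relu(conv3d(x)), min_value). When min_value == 0, the result is
    identically zero, so the expensive convolution can be skipped entirely."""

    def __init__(self, in_ch, out_ch, kernel_size, min_value=0.0):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size)  # stride 1, no padding assumed
        self.min_value = min_value

    def forward(self, x):
        if self.min_value == 0.0:
            # Mathematical short-circuit: relu(.) >= 0, so min(relu(.), 0) == 0 everywhere.
            b, _, d, h, w = x.shape
            k = self.conv.kernel_size[0]                   # cubic kernel assumed
            return x.new_zeros(b, self.conv.out_channels, d - k + 1, h - k + 1, w - k + 1)
        # General path: actually run the convolution and the post-processing.
        return torch.clamp(torch.relu(self.conv(x)), max=self.min_value)

block = ShortCircuitBlock(in_ch=3, out_ch=8, kernel_size=3, min_value=0.0)
out = block(torch.randn(2, 3, 16, 16, 16))
print(out.abs().max())  # tensor(0.) -- produced without running the convolution
```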


Meet Trackio: The Free, Local-First, Open-Source Experiment Tracker Python Library that Simplifies and Enhances Machine Learning Workflows

Experiment tracking is an essential part of modern machine learning workflows. Whether you are tweaking hyperparameters, monitoring training metrics, or collaborating with colleagues, it is crucial to have robust, flexible tools that make tracking experiments straightforward and insightful. However, many existing experiment tracking solutions require complex setup, come with licensing fees, or lock user data into proprietary formats, making them less accessible to individual researchers and smaller teams.

Meet Trackio, a new open-source experiment tracking library developed by Hugging Face and Gradio. Trackio is a local-first, lightweight, and fully free tracker engineered for today's fast-paced research environments and open collaborations.

What Is Trackio?

Trackio is a Python package designed as a drop-in replacement for widely used libraries such as wandb, with compatibility for the foundational API calls (wandb.init, wandb.log, wandb.finish). Switching over or running legacy scripts therefore requires little to no code changes: simply import trackio as wandb and continue working as before. (A minimal usage sketch appears below, after the integration notes.)

Key Features

- Local-first design: by default, experiments run and persist locally, providing privacy and fast access. Sharing is optional, not the default.
- Free and open source: there are no paywalls and no feature limitations; everything, including collaboration and online dashboards, is available to everyone at no cost.
- Lightweight and extensible: the entire codebase is under 1,000 lines of Python, making it easy to audit, extend, or adapt.
- Integrated with the Hugging Face ecosystem: out-of-the-box support for Transformers, Sentence Transformers, and Accelerate lets users begin tracking metrics with minimal setup.
- Data portability: unlike some established tracking tools, Trackio makes all experiment data easily exportable and accessible, enabling custom analytics and seamless integration into research pipelines.

Seamless Experiment Tracking: Local or Shared

One standout feature of Trackio is its shareability. Researchers can monitor metrics on a local Gradio-powered dashboard or, by syncing with Hugging Face Spaces, host a dashboard online for sharing with colleagues or the public. Spaces can be set private or public, with no complex authentication or onboarding required for viewers.

For example, to view your experiment dashboard locally:

```
trackio show
```

Or, from Python:

```python
import trackio
trackio.show()
```

To launch dashboards on Spaces, sync your logs to Hugging Face Spaces and instantly share or embed experiment dashboards with a simple URL. Importantly, when running on Spaces, Trackio automatically backs up metrics from the ephemeral SQLite database to a Hugging Face Dataset (as Parquet files) every 5 minutes, ensuring your experimental data is never lost, even if the public Space restarts.

Plug-and-Play Integration with Your ML Workflow

Integration with the Hugging Face ecosystem is as simple as it gets: with transformers.Trainer or accelerate, you can log and visualize metrics by specifying Trackio as your logger.
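Before looking at the framework integrations, here is a minimal sketch of the drop-in, wandb-style API described above. The project and metric names are illustrative, and only the init/log/finish calls that the article lists as compatible are assumed:

```python
import trackio as wandb  # drop-in alias, as the project suggests

wandb.init(project="my-experiment")   # wandb-style arguments assumed to be supported
for step in range(100):
    loss = 1.0 / (step + 1)           # placeholder metric for illustration
    wandb.log({"train/loss": loss, "step": step})
wandb.finish()
```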
With Accelerate, for example, the integration looks like this:

```python
from accelerate import Accelerator

accelerator = Accelerator(log_with="trackio")
accelerator.init_trackers("my-experiment")
# ... training loop ...
accelerator.log({"training_loss": loss}, step=step)
```

This low-friction approach means anyone using Transformers, Sentence Transformers, or Accelerate can immediately start tracking and sharing experiments with zero extra setup.

Transparency, Sustainability, and Data Freedom

Trackio goes beyond standard metrics and encourages transparency about computational resource use. It supports tracking metrics such as GPU energy usage (by reading from nvidia-smi), a feature aligned with Hugging Face's emphasis on environmental responsibility and reproducibility in model card documentation.

Unlike closed platforms, your data is always accessible: Trackio's logs are stored in standard formats, and dashboards are built with open tools such as Gradio and Hugging Face Datasets, making everything easy to remix, analyze, or share.

Quick Start

To get started:

```
pip install trackio
# or
uv pip install trackio
```

Or swap the import in your codebase:

```python
import trackio as wandb
```

Conclusion

Trackio is positioned to empower individual researchers and open collaboration in ML by offering a transparent, fully free experiment tracker. Local-first by default, easily shareable, and tightly integrated with Hugging Face tools, it delivers robust tracking without the friction or cost of traditional solutions.

Check out the Technical details and GitHub Page. The post Meet Trackio: The Free, Local-First, Open-Source Experiment Tracker Python Library that Simplifies and Enhances Machine Learning Workflows appeared first on MarkTechPost.
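As a concrete illustration of the GPU energy tracking mentioned above, here is a hedged sketch that polls nvidia-smi from user code and logs the reading through the wandb-style log call. Trackio's built-in energy tracking may work differently; this only shows one way to record the same signal yourself:

```python
import subprocess
import trackio

def gpu_power_watts(gpu_index: int = 0) -> float:
    """Read the current power draw of one GPU from nvidia-smi, in watts."""
    result = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())

trackio.init(project="energy-demo")   # wandb-style init assumed
for step in range(10):
    # ... one training step would go here ...
    trackio.log({"gpu/power_watts": gpu_power_watts(), "step": step})
trackio.finish()
```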


How to Use the SHAP-IQ Package to Uncover and Visualize Feature Interactions in Machine Learning Models Using Shapley Interaction Indices (SII)

In this tutorial, we explore how to use the SHAP-IQ package to uncover and visualize feature interactions in machine learning models using Shapley Interaction Indices (SII), building on the foundation of traditional Shapley values. Shapley values are great for explaining individual feature contributions in AI models but fail to capture feature interactions. Shapley interactions go a step further by separating individual effects from interactions, offering deeper insights, such as how longitude and latitude together influence house prices. We use the shapiq package to compute and explore these Shapley interactions for any model.

Installing the dependencies

```python
!pip install shapiq overrides scikit-learn pandas numpy
```

Data Loading and Pre-processing

We use the Bike Sharing dataset from OpenML. After loading the data, we split it into training and testing sets to prepare it for model training and evaluation.

```python
import shapiq
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np

# Load data
X, y = shapiq.load_bike_sharing(to_numpy=True)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

Model Training and Performance Evaluation

```python
# Train the model
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"R² Score: {r2:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
```

Setting up an Explainer

We set up a TabularExplainer using the shapiq package to compute Shapley interaction values based on the k-SII (k-order Shapley Interaction Index) method. By specifying max_order=4, we allow the explainer to consider interactions of up to four features simultaneously, enabling deeper insights into how groups of features collectively affect model predictions.

```python
# Set up an explainer with k-SII interaction values up to order 4
explainer = shapiq.TabularExplainer(
    model=model,
    data=X,
    index="k-SII",
    max_order=4,
)
```

Explaining a Local Instance

We select a specific test instance (index 100) to generate local explanations. The code prints the true and predicted values for this instance, followed by a breakdown of its feature values. This helps us understand the exact inputs passed to the model and sets the context for interpreting the Shapley interaction explanations that follow.
```python
from tqdm.asyncio import tqdm

# Create explanations for different orders
feature_names = list(df[0].columns)  # get the feature names; df is the DataFrame form of the
                                     # dataset loaded in the full notebook (not shown above)
n_features = len(feature_names)

# Select a local instance to be explained
instance_id = 100
x_explain = X_test[instance_id]
y_true = y_test[instance_id]
y_pred = model.predict(x_explain.reshape(1, -1))[0]
print(f"Instance {instance_id}, True Value: {y_true}, Predicted Value: {y_pred}")

for i, feature in enumerate(feature_names):
    print(f"{feature}: {x_explain[i]}")
```

Analyzing Interaction Values

We use the explainer.explain() method to compute Shapley interaction values for a specific data instance (X[100]) with a budget of 256 model evaluations. This returns an InteractionValues object, which captures how individual features and their combinations influence the model's output. Since max_order=4, interactions involving up to four features are considered.

```python
interaction_values = explainer.explain(X[100], budget=256)

# Analyze the interaction values
print(interaction_values)
```

First-Order Interaction Values

To keep things simple, we next compute first-order interaction values, i.e., standard Shapley values that capture only individual feature contributions (no interactions). Setting max_order=1 in the TreeExplainer asks: "How much does each feature individually contribute to the prediction, ignoring interaction effects?" For each feature, this estimates the average marginal contribution to the prediction across all possible permutations of feature inclusion.

```python
feature_names = list(df[0].columns)
explainer = shapiq.TreeExplainer(model=model, max_order=1, index="SV")
si_order = explainer.explain(x=x_explain)
si_order
```

Plotting a Waterfall Chart

A waterfall chart visually breaks down a model's prediction into individual feature contributions. It starts from the baseline prediction and adds or subtracts each feature's Shapley value to reach the final predicted output. Here we use the output of TreeExplainer with max_order=1 (individual contributions only) to visualize the contribution of each feature.

```python
si_order.plot_waterfall(feature_names=feature_names, show=True)
```

In our case, the baseline value (the model's expected output without any feature information) is 190.717. As we add the contributions from individual features (order-1 Shapley values), we can observe how each one pushes the prediction up or pulls it down: features like Weather and Humidity have a positive contribution, increasing the prediction above the baseline, while features like Temperature and Year have a strong negative impact, pulling the prediction down by −35.4 and −45, respectively. Overall, the waterfall chart shows which features drive the prediction and in which direction, providing valuable insight into the model's decision-making.

Check out the Full Codes here.
The post How to Use the SHAP-IQ Package to Uncover and Visualize Feature Interactions in Machine Learning Models Using Shapley Interaction Indices (SII) appeared first on MarkTechPost.


MIT Researchers Develop Methods to Control Transformer Sensitivity with Provable Lipschitz Bounds and Muon

Training large-scale transformers stably has been a longstanding challenge in deep learning, particularly as models grow in size and expressivity. MIT researchers tackle a persistent problem at its root: the unstable growth of activations and loss spikes caused by unconstrained weight and activation norms. Their solution is to enforce provable Lipschitz bounds on the transformer by spectrally regulating the weights, with no use of activation normalization, QK norm, or logit softcapping tricks.

What Is a Lipschitz Bound, and Why Enforce It?

A Lipschitz bound on a neural network quantifies the maximum amount by which the output can change in response to input (or weight) perturbations. Mathematically, a function f is K-Lipschitz if

‖f(x₁) − f(x₂)‖ ≤ K ‖x₁ − x₂‖ for all x₁, x₂.

A lower Lipschitz bound implies greater robustness and predictability. This is crucial for stability, adversarial robustness, privacy, and generalization: the smaller the bound, the less sensitive the network is to input changes or adversarial noise.

Motivation and Problem Statement

Traditionally, training stable transformers at scale has involved a variety of "band-aid" stabilization tricks:

- Layer normalization
- QK normalization
- Logit tanh softcapping

But these do not directly address the underlying growth of the spectral norm (the largest singular value) of the weights, a root cause of exploding activations and training instability, especially in large models. The central hypothesis: if the weights themselves are spectrally regulated, beyond just the optimizer or activations, tight control over Lipschitzness can be maintained, potentially solving instability at its source.

Key Innovations

Weight Spectral Regulation and the Muon Optimizer

- The Muon optimizer spectrally regularizes gradients, ensuring each gradient step does not increase the spectral norm beyond a set limit.
- The researchers extend this regulation to the weights: after each step, they apply operations that cap the singular values of every weight matrix.
- As a result, activation norms stay remarkably small, rarely exceeding values compatible with fp8 precision in their GPT-2-scale transformers.

Removing Stability Tricks

In all experiments, no layer normalization, QK norm, or logit tanh softcapping was used. Yet the maximum activation entries in their GPT-2-scale transformer never exceeded roughly 100, while the unconstrained baseline surpassed 148,000.

Table Sample (NanoGPT Experiment)

| Model | Max Activation | Layer Stability Tricks | Validation Accuracy | Lipschitz Bound |
|---|---|---|---|---|
| Baseline (Speedrun) | 148,480 | Yes | 39.4% | ∞ |
| Lipschitz Transformer | 160 | None | 39.5% | 10¹⁰²⁶⁴ |

Methods for Enforcing Lipschitz Constraints

A variety of weight-norm constraint methods were explored and compared for their ability to maintain high performance, guarantee a Lipschitz bound, and optimize the performance-Lipschitz tradeoff.

Techniques

- Weight decay: the standard method, but not always strict about the spectral norm.
- Spectral normalization: ensures the top singular value is capped, but may affect all singular values globally.
- Spectral soft cap: a novel method that smoothly and efficiently applies σ → min(σ_max, σ) to all singular values in parallel (using odd polynomial approximations). It is co-designed with Muon's high-stable-rank updates to give tight bounds. (A simplified sketch of the capping idea follows this list.)
- Spectral hammer: sets only the largest singular value to σ_max; best suited to the AdamW optimizer.
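For illustration, here is a minimal PyTorch sketch of the underlying "cap the singular values" idea using an explicit SVD. The paper's spectral soft cap avoids the SVD by using odd polynomial approximations for efficiency, so this exact version only shows what the constraint does, not how the authors implement it:

```python
import torch

@torch.no_grad()
def cap_singular_values(weight: torch.Tensor, sigma_max: float = 1.0) -> torch.Tensor:
    """Project a weight matrix so that no singular value exceeds sigma_max.
    Exact SVD-based version of the capping idea; the paper applies a cheaper
    polynomial approximation after each optimizer step instead."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    S = torch.clamp(S, max=sigma_max)   # sigma -> min(sigma_max, sigma)
    return U @ torch.diag(S) @ Vh

# Hypothetical usage after each optimizer step:
# for p in model.parameters():
#     if p.ndim == 2:
#         p.copy_(cap_singular_values(p, sigma_max=1.0))
```

With 1-Lipschitz activations, the product of the per-layer spectral norms is a (loose) upper bound on the whole network's Lipschitz constant, which is why keeping every singular value at or below σ_max yields a provable bound.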
Experimental Results and Insights

Model Evaluation at Various Scales

- Shakespeare (small transformer, <2-Lipschitz): achieves 60% validation accuracy with a provable Lipschitz bound below 2 and outperforms the unconstrained baseline in validation loss.
- NanoGPT (145M parameters): with a Lipschitz bound below 10, validation accuracy reaches 21.2%. Matching the strong unconstrained baseline (39.4% accuracy) required a much larger upper bound of 10^264. This highlights how strict Lipschitz constraints currently trade off against expressivity at larger scales.

Weight Constraint Method Efficiency

- Muon + spectral cap leads the tradeoff frontier, giving lower Lipschitz constants at matched or better validation loss than AdamW + weight decay.
- Spectral soft cap and spectral normalization (under Muon) consistently give the best loss-Lipschitz tradeoff frontier.

Stability and Robustness

- Adversarial robustness increases sharply at lower Lipschitz bounds. In experiments, models with a constrained Lipschitz constant suffered much milder accuracy drops under adversarial attack than unconstrained baselines.

Activation Magnitudes

- With spectral weight regulation, maximum activations remain tiny (near fp8-compatible) compared to the unbounded baselines, even at scale. This opens avenues for low-precision training and inference in hardware, where smaller activations reduce compute, memory, and power costs.

Limitations and Open Questions

- Selecting the "tightest" tradeoff for weight norms, logit scaling, and attention scaling still relies on sweeps rather than principle.
- Current upper bounds are loose: the calculated global bounds can be astronomically large (e.g., 10^264), while real activation norms remain small.
- It is unclear whether unconstrained baseline performance can be matched with strictly small Lipschitz bounds as scale increases; more research is needed.

Conclusion

Spectral weight regulation, especially when paired with the Muon optimizer, can stably train large transformers with enforced Lipschitz bounds, without activation normalization or other band-aid tricks. This addresses instability at a deeper level and keeps activations in a compact, predictable range, greatly improving adversarial robustness and potentially hardware efficiency. This line of work points to new, efficient computational primitives for neural network regulation, with broad applications to privacy, safety, and low-precision AI deployment.

Check out the Paper, GitHub Page and Hugging Face Project Page. The post MIT Researchers Develop Methods to Control Transformer Sensitivity with Provable Lipschitz Bounds and Muon appeared first on MarkTechPost.


Google AI Releases MLE-STAR: A State-of-the-Art Machine Learning Engineering Agent Capable of Automating Various AI Tasks

MLE-STAR (Machine Learning Engineering via Search and Targeted Refinement) is a state-of-the-art agent system developed by Google Cloud researchers to automate complex machine learning (ML) pipeline design and optimization. By combining web-scale search, targeted code refinement, and robust checking modules, MLE-STAR achieves top performance on a range of machine learning engineering tasks, significantly outperforming previous autonomous ML agents and even human baseline methods.

The Problem: Automating Machine Learning Engineering

While large language models (LLMs) have made inroads into code generation and workflow automation, existing ML engineering agents struggle with:

- Overreliance on LLM memory: they tend to default to "familiar" models (e.g., using only scikit-learn for tabular data), overlooking cutting-edge, task-specific approaches.
- Coarse, all-at-once iteration: previous agents modify whole scripts in one shot, lacking deep, targeted exploration of pipeline components such as feature engineering, data preprocessing, or model ensembling.
- Poor error and leakage handling: generated code is prone to bugs, data leakage, or omission of provided data files.

MLE-STAR: Core Innovations

MLE-STAR introduces several key advances over prior solutions.

1. Web Search-Guided Model Selection

Instead of drawing solely from its internal training, MLE-STAR uses external search to retrieve state-of-the-art models and code snippets relevant to the task and dataset. This anchors the initial solution in current best practices rather than only what LLMs "remember".

2. Nested, Targeted Code Refinement

MLE-STAR improves its solutions through a two-loop refinement process (a pseudocode sketch appears after the results table below):

- Outer loop (ablation-driven): runs ablation studies on the evolving code to identify which pipeline component (data preparation, model, feature engineering, etc.) most affects performance.
- Inner loop (focused exploration): iteratively generates and tests variations of just that component, using structured feedback.

This enables deep, component-wise exploration; for example, extensively testing ways to extract and encode categorical features rather than blindly changing everything at once.

3. Self-Improving Ensembling Strategy

MLE-STAR proposes, implements, and refines novel ensemble methods by combining multiple candidate solutions. Rather than simple best-of-N voting or plain averaging, it uses its planning abilities to explore advanced strategies (e.g., stacking with bespoke meta-learners or optimized weight search).

4. Robustness Through Specialized Agents

- Debugging agent: automatically catches and corrects Python errors (tracebacks) until the script runs or a maximum number of attempts is reached.
- Data leakage checker: inspects code to prevent information from test or validation samples from biasing the training process.
- Data usage checker: ensures the solution script makes full use of all provided data files and relevant modalities, improving model performance and generalizability.

Quantitative Results: Outperforming the Field

MLE-STAR's effectiveness is rigorously validated on the MLE-Bench-Lite benchmark (22 challenging Kaggle competitions spanning tabular, image, audio, and text tasks):

| Metric | MLE-STAR (Gemini-2.5-Pro) | AIDE (Best Baseline) |
|---|---|---|
| Any Medal Rate | 63.6% | 25.8% |
| Gold Medal Rate | 36.4% | 12.1% |
| Above Median | 83.3% | 39.4% |
| Valid Submission | 100% | 78.8% |

MLE-STAR achieves more than double the rate of "medal" (top-tier) solutions compared to the previous best agents.
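To make the nested refinement loop described above concrete, here is a hedged, pseudocode-style sketch in Python. All helper functions are illustrative stand-ins for MLE-STAR's LLM-driven agents, not the actual implementation or API:

```python
import random

# Placeholder stubs so the sketch runs end-to-end; in MLE-STAR these are LLM-driven agents.
def evaluate(solution):                       # runs the pipeline, returns a validation score
    return random.random()

def ablate(solution, component):              # returns the solution with one component disabled
    return {**solution, component: None}

def propose_component_variant(solution, component, feedback):
    return {**solution, component: f"variant-of-{component}-{random.randint(0, 999)}"}

def refine_pipeline(solution, n_outer=3, n_inner=4):
    best_score = evaluate(solution)
    for _ in range(n_outer):
        # Outer loop: ablation study -> which component matters most right now?
        impacts = {c: best_score - evaluate(ablate(solution, c))
                   for c in ("preprocessing", "feature_engineering", "model", "ensembling")}
        target = max(impacts, key=impacts.get)
        # Inner loop: targeted exploration of only that component.
        for _ in range(n_inner):
            candidate = propose_component_variant(solution, target, feedback=impacts)
            score = evaluate(candidate)
            if score > best_score:
                solution, best_score = candidate, score
    return solution, best_score

pipeline = {"preprocessing": "standard-scaler", "feature_engineering": "target-encoding",
            "model": "gradient-boosting", "ensembling": "none"}
print(refine_pipeline(pipeline))
```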
Returning to the benchmark results: on image tasks, MLE-STAR overwhelmingly chooses modern architectures (EfficientNet, ViT) over older standbys like ResNet, which translates directly into higher podium rates. The ensemble strategy contributes a further boost by not just picking but combining winning solutions.

Technical Insights: Why MLE-STAR Wins

- Search as foundation: by pulling example code and model cards from the web at run time, MLE-STAR stays far more up to date, automatically including new model types in its initial proposals.
- Ablation-guided focus: systematically measuring the contribution of each code segment allows "surgical" improvements, starting with the most impactful pieces (e.g., targeted feature encodings, advanced model-specific preprocessing).
- Adaptive ensembling: the ensemble agent does not just average; it intelligently tests stacking, regression meta-learners, optimal weighting, and more.
- Rigorous safety checks: error correction, data-leakage prevention, and full data usage unlock much higher validation and test scores, avoiding pitfalls that trip up vanilla LLM code generation.

Extensibility and Human-in-the-Loop

MLE-STAR is also extensible: human experts can inject descriptions of cutting-edge models for faster adoption of the latest architectures. The system is built atop Google's Agent Development Kit (ADK), facilitating open-source adoption and integration into broader agent ecosystems, as shown in the official samples.

Conclusion

MLE-STAR represents a major step forward in the automation of machine learning engineering. By enforcing a workflow that begins with search, tests code via ablation-driven loops, blends solutions with adaptive ensembling, and polices code outputs with specialized agents, it outperforms prior art and even many human competitors. Its open-source codebase means that researchers and ML practitioners can integrate and extend these capabilities in their own projects, accelerating both productivity and innovation.

Check out the Paper, GitHub Page and Technical details. The post Google AI Releases MLE-STAR: A State-of-the-Art Machine Learning Engineering Agent Capable of Automating Various AI Tasks appeared first on MarkTechPost.


Forcing LLMs to be evil during training can make them nicer in the long run

A new study from Anthropic suggests that traits such as sycophancy or evilness are associated with specific patterns of activity in large language models—and turning on those patterns during training can, paradoxically, prevent the model from adopting the related traits.

Large language models have recently acquired a reputation for behaving badly. In April, ChatGPT suddenly became an aggressive yes-man, as opposed to the moderately sycophantic version that users were accustomed to—it endorsed harebrained business ideas, waxed lyrical about users’ intelligence, and even encouraged people to go off their psychiatric medication. OpenAI quickly rolled back the change and later published a postmortem on the mishap. More recently, xAI’s Grok adopted what can best be described as a 4chan neo-Nazi persona and repeatedly referred to itself as “MechaHitler” on X. That change, too, was quickly reversed.

Jack Lindsey, a member of the technical staff at Anthropic who led the new project, says that this study was partly inspired by seeing models adopt harmful traits in such instances. “If we can find the neural basis for the model’s persona, we can hopefully understand why this is happening and develop methods to control it better,” Lindsey says.

The idea of LLM “personas” or “personalities” can be polarizing—for some researchers the terms inappropriately anthropomorphize language models, whereas for others they effectively capture the persistent behavioral patterns that LLMs can exhibit. “There’s still some scientific groundwork to be laid in terms of talking about personas,” says David Krueger, an assistant professor of computer science and operations research at the University of Montreal, who was not involved in the study. “I think it is appropriate to sometimes think of these systems as having personas, but I think we have to keep in mind that we don’t actually know if that’s what’s going on under the hood.”

For this study, Lindsey and his colleagues worked to lay down some of that groundwork. Previous research has shown that various dimensions of LLMs’ behavior—from whether they are talking about weddings to persistent traits such as sycophancy—are associated with specific patterns of activity in the simulated neurons that constitute LLMs. Those patterns can be written down as a long string of numbers, in which each number represents how active a specific neuron is when the model is expressing that behavior.

Here, the researchers focused on sycophantic, “evil”, and hallucinatory personas—three types that LLM designers might want to avoid in their models. To identify those patterns, the team devised a fully automated pipeline that can map out that pattern given a brief text description of a persona. Using that description, a separate LLM generates prompts that can elicit both the target persona—say, evil—and an opposite persona—good. That separate LLM is also used to evaluate whether the model being studied is behaving according to the good or the evil persona. To identify the evil activity pattern, the researchers subtract the model’s average activity in good mode from its average activity in evil mode.

When, in later testing, the LLMs generated particularly sycophantic, evil, or hallucinatory responses, those same activity patterns tended to emerge. That’s a sign that researchers could eventually build a system to track those patterns and alert users when their LLMs are sucking up to them or hallucinating, Lindsey says. “I think something like that would be really valuable,” he says.
“And that’s kind of where I’m hoping to get.”

Just detecting those personas isn’t enough, however. Researchers want to stop them from emerging in the first place. But preventing unsavory LLM behavior is tough. Many LLMs learn from human feedback, which trains them to behave in line with user preference—but can also push them to become excessively obsequious. And recently, researchers have documented a phenomenon called “emergent misalignment,” in which models trained on incorrect solutions to math problems or buggy code extracts somehow also learn to produce unethical responses to a wide range of user queries.

Other researchers have tested out an approach called “steering,” in which activity patterns within LLMs are deliberately stimulated or suppressed in order to elicit or prevent the corresponding behavior. But that approach has a couple of key downsides. Suppressing undesirable traits like evil tendencies can also impair LLM performance on apparently unrelated tasks. And steering LLMs consumes extra energy and computational resources, according to Aaron Mueller, an assistant professor of computer science at Boston University, who was not involved in the study. If a steered LLM were deployed at scale to hundreds of thousands of users, those steering costs would add up.

So the Anthropic team experimented with a different approach. Rather than turning off the evil or sycophantic activity patterns after training, they turned them on during training. When they trained those models on mistake-ridden data sets that would normally spark evil behavior, they instead remained as helpful and harmless as ever.

That result might seem surprising—how would forcing the model to be evil while it was learning prevent it from being evil down the line? According to Lindsey, it could be because the model has no reason to learn evil behavior if it’s already in evil mode. “The training data is teaching the model lots of things, and one of those things is to be evil,” Lindsey says. “But it’s also teaching the model a bunch of other things. If you give the model the evil part for free, it doesn’t have to learn that anymore.”

Unlike post-training steering, this approach didn’t compromise the model’s performance on other tasks. And it would also be more energy efficient if deployed widely. Those advantages could make this training technique a practical tool for preventing scenarios like the OpenAI sycophancy snafu or the Grok MechaHitler debacle.

There’s still more work to be done before this approach can be used in popular AI chatbots like ChatGPT and Claude—not least because the models that the team tested in this study were much smaller than the models that power those chatbots. “There’s always a chance that everything changes when you scale up. But if that finding holds


Building a Transformer Model for Language Translation

This post is divided into six parts; they are:

• Why Transformer is Better than Seq2Seq
• Data Preparation and Tokenization
• Design of a Transformer Model
• Building the Transformer Model
• Causal Mask and Padding Mask
• Training and Evaluation

Traditional seq2seq models with recurrent neural networks have two main limitations:

• Sequential processing prevents parallelization.
• Limited ability to capture long-term dependencies, since hidden states are overwritten whenever an element is processed.

The Transformer architecture, introduced in the 2017 paper “Attention is All You Need”, overcomes these limitations.
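As a brief illustration of why attention avoids the sequential bottleneck, here is a minimal, single-head scaled dot-product self-attention sketch in PyTorch (a simplified illustration, not the post's full translation model): every position attends to every other position through one batched matrix multiplication, so the whole sequence is processed in parallel rather than token by token.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq, seq): all pairs at once
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                   # (batch, seq, d_k)

# Toy usage: a batch of 2 sequences, 5 tokens each, 16-dimensional embeddings.
x = torch.randn(2, 5, 16)
out = scaled_dot_product_attention(x, x, x)              # self-attention: q = k = v = x
print(out.shape)  # torch.Size([2, 5, 16])

# A causal (lower-triangular) mask restricts each position to attend only to earlier tokens,
# which is the role of the causal mask discussed later in the post.
causal = torch.tril(torch.ones(5, 5, dtype=torch.bool))
out_causal = scaled_dot_product_attention(x, x, x, mask=causal)
```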
