Codev lets enterprises avoid vibe coding hangovers with a team of agents that generate and document code

For many software developers using generative AI, vibe coding is a double-edged sword. The process delivers rapid prototypes but often leaves a trail of brittle, undocumented code that creates significant technical debt. A new open-source platform, Codev, addresses this by proposing a fundamental shift: treating the natural language conversation with an AI as part of the actual source code. Codev is based on SP(IDE)R, a framework designed to turn vibe-coding conversations into structured, versioned, and auditable assets that become part of the code repository.

What is Codev?

At its core, Codev is a methodology that treats natural language context as an integral part of the development lifecycle rather than a disposable artifact, as is the case with vanilla vibe coding. According to co-founder Waleed Kadous, the goal is to invert the typical engineering workflow. “A key principle of Codev is that documents like the specification are the actual code of the system,” he told VentureBeat. “It’s almost like natural language is compiled down into TypeScript by our agents.” This approach avoids the common pitfall where documentation is created after the fact, if at all.

Its flagship protocol, SP(IDE)R, provides a lightweight but formal structure for building software. The process begins with Specify, where a human and multiple AI agents collaborate to turn a high-level request into concrete acceptance criteria. Next, in the Plan stage, an AI proposes a phased implementation, which is again reviewed. For each phase, the AI enters an IDE loop: it Implements the code, Defends it against bugs and regressions with comprehensive tests, and Evaluates the result against the specification. The final step is Review, where the team documents lessons learned to update and improve the SP(IDE)R protocol itself for future projects.

The framework’s key differentiator is its use of multiple agents and explicit human review at different stages. Kadous notes that each agent brings unique strengths to the review process. “Gemini is extremely good at catching security issues,” he said, citing a critical cross-site scripting (XSS) flaw and another bug that “would have shared an OpenAI API key with the client, which could cost thousands of dollars.” Meanwhile, “GPT-5 is very good at understanding how to simplify a design.” This structured review, with a human providing final approval at each stage, prevents the kind of runaway automation that leads to flawed code.

The platform’s AI-native philosophy extends to its installation. There is no complex installer; instead, a user instructs their AI agent to apply the Codev GitHub repository to set up the project. The developers “dogfooded” their framework, using Codev to build Codev. “The key point here is that natural language is executable now, with the agent being the interpreter,” Kadous said. “This is great because it means it’s not a ‘blind’ integration of Codev; the agent gets to choose the best way to integrate it and can intelligently make decisions.”
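The article describes SP(IDE)R only at the level of its stages, and Codev's actual implementation is not shown here. Purely as an illustrative sketch, with every function name hypothetical, the workflow Kadous describes could be modeled in a few lines of Python:

# Illustrative sketch of the SP(IDE)R flow described above -- not Codev's real code.
# The agent and human_approves callables are hypothetical placeholders.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Phase:
    goal: str
    done: bool = False

@dataclass
class Project:
    spec: str = ""                                    # Specify: acceptance criteria agreed with a human
    plan: List[Phase] = field(default_factory=list)   # Plan: phased implementation
    lessons: List[str] = field(default_factory=list)  # Review: protocol improvements

def spider(request: str, agent: Callable[[str], str], human_approves: Callable[[str], bool]) -> Project:
    proj = Project()
    proj.spec = agent(f"Specify acceptance criteria for: {request}")             # Specify
    assert human_approves(proj.spec), "human gate after Specify"
    plan_text = agent(f"Plan implementation phases for this spec:\n{proj.spec}")  # Plan
    proj.plan = [Phase(goal=line) for line in plan_text.splitlines() if line.strip()]
    assert human_approves(plan_text), "human gate after Plan"
    for phase in proj.plan:                                                       # IDE loop per phase
        code = agent(f"Implement: {phase.goal}")                                  # Implement
        tests = agent(f"Write tests defending against regressions in:\n{code}")   # Defend
        verdict = agent(f"Evaluate against the spec:\n{proj.spec}\n{code}\n{tests}")  # Evaluate
        phase.done = human_approves(verdict)
    proj.lessons.append(agent("Review: what should change in the protocol next time?"))  # Review
    return proj

In the real system, each agent call would go to a different frontier model, each human_approves gate is an explicit review, and the spec and plan documents are committed to the repository alongside the generated code.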
Codev case study

To test the framework’s effectiveness, its creators ran a direct comparison between vanilla vibe coding and Codev. They gave Claude Opus 4.1 a request to build a modern web-based todo manager. The first attempt used a conversational, vibe-coding approach. The result was a plausible-looking demo. However, an automated analysis conducted by three independent AI agents found that it had implemented 0% of the required functionality, contained no tests, and lacked a database or API.

The second attempt used the same AI model and prompt but applied the SP(IDE)R protocol. This time, the AI produced a production-ready application with 32 source files, 100% of the specified functionality, five test suites, a SQLite database, and a complete RESTful API. Throughout this process, the human developers reported that they never directly edited a single line of source code.

While this was a single experiment, Kadous estimates the impact is substantial. “Subjectively, it feels like I’m about three times as productive with Codev as without,” he says. The quality also speaks for itself. “I used LLMs as a judge, and one of them described the output like what a well-oiled engineering team would produce. That was exactly what I was aiming for.”

While the process is powerful, it redefines the developer’s role from a hands-on coder to a system architect and reviewer. According to Kadous, the initial spec and plan stages can each take between 45 minutes and two hours of focused collaboration. This is in contrast to the impression given by many vibe-coding platforms, where a single prompt and a few minutes of processing yield a fully functional and scalable application. “All of the value I add is in the background knowledge I apply to the specs and plans,” he explains. He emphasizes that the framework is designed to augment, not replace, experienced talent. “The people who will do the best… are senior engineers and above because they know the pitfalls… It just takes the senior engineer you already have and makes them much more productive.”

A future of human and AI collaboration

Frameworks like Codev signal a shift where the primary creative act of software development moves from writing code to crafting precise, machine-readable specifications and plans. For enterprise teams, this means AI-generated code can become auditable, maintainable, and reliable. By capturing the entire development conversation in version control and enforcing it with CI, the process turns ephemeral chats into durable engineering assets. Codev proposes a future where the AI acts not as a chaotic assistant, but as a disciplined collaborator in a structured, human-led workflow.

However, Kadous acknowledges this shift creates new challenges for the workforce. “Senior engineers that reject AI outright will be outpaced by senior engineers who embrace it,” he predicts. He also expresses concern for junior developers who may not get the chance “to build their architectural chops,” a skill that becomes even more critical when guiding AI. This highlights a central challenge for the industry: ensuring that as AI elevates top performers, it also creates pathways to develop the next generation of talent.


A Coding Implementation to Build a Unified Tool Orchestration Framework from Documentation to Automated Pipelines

In this tutorial, we build a compact, efficient framework that demonstrates how to convert tool documentation into standardized, callable interfaces, register those tools in a central system, and execute them as part of an automated pipeline. As we move through each stage, we create a simple converter, design mock bioinformatics tools, organize them into a registry, and benchmark both individual and multi-step pipeline executions. Through this process, we explore how structured tool interfaces and automation can streamline and modularize data workflows.

import re, json, time, random
from dataclasses import dataclass
from typing import Callable, Dict, Any, List, Tuple

@dataclass
class ToolSpec:
    name: str
    description: str
    inputs: Dict[str, str]
    outputs: Dict[str, str]

def parse_doc_to_spec(name: str, doc: str) -> ToolSpec:
    desc = doc.strip().splitlines()[0].strip() if doc.strip() else name
    arg_block = "\n".join([l for l in doc.splitlines() if "--" in l or ":" in l])
    inputs = {}
    for line in arg_block.splitlines():
        m = re.findall(r"(--?\w[\w-]*|\b\w+\b)\s*[:=]?\s*(\w+)?", line)
        for key, typ in m:
            k = key.lstrip("-")
            if k and k not in inputs and k not in ["Returns", "Output", "Outputs"]:
                inputs[k] = (typ or "str")
    if not inputs:
        inputs = {"in": "str"}
    return ToolSpec(name=name, description=desc, inputs=inputs, outputs={"out": "json"})

We start by defining the structure for our tools and writing a simple parser that converts plain documentation into a standardized tool specification. This helps us automatically extract parameters and outputs from textual descriptions.

def tool_fastqc(seq_fasta: str, min_len: int = 30) -> Dict[str, Any]:
    seqs = [s for s in re.split(r">[^\n]*\n", seq_fasta)[1:]]
    lens = [len(re.sub(r"\s+", "", s)) for s in seqs]
    q30 = sum(l >= min_len for l in lens) / max(1, len(lens))
    gc = sum(c in "GCgc" for s in seqs for c in s) / max(1, sum(lens))
    return {"n_seqs": len(lens), "len_mean": (sum(lens) / max(1, len(lens))), "pct_q30": q30, "gc": gc}

def tool_bowtie2_like(ref: str, reads: str, mode: str = "end-to-end") -> Dict[str, Any]:
    def revcomp(s):
        t = str.maketrans("ACGTacgt", "TGCAtgca")
        return s.translate(t)[::-1]
    reads_list = [r for r in re.split(r">[^\n]*\n", reads)[1:]]
    ref_seq = "".join(ref.splitlines()[1:])
    hits = []
    for i, r in enumerate(reads_list):
        rseq = "".join(r.split())
        aligned = (rseq in ref_seq) or (revcomp(rseq) in ref_seq)
        hits.append({"read_id": i, "aligned": bool(aligned), "pos": ref_seq.find(rseq)})
    return {"n": len(hits), "aligned": sum(h["aligned"] for h in hits), "mode": mode, "hits": hits}

def tool_bcftools_like(ref: str, alt: str, win: int = 15) -> Dict[str, Any]:
    ref_seq = "".join(ref.splitlines()[1:])
    alt_seq = "".join(alt.splitlines()[1:])
    n = min(len(ref_seq), len(alt_seq))
    vars = []
    for i in range(n):
        if ref_seq[i] != alt_seq[i]:
            vars.append({"pos": i, "ref": ref_seq[i], "alt": alt_seq[i]})
    return {"n_sites": n, "n_var": len(vars), "variants": vars[:win]}

FASTQC_DOC = """FastQC-like quality control for FASTA
--seq_fasta: str
--min_len: int
Outputs: json"""

BOWTIE_DOC = """Bowtie2-like aligner
--ref: str
--reads: str
--mode: str
Outputs: json"""

BCF_DOC = """bcftools-like variant caller
--ref: str
--alt: str
--win: int
Outputs: json"""

We create mock implementations of bioinformatics tools such as FastQC, Bowtie2, and Bcftools. We define their expected inputs and outputs so they can be executed consistently through a unified interface.
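As a quick sanity check (not part of the original code listing), calling the converter on the FastQC-style doc string above should produce roughly the following spec:

# Quick sanity check of the doc-to-spec converter defined above.
spec = parse_doc_to_spec("fastqc", FASTQC_DOC)
print(spec.description)  # "FastQC-like quality control for FASTA"
print(spec.inputs)       # expected: {"seq_fasta": "str", "min_len": "int"}
print(spec.outputs)      # {"out": "json"}

With the converter and the mock tools in place, we next wire everything into a registry and pipeline.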
@dataclass
class MCPTool:
    spec: ToolSpec
    fn: Callable[..., Dict[str, Any]]

class MCPServer:
    def __init__(self):
        self.tools: Dict[str, MCPTool] = {}

    def register(self, name: str, doc: str, fn: Callable[..., Dict[str, Any]]):
        spec = parse_doc_to_spec(name, doc)
        self.tools[name] = MCPTool(spec, fn)

    def list_tools(self) -> List[Dict[str, Any]]:
        return [dict(name=t.spec.name, description=t.spec.description,
                     inputs=t.spec.inputs, outputs=t.spec.outputs)
                for t in self.tools.values()]

    def call_tool(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self.tools:
            raise KeyError(f"tool {name} not found")
        spec = self.tools[name].spec
        kwargs = {k: args.get(k) for k in spec.inputs.keys()}
        return self.tools[name].fn(**kwargs)

server = MCPServer()
server.register("fastqc", FASTQC_DOC, tool_fastqc)
server.register("bowtie2", BOWTIE_DOC, tool_bowtie2_like)
server.register("bcftools", BCF_DOC, tool_bcftools_like)

Task = Tuple[str, Dict[str, Any]]

PIPELINES = {
    "rnaseq_qc_align_call": [
        ("fastqc", {"seq_fasta": "{reads}", "min_len": 30}),
        ("bowtie2", {"ref": "{ref}", "reads": "{reads}", "mode": "end-to-end"}),
        ("bcftools", {"ref": "{ref}", "alt": "{alt}", "win": 15}),
    ]
}

def compile_pipeline(nl_request: str) -> List[Task]:
    key = "rnaseq_qc_align_call" if re.search(r"rna|qc|align|variant|call", nl_request, re.I) else "rnaseq_qc_align_call"
    return PIPELINES[key]

We build a lightweight server that registers tools, lists their specifications, and allows us to call them programmatically. We also define a basic pipeline structure that outlines the sequence in which tools should run.

def mk_fasta(header: str, seq: str) -> str:
    return f">{header}\n{seq}\n"

random.seed(0)
REF_SEQ = "".join(random.choice("ACGT") for _ in range(300))
REF = mk_fasta("ref", REF_SEQ)
READS = mk_fasta("r1", REF_SEQ[50:130]) + mk_fasta("r2", "ACGT" * 15) + mk_fasta("r3", REF_SEQ[180:240])
ALT = mk_fasta("alt", REF_SEQ[:150] + "T" + REF_SEQ[151:])

def run_pipeline(nl: str, ctx: Dict[str, str]) -> Dict[str, Any]:
    plan = compile_pipeline(nl)
    results = []
    t0 = time.time()
    for name, arg_tpl in plan:
        args = {k: (v.format(**ctx) if isinstance(v, str) else v) for k, v in arg_tpl.items()}
        out = server.call_tool(name, args)
        results.append({"tool": name, "args": args, "output": out})
    return {"request": nl, "elapsed_s": round(time.time() - t0, 4), "results": results}

We prepare small synthetic FASTA data for testing and implement a function that runs the entire pipeline. Here, we dynamically pass tool parameters and execute each step in the sequence.
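The registry is open-ended: any function with a matching doc string can be added the same way. As an illustrative extension (a hypothetical tool, not part of the original tutorial), a fourth tool could be registered and called like this:

# Hypothetical extension (not in the original tutorial): register one more tool.
def tool_seqstats(seq_fasta: str) -> Dict[str, Any]:
    seqs = [s for s in re.split(r">[^\n]*\n", seq_fasta)[1:]]
    lens = [len(re.sub(r"\s+", "", s)) for s in seqs]
    return {"n_seqs": len(lens), "min_len": min(lens, default=0), "max_len": max(lens, default=0)}

SEQSTATS_DOC = """Simple sequence length statistics
--seq_fasta: str
Outputs: json"""

server.register("seqstats", SEQSTATS_DOC, tool_seqstats)
print(server.call_tool("seqstats", {"seq_fasta": ">r1\nACGTACGT\n>r2\nACG\n"}))
# expected: {'n_seqs': 2, 'min_len': 3, 'max_len': 8}

Returning to the tutorial, we now benchmark the individual tools and the full pipeline.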
def bench_individual() -> List[Dict[str, Any]]:
    cases = [
        ("fastqc", {"seq_fasta": READS, "min_len": 25}),
        ("bowtie2", {"ref": REF, "reads": READS, "mode": "end-to-end"}),
        ("bcftools", {"ref": REF, "alt": ALT, "win": 10}),
    ]
    rows = []
    for name, args in cases:
        t0 = time.time()
        ok = True
        err = None
        out = None
        try:
            out = server.call_tool(name, args)
        except Exception as e:
            ok = False
            err = str(e)
        rows.append({"tool": name, "ok": ok, "ms": int((time.time() - t0) * 1000),
                     "out_keys": list(out.keys()) if ok else [], "err": err})
    return rows

def bench_pipeline() -> Dict[str, Any]:
    t0 = time.time()
    res = run_pipeline("Run RNA-seq QC, align, and variant call.", {"ref": REF, "reads": READS, "alt": ALT})
    ok = all(step["output"] for step in res["results"])
    return {"pipeline": "rnaseq_qc_align_call", "ok": ok, "ms": int((time.time() - t0) * 1000), "n_steps": len(res["results"])}

print("== TOOLS ==")
print(json.dumps(server.list_tools(), indent=2))
print("\n== INDIVIDUAL BENCH ==")
print(json.dumps(bench_individual(), indent=2))
print("\n== PIPELINE BENCH ==")
print(json.dumps(bench_pipeline(), indent=2))
print("\n== PIPELINE RUN ==")
print(json.dumps(run_pipeline("Run RNA-seq QC, align, and variant call.", {"ref": REF, "reads": READS, "alt": ALT}), indent=2))

We benchmark both individual tools and the full pipeline, capturing their outputs and performance metrics. Finally, we print the results to verify that each stage of the workflow runs successfully and integrates smoothly.

In conclusion, we develop a clear understanding of how lightweight tool conversion, registration, and orchestration can work together in a single environment. We observe how a unified interface allows us to connect multiple tools seamlessly, run them in sequence, and measure their performance. This hands-on exercise helps us appreciate how simple design principles, standardization, automation, and modularity can enhance the reproducibility and efficiency of computational workflows in any domain.

The post A Coding Implementation to Build a Unified Tool Orchestration Framework from Documentation to Automated Pipelines appeared first on MarkTechPost.


Developers can now add live Google Maps data to Gemini-powered AI app outputs

Google is adding a new feature for third-party developers building atop its Gemini AI models that rivals like OpenAI’s ChatGPT, Anthropic’s Claude, and the growing array of Chinese open source options are unlikely to get anytime soon: grounding with Google Maps.

This addition allows developers to connect Google’s Gemini AI models’ reasoning capabilities with live geospatial data from Google Maps, enabling applications to deliver detailed, location-relevant responses to user queries—such as business hours, reviews, or the atmosphere of a specific venue. By tapping into data from over 250 million places, developers can now build more intelligent and responsive location-aware experiences.

This is particularly useful for applications where proximity, real-time availability, or location-specific personalization matter—such as local search, delivery services, real estate, and travel planning. When the user’s location is known, developers can pass latitude and longitude into the request to enhance the response quality. By tightly integrating real-time and historical Maps data into the Gemini API, Google enables applications to generate grounded, location-specific responses with factual accuracy and contextual depth that are uniquely possible through its mapping infrastructure.

Merging AI and Geospatial Intelligence

The new feature is accessible in Google AI Studio, where developers can try a live demo powered by the Gemini Live API. Models that support grounding with Google Maps include:

- Gemini 2.5 Pro
- Gemini 2.5 Flash
- Gemini 2.5 Flash-Lite
- Gemini 2.0 Flash

In one demonstration, a user asked for Italian restaurant recommendations in Chicago. The assistant, leveraging Maps data, retrieved top-rated options and clarified a misspelled restaurant name before locating the correct venue with accurate business details.

Developers can also retrieve a context token to embed a Google Maps widget in their app’s user interface. This interactive component displays photos, reviews, and other familiar content typically found in Google Maps.

Integration is handled via the generateContent method in the Gemini API, where developers include googleMaps as a tool. They can also enable a Maps widget by setting a parameter in the request. The widget, rendered using a returned context token, can provide a visual layer alongside the AI-generated text.
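The article names only the generateContent method and the googleMaps tool; it does not reproduce a full request. The Python sketch below is therefore an assumption-laden illustration: the endpoint path and response shape follow Google's public API conventions, while the exact placement of the user's coordinates (shown here under toolConfig.retrievalConfig.latLng) should be verified against the official documentation.

# Hedged sketch: a raw REST call to the Gemini API with the googleMaps tool enabled.
# Field names other than "generateContent" and "googleMaps" are assumptions; verify
# against the official Gemini API docs before relying on them.
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]  # assumes an API key from Google AI Studio
URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent"

body = {
    "contents": [{"parts": [{"text": "Find a quiet Italian restaurant with outdoor seating near me."}]}],
    "tools": [{"googleMaps": {}}],  # enable grounding with Google Maps
    "toolConfig": {                 # assumed placement for the user's coordinates
        "retrievalConfig": {"latLng": {"latitude": 41.8781, "longitude": -87.6298}}
    },
}

resp = requests.post(URL, params={"key": API_KEY}, json=body, timeout=30)
resp.raise_for_status()
data = resp.json()
# Standard Gemini response shape: first candidate's text part holds the grounded answer.
print(data["candidates"][0]["content"]["parts"][0]["text"])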
Use Cases Across Industries

The Maps grounding tool is designed to support a wide range of practical use cases:

- Itinerary generation: Travel apps can create detailed daily plans with routing, timing, and venue information.
- Personalized local recommendations: Real estate platforms can highlight listings near kid-friendly amenities like schools and parks.
- Detailed location queries: Applications can provide specific information, such as whether a cafe offers outdoor seating, using community reviews and Maps metadata.

Developers are encouraged to enable the tool only when geographic context is relevant, to optimize both performance and cost. According to the developer documentation, pricing starts at $25 per 1,000 grounded prompts — a steep sum for those trafficking in numerous queries.

Combining Search and Maps for Enhanced Context

Developers can use Grounding with Google Maps alongside Grounding with Google Search in the same request. While the Maps tool contributes factual data—like addresses, hours, and ratings—the Search tool adds broader context from web content, such as news or event listings. For example, when asked about live music on Beale Street, the combined tools provide venue details from Maps and event times from Search. According to Google, internal testing shows that using both tools together leads to significantly improved response quality. Unfortunately, it doesn’t appear that Google Maps grounding includes live vehicular traffic data — at least not yet.

Customization and Developer Flexibility

The experience is built for customization. Developers can tweak system prompts, choose from different Gemini models, and configure voice settings to tailor interactions. The demo app in Google AI Studio is also remixable, enabling developers to test ideas, add features, and iterate on designs within a flexible development environment.

The API returns structured metadata—including source links, place IDs, and citation spans—that developers can use to build inline citations or verify the AI-generated outputs. This supports transparency and enhances trust in user-facing applications. Google also requires that Maps-based sources be attributed clearly and linked back to the source using their URI.

Implementation Considerations for AI Builders

For technical teams integrating this capability, Google recommends:

- Passing user location context when known, for better results.
- Displaying Google Maps source links directly beneath the relevant content.
- Only enabling the tool when the query clearly involves geographic context.
- Monitoring latency and disabling grounding when performance is critical.

Grounding with Google Maps is currently available globally, though it is prohibited in several territories (including China, Iran, North Korea, and Cuba) and not permitted for emergency response use cases.

Availability and Access

Grounding with Google Maps is now generally available through the Gemini API. With this release, Google continues to expand the capabilities of the Gemini API, empowering developers to build AI-driven applications that understand and respond to the world around them.


Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

arXiv:2510.13939v1 Announce Type: new Abstract: The use of copyrighted books for training AI models has led to numerous lawsuits from authors concerned about AI’s ability to generate derivative content. Yet it’s unclear whether these models can generate high-quality literary text while emulating authors’ styles. To answer this, we conducted a preregistered study comparing MFA-trained expert writers with three frontier AI models: ChatGPT, Claude & Gemini in writing up to 450-word excerpts emulating 50 award-winning authors’ diverse styles. In blind pairwise evaluations by 159 representative expert & lay readers, AI-generated text from in-context prompting was strongly disfavored by experts for both stylistic fidelity (OR=0.16, p


Interpreting the Latent Structure of Operator Precedence in Language Models

arXiv:2510.13908v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated impressive reasoning capabilities but continue to struggle with arithmetic tasks. Prior works largely focus on outputs or prompting strategies, leaving the open question of the internal structure through which models do arithmetic computation. In this work, we investigate whether LLMs encode operator precedence in their internal representations via the open-source instruction-tuned LLaMA 3.2-3B model. We constructed a dataset of arithmetic expressions with three operands and two operators, varying the order and placement of parentheses. Using this dataset, we trace whether intermediate results appear in the residual stream of the instruction-tuned LLaMA 3.2-3B model. We apply interpretability techniques such as logit lens, linear classification probes, and UMAP geometric visualization. Our results show that intermediate computations are present in the residual stream, particularly after MLP blocks. We also find that the model linearly encodes precedence in each operator’s embeddings post attention layer. We introduce partial embedding swap, a technique that modifies operator precedence by exchanging high-impact embedding dimensions between operators.
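The abstract mentions linear classification probes on residual-stream activations. The paper's code is not reproduced here, but the generic technique looks roughly like the sketch below, with randomly generated stand-in arrays in place of real LLaMA 3.2-3B hidden states and the hidden-size value treated as an assumption.

# Generic linear-probe sketch (illustration only; not the paper's code).
# Real usage would replace the random arrays with residual-stream activations
# collected from the model at a given layer, labeled by operator precedence.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examples, d_model = 1000, 3072            # 3072 assumed as the 3B model's hidden size
X = rng.normal(size=(n_examples, d_model))  # stand-in for layer activations
y = rng.integers(0, 2, size=n_examples)     # stand-in labels, e.g. "* evaluated before +" vs. not

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# ~0.5 on random data; accuracy well above chance on real activations would
# indicate linearly accessible precedence information at that layer.
print("probe accuracy:", probe.score(X_te, y_te))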


Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents

arXiv:2509.06917v2 Announce Type: replace-cross Abstract: We introduce Paper2Agent, an automated framework that converts research papers into AI agents. Paper2Agent transforms research output from passive artifacts into active systems that can accelerate downstream use, adoption, and discovery. Conventional research papers require readers to invest substantial effort to understand and adapt a paper’s code, data, and methods to their own work, creating barriers to dissemination and reuse. Paper2Agent addresses this challenge by automatically converting a paper into an AI agent that acts as a knowledgeable research assistant. It systematically analyzes the paper and the associated codebase using multiple agents to construct a Model Context Protocol (MCP) server, then iteratively generates and runs tests to refine and robustify the resulting MCP. These paper MCPs can then be flexibly connected to a chat agent (e.g. Claude Code) to carry out complex scientific queries through natural language while invoking tools and workflows from the original paper. We demonstrate Paper2Agent’s effectiveness in creating reliable and capable paper agents through in-depth case studies. Paper2Agent created an agent that leverages AlphaGenome to interpret genomic variants and agents based on ScanPy and TISSUE to carry out single-cell and spatial transcriptomics analyses. We validate that these paper agents can reproduce the original paper’s results and can correctly carry out novel user queries. Paper2Agent automatically created an AI co-scientist that identified a new splicing variant associated with ADHD risk. By turning static papers into dynamic, interactive AI agents, Paper2Agent introduces a new paradigm for knowledge dissemination and a foundation for the collaborative ecosystem of AI co-scientists.


Echoes of BERT: Do Modern Language Models Rediscover the Classical NLP Pipeline?

arXiv:2506.02132v4 Announce Type: replace Abstract: Large transformer-based language models dominate modern NLP, yet our understanding of how they encode linguistic information relies primarily on studies of early models like BERT and GPT-2. Building on classic BERTology work, we analyze 25 models spanning from classical architectures (BERT, DeBERTa, GPT-2) to modern large language models (Pythia, OLMo-2, Gemma-2, Qwen2.5, Llama-3.1), probing layer-by-layer representations across eight linguistic tasks in English. Consistent with earlier findings, we find that hierarchical organization persists in modern models: early layers capture syntax, middle layers handle semantics and entity-level information, and later layers encode discourse phenomena. We dive deeper, conducting an in-depth multilingual analysis of two specific linguistic properties – lexical identity and inflectional morphology – that help disentangle form from meaning. We find that lexical information concentrates linearly in early layers but becomes increasingly nonlinear deeper in the network, while inflectional information remains linearly accessible throughout all layers. Additional analyses of attention mechanisms, steering vectors, and pretraining checkpoints reveal where this information resides within layers, how it can be functionally manipulated, and how representations evolve during pretraining. Taken together, our findings suggest that, even with substantial advances in LLM technologies, transformer models learn to organize linguistic information in similar ways, regardless of model architecture, size, or training regime, indicating that these properties are important for next token prediction. Our code is available at https://github.com/ml5885/model_internal_sleuthing


Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning

arXiv:2510.13909v1 Announce Type: new Abstract: Inductive Knowledge Graph Reasoning (KGR) aims to discover facts in open-domain KGs containing unknown entities and relations, which poses a challenge for KGR models in comprehending uncertain KG components. Existing studies have proposed Knowledge Graph Foundation Models (KGFMs) that learn structural invariances across KGs to handle this uncertainty. Recently, Large Language Models (LLMs) have demonstrated strong capabilities for open-domain knowledge reasoning. As a result, the latest research has focused on LLM-based KGFMs that integrate LLM knowledge with KG context for inductive KGR. However, the intrinsic knowledge of LLMs may be overshadowed by sparse KG context, leading to LLM knowledge distortion, which can cause irreversible damage to model reasoning. Moreover, existing LLM-based KGR methods still struggle to fully constrain generative hallucinations in LLMs, severely limiting the credibility of reasoning results. To address these limitations, we propose a Knowledge Reasoning Language Model (KRLM) that achieves unified coordination between LLM knowledge and KG context throughout the KGR process. Specifically, we design a Knowledge Reasoning Language (KRL) instruction format and a KRL tokenizer to align LLM knowledge with KG representations. Then, we propose a KRL attention layer that coordinates intrinsic LLM knowledge with additional KG context through a dynamic knowledge memory mechanism. Finally, a structure-aware next-entity predictor is proposed, which strictly constrains the reasoning results within a trustworthy knowledge domain. Extensive experimental results on 25 real-world inductive KGR datasets demonstrate the significant superiority of the proposed KRLM (our source codes are available at https://anonymous.4open.science/r/KRLM-EA36) in both zero-shot reasoning and fine-tuning scenarios.


A Matter of Representation: Towards Graph-Based Abstract Code Generation

arXiv:2510.13163v1 Announce Type: new Abstract: Most large language models (LLMs) today excel at generating raw, sequential code with minimal abstractions and custom structures. However, there has been little work on graph-based abstract code generation, where significant logic is encapsulated in predefined nodes and execution flow is determined by edges. This is relevant for visual programming languages, and in cases where raw source code is inaccessible to users and LLM training sets. In this work, we propose and evaluate JSON representations for graphs to enable high accuracy graph-based abstract code generation. We evaluate these representations on ScratchTest, a mini-benchmark based on our custom Python re-implementation of Scratch, which tests the LLM in code graph space. Our findings demonstrate that LLMs can indeed perform the aforementioned generation task in a single pass without relying on specialized or complex pipelines, given the correct graph representations. We also show that different representations induce significantly different accuracies, highlighting the instrumental role of representations in this generation task. All in all, this work establishes the first steps towards representation learning for graph-based abstract code generation.
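The abstract does not include the proposed JSON schema, but the idea of serializing a graph-based abstract program as JSON can be illustrated with a hypothetical toy example (not the paper's actual representation), where opaque nodes encapsulate predefined logic and edges determine execution flow:

# Hypothetical illustration of a graph-based abstract program as JSON
# (not the paper's actual schema): nodes encapsulate predefined logic and
# edges carry execution flow, as in Scratch-like visual languages.
import json

program_graph = {
    "nodes": [
        {"id": "n1", "type": "when_flag_clicked", "params": {}},
        {"id": "n2", "type": "ask_and_wait", "params": {"prompt": "What's your name?"}},
        {"id": "n3", "type": "say", "params": {"text": "Hello!", "seconds": 2}},
    ],
    "edges": [
        {"from": "n1", "to": "n2"},  # execution flows from the event node to the question
        {"from": "n2", "to": "n3"},  # and then to the reply block
    ],
}

print(json.dumps(program_graph, indent=2))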
