{"id":39307,"date":"2025-09-20T06:39:34","date_gmt":"2025-09-20T06:39:34","guid":{"rendered":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/"},"modified":"2025-09-20T06:39:34","modified_gmt":"2025-09-20T06:39:34","slug":"mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/","title":{"rendered":"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators"},"content":{"rendered":"<div class=\"wp-block-yoast-seo-table-of-contents yoast-table-of-contents\">\n<h3><strong>Table of contents<\/strong><\/h3>\n<ul>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-hardware-generation-without-templates\" data-level=\"3\">Hardware Generation without Templates<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-input-ir-affine-relation-centric-semantics-deconstruct\" data-level=\"3\">Input IR: Affine, Relation-Centric Semantics (Deconstruct)<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-front-end-fu-graph-memory-co-design-architect\" data-level=\"3\">Front End: FU Graph + Memory Co-Design (Architect)<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-back-end-compile-amp-optimize-to-rtl-compile-amp-optimize\" data-level=\"3\">Back End: Compile &amp; Optimize to RTL (Compile &amp; Optimize)<\/a><\/li>\n<li><a 
href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-outcome\" data-level=\"3\">Outcome<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-importance-for-each-segment\" data-level=\"3\">Importance for each segment<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-how-the-compiler-for-ai-chips-works-step-by-step\" data-level=\"3\">How the \u201cCompiler for AI Chips\u201d Works\u2014Step-by-Step?<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-where-it-lands-in-the-ecosystem\" data-level=\"3\">Where It Lands in the Ecosystem?<\/a><\/li>\n<li><a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#h-summary\" data-level=\"3\">Summary<\/a><\/li>\n<\/ul>\n<\/div>\n<p>MIT researchers (Han Lab) introduced <strong>LEGO<\/strong>, a <em>compiler-like<\/em> framework that takes tensor workloads (e.g., GEMM, Conv2D, attention, MTTKRP) and automatically <strong>generates synthesizable RTL<\/strong> for spatial accelerators\u2014no handwritten templates. LEGO\u2019s front end expresses workloads and dataflows in a <strong>relation-centric affine representation<\/strong>, builds <strong>FU (functional unit) interconnects<\/strong> and <strong>on-chip memory<\/strong> layouts for reuse, and supports <strong>fusing multiple spatial dataflows<\/strong> in a single design. 
The back end lowers to a primitive-level graph and uses <strong>linear programming<\/strong> and graph transforms to insert pipeline registers, rewire broadcasts, extract reduction trees, and <strong>shrink area and power<\/strong>. Evaluated across foundation models and classic CNNs\/Transformers, LEGO\u2019s generated hardware shows <strong>3.2\u00d7 speedup<\/strong> and <strong>2.4\u00d7 energy efficiency<\/strong> over Gemmini under matched resources.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"545\" data-attachment-id=\"74666\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/screenshot-2025-09-18-at-5-12-24-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1.png\" data-orig-size=\"1480,788\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-09-18 at 5.12.24\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-300x160.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545.png\" alt=\"\" class=\"wp-image-74666\" \/><figcaption class=\"wp-element-caption\">https:\/\/hanlab.mit.edu\/projects\/lego<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Hardware Generation without 
Templates<\/strong><\/h3>\n<p>Existing flows either: (1) analyze dataflows without generating hardware, or (2) generate RTL from <strong>hand-tuned templates<\/strong> with fixed topologies. These approaches restrict the architecture space and struggle with modern workloads that need to <strong>switch dataflows dynamically<\/strong> across layers\/ops (e.g., conv vs. depthwise vs. attention). LEGO directly targets <strong>any dataflow and combinations<\/strong>, generating both architecture and RTL from a high-level description rather than configuring a few numeric parameters in a template.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"422\" data-attachment-id=\"74668\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/screenshot-2025-09-18-at-5-12-47-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.47-PM-1.png\" data-orig-size=\"1330,548\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-09-18 at 5.12.47\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.47-PM-1-300x124.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.47-PM-1-1024x422.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.47-PM-1-1024x422.png\" alt=\"\" class=\"wp-image-74668\" \/><figcaption 
class=\"wp-element-caption\">https:\/\/hanlab.mit.edu\/projects\/lego<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Input IR: Affine, Relation-Centric Semantics (Deconstruct)<\/strong><\/h3>\n<p>LEGO models tensor programs as loop nests with three index classes: <strong>temporal<\/strong> (for-loops), <strong>spatial<\/strong> (par-for FUs), and <strong>computation<\/strong> (pre-tiling iteration domain). Two affine relations drive the compiler:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Data mapping<\/strong> f_{I\u2192D}: maps computation indices to tensor indices.<\/li>\n<li><strong>Dataflow mapping<\/strong> f_{TS\u2192I}: maps temporal\/spatial indices to computation indices.<\/li>\n<\/ul>\n<p>This <strong>affine-only<\/strong> representation eliminates modulo\/division in the core analysis, making <strong>reuse detection<\/strong> and <strong>address generation<\/strong> a linear-algebra problem. LEGO also <strong>decouples control flow<\/strong> from dataflow (a vector <strong>c<\/strong> encodes control signal propagation\/delay), enabling <strong>shared control<\/strong> across FUs and substantially reducing control logic overhead.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Front End: FU Graph + Memory Co-Design (Architect)<\/strong><\/h3>\n<p>The main objective is to maximize reuse and on-chip bandwidth while minimizing interconnect\/mux overhead.<\/p>\n<ol class=\"wp-block-list\">\n<li><strong>Interconnection synthesis.<\/strong> LEGO formulates reuse as solving linear systems over the affine relations to discover <strong>direct<\/strong> and <strong>delay<\/strong> (FIFO) connections between FUs. It then computes <strong>minimum-spanning arborescences<\/strong> (Chu-Liu\/Edmonds) to keep only necessary edges (cost = FIFO depth). 
A <strong>BFS-based heuristic<\/strong> rewrites <em>direct<\/em> interconnects when <strong>multiple dataflows<\/strong> must co-exist, prioritizing chain reuse and nodes already fed by delay connections to cut muxes and data nodes.<\/li>\n<li><strong>Banked memory synthesis.<\/strong> Given the set of FUs that must read\/write a tensor in the same cycle, LEGO computes <strong>bank counts per tensor dimension<\/strong> from the maximum index deltas (optionally dividing by GCD to reduce banks). It then instantiates <strong>data-distribution switches<\/strong> to route between banks and FUs, leaving FU-to-FU reuse to the interconnect.<\/li>\n<li><strong>Dataflow fusion.<\/strong> Interconnects for different spatial dataflows are combined into a single FU-level <strong>Architecture Description Graph (ADG)<\/strong>; careful planning avoids na\u00efve mux-heavy merges and yields up to <strong>~20% energy gains<\/strong> compared to na\u00efve fusion.<\/li>\n<\/ol>\n<h3 class=\"wp-block-heading\"><strong>Back End: Compile &amp; Optimize to RTL (Compile &amp; Optimize)<\/strong><\/h3>\n<p>The ADG is lowered to a <strong>Detailed Architecture Graph (DAG)<\/strong> of primitives (FIFOs, muxes, adders, address generators). 
LEGO applies several <strong>LP\/graph<\/strong> passes:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Delay matching via LP.<\/strong> A linear program chooses output delays D_v to <strong>minimize inserted pipeline registers<\/strong>, \u2211(D_v\u2212D_u\u2212L_v)\u22c5bitwidth across edges, meeting timing alignment with minimal storage.<\/li>\n<li><strong>Broadcast pin rewiring.<\/strong> A two-stage optimization (virtual cost shaping + MST-based rewiring among destinations) converts expensive broadcasts into <strong>forward chains<\/strong>, enabling register sharing and lower latency; a final LP re-balances delays.<\/li>\n<li><strong>Reduction tree extraction + pin reuse.<\/strong> Sequential adder chains become <strong>balanced trees<\/strong>; a <strong>0-1 ILP<\/strong> remaps reducer inputs across dataflows so fewer physical pins are required (mux instead of add). This reduces both <strong>logic depth<\/strong> and <strong>register count<\/strong>.<\/li>\n<\/ul>\n<p>These passes focus on the <strong>datapath<\/strong>, which dominates resources (e.g., FU-array registers \u2248 <strong>40% area<\/strong>, <strong>60% power<\/strong>), and produce <strong>~35% area savings<\/strong> versus na\u00efve generation.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Outcome<\/strong><\/h3>\n<p><strong>Setup.<\/strong> LEGO is implemented in C++ with HiGHS as the LP solver and emits SpinalHDL\u2192Verilog. Evaluation covers tensor kernels and end-to-end models (AlexNet, MobileNetV2, ResNet-50, EfficientNetV2, BERT, GPT-2, CoAtNet, DDPM, Stable Diffusion, LLaMA-7B). A single <strong>LEGO-MNICOC<\/strong> accelerator instance is used across models; a mapper picks per-layer tiling\/dataflow. 
Gemmini is the main baseline under <strong>matched resources<\/strong> (256 MACs, 256 KB on-chip buffer, 128-bit bus @ 16 GB\/s).<\/p>\n<p><strong>End-to-end speed\/efficiency.<\/strong> LEGO achieves <strong>3.2\u00d7<\/strong> speedup and <strong>2.4\u00d7<\/strong> energy efficiency on average vs. Gemmini. Gains stem from: (i) a <strong>fast, accurate performance model<\/strong> guiding mapping; (ii) <strong>dynamic spatial dataflow switching<\/strong> enabled by generated interconnects (e.g., depthwise conv layers choose OH\u2013OW\u2013IC\u2013OC). Both designs are bandwidth-bound on GPT-2.<\/p>\n<p><strong>Resource breakdown.<\/strong> Example SoC-style configuration shows <strong>FU array<\/strong> and <strong>NoC<\/strong> dominate area\/power, with PPUs contributing ~<strong>2\u20135%<\/strong>. This supports the decision to aggressively optimize datapaths and control reuse.<\/p>\n<p><strong>Generative models.<\/strong> On a larger 1024-FU configuration, LEGO sustains <strong>&gt;80% utilization<\/strong> for DDPM\/Stable Diffusion; LLaMA-7B remains bandwidth-limited (expected for low operational intensity).<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"679\" data-attachment-id=\"74670\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/screenshot-2025-09-18-at-5-13-15-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.13.15-PM-1.png\" data-orig-size=\"1290,856\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2025-09-18 at 5.13.15\u202fPM\" 
data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.13.15-PM-1-300x199.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.13.15-PM-1-1024x679.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.13.15-PM-1-1024x679.png\" alt=\"\" class=\"wp-image-74670\" \/><figcaption class=\"wp-element-caption\">https:\/\/hanlab.mit.edu\/projects\/lego<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Importance for each segment<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>For researchers:<\/strong> LEGO provides a <strong>mathematically grounded<\/strong> path from loop-nest specifications to <strong>spatial hardware<\/strong> with <strong>provable LP-based<\/strong> optimizations. It abstracts away low-level RTL and exposes meaningful levers (tiling, spatialization, reuse patterns) for systematic exploration.<\/li>\n<li><strong>For practitioners:<\/strong> It is effectively <strong>hardware-as-code<\/strong>. You can target <strong>arbitrary dataflows<\/strong> and <strong>fuse<\/strong> them in one accelerator, letting a compiler derive interconnects, buffers, and controllers while shrinking mux\/FIFO overheads. This improves <strong>energy<\/strong> and supports <strong>multi-op pipelines<\/strong> without manual template redesign.<\/li>\n<li><strong>For product leaders:<\/strong> By <strong>lowering the barrier to custom silicon<\/strong>, LEGO enables <strong>task-tuned, power-efficient<\/strong> edge accelerators (wearables, IoT) that keep pace with fast-moving AI stacks\u2014<em>the silicon adapts to the model, not the other way around<\/em>. 
End-to-end results against a state-of-the-art generator (Gemmini) quantify the upside.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>How the \u201cCompiler for AI Chips\u201d Works\u2014Step-by-Step<\/strong>?<\/h3>\n<ol class=\"wp-block-list\">\n<li><strong>Deconstruct (Affine IR).<\/strong> Write the tensor op as loop nests; supply affine <strong>f_{I\u2192D}<\/strong> (data mapping), <strong>f_{TS\u2192I}<\/strong> (dataflow), and control flow vector <strong>c<\/strong>. This specifies <em>what<\/em> to compute and <em>how<\/em> it is spatialized, without templates.<\/li>\n<li><strong>Architect (Graph Synthesis).<\/strong> Solve reuse equations \u2192 <strong>FU interconnects<\/strong> (direct\/delay) \u2192 <strong>MST\/heuristics<\/strong> for minimal edges and fused dataflows; compute <strong>banked memory<\/strong> and <strong>distribution switches<\/strong> to satisfy concurrent accesses without conflicts.<\/li>\n<li><strong>Compile &amp; Optimize (LP + Graph Transforms).<\/strong> Lower to a primitive DAG; run <strong>delay-matching LP<\/strong>, <strong>broadcast rewiring (MST)<\/strong>, <strong>reduction-tree extraction<\/strong>, and <strong>pin-reuse ILP<\/strong>; perform <strong>bit-width inference<\/strong> and optional <strong>power gating<\/strong>. These passes jointly deliver <strong>~35% area and ~28% energy<\/strong> savings vs. na\u00efve codegen.<\/li>\n<\/ol>\n<h3 class=\"wp-block-heading\"><strong>Where It Lands in the Ecosystem?<\/strong><\/h3>\n<p>Compared with analysis tools (Timeloop\/MAESTRO) and template-bound generators (Gemmini, DNA, MAGNET), LEGO is <strong>template-free<\/strong>, supports <strong>any dataflow<\/strong> and <strong>their combinations<\/strong>, and emits <strong>synthesizable RTL<\/strong>. 
Results show <strong>comparable or better<\/strong> area\/power versus expert handwritten accelerators under similar dataflows and technologies, while offering <strong>one-architecture-for-many-models<\/strong> deployment.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Summary<\/strong><\/h3>\n<p>LEGO operationalizes <em>hardware generation as compilation<\/em> for tensor programs: an affine front end for <strong>reuse-aware interconnect\/memory synthesis<\/strong> and an LP-powered back end for <strong>datapath minimization<\/strong>. The framework\u2019s measured <strong>3.2\u00d7 performance<\/strong> and <strong>2.4\u00d7 energy<\/strong> gains over a leading open generator, plus <strong>~35% area<\/strong> reductions from back-end optimizations, position it as a practical path to <strong>application-specific AI accelerators<\/strong> at the edge and beyond.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the<a href=\"https:\/\/arxiv.org\/abs\/2509.12053\" target=\"_blank\" rel=\"noreferrer noopener\">\u00a0<strong>Paper\u00a0<\/strong><\/a><strong>and\u00a0<a href=\"https:\/\/hanlab.mit.edu\/projects\/lego\" target=\"_blank\" rel=\"noreferrer noopener\">Project Page<\/a><em>.<\/em><\/strong>\u00a0Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" 
target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>.<\/p>\n<p><!-- CONTENT END 8 --><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/09\/18\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/\">MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Table of contents Hardware Generation without Templates Input IR: Affine, Relation-Centric Semantics (Deconstruct) Front End: FU Graph + Memory Co-Design (Architect) Back End: Compile &amp; Optimize to RTL (Compile &amp; Optimize) Outcome Importance for each segment How the \u201cCompiler for AI Chips\u201d Works\u2014Step-by-Step ? Where It Lands in the Ecosystem? Summary MIT researchers (Han Lab) introduced LEGO, a compiler-like framework that takes tensor workloads (e.g., GEMM, Conv2D, attention, MTTKRP) and automatically generates synthesizable RTL for spatial accelerators\u2014no handwritten templates. LEGO\u2019s front end expresses workloads and dataflows in a relation-centric affine representation, builds FU (functional unit) interconnects and on-chip memory layouts for reuse, and supports fusing multiple spatial dataflows in a single design. The back end lowers to a primitive-level graph and uses linear programming and graph transforms to insert pipeline registers, rewire broadcasts, extract reduction trees, and shrink area and power. Evaluated across foundation models and classic CNNs\/Transformers, LEGO\u2019s generated hardware shows 3.2\u00d7 speedup and 2.4\u00d7 energy efficiency over Gemmini under matched resources. https:\/\/hanlab.mit.edu\/projects\/lego Hardware Generation without Templates Existing flows either: (1) analyze dataflows without generating hardware, or (2) generate RTL from hand-tuned templates with fixed topologies. 
These approaches restrict the architecture space and struggle with modern workloads that need to switch dataflows dynamically across layers\/ops (e.g., conv vs. depthwise vs. attention). LEGO directly targets any dataflow and combinations, generating both architecture and RTL from a high-level description rather than configuring a few numeric parameters in a template. https:\/\/hanlab.mit.edu\/projects\/lego Input IR: Affine, Relation-Centric Semantics (Deconstruct) LEGO models tensor programs as loop nests with three index classes: temporal (for-loops), spatial (par-for FUs), and computation (pre-tiling iteration domain). Two affine relations drive the compiler: Data mapping f_{I\u2192D}: maps computation indices to tensor indices. Dataflow mapping f_{TS\u2192I}: maps temporal\/spatial indices to computation indices. This affine-only representation eliminates modulo\/division in the core analysis, making reuse detection and address generation a linear-algebra problem. LEGO also decouples control flow from dataflow (a vector c encodes control signal propagation\/delay), enabling shared control across FUs and substantially reducing control logic overhead. Front End: FU Graph + Memory Co-Design (Architect) The main objective is to maximize reuse and on-chip bandwidth while minimizing interconnect\/mux overhead. Interconnection synthesis. LEGO formulates reuse as solving linear systems over the affine relations to discover direct and delay (FIFO) connections between FUs. It then computes minimum-spanning arborescences (Chu-Liu\/Edmonds) to keep only necessary edges (cost = FIFO depth). A BFS-based heuristic rewrites direct interconnects when multiple dataflows must co-exist, prioritizing chain reuse and nodes already fed by delay connections to cut muxes and data nodes. Banked memory synthesis. 
Given the set of FUs that must read\/write a tensor in the same cycle, LEGO computes bank counts per tensor dimension from the maximum index deltas (optionally dividing by GCD to reduce banks). It then instantiates data-distribution switches to route between banks and FUs, leaving FU-to-FU reuse to the interconnect. Dataflow fusion. Interconnects for different spatial dataflows are combined into a single FU-level Architecture Description Graph (ADG); careful planning avoids na\u00efve mux-heavy merges and yields up to ~20% energy gains compared to na\u00efve fusion. Back End: Compile &amp; Optimize to RTL (Compile &amp; Optimize) The ADG is lowered to a Detailed Architecture Graph (DAG) of primitives (FIFOs, muxes, adders, address generators). LEGO applies several LP\/graph passes: Delay matching via LP. A linear program chooses output delays D_v to minimize inserted pipeline registers, \u2211(D_v\u2212D_u\u2212L_v)\u22c5bitwidth across edges, meeting timing alignment with minimal storage. Broadcast pin rewiring. A two-stage optimization (virtual cost shaping + MST-based rewiring among destinations) converts expensive broadcasts into forward chains, enabling register sharing and lower latency; a final LP re-balances delays. Reduction tree extraction + pin reuse. Sequential adder chains become balanced trees; a 0-1 ILP remaps reducer inputs across dataflows so fewer physical pins are required (mux instead of add). This reduces both logic depth and register count. These passes focus on the datapath, which dominates resources (e.g., FU-array registers \u2248 40% area, 60% power), and produce ~35% area savings versus na\u00efve generation. Outcome Setup. LEGO is implemented in C++ with HiGHS as the LP solver and emits SpinalHDL\u2192Verilog. Evaluation covers tensor kernels and end-to-end models (AlexNet, MobileNetV2, ResNet-50, EfficientNetV2, BERT, GPT-2, CoAtNet, DDPM, Stable Diffusion, LLaMA-7B). 
A single LEGO-MNICOC accelerator instance is used across models; a mapper picks per-layer tiling\/dataflow. Gemmini is the main baseline under matched resources (256 MACs, 256 KB on-chip buffer, 128-bit bus @ 16 GB\/s). End-to-end speed\/efficiency. LEGO achieves 3.2\u00d7 speedup and 2.4\u00d7 energy efficiency on average vs. Gemmini. Gains stem from: (i) a fast, accurate performance model guiding mapping; (ii) dynamic spatial dataflow switching enabled by generated interconnects (e.g., depthwise conv layers choose OH\u2013OW\u2013IC\u2013OC). Both designs are bandwidth-bound on GPT-2. Resource breakdown. Example SoC-style configuration shows FU array and NoC dominate area\/power, with PPUs contributing ~2\u20135%. This supports the decision to aggressively optimize datapaths and control reuse. Generative models. On a larger 1024-FU configuration, LEGO sustains &gt;80% utilization for DDPM\/Stable Diffusion; LLaMA-7B remains bandwidth-limited (expected for low operational intensity). https:\/\/hanlab.mit.edu\/projects\/lego Importance for each segment For researchers: LEGO provides a mathematically grounded path from loop-nest specifications to spatial hardware with provable LP-based optimizations. It abstracts away low-level RTL and exposes meaningful levers (tiling, spatialization, reuse patterns) for systematic exploration. For practitioners: It is effectively hardware-as-code. You can target arbitrary dataflows and fuse them in one accelerator, letting a compiler derive interconnects, buffers, and controllers while shrinking mux\/FIFO overheads. This improves energy and supports multi-op pipelines without manual template redesign. For product leaders: By lowering the barrier to custom silicon, LEGO enables task-tuned, power-efficient edge accelerators (wearables, IoT) that keep pace with fast-moving AI stacks\u2014the silicon adapts to the model, not the other way around. End-to-end results against a state-of-the-art generator (Gemmini) quantify the upside. 
How the \u201cCompiler for AI Chips\u201d Works\u2014Step-by-Step? Deconstruct (Affine IR). Write the tensor op as loop nests; supply affine f_{I\u2192D} (data mapping), f_{TS\u2192I} (dataflow), and control flow vector c. This specifies what to compute and how it is spatialized, without templates. Architect (Graph Synthesis). Solve reuse equations \u2192 FU interconnects (direct\/delay) \u2192 MST\/heuristics for minimal<\/p>","protected":false},"author":2,"featured_media":39308,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-39307","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>MIT\u2019s LEGO: A Compiler for AI Chips that 
Auto-Generates Fast, Efficient Spatial Accelerators - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-09-20T06:39:34+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators\",\"datePublished\":\"2025-09-20T06:39:34+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/\"},\"wordCount\":1293,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/\",\"url\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/\",\"name\":\"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates 
Fast, Efficient Spatial Accelerators - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp\",\"datePublished\":\"2025-09-20T06:39:34+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp\",\"width\":1024,\"height\":545},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial 
Accelerators\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/","og_locale":"th_TH","og_type":"article","og_title":"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2025-09-20T06:39:34+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. 
reading time":"6 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators","datePublished":"2025-09-20T06:39:34+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/"},"wordCount":1293,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"image":{"@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp","articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/","url":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/","name":"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"primaryImageOfPage":{"@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#primaryimage"},"image":{"@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#primaryimage"},"thumbnailUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp","datePublished":"2025-09-20T06:39:34+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/"]}]},{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#primaryimage","url":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp","width":1024,"height":545},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/mits-lego-a-compiler-for-ai-chips-that-auto-generates-fast-efficient-spatial-accelerators\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"MIT\u2019s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial 
Accelerators"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin 
NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":{"full":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp",1024,545,false],"landscape":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp",1024,545,false],"portraits":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp",1024,545,false],"thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD-150x150.webp",150,150,true],"medium":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD-300x160.webp",300,160,true],"large":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp",1024,545,false],"1536x1536":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp",1024,545,false],"2048x2048":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD.webp",1024,545,false],"trp-custom-language-flag":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD-18x10.webp",18,10,true],"woocommerce_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD-300x300.webp",300,300,true],"woocommerce_single":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD-600x319.webp",600,319,true],"woocommerce_gallery_thumbnail":["https:\/\/youzum.net\/wp-content\/uploads\/2025\/09\/Screenshot-2025-09-18-at-5.12.24-PM-1-1024x545-zetmxD-100x100.webp",100,100,true]},"rttpg_author":{"display_name":"admin 
NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"Table of contents Hardware Generation without Templates Input IR: Affine, Relation-Centric Semantics (Deconstruct) Front End: FU Graph + Memory Co-Design (Architect) Back End: Compile &amp; Optimize to RTL (Compile &amp; Optimize) Outcome Importance for each segment How the \u201cCompiler for AI Chips\u201d Works\u2014Step-by-Step ? Where It Lands in the Ecosystem? Summary MIT researchers (Han Lab)&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/39307","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=39307"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/39307\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media\/39308"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=39307"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=39307"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=39307"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}