{"id":53609,"date":"2025-11-26T08:16:43","date_gmt":"2025-11-26T08:16:43","guid":{"rendered":"https:\/\/youzum.net\/how-to-implement-functional-components-of-transformer-and-mini-gpt-model-from-scratch-using-tinygrad-to-understand-deep-learning-internals\/"},"modified":"2025-11-26T08:16:43","modified_gmt":"2025-11-26T08:16:43","slug":"how-to-implement-functional-components-of-transformer-and-mini-gpt-model-from-scratch-using-tinygrad-to-understand-deep-learning-internals","status":"publish","type":"post","link":"https:\/\/youzum.net\/ja\/how-to-implement-functional-components-of-transformer-and-mini-gpt-model-from-scratch-using-tinygrad-to-understand-deep-learning-internals\/","title":{"rendered":"How to Implement Functional Components of Transformer and Mini-GPT Model from Scratch Using Tinygrad to Understand Deep Learning Internals"},"content":{"rendered":"<p>In this tutorial, we explore how to build neural networks from scratch using <a href=\"https:\/\/github.com\/tinygrad\/tinygrad\"><strong>Tinygrad<\/strong><\/a> while remaining fully hands-on with tensors, autograd, attention mechanisms, and transformer architectures. We progressively build every component ourselves, from basic tensor operations to multi-head attention, transformer blocks, and, finally, a working mini-GPT model. Through each stage, we observe how Tinygrad\u2019s simplicity helps us understand what happens under the hood when models train, optimize, and fuse kernels for performance. 
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/edit\/main\/README.md\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">import subprocess, sys, os\nprint(\"Installing dependencies...\")\nsubprocess.check_call([\"apt-get\", \"install\", \"-qq\", \"clang\"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)\nsubprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", \"git+https:\/\/github.com\/tinygrad\/tinygrad.git\"])\n\n\nimport numpy as np\nfrom tinygrad import Tensor, nn, Device\nfrom tinygrad.nn import optim\nimport time\n\n\nprint(f\"\ud83d\ude80 Using device: {Device.DEFAULT}\")\nprint(\"=\" * 60)\n\n\nprint(\"\\n\ud83d\udcda PART 1: Tensor Operations &amp; Autograd\")\nprint(\"-\" * 60)\n\n\nx = Tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)\ny = Tensor([[2.0, 0.0], [1.0, 2.0]], requires_grad=True)\n\n\nz = (x @ y).sum() + (x ** 2).mean()\nz.backward()\n\n\nprint(f\"x:\\n{x.numpy()}\")\nprint(f\"y:\\n{y.numpy()}\")\nprint(f\"z (scalar): {z.numpy()}\")\nprint(f\"\u2202z\/\u2202x:\\n{x.grad.numpy()}\")\nprint(f\"\u2202z\/\u2202y:\\n{y.grad.numpy()}\")<\/code><\/pre>\n<\/div>\n<\/div>
<p>We set up Tinygrad in our Colab environment and immediately begin experimenting with tensors and automatic differentiation. We create a small computation graph and observe how gradients flow through matrix operations. As we print the outputs, we gain an intuitive understanding of how Tinygrad handles backpropagation under the hood. Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/edit\/main\/README.md\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>
<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">print(\"\\n\\n\ud83e\udde0 PART 2: Building Custom Layers\")\nprint(\"-\" * 60)\n\n\nclass MultiHeadAttention:\n   def __init__(self, dim, num_heads):\n       self.num_heads = num_heads\n       self.dim = dim\n       self.head_dim = dim \/\/ num_heads\n       self.qkv = Tensor.glorot_uniform(dim, 3 * dim)\n       self.out = Tensor.glorot_uniform(dim, dim)\n  \n   def __call__(self, x):\n       B, T, C = x.shape[0], x.shape[1], x.shape[2]\n       qkv = x.reshape(B * T, C).dot(self.qkv).reshape(B, T, 3, self.num_heads, self.head_dim)\n       q, k, v = [qkv[:, :, i].transpose(1, 2) for i in range(3)]  # heads before sequence: (B, num_heads, T, head_dim)\n       scale = (self.head_dim ** -0.5)\n       attn = (q @ k.transpose(-2, -1)) * scale  # (B, num_heads, T, T)\n       attn = attn.softmax(axis=-1)\n       out = (attn @ v).transpose(1, 2).reshape(B, T, C)\n       return out.reshape(B * T, C).dot(self.out).reshape(B, T, C)\n\n\nclass TransformerBlock:\n   def __init__(self, dim, num_heads):\n       self.attn = MultiHeadAttention(dim, num_heads)\n       self.ff1 = Tensor.glorot_uniform(dim, 4 * dim)\n       self.ff2 = Tensor.glorot_uniform(4 * dim, dim)\n       self.ln1_w = Tensor.ones(dim)\n       self.ln2_w = Tensor.ones(dim)\n  \n   def __call__(self, x):\n       x = x + self.attn(self._layernorm(x, self.ln1_w))\n       ff = x.reshape(-1, x.shape[-1])\n       ff = ff.dot(self.ff1).gelu().dot(self.ff2)\n       x = x + ff.reshape(x.shape)\n       return self._layernorm(x, self.ln2_w)\n  \n   def _layernorm(self, x, w):\n       mean = x.mean(axis=-1, keepdim=True)\n       var = ((x - mean) ** 2).mean(axis=-1, keepdim=True)\n       return w * (x - mean) \/ (var + 1e-5).sqrt()<\/code><\/pre>\n<\/div>\n<\/div>
<p>We design our own multi-head attention module and a transformer block entirely from scratch. We implement the projections, attention scores, softmax, feedforward layers, and layer normalization manually, taking care to move the head axis in front of the sequence axis so that attention weights are computed over time steps rather than across heads. As we run this code, we see how each component contributes to a transformer layer\u2019s overall behavior. 
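To make the attention shapes concrete, here is a standalone NumPy sketch of the same scaled dot-product computation (our own illustration with arbitrary hypothetical sizes, not part of the tutorial's tinygrad code), using the head-first layout in which the attention weights have shape (B, num_heads, T, T):

```python
import numpy as np

# Hypothetical sizes for illustration only
B, T, C, H = 2, 5, 16, 4            # batch, sequence length, model dim, heads
hd = C // H                         # head_dim = 4

rng = np.random.default_rng(0)
x = rng.normal(size=(B, T, C))
qkv_w = rng.normal(size=(C, 3 * C))

# project once, split into q, k, v, and move heads ahead of the sequence axis
qkv = (x.reshape(B * T, C) @ qkv_w).reshape(B, T, 3, H, hd)
q, k, v = (np.moveaxis(qkv[:, :, i], 2, 1) for i in range(3))  # each (B, H, T, hd)

scores = (q @ np.swapaxes(k, -2, -1)) * hd ** -0.5             # (B, H, T, T)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)                      # softmax over keys
out = np.swapaxes(weights @ v, 1, 2).reshape(B, T, C)          # back to (B, T, C)
print(out.shape)                                               # (2, 5, 16)
```

Each row of weights sums to 1 over the key positions, and the final swap-and-reshape recovers the (batch, time, channels) layout that the residual connection expects.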
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/edit\/main\/README.md\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">print(\"\\n\ud83e\udd16 PART 3: Mini-GPT Architecture\")\nprint(\"-\" * 60)\n\n\nclass MiniGPT:\n   def __init__(self, vocab_size=256, dim=128, num_heads=4, num_layers=2, max_len=32):\n       self.vocab_size = vocab_size\n       self.dim = dim\n       self.tok_emb = Tensor.glorot_uniform(vocab_size, dim)\n       self.pos_emb = Tensor.glorot_uniform(max_len, dim)\n       self.blocks = [TransformerBlock(dim, num_heads) for _ in range(num_layers)]\n       self.ln_f = Tensor.ones(dim)\n       self.head = Tensor.glorot_uniform(dim, vocab_size)\n  \n   def __call__(self, idx):\n       B, T = idx.shape[0], idx.shape[1]\n       tok_emb = self.tok_emb[idx.flatten()].reshape(B, T, self.dim)\n       pos_emb = self.pos_emb[:T].reshape(1, T, self.dim)\n       x = tok_emb + pos_emb\n       for block in self.blocks:\n           x = block(x)\n       mean = x.mean(axis=-1, keepdim=True)\n       var = ((x - mean) ** 2).mean(axis=-1, keepdim=True)\n       x = self.ln_f * (x - mean) \/ (var + 1e-5).sqrt()\n       return x.reshape(B * T, self.dim).dot(self.head).reshape(B, T, self.vocab_size)\n  \n   def get_params(self):\n       params = [self.tok_emb, self.pos_emb, self.ln_f, self.head]\n       for block in self.blocks:\n           params.extend([block.attn.qkv, block.attn.out, block.ff1, block.ff2, block.ln1_w, block.ln2_w])\n       return params\n\n\nmodel = MiniGPT(vocab_size=256, dim=64, num_heads=4, num_layers=2, max_len=16)\nparams = model.get_params()\ntotal_params = sum(p.numel() for p in params)\nprint(f\"Model initialized with {total_params:,} parameters\")<\/code><\/pre>\n<\/div>\n<\/div>
<p>We assemble the full MiniGPT architecture using the components built earlier. We embed tokens, add positional information, stack multiple transformer blocks, and project the final outputs back to vocabulary logits. As we initialize the model, we begin to appreciate how a compact transformer can be built with surprisingly few moving parts. Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/edit\/main\/README.md\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>
<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">print(\"\\n\\n\ud83c\udfcb PART 4: Training Loop\")\nprint(\"-\" * 60)\n\n\nfrom tinygrad import dtypes\n\n\ndef gen_data(batch_size, seq_len):\n   x = np.random.randint(0, 256, (batch_size, seq_len))\n   y = np.roll(x, 1, axis=1)\n   y[:, 0] = x[:, 0]\n   return Tensor(x, dtype=dtypes.int32), Tensor(y, dtype=dtypes.int32)\n\n\noptimizer = optim.Adam(params, lr=0.001)\nlosses = []\n\n\nprint(\"Training to predict previous token in sequence...\")\nwith Tensor.train():\n   for step in range(20):\n       start = time.time()\n       x_batch, y_batch = gen_data(batch_size=16, seq_len=16)\n       logits = model(x_batch)\n       B, T, V = logits.shape[0], logits.shape[1], logits.shape[2]\n       loss = logits.reshape(B * T, V).sparse_categorical_crossentropy(y_batch.reshape(B * T))\n       optimizer.zero_grad()\n       loss.backward()\n       optimizer.step()\n       losses.append(loss.numpy())\n       elapsed = time.time() - start\n       if step % 5 == 0:\n           print(f\"Step {step:3d} | Loss: {loss.numpy():.4f} | Time: {elapsed*1000:.1f}ms\")\n\n\nprint(\"\\n\\n\u26a1 PART 5: Lazy Evaluation &amp; Kernel Fusion\")\nprint(\"-\" * 60)\n\n\nN = 512\na = Tensor.randn(N, N)\nb = Tensor.randn(N, N)\n\n\nprint(\"Creating computation: (A @ B.T + A).sum()\")\nlazy_result = (a @ b.T + a).sum()\nprint(\"\u2192 No computation done yet (lazy evaluation)\")\n\n\nprint(\"\\nCalling .realize() to execute...\")\nstart = time.time()\nrealized = lazy_result.realize()\nelapsed = time.time() - start\n\n\nprint(f\"\u2713 Computed in {elapsed*1000:.2f}ms\")\nprint(f\"Result: {realized.numpy():.4f}\")\nprint(\"\\nNote: Operations were fused into optimized kernels!\")<\/code><\/pre>\n<\/div>\n<\/div>
<p>We train the MiniGPT model on simple synthetic data and observe the loss decreasing across steps. We also explore Tinygrad\u2019s lazy execution model by creating a fused kernel that executes only when it is realized. 
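The shifted targets produced by np.roll are easy to verify in isolation. This small check (our own illustration, not part of the tutorial code) shows that for every position t &gt;= 1 the label is simply the previous input token:

```python
import numpy as np

x = np.array([[10, 20, 30, 40]])
y = np.roll(x, 1, axis=1)   # wraps around: [[40, 10, 20, 30]]
y[:, 0] = x[:, 0]           # patch the wrapped slot: [[10, 10, 20, 30]]
print(y)
```

A model that merely copies its input shifted by one position can drive this loss close to zero, which makes the synthetic task a convenient smoke test for the training loop rather than a meaningful language-modeling benchmark.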
As we monitor timings, we understand how kernel fusion improves performance. Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/edit\/main\/README.md\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">print(\"\\n\\n\ud83d\udd27 PART 6: Custom Operations\")\nprint(\"-\" * 60)\n\n\ndef custom_activation(x):\n   return x * x.sigmoid()\n\n\nx = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]], requires_grad=True)\ny = custom_activation(x)\nloss = y.sum()\nloss.backward()\n\n\nprint(f\"Input:    {x.numpy()}\")\nprint(f\"Swish(x): {y.numpy()}\")\nprint(f\"Gradient: {x.grad.numpy()}\")\n\n\nprint(\"\\n\\n\" + \"=\" * 60)\nprint(\"\u2705 Tutorial Complete!\")\nprint(\"=\" * 60)\nprint(\"\"\"\nKey Concepts Covered:\n1. Tensor operations with automatic differentiation\n2. Custom neural network layers (Attention, Transformer)\n3. Building a mini-GPT language model from scratch\n4. Training loop with Adam optimizer\n5. Lazy evaluation and kernel fusion\n6. Custom activation functions\n\"\"\")<\/code><\/pre>\n<\/div>\n<\/div>
<p>We implement a custom activation function and verify that gradients propagate correctly through it. We then print a summary of all major concepts covered in the tutorial. As we finish, we reflect on how each section builds our ability to understand, modify, and extend deep learning internals using Tinygrad.<\/p>
<p>In conclusion, we reinforce our understanding of how neural networks truly operate beneath modern abstractions, and we experience firsthand how Tinygrad empowers us to tinker with every internal detail. We have built a transformer, trained it on synthetic data, experimented with lazy evaluation and kernel fusion, and even created custom operations, all within a minimal, transparent framework. Finally, we recognize how this workflow prepares us for deeper experimentation, whether we extend the model, integrate real datasets, or continue exploring Tinygrad\u2019s low-level capabilities.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>
<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/edit\/main\/README.md\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.\u00a0Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" 
target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! are you on telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">now you can join us on telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/11\/25\/how-to-implement-functional-components-of-transformer-and-mini-gpt-model-from-scratch-using-tinygrad-to-understand-deep-learning-internals\/\">How to Implement Functional Components of Transformer and Mini-GPT Model from Scratch Using Tinygrad to Understand Deep Learning Internals<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we explore how to build neural networks from scratch using Tinygrad while remaining fully hands-on with tensors, autograd, attention mechanisms, and transformer architectures. We progressively build every component ourselves, from basic tensor operations to multi-head attention, transformer blocks, and, finally, a working mini-GPT model. Through each stage, we observe how Tinygrad\u2019s simplicity helps us understand what happens under the hood when models train, optimize, and fuse kernels for performance. Check out the\u00a0FULL CODES here. 
<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","backgr
ound-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 