{"id":82309,"date":"2026-04-09T14:59:05","date_gmt":"2026-04-09T14:59:05","guid":{"rendered":"https:\/\/youzum.net\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/"},"modified":"2026-04-09T14:59:05","modified_gmt":"2026-04-09T14:59:05","slug":"sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context","status":"publish","type":"post","link":"https:\/\/youzum.net\/it\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/","title":{"rendered":"Sigmoid vs ReLU Activation Functions: The Inference Cost of Losing Geometric Context"},"content":{"rendered":"<p>A deep neural network can be understood as a geometric system, where each layer reshapes the input space to form increasingly complex decision boundaries. For this to work effectively, layers must preserve meaningful spatial information \u2014 particularly how far a data point lies from these boundaries \u2014 since this distance enables deeper layers to build rich, non-linear representations.<\/p>\n<p>Sigmoid disrupts this process by compressing all inputs into a narrow range between 0 and 1. As values move away from decision boundaries, they become indistinguishable, causing a loss of geometric context across layers. This leads to weaker representations and limits the effectiveness of depth.<\/p>\n<p>ReLU, on the other hand, preserves magnitude for positive inputs, allowing distance information to flow through the network. 
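<\/p>\n<p>A quick numeric check makes this concrete (a minimal standalone NumPy sketch, separate from the experiment below): for inputs that sit progressively farther from a boundary, Sigmoid squeezes everything into a sliver just below 1, while ReLU keeps the spacing intact.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">import numpy as np\n\n# distances from a hypothetical decision boundary\nz = np.array([1.0, 3.0, 6.0, 12.0])\nprint(1 \/ (1 + np.exp(-z)))   # sigmoid: ~[0.731, 0.953, 0.998, 0.99999] -- gaps vanish\nprint(np.maximum(0, z))       # relu: distances pass through unchanged<\/code><\/pre>\n<\/div>\n<\/div>\n<p>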
This enables deeper models to remain expressive without requiring excessive width or compute.<\/p>\n<p>In this article, we focus on this forward-pass behavior \u2014 analyzing how Sigmoid and ReLU differ in signal propagation and representation geometry using a two-moons experiment, and what that means for inference efficiency and scalability.<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"803\" height=\"507\" data-attachment-id=\"78878\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-414\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-11.png\" data-orig-size=\"803,507\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-11-300x189.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-11.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-11.png\" alt=\"\" class=\"wp-image-78878\" \/><\/figure>\n<figure class=\"wp-block-image size-full is-resized\"><img decoding=\"async\" width=\"816\" height=\"536\" data-attachment-id=\"78879\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-414\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-10.png\" data-orig-size=\"816,536\" data-comments-opened=\"1\" 
data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-10-300x197.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-10.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-10.png\" alt=\"\" class=\"wp-image-78879\" \/><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Setting up the dependencies<\/strong><\/h3>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">import numpy as np\nimport matplotlib.pyplot as plt\nimport matplotlib.gridspec as gridspec\nfrom matplotlib.colors import ListedColormap\nfrom sklearn.datasets import make_moons\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.model_selection import train_test_split<\/code><\/pre>\n<\/div>\n<\/div>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div 
class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">plt.rcParams.update({\n    \"font.family\":        \"monospace\",\n    \"axes.spines.top\":    False,\n    \"axes.spines.right\":  False,\n    \"figure.facecolor\":   \"white\",\n    \"axes.facecolor\":     \"#f7f7f7\",\n    \"axes.grid\":          True,\n    \"grid.color\":         \"#e0e0e0\",\n    \"grid.linewidth\":     0.6,\n})\n \nT = {                          \n    \"bg\":      \"white\",\n    \"panel\":   \"#f7f7f7\",\n    \"sig\":     \"#e05c5c\",      \n    \"relu\":    \"#3a7bd5\",      \n    \"c0\":      \"#f4a261\",      \n    \"c1\":      \"#2a9d8f\",      \n    \"text\":    \"#1a1a1a\",\n    \"muted\":   \"#666666\",\n}<\/code><\/pre>\n<\/div>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Creating the dataset<\/strong><\/h3>\n<p>To study the effect of activation functions in a controlled setting, we first generate a synthetic dataset using scikit-learn\u2019s make_moons. This creates a non-linear, two-class problem where simple linear boundaries fail, making it ideal for testing how well neural networks learn complex decision surfaces.<\/p>\n<p>We add a small amount of noise to make the task more realistic, then standardize the features using StandardScaler so both dimensions are on the same scale \u2014 ensuring stable training. The dataset is then split into training and test sets to evaluate generalization.<\/p>\n<p>Finally, we visualize the data distribution. 
This plot serves as the baseline geometry that both Sigmoid and ReLU networks will attempt to model, allowing us to later compare how each activation function transforms this space across layers.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">X, y = make_moons(n_samples=400, noise=0.18, random_state=42)\nX = StandardScaler().fit_transform(X)\nX_train, X_test, y_train, y_test = train_test_split(\n    X, y, test_size=0.25, random_state=42\n)\n\nfig, ax = plt.subplots(figsize=(7, 5))\nfig.patch.set_facecolor(T[\"bg\"])\nax.set_facecolor(T[\"panel\"])\nax.scatter(X[y == 0, 0], X[y == 0, 1], c=T[\"c0\"], s=40,\n           edgecolors=\"white\", linewidths=0.5, label=\"Class 0\", alpha=0.9)\nax.scatter(X[y == 1, 0], X[y == 1, 1], c=T[\"c1\"], s=40,\n           edgecolors=\"white\", linewidths=0.5, label=\"Class 1\", alpha=0.9)\nax.set_title(\"make_moons -- our dataset\", color=T[\"text\"], fontsize=13)\nax.set_xlabel(\"x\u2081\", color=T[\"muted\"]); ax.set_ylabel(\"x\u2082\", color=T[\"muted\"])\nax.tick_params(colors=T[\"muted\"]); ax.legend(fontsize=10)\nplt.tight_layout()\nplt.savefig(\"moons_dataset.png\", dpi=140, bbox_inches=\"tight\")\nplt.show()<\/code><\/pre>\n<\/div>\n<\/div>\n<figure class=\"wp-block-image size-full is-resized\"><img decoding=\"async\" width=\"690\" height=\"490\" data-attachment-id=\"78876\" 
data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-412\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-8.png\" data-orig-size=\"690,490\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-8-300x213.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-8.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-8.png\" alt=\"\" class=\"wp-image-78876\" \/><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Creating the Network<\/strong><\/h3>\n<p>Next, we implement a small, controlled neural network to isolate the effect of activation functions. The goal here is not to build a highly optimized model, but to create a clean experimental setup where Sigmoid and ReLU can be compared under identical conditions.<\/p>\n<p>We define both activation functions (Sigmoid and ReLU) along with their derivatives, and use binary cross-entropy as the loss since this is a binary classification task. The TwoLayerNet class represents a simple 3-layer feedforward network (2 hidden layers + output), where the only configurable component is the activation function.<\/p>\n<p>A key detail is the initialization strategy: we use He initialization for ReLU and Xavier initialization for Sigmoid, ensuring that each network starts in a fair and stable regime based on its activation dynamics.<\/p>\n<p>The forward pass computes activations layer by layer, while the backward pass performs standard gradient descent updates. 
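<\/p>\n<p>The effect of that scaling can be sanity-checked in isolation: He initialization is designed to keep the second moment of ReLU activations roughly constant from layer to layer (a minimal standalone sketch, not part of the class below):<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">import numpy as np\n\nrng = np.random.default_rng(0)\nx = rng.standard_normal((2000, 256))                     # unit second moment in\nW = rng.standard_normal((256, 256)) * np.sqrt(2 \/ 256)  # He scaling\nh = np.maximum(0, x @ W)                                 # one ReLU layer\nprint(np.mean(x**2), np.mean(h**2))                      # both close to 1.0<\/code><\/pre>\n<\/div>\n<\/div>\n<p>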
Importantly, we also include diagnostic methods like get_hidden and get_z_trace, which allow us to inspect how signals evolve across layers \u2014 this is crucial for analyzing how much geometric information is preserved or lost.<\/p>\n<p>By keeping architecture, data, and training setup constant, this implementation ensures that any difference in performance or internal representations can be directly attributed to the activation function itself \u2014 setting the stage for a clear comparison of their impact on signal propagation and expressiveness.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">def sigmoid(z):      return 1 \/ (1 + np.exp(-np.clip(z, -500, 500)))\ndef sigmoid_d(a):    return a * (1 - a)\ndef relu(z):         return np.maximum(0, z)\ndef relu_d(z):       return (z &gt; 0).astype(float)\ndef bce(y, yhat):    return -np.mean(y * np.log(yhat + 1e-9) + (1 - y) * np.log(1 - yhat + 1e-9))\n\nclass TwoLayerNet:\n    def __init__(self, activation=\"relu\", seed=0):\n        np.random.seed(seed)\n        self.act_name = activation\n        self.act  = relu    if activation == \"relu\" else sigmoid\n        self.dact = relu_d  if activation == \"relu\" else sigmoid_d\n\n        # He init for ReLU, Xavier for Sigmoid\n        scale = lambda fan_in: np.sqrt(2 \/ fan_in) if activation == \"relu\" else np.sqrt(1 \/ fan_in)\n        self.W1 = np.random.randn(2, 8)  * scale(2)\n 
       self.b1 = np.zeros((1, 8))\n        self.W2 = np.random.randn(8, 8)  * scale(8)\n        self.b2 = np.zeros((1, 8))\n        self.W3 = np.random.randn(8, 1)  * scale(8)\n        self.b3 = np.zeros((1, 1))\n        self.loss_history = []\n\n    def forward(self, X, store=False):\n        z1 = X  @ self.W1 + self.b1;  a1 = self.act(z1)\n        z2 = a1 @ self.W2 + self.b2;  a2 = self.act(z2)\n        z3 = a2 @ self.W3 + self.b3;  out = sigmoid(z3)\n        if store:\n            self._cache = (X, z1, a1, z2, a2, z3, out)\n        return out\n\n    def backward(self, lr=0.05):\n        X, z1, a1, z2, a2, z3, out = self._cache\n        n = X.shape[0]\n\n        dout = (out - self.y_cache) \/ n\n        dW3 = a2.T @ dout;  db3 = dout.sum(axis=0, keepdims=True)\n        da2 = dout @ self.W3.T\n        dz2 = da2 * (self.dact(z2) if self.act_name == \"relu\" else self.dact(a2))\n        dW2 = a1.T @ dz2;  db2 = dz2.sum(axis=0, keepdims=True)\n        da1 = dz2 @ self.W2.T\n        dz1 = da1 * (self.dact(z1) if self.act_name == \"relu\" else self.dact(a1))\n        dW1 = X.T  @ dz1;  db1 = dz1.sum(axis=0, keepdims=True)\n\n        for p, g in [(self.W3,dW3),(self.b3,db3),(self.W2,dW2),\n                     (self.b2,db2),(self.W1,dW1),(self.b1,db1)]:\n            p -= lr * g\n\n    def train_step(self, X, y, lr=0.05):\n        self.y_cache = y.reshape(-1, 1)\n        out = self.forward(X, store=True)\n        loss = bce(self.y_cache, out)\n        self.backward(lr)\n        return loss\n\n    def get_hidden(self, X, layer=1):\n        \"\"\"Return post-activation values for layer 1 or 2.\"\"\"\n        z1 = X @ self.W1 + self.b1;  a1 = self.act(z1)\n        if layer == 1: return a1\n        z2 = a1 @ self.W2 + self.b2; return self.act(z2)\n\n    def get_z_trace(self, x_single):\n        \"\"\"Return pre-activation magnitudes per layer for ONE sample.\"\"\"\n        z1 = x_single @ self.W1 + self.b1\n        a1 = self.act(z1)\n        z2 = a1 @ self.W2 + self.b2\n     
   a2 = self.act(z2)\n        z3 = a2 @ self.W3 + self.b3\n        return [np.abs(z1).mean(), np.abs(a1).mean(),\n                np.abs(z2).mean(), np.abs(a2).mean(),\n                np.abs(z3).mean()]<\/code><\/pre>\n<\/div>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Training the Networks<\/strong><\/h3>\n<p>Now we train both networks under identical conditions to ensure a fair comparison. We initialize two models \u2014 one using Sigmoid and the other using ReLU \u2014 with the same random seed so they start from equivalent weight configurations.<\/p>\n<p>The training loop runs for 800 epochs using mini-batch gradient descent. In each epoch, we shuffle the training data, split it into batches, and update both networks in parallel. This setup guarantees that the only variable changing between the two runs is the activation function.<\/p>\n<p>We also track the loss after every epoch and log it at regular intervals. This allows us to observe how each network evolves over time \u2014 not just in terms of convergence speed, but whether it continues improving or plateaus.<\/p>\n<p>This step is critical because it establishes the first signal of divergence: if both models start identically but behave differently during training, that difference must come from how each activation function propagates and preserves information through the network.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap 
language-php\">EPOCHS = 800\nLR     = 0.05\nBATCH  = 64\n\nnet_sig  = TwoLayerNet(\"sigmoid\", seed=42)\nnet_relu = TwoLayerNet(\"relu\",    seed=42)\n\nfor epoch in range(EPOCHS):\n    idx = np.random.permutation(len(X_train))\n    for net in [net_sig, net_relu]:\n        epoch_loss = []\n        for i in range(0, len(idx), BATCH):\n            b = idx[i:i+BATCH]\n            loss = net.train_step(X_train[b], y_train[b], LR)\n            epoch_loss.append(loss)\n        net.loss_history.append(np.mean(epoch_loss))\n\n    if (epoch + 1) % 200 == 0:\n        ls = net_sig.loss_history[-1]\n        lr = net_relu.loss_history[-1]\n        print(f\"  Epoch {epoch+1:4d} | Sigmoid loss: {ls:.4f} | ReLU loss: {lr:.4f}\")\n\nprint(\"\\n\u2705 Training complete.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"812\" height=\"517\" data-attachment-id=\"78877\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-413\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-9.png\" data-orig-size=\"812,517\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-9-300x191.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-9.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-9.png\" alt=\"\" 
class=\"wp-image-78877\" \/><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Training Loss Curve<\/strong><\/h3>\n<p>The loss curves make the divergence between Sigmoid and ReLU very clear. Both networks start from the same initialization and are trained under identical conditions, yet their learning trajectories quickly separate. Sigmoid improves initially but plateaus around ~0.28 by epoch 400, showing almost no progress afterward \u2014 a sign that the network has exhausted the useful signal it can extract.<\/p>\n<p>ReLU, in contrast, continues to steadily reduce loss throughout training, dropping from ~0.15 to ~0.03 by epoch 800. This isn\u2019t just faster convergence; it reflects a deeper issue: Sigmoid\u2019s compression is limiting the flow of meaningful information, causing the model to stall, while ReLU preserves that signal, allowing the network to keep refining its decision boundary.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">fig, ax = plt.subplots(figsize=(10, 5))\nfig.patch.set_facecolor(T[\"bg\"])\nax.set_facecolor(T[\"panel\"])\n\nax.plot(net_sig.loss_history,  color=T[\"sig\"],  lw=2.5, label=\"Sigmoid\")\nax.plot(net_relu.loss_history, color=T[\"relu\"], lw=2.5, label=\"ReLU\")\n\nax.set_xlabel(\"Epoch\", color=T[\"muted\"])\nax.set_ylabel(\"Binary Cross-Entropy Loss\", color=T[\"muted\"])\nax.set_title(\"Training Loss -- same architecture, same init, same 
LR\\nonly the activation differs\",\n             color=T[\"text\"], fontsize=12)\nax.legend(fontsize=11)\nax.tick_params(colors=T[\"muted\"])\n\n# Annotate final losses\nfor net, color, va in [(net_sig, T[\"sig\"], \"bottom\"), (net_relu, T[\"relu\"], \"top\")]:\n    final = net.loss_history[-1]\n    ax.annotate(f\"  final: {final:.4f}\", xy=(EPOCHS-1, final),\n                color=color, fontsize=9, va=va)\n\nplt.tight_layout()\nplt.savefig(\"loss_curves.png\", dpi=140, bbox_inches=\"tight\")\nplt.show()<\/code><\/pre>\n<\/div>\n<\/div>\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"987\" height=\"489\" data-attachment-id=\"78875\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-411\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-7.png\" data-orig-size=\"987,489\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-7-300x149.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-7.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-7.png\" alt=\"\" class=\"wp-image-78875\" \/><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Decision Boundary Plots<\/strong><\/h3>\n<p>The decision boundary visualization makes the difference even more tangible. The Sigmoid network learns a nearly linear boundary, failing to capture the curved structure of the two-moons dataset, which results in lower accuracy (~79%). 
This is a direct consequence of its compressed internal representations \u2014 the network simply doesn\u2019t have enough geometric signal to construct a complex boundary.<\/p>\n<p>In contrast, the ReLU network learns a highly non-linear, well-adapted boundary that closely follows the data distribution, achieving much higher accuracy (~96%). Because ReLU preserves magnitude across layers, it enables the network to progressively bend and refine the decision surface, turning depth into actual expressive power rather than wasted capacity.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">def plot_boundary(ax, net, X, y, title, color):\n    h = 0.025\n    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n                         np.arange(y_min, y_max, h))\n    grid = np.c_[xx.ravel(), yy.ravel()]\n    Z = net.forward(grid).reshape(xx.shape)\n\n    # Soft shading\n    cmap_bg = ListedColormap([\"#fde8c8\", \"#c8ece9\"])\n    ax.contourf(xx, yy, Z, levels=50, cmap=cmap_bg, alpha=0.85)\n    ax.contour(xx, yy, Z, levels=[0.5], colors=[color], linewidths=2)\n\n    ax.scatter(X[y==0, 0], X[y==0, 1], c=T[\"c0\"], s=35,\n               edgecolors=\"white\", linewidths=0.4, alpha=0.9)\n    ax.scatter(X[y==1, 0], X[y==1, 1], c=T[\"c1\"], s=35,\n               
edgecolors=\"white\", linewidths=0.4, alpha=0.9)\n\n    acc = ((net.forward(X) &gt;= 0.5).ravel() == y).mean()\n    ax.set_title(f\"{title}\\nTest acc: {acc:.1%}\", color=color, fontsize=12)\n    ax.set_xlabel(\"x\u2081\", color=T[\"muted\"]); ax.set_ylabel(\"x\u2082\", color=T[\"muted\"])\n    ax.tick_params(colors=T[\"muted\"])\n\nfig, axes = plt.subplots(1, 2, figsize=(13, 5.5))\nfig.patch.set_facecolor(T[\"bg\"])\nfig.suptitle(\"Decision Boundaries learned on make_moons\",\n             fontsize=13, color=T[\"text\"])\n\nplot_boundary(axes[0], net_sig,  X_test, y_test, \"Sigmoid\", T[\"sig\"])\nplot_boundary(axes[1], net_relu, X_test, y_test, \"ReLU\",    T[\"relu\"])\n\nplt.tight_layout()\nplt.savefig(\"decision_boundaries.png\", dpi=140, bbox_inches=\"tight\")\nplt.show()<\/code><\/pre>\n<\/div>\n<\/div>\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"431\" data-attachment-id=\"78883\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-418\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-15.png\" data-orig-size=\"1289,543\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-15-300x126.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-15-1024x431.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-15-1024x431.png\" alt=\"\" class=\"wp-image-78883\" \/><\/figure>\n<h3 
class=\"wp-block-heading\"><strong>Layer-by-Layer Signal Trace<\/strong><\/h3>\n<p>This chart tracks how the signal evolves across layers for a point far from the decision boundary \u2014 and it clearly shows where Sigmoid fails. Both networks start with similar pre-activation magnitude at the first layer (~2.0), but Sigmoid immediately compresses it to ~0.3, while ReLU retains a higher value. As we move deeper, Sigmoid continues to squash the signal into a narrow band (0.5\u20130.6), effectively erasing meaningful differences. ReLU, on the other hand, preserves and amplifies magnitude, with the final layer reaching values as high as 9\u201320.<\/p>\n<p>This means the output neuron in the ReLU network is making decisions based on a strong, well-separated signal, while the Sigmoid network is forced to classify using a weak, compressed one. The key takeaway is that ReLU preserves distance from the decision boundary across layers, allowing that information to compound, whereas Sigmoid progressively destroys it.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">far_class0 = X_train[y_train == 0][np.argmax(\n    np.linalg.norm(X_train[y_train == 0] - [-1.2, -0.3], axis=1)\n)]\nfar_class1 = X_train[y_train == 1][np.argmax(\n    np.linalg.norm(X_train[y_train == 1] - [1.2, 0.3], axis=1)\n)]\n\nstage_labels = [\"z\u2081 (pre)\", \"a\u2081 (post)\", \"z\u2082 (pre)\", \"a\u2082 (post)\", 
\"z\u2083 (out)\"]\nx_pos = np.arange(len(stage_labels))\n\nfig, axes = plt.subplots(1, 2, figsize=(13, 5.5))\nfig.patch.set_facecolor(T[\"bg\"])\nfig.suptitle(\"Layer-by-layer signal magnitude -- a point far from the boundary\",\n             fontsize=12, color=T[\"text\"])\n\nfor ax, sample, title in zip(\n    axes,\n    [far_class0, far_class1],\n    [\"Class 0 sample (deep in its moon)\", \"Class 1 sample (deep in its moon)\"]\n):\n    ax.set_facecolor(T[\"panel\"])\n    sig_trace  = net_sig.get_z_trace(sample.reshape(1, -1))\n    relu_trace = net_relu.get_z_trace(sample.reshape(1, -1))\n\n    ax.plot(x_pos, sig_trace,  \"o-\", color=T[\"sig\"],  lw=2.5, markersize=8, label=\"Sigmoid\")\n    ax.plot(x_pos, relu_trace, \"s-\", color=T[\"relu\"], lw=2.5, markersize=8, label=\"ReLU\")\n\n    for i, (s, r) in enumerate(zip(sig_trace, relu_trace)):\n        ax.text(i, s - 0.06, f\"{s:.3f}\", ha=\"center\", fontsize=8, color=T[\"sig\"])\n        ax.text(i, r + 0.04, f\"{r:.3f}\", ha=\"center\", fontsize=8, color=T[\"relu\"])\n\n    ax.set_xticks(x_pos); ax.set_xticklabels(stage_labels, color=T[\"muted\"], fontsize=9)\n    ax.set_ylabel(\"Mean |activation|\", color=T[\"muted\"])\n    ax.set_title(title, color=T[\"text\"], fontsize=11)\n    ax.tick_params(colors=T[\"muted\"]); ax.legend(fontsize=10)\n\nplt.tight_layout()\nplt.savefig(\"signal_trace.png\", dpi=140, bbox_inches=\"tight\")\nplt.show()<\/code><\/pre>\n<\/div>\n<\/div>\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"433\" data-attachment-id=\"78880\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-415\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-12.png\" data-orig-size=\"1285,543\" data-comments-opened=\"1\" 
data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-12-300x127.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-12-1024x433.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-12-1024x433.png\" alt=\"\" class=\"wp-image-78880\" \/><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Hidden Space Scatter<\/strong><\/h3>\n<p>This is the most important visualization because it directly exposes how each network uses (or fails to use) depth. In the Sigmoid network (left), both classes collapse into a tight, overlapping region \u2014 a diagonal smear where points are heavily entangled. The standard deviation actually decreases from layer 1 (0.26) to layer 2 (0.19), meaning the representation is becoming less expressive with depth. Each layer is compressing the signal further, stripping away the spatial structure needed to separate the classes.<\/p>\n<p>ReLU shows the opposite behavior. In layer 1, while some neurons are inactive (the \u201cdead zone\u201d), the active ones already spread across a wider range (1.15 std), indicating preserved variation. By layer 2, this expands even further (1.67 std), and the classes become clearly separable \u2014 one is pushed to high activation ranges while the other remains near zero. 
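<\/p>\n<p>This collapse is reproducible with random weights alone, before any training: pushing noise through a few Xavier-scaled Sigmoid layers versus He-scaled ReLU layers shows the spread shrinking in one case and holding up in the other (a minimal standalone sketch):<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">import numpy as np\n\nrng = np.random.default_rng(1)\nh_sig = h_relu = rng.standard_normal((500, 8))\nfor _ in range(4):\n    Ws = rng.standard_normal((8, 8)) * np.sqrt(1 \/ 8)   # Xavier\n    Wr = rng.standard_normal((8, 8)) * np.sqrt(2 \/ 8)   # He\n    h_sig  = 1 \/ (1 + np.exp(-(h_sig @ Ws)))\n    h_relu = np.maximum(0, h_relu @ Wr)\nprint(h_sig.std(), h_relu.std())   # sigmoid spread is typically far smaller<\/code><\/pre>\n<\/div>\n<\/div>\n<p>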
At this point, the output layer\u2019s job is trivial.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">fig, axes = plt.subplots(2, 2, figsize=(13, 10))\nfig.patch.set_facecolor(T[\"bg\"])\nfig.suptitle(\"Hidden-space representations on make_moons test set\",\n             fontsize=13, color=T[\"text\"])\n\nfor col, (net, color, name) in enumerate([\n    (net_sig,  T[\"sig\"],  \"Sigmoid\"),\n    (net_relu, T[\"relu\"], \"ReLU\"),\n]):\n    for row, layer in enumerate([1, 2]):\n        ax = axes[row][col]\n        ax.set_facecolor(T[\"panel\"])\n        H = net.get_hidden(X_test, layer=layer)\n\n        ax.scatter(H[y_test==0, 0], H[y_test==0, 1], c=T[\"c0\"], s=40,\n                   edgecolors=\"white\", linewidths=0.4, alpha=0.85, label=\"Class 0\")\n        ax.scatter(H[y_test==1, 0], H[y_test==1, 1], c=T[\"c1\"], s=40,\n                   edgecolors=\"white\", linewidths=0.4, alpha=0.85, label=\"Class 1\")\n\n        spread = H.std()\n        ax.text(0.04, 0.96, f\"std: {spread:.4f}\",\n                transform=ax.transAxes, fontsize=9, va=\"top\",\n                color=T[\"text\"],\n                bbox=dict(boxstyle=\"round,pad=0.3\", fc=\"white\", ec=color, alpha=0.85))\n\n        ax.set_title(f\"{name}  --  Layer {layer} hidden space\",\n                     color=color, fontsize=11)\n        ax.set_xlabel(\"Unit 1\", color=T[\"muted\"])\n        ax.set_ylabel(\"Unit 
2\", color=T[\"muted\"])\n        ax.tick_params(colors=T[\"muted\"])\n        if row == 0 and col == 0: ax.legend(fontsize=9)\n\nplt.tight_layout()\nplt.savefig(\"hidden_space.png\", dpi=140, bbox_inches=\"tight\")\nplt.show()<\/code><\/pre>\n<\/div>\n<\/div>\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"782\" data-attachment-id=\"78882\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-417\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-14.png\" data-orig-size=\"1289,985\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-14-300x229.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-14-1024x782.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-14-1024x782.png\" alt=\"\" class=\"wp-image-78882\" \/><\/figure>\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"818\" height=\"505\" data-attachment-id=\"78881\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/image-416\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-13.png\" data-orig-size=\"818,505\" data-comments-opened=\"1\" 
data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-13-300x185.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-13.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/image-13.png\" alt=\"\" class=\"wp-image-78881\" \/><\/figure>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Deep%20Learning\/Sigmoid_Relu.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">Full Codes here<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/sigmoid-vs-relu-activation-functions-the-inference-cost-of-losing-geometric-context\/\">Sigmoid vs ReLU Activation Functions: The Inference Cost of Losing Geometric Context<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>A deep neural network can be understood as a geometric system, where each layer reshapes the input space to form increasingly complex decision boundaries. For this to work effectively, layers must preserve meaningful spatial information \u2014 particularly how far a data point lies from these boundaries \u2014 since this distance enables deeper layers to build rich, non-linear representations. Sigmoid disrupts this process by compressing all inputs into a narrow range between 0 and 1. As values move away from decision boundaries, they become indistinguishable, causing a loss of geometric context across layers. This leads to weaker representations and limits the effectiveness of depth. ReLU, on the other hand, preserves magnitude for positive inputs, allowing distance information to flow through the network. This enables deeper models to remain expressive without requiring excessive width or compute. 
In this article, we focus on this forward-pass behavior \u2014 analyzing how Sigmoid and ReLU differ in signal propagation and representation geometry using a two-moons experiment, and what that means for inference efficiency and scalability.<\/p>","protected":false}}