{"id":32931,"date":"2025-08-20T06:08:55","date_gmt":"2025-08-20T06:08:55","guid":{"rendered":"https:\/\/youzum.net\/a-coding-implementation-to-build-a-complete-self-hosted-llm-workflow-with-ollama-rest-api-and-gradio-chat-interface\/"},"modified":"2025-08-20T06:08:55","modified_gmt":"2025-08-20T06:08:55","slug":"a-coding-implementation-to-build-a-complete-self-hosted-llm-workflow-with-ollama-rest-api-and-gradio-chat-interface","status":"publish","type":"post","link":"https:\/\/youzum.net\/ja\/a-coding-implementation-to-build-a-complete-self-hosted-llm-workflow-with-ollama-rest-api-and-gradio-chat-interface\/","title":{"rendered":"A Coding Implementation to Build a Complete Self-Hosted LLM Workflow with Ollama, REST API, and Gradio Chat Interface"},"content":{"rendered":"<p>In this tutorial, we implement a fully functional <a href=\"https:\/\/github.com\/ollama\/ollama\"><strong>Ollama<\/strong><\/a> environment inside Google Colab to replicate a self-hosted LLM workflow. We begin by installing Ollama directly on the Colab VM using the official Linux installer and then launch the Ollama server in the background to expose the HTTP API on localhost:11434. After verifying the service, we pull lightweight models such as qwen2.5:0.5b-instruct or llama3.2:1b, which balance resource constraints with usability in a CPU-only environment. To interact with these models programmatically, we use the \/api\/chat endpoint via Python\u2019s requests module with streaming enabled, which allows token-level output to be captured incrementally. Finally, we layer a <a href=\"https:\/\/github.com\/gradio-app\/gradio\"><strong>Gradio<\/strong><\/a>-based UI on top of this client so we can issue prompts, maintain multi-turn history, configure parameters like temperature and context size, and view results in real time. Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/self_hosted_llm_ollama_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">Full Codes here<\/a><\/strong>. 
```python
import os, sys, subprocess, time, json, requests, textwrap
from pathlib import Path


def sh(cmd, check=True):
    """Run a shell command, stream output."""
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for line in p.stdout:
        print(line, end="")
    p.wait()
    if check and p.returncode != 0:
        raise RuntimeError(f"Command failed: {cmd}")


if not Path("/usr/local/bin/ollama").exists() and not Path("/usr/bin/ollama").exists():
    print("🔧 Installing Ollama ...")
    sh("curl -fsSL https://ollama.com/install.sh | sh")
else:
    print("✅ Ollama already installed.")


try:
    import gradio
except Exception:
    print("🔧 Installing Gradio ...")
    sh("pip -q install gradio==4.44.0")
```

We first check if Ollama is already installed on the system, and if not, we install it using the official script. At the same time, we ensure Gradio is available by importing it or installing the required version when missing. This way, we prepare our Colab environment for running the chat interface smoothly. Check out the **[Full Codes here](https://github.com/Marktechpost/AI-Tutorial-Codes-Included/blob/main/self_hosted_llm_ollama_Marktechpost.ipynb)**.
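As an optional sanity check, the same setup can be verified with the standard library before moving on; the snippet below is our own sketch, not part of the original notebook.

```python
import shutil, subprocess

# Optional sanity check (a sketch, not part of the original notebook):
# confirm the ollama binary is on PATH and print its version string.
ollama_bin = shutil.which("ollama")
print("ollama binary:", ollama_bin)
if ollama_bin:
    out = subprocess.run([ollama_bin, "--version"], capture_output=True, text=True)
    print(out.stdout.strip() or out.stderr.strip())
```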
```python
def start_ollama():
    try:
        requests.get("http://127.0.0.1:11434/api/tags", timeout=1)
        print("✅ Ollama server already running.")
        return None
    except Exception:
        pass
    print("🚀 Starting Ollama server ...")
    proc = subprocess.Popen(["ollama", "serve"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for _ in range(60):
        time.sleep(1)
        try:
            r = requests.get("http://127.0.0.1:11434/api/tags", timeout=1)
            if r.ok:
                print("✅ Ollama server is up.")
                break
        except Exception:
            pass
    else:
        raise RuntimeError("Ollama did not start in time.")
    return proc


server_proc = start_ollama()
```

We start the Ollama [server](https://www.marktechpost.com/2025/08/08/proxy-servers-explained-types-use-cases-trends-in-2025-technical-deep-dive/) in the background and keep checking its health endpoint until it responds successfully. By doing this, we ensure the server is running and ready before sending any API requests. Check out the **[Full Codes here](https://github.com/Marktechpost/AI-Tutorial-Codes-Included/blob/main/self_hosted_llm_ollama_Marktechpost.ipynb)**.
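Because `start_ollama` returns the `subprocess.Popen` handle (or `None` when a server was already running), we can also shut the background server down cleanly at the end of a session. A small sketch of that cleanup, again our own addition rather than part of the tutorial code:

```python
import subprocess

def stop_ollama(proc):
    # Stop the background server started above; a no-op if a server was already
    # running before this notebook (start_ollama returned None in that case).
    if proc is None:
        return
    proc.terminate()
    try:
        proc.wait(timeout=10)
    except subprocess.TimeoutExpired:
        proc.kill()

# Example: stop_ollama(server_proc)
```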
```python
MODEL = os.environ.get("OLLAMA_MODEL", "qwen2.5:0.5b-instruct")
print(f"🧠 Using model: {MODEL}")
try:
    tags = requests.get("http://127.0.0.1:11434/api/tags", timeout=5).json()
    have = any(m.get("name") == MODEL for m in tags.get("models", []))
except Exception:
    have = False


if not have:
    print(f"⬇️ Pulling model {MODEL} (first time only) ...")
    sh(f"ollama pull {MODEL}")
```

We define the default model to use, check if it is already available on the Ollama server, and if not, we automatically pull it. This ensures that the chosen model is ready before we start running any chat sessions. Check out the **[Full Codes here](https://github.com/Marktechpost/AI-Tutorial-Codes-Included/blob/main/self_hosted_llm_ollama_Marktechpost.ipynb)**.

```python
OLLAMA_URL = "http://127.0.0.1:11434/api/chat"


def ollama_chat_stream(messages, model=MODEL, temperature=0.2, num_ctx=None):
    """Yield streaming text chunks from Ollama /api/chat."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
        "options": {"temperature": float(temperature)}
    }
    if num_ctx:
        payload["options"]["num_ctx"] = int(num_ctx)
    with requests.post(OLLAMA_URL, json=payload, stream=True) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            if not line:
                continue
            data = json.loads(line.decode("utf-8"))
            if "message" in data and "content" in data["message"]:
                yield data["message"]["content"]
            if data.get("done"):
                break
```

We create a streaming client for the Ollama /api/chat endpoint, where we send messages as JSON payloads and yield tokens as they arrive. This lets us handle responses incrementally, so we see the model's output in real time instead of waiting for the full completion. Check out the **[Full Codes here](https://github.com/Marktechpost/AI-Tutorial-Codes-Included/blob/main/self_hosted_llm_ollama_Marktechpost.ipynb)**.
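When we just want the full answer as a single string (for logging or quick scripting), a thin convenience wrapper can collect the streamed chunks. This helper is our own addition on top of the tutorial's `ollama_chat_stream`, not part of the original notebook:

```python
def ollama_chat(messages, **kwargs):
    # Convenience wrapper (our own addition): consume the stream and
    # return the complete response as one string.
    return "".join(ollama_chat_stream(messages, **kwargs))

# Example usage:
# reply = ollama_chat([{"role": "user", "content": "Summarize Ollama in one line."}], temperature=0.2)
# print(reply)
```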
```python
def smoke_test():
    print("\n🧪 Smoke test:")
    sys_msg = {"role": "system", "content": "You are concise. Use short bullets."}
    user_msg = {"role": "user", "content": "Give 3 quick tips to sleep better."}
    out = []
    for chunk in ollama_chat_stream([sys_msg, user_msg], temperature=0.3):
        print(chunk, end="")
        out.append(chunk)
    print("\n🧪 Done.\n")


try:
    smoke_test()
except Exception as e:
    print("⚠️ Smoke test skipped:", e)
```

We run a quick smoke test by sending a simple prompt through our streaming client to confirm that the model responds correctly. This helps us verify that Ollama is installed, the server is running, and the chosen model is working before we build the full chat UI. Check out the **[Full Codes here](https://github.com/Marktechpost/AI-Tutorial-Codes-Included/blob/main/self_hosted_llm_ollama_Marktechpost.ipynb)**.
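The same streaming client also handles multi-turn context: we simply replay earlier user and assistant messages before the new question, which is exactly what the Gradio callback below does. A standalone sketch of that pattern, using the tutorial's `ollama_chat_stream` with placeholder messages:

```python
# Multi-turn sketch: prior assistant replies are passed back as context.
history = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Name one benefit of running LLMs locally."},
    {"role": "assistant", "content": "Your prompts never leave your machine."},
    {"role": "user", "content": "And one drawback?"},
]
for chunk in ollama_chat_stream(history, temperature=0.3):
    print(chunk, end="")
print()
```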
```python
import gradio as gr


SYSTEM_PROMPT = "You are a helpful, crisp assistant. Prefer bullets when helpful."


def chat_fn(message, history, temperature, num_ctx):
    msgs = [{"role": "system", "content": SYSTEM_PROMPT}]
    for u, a in history:
        if u: msgs.append({"role": "user", "content": u})
        if a: msgs.append({"role": "assistant", "content": a})
    msgs.append({"role": "user", "content": message})
    acc = ""
    try:
        for part in ollama_chat_stream(msgs, model=MODEL, temperature=temperature, num_ctx=num_ctx or None):
            acc += part
            yield acc
    except Exception as e:
        yield f"⚠️ Error: {e}"


with gr.Blocks(title="Ollama Chat (Colab)", fill_height=True) as demo:
    gr.Markdown("# 🦙 Ollama Chat (Colab)\nSmall local-ish LLM via Ollama + Gradio.\n")
    with gr.Row():
        temp = gr.Slider(0.0, 1.0, value=0.3, step=0.1, label="Temperature")
        num_ctx = gr.Slider(512, 8192, value=2048, step=256, label="Context Tokens (num_ctx)")
    chat = gr.Chatbot(height=460)
    msg = gr.Textbox(label="Your message", placeholder="Ask anything…", lines=3)
    clear = gr.Button("Clear")

    def user_send(m, h):
        m = (m or "").strip()
        if not m: return "", h
        return "", h + [[m, None]]

    def bot_reply(h, temperature, num_ctx):
        u = h[-1][0]
        stream = chat_fn(u, h[:-1], temperature, int(num_ctx))
        acc = ""
        for partial in stream:
            acc = partial
            h[-1][1] = acc
            yield h

    msg.submit(user_send, [msg, chat], [msg, chat]).then(bot_reply, [chat, temp, num_ctx], [chat])
    clear.click(lambda: None, None, chat)


print("🌐 Launching Gradio ...")
demo.launch(share=True)
```

We integrate Gradio to build an interactive chat UI on top of the Ollama server, where user input and conversation history are converted into the correct message format and streamed back as model responses. The sliders let us adjust parameters like temperature and context length, while the chat box and clear button provide a simple, real-time interface for testing different prompts.
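Because the model name is read from the OLLAMA_MODEL environment variable, swapping in the other lightweight model mentioned in the introduction only requires setting that variable and re-running the model-selection cell; for example:

```python
import os

# Switch to the other lightweight model mentioned above, then re-run the
# model-selection cell so it pulls llama3.2:1b if it is not present yet.
os.environ["OLLAMA_MODEL"] = "llama3.2:1b"
```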
In conclusion, we establish a reproducible pipeline for running Ollama in Colab: installation, server startup, model management, API access, and user interface integration. The system uses Ollama's REST API as the core interaction layer, providing both command-line and Python streaming access, while Gradio handles session persistence and chat rendering. This approach preserves the "self-hosted" design described in the original guide but adapts it for Colab's constraints, where Docker and GPU-backed Ollama images are not practical. The result is a compact yet technically complete framework that lets us experiment with multiple LLMs, adjust generation parameters dynamically, and test conversational AI locally within a notebook environment.

---

Check out the **[Full Codes here](https://github.com/Marktechpost/AI-Tutorial-Codes-Included/blob/main/self_hosted_llm_ollama_Marktechpost.ipynb)**. Feel free to check out our **[GitHub Page for Tutorials, Codes and Notebooks](https://github.com/Marktechpost/AI-Tutorial-Codes-Included)**. Also, feel free to follow us on **[Twitter](https://x.com/intent/follow?screen_name=marktechpost)** and don't forget to join our **[100k+ ML SubReddit](https://www.reddit.com/r/machinelearningnews/)** and Subscribe to **[our Newsletter](https://www.aidevsignals.com/)**.

The post [A Coding Implementation to Build a Complete Self-Hosted LLM Workflow with Ollama, REST API, and Gradio Chat Interface](https://www.marktechpost.com/2025/08/19/a-coding-implementation-to-build-a-complete-self-hosted-llm-workflow-with-ollama-rest-api-and-gradio-chat-interface/) appeared first on [MarkTechPost](https://www.marktechpost.com/).