{"id":51193,"date":"2025-11-13T08:03:40","date_gmt":"2025-11-13T08:03:40","guid":{"rendered":"https:\/\/youzum.net\/how-to-build-a-fully-functional-custom-gpt-style-conversational-ai-locally-using-hugging-face-transformers\/"},"modified":"2025-11-13T08:03:40","modified_gmt":"2025-11-13T08:03:40","slug":"how-to-build-a-fully-functional-custom-gpt-style-conversational-ai-locally-using-hugging-face-transformers","status":"publish","type":"post","link":"https:\/\/youzum.net\/de\/how-to-build-a-fully-functional-custom-gpt-style-conversational-ai-locally-using-hugging-face-transformers\/","title":{"rendered":"How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers"},"content":{"rendered":"<p>In this tutorial, we build our own custom GPT-style chat system from scratch using a local Hugging Face model. We start by loading a lightweight instruction-tuned model that understands conversational prompts, then wrap it inside a structured chat framework that includes a system role, user memory, and assistant responses. We define how the agent interprets context, constructs messages, and optionally uses small built-in tools to fetch local data or simulated search results. By the end, we have a fully functional, conversational model that behaves like a personalized GPT running. 
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/ML%20Project%20Codes\/custom_gpt_local_colab_chat_marktechpost.py\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.\u00a0<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">!pip install transformers accelerate sentencepiece --quiet\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nfrom typing import List, Tuple, Optional\nimport textwrap, json, os<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We begin by installing the essential libraries and importing the required modules. We ensure that the environment has all necessary dependencies, such as transformers, torch, and sentencepiece, ready for use. This setup allows us to work seamlessly with Hugging Face models inside Google Colab. 
<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">MODEL_NAME = \"microsoft\/Phi-3-mini-4k-instruct\"\nBASE_SYSTEM_PROMPT = (\n   \"You are a custom GPT running locally. \"\n   \"Follow user instructions carefully. \"\n   \"Be concise and structured. \"\n   \"If something is unclear, say it is unclear. \"\n   \"Prefer practical examples over corporate examples unless explicitly asked. \"\n   \"When asked for code, give runnable code.\"\n)\nMAX_NEW_TOKENS = 256<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We configure our model name, define the system prompt that governs the assistant\u2019s behavior, and set token limits. We establish how our custom GPT should respond: concise, structured, and practical. This section defines the foundation of our model\u2019s identity and instruction style. 
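Note that BASE_SYSTEM_PROMPT relies on Python joining adjacent string literals into a single string at compile time; a minimal standalone sketch:<\/p>

```python
# Adjacent string literals inside parentheses are concatenated at compile time,
# which is how BASE_SYSTEM_PROMPT above becomes one string.
prompt = (
    "Be concise and structured. "
    "When asked for code, give runnable code."
)
print(prompt)
```

<p>Keeping each rule on its own line makes the prompt easy to edit and diff. 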
<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">print(\"Loading model...\")\ntokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\nif tokenizer.pad_token_id is None:\n   tokenizer.pad_token_id = tokenizer.eos_token_id\nmodel = AutoModelForCausalLM.from_pretrained(\n   MODEL_NAME,\n   torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,\n   device_map=\"auto\"\n)\nmodel.eval()\nprint(\"Model loaded.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We load the tokenizer and model from Hugging Face into memory and prepare them for inference. We automatically adjust the device mapping based on available hardware, ensuring GPU acceleration if possible. Once loaded, our model is ready to generate responses. 
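The pad-token fallback can be illustrated without downloading any weights; a toy stand-in (the TokStub class is hypothetical, not a real tokenizer):<\/p>

```python
# Hypothetical stand-in mirroring the pad-token fallback used above:
# many causal-LM tokenizers ship without a pad token, so we reuse EOS.
class TokStub:
    pad_token_id = None
    eos_token_id = 2  # arbitrary example id

tok = TokStub()
if tok.pad_token_id is None:
    tok.pad_token_id = tok.eos_token_id
print(tok.pad_token_id)  # 2
```

<p>Setting a pad token this way keeps generation from warning or failing on padded inputs. 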
<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">ConversationHistory = List[Tuple[str, str]]\nhistory: ConversationHistory = [(\"system\", BASE_SYSTEM_PROMPT)]\n\n\ndef wrap_text(s: str, w: int = 100) -&gt; str:\n   return \"\\n\".join(textwrap.wrap(s, width=w))\n\n\ndef build_chat_prompt(history: ConversationHistory, user_msg: str) -&gt; str:\n   prompt_parts = []\n   for role, content in history:\n       if role == \"system\":\n           prompt_parts.append(f\"&lt;|system|&gt;\\n{content}\\n\")\n       elif role == \"user\":\n           prompt_parts.append(f\"&lt;|user|&gt;\\n{content}\\n\")\n       elif role == \"assistant\":\n           prompt_parts.append(f\"&lt;|assistant|&gt;\\n{content}\\n\")\n   prompt_parts.append(f\"&lt;|user|&gt;\\n{user_msg}\\n\")\n   prompt_parts.append(\"&lt;|assistant|&gt;\\n\")\n   return \"\".join(prompt_parts)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We initialize the conversation history, starting with a system role, and create a prompt builder to format messages. We define how user and assistant turns are arranged in a consistent conversational structure. 
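The layout that build_chat_prompt produces can be sketched standalone with a toy history (no model needed; the messages here are hypothetical):<\/p>

```python
# Self-contained sketch of the prompt layout assembled by build_chat_prompt above.
history = [("system", "You are helpful."), ("user", "Hi"), ("assistant", "Hello!")]
parts = [f"<|{role}|>\n{content}\n" for role, content in history]
parts.append("<|user|>\nWhat is 2+2?\n")   # the new user turn
parts.append("<|assistant|>\n")            # generation cue: the model continues here
prompt = "".join(parts)
print(prompt)
```

<p>The trailing assistant tag is the cue after which the model writes its reply. 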
This ensures the model always understands the dialogue context correctly.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">def local_tool_router(user_msg: str) -&gt; Optional[str]:\n   msg = user_msg.strip().lower()\n   if msg.startswith(\"search:\"):\n       query = user_msg.split(\":\", 1)[-1].strip()\n       return f\"Search results about '{query}':\\n- Key point 1\\n- Key point 2\\n- Key point 3\"\n   if msg.startswith(\"docs:\"):\n       topic = user_msg.split(\":\", 1)[-1].strip()\n       return f\"Documentation extract on '{topic}':\\n1. The agent orchestrates tools.\\n2. The model consumes output.\\n3. Responses become memory.\"\n   return None<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We add a lightweight tool router that extends our GPT\u2019s capability to simulate tasks like search or documentation retrieval. We define logic to detect special prefixes such as \u201csearch:\u201d or \u201cdocs:\u201d in user queries. This simple agentic design gives our assistant contextual awareness. 
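The prefix dispatch can be exercised on its own; a simplified, self-contained version of the routing idea (the route helper is illustrative, not part of the script above):<\/p>

```python
from typing import Optional, Tuple

def route(user_msg: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, argument) for a recognized prefix, else None."""
    msg = user_msg.strip().lower()
    for prefix in ("search", "docs"):
        if msg.startswith(prefix + ":"):
            return prefix, user_msg.split(":", 1)[-1].strip()
    return None

print(route("search: agentic ai"))  # ('search', 'agentic ai')
print(route("hello"))               # None
```

<p>Anything without a recognized prefix falls through to the model unchanged. 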
<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-python\">def generate_reply(history: ConversationHistory, user_msg: str) -&gt; str:\n   tool_context = local_tool_router(user_msg)\n   if tool_context:\n       user_msg = user_msg + \"\\n\\nUseful context:\\n\" + tool_context\n   prompt = build_chat_prompt(history, user_msg)\n   inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n   with torch.no_grad():\n       output_ids = model.generate(\n           **inputs,\n           max_new_tokens=MAX_NEW_TOKENS,\n           do_sample=True,\n           top_p=0.9,\n           temperature=0.6,\n           pad_token_id=tokenizer.eos_token_id\n       )\n   decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)\n   reply = decoded.split(\"&lt;|assistant|&gt;\")[-1].strip() if \"&lt;|assistant|&gt;\" in decoded else decoded[len(prompt):].strip()\n   history.append((\"user\", user_msg))\n   history.append((\"assistant\", reply))\n   return reply\n\n\ndef save_history(history: ConversationHistory, path: str = \"chat_history.json\") -&gt; None:\n   data = [{\"role\": r, \"content\": c} for (r, c) in history]\n   with open(path, 
\"w\") as f:\n       json.dump(data, f, indent=2)\n\n\ndef load_history(path: str = \"chat_history.json\") -&gt; ConversationHistory:\n   if not os.path.exists(path):\n       return [(\"system\", BASE_SYSTEM_PROMPT)]\n   with open(path, \"r\") as f:\n       data = json.load(f)\n   return [(item[\"role\"], item[\"content\"]) for item in data]<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define the primary reply generation function, which combines history, context, and model inference to produce coherent outputs. We also add functions to save and load past conversations for persistence. This snippet forms the operational core of our custom GPT. Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/ML%20Project%20Codes\/custom_gpt_local_colab_chat_marktechpost.py\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.\u00a0<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">print(\"n--- Demo turn 1 ---\")\ndemo_reply_1 = generate_reply(history, \"Explain what this custom GPT setup is doing in 5 bullet points.\")\nprint(wrap_text(demo_reply_1))\n\n\nprint(\"n--- Demo turn 2 ---\")\ndemo_reply_2 = generate_reply(history, \"search: agentic ai with local models\")\nprint(wrap_text(demo_reply_2))\n\n\ndef interactive_chat():\n   print(\"nChat ready. 
Type 'exit' to stop.\")\n   while True:\n       try:\n           user_msg = input(\"\\nUser: \").strip()\n       except EOFError:\n           break\n       if user_msg.lower() in (\"exit\", \"quit\", \"q\"):\n           break\n       reply = generate_reply(history, user_msg)\n       print(\"\\nAssistant:\\n\" + wrap_text(reply))\n\n\n# interactive_chat()\nprint(\"\\nCustom GPT initialized successfully.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We test the entire setup by running demo prompts and displaying generated responses. We also create an optional interactive chat loop to converse directly with the assistant. By the end, we confirm that our custom GPT runs locally and responds intelligently in real time.<\/p>\n<p>In conclusion, we designed and executed a custom conversational agent that mirrors GPT-style reasoning without relying on any external services. We saw how local models can be made interactive through prompt orchestration, lightweight tool routing, and conversational memory management. This approach enables us to understand the internal logic behind commercial GPT systems. 
It empowers us to experiment with our own rules, behaviors, and integrations in a transparent and fully offline manner.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/ML%20Project%20Codes\/custom_gpt_local_colab_chat_marktechpost.py\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.\u00a0Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/11\/12\/how-to-build-a-fully-functional-custom-gpt-style-conversational-ai-locally-using-hugging-face-transformers\/\">How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build our own custom GPT-style chat system from scratch using a local Hugging Face model. 
We wrap a lightweight instruction-tuned model in a structured chat framework with a system role, conversational memory, a prompt builder, simple local tools, and an optional interactive chat loop, all running fully offline. 
In conclusion, we designed and executed a custom conversational agent that mirrors GPT-style reasoning without relying on any external services. We saw how local models can be made interactive through prompt orchestration, lightweight tool routing, and conversational memory management. This approach enables us to understand the internal logic behind commercial GPT systems. It empowers us to experiment with our own rules, behaviors, and integrations in a transparent and fully offline manner. Check out the\u00a0FULL CODES here.\u00a0Feel free to check out our\u00a0GitHub Page for Tutorials, Codes and Notebooks.\u00a0Also,\u00a0feel free to follow us on\u00a0Twitter\u00a0and don\u2019t forget to join our\u00a0100k+ ML SubReddit\u00a0and Subscribe to\u00a0our Newsletter. Wait! are you on telegram?\u00a0now you can join us on telegram as well. The post How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","bac
kground-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-51193","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/de\/how-to-build-a-fully-functional-custom-gpt-style-conversational-ai-locally-using-hugging-face-transformers\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/de\/how-to-build-a-fully-functional-custom-gpt-style-conversational-ai-locally-using-hugging-face-transformers\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta 
property=\"article:published_time\" content=\"2025-11-13T08:03:40+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"6\u00a0Minuten\" \/>\n<!-- \/ Yoast SEO plugin. -->"}