{"id":100312,"date":"2026-06-27T18:36:52","date_gmt":"2026-06-27T18:36:52","guid":{"rendered":"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/"},"modified":"2026-06-27T18:36:52","modified_gmt":"2026-06-27T18:36:52","slug":"deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/","title":{"rendered":"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1"},"content":{"rendered":"<p class=\"wp-block-paragraph\">DeepSeek released <strong><a href=\"https:\/\/github.com\/deepseek-ai\/DeepSpec\/blob\/main\/DSpark_paper.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">DSpark<\/a><\/strong>, a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints <code>DeepSeek-V4-Pro-DSpark<\/code> and <code>DeepSeek-V4-Flash-DSpark<\/code> reuse the existing V4 weights, with a draft module attached.<\/p>\n<p class=\"wp-block-paragraph\">The DeepSeek research team also <a href=\"https:\/\/huggingface.co\/deepseek-ai\/DeepSeek-V4-Pro-DSpark\" target=\"_blank\" rel=\"noreferrer noopener\">open-sourced DeepSpec<\/a>, an MIT-licensed codebase for training and evaluating speculative decoding drafters. 
The work targets one problem: faster large-model inference in busy production serving.<\/p>\n<h2 class=\"wp-block-heading\"><strong>TL;DR<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>DSpark pairs a parallel draft backbone with a tiny sequential head to cut suffix decay.<\/li>\n<li>A confidence head and load-aware scheduler verify more tokens when GPUs are idle, fewer when busy.<\/li>\n<li>Offline, accepted length rises 26\u201331% over Eagle3 and 16\u201318% over DFlash.<\/li>\n<li>In production on DeepSeek-V4, per-user generation runs 60\u201385% faster than the MTP-1 baseline.<\/li>\n<li>Output stays lossless, and the checkpoints plus DeepSpec training code are open-source.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>What Is DSpark?<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Speculative decoding splits generation into two roles. A small draft model proposes a block of tokens. The full target model then verifies that block in one forward pass.<\/p>\n<p class=\"wp-block-paragraph\">Rejection sampling accepts the longest valid prefix of the draft and appends one more token from the target: a correction where the first rejection occurs, or a bonus token when the whole block is accepted. Because the rule preserves the target distribution exactly, there is no quality loss. DSpark keeps this guarantee. What it changes is how tokens are drafted and how many get verified.<\/p>\n<h2 class=\"wp-block-heading\"><strong>The Latency Math It Optimizes<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Per-token latency follows one equation from the paper: <code>L = (T<sub>draft<\/sub> + T<sub>verify<\/sub>) \/ \u03c4<\/code>. Here <code>T<sub>draft<\/sub><\/code> and <code>T<sub>verify<\/sub><\/code> are the drafting and verification times for one cycle, and <code>\u03c4<\/code> is the number of tokens accepted per cycle. Speedup can therefore come from only three levers.<\/p>\n<p class=\"wp-block-paragraph\">You can draft faster, lowering <code>T<sub>draft<\/sub><\/code>. You can draft better, raising <code>\u03c4<\/code>. Or you can verify smarter, reducing wasted <code>T<sub>verify<\/sub><\/code>. 
DSpark pulls all three levers at once.<\/p>\n<h2 class=\"wp-block-heading\"><strong>How It Works: Semi-Autoregressive Generation<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Earlier drafters force a trade-off. Autoregressive drafters like Eagle3 condition each token on prior ones. That gives strong acceptance, but drafting cost grows with block size.<\/p>\n<p class=\"wp-block-paragraph\">Parallel drafters like DFlash produce the whole block in one pass. Drafting stays cheap, but each position ignores its neighbors. The result is \u2018multi-modal collision\u2019, where adjacent positions commit to different plausible continuations, and rapid acceptance decay along the suffix.<\/p>\n<p class=\"wp-block-paragraph\">DSpark splits drafting into two stages. A heavy parallel backbone, DFlash in the paper\u2019s setup, produces base logits for every position. Then a lightweight sequential head adds a prefix-dependent bias before sampling each token.<\/p>\n<p class=\"wp-block-paragraph\">The default sequential head is a Markov head. It looks only at the immediately preceding token. A low-rank factorization (rank 256) keeps it cheap, even with large vocabularies.<\/p>\n<p class=\"wp-block-paragraph\">For example, once position one samples \u2018of\u2019, the head boosts \u2018course\u2019 and suppresses \u2018problem\u2019 at the next position. An optional RNN head tracks the full block prefix. It adds only marginal gains, so the Markov head ships as the default.<\/p>\n<p class=\"wp-block-paragraph\">The payoff shows up position by position. DSpark inherits the parallel backbone\u2019s high first-token accuracy. The sequential head then holds acceptance steady deep into the block.<\/p>\n<p class=\"wp-block-paragraph\">Training freezes the target model and reuses its embedding and output head. A total-variation loss is the key term. 
Under rejection sampling, a drafted token is accepted with probability one minus the total-variation distance between the draft and target distributions, so minimizing that distance directly maximizes the draft\u2019s acceptance rate.<\/p>\n<h2 class=\"wp-block-heading\"><strong>How It Works: Confidence-Scheduled Verification<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">More draft tokens do not always mean more speed. Verifying tokens that will be rejected wastes batch capacity under heavy load. DSpark adds two parts to fix this.<\/p>\n<p class=\"wp-block-paragraph\">A confidence head outputs a score for each draft position. The score estimates the chance that the token survives verification, given that its predecessors were accepted. It is supervised by the analytical per-step acceptance rate.<\/p>\n<p class=\"wp-block-paragraph\">Raw neural confidence scores tend to be overconfident, so the research team applies Sequential Temperature Scaling, a post-hoc calibration step. It cuts expected calibration error from 3\u20138% down to about 1%.<\/p>\n<p class=\"wp-block-paragraph\">A hardware-aware prefix scheduler then sets the verification length per request. It uses a profiled throughput curve, <code>SPS(B)<\/code>, measured once at startup. When GPUs are idle, it verifies more tokens. When GPUs are busy, it verifies fewer.<\/p>\n<p class=\"wp-block-paragraph\">The scheduler uses an early-stopping rule to stay lossless. The paper\u2019s appendix gives a counterexample showing why a naive global search would leak information.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Metrics<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Offline tests cover math, code, and daily chat. Targets include Qwen3-4B, Qwen3-8B, Qwen3-14B, and Gemma4-12B. DSpark beats both baselines on accepted length across every domain.<\/p>\n<p class=\"wp-block-paragraph\">Against Eagle3, macro-average accepted length rises 30.9%, 26.7%, and 30.0% on the three Qwen3 sizes. Against DFlash, gains are 16.3%, 18.4%, and 18.3%. A 2-layer DSpark even beats a 5-layer DFlash.<\/p>\n<p class=\"wp-block-paragraph\">The sequential head adds little cost. 
Scaling draft length from 4 to 16 adds only 0.2\u20131.3% per-round latency. In return, accepted length improves by up to 30%.<\/p>\n<p class=\"wp-block-paragraph\">Production results come from DeepSeek-V4-Flash and V4-Pro under live traffic. The baseline is MTP-1, the prior single-token setup. At matched throughput, per-user speed rises 60\u201385% on Flash and 57\u201378% on Pro. The shipped configuration is DSpark-5, a five-token draft block with the Markov head.<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Drafter<\/th>\n<th>Drafting style<\/th>\n<th>Block cost<\/th>\n<th>Suffix acceptance<\/th>\n<th>Verification length<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Eagle3<\/td>\n<td>Autoregressive<\/td>\n<td>Grows with block size<\/td>\n<td>High, stable<\/td>\n<td>Fixed<\/td>\n<\/tr>\n<tr>\n<td>DFlash<\/td>\n<td>Parallel<\/td>\n<td>Near-constant<\/td>\n<td>Decays fast<\/td>\n<td>Fixed (full block)<\/td>\n<\/tr>\n<tr>\n<td>MTP-1<\/td>\n<td>Single-token (MTP)<\/td>\n<td>Low<\/td>\n<td>\u2014<\/td>\n<td>Static 2 tokens<\/td>\n<\/tr>\n<tr>\n<td>DSpark<\/td>\n<td>Parallel + sequential head<\/td>\n<td>Near-constant<\/td>\n<td>High, stable<\/td>\n<td>Dynamic, load-aware<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h2 class=\"wp-block-heading\"><strong>Use Cases With Examples<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Structured workloads gain the most from longer verification. In code generation, acceptance is naturally high. The scheduler can verify long prefixes with little waste, so coding agents stream output faster.<\/p>\n<p class=\"wp-block-paragraph\">Open-ended chat behaves differently. A confidence-threshold sweep raised chat acceptance from 45.7% to 95.7%. The confidence head flags uncertain suffix tokens so they can be pruned.<\/p>\n<p class=\"wp-block-paragraph\">Math reasoning sits between the two. Its acceptance rose from 76.9% to 92.5% in the same sweep. 
Long step-by-step traces benefit from steady deep-block acceptance.<\/p>\n<p class=\"wp-block-paragraph\">High-concurrency serving is the headline case. At moderate load, the scheduler verifies roughly 4\u20136 tokens per request. As concurrency rises, it trims that budget to protect throughput.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Try It<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">DeepSpec runs in three stages: data preparation, training, then evaluation. A config selects the algorithm and target model. Evaluation benchmarks a trained draft checkpoint across nine datasets.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\"># Install dependencies\npython -m pip install -r requirements.txt\n\n# Train a DSpark draft against a Qwen3-4B target.\n# The algorithm and target are chosen by the config, e.g.\n# config\/dspark\/dspark_qwen3_4b.py\nbash scripts\/train\/train.sh\n\n# Evaluate the trained draft across the 9 benchmark datasets.\n# Set in the eval config:\n#   target_name_or_path = Qwen\/Qwen3-4B\n#   draft_name_or_path  = ~\/checkpoints\/deepspec\/dspark_block8_qwen3_4b\/step_latest\nbash scripts\/eval\/eval.sh<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">The default configs assume one node with 8 GPUs. To run on fewer, list fewer devices in <code>CUDA_VISIBLE_DEVICES<\/code>. 
Note the target cache can be large, near 38 TB for the Qwen3-4B setting.<\/p>\n<p class=\"wp-block-paragraph\">For the production checkpoints, the draft module attaches to the existing V4 weights. The Hugging Face cards include a minimal inference example in the <code>inference<\/code> folder. No retraining of the target model is required.<\/p>\n<p class=\"wp-block-paragraph\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<\/p><p class=\"wp-block-paragraph\">Check out the\u00a0<strong><a href=\"https:\/\/github.com\/deepseek-ai\/DeepSpec\/blob\/main\/DSpark_paper.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>, <a href=\"https:\/\/github.com\/deepseek-ai\/DeepSpec\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a>\u00a0<\/strong>and\u00a0<strong><a href=\"https:\/\/huggingface.co\/deepseek-ai\/DeepSeek-V4-Pro-DSpark\" target=\"_blank\" rel=\"noreferrer noopener\">Model weight on HF<\/a><\/strong>.<\/p>\n
<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/06\/27\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/\">DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>DeepSeek released DSpark, a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark reuse the existing V4 weights, with a draft module attached. The DeepSeek research team also open-sourced DeepSpec, an MIT-licensed codebase for training and evaluating speculative decoding drafters. The work targets one problem: faster large-model inference in busy production serving. TL;DR DSpark pairs a parallel draft backbone with a tiny sequential head to cut suffix decay. A confidence head and load-aware scheduler verify more tokens when GPUs are idle, fewer when busy. Offline, accepted length rises 26\u201331% over Eagle3 and 16\u201318% over DFlash. In production on DeepSeek-V4, per-user generation runs 60\u201385% faster than the MTP-1 baseline. Output stays lossless, and the checkpoints plus DeepSpec training code are open-source. What is DSpark? 
Speculative decoding splits generation into two roles. A small draft model proposes a block of tokens. The full target model then verifies that block in one forward pass. Rejection sampling accepts the longest valid prefix and appends one bonus token. Because the rule preserves the target distribution exactly, there is no quality loss. DSpark keeps this guarantee. It changes how tokens are drafted and how many get verified. The Latency Math it Optimizes Per-token latency follows one equation from the paper: L = (Tdraft + Tverify) \/ \u03c4. Here \u03c4 is the number of tokens accepted per cycle. Speedup comes from three levers only. You can draft faster, lowering Tdraft. You can draft better, raising \u03c4. Or you can verify smarter, reducing wasted Tverify. DSpark pulls all three levers at once. How It Works: Semi-Autoregressive Generation Earlier drafters force a trade-off. Autoregressive drafters like Eagle3 condition each token on prior ones. That gives strong acceptance, but drafting cost grows with block size. Parallel drafters like DFlash produce the whole block in one pass. Drafting stays cheap, but each position ignores its neighbors. The result is \u2018multi-modal collision\u2019 and rapid acceptance decay along the suffix. DSpark splits drafting into two stages. A heavy parallel backbone, DFlash in their setup, produces base logits for every position. Then a lightweight sequential head adds a prefix-dependent bias before sampling each token. The default sequential head is a Markov head. It only looks at the immediately preceding token. A low-rank factorization (rank 256) keeps it cheap, even with large vocabularies. Once position one samples \u2018of\u2019, the head boosts \u2018course\u2019 and suppresses \u2018problem\u2019. An optional RNN head tracks the full block prefix. It adds only marginal gains, so the Markov head ships as the default. The payoff shows up position by position. 
DSpark inherits the parallel backbone\u2019s high first-token accuracy. The sequential head then holds acceptance steady deep into the block. Training freezes the target model and reuses its embedding and output head. A total-variation loss is the key term. Minimizing that distance directly maximizes the draft\u2019s acceptance rate. How It Works: Confidence-Scheduled Verification More draft tokens do not always mean more speed. Verifying tokens that will be rejected wastes batch capacity under heavy load. DSpark adds two parts to fix this. A confidence head outputs a score for each draft position. The score estimates the chance that token survives verification, given accepted predecessors. It is supervised by the analytical per-step acceptance rate. Raw neural confidence is usually overconfident. So the research team applies Sequential Temperature Scaling, a post-hoc calibration step. It cuts expected calibration error from 3\u20138% down to about 1%. A hardware-aware prefix scheduler then sets the verification length per request. It uses a profiled throughput curve, SPS(B), measured once at startup. When GPUs are idle, it verifies more tokens. When GPUs are busy, it verifies fewer. The scheduler uses an early-stopping rule to stay lossless. The appendix section gives a counterexample showing why a naive global search would leak information. Metrics Offline tests cover math, code, and daily chat. Targets include Qwen3-4B, 8B, 14B, and Gemma4-12B. DSpark beats both baselines on accepted length across every domain. Against Eagle3, macro-average accepted length rises 30.9%, 26.7%, and 30.0% on the three Qwen3 sizes. Against DFlash, gains are 16.3%, 18.4%, and 18.3%. A 2-layer DSpark even beats a 5-layer DFlash. The sequential head adds little cost. Scaling draft length from 4 to 16 adds only 0.2\u20131.3% per-round latency. In return, accepted length improves by up to 30%. Production results come from DeepSeek-V4-Flash and V4-Pro under live traffic. 
The baseline is MTP-1, the prior single-token setup. At matched throughput, per-user speed rises 60\u201385% on Flash and 57\u201378% on Pro. The shipped configuration is DSpark-5, a five-token draft block with the Markov head. Drafter Drafting style Block cost Suffix acceptance Verification length Eagle3 Autoregressive Grows with block size High, stable Fixed DFlash Parallel Near-constant Decays fast Fixed (full block) MTP-1 Single-token (MTP) Low \u2014 Static 2 tokens DSpark Parallel + sequential head Near-constant High, stable Dynamic, load-aware Use Cases With Examples Structured workloads gain the most from longer verification. In code generation, acceptance is naturally high. The scheduler can verify long prefixes with little waste, so coding agents stream output faster. Open-ended chat behaves differently. A confidence-threshold sweep raised chat acceptance from 45.7% to 95.7%. The confidence head flags uncertain suffix tokens so they can be pruned. Math reasoning sits between the two. Its acceptance rose from 76.9% to 92.5% in the same sweep. Long step-by-step traces benefit from steady deep-block acceptance. High-concurrency serving is the headline case. At moderate load, the scheduler runs roughly 4\u20136 verified tokens per request. As concurrency rises, it trims that budget to protect throughput. Try It DeepSpec runs in three stages: data preparation, training, then evaluation. A config selects the algorithm and target model. Evaluation benchmarks a trained draft checkpoint across nine datasets. Copy CodeCopiedUse a different Browser # Install dependencies python -m pip install -r requirements.txt # Train a DSpark draft against a Qwen3-4B target. # The algorithm and target are chosen by the config, e.g. # config\/dspark\/dspark_qwen3_4b.py bash scripts\/train\/train.sh # Evaluate the trained draft across the 9 benchmark datasets. 
# Set in<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-100312","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1 - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, 
max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1 - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-27T18:36:52+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1\",\"datePublished\":\"2026-06-27T18:36:52+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/\"},\"wordCount\":1131,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/\",\"url\":\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/\",\"name\":\"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 
60\u201385% Over MTP-1 - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2026-06-27T18:36:52+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1 - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/","og_locale":"th_TH","og_type":"article","og_title":"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1 - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-06-27T18:36:52+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. 
reading time":"6 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1","datePublished":"2026-06-27T18:36:52+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/"},"wordCount":1131,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/","url":"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/","name":"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1 - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2026-06-27T18:36:52+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60\u201385% Over MTP-1"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association 
Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"DeepSeek released DSpark, a speculative decoding framework, with open-source checkpoints and training code. It is a serving optimization, not a new model. The checkpoints DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark reuse the existing V4 weights, with a draft module attached. 
The DeepSeek research team also open-sourced DeepSpec, an MIT-licensed codebase for training and evaluating speculative decoding drafters. The&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/100312","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=100312"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/100312\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=100312"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=100312"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=100312"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}