{"id":76333,"date":"2026-03-09T12:14:53","date_gmt":"2026-03-09T12:14:53","guid":{"rendered":"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/"},"modified":"2026-03-09T12:14:53","modified_gmt":"2026-03-09T12:14:53","slug":"a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation","status":"publish","type":"post","link":"https:\/\/youzum.net\/th\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/","title":{"rendered":"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation"},"content":{"rendered":"<p>In this tutorial, we build a complete pipeline for single-cell RNA sequencing analysis using <a href=\"https:\/\/github.com\/scverse\/scanpy\"><strong>Scanpy<\/strong><\/a>. We start by installing the required libraries and loading the PBMC 3k dataset, then perform quality control, filtering, and normalization to prepare the data for downstream analysis. We then identify highly variable genes, perform PCA for dimensionality reduction, and construct a neighborhood graph to generate UMAP embeddings and Leiden clusters. 
Through marker gene discovery and visualization, we explore how clusters correspond to biological cell populations and implement a simple rule-based annotation strategy to infer cell types.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">import sys\nimport subprocess\nimport importlib\n\n\n# Install packages into the interpreter that is running this notebook or script.\ndef pip_install(*packages):\n   subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", *packages])\n\n\nrequired = [\n   \"scanpy\",\n   \"anndata\",\n   \"leidenalg\",\n   \"igraph\",\n   \"harmonypy\",\n   \"seaborn\"\n]\npip_install(*required)\n\n\nimport os\nimport warnings\nwarnings.filterwarnings(\"ignore\")\n\n\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport scanpy as sc\nimport anndata as ad\n\n\n# display() is built into notebooks; fall back to print() in a plain Python session.\ntry:\n   from IPython.display import display\nexcept ImportError:\n   display = print\n\n\nsc.settings.verbosity = 2\nsc.settings.set_figure_params(dpi=110, facecolor=\"white\", frameon=False)\nnp.random.seed(42)\n\n\nprint(\"Scanpy version:\", sc.__version__)\n\n\nadata = sc.datasets.pbmc3k()\nadata.var_names_make_unique()\n\n\nprint(\"\\nInitial AnnData:\")\nprint(adata)\n\n\n# Keep the raw counts in a layer before normalization overwrites adata.X.\nadata.layers[\"counts\"] = adata.X.copy()\n\n\n# Flag mitochondrial genes (prefix \"MT-\") so QC reports their share of counts per cell.\nadata.var[\"mt\"] = adata.var_names.str.upper().str.startswith(\"MT-\")\nsc.pp.calculate_qc_metrics(adata, qc_vars=[\"mt\"], percent_top=None, log1p=False, inplace=True)\n\n\nprint(\"\\nQC summary:\")\ndisplay(\n   adata.obs[[\"n_genes_by_counts\", \"total_counts\", 
\"pct_counts_mt\"]].describe().T\n)\n<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We install the required dependencies and import the core scientific computing libraries needed for the analysis. We configure Scanpy\u2019s verbosity and figure settings, set the NumPy random seed, and load the PBMC 3k single-cell RNA-seq dataset, keeping a copy of the raw counts in a separate layer. We then compute per-cell quality-control metrics: the number of detected genes, the total counts, and the percentage of counts from mitochondrial genes.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\">fig, axs = plt.subplots(1, 3, figsize=(15, 4))\nsc.pl.violin(adata, [\"n_genes_by_counts\"], jitter=0.4, ax=axs[0], show=False)\nsc.pl.violin(adata, [\"total_counts\"], jitter=0.4, ax=axs[1], show=False)\nsc.pl.violin(adata, [\"pct_counts_mt\"], jitter=0.4, ax=axs[2], show=False)\nplt.tight_layout()\nplt.show()\n\n\nsc.pl.scatter(adata, x=\"total_counts\", y=\"n_genes_by_counts\", color=\"pct_counts_mt\")\n\n\n# Keep cells with 200 to 5000 detected genes and under 10% mitochondrial counts.\nadata = adata[adata.obs[\"n_genes_by_counts\"] &gt;= 200].copy()\nadata = adata[adata.obs[\"n_genes_by_counts\"] &lt;= 5000].copy()\nadata = adata[adata.obs[\"pct_counts_mt\"] &lt; 10].copy()\n# Drop genes detected in fewer than 3 cells.\nsc.pp.filter_genes(adata, min_cells=3)\n\n\nprint(\"\\nAfter filtering:\")\nprint(adata)\n\n\nsc.pp.normalize_total(adata, target_sum=1e4)\nsc.pp.log1p(adata)\n# Freeze the log-normalized values for all genes in .raw before subsetting to HVGs.\nadata.raw = adata.copy()\n\n\nsc.pp.highly_variable_genes(\n   adata,\n   flavor=\"seurat\",\n   min_mean=0.0125,\n   max_mean=3,\n   
min_disp=0.5\n)\n\n\nprint(\"\\nHighly variable genes selected:\", int(adata.var[\"highly_variable\"].sum()))\n\n\nsc.pl.highly_variable_genes(adata)\n\n\nadata = adata[:, adata.var[\"highly_variable\"]].copy()<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We plot the quality-control metrics as violin and scatter plots to check the distribution of gene counts, total counts, and mitochondrial content. We keep cells with 200 to 5,000 detected genes and less than 10% mitochondrial counts, and we drop genes detected in fewer than three cells. We then normalize each cell to 10,000 total counts, apply a log1p transformation, store the result in <code>adata.raw<\/code>, and restrict the matrix to the highly variable genes for downstream analysis.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\"># Regress out sequencing depth and mitochondrial share, then scale each gene (values clipped at 10).\nsc.pp.regress_out(adata, [\"total_counts\", \"pct_counts_mt\"])\nsc.pp.scale(adata, max_value=10)\n\n\nsc.tl.pca(adata, svd_solver=\"arpack\")\nsc.pl.pca_variance_ratio(adata, log=True)\nsc.pl.pca(adata, color=None)\n\n\n# Build the neighbor graph on 30 PCs, embed it with UMAP, and cluster it with Leiden.\nsc.pp.neighbors(adata, n_neighbors=12, n_pcs=30, metric=\"euclidean\")\nsc.tl.umap(adata, min_dist=0.35, spread=1.0)\nsc.tl.leiden(adata, resolution=0.6, key_added=\"leiden\")\n\n\nprint(\"\\nCluster counts:\")\ndisplay(adata.obs[\"leiden\"].value_counts().sort_index().rename(\"cells_per_cluster\").to_frame())\n\n\nsc.pl.umap(adata, color=[\"leiden\"], legend_loc=\"on data\", title=\"PBMC 3k - Leiden clusters\")\n\n\n# Rank marker genes for each cluster with a Wilcoxon rank-sum test.\nsc.tl.rank_genes_groups(adata, groupby=\"leiden\", method=\"wilcoxon\")\nsc.pl.rank_genes_groups(adata, n_genes=20, 
sharey=False)\n\n\nmarker_table = sc.get.rank_genes_groups_df(adata, group=None)\nprint(\"\\nTop marker rows:\")\ndisplay(marker_table.head(20))<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We regress out total counts and mitochondrial percentage as technical confounders and scale each gene, clipping values at 10, to prepare the data for dimensionality reduction. We perform principal component analysis to capture the dataset\u2019s major variance structure. We then construct the neighborhood graph on the first 30 principal components, compute UMAP embeddings, perform Leiden clustering, and rank marker genes for each cluster with a Wilcoxon rank-sum test.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\"># Keep the 10 top-ranked marker genes for each cluster.\ntop_markers_per_cluster = (\n   marker_table.groupby(\"group\")\n   .head(10)\n   .loc[:, [\"group\", \"names\", \"logfoldchanges\", \"pvals_adj\"]]\n   .reset_index(drop=True)\n)\nprint(\"\\nTop 10 markers per cluster:\")\ndisplay(top_markers_per_cluster)\n\n\ncandidate_markers = [\n   \"IL7R\", \"LTB\", \"MALAT1\", \"CCR7\",\n   \"NKG7\", \"GNLY\", \"PRF1\",\n   \"MS4A1\", \"CD79A\", \"CD79B\",\n   \"LYZ\", \"S100A8\", \"FCER1A\", \"CST3\",\n   \"PPBP\", \"FCGR3A\", \"LGALS3\", \"CTSS\",\n   \"CD3D\", \"TRBC1\", \"TRAC\"\n]\n# Plot only the markers that survived filtering and highly variable gene selection.\ncandidate_markers = [g for g in candidate_markers if g in adata.var_names]\n\n\nif candidate_markers:\n   sc.pl.dotplot(\n       adata,\n       var_names=candidate_markers,\n       groupby=\"leiden\",\n       standard_scale=\"var\",\n       dendrogram=True\n   )\n   sc.pl.matrixplot(\n       adata,\n       
var_names=candidate_markers,\n       groupby=\"leiden\",\n       standard_scale=\"var\",\n       dendrogram=True\n   )\n\n\n# Reference marker sets used later to score and label clusters.\ncluster_marker_reference = {\n   \"T_cells\": [\"IL7R\", \"LTB\", \"CCR7\", \"CD3D\", \"TRBC1\", \"TRAC\"],\n   \"NK_cells\": [\"NKG7\", \"GNLY\", \"PRF1\"],\n   \"B_cells\": [\"MS4A1\", \"CD79A\", \"CD79B\"],\n   \"Monocytes\": [\"LYZ\", \"FCGR3A\", \"LGALS3\", \"CTSS\", \"S100A8\", \"CST3\"],\n   \"Dendritic_cells\": [\"FCER1A\", \"CST3\"],\n   \"Platelets\": [\"PPBP\"]\n}\n<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We summarize the ten top-ranked marker genes for each cluster. We visualize the expression of known immune cell markers across clusters using dot plots and matrix plots, restricted to the markers that remain in the filtered gene set. We also define a reference mapping of marker genes to major immune cell types for later annotation.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\"no-line-numbers\"><code class=\"no-wrap language-php\"># Restrict each reference set to genes present in the data and drop empty sets.\navailable_reference = {\n   celltype: [g for g in genes if g in adata.var_names]\n   for celltype, genes in cluster_marker_reference.items()\n}\navailable_reference = {k: v for k, v in available_reference.items() if len(v) &gt; 0}\n\n\n# Score every cell against each marker set; one obs column per cell type.\nfor celltype, genes in available_reference.items():\n   sc.tl.score_genes(adata, gene_list=genes, score_name=f\"{celltype}_score\", use_raw=False)\n\n\nscore_cols = [f\"{ct}_score\" for ct in available_reference.keys()]\ncluster_scores = 
adata.obs.groupby(\"leiden\")[score_cols].mean()\ndisplay(cluster_scores)\n\n\n# Label each cluster with the cell type whose mean score is highest.\ncluster_to_celltype = {}\nfor cluster in cluster_scores.index:\n   best = cluster_scores.loc[cluster].idxmax().replace(\"_score\", \"\")\n   cluster_to_celltype[cluster] = best\n\n\nadata.obs[\"cell_type\"] = adata.obs[\"leiden\"].map(cluster_to_celltype).astype(\"category\")\n\n\nprint(\"\\nCluster to cell-type mapping:\")\ndisplay(pd.DataFrame.from_dict(cluster_to_celltype, orient=\"index\", columns=[\"assigned_cell_type\"]))\n\n\nsc.pl.umap(\n   adata,\n   color=[\"leiden\", \"cell_type\"],\n   legend_loc=\"on data\",\n   wspace=0.45\n)\n\n\n# Re-run differential expression, this time grouped by the assigned cell type.\nsc.tl.rank_genes_groups(adata, groupby=\"cell_type\", method=\"wilcoxon\")\nsc.pl.rank_genes_groups(adata, n_genes=15, sharey=False)\n\n\ncelltype_markers = sc.get.rank_genes_groups_df(adata, group=None)\nprint(\"\\nTop markers by annotated cell type:\")\ndisplay(\n   celltype_markers.groupby(\"group\").head(8)[[\"group\", \"names\", \"logfoldchanges\", \"pvals_adj\"]]\n)\n\n\n# Percentage of cells assigned to each cell type.\ncluster_prop = (\n   adata.obs[\"cell_type\"]\n   .value_counts(normalize=True)\n   .mul(100)\n   .round(2)\n   .rename(\"percent\")\n   .to_frame()\n)\nprint(\"\\nCell-type proportions (%):\")\ndisplay(cluster_prop)\n\n\nplt.figure(figsize=(7, 4))\ncluster_prop[\"percent\"].sort_values().plot(kind=\"barh\")\nplt.xlabel(\"Percent of cells\")\nplt.ylabel(\"Cell type\")\nplt.title(\"Estimated cell-type composition\")\nplt.tight_layout()\nplt.show()\n\n\noutput_dir = \"scanpy_pbmc3k_outputs\"\nos.makedirs(output_dir, exist_ok=True)\n\n\n# Save the annotated AnnData object and the marker and score tables.\nadata.write(os.path.join(output_dir, \"pbmc3k_scanpy_advanced.h5ad\"))\nmarker_table.to_csv(os.path.join(output_dir, \"cluster_markers.csv\"), index=False)\ncelltype_markers.to_csv(os.path.join(output_dir, \"celltype_markers.csv\"), index=False)\ncluster_scores.to_csv(os.path.join(output_dir, \"cluster_score_matrix.csv\"))\n\n\nprint(f\"\\nSaved outputs to: {output_dir}\")\nprint(\"Files:\")\nfor f in sorted(os.listdir(output_dir)):\n   print(\" -\", 
f)\n\n\nsummary = {\n   \"n_cells_final\": int(adata.n_obs),\n   \"n_genes_final\": int(adata.n_vars),\n   \"n_clusters\": int(adata.obs[\"leiden\"].nunique()),\n   \"clusters\": sorted(adata.obs[\"leiden\"].unique().tolist()),\n   \"cell_types\": sorted(adata.obs[\"cell_type\"].unique().tolist()),\n}\nprint(\"\\nAnalysis summary:\")\nfor k, v in summary.items():\n   print(f\"{k}: {v}\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We score each cell against the known marker gene sets, average the scores within each Leiden cluster, and assign every cluster the cell type with the highest mean score. We visualize the annotated cell types on the UMAP embedding and repeat the differential expression analysis across the predicted cell populations. Finally, we compute cell-type proportions, plot the estimated composition, and save the processed dataset and analysis tables for further work.<\/p>\n<p>In conclusion, we built an end-to-end workflow for analyzing single-cell transcriptomic data using Scanpy. We performed preprocessing, clustering, marker-gene analysis, and cell-type annotation, and visualized the data structure using UMAP and gene expression plots. By saving the processed AnnData object and analysis outputs, we created a reusable dataset for further biological interpretation and advanced modeling. 
This workflow demonstrates how Scanpy enables scalable, reproducible single-cell analysis through a structured, modular Python pipeline.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Data%20Science\/scanpy_single_cell_rna_seq_analysis_pipeline_marktechpost.py\" target=\"_blank\" rel=\"noreferrer noopener\">Full Codes here<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/03\/08\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/\">A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build a complete pipeline for single-cell RNA sequencing analysis using Scanpy. 
We start by installing the required libraries and loading the PBMC 3k dataset, then perform quality control, filtering, and normalization to prepare the data for downstream analysis. We then identify highly variable genes, perform PCA for dimensionality reduction, and construct a neighborhood graph to generate UMAP embeddings and Leiden clusters. Through marker gene discovery and visualization, we explore how clusters correspond to biological cell populations and implement a simple rule-based annotation strategy to infer cell types. 
We install all required dependencies and import the core scientific computing libraries needed for the analysis. We configure Scanpy settings, initialize the environment, and load the PBMC 3k single-cell RNA-seq dataset. We then compute quality-control metrics, including mitochondrial gene percentage, total counts, and the number of detected genes, for each cell. We visualize quality control metrics using plots to check the distribution of gene counts and mitochondrial content. We apply filtering steps to remove low-quality cells and genes that do not meet basic expression thresholds. We then normalize the data, apply a log transformation, and identify highly variable genes for downstream analysis. We regress out technical confounders and scale the dataset to prepare it for dimensionality reduction. We perform principal component analysis to capture the dataset\u2019s major variance structure. We then construct the neighborhood graph, compute UMAP embeddings, perform Leiden clustering, and identify marker genes for each cluster. 
We examine the most significant marker genes detected for each cluster and summarize the top markers. We visualize gene expression patterns across clusters using dot plots and matrix plots for known immune cell markers. We also define a reference mapping of marker genes associated with major immune cell types for later annotation. We score each cell using known marker gene sets and assign probable cell types to clusters based on expression patterns. We visualize the annotated cell types on the UMAP embedding and perform differential gene expression analysis across the predicted cell populations. Also, we compute cell-type proportions, generate summary visualizations, and save the processed dataset and analysis outputs for further research. In conclusion, we developed a full end-to-end workflow for analyzing single-cell transcriptomic data using Scanpy. We performed preprocessing, clustering, marker-gene analysis, and cell-type annotation, and visualized the data structure using UMAP and gene expression plots. 
By saving the processed AnnData object and analysis outputs, we created a reusable dataset for further biological interpretation and advanced modeling. This workflow demonstrates how Scanpy enables scalable, reproducible single-cell analysis through a structured, modular Python pipeline. The post A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-76333","post","type-post","status-publish","format-standard","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/th\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/th\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/\" \/>\n<meta property=\"og:site_name\" 
content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-09T12:14:53+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type 
Annotation\",\"datePublished\":\"2026-03-09T12:14:53+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/\"},\"wordCount\":552,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/\",\"url\":\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/\",\"name\":\"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation - 
YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"datePublished\":\"2026-03-09T12:14:53+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association 
Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin NU\"},\"url\":\"https:\/\/youzum.net\/th\/members\/adminnu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation - YouZum","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/youzum.net\/th\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/","og_locale":"th_TH","og_type":"article","og_title":"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation - YouZum","og_description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","og_url":"https:\/\/youzum.net\/th\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/","og_site_name":"YouZum","article_publisher":"https:\/\/www.facebook.com\/DroneAssociationTH\/","article_published_time":"2026-03-09T12:14:53+00:00","author":"admin NU","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin NU","Est. 
reading time":"7 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/#article","isPartOf":{"@id":"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/"},"author":{"name":"admin NU","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c"},"headline":"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation","datePublished":"2026-03-09T12:14:53+00:00","mainEntityOfPage":{"@id":"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/"},"wordCount":552,"commentCount":0,"publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"articleSection":["AI","Committee","News","Uncategorized"],"inLanguage":"th","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/","url":"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/","name":"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation - 
YouZum","isPartOf":{"@id":"https:\/\/yousum.gpucore.co\/#website"},"datePublished":"2026-03-09T12:14:53+00:00","description":"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19","breadcrumb":{"@id":"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/youzum.net\/a-coding-guide-to-build-a-complete-single-cell-rna-sequencing-analysis-pipeline-using-scanpy-for-clustering-visualization-and-cell-type-annotation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/youzum.net\/"},{"@type":"ListItem","position":2,"name":"A Coding Guide to Build a Complete Single Cell RNA Sequencing Analysis Pipeline Using Scanpy for Clustering Visualization and Cell Type Annotation"}]},{"@type":"WebSite","@id":"https:\/\/yousum.gpucore.co\/#website","url":"https:\/\/yousum.gpucore.co\/","name":"YouSum","description":"","publisher":{"@id":"https:\/\/yousum.gpucore.co\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/yousum.gpucore.co\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"},{"@type":"Organization","@id":"https:\/\/yousum.gpucore.co\/#organization","name":"Drone Association 
Thailand","url":"https:\/\/yousum.gpucore.co\/","logo":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png","width":300,"height":300,"caption":"Drone Association Thailand"},"image":{"@id":"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DroneAssociationTH\/"]},{"@type":"Person","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c","name":"admin NU","image":{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/","url":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","contentUrl":"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png","caption":"admin NU"},"url":"https:\/\/youzum.net\/th\/members\/adminnu\/"}]}},"rttpg_featured_image_url":null,"rttpg_author":{"display_name":"admin NU","author_link":"https:\/\/youzum.net\/th\/members\/adminnu\/"},"rttpg_comment":0,"rttpg_category":"<a href=\"https:\/\/youzum.net\/th\/category\/ai-club\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/committee\/\" rel=\"category tag\">Committee<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/news\/\" rel=\"category tag\">News<\/a> <a href=\"https:\/\/youzum.net\/th\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","rttpg_excerpt":"In this tutorial, we build a complete pipeline for single-cell RNA sequencing analysis using Scanpy. We start by installing the required libraries and loading the PBMC 3k dataset, then perform quality control, filtering, and normalization to prepare the data for downstream analysis. 
We then identify highly variable genes, perform PCA for dimensionality reduction, and construct&hellip;","_links":{"self":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/76333","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/comments?post=76333"}],"version-history":[{"count":0,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/posts\/76333\/revisions"}],"wp:attachment":[{"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/media?parent=76333"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/categories?post=76333"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/youzum.net\/th\/wp-json\/wp\/v2\/tags?post=76333"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}