In this tutorial, we build an end-to-end spatial graph learning pipeline using city2graph. We start by collecting real urban POI data and street network information from OpenStreetMap, with a synthetic fallback so the workflow still runs when the download fails. We then engineer spatial features, construct multiple proximity graph families, and compare how different graph-building strategies represent the same urban environment. After that, we create both heterogeneous and homogeneous graph structures, convert them into PyTorch Geometric format, and train a GraphSAGE model to predict POI categories from spatial structure. Through this process, we integrate geospatial data processing, graph construction, and GNN-based urban function inference into a single practical workflow.

Installing city2graph and Importing Geospatial and Graph Learning Libraries

```python
!pip -q install "city2graph[cpu]" osmnx contextily scikit-learn 2>/dev/null

import warnings, numpy as np, pandas as pd, geopandas as gpd
warnings.filterwarnings("ignore")
from shapely.geometry import Point
import matplotlib.pyplot as plt

import city2graph as c2g
print("city2graph version:", getattr(c2g, "__version__", "unknown"))
print("PyTorch / PyG available:", c2g.is_torch_available())

import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv, to_hetero
from torch_geometric.utils import to_undirected
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import accuracy_score, f1_score
from sklearn.decomposition import PCA

SEED = 42
np.random.seed(SEED); torch.manual_seed(SEED)
```

We begin by installing the required libraries and importing the geospatial, graph learning, and machine learning tools used throughout the tutorial. We verify that city2graph and PyTorch Geometric are available so the rest of the workflow can run properly.
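If you run this outside a fresh notebook, it can help to check for missing packages before the long-running steps start. The sketch below is an optional addition, not part of the original tutorial: `missing_modules` and the `REQUIRED` list are hypothetical names introduced here, and the check relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list: the import names of the packages this tutorial relies on.
REQUIRED = ["city2graph", "osmnx", "geopandas", "sklearn", "torch", "torch_geometric"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

The check only confirms that each package can be located; it does not import it, so version conflicts would still surface later at import time.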
We also set a fixed random seed to make the graph construction, training split, and model results more reproducible.

Collecting OpenStreetMap POI Data with a Synthetic Fallback

```python
CENTER = (35.6595, 139.7005)   # (lat, lon) of the study area
DIST_M = 1100                  # query radius in metres

TAG_QUERIES = {
    "food":      {"amenity": ["restaurant", "cafe", "fast_food", "bar", "pub"]},
    "retail":    {"shop": True},
    "education": {"amenity": ["school", "university", "college", "kindergarten", "library"]},
    "health":    {"amenity": ["hospital", "clinic", "pharmacy", "doctors", "dentist"]},
}

def to_points(gdf):
    # Collapse polygons and lines to a single point inside each geometry.
    g = gdf.copy()
    g["geometry"] = g.geometry.representative_point()
    return g

poi_gdf, segments_gdf = None, None
try:
    import osmnx as ox
    ox.settings.use_cache = True
    ox.settings.log_console = False
    frames = []
    for label, tags in TAG_QUERIES.items():
        try:
            f = ox.features_from_point(CENTER, tags=tags, dist=DIST_M)
            f = f[f.geometry.notna()]
            if len(f):
                f = to_points(f)[["geometry"]].copy()
                f["category"] = label
                frames.append(f)
        except Exception as e:
            print(f"  (skip {label}: {e})")
    if not frames:
        raise RuntimeError("No POIs returned from Overpass.")
    poi_gdf = gpd.GeoDataFrame(pd.concat(frames, ignore_index=True), crs="EPSG:4326")
    G = ox.graph_from_point(CENTER, dist=DIST_M, network_type="walk")
    segments_gdf = ox.graph_to_gdfs(G, nodes=False, edges=True).reset_index(drop=True)[["geometry"]]
    print(f"OSM acquisition OK -> {len(poi_gdf)} POIs, {len(segments_gdf)} street segments")
except Exception as e:
    # Fallback: 8 Gaussian clusters, each dominated by one category.
    print(f"OSM unavailable ({e}) -> generating synthetic clustered POIs.")
    rng = np.random.default_rng(SEED)
    cats = list(TAG_QUERIES.keys())
    centers = rng.uniform(-0.01, 0.01, size=(8, 2)) + np.array(CENTER[::-1])
    rows = []
    for ci, c in enumerate(centers):
        dom = cats[ci % len(cats)]
        n = rng.integers(40, 90)
        pts = c + rng.normal(0, 0.0016, size=(n, 2))
        for (lon, lat) in pts:
            cat = dom if rng.random() < 0.75 else rng.choice(cats)
            rows.append({"geometry": Point(lon, lat), "category": cat})
    poi_gdf = gpd.GeoDataFrame(rows, crs="EPSG:4326")
    segments_gdf = None
    print(f"Synthetic dataset -> {len(poi_gdf)} POIs")

# Cap the dataset size to keep graph construction and training fast.
if len(poi_gdf) > 700:
    poi_gdf = poi_gdf.sample(700, random_state=SEED).reset_index(drop=True)

# Reproject to a metric CRS so that distances are in metres.
metric_crs = poi_gdf.estimate_utm_crs()
poi_gdf = poi_gdf.to_crs(metric_crs).reset_index(drop=True)
if segments_gdf is not None:
    segments_gdf = segments_gdf.to_crs(metric_crs)

print("Class balance:\n", poi_gdf["category"].value_counts())
```

We collect real POI data from OpenStreetMap within 1,100 m of a point in Shibuya, Tokyo, and group the locations into four broad urban function categories: food, retail, education, and health. We also download the walkable street network so that the POIs can later be connected with urban-form features. If the OSM request fails, we generate a synthetic clustered dataset, which keeps the tutorial runnable even when online data access is unavailable. In either case, we cap the dataset at 700 POIs and reproject it to a UTM coordinate system so that all later distances are in metres.

Engineering Spatial Features and Building Proximity Graph Families

```python
poi_gdf["cx"] = poi_gdf.geometry.x
poi_gdf["cy"] = poi_gdf.geometry.y
coords = poi_gdf[["cx", "cy"]].to_numpy()

# Local density: number of other POIs within 150 m (the point itself is excluded).
nn = NearestNeighbors(radius=150.0).fit(coords)
poi_gdf["local_density"] = [len(idx) - 1 for idx in nn.radius_neighbors(coords, return_distance=False)]

# Distance to the nearest street segment; 0.0 when no street network is available.
if segments_gdf is not None and len(segments_gdf):
    try:
        joined = gpd.sjoin_nearest(poi_gdf[["geometry"]], segments_gdf[["geometry"]], distance_col="dist_street")
        poi_gdf["dist_street"] = joined.groupby(level=0)["dist_street"].min().reindex(poi_gdf.index).fillna(0.0)
    except Exception:
        poi_gdf["dist_street"] = 0.0
else:
    poi_gdf["dist_street"] = 0.0

poi_gdf["category"] = poi_gdf["category"].astype("category")
poi_gdf["label"] = poi_gdf["category"].cat.codes.astype(int)
CLASS_NAMES = list(poi_gdf["category"].cat.categories)
print("Classes:", CLASS_NAMES)

def graph_stats(name, builder):
    # Build one graph and report its edge count and mean degree.
    try:
        nodes, edges = builder()
        deg = pd.Series(np.r_[edges.index.get_level_values(0), edges.index.get_level_values(1)]).value_counts()
        return name, len(edges), round(deg.mean(), 2), (nodes, edges)
    except Exception as e:
        return name, f"ERR: {e}", None, None

builders = {
    "KNN (k=8)": lambda: c2g.knn_graph(poi_gdf, distance_metric="euclidean", k=8, as_nx=False),
    "Delaunay":  lambda: c2g.delaunay_graph(poi_gdf, as_nx=False),
    "Gabriel":   lambda: c2g.gabriel_graph(poi_gdf, as_nx=False),
    "RNG":       lambda: c2g.relative_neighborhood_graph(poi_gdf, as_nx=False),
    "EMST":      lambda: c2g.euclidean_minimum_spanning_tree(poi_gdf, as_nx=False),
    "Waxman":    lambda: c2g.waxman_graph(poi_gdf, distance_metric="euclidean", r0=150, beta=0.6),
}

print("\n--- Proximity graph comparison ---")
print(f"{'graph':<14}{'#edges':>10}{'avg_degree':>12}")
built = {}
for nm, b in builders.items():
    name, ne, avgdeg, payload = graph_stats(nm, b)
    print(f"{name:<14}{str(ne):>10}{str(avgdeg):>12}")
    if payload:
        built[nm] = payload

fig, axes = plt.subplots(1, 3, figsize=(16, 5))
for ax, key in zip(axes, ["KNN (k=8)", "Delaunay", "EMST"]):
    if key in built:
        n_, e_ = built[key]
        e_.plot(ax=ax, linewidth=0.4, color="#3b7dd8", alpha=0.6)
        poi_gdf.plot(ax=ax, markersize=4, color="#d83b5c")
    ax.set_title(key); ax.set_axis_off()
plt.suptitle("Spatial graph topologies on the same POI set", y=1.02)
plt.tight_layout(); plt.show()
```

We engineer spatial features for each POI by extracting its projected coordinates, counting the other POIs within 150 m as a local density measure, and computing the distance to the nearest street segment (set to 0 when no street network is available). We then assign integer category labels and build several families of proximity graphs: KNN, Delaunay, Gabriel, RNG, EMST, and Waxman. We compare their edge counts and average degrees, then visualize the KNN, Delaunay, and EMST topologies to see how differently they connect the same set of POIs.
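To make the comparison above more concrete, here is a minimal brute-force sketch of two of these graph families on a handful of points, using only the standard library. It illustrates the textbook definitions and is not how city2graph implements them; the function names (`knn_edges`, `gabriel_edges`, `average_degree`) and the sample points are hypothetical. The Gabriel rule used here keeps an edge between p and q only if no third point lies inside or on the circle whose diameter is pq. Boundary conventions vary between implementations.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_edges(points, k):
    """Undirected KNN edges: keep (i, j) if j is among i's k nearest, or vice versa."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sq_dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def gabriel_edges(points):
    """Gabriel edges: no third point inside or on the circle with diameter (i, j)."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        # r is inside or on the circle exactly when d(i,r)^2 + d(j,r)^2 <= d(i,j)^2.
        if all(
            sq_dist(points[i], points[r]) + sq_dist(points[j], points[r]) > d_ij
            for r in range(len(points))
            if r not in (i, j)
        ):
            edges.add((i, j))
    return edges


def average_degree(n_nodes, edges):
    """Mean degree over all nodes: each undirected edge contributes to two nodes."""
    return 2 * len(edges) / n_nodes if n_nodes else 0.0


# Hypothetical sample points (arbitrary units).
pts = [(0, 0), (2, 0), (1, 1.5), (4, 1), (3, 3), (0.5, 3)]
for name, edges in [("KNN (k=2)", knn_edges(pts, 2)), ("Gabriel", gabriel_edges(pts))]:
    print(f"{name:<12}{len(edges):>8}{average_degree(len(pts), edges):>10.2f}")
```

Note that `average_degree` divides by the total number of nodes, whereas `graph_stats` in the tutorial averages only over nodes that appear in at least one edge, so the two can differ when a graph leaves some nodes isolated.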
Constructing Heterogeneous and Homogeneous Graphs in PyTorch Geometric

```python
nodes_dict = {}
for cat in CLASS_NAMES:
    sub = poi_gdf[poi_gdf["category"] == cat].copy().reset_index(drop=True)
    nodes_dict[cat] = sub[["geometry", "cx", "cy", "local_density"]]

try:
    _, bridge_edges = c2g.bridge_nodes(nodes_dict, proximity_method="knn", k=3, distance_metric="euclidean")
    hetero = c2g.gdf_to_pyg(
        nodes_dict, bridge_edges,
        node_feature_cols={cat: ["cx", "cy", "local_density"] for cat in CLASS_NAMES},
    )
    print("\nHeteroData node types:", hetero.node_types)
    print("HeteroData edge types:")
    for et in hetero.edge_types:
        print(f"  {et}: {hetero[et].edge_index.shape[1]} edges")
except Exception as e:
    hetero = None
    print("Heterogeneous build skipped:", e)

nodes, edges = c2g.knn_graph(poi_gdf, distance_metric="euclidean", k=8, as_nx=False)
deg = pd.Series(np.r_[edges.index.get_level_values(0), edges.index.get_level_values(1)]).value_counts()
nodes["degree"] = deg.reindex(nodes.index).fillna(0).astype(float)
for col in ["cx", "cy", "local_density", "dist_street", "label"]:
    if col not in nodes.columns:
        nodes[col] = poi_gdf.loc[nodes.index, col].values

FEATS = ["cx", "cy", "local_density", "dist_street", "degree"]
nodes[FEATS] = StandardScaler().fit_transform(nodes[FEATS].astype(float))

data = c2g.gdf_to_pyg(nodes, edges, node_feature_cols=FEATS, node_label_cols=["label"])
data.edge_index = to_undirected(data.edge_index)
data.x = data.x.float()
y = data.y.long().view(-1)
N, num_classes = data.num_nodes, int(y.max()) + 1
print(f"\nHomogeneous Data: {N} nodes, {data.edge_index.shape[1]} directed-edges, "
      f"{data.x.shape[1]} features, {num_classes} classes")
```

We construct