Graph Traversal Algorithms in Python: BFS and DFS Deep Dive with NetworkX

I. Introduction

Graphs are one of the most fundamental data structures in mathematics and computer science, capturing relationships between entities via vertices (nodes) and edges (links). Formally, a graph is defined as:

\[G = (V, E)\]

where \( V \) is a non-empty finite set of vertices, and \( E \) is a set of edges representing binary relations over \( V \). In an undirected graph, an edge \( e \in E \) is an unordered pair \( (u, v) \), while in a directed graph (or digraph), \( e = (u, v) \) is an ordered pair indicating a directed connection from \( u \) to \( v \).

Graphs are ubiquitous: they appear in transportation networks, social media, web indexing, biology, linguistics, compiler theory, and AI planning. Efficiently analyzing these structures often requires graph traversal algorithms, which systematically explore nodes and edges to extract structural and topological insights.

Why Traversal?

Traversal is not merely visiting nodes, it is the basis of solving a wide class of problems:

  • Detecting cycles in dependency graphs
  • Enumerating connected components in sparse networks .
  • Constructing traversal trees for hierarchical representations
  • Performing layered searches (e.g., BFS) for optimal reachability
  • Classifying edges in directed acyclic graphs (DAGs)

Two canonical traversal algorithms are:

  • Breadth-First Search (BFS): explores neighbors level by level using a queue (FIFO)
  • Depth-First Search (DFS): explores as deep as possible before backtracking, typically using a stack (LIFO) or recursion

Both algorithms operate on the same input, the graph \(G = (V, E)\), but produce different traversal trees and expose different structural properties. From a theoretical standpoint, they offer insight into:

  • Graph connectivity
  • Path existence and reachability
  • Layering and distance (BFS)
  • Parent-child structure and edge classification (DFS)

Why NetworkX in Python?

While one could implement BFS and DFS from scratch using adjacency lists or matrices, the NetworkX library provides:

  • Clean abstractions over different graph types (e.g., Graph, DiGraph)
  • Support for both directed/undirected and weighted/unweighted graphs
  • Efficient traversal utilities: bfs_tree(), dfs_edges(), etc.
  • Integration with matplotlib for structural visualization

In this article, we will explore BFS and DFS both theoretically and practically. We'll analyze their computational complexity, traversal characteristics, and Python implementations using NetworkX. Our goal is not only to use these algorithms, but to understand their behavior deeply, mathematically, algorithmically, and visually.

II. Graph Representations

Before traversing a graph, we must choose how to represent it. The representation directly influences the algorithm's time and space complexity. Mathematically, a graph is defined as: \(G = (V, E)\) where \(V = \{v_1, v_2, ..., v_n\}\) is the vertex set (with cardinality \(n = |V|\)), and \(E ⊆ V × V\) is the edge set. In practice, we must encode \(E\) efficiently. Two classic representations are:

1. Adjacency Matrix

An \(n × n\) matrix \(A\) where:


A[i][j] = 
    1   if there exists an edge from vertex i to vertex j
    0   otherwise

Formally: Let \(V = \{v_1, v_2, ..., v_n\}\), then the adjacency matrix \(A ∈ \{0,1\}^{n×n}\) is defined as:

\[A[i][j] = 1 ⟺ (v_i, v_j) ∈ E\]
  • Pros: Constant-time edge existence check: \(O(1)\)
  • Cons: Space complexity: \(O(n²)\) regardless of sparsity

2. Adjacency List

A mapping from each vertex to a list (or set) of its adjacent vertices:

\[Adj(v) = \{u ∈ V | (v, u) ∈ E\}\]

In Python, this is often implemented as a dictionary of lists or sets. For sparse graphs, this yields space complexity: \(O(n + m)\) where \(m = |E|\) This representation is optimal for traversal algorithms like BFS and DFS.

3. NetworkX Internal Representation

NetworkX abstracts graphs as Python objects, with adjacency list storage under the hood. For a simple graph:

import networkx as nx

G = nx.Graph()
G.add_edges_from([
    (1, 2), (1, 3), (2, 4), (3, 4)
])
Undirected simple graph using NetworkX
Figure 1: Undirected simple graph created using NetworkX

Internally, NetworkX stores:

G._adj = {
    1: {2: {}, 3: {}},
    2: {1: {}, 4: {}},
    3: {1: {}, 4: {}},
    4: {2: {}, 3: {}}
}

where each entry maps a node to its neighbors, with optional edge attribute dictionaries.

4. Directed and Weighted Graphs

NetworkX supports directed graphs via nx.DiGraph(), and edge weights can be stored as attributes:

DG = nx.DiGraph()
DG.add_edge('A', 'B', weight=3.7)

Mathematically, we interpret the edge set as a function:

\[E: V × V → ℝ^+\]

where \(E(u, v)\) returns the edge weight between \(u\) and \(v\), if such an edge exists.

5. Sparse vs Dense Graphs

The density of a graph is defined as:

\[ D = \frac{2 \left| E \right|}{\left| V \right| \left( \left| V \right| - 1 \right)} \]

For sparse graphs \(D ≪ 1\), adjacency lists are preferable. For dense graphs \(D → 1\), adjacency matrices allow faster edge queries, at the cost of space.

Summary

  • Adjacency Matrix: Best for dense graphs, fast edge lookup, poor space efficiency
  • Adjacency List: Best for traversal, scales well with sparse graphs
  • NetworkX: Encapsulates both, uses adjacency list internally, and supports advanced graph operations with ease

In the next section, we will implement Breadth-First Search (BFS) and analyze its traversal properties and theoretical behavior.

III. Breadth-First Search (BFS)

Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores vertices in layers. It starts from a source vertex and visits all its immediate neighbors before proceeding to neighbors of neighbors, and so on.

1. Mathematical Model

Given a graph G = (V, E) and a source vertex s ∈ V, BFS computes the shortest path (in terms of edge count) from s to every vertex reachable from s.

It constructs a level function:

L: V → ℕ ∪ \{∞\}

where L(v) is the number of edges on the shortest path from s to v. If v is unreachable, L(v) = ∞.

The traversal forms a spanning tree (or forest in disconnected graphs), called the BFS tree, rooted at s.

2. Algorithm Description

BFS uses a first-in-first-out (FIFO) queue to maintain the frontier of exploration.

Input: Graph G = (V, E), starting vertex s ∈ V
Output: Distances L(v) and BFS tree T

1. Initialize all L(v) ← ∞
2. Set L(s) ← 0
3. Create a queue Q and enqueue s
4. While Q ≠ ∅:
     u ← Q.dequeue()
     For each neighbor v of u:
         If L(v) = ∞:
             L(v) ← L(u) + 1
             Parent[v] ← u
             Enqueue v

Time complexity: O(|V| + |E|), assuming adjacency list representation.

3. BFS in Python using NetworkX

We begin with a simple undirected graph:

import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ('A', 'B'), ('A', 'C'),
    ('B', 'D'), ('C', 'E'),
    ('D', 'F'), ('E', 'F')
])

NetworkX provides built-in BFS functions:

bfs_tree = nx.bfs_tree(G, source='A')
list(bfs_tree.edges())
# Output: [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'E'), ('D', 'F')]

This gives a BFS spanning tree rooted at 'A'.

Custom BFS Implementation

from collections import deque

def bfs_custom(graph, start):
    visited = set()
    level = {start: 0}
    parent = {start: None}
    queue = deque([start])

    while queue:
        u = queue.popleft()
        for v in graph.neighbors(u):
            if v not in visited:
                visited.add(v)
                level[v] = level[u] + 1
                parent[v] = u
                queue.append(v)
    return level, parent

This provides the layer (distance) of each node from the source, and the BFS tree via the parent map.

4. Visualization of BFS Layers

We can visualize the BFS layers by mapping node colors to their distance from the source:

import matplotlib.pyplot as plt

level, _ = bfs_custom(G, 'A')
colors = [level.get(n, 0) for n in G.nodes]

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color=colors, cmap=plt.cm.viridis)
plt.show()

The spring layout and colormap help illustrate how far each node is from the root, layer by layer.

5. Properties and Use Cases

  • Shortest-path guarantee: For unweighted graphs, BFS finds the shortest path from s to any v.
  • Connected component exploration: BFS can enumerate all nodes reachable from a given vertex.
  • Layered structure: Each frontier expands outward in discrete levels.
  • Bipartite graph testing: Check if the graph can be 2-colored using BFS levels.

6. Theoretical Notes

  • The BFS tree is a subgraph T ⊆ G where for each v reachable from s, the unique path in T is the shortest path in G.
  • On disconnected graphs, BFS produces a forest by restarting on unvisited vertices.
  • For a DAG, BFS does not provide topological order, but it helps identify reachable vertices layer-wise.

In the next section, we’ll analyze Depth-First Search (DFS), including recursive structure, edge classification, and practical applications like cycle detection.

IV. Depth-First Search (DFS)

Depth-First Search (DFS) is a graph traversal algorithm that explores as far along each branch as possible before backtracking. Unlike BFS, which operates in levels, DFS uses depth-oriented exploration to uncover rich structural information.

1. Theoretical Framework

Given a graph G = (V, E), DFS constructs a spanning tree (or forest) by recursively visiting unvisited neighbors. The traversal defines:

  • Discovery time: The time a node is first visited
  • Finish time: The time all descendants have been explored

Mathematically, DFS defines a parent function π: V → V ∪ {None} forming a rooted tree:

∀v ∈ V_reachable, π(v) is the vertex from which v was first discovered

DFS can be implemented recursively or iteratively via an explicit stack (LIFO). The time complexity is:

O(|V| + |E|)

assuming adjacency list representation.

2. DFS Edge Classification

DFS traversals allow us to classify edges based on their relation to the DFS tree:

  • Tree edge: First visit to an unvisited vertex
  • Back edge: Edge to an ancestor (forms a cycle)
  • Forward edge: Edge to a descendant (not tree edge)
  • Cross edge: Edge between subtrees or unrelated vertices

Edge classification is meaningful in directed graphs. In undirected graphs, only tree and back edges appear.

3. DFS in Python using NetworkX

import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), 
    ('D', 'E'), ('E', 'B')  # cycle: B → D → E → B
])

dfs_edges = list(nx.dfs_edges(G, source='A'))
print(dfs_edges)
# Output: [('A', 'B'), ('B', 'D'), ('D', 'E'), ('E', 'C')]

This shows the traversal path, forming a DFS tree starting at 'A'.

Tracking Discovery & Finish Time

time = 0
discovery = {}
finish = {}

def dfs_time(G, u, visited):
    global time
    visited.add(u)
    time += 1
    discovery[u] = time
    for v in G.neighbors(u):
        if v not in visited:
            dfs_time(G, v, visited)
    time += 1
    finish[u] = time

visited = set()
dfs_time(G, 'A', visited)
print(discovery)
print(finish)

This captures the discovery and finish times for each node, which can be used to classify edges.

4. Visualization of DFS Tree

DFS traversal builds a tree or forest depending on graph connectivity. We can highlight the tree edges:

import matplotlib.pyplot as plt

tree = nx.DiGraph()
tree.add_edges_from(dfs_edges)

pos = nx.spring_layout(G, seed=42)
nx.draw_networkx(G, pos, alpha=0.4)
nx.draw_networkx_edges(tree, pos, edge_color='red', width=2)
nx.draw_networkx_labels(G, pos)
plt.title("DFS Tree (Red)")
plt.show()

5. Applications of DFS

  • Cycle detection: Presence of a back edge implies a cycle in a directed graph
  • Topological sorting: DFS post-order yields valid order if DAG
  • Connected components: Repeated DFS builds all connected subgraphs
  • Bridge and articulation point detection: Using low-link values (Tarjan’s algorithm)
  • Edge classification: Crucial in compiler optimizations and control-flow graphs

6. Theoretical Notes

  • DFS defines a total order on vertices via finish time
  • DFS tree is not unique: it depends on neighbor iteration order
  • For DAGs, all back edges are absent → used for DAG validation
  • DFS produces a depth-first forest if the graph is disconnected

In the next section, we will directly compare BFS and DFS across dimensions such as memory usage, traversal output, and algorithmic guarantees.