Graph Traversal Algorithms in Python: BFS and DFS Deep Dive with NetworkX
I. Introduction
Graphs are one of the most fundamental data structures in mathematics and computer science, capturing relationships between entities via vertices (nodes) and edges (links). Formally, a graph is defined as:
\[G = (V, E)\]where \( V \) is a non-empty finite set of vertices, and \( E \) is a set of edges representing binary relations over \( V \). In an undirected graph, an edge \( e \in E \) is an unordered pair \( (u, v) \), while in a directed graph (or digraph), \( e = (u, v) \) is an ordered pair indicating a directed connection from \( u \) to \( v \).
Graphs are ubiquitous: they appear in transportation networks, social media, web indexing, biology, linguistics, compiler theory, and AI planning. Efficiently analyzing these structures often requires graph traversal algorithms, which systematically explore nodes and edges to extract structural and topological insights.
Why Traversal?
Traversal is not merely visiting nodes, it is the basis of solving a wide class of problems:
- Detecting cycles in dependency graphs
- Enumerating connected components in sparse networks .
- Constructing traversal trees for hierarchical representations
- Performing layered searches (e.g., BFS) for optimal reachability
- Classifying edges in directed acyclic graphs (DAGs)
Two canonical traversal algorithms are:
- Breadth-First Search (BFS): explores neighbors level by level using a queue (FIFO)
- Depth-First Search (DFS): explores as deep as possible before backtracking, typically using a stack (LIFO) or recursion
Both algorithms operate on the same input, the graph \(G = (V, E)\), but produce different traversal trees and expose different structural properties. From a theoretical standpoint, they offer insight into:
- Graph connectivity
- Path existence and reachability
- Layering and distance (BFS)
- Parent-child structure and edge classification (DFS)
Why NetworkX in Python?
While one could implement BFS and DFS from scratch using adjacency lists or matrices, the NetworkX
library provides:
- Clean abstractions over different graph types (e.g.,
Graph
,DiGraph
) - Support for both directed/undirected and weighted/unweighted graphs
- Efficient traversal utilities:
bfs_tree()
,dfs_edges()
, etc. - Integration with
matplotlib
for structural visualization
In this article, we will explore BFS and DFS both theoretically and practically. We'll analyze their computational complexity, traversal characteristics, and Python implementations using NetworkX. Our goal is not only to use these algorithms, but to understand their behavior deeply, mathematically, algorithmically, and visually.
II. Graph Representations
Before traversing a graph, we must choose how to represent it. The representation directly influences the algorithm's time and space complexity. Mathematically, a graph is defined as: \(G = (V, E)\) where \(V = \{v_1, v_2, ..., v_n\}\) is the vertex set (with cardinality \(n = |V|\)), and \(E ⊆ V × V\) is the edge set. In practice, we must encode \(E\) efficiently. Two classic representations are:
1. Adjacency Matrix
An \(n × n\) matrix \(A\) where:
A[i][j] =
1 if there exists an edge from vertex i to vertex j
0 otherwise
Formally: Let \(V = \{v_1, v_2, ..., v_n\}\), then the adjacency matrix \(A ∈ \{0,1\}^{n×n}\) is defined as:
\[A[i][j] = 1 ⟺ (v_i, v_j) ∈ E\]- Pros: Constant-time edge existence check: \(O(1)\)
- Cons: Space complexity: \(O(n²)\) regardless of sparsity
2. Adjacency List
A mapping from each vertex to a list (or set) of its adjacent vertices:
\[Adj(v) = \{u ∈ V | (v, u) ∈ E\}\]In Python, this is often implemented as a dictionary of lists or sets. For sparse graphs, this yields space complexity: \(O(n + m)\) where \(m = |E|\) This representation is optimal for traversal algorithms like BFS and DFS.
3. NetworkX Internal Representation
NetworkX abstracts graphs as Python objects, with adjacency list storage under the hood. For a simple graph:
import networkx as nx
G = nx.Graph()
G.add_edges_from([
(1, 2), (1, 3), (2, 4), (3, 4)
])

Internally, NetworkX stores:
G._adj = {
1: {2: {}, 3: {}},
2: {1: {}, 4: {}},
3: {1: {}, 4: {}},
4: {2: {}, 3: {}}
}
where each entry maps a node to its neighbors, with optional edge attribute dictionaries.
4. Directed and Weighted Graphs
NetworkX supports directed graphs via nx.DiGraph()
, and edge weights can be stored as attributes:
DG = nx.DiGraph()
DG.add_edge('A', 'B', weight=3.7)
Mathematically, we interpret the edge set as a function:
\[E: V × V → ℝ^+\]where \(E(u, v)\) returns the edge weight between \(u\) and \(v\), if such an edge exists.
5. Sparse vs Dense Graphs
The density of a graph is defined as:
\[ D = \frac{2 \left| E \right|}{\left| V \right| \left( \left| V \right| - 1 \right)} \]For sparse graphs \(D ≪ 1\), adjacency lists are preferable. For dense graphs \(D → 1\), adjacency matrices allow faster edge queries, at the cost of space.
Summary
- Adjacency Matrix: Best for dense graphs, fast edge lookup, poor space efficiency
- Adjacency List: Best for traversal, scales well with sparse graphs
- NetworkX: Encapsulates both, uses adjacency list internally, and supports advanced graph operations with ease
In the next section, we will implement Breadth-First Search (BFS) and analyze its traversal properties and theoretical behavior.
III. Breadth-First Search (BFS)
Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores vertices in layers. It starts from a source vertex and visits all its immediate neighbors before proceeding to neighbors of neighbors, and so on.
1. Mathematical Model
Given a graph G = (V, E)
and a source vertex s ∈ V
, BFS computes the shortest path (in
terms of edge count) from s
to every vertex reachable from s
.
It constructs a level function:
L: V → ℕ ∪ \{∞\}
where L(v)
is the number of edges on the shortest path from s
to v
. If
v
is unreachable, L(v) = ∞
.
The traversal forms a spanning tree (or forest in disconnected graphs), called the BFS tree,
rooted at s
.
2. Algorithm Description
BFS uses a first-in-first-out (FIFO) queue to maintain the frontier of exploration.
Input: Graph G = (V, E), starting vertex s ∈ V
Output: Distances L(v) and BFS tree T
1. Initialize all L(v) ← ∞
2. Set L(s) ← 0
3. Create a queue Q and enqueue s
4. While Q ≠ ∅:
u ← Q.dequeue()
For each neighbor v of u:
If L(v) = ∞:
L(v) ← L(u) + 1
Parent[v] ← u
Enqueue v
Time complexity: O(|V| + |E|)
, assuming adjacency list representation.
3. BFS in Python using NetworkX
We begin with a simple undirected graph:
import networkx as nx
G = nx.Graph()
G.add_edges_from([
('A', 'B'), ('A', 'C'),
('B', 'D'), ('C', 'E'),
('D', 'F'), ('E', 'F')
])
NetworkX provides built-in BFS functions:
bfs_tree = nx.bfs_tree(G, source='A')
list(bfs_tree.edges())
# Output: [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'E'), ('D', 'F')]
This gives a BFS spanning tree rooted at 'A'
.
Custom BFS Implementation
from collections import deque
def bfs_custom(graph, start):
visited = set()
level = {start: 0}
parent = {start: None}
queue = deque([start])
while queue:
u = queue.popleft()
for v in graph.neighbors(u):
if v not in visited:
visited.add(v)
level[v] = level[u] + 1
parent[v] = u
queue.append(v)
return level, parent
This provides the layer (distance) of each node from the source, and the BFS tree via the parent
map.
4. Visualization of BFS Layers
We can visualize the BFS layers by mapping node colors to their distance from the source:
import matplotlib.pyplot as plt
level, _ = bfs_custom(G, 'A')
colors = [level.get(n, 0) for n in G.nodes]
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color=colors, cmap=plt.cm.viridis)
plt.show()
The spring layout and colormap help illustrate how far each node is from the root, layer by layer.
5. Properties and Use Cases
- Shortest-path guarantee: For unweighted graphs, BFS finds the shortest path from
s
to anyv
. - Connected component exploration: BFS can enumerate all nodes reachable from a given vertex.
- Layered structure: Each frontier expands outward in discrete levels.
- Bipartite graph testing: Check if the graph can be 2-colored using BFS levels.
6. Theoretical Notes
-
The BFS tree is a subgraph
T ⊆ G
where for eachv
reachable froms
, the unique path inT
is the shortest path inG
. - On disconnected graphs, BFS produces a forest by restarting on unvisited vertices.
- For a DAG, BFS does not provide topological order, but it helps identify reachable vertices layer-wise.
In the next section, we’ll analyze Depth-First Search (DFS), including recursive structure, edge classification, and practical applications like cycle detection.
IV. Depth-First Search (DFS)
Depth-First Search (DFS) is a graph traversal algorithm that explores as far along each branch as possible before backtracking. Unlike BFS, which operates in levels, DFS uses depth-oriented exploration to uncover rich structural information.
1. Theoretical Framework
Given a graph G = (V, E)
, DFS constructs a spanning tree (or forest) by recursively visiting
unvisited neighbors. The traversal defines:
- Discovery time: The time a node is first visited
- Finish time: The time all descendants have been explored
Mathematically, DFS defines a parent function π: V → V ∪ {None}
forming a rooted tree:
∀v ∈ V_reachable, π(v) is the vertex from which v was first discovered
DFS can be implemented recursively or iteratively via an explicit stack (LIFO). The time complexity is:
O(|V| + |E|)
assuming adjacency list representation.
2. DFS Edge Classification
DFS traversals allow us to classify edges based on their relation to the DFS tree:
- Tree edge: First visit to an unvisited vertex
- Back edge: Edge to an ancestor (forms a cycle)
- Forward edge: Edge to a descendant (not tree edge)
- Cross edge: Edge between subtrees or unrelated vertices
Edge classification is meaningful in directed graphs. In undirected graphs, only tree and back edges appear.
3. DFS in Python using NetworkX
import networkx as nx
G = nx.DiGraph()
G.add_edges_from([
('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'),
('D', 'E'), ('E', 'B') # cycle: B → D → E → B
])
dfs_edges = list(nx.dfs_edges(G, source='A'))
print(dfs_edges)
# Output: [('A', 'B'), ('B', 'D'), ('D', 'E'), ('E', 'C')]
This shows the traversal path, forming a DFS tree starting at 'A'
.
Tracking Discovery & Finish Time
time = 0
discovery = {}
finish = {}
def dfs_time(G, u, visited):
global time
visited.add(u)
time += 1
discovery[u] = time
for v in G.neighbors(u):
if v not in visited:
dfs_time(G, v, visited)
time += 1
finish[u] = time
visited = set()
dfs_time(G, 'A', visited)
print(discovery)
print(finish)
This captures the discovery and finish times for each node, which can be used to classify edges.
4. Visualization of DFS Tree
DFS traversal builds a tree or forest depending on graph connectivity. We can highlight the tree edges:
import matplotlib.pyplot as plt
tree = nx.DiGraph()
tree.add_edges_from(dfs_edges)
pos = nx.spring_layout(G, seed=42)
nx.draw_networkx(G, pos, alpha=0.4)
nx.draw_networkx_edges(tree, pos, edge_color='red', width=2)
nx.draw_networkx_labels(G, pos)
plt.title("DFS Tree (Red)")
plt.show()
5. Applications of DFS
- Cycle detection: Presence of a back edge implies a cycle in a directed graph
- Topological sorting: DFS post-order yields valid order if DAG
- Connected components: Repeated DFS builds all connected subgraphs
- Bridge and articulation point detection: Using low-link values (Tarjan’s algorithm)
- Edge classification: Crucial in compiler optimizations and control-flow graphs
6. Theoretical Notes
- DFS defines a total order on vertices via finish time
- DFS tree is not unique: it depends on neighbor iteration order
- For DAGs, all back edges are absent → used for DAG validation
- DFS produces a depth-first forest if the graph is disconnected
In the next section, we will directly compare BFS and DFS across dimensions such as memory usage, traversal output, and algorithmic guarantees.