|Articulation Points Detection Algorithm|
Articulation points in a network are those that are critical to communication: for an articulation point, all paths between certain nodes have to pass through that point. A vertex in a connected undirected graph is an articulation point if removing it and all edges incident to it disconnects the graph; in other words, removing the vertex separates its subtree from the rest of the graph. A connected graph is biconnected if it has no articulation points. A biconnected component of an undirected graph G = (V, E) is a maximal subset B of the edges such that the graph GB = (VB, B) is biconnected, where VB is the subset of vertices incident to edges in B. There is a very close relationship between biconnected components and articulation points: the articulation points are exactly the vertices at which two biconnected components are connected. Thus, articulation points separate the biconnected components of a graph. The ideas of articulation points and biconnected components play an important role in any network graph in terms of its connectivity.
Not surprisingly, the properties of articulation points and biconnected
components apply to any kind of network in which a certain point is
critical to communication; for example, hubs for airlines, electric circuits,
network wiring, bonds in proteins, traffic routers, and many other
industrial applications today.
Finding articulation points can be done elegantly using Depth-First Search. In a DFS tree of an undirected graph, a non-root node u is an articulation point if it has a child v such that no back edge leads from the subtree rooted at v to a node higher in the DFS tree than u; the root itself is an articulation point if and only if it has two or more children. In that case, no node in the subtree of v can reach the rest of the graph without passing through u, which is therefore an articulation point. Thus, for each node v in the DFS traversal, we calculate dfsnum(v) and LOW(v). The definition of LOW(v) is the lowest dfsnum of any vertex that is either in the DFS subtree rooted at v or connected to a vertex in that subtree by a back edge. Then, in the DFS, when there are no more nodes to visit, we back up and update the LOW values as we return from each recursive call.
Recall that DFS is a recursive algorithm; we make use of a stack to trace back the recursive calls. When we process an edge (u, x) - either by a recursive call on vertex x from vertex u, or because (u, x) is a back edge - we push that edge onto a stack. Later, if we identify u as an articulation point, then all edges from the top of the stack down to (u, x) are the edges of one biconnected component. So, we pop edges off the stack until the top of the stack is (u, x). Those edges belong to one biconnected component.
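The actual program is linked below; as a rough sketch of the technique just described (class, method, and variable names here are illustrative, not taken from ArticPointDFS.java):

```java
import java.util.*;

// Sketch of articulation-point detection with DFS: dfsnum[] holds discovery
// numbers, low[] holds LOW values, and an edge stack collects the edges of
// each biconnected component.
public class ArticulationPoints {
    static List<List<Integer>> adj;            // adjacency lists
    static int[] dfsnum, low;                  // discovery number and LOW value per vertex
    static int counter = 0;
    static boolean[] isArticulation;
    static Deque<int[]> edgeStack = new ArrayDeque<>();

    static void dfs(int u, int parent) {
        dfsnum[u] = low[u] = ++counter;
        int children = 0;
        for (int v : adj.get(u)) {
            if (dfsnum[v] == 0) {                        // tree edge (u, v)
                edgeStack.push(new int[]{u, v});
                children++;
                dfs(v, u);
                low[u] = Math.min(low[u], low[v]);
                if (low[v] >= dfsnum[u]) {               // no back edge from v's subtree climbs above u
                    if (parent != -1) isArticulation[u] = true;
                    popComponent(u, v);                  // pop one biconnected component
                }
            } else if (v != parent && dfsnum[v] < dfsnum[u]) {  // back edge to an ancestor
                edgeStack.push(new int[]{u, v});
                low[u] = Math.min(low[u], dfsnum[v]);
            }
        }
        // The root is an articulation point iff it has two or more DFS children.
        if (parent == -1 && children >= 2) isArticulation[u] = true;
    }

    // Pop edges down to (u, v); together they form one biconnected component.
    static void popComponent(int u, int v) {
        int count = 0;
        int[] e;
        do {
            e = edgeStack.pop();
            count++;
        } while (e[0] != u || e[1] != v);
        System.out.println("biconnected component of " + count + " edge(s)");
    }

    public static void main(String[] args) {
        int n = 5;
        adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        // Triangle 0-1-2 plus a path 2-3-4: vertices 2 and 3 are articulation points.
        int[][] edges = {{0,1},{1,2},{2,0},{2,3},{3,4}};
        for (int[] ed : edges) { adj.get(ed[0]).add(ed[1]); adj.get(ed[1]).add(ed[0]); }
        dfsnum = new int[n]; low = new int[n]; isArticulation = new boolean[n];
        dfs(0, -1);
        for (int v = 0; v < n; v++)
            if (isArticulation[v]) System.out.println("articulation point: " + v);
    }
}
```

On the small test graph this prints the bridge components {3,4} and {2,3} and the triangle {0,1,2} as separate biconnected components, and reports 2 and 3 as articulation points.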
Implementation in Java
The implementation demonstrates the articulation point algorithm using
Depth-First Search. Therefore, the data structure for the graph is not
complete; indeed, it is very simple and limited. In addition, although a good
program should validate its input, this implementation assumes it never
receives invalid input. So, be aware that users should make sure the input is in a
valid format.
NOTE on input : The first line MUST contain ONLY the number of vertices. Then, pairs of vertices separated by spaces can follow from the second line. And, ONLY ONE of any pair of duplicate edges such as (u,v) and (v,u) should be entered.
Input should look like this :
0 1 0 5 0 6 0 14 1 5
1 6 1 14 2 4 2 10 3 4
3 15 4 6 4 7 4 10 5 14
6 14 7 9 8 9 8 12 8 13
10 15 11 12 11 13 12 13
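As a minimal sketch of reading this format (the Scanner-based parsing and class name are assumptions, not the linked source; the sample graph above uses vertices 0 through 15, so a vertex count of 16 is assumed on the first line):

```java
import java.util.*;

// Sketch of parsing the input format described above: the first line holds
// the vertex count, then whitespace-separated pairs of vertices follow,
// one pair per undirected edge.
public class ReadGraph {
    static int n;  // number of vertices, read from the first line

    static List<List<Integer>> parse(Scanner in) {
        n = in.nextInt();
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        while (in.hasNextInt()) {          // each edge entered once; store both directions
            int u = in.nextInt(), v = in.nextInt();
            adj.get(u).add(v);
            adj.get(v).add(u);
        }
        return adj;
    }

    public static void main(String[] args) {
        // The sample input above, with an assumed vertex count of 16 on the first line.
        String sample = "16\n0 1 0 5 0 6 0 14 1 5\n1 6 1 14 2 4 2 10 3 4\n"
                + "3 15 4 6 4 7 4 10 5 14\n6 14 7 9 8 9 8 12 8 13\n"
                + "10 15 11 12 11 13 12 13";
        List<List<Integer>> adj = parse(new Scanner(sample));
        System.out.println(n + " vertices; degree of vertex 0 = " + adj.get(0).size());
    }
}
```

For interactive use, `new Scanner(System.in)` would replace the hard-coded sample string.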
The above input can be a representation of the network graph shown on the right.
- Source code : ArticPointDFS.java
- Executable : runDFS.zip (Extract the files in the same directory, then double click on the batch file named runDFS.bat)
- Helper tools for testing : Random graph generator
(Windows binary) - see
instruction. Also, check the
visual random graph generator, which generates and actually draws a
graph. These programs were kindly created by Prof. Ruzzo, CSE,
University of Washington.
Running Time of the Algorithm and the Complexity Analysis
The measurements are made in three ways, with the environmental variables controlled alternately. For each environmental variable, the running time is measured on a variety of different graphs. Each data point represents the total time taken to perform the search on a given graph 100 times.
System Summary : Intel Pentium 4 1.8 GHz with 512 KB L2 cache and 512 MB RAM.
In Plot 1), the number of edges is fixed at 459, and the
running time is measured on different graph sizes - the number of vertices in a
graph varies. Of course, most of these graphs do not qualify as connected
graphs since the number of edges is only 459; in fact, most of the vertices are not
connected to any other vertex at all. In this case, since the algorithm checks
only a small number of edges, it seems to run relatively faster than in
the measurements on graphs with a large number of edges. The asymptotic
behavior is linear, depending on the number of vertices in the graph. Therefore,
the time complexity is O(n) in the worst case, where
n is the number of vertices in the graph.
In Plot 2), the number of vertices is fixed at 90, and the running time is measured on different graph densities - the number of edges in a graph varies. In this case, a graph may have more connected components. The algorithm seems to run relatively slower than in the case of Plot 1). However, the asymptotic behavior of the algorithm is still linear, depending on the number of edges in the graph. Therefore, the time complexity is O(m) in the worst case, where m is the number of edges in the graph.
In Plot 3), the average degree of the vertices in a graph is fixed at 5, and the running time is measured on different overall graph sizes - the sum of the numbers of vertices and edges in a graph varies. In this case, a graph may be considered more balanced, since the number of edges naturally increases as the number of vertices increases. The asymptotic behavior of the algorithm is still linear, depending on the numbers of both vertices and edges. Therefore, the time complexity is O(n+m) in the worst case, where n and m are the numbers of vertices and edges in the graph, respectively.
As mentioned earlier, the running time of the algorithm seems to depend relatively more on the number of edges than on the number of vertices. This is because, if a graph has more edges incident to each vertex, the algorithm does more work in each call of the recursive function to decompose the structure of the graph. This work includes computing the DFS number, DFS level, and LOW number, retrieving all edges of the biconnected components, and so on. However, the algorithm asymptotically runs in linear time in each case where the other environmental variables are fixed. Not surprisingly, therefore, we can observe that the overall performance abides by O(n+m) in the worst case.
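The measurement scheme used above - recording the total wall-clock time of 100 searches on each graph - can be sketched as follows (the harness and method names are illustrative, not the code actually used for the plots):

```java
// Sketch of the timing method used for the plots: each data point is the
// total time of 100 repeated searches on the same graph.
public class TimingHarness {
    static long timeSearch(Runnable search) {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) search.run();      // 100 repetitions per graph
        return (System.nanoTime() - start) / 1_000_000;  // total elapsed milliseconds
    }

    public static void main(String[] args) {
        // The Runnable would wrap the DFS on one test graph, e.g. () -> dfs(0, -1).
        long ms = timeSearch(() -> { /* run the articulation-point DFS here */ });
        System.out.println("total time for 100 runs: " + ms + " ms");
    }
}
```

Repeating the search amortizes timer resolution and JVM startup noise over many runs, which matters when a single search on a small graph completes in well under a millisecond.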
by Hyung-Joon Kim, CSE 417, University of Washington