| Finding Clique using Backtracking Algorithm |
Introduction
A clique in a graph is a maximal complete subgraph of three or more nodes. Thee are nodes that are also adjacent to all of the members of the clique. The mathematical definition can be written as;
- Given a graph G = (V, E) and an integer k, a clique is a subset U of V with |U | ³ k such that every pair of vertices in U is joined by an edge. Then, we call U a k-clique of the graph G.
The clique problem is in a class of NP-complete problems, along with other hard problems such as Knapsack packing, graph coloring, Hamiltonian path, vertex cover, N-puzzle, 3-SAT, and famous TSP (Traveling salesman problem). Up until now, no better algorithm than an exponential algorithm for such problems is known although many heuristic algorithms give us reasonably good approximations of the optimal solutions. NP-completeness is thought as a class in which if one of such problems is solved to a polynomial time, all instances of NP-complete problems are reducible to a polynomial time. Then, it will be a greatest innovation in modern science.
Applications
Like instances of other graph problems. finding a maximum clique problem (or decision problem - is there a clique in a graph) can be widely seen in numerous fields of science and engineering applications, even business application. For example, DNA molecular solution problem, data partitioning problem in embedded processor-based systems (memory chips), matching points problem in information systems. imaging process problem to remote sensing and astrophysics, and thousands more.
![]() |
![]() |
| Example 1. Consider a pool of DNA molecules corresponding to the total ensemble of four-vertex cliques was built, followed by a series of selection processes. The molecule 2, 3, 4, and 5, is a 4-clique; as you can see, every node in the subgraph is connected to every other nodes in the subgraph. | Example 2. Consider a map coloring in which each city is to be colored differently from other neighbor cities. The number of different colors they faced represents the number of cities that you can travel directly from one to another, which is equivalent to a clique in a graph. Note that the city B, C, F, and J are adjacent to every other cities in the sub-province, forming a 4-clique in the graph. |
Algorithms
The algorithm for finding a k-clique in an undirected graph is highly parallel and represents a class of solving NP-complete search problems. The algorithm seems simple - systematically list all possible sets of exactly k nodes and for each such set, check whether all pairs are neighbors in a selected subgraph. However, this brute force algorithm to find a clique in a graph is unfortunately a historically hard problem, NP-complete. The algorithm is polynomial if k is constant, but not if k varies. A better one can be done by starting with each node as a clique of one, and to merge cliques into larger cliques until there are no more possible merges to check. Two cliques A and B may be merged if each node in clique A is joined to each node in clique B. For the former method of finding clique, as you expected, a searching tree of a clique problem is spreading exponentially although it depends strongly on a structure of a graph.
![]() |
![]() |
| Backtracking 1. Permutate all possible subgraphs, and check if every node in a subgraph is connected to every other nodes in that subgraph and the size of clique is equal to k. | Backtracking 2. Permuate only subgraphs in which every node in a subgraph is connected to every other nodes in that subgraph, and check if the size of clique is equal to k. In this manner, cutoffs occurs in pruning. |
Implementation in Java
The implementation is to demonstrate backtracking, by Depth-First Search,
used to determine the number of k-cliques in a graph G, and
print the first such clique found, where the value of k is
floor( (log2 n)/2 ), n = |V
|. For the purpose of the program, the data structure for a graph is not
complete. Indeed, it's very simple and limited. In addition, although a good
program should check input validation, this implementation assumes it never
receives invalid inputs. So, be aware that users should make sure input is in a
correct form.
NOTE on input : The first line MUST contain ONLY the number of
vertices. Then, a pair of vertices separated by space can follow from the second
line. And, ONLY ONE of any duplicate edges such as (u,v) and (v,u)
should be entered.
| Input should like this : 16 0 1 0 5 0 6 0 14 1 5 1 6 1 14 2 4 2 10 3 4 3 15 4 6 4 7 4 10 5 14 6 14 7 9 8 9 8 12 8 13 10 15 11 12 11 13 12 13 The above input can be a representation of the network graph shown on the right. |
![]() |
- Source code : CliqueBT.java
- Executable : runCliqueBT.zip (Extract the files in the same directory, then double click on the batch file named runDFS.bat)
- Helper tools for testing : Random graph generator -
rndgraph.exe
(Windows binary) - see
instruction. And also, check
visual random graph generator that generates and actually draws a
graph. These programs are thankfully created by Prof. Ruzzo, CSE,
University of Washington.
Running Time of the Algorithm and the Complexity Analysis
The measurements are made in two ways that the environmental variables are controlled alternatively. For each environmental variable, the running time is measured upon various different graphs.
System Summary: Intel Pentium 4 2.4GHz with 1Mb L2 Cache, 512Mb RAM
In Plot 1), the number of edges is fixed to 2370, and the running time is measured upon different graph size - the number of vertices in a graph varies. For a certain range of vertices, the running time is moderately increasing in polynomial behavior. Specifically, in the Plot 1), when the number of vertices lies in the range of 0~1020, the running time obeys a polynomial behavior. However, if the number of vertices exceeds about 1030, the running time increases with an astronomical growth rate. Although we can’t see in the plot, the running time will probably increase with a polynomial growth rate again up to the point where the number of vertices reaches about 4100. Then, after that point, it will jump with another astronomical growth rate. To visualize, the asymptotic behavior may have a stair-like-shape: switching a polynomial growth and a vertical growth alternatively, having a certain amount of term. This is because although the k-clique algorithm looks like a polynomial algorithm, it actually takes an exponential time, depending on the value of k. Indeed, we can be convinced if we notice that the “critical” points such as 1030, 4100, etc are the numbers of vertices which change the size of clique, from 4 to 5 and 5 to 6, respectively. Thus, the asymptotic behavior obeys O(nk) worst-case time complexity, where k = O(log n) in this particular instance of the problem.

In Plot 2), the number of vertices is fixed to 260, and the running time is measured on different graph density – the number of edges in a graph varies. In this case, since the number of vertices is fixed to 260, the value of k is also fixed to 4. Thus, as we expect, the running time increases in a polynomial time. In fact, a polynomial of degree of 4 is used for the best-fit curve, and it reasonably fits the data. Therefore, the time complexity is O(n4) in the worst-case.
To conclude, even though the k-clique algorithm using backtracking may depend strongly on the structure of a graph, it still has the time complexity of O(nk) in the worst-case, where the size of clique, k, is a variable. (where n can be either the number of vertices or edges since either one of them can be interchangeably expressed as a function with respect to the other). Thus, it is an exponential algorithm, which is NP-complete.
by Hyung-Joon Kim, CSE 417, University of Washington




