DS: Disjoint Sets

Jump to bottom

Rishav Ray edited this page May 10, 2025 · 1 revision

A group of sets are disjoint if they have no elements in common. The disjoint-set data-structure is used to store such sets. The member-sets are typically represented with trees, and the collection of all the sets are typically represented with arrays of a predefined size, or even dictionaries in some cases.
In this data-structure, each member-set has a root element. Additionally, there are two operations that this data-structure supports: find & union.
The find operation gets the root-element of a member-set of an element. The union operation merges the member-sets of two elements based on a variety of factors- their respective root-elements, ranks, & sizes.
When representing a set with a tree, the root element is the root of the tree. The size of the set corresponds to the number of elements in the tree, and the rank of the set is defined as the height of the tree (i.e., the number of levels it has). Since the set is structured as a tree, a hierarchical relationship is present—every element except the root has a parent, and many elements may have one or more children.
When using a tree to represent a set, arrays can efficiently capture three important pieces of information about the group of disjoint sets: the parent of each element, the rank of each set (assuming the element is the root), and the size of each set (again, assuming the element is the root). Dictionaries work well where keys are elements(that may or may not necessarily translate well to indices of an array) and values are their respective parents/ranks/sizes.
There are two types of disjoint-set unions: union by rank and union by size. In union by rank, the root of the set with the lower rank becomes a child of the root of the set with the higher rank. If the two sets have different ranks, the rank of the resulting tree remains unchanged. However, if the sets have the same rank, it doesn't matter which root becomes the parent. In this case, the rank of the new root increases by 1 (i.e., new rank = original rank + 1). This behavior is intuitive when visualized as a tree structure.
In union by size, the root of the smaller set (i.e., the one with fewer elements) becomes a child of the root of the larger set. The size of the new set becomes the sum of the sizes of the two original sets. This makes logical sense, given that the sets are disjoint and their union simply combines all elements from both.
This data structure can be used to detect cycles in an undirected graph. For example, given all the nodes and edges of an undirected graph, you can process each edge one by one. For an edge connecting nodes 0 and 1, you perform a union operation between the set containing 0 and the set containing 1. If you encounter an edge where both nodes are already in the same set (i.e., have the same root), a cycle exists in the graph.