Saravana Kumar's

What matters

Union-Find Data Structure Python : For Clustering Algorithm

| Comments

I was coding kurskal’s clustering algorithm for a max-spacing k-clustering problem. To improve the efficiency of the clustering algorithm, I needed an efficient data structure to union two cluster and find an element in a cluster.

Union Find (disjoint-set) data structure is best option for above operations. Implemented the Union Find structure as per the algorithm specified in the Introduction to Algorithms book.

This Implementation includes Union by Rank and Path compression, which gives a amortized runtime of a(n) (i.e a – Ackermann function)

#! /usr/bin/env python
#
# Copyright 2012 Saravana Kumar(RIT)
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
"""
Author : Saravana Kumar
E-mail : saran8@gmail.com
Union-find data structure.
Based on Algorithm from Introduction to Algorithms Book(http://mitpress.mit.edu/books/introduction-algorithms),
"""
from node import Node
class UnionFind(object):
def __init__(self):
#To hold the clusters
self.clusters = []
#create a new set(cluster) with a node
def makeSet(self,node):
#set the nodes parent to the node itself
node.parent = node
#set initial rank of node to 0
node.rank = 0
#add the node to cluster list
self.clusters.append(node)
#union the nodeA and nodeB clusters
def union(self, nodeA, nodeB):
self.link(self.findSet(nodeA), self.findSet(nodeB))
#link the nodeA to nodeB or vice versa based upon the rank(number of nodes in the cluster) of the cluster
def link(self, nodeA, nodeB):
if nodeA.rank > nodeB.rank:
nodeB.parent = nodeA
#remove the nodeB from the cluster list, since it is merged with nodeA
self.clusters.remove(nodeB)
else:
nodeA.parent = nodeB
#remove the nodeA from the cluster list, since it is merged with nodeB
self.clusters.remove(nodeA)
#increade the rank of the cluster after merging the cluster
if nodeA.rank == nodeB.rank:
nodeB.rank = nodeB.rank + 1
#find set will path compress(makes the nodes in cluster points to single leader/parent) and returns the leader/parent of the cluster
def findSet(self, node):
if node != node.parent:
node.parent = self.findSet(node.parent)
return node.parent
#get cluster size
def clusterSize(self):
return len(self.clusters)
view raw unionFind.py hosted with ❤ by GitHub

Comments