I was coding Kruskal's clustering algorithm for a max-spacing k-clustering problem. To make the clustering efficient, I needed a data structure that can quickly union two clusters and find the cluster an element belongs to.
The Union-Find (disjoint-set) data structure is the best option for these operations. I implemented it following the algorithm given in the book Introduction to Algorithms.
This implementation includes union by rank and path compression, which together give an amortized running time of O(α(n)) per operation, where α is the inverse Ackermann function (α(n) is at most 4 for any practical n).
#!/usr/bin/env python
#
# Copyright 2012 Saravana Kumar(RIT)
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
"""
Author : Saravana Kumar
E-mail : saran8@gmail.com
Union-find data structure.
Based on the algorithm from the book Introduction to Algorithms
(http://mitpress.mit.edu/books/introduction-algorithms).
"""
from node import Node  # Node lives in a separate module that is not shown here


class UnionFind(object):

    def __init__(self):
        # roots (leaders) of the current clusters
        self.clusters = []

    # create a new set (cluster) containing a single node
    def makeSet(self, node):
        # a new node is its own parent
        node.parent = node
        # initial rank of the node is 0
        node.rank = 0
        # add the node to the cluster list
        self.clusters.append(node)

    # union the clusters that contain nodeA and nodeB
    def union(self, nodeA, nodeB):
        rootA = self.findSet(nodeA)
        rootB = self.findSet(nodeB)
        # nothing to do if both nodes are already in the same cluster;
        # linking a root to itself would wrongly drop it from the cluster list
        if rootA != rootB:
            self.link(rootA, rootB)

    # link two roots: the root with the smaller rank goes under the other one
    # (rank is an upper bound on the height of the tree, not its node count)
    def link(self, nodeA, nodeB):
        if nodeA.rank > nodeB.rank:
            nodeB.parent = nodeA
            # nodeB is no longer a root, so remove it from the cluster list
            self.clusters.remove(nodeB)
        else:
            nodeA.parent = nodeB
            # nodeA is no longer a root, so remove it from the cluster list
            self.clusters.remove(nodeA)
            # equal ranks: the merged tree is one level taller
            if nodeA.rank == nodeB.rank:
                nodeB.rank = nodeB.rank + 1

    # findSet returns the leader of the cluster and compresses the path,
    # so every node visited ends up pointing directly at the leader
    def findSet(self, node):
        if node != node.parent:
            node.parent = self.findSet(node.parent)
        return node.parent

    # number of clusters currently held
    def clusterSize(self):
        return len(self.clusters)
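
To show how the structure fits into the max-spacing k-clustering problem mentioned at the top, here is a minimal, self-contained sketch. It is not the code I used for the assignment: the `Node` class is a hypothetical stand-in for the `node` module, and `find_set`, `union` and `max_spacing` are illustrative names. The edge list format `(distance, u, v)` is also an assumption.

```python
class Node(object):
    """Hypothetical stand-in for the Node class from the node module."""
    def __init__(self, label):
        self.label = label
        self.parent = self  # a new node is its own leader
        self.rank = 0


def find_set(node):
    # Path compression: every node on the path ends up pointing at the root.
    if node is not node.parent:
        node.parent = find_set(node.parent)
    return node.parent


def union(node_a, node_b):
    # Union by rank; returns True only if two distinct clusters were merged.
    root_a, root_b = find_set(node_a), find_set(node_b)
    if root_a is root_b:
        return False
    if root_a.rank > root_b.rank:
        root_a, root_b = root_b, root_a  # make root_a the lower-ranked root
    root_a.parent = root_b
    if root_a.rank == root_b.rank:
        root_b.rank += 1
    return True


def max_spacing(labels, edges, k):
    """Merge the closest clusters until k remain and return the spacing.

    edges is a list of (distance, u, v) tuples. The result is the smallest
    distance between two nodes in different clusters, or None if no edge
    crosses two clusters.
    """
    nodes = {label: Node(label) for label in labels}
    clusters = len(nodes)
    for distance, u, v in sorted(edges, key=lambda edge: edge[0]):
        if find_set(nodes[u]) is find_set(nodes[v]):
            continue  # both ends are already in the same cluster
        if clusters == k:
            return distance  # cheapest edge that crosses two final clusters
        union(nodes[u], nodes[v])
        clusters -= 1
    return None


edges = [(1, 'a', 'b'), (2, 'c', 'd'), (5, 'b', 'c'),
         (6, 'a', 'c'), (7, 'a', 'd'), (8, 'b', 'd')]
print(max_spacing('abcd', edges, 2))  # 5
```

With k = 2 the two cheapest edges merge a with b and c with d, so the next edge that still crosses clusters (b to c, distance 5) is the spacing. Sorting the edges dominates the running time; the union and find calls are close to constant time each.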