AVL Trees in Haskell

AVL Trees are binary search trees with a balance condition. The balance condition ensures that the height of the tree is bounded.

The two conditions satisfied by an AVL tree are:

  • order property: the value stored at every node, is larger than any value stored in the left subtree, and is smaller than any value stored in the right subtree.

  • balance property: at every node the difference in the height of the right and the left subtrees is at most 1.

Both the binary trees below obey the order property at every node. Let us see if they obey the height balance condition. The height of the subtree rooted at every node is listed next to the node. The height of an empty subtree is defined to be -1. The height of a leaf node is 0. The height of a node is defined as larger of the heights of its two subtrees plus 1. The path on the right does not satisfy the height balance condition at node 3. Therefore it is not an AVL tree. Consider the tree on the left, at every node the difference in the height of its subtrees is at most 1. Therefore the tree on the left is an AVL tree.

      (5) 2           (3) 2
      / \               \
     /   \               \
 1 (3)   (8) 1            (4) 1
   /\     /                \
  /  \   /                  \
(1)  (4) (7) 0               (5) 0
 0    0

The length of the longest path in an AVL tree containing n nodes is $O(\log{n})$. If all the work during find, insert, delete is done on some path in the tree, then the worst-case complexity of insert, find, and delete would be $O(\log{n})$. Read the book by Weiss for the correctness and the analysis of the operations below.

Given the abstract data type Tree. The height of any node can be computed using the recurrence below. Given a tree, we can now check if it obeys the height balance condition.

data Tree a = Nil | Node a (Tree a) (Tree a)
        deriving (Eq,Ord,Show,Read)

height :: (Ord a, Num a) => Tree a -> a
height Nil = -1
height (Node k l r) = 1 + (max (height l) (height r))

balanced :: (Ord a, Num a) => Tree a -> Bool
balanced Nil = True
balanced  (Node k l r) | not (balanced l) = False
                       | not (balanced r) = False
                       | abs ((height l) - (height r)) > 1 = False
                       | otherwise = True

-- balanced (Node 3 (Node 2 (Node 1 Nil Nil) Nil) Nil)

Function balanced above checks if the right and the left subtrees of the root are balanced. If either of the subtrees is not height balanced then return false. If both the subtrees are balanced then the only node for which the height balance condition needs to be verified is the root node, and this verified in case 3, by computing the heights of the left and the right subtrees using the function height. If the root node is not height balanced then return false. If both the subtrees are balanced and the root is height balanced as well then return true.

Insert

Insertion works as in the case of binary search trees. The node is inserted in a (unique) place in the tree such that the order property is satisfied by the resulting tree. However, the resulting tree may not satisfy the height balance condition in which case it is to be restructured. The restructuring happens in two ways. Let us see the first case called the single rotation.

single rotation

Consider inserting elements 1,2,3,4,5,6,7 in order starting with an empty AVL tree. The sequence of first three insertions is shown below. The tree obtained after inserting 1,2,3 is not an AVL tree because the height balance condition is not obeyed at node 1. We fix the height balance condition by rotation the tree around node 3 as shown to the right. After insertion of 4,5, another height imbalance occurs at node 3. This is fixed by rotating the subtree rooted at 3, around node 4. After insertion of 6, the root does not satisfy the height balance condition. Now a single rotation is performed at node 4.

(1)    (1)        (1)              (2)     (2)
         \          \     (SR)     / \     / \
         (2)        (2)   ----   (1) (3) (1) (3)        (SR 3)
                      \                        \        ----
                      (3)                      (4)
                                                 \
                                                 (5)


      (2)              (2)                      (4)
      / \              / \      (SR 2)          / \
    (1) (4)          (1) (4)    _____         (2) (5)
        / \              / \                  / \   \
      (3)  (5)         (3)  (5)             (1) (3)  (6)
                             \
                             (6)

In general, single rotation is performed as follows. Let x be the deepest node (the node farthest away from the root) which does not satisfy the balance condition after insertion has happened in a subtree of x. Suppose the insertion happened in the left (right) subtree of the left (right) child of x. Let y be the child of x, and z be the child of y. Note that both y, z is not Nil. The sub tree rooted at x is shown below on the left. T1 is the left subtree of x. T2 is left sub tree of y. T3 is the left subtree of z, and T4 is the right subtree of z. Rotation around y, make x the left child of y, and z the right child of y. T1, T3, T4 are all connected to their original parents. However, T2 cannot be connected to y as its left subtree (x occupies that position). T2 contains all value that are larger than x, and smaller than y. Therefore T2 can be made the right child of x in the tree on the right. Note that this satisfies the order property. Let us see why this satisfies the height balance condition at node x. By assumption, insertion happened in T3 or T4. Before insertion the tree was balanced. Therefore the height of T2 is at one less than the height of the larger of T3 or T4 after insertion. The height of z has not changed after rotation. Height of y has not changed after rotation, longest past from y to leaf is still in T3 or T4. The height of x has reduced by 1. Hence the tree obtained after rotation obeys the balance condition at node y. The height of all the nodes on the path from x to the root of the tree on the left has also been reduced by 1, in the tree on the right. Therefore they also satisfy the balance condition. The height of the nodes which do not lie on the path from the root to the inserted node is not affected by the insertion. Therefore all the nodes satisfy the height balance condition.

 (x)                                  (y)
 / \             (SR y)               / \
T1  (y)          -----               (x) (z)
    /\                              /\   /\
   T2 (z)                         T1 T2 T3 T4
      / \
     T3 T4

double rotation

The second type of rotation occurs when the insertion happens in the right (left) subtree of the left (right) child of the deepest node which does not satisfy the height balance condition. Again let x, y, z be as defined above. Recall that x is the deepest node which does not satisfy the height balance condition. We consider the case when y is the left child of x, and z is the right child of y. The other case is symmetric. The insertion occurred in T2 or T3. The height of T1, and larger of T2 or T3 differ by at most 1, as y is balanced. Height of y and root of T4 differ by more than 1. The rotation reduces the height of y. First, let us verify the order property for the rotated tree. y < z < x, given their positions in the tree to the left. T1 contains values smaller than y, T4 contains values larger than x. T2 contains values larger than y, and smaller than z, therefore it can be the right child of y on the right. Similarly, T3 contains values larger than z, and smaller than x, therefore T3 can be the left child of x on the right. At the start, the height of y and root of T4 differ by two. After rotation height of y has reduced by 1, as z is no longer on the longest path. The height of x has possibly reduced by 2, y, z are not on the longest path possibly. The height of z has increased by 1. After rotation height of y, x differ by at most 1.

     x                                 z
    /\                                 /\
   /  \                               /  \
  y    T4                            y    x
 / \            DR (LR)             /\    /\
T1  \            -----             /  \  /  \
    z                             T1  T2 T3 T4
   / \
  T2  T3

The Haskell code below implements insertion. The first step is to insert, as we do in the case of binary search trees. The resulting tree is then fixed using the function rotate. Rotate is first called on the subtrees in case the subtrees are not balanced. Note that we can check whether the subtree is balanced using the function balanced. If both the subtrees are balanced, then we check whether the root node obeys the height balance condition. If the balance condition is not obeyed at the root, then we check which of the four cases arise (LR/RL/LL/RR). The imbalance is fixed using either a single or a double rotation. The resulting sub tree (rotated) is patched together to obtain the complete tree. Which is passed back as the result. ins function first inserts the value in the tree, and then calls rotate to fix the height balance condition. In the worst case a rotation (either SR or DR) is performed to fix the height balance condition. The work done is proportional to the length of the longest path in the AVL tree.

left :: Tree a -> Tree a
left Nil = Nil
left (Node n l r) = l

right :: Tree a -> Tree a
right Nil = Nil
right (Node n l r) = r

value :: (Ord a, Num a) => Tree a -> a
value Nil = 0
value (Node n l r) = n


ins :: (Num a, Ord a) => Tree a -> a -> Tree a
ins Nil a = Node a Nil Nil
ins (Node b l r) k
    | b < k = rotate ((Node b l (ins r k)))
    | otherwise = rotate (Node b (ins l k) r)

rotate :: (Ord a, Num a) => Tree a -> Tree a
rotate Nil = Nil
rotate (Node n l r) | not (balanced l) = Node n (rotate l) r
                | not (balanced r) = Node n l (rotate r)
                | (height l) + 1 < (height r) &&    -- SR RR
                  (height (left r))  < (height (right r)) =
                      Node (value r) (Node n l (left r)) (right r)
                | (height r) + 1 < (height l) &&  -- SR LL
                  (height (right l))  < (height (left l)) =
                      Node (value l) (left l) (Node n (right l) r)
                | (height l) + 1 < (height r) && -- DR RL
                  (height (left r))  > (height (right r)) =
                     Node (value (left r)) (Node n l (left (left r)))
                       (Node (value r) (right (left r)) (right r))
                | (height r) + 1 < (height l) && -- DR LR
                  (height (right l))  > (height (left l)) =
                     Node (value (right l)) (Node (value l) (left l) (left (right l)))
                       (Node n (right (right l)) r)
                | otherwise = Node n l r

buildTree :: (Num a, Ord a) => [a] -> Tree a
buildTree [] = Nil
buildTree (x:xs) = foldl ins Nil (x:xs)

-- The result of successively inserting elements 1..7
-- buildTree [1..7]
-- Node 4 (Node 2 (Node 1 Nil Nil) (Node 3 Nil Nil)) (Node 6 (Node 5 Nil Nil) (Node 7 Nil Nil))

Note the use of foldl in the construction of the AVL tree.

find

Find is implemented as in the binary search trees. Find takes two arguments, a tree and a key, and returns the Node that contains the key. The base case states that if the tree is Nil then the search returns Nil. If the key is stored in the current Node then return the current Node. If the key is smaller then the key at the node then search in the left subtree else search in the right subtree.

find :: (Eq a, Ord a) => Tree a -> a -> Tree a
find Nil k = Nil
find  (Node b l r) k
    | b == k = Node b l r    
    | b < k = find l k     
    | otherwise = find r k 

delete

Leaving this as an exercise. Implement the deletion as in BSTs. Then, walk up to the root, fixing the imbalance at every node using the rotations above. Multiple rotations might be needed.

The source code is available here. Try it with > ghci avl.hs at the nix prompt.

Related