Overthinking Binary Search Trees

Posted on October 19, 2022 by Michael Keane Galloway

Many years ago I bombed an interview at Amazon. I had some things going on in my professional life that I was emotionally struggling with at the time. I may someday spill some digital ink on that to examine what had me so stressed out when I went for this Amazon interview, but for now, I’ll leave it at I wasn’t at a good place going into the interview. Once there, I blanked on something that I knew would be something that could happen but didn’t practice on: iterating over Binary Search Trees (BST).

I for the life of me couldn’t remember how to do an in order traversal of a BST without recursion. And when you’re interviewing with Amazon, they do not want to see the recursive solution. It’s not a robust solution because on large problems it will blow up the stack and cause crashes. They want to see code that could reasonably go into production, so you need to make sure to iterate over the BST via a loop.

In the aftermath of this interview, I wrote the code in this repository to practice and serve as reference for interview preparation in the future. I started out in the original repository doing the recursive solution and then refactored into the loop based solution. For example, I started with code like the following:

template<typename Item>
void PrintInOrderRecursive(BinarySearchTreeNode<Item>* node) {
    if (node == 0) {
        return;
    }

    PrintInOrderRecursive(node->getLeft());
    cout << node->getData() << endl;
    PrintInOrderRecursive(node->getRight());
}

This takes a node for the BST, checks to see if the node itself is null (the base case), and recursively calls the function on the left child. After the left subtree has been processed the program works its way back up the stack to process the root, and then recursively processes the right subtree. This was the basic approach that I took in my interview that put me on my initial bad foot. In my practice repository, I then refactored into something like the following code sample:

template<typename Item>
void PrintInOrderRecursive(BinarySearchTreeNode<Item>* node) {
    if (node == 0) {
        return;
    }

    vector<BinarySearchTreeNode<Item>*> stack;
    bool rollback = false;

    stack.push_back(node);

    while(!stack.empty()) {
         BinarySearchTreeNode<Item>* localCurrent = this->stack.back();
         
         if (localCurrent == 0) {
            this->stack.pop_back();
            rollback = true;
         }

         if (!rollback) {
            this->stack.push_back(localCurrent->getLeft());
         }
         else {
            cout << node->getData() << endl;
            this->stack.pop_back();
            this->stack.push_back(localCurrent->getRight());
            rollback = false;
         }
    }
}

This approach mimics the use of the call stack in the recursive version by using a vector of node pointers as a stack. That way the program is then only limited by heap space, which is much more generous than the program execution stack. After implementing this in my practice code, I decided to push a bit further, and wrote an iterator.

Actually, I did multiple iterators and a base iterator class. I was after all working on practicing all three of the BST traversal schemes that could come up in an interview. As I was working on this, I realized I could refactor and share a lot of the same code between in order and pre-order traversal.

template<typename Item>
class BSTBaseIterator : public Iterator<Item> {
   public:
      BSTBaseIterator(BinarySearchTreeNode<Item>* root) {
         current = 0;
      }

      virtual void First() {
         while (current == 0 && !IsDone()) {
            BinarySearchTreeNode<Item>* temp = _innerLoop();
            if (temp != 0) {
               current = temp;
            }
         }
      }

      virtual void Next() {
         BinarySearchTreeNode<Item>* newCurrent = 0;

         while (newCurrent == 0 && !IsDone()) {
            BinarySearchTreeNode<Item>* temp = _innerLoop();
            if (temp != 0) {
               newCurrent = temp;
            }
         }

         current = newCurrent;
      }

      virtual bool IsDone() {
         return stack.empty();
      }

      virtual Item CurrentItem() const {
         return current->getData();
      }

      virtual BinarySearchTreeNode<Item>* CurrentNode() const {
         return current;
      }

   protected:
      virtual BinarySearchTreeNode<Item>* _innerLoop() = 0;
      std::vector<BinarySearchTreeNode<Item>*> stack;
      BinarySearchTreeNode<Item>* current;
};

This base class handles a lot of the stack management for the different traversal strategies, and provides a protected _innerLoop to allow child classes implement the logic specific to the traversal strategy. I then implemented in order traversal using this base iterator as follows:

template<typename Item>
class BSTInOrderIterator : public BSTBaseIterator<Item> {
   public:
      BSTInOrderIterator(BinarySearchTreeNode<Item>* root) : BSTBaseIterator<Item>(root) {
         rollback = false;
         if (root != 0) {
            this->stack.push_back(root);
         }
      }

   protected:
      BinarySearchTreeNode<Item>* _innerLoop() {
         BinarySearchTreeNode<Item>* result = 0;
         BinarySearchTreeNode<Item>* localCurrent = this->stack.back();
         
         if (localCurrent == 0) {
            this->stack.pop_back();
            rollback = true;
            return result;
         }

         if (!rollback) {
            this->stack.push_back(localCurrent->getLeft());
         }
         else {
            result = localCurrent;
            this->stack.pop_back();
            this->stack.push_back(localCurrent->getRight());
            rollback = false;
         }

         return result;
      }

   private:
      bool rollback;
};

The BSTInOrderIterator basically just implements the _innerLoop much like the inside of the while loop in my previous in order traversal example. Then once you have this interface implemented you can solve problems like the kth smallest integer in a BST like the following code:

BinarySearchTreeNode<T>* kthSmallest(int k) {
   BinarySearchTreeNode<T>* result = 0;
   BSTInOrderIterator<T> iterator(root);
   int count = 0;

   for(iterator.First(); !iterator.IsDone(); iterator.Next(), ++count) {
      if (count == k) {
         result = iterator.CurrentNode();
         break;
      }
   }

   return result;
}

Since we need the kth smallest item in the BST, we can just do an in order traversal and return the node when we count off k elements. With the iterator implementation, this becomes a simple for loop.

After doing all of this work, this code has mostly sat around in a git repository that I use for notes and interview preparation. I thought since I was starting to write this blog, I could probably publish the code alone and share my implementation and thought process in case it might be useful to someone.