Friday, 20 March 2015

Inserting and Deleting Nodes in a Binary Search Tree

Something I didn't mention in any of my posts on the non built-in ADT's like Trees, LinkedLists, and BST's was how much of a pain it can be to remove or insert the nodes in these structures. This topic is of interest to me, not because of how my SLOG marks are on the line they are useful for solving homework problems, but because of how they lead to a deeper understanding and appreciation of the ADT in question.

So, lets first look at an ordinary Tree. As I've said before, a tree is just a bunch of nodes- each of which has two attributes: "data", which is the value it stores which can be used for identification, and "children"- which is some sort of collection of nodes which can be accessed directly from said node. So, based on the implementation of a Tree we did all the way back in the week 6 tutorial, to be the child of a node in that tree would be to be an element of the list "children", which is an attribute of that node object. As a result, a parent node can access it's children nodes directly. However, the child node cannot point back to the parent, or a cycle will be formed, which would violate the definition of a tree. This tells us something extremely important about the internal structure of trees. In a profound sense, there is no one place that the relation between a child and its parent node is defined. It is a relation which relies on the way how the  two nodes interact with each other.

Now, when we add or remove a node from a tree, we run the risk of altering delicate relations like these. And in order to avoid doing this, we need to understand what constitutes this relation in the first place, especially in respect to the specific implementation of the ADT we're using. In the case of Trees, we need to make sure that the definition of a tree is preserved (each node has only one parent excluding the root, there's a direct path from the node back to the root, none of the nodes form a cycle, etc). This isn't fun, but its not impossible. For instance, to add a node, we just need to add them as leaves- that way, we don't need to worry about accidentally creating a cycle. We also need to set it's "children" attribute to something which signifies that it is a leaf (such as an empty list). Depending on the implementation, we may also need to update certain attributes dependent on the structure as a whole, such as the branching factor or height. Although linked-lists may appear easier to alter due to their constant branching factor of one, they often contain these types of attributes, such as length.

With BST's, a whole other problem comes up. Due to it's ability to utilize a binary search, the tree must also be "balanced". In a nut shell, the notion of balance is the idea that, for any node in the BST, the height and number of nodes in the sub-tree rooted by its left child is equal to that of the sub-tree rooted in its right child. Balance is very hard to preserve since, for starters, its very hard to even define to begin with. It's a relationship which isn't stored in any one node, or even a large collection of them, but in every single node at some level of analysis. The reason why we want balance is simple- if we are looking for an element in a BST, in the worst case, we want to always have 2 options available to us- the ability to go left or right. If any node in a BST lacks, say, a left sub-tree, than if we are trans-versing the tree, we can only go right (assuming, of course, that the element hasn't already been found and there are still children). In this case, we don't actually reduce our options. Rather than halving the problem space, as we aim to for each node in a BST, we just go through it sequentially. And although this may not seem like a big deal, just keep in mind that some BST's contain thousands of nodes. So if your tree has a lot of nodes with only one path instead of two, and those nodes are traversed, than for those periods, it will be as if you are using a simple linear search. And that could be really inefficient.

So, when inserting and deleting nodes from a BST, we must try and preserve this sense of balance. If we remove a node, we want to find a way to replace it with another node directly below it. And as mentioned earlier, when we insert a new node, we want to attach it as a leaf. That way, it doesn't take over the place of an entire sub-tree, or unnecessarily elongate a given path in it. And of course, all of these make sure the Binary search tree retains it's ability to do binary search

No comments:

Post a Comment