Note: binary search trees were discussed in class. If you missed this explanation, please look up binary search trees in a data structures textbook or online.
Note: You can download all of the files referenced in this page as a tar file (proj8.tar) or copy them on the GL file system:
For this assignment, you are provided with a basic implementation of binary search trees: bst.c, bst.h. You will modify and add features to this binary search tree ADT.
These demo main programs show that the basic implementation only work with int data:
Step 1: Warm-up exercise
Implement a function bst_walk_depth() with this prototype:
Keep a copy of this implementation for submission.
Step 2: expand tnode
In this step you will modify the type definition of tnode to include a string (char pointer) and an int field. The purpose of these field is to store baby names in the string and the frequency that these names occur in the US population in the int field. This data structure will help new parents pick baby names.
In the new header file, bst2.h, the data fields are deliberately renamed to name and frequency so that if you forget to modify a function in bst.c, your program will not compile.
When you make a new tnode the string in the name field MUST have its own dynamically allocated memory. It cannot simply point to the string given to bst_insert(). (Hint: use strdup().)
Modify the existing BST functions to work with the new tnode definition. The nodes in this new BST ADT should be sorted by the frequency field (NOT by the name field!!!).
You can use the following main program to test your implementation (comment out the call to bst_find_by_name()):
Step 3: implement two new functions
In this step you will implement two new functions:
The function bst_find_by_rank() finds a name by popularity. The most popular name has rank 1. The least popular name has the biggest rank. Your implementation of this function must make efficient use of the size field of tnode. This is the number of nodes in the subtree rooted at the current tnode. For example, if your asked to find a name with rank 15, you can first check the size of the right subtree (if it exists). Suppose the size of the right subtree is 25, then you can just recursively find the node with rank 15 in the right subtree. If the size of the right subtree is 14, then the current node is the node with rank 15. On the other hand, if the right subtree is small, say it has size 5, then you should find a node with rank 9 (= 15 − 5 − 1) in the left subtree.
On the other hand, the bst_find_by_name() function cannot be implemented efficiently since the data structure is sorted by frequency rather than by name. To find a tnode by name, you simply traverse the BST and check if there is a tnode that matches the given name. (This should be a pre-order walk: visit current node first, then left subtree then right subtree. You can shortcut and return when a match is found.)
For both bst_find_by_rank() and bst_find_by_name(), the return value should be NULL if no matching node is found.
Use the following main programs to check your implementation:
The last two programs require a data file like this one: names1990s.txt. This data was obtained from the Social Security Administration and contains the frequency of girls names on social security card applications from the 1990's.
Step 4: run with valgrind
Test your programs using the valgrind utility. You may not have valgrind installed on your own machine, so this step may require you to run your programs on GL. The syntax for invoking valgrind is very simple:
If valgrind detects an error, you definitely have a bug. On the other hand, a clean report from valgrind does not necessarily mean your program is bug-free since a bug might show up using different data for input.
Similarly, when we wanted to print out all the names from rank 25 to rank 30, we made separate calls to bst_find_by_rank(). This is inefficient and can be replaced by a single walk in the binary search tree from the rank 25 node to the rank 30 node.
For 10 points extra credit, implement a function bst_print_ranks() with the following prototype:
Write and submit a test program that demonstrates your extra credit implementation.
Use the UNIX submit command on the GL system to turn in your project.
You should submit a separate file for the basic implementation in Step 1 and another file for the implementation of the BST functions for the new tnode. Call these bst.c and bst2.c.
Record your self running the sample main programs using the script utility. It will be convenient to have separate typescript files, so number these sequentially as: typescript1, typescript2, typescript3, ...
As usual, grading will be done using different main programs, so you will want to create your own test programs. You may submit these if you wish. Please call them: test1.c, test2.c, test3.c, ...