UMBC CMSC202, Computer Science II, Spring 1998, Sections 0101, 0102, 0103, 0104 and Honors

Tuesday May 5, 1998

Assigned Reading:

A Book on C:
Programming Abstractions in C:

Handouts (available on-line):

Topics Covered: The two main topics for this lecture are reference parameters and hash tables.

There are two ways to pass a parameter to a function: by value and by reference. (These are not the only two parameter passing schemes --- e.g. there is parameter passing by name in ALGOL --- but these are the two we will discuss.) In C, parameter passing by reference is simulated by passing a pointer to the variable. The pointer itself is passed by value, so technically C does not have a mechanism for parameter passing by reference. C++ adds this feature.
In C, there are two reasons for wanting to pass a parameter by reference:
1. We want the function to be able to modify the value of the parameter. (This should be avoided if possible.)
2. We want to pass a parameter without copying. For example, when we pass a large array to a function, we do not want to make a new copy of the array, because copying takes a long time and uses lots of memory. C avoids this by always passing an array as a pointer to its first element.
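In C++, we have an additional reason to want parameter passing by reference. Suppose an object includes dynamically allocated members. If we pass it by value, the function receives a copy, and with the default (memberwise) copy constructor the copy's pointers point to the same dynamically allocated memory as the original's. At the end of the function call the copy is destroyed, and its destructor frees memory that the original object is still using. If we pass the object by reference instead, no copy is made, so nothing is destroyed at the end of the function call. This saves us a headache with destructors. (See the previous lecture on the dangers of destructors.)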
In C++, a reference parameter x of type T would appear in the function header as T& x. The use of the & symbol is not consistent with the rest of the C/C++ language, so try to ignore the fact that & is also used as the address-of operator. By analogy with the declaration T* x, which says that *x has type T, a consistent interpretation of the declaration T& x would be that &x has type T. This is not the correct interpretation. The correct (but inconsistent) interpretation of T& x is simply that x is a reference parameter of type T. Don't try to make sense out of this! It doesn't!
For example, we can write a program that avoids the dangers of destructors by passing the objects by reference. In this program, the function foo has a reference parameter T of type Record; the class Record is defined in the header file record2.h. Note that the type of T is Record, not pointer to Record. So, when we use T to invoke the id() member function, we write T.id() and not T->id(). The sample run shows that the parameter T is not destroyed after the function terminates and, in fact, that the address of T is identical to the address of S, the actual parameter.
Next, we look at references more closely in a series of examples.
- In the first program, we note the use of reference variables. After a declaration of the form int& ref = x; the variable ref becomes an alias for the variable x. In the rest of the program, using the variable ref is equivalent to using the variable x. That is, assigning values to ref has the same effect as assigning values to x. Also, ref can be used as a normal int variable, because it is a reference and not a pointer. We can only associate a reference variable with another variable by initializing it, as in the declaration of ref. After the initialization, we cannot make ref an alias of a different variable, say y; again, this is because a reference is not a pointer. Assigning y to ref simply assigns the value stored in y to x, as the sample run demonstrates.
- Our second program demonstrates a simple use of a reference parameter. In the function add3(), the parameter a is a reference parameter. Thus, changing the value of a in the function changes the value of the actual parameter in the main program. Note that the syntax for using a is the same as for an int variable, because (beating a dead horse) references are NOT pointers. Also, note that the main program calls add3 with x itself as the argument, as in add3(x), not with the address of x. In C, you would expect that such a function call cannot change the value of x. In C++, because of reference parameters, you have to look at the function prototype to determine whether a function can modify the values of its actual parameters. (See sample run.)
- Our third program shows that you can write the familiar "swap" function using reference parameters. See sample run.
- Our fourth program shows the use of references as a return value. The function call max(x,y) returns a reference to either x or y, whichever is bigger. Thus, the statement max(x,y) = 2; assigns 2 to whichever of x and y currently holds the greater value. (See sample run.) Reference return values are uncommon, but they are necessary in some situations, so you should be able to recognize them when reading C++ code.
- Recall that we want reference parameters for the two reasons listed above: changing the value of the actual parameter and avoiding copying. Our fifth program shows that we can avoid copying without letting the function change the value of the actual parameter. This is accomplished by designating the parameters as const reference parameters. In this example, the function add() tries to modify the value of the reference parameter a. The sample run shows the error message given by the compiler for trying to modify the value of a const parameter.
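- In the sixth program, we show another benefit of using constant reference parameters. This time the add() function declares its parameter b as a constant reference parameter. Since b is a constant reference parameter, we can pass a literal constant such as 6 as the actual parameter for b: the compiler creates a temporary holding 6, and b refers to that temporary. (A non-const reference parameter cannot be bound to a literal.) This is convenient in many situations. See sample run.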
Using reference parameters and the const designator, we changed our linked list interface. This particular implementation lets us eliminate one layer of pointers when we want lists of records. See:
- Header file list6.h and implementation file list6.C.
- Implementation of ListItem as strings: header file and implementation.
- Implementation of ListItem as a student record: header file and implementation.
- main program and sample run using a list of strings.
- main program and sample run using a list of student records.
Hash Tables: we have looked at a few data structures: arrays, linked lists and binary search trees. Each of these has advantages and disadvantages. In a sorted array, we can use binary search to find an item in O(log n) time. However, inserting into a sorted array takes O(n) time (linear time), because the items after the insertion point must be shifted over. In an unsorted linked list, search takes linear time, but insertion can be done in constant time (just insert at the front, or at the back if we keep a pointer to the last node). Using a sorted linked list increases the time it takes to insert an item without improving the search time, since both operations now take linear time. A binary search tree allows you to insert, delete and search in O(log n) time, provided the tree stays balanced; in the worst case, an unbalanced tree degrades to linear time. A hash table allows you to insert, delete and search in constant time on average. So, if the only operations you need to support are insert, delete and search, a hash table offers many advantages.
An example: suppose that you are the UMBC registrar and you want to store and retrieve student records based upon the student's social security number (ssn). There is an easy way to do this quickly: simply create a huge array of records indexed from 0 to 999,999,999. To retrieve a student's record, simply use his/her social security number as the index. The only disadvantage of this method is that it uses too much memory: the array has one billion entries, and only a tiny fraction of them would ever hold a record. As an alternative, we can use just the last 4 digits of a student's social security number. Then we would only need 10,000 entries. The disadvantage here is that there are more than 10,000 students at UMBC, so many students would have to use the same index. To solve this problem, we keep a linked list at each entry. For example, if two students have social security numbers that end in 6666, then the 6666 entry of the table is a linked list with the two students' records.
Here we have the main ideas of a hash table. The hash table is an array of linked lists. The key used for hashing is the student's ssn. The hash function takes the key and transforms it into a legal index value for the hash table. In this example, the hash function simply takes the 9-digit ssn and removes the first 5 digits, leaving the last 4. Ideally, a hash function would distribute the keys evenly in the hash table. That way, each linked list in the hash table would be relatively short. When two keys hash to the same index value, the situation is called a collision. With 12,000 students and an ideal hash function, each linked list in the hash table would have only 1 or 2 elements (12,000 / 10,000 = 1.2 on average). Thus, searching, inserting and deleting from this hash table would take constant time.
So, is taking the last 4 digits of the ssn a good hash function? It is theoretically possible that every entering freshman next year has the same last 4 digits in his/her ssn. Then, our hash table would simply be an unsorted linked list and the performance of search would be poor. However, our experience with ssn's tells us that the chances of this happening are small. The design of a good hash table depends on having a good hash function. There are schemes for picking provably good hash functions, which would be discussed in an algorithms class, not here.
One disadvantage of using the last 4 digits of a ssn as a hash function is that we are not able to control the size of our hash table very well. If UMBC's enrollment increased to 20,000, our only choice would be to use 5 digits of the ssn and have a table of size 100,000. Another hash function takes the remainder of the ssn modulo some prime number N. That leaves us with a value between 0 and N-1. If we have a hash table with N entries, then this value can be used directly as the index into the hash table, and we are free to choose N to suit the number of students.
Using our latest version of the List ADT, we implement a hash table as an array of linked lists. (See the header file hash.h.) We will complete our discussion of this implementation in the next lecture.

Last Modified: 22 Jul 2024 11:27:43 EDT by Richard Chang
