UMBC CMSC202, Computer Science II, Spring 1998,
Sections 0101, 0102, 0103, 0104 and Honors
Tuesday May 5, 1998
Assigned Reading:
- A Book on C:
- Programming Abstractions in C:
Handouts (available on-line):
Topics Covered:
The two main topics for this lecture are: reference parameters and hash
tables.
- There are two ways to pass a parameter to a function: by value and
by reference. (These are not the only two parameter passing schemes
--- e.g. there is parameter passing by name in ALGOL --- but these are
the two we will discuss.) In C, parameter passing by reference is
simulated by passing a pointer to the variable. The pointer itself is
passed by value, so technically C does not have a mechanism for
parameter passing by reference. C++ adds this feature.
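The three schemes can be compared side by side. The following is a minimal sketch; the function names add_one_val, add_one_ptr, add_one_ref and the driver demo_passing are made up for illustration and do not come from the course programs.

```cpp
#include <cassert>

// By value: the function works on a copy, so the caller's variable is unchanged.
void add_one_val(int n) {
    n = n + 1 ;
}

// C style: pass by reference is simulated by passing a pointer (by value).
void add_one_ptr(int *p) {
    *p = *p + 1 ;        // explicit dereference
}

// C++ style: n is a true reference parameter.
void add_one_ref(int &n) {
    n = n + 1 ;          // n is used like an ordinary int
}

// Hypothetical driver; returns the final value of x.
int demo_passing() {
    int x = 5 ;
    add_one_val(x) ;     // x is still 5
    assert(x == 5) ;
    add_one_ptr(&x) ;    // caller passes the address explicitly; x becomes 6
    assert(x == 6) ;
    add_one_ref(x) ;     // call looks like pass by value; x becomes 7
    return x ;
}
```

Note that only the pointer version makes the possibility of modification visible at the call site.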
- In C, there are 2 reasons for wanting to pass a parameter by
reference:
- We want the function to be able to modify the value of the
parameter. (This should be avoided if possible.)
- We want to pass a parameter without copying. For example,
when we pass a large array to a function, we usually do not want to
make a new copy of the array because this can take a long time and
uses lots of memory.
In C++, we have an additional reason to want parameter passing by
reference. If we have an object which includes dynamically allocated
members and we pass this object by reference, then the parameter is not
automatically destroyed at the end of the function call. This saves us
a headache with destructors. (See the previous lecture on the dangers of
destructors.)
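To see the difference in copying and destruction, we can count constructor and destructor calls. This is a sketch under assumptions: the class Buffer, its counters and the two length functions are hypothetical, and the class is given a deep-copying copy constructor so that passing by value is safe, only wasteful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class with a dynamically allocated member.
class Buffer {
public:
    static int copies ;       // copy constructions so far
    static int destroyed ;    // destructor calls so far
    char *str ;

    Buffer(const char *s) {
        str = new char[std::strlen(s) + 1] ;
        std::strcpy(str, s) ;
    }
    Buffer(const Buffer &other) {          // deep copy
        str = new char[std::strlen(other.str) + 1] ;
        std::strcpy(str, other.str) ;
        copies++ ;
    }
    ~Buffer() {
        delete [] str ;
        destroyed++ ;
    }
} ;

int Buffer::copies = 0 ;
int Buffer::destroyed = 0 ;

std::size_t length_by_value(Buffer b) { return std::strlen(b.str) ; }
std::size_t length_by_ref(Buffer &b)  { return std::strlen(b.str) ; }

void demo_buffer() {
    Buffer s("hello") ;
    length_by_value(s) ;   // a copy is built for the call and destroyed afterwards
    assert(Buffer::copies == 1 && Buffer::destroyed == 1) ;
    length_by_ref(s) ;     // no copy is made and nothing is destroyed
    assert(Buffer::copies == 1 && Buffer::destroyed == 1) ;
}                          // s itself is destroyed here
```

Without the deep copy constructor, the by-value call would leave two objects pointing at the same memory, and the destructor of the copy would free memory that s still uses.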
- In C++, a reference parameter x of type T appears
in the function header as T& x. This use of the
& symbol is not consistent with the rest of the C/C++ language, so
try to ignore the fact that & is also the address-of
operator. A consistent interpretation of the declaration T&
x would be that the address of x has type T.
This is not the correct interpretation. The correct (but
inconsistent) interpretation of T& x is simply that
x is a reference parameter of type T. Don't try to make
sense out of this! It doesn't!
- For example, we can write a program
that avoids the dangers of destructors, by passing the objects by
reference. In this program, we have a function foo:
    char *foo(Record &T) {
       printf("\nIdentify T: ") ;
       T.id() ;
       return strdup(T.str) ;
    }
The class Record is defined in the header file record2.h. Here foo has a
reference parameter T. Note that the type of T is
Record not pointer to Record. So, when we use T to invoke the
id() member function, we use T.id() and not
T->id(). The sample run
shows that the parameter T is not destroyed after the function
terminates and in fact, the address of T is identical to the
address of S, the actual parameter.
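The claim about addresses can be checked directly. In this sketch the struct Record is only a hypothetical stand-in (the real class lives in record2.h, which is not reproduced here), and the helper names are made up.

```cpp
#include <cassert>

// Hypothetical stand-in for the Record class; the real one is in record2.h.
struct Record {
    int key ;
} ;

// True if the formal parameter is the very same object as *actual.
bool same_object_ref(Record &T, const Record *actual) {
    return &T == actual ;      // T is an alias, so the addresses match
}

bool same_object_val(Record T, const Record *actual) {
    return &T == actual ;      // T is a separate copy at a different address
}

void demo_same_object() {
    Record S = { 42 } ;
    assert(same_object_ref(S, &S)) ;
    assert(!same_object_val(S, &S)) ;
}
```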
- Next, we look at references more closely in a series of examples.
- In the first program, we
note the use of reference variables. After the following declaration:
    int x = 3, y = 9 ;
    int &ref = x ; // initialize reference
the variable ref becomes an alias for the variable
x. In the rest of the program, using the variable
ref is equivalent to using the variable x. That is,
assigning values to ref has the same effect as assigning
values to x. Also, ref can be used as a normal
int variable, because it is a reference and not a pointer.
We can only associate a reference variable with another variable
by initializing it, as shown above for ref. After the
initialization, we cannot make ref an alias of a different
variable, say y --- again, because a reference is not
a pointer. Assigning y to ref simply assigns the
value stored in y to x, as the
sample run demonstrates.
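The behavior described above can be traced in a few lines. This is a small sketch and not the course's first program; the function demo_alias is hypothetical, and the expected values are checked with assert instead of being printed.

```cpp
#include <cassert>

// Hypothetical driver; returns the final value of x.
int demo_alias() {
    int x = 3, y = 9 ;
    int &ref = x ;         // ref is now an alias for x

    ref = 5 ;              // same effect as x = 5
    assert(x == 5) ;
    assert(&ref == &x) ;   // a reference has the same address as its target

    ref = y ;              // does NOT rebind ref; copies the value of y into x
    assert(x == 9 && &ref == &x) ;

    y = 1 ;                // x is unaffected, since ref still aliases x
    return x ;
}
```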
- Our second program
demonstrates a simple use of a reference parameter. In the function
add3():
    void add3(int &a) {
       a = a + 3 ;
    }
the parameter a is a reference parameter. Thus, changing
the value of a in the function, changes the value of the
actual parameter in the main program. Note that the syntax for using
a is the same as an int variable, because (beating a
dead horse) references are NOT pointers. Also, note that from the main
program, the add3 function is called using the syntax:
    add3(x) ;
In C, you would expect that such a function call cannot change the
value of x. In C++, because of reference parameters, you have
to look at the function prototype to determine whether a function can
modify the values of the actual parameters.
(See sample run.)
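Putting the pieces together, a short sketch of the call looks like this. The function add3 is taken from the notes; the driver demo_add3 and the starting value 4 are made up for illustration.

```cpp
#include <cassert>

void add3(int &a) {
    a = a + 3 ;
}

// Hypothetical driver; returns the value of x after the call.
int demo_add3() {
    int x = 4 ;
    add3(x) ;        // looks like pass by value, but x is modified
    // add3(4) ;     // would not compile: a plain int& needs a variable
    return x ;
}
```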
- Our third program shows
that you can write the familiar "swap" function using reference
parameters. See sample run.
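One way to write such a function is sketched below. The course's third program is not reproduced here, so the exact names and the driver demo_swap are assumptions.

```cpp
#include <cassert>

// Exchange the values of the two actual parameters.
void swap(int &a, int &b) {
    int temp = a ;
    a = b ;
    b = temp ;
}

void demo_swap() {
    int x = 3, y = 9 ;
    swap(x, y) ;          // no & at the call site, no * inside the function
    assert(x == 9 && y == 3) ;
}
```

Compare this with the C version, which takes int * parameters and must be called as swap(&x, &y).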
- Our fourth program shows
the use of references as a return value. The function call
max(x,y) returns a reference to either x or
y, whichever is bigger. Thus, the statement
    max(x,y) = 2 ;
assigns 2 to x or y depending on which one currently
holds the greater value. (See sample
run.) Use of reference return values is uncommon, but is necessary
in some situations. Thus, you should be aware of them for your
C++ reading skills.
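A function with a reference return value might look like the following sketch. The body of max is an assumption (the fourth program is not reproduced here); in particular, returning the first parameter on a tie is an arbitrary choice.

```cpp
#include <cassert>

// Returns a reference to whichever parameter holds the larger value.
int &max(int &a, int &b) {
    if (a >= b) return a ;   // on a tie, return a (an arbitrary choice)
    return b ;
}

// Hypothetical driver; returns x + y at the end.
int demo_max() {
    int x = 3, y = 9 ;
    max(x, y) = 2 ;          // y is larger, so y becomes 2
    assert(x == 3 && y == 2) ;
    max(x, y) = 0 ;          // now x is larger, so x becomes 0
    assert(x == 0 && y == 2) ;
    return x + y ;
}
```

The parameters must themselves be references. Returning a reference to a by-value parameter or a local variable would refer to an object that no longer exists after the call.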
- Recall that we want reference parameters for the two reasons
listed above: changing the value of the actual parameter and avoiding
copying. Our fifth program shows
that we can avoid copying without letting the function change the value
of the actual parameter. This is accomplished by designating the
parameters as const reference parameters. In this example,
the function add() tries to modify the value of the reference
parameter a. The sample
run shows the error message given by the compiler for trying to
modify the value of a const parameter.
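The sketch below shows const reference parameters that compile, with the offending assignments left as comments. The struct Samples, the function total and this particular add are hypothetical; they are not the fifth program.

```cpp
#include <cassert>

// Hypothetical large type, where copying on every call would be wasteful.
struct Samples {
    int data[1000] ;
} ;

// const reference: no copy is made, and the function cannot modify s.
long total(const Samples &s) {
    long sum = 0 ;
    for (int i = 0 ; i < 1000 ; i++) {
        sum += s.data[i] ;
    }
    // s.data[0] = 0 ;     // would not compile: s is read-only here
    return sum ;
}

// Sketch of an add() that only reads its parameters.
int add(const int &a, const int &b) {
    // a = a + b ;         // would not compile: a is a const reference
    return a + b ;
}

void demo_const_ref() {
    Samples s ;
    for (int i = 0 ; i < 1000 ; i++) s.data[i] = 1 ;
    assert(total(s) == 1000) ;
    int x = 4, y = 5 ;
    assert(add(x, y) == 9 && x == 4 && y == 5) ;
}
```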
- In the sixth program, we
show another benefit of using constant reference parameters. This time
the add() function has the prototype:
    int add(int &a, const int &b) ;
Since b is a constant reference parameter, we can pass a real
constant 6 by reference in the call:
    z = add(x, 6) ;
This is convenient in many situations.
See sample run.
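A sketch of this call follows. The prototype is the one given above, but the body of add is an assumption (it adds b to a and returns the new value of a), as are the driver and the starting value of x.

```cpp
#include <cassert>

// Same prototype as in the notes; the body is assumed for illustration.
int add(int &a, const int &b) {
    a = a + b ;
    return a ;
}

void demo_add_literal() {
    int x = 4, z ;
    z = add(x, 6) ;       // 6 binds to the const reference parameter b
    assert(x == 10 && z == 10) ;
    // add(6, x) ;        // would not compile: a plain int& cannot bind to a literal
}
```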
- Using reference parameters and the const designator,
we changed our linked list interface. This particular implementation
lets us eliminate one layer of pointers when we want lists of records.
See:
- Hash Tables: we have looked at a few data structures:
arrays, linked lists and binary search trees. Each of these has
advantages and disadvantages. In a sorted array, we can use binary
search to find an item in O(log n) time. However, inserting into a
sorted array takes O(n) time (linear time). In an unsorted linked
list, search takes linear time, but insertion can be done in constant
time (just insert at the front or the back). Using a sorted linked list
increases the time it takes to insert an item without making a big
improvement in the search time, since both operations now take linear
time. A binary search tree allows you to insert, delete and search in
O(log n) time on average (and in the worst case only if the tree is kept
balanced). A hash table allows you to insert, delete and search in
constant time on average. So, if the only operations you need to
support are insert, delete and search, a hash table offers many
advantages.
- An example: suppose that you are the UMBC registrar and you want
to store and retrieve student records based upon the student's social
security number (ssn). There is an easy way to do this quickly: simply
create a huge array of records indexed from 0 to 999,999,999. To
retrieve a student's record simply use his/her social security number
as the index. The only disadvantage of this method is that it uses too
much memory. As an alternative, we can use just the last 4 digits of a
student's social security number. Then we would only need 10,000
entries. The disadvantage here is that there are more than 10,000
students at UMBC, so many students would have to use the same index.
To solve this problem, we keep a linked list at each entry. For
example, if two students have social security numbers that end in 6666,
then the 6666 entry of the table is a linked list with the two
students' records.
- We have here the main ideas of a hash table. The hash table
is an array of linked lists. The key used for hashing is the student's
ssn. The hash function takes the key and transforms it into a legal
index value for the hash table. In this example, the hash function
simply takes the ssn and removes the first 5 digits. Ideally, a hash
function would evenly distribute the keys in the hash table. That way,
each linked list in the hash table would be relatively short. When two
keys hash to the same index value, the situation is called a
collision. With 12,000 students and an ideal hash function,
each linked list in the hash table would only have 1 or 2 elements.
Thus, searching, inserting and deleting from this hash table would take
constant time.
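The ideas above can be sketched as follows. This is not the course's hash.h: it uses the standard library's std::vector and std::list in place of the course List ADT, and the names Student, hash_ssn and HashTable as well as the ssn values are made up for illustration.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

struct Student {                 // hypothetical record type
    long ssn ;
    std::string name ;
} ;

const int TABLE_SIZE = 10000 ;

// Hash function from the notes: keep only the last 4 digits of the ssn.
int hash_ssn(long ssn) {
    return static_cast<int>(ssn % TABLE_SIZE) ;
}

class HashTable {
public:
    HashTable() : table(TABLE_SIZE) {}

    // Insert at the front of the chain (does not check for duplicates).
    void insert(const Student &s) {
        table[hash_ssn(s.ssn)].push_front(s) ;
    }

    // Returns a pointer to the stored record, or 0 if the ssn is absent.
    const Student *search(long ssn) const {
        const std::list<Student> &chain = table[hash_ssn(ssn)] ;
        std::list<Student>::const_iterator it ;
        for (it = chain.begin() ; it != chain.end() ; ++it) {
            if (it->ssn == ssn) return &*it ;
        }
        return 0 ;
    }

    // Removes the record with this ssn; returns false if it was not found.
    bool remove(long ssn) {
        std::list<Student> &chain = table[hash_ssn(ssn)] ;
        std::list<Student>::iterator it ;
        for (it = chain.begin() ; it != chain.end() ; ++it) {
            if (it->ssn == ssn) { chain.erase(it) ; return true ; }
        }
        return false ;
    }

private:
    std::vector< std::list<Student> > table ;   // array of linked lists
} ;

void demo_hash_table() {
    HashTable h ;
    Student a = { 111226666L, "Ann" } ;    // made-up ssn values that
    Student b = { 333446666L, "Bob" } ;    // collide at index 6666
    h.insert(a) ;
    h.insert(b) ;
    assert(hash_ssn(a.ssn) == 6666 && hash_ssn(b.ssn) == 6666) ;
    assert(h.search(333446666L)->name == "Bob") ;
    assert(h.remove(111226666L) && h.search(111226666L) == 0) ;
}
```

Each operation hashes the key once and then walks a single chain, so the cost depends on the length of that chain and not on the total number of records.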
- So, is taking the last 4 digits of the ssn a good hash function? It
is theoretically possible that next year every entering freshman has the same
last 4 digits in their ssn. Then, our hash table would simply be an
unsorted linked list and the performance of search would be poor.
However, our experience with ssn's tells us that the chance of this
happening is small. The design of a good hash table depends on having a
good hash function. There are schemes for picking provably good hash
functions which would be discussed in an algorithms class, not here.
- One disadvantage of using the last 4 digits of a ssn as a hash function
is that we are not able to control the size of our hash table very well.
If UMBC's enrollment increased to 20,000, our only choice would be to use 5
digits of the ssn and have a table of size 100,000. Another hash
function we can use takes the remainder of the ssn modulo some
prime number N. That leaves us with a value between 0 and N-1. If
we have a hash table with N entries, then this value can be used directly
as the index into the hash table.
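The modulo scheme can be sketched as follows. The helper names is_prime, next_prime and hash_mod are hypothetical, and choosing N as the smallest prime at or above the expected enrollment is just one reasonable policy, not something prescribed by the course.

```cpp
#include <cassert>

// Trial division; adequate for table-sized numbers.
bool is_prime(long n) {
    if (n < 2) return false ;
    for (long d = 2 ; d * d <= n ; d++) {
        if (n % d == 0) return false ;
    }
    return true ;
}

// Smallest prime >= n, used here to choose the table size N.
long next_prime(long n) {
    while (!is_prime(n)) n++ ;
    return n ;
}

// Hash function: remainder of the ssn modulo N (ssn assumed non-negative).
long hash_mod(long ssn, long N) {
    return ssn % N ;
}

void demo_hash_mod() {
    long N = next_prime(20000) ;             // table size near the enrollment
    long index = hash_mod(123456789L, N) ;   // made-up ssn
    assert(is_prime(N)) ;
    assert(index >= 0 && index < N) ;        // legal index into a table of N entries
}
```

With this scheme the table can grow in small steps as enrollment grows, instead of jumping from 10,000 to 100,000 entries.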
- Using our latest version of the List ADT, we implement a hash
table as an array of linked lists. (See the header file hash.h.) We will complete our discussion
of this implementation in the next
lecture.
Last Modified:
22 Jul 2024 11:27:43 EDT
by
Richard Chang