This assignment requires you to compare two ways of storing and retrieving data records. In order to do the comparison, you must write a program which stores data records in
§ a linked list and
§ a hash table with direct chaining as collision resolution strategy.
The data to be used for this assignment is the story "A Christmas Carol" by Charles Dickens. You can download it and save it as a text file. Your program must take the file name as an argument to main(), e.g. if your program is called assignment2.c and the text file is called christmas_carol.txt, you would call your program using assignment2 christmas_carol.txt.
When your program has read the input text file, it is expected to answer queries via a user interface. For a query, the user types in a keyword. To answer a query, your program prints the number of occurrences of the given keyword in the input text file (and possibly some other statistics). As an example, we use the following text (named handbook.txt), which is part of the current CS Handbook p. 23.
Coursework has to be done during a specified time - this means that
you make the best use of your time and we are able to monitor your progress and
make available the equipment, lab time, etc, that you'll need to do the work.
Running your program could result in the following trace:
Assignment> assignment2 handbook.txt
Enter word for retrieval: time
'time' occurs 3 times.
Retrieval from list took 9 comparisons.
Retrieval from table took 3 comparisons.
Another query? [y/n] y
Enter word for retrieval: your
'your' occurs 2 times.
Retrieval from list took 19 comparisons.
Retrieval from table took 2 comparisons.
Another query? [y/n] n
Assignment>
Program Design
Choose a systematic approach when designing your program: Define the data structures that you need. Define the functions that you need to operate on these data structures. Re-use types and functions wherever possible. If a problem is too difficult to solve in one go, divide and conquer. A good starting point is to first develop a program that uses the linked list approach. Several functions of this code can be re-used when extending the program to use a hash table. (Note: You must hand in ONE program only; not one for the linked list and another one for the hash table!) A good solution will have at least 6 functions in addition to the main program:
§ A function that reads the data.
§ A function that inserts data into the linked list.
§ A function that retrieves data from the linked list.
§ A function that inserts data into the hash table.
§ A function that retrieves data from the hash table.
§ A function that handles the queries.
Choose the hash function and the size of the hash table carefully. In your program comments, give 3 examples for possible hash functions and table sizes and justify your final choice. You will have to set some conventions e.g. for dealing with capitalised words, or phrases like "I'll" and "we've". There is not a single right solution. Choose something which seems sensible to you, and motivate it in the comment of your program.