You are required to write a Java program that reads a text file and compresses itusing the algorithm described below. The compression ratio of a compression algorithm is defined asthe size of the orig
You are required to write a Java program that reads a text file and compresses itusing the algorithm described below. The compression ratio of a compression algorithm is defined asthe size of the orig
You are required to write a Java program that reads a text file and compresses it
using the algorithm described below. The compression ratio of a compression algorithm is defined as
the size of the original file divided by the size of the compressed one. The compression ratio achieved
by your program will depend on the input file.
You are also required to write a Java program to decompress a file compressed by the above
algorithm to recover the original file.
3 The Compression Algorithm
Your compression program will receive as input a text file to compress and a second file containing
a set of compression codes for the characters that might appear in the input file. The content of the
second file will be read by your program and stored in a list of codes. Each entry of this list will
store two things: a character and a compression code.
The compression algorithm is very simple. It reads the characters of the text input file one at
a time, looks up the next character in the list of codes, and writes into the compressed file the
corresponding compression code. If a character of the input file is not in the list of codes, the
algorithm prints an appropriate message and terminates. For example, suppose that the list of codes
is the following
Character Compression code
’e’ 001
’t’ 1010
’a’ 1000
’W’ 0111101011
and that the input text file contains the following line:
eteaaW
Then the compression algorithm will write to the compressed file the following
0011010001100010000111101011
Each character in a text file is stored in the file as a sequence of 8 bits (or 16 bits depending on
encoding), while each compression code is a sequence with a variable number of bits. Compression is
achieved when characters appearing often in the input file are assigned compression codes that are
shorter than 8 bits. For the above example, the line eteaaW is stored in the text file with 48 bits,
while the compressed version uses only 28 bits.
If the input file, for example, contained the line
fea
Then the compression algorithm will print a message indicating that there is no compression code
for character ’f’ and then it would terminate.
4 Decompression Algorithm
The decompression algorithm is a bit more complicated than the compression one because the compression codes do not have the same length. The algorithm reads the compression file one bit at a
time until the sequence of bits read from it appears in the list of codes; then the associated character
is written to the decompressed file.
For the same example above, the algorithm reads first bit 0 from the compressed file and it looks
it up in the list of codes. As 0 is not the compression code of any character, then the algorithm
reads the next bit from the compressed file, obtaining the sequence 00. This is not a compression
code, so the algorithm reads the third bit and discovers that 001 is the compression code for ’e’, so
’e’ is written to the decompressed file.
The algorithm continues reading bits and looking up the sequence of bits in the list of compression
codes until it discovers that 1010 is the compression code for ’t’, so this character is written to the
decompressed file, and so on.
5 Classed Provided
You can download from the course’s website all the classes described in this section.
5.1 Class TextFile
This class provides methods for reading characters from a text file and for writing characters to a
text file. This class provides the following public methods:
• TextFile (String fileName, String fileMode). Constructor for the class. The first parameter is the name of a text file. The second parameter can have two values "read" or
"write". If fileMode is "read" the text file will be opened in read mode; your compression
algorithm will open the file in this mode to read the characters that will be compressed. If
fileMode is "write" a new file will be created and information can be written into it; your
decompression algorithm will open the file in this mode so the decompressed text can be stored
in it.
• char readChar(). This method reads and returns the next character from the text file. It
returns (char)0 if the end of the file has been reached before invoking the method.
• String readLine(). This method reads characters starting from the current position of the
text file until the end of the line is reached. The string of characters read is returned as a result
of invoking this method. It returns null if the end of the file has been reached before invoking
this method.
2
• String writeChar(char c). This method writes the character c into the text file.
• close(). This method closes the text file. Important note. If you do not close the text file
the information that you have written into it will not be stored in disk, so make it sure that
you close the file once you have finished decompressing a file.
Suppose, for example, that you wish to compress file "text.txt". To open the file you will use
something like this:
TextFile input = new TextFile("text.txt","read");
A character can then be read from the file by using code like this:
nextChar = input.readChar();
where nextChar is a variable of type char. Let file "text.txt" consists of one line:
Sample text
Then nextChar = input.readChar(); will store ’S’ into nextChar. If now you perform
line = input.readLine();
where line is a variable of type String, then line will store the string "ample text".
5.2 Class CompressedFile
This class provides methods for reading bits from a compressed file and for writing bits to a compressed file. This class provides the following public methods.
• CompressedFile(String fileName, String fileMode). Constructor for the class. The first
parameter is the name of the compressed file. The second parameter is as in the constructor
for class TextFile.
• char readBit(). Reads the next bit from the compressed file. Since bits are difficult to
manipulate in Java, to make this assignment easier for you, bits read from the compressed file
are converted to values of type char. Thie method returns (char)0 if the end of the file has
been reached before invoking the method.
• writeBit(char bit). As mentioned above, to make it easier for you in this assignment you
will handle bits as if they were chars. Whenever you wish to write the bit 0 into the compressed
file, you will invoke this method as writeChar(’0’) and whenever you wish to write the bit 1
into the compressed file you will invoke the method as writeBit(’1’);. Note that this class
will convert ’0’ and ’1’ into bits 0 and 1 before they are written into the compressed file.
• close(). This method closes the compressed file. As mentioned above, if the compressed file
is not closed, the information that was written in it will not be stored in disk.
6 List of Compression Codes
As mentioned above your compression and decompression Java programs will receive as parameter
the name of a file storing the compression codes for text characters. You can download this file from
the course’s website. This is a text file where each line is of the form
ccompression code
3
where c is a single character and compression code is a String consisting of characters ’0’ and ’1’.
Notice that there is no space between c and the compression code. For example, the following are
three actual lines from this file
a1000
t1010
e001
So character ’a’ is assigned code ”1000”, ’t’ has code ”1010”, and ’e’ has code ”001”. You will use
class TextFile to read the contents of this file and store the list of compression codes in your program,
as described below.
7 Classes to Implement
You are to implement at least 4 Java classes: CodePair.java, ArrayCode.java, Compress.java,
and Decompress.java. You can implement more classes if you need to. You must write all the
code yourself. You cannot use code from the textbook, the Internet, or any other sources.
7.1 Class CodePair
This class represents an entry in the list of codes, associating a character with a compression code.
For this class, you must implement all and only the following public methods:
• public CodePair(Char c, String code). A constructor which returns a new CodePair object storing the specified character and compression code. This class will have two instance
variables, one to store a character and the other to store a compression code. As mentioned
above compression codes will be stored as Strings of ’0’s and ’1’s.
• public String getCode(). Returns the compression code stored in this CodePair object.
• public char getCharacter(). Returns the character in this CodePair object.
• public void setCharacter(char c). Stores the given character in this CodePair object.
• public void setCode(String code). Stores the given compression code in this CodePair
object.
• public boolean equals (CodePair anotherPair). Returns true if this CodePair object
has the same character as the character stored in anotherPair.
You can implement any other methods that you want to in this class, but they must be declared
as private methods (i.e. not accessible to other classes).
7.2 Class ArrayCode
This class stores a list of characters and their compression codes. Each pair (character,compression
code) will be stored in an object of class CodePair. The list, then, will be stored in an array of
CodePair objects. This class will have an instance variable referencing this array. You might also
want to use instance variables to store the size of the array and the number of CodePair objects that
have been stored in it.
For this class, you must implement all and only the following public methods:
• public ArrayCode (int size). This is the constructor for the class. When an object of this
class is created an array of CodePair objects of the size specified in the parameter will be
created. All instance variables in this class must be initialized here.
4
• public void add (CodePair pair). Adds the given pair to the array of CodePairs. If the
array is full before the new pair is added to it, then the capacity of the array needs to be
expanded as follows:
– If the length or size of the array is at most 100, then the size of the array will be doubled.
Look at the lecture notes to see how to expand the capacity of an array.
– If the size of the array is larger than 100, then the size of the array will increase by 20.
• public void remove (CodePair pairToRemove). Removes pairToRemove from the array.
Note that some entries of the array need to be shifted for pairToRemove to be correctly removed
from the array. Read the lecture notes to see how to remove information from an array. If
pairToRemove is not in the array, then nothing needs to be removed.
If after removing the specified object from the array the number of CodePair objects in the
array is less than one fourth of the size of the array, then the size of the array must be decreased
by half. Note that this requires copying all values stored in the array into a new array of half
of its size.
• public int findCode(String code). This method looks for the given code in the array. If
the array contains a CodePair object with the given compression code, the position of this
object in the array is returned. If no entry of the array stores the given compression code, this
method must return the value -1.
• public int findCharacter(char c). This method looks for the given character in the array.
If the array contains a CodePair object storing the given character, the position of this object
in the array is returned. If no entry of the array stores the given character, this method must
return the value -1.
• public String getCode(int i). This method returns the compression code in the CodePair
object stored in the i-th position of the array. If there is no CodePair object stored in position
i if the array (for example if i is negative or is larger than the number of items stored in the
array) this method must return null.
• public char getCharacter(int i). This method returns the character in the CodePair
object stored in the i-th position of the array. If there is no CodePair object stored in position
i if the array this method must return the value 0 (null character).
• public int getSize(). This method returns the size or length of the array.
• public int getNumPairs(). This method returns the number of CodePair objects stored in
the array.
You can implement any other methods that you want to in this class, but they must be declared
as private methods.
7.3 Class Compress
This class contains a main method, declared with the usual method header:
public static void main(String[] args)
You can implement any other methods that you want to in this class, but they must be declared as
private methods.
The input to the program will consist of two files: a file to compress and a file containing the
compression codes. The output will be the compressed file. In this class you must implement the
algorithm described in Section 3. To execute your program from the command line you must type
java Compress fileName compressionCodesName
5
where fileName is the name of the file to compress and compressionCodesName is the name of the
file storing the compression codes. The compressed file must be stored in a file with the same name
as the original one, but with extension ”.zzz”. For example if the name of the input file is test.txt
then the name of the compressed file must be test.zzz.
If you are using Eclipse, you need to specify the names of the two input files in the list
of program arguments of Eclipse. Please read the tutorials posted in the course’s website
(http://www.csd.uwo.ca/courses/CS1027b/tutorials.html) to see how to do this. Note that you need
to store the input files in the same folder where Eclipse has the src folder, otherwise Eclipse will not
be able to find them.
Note that within method main you can access the names of the two input files through the
parameter args: args[0] stores the name of the first file and args[1] stores the name of the second
file. For example, if the program is executed like this:
java Compress text7.txt codes.txt
Then we can, for example, print the names of the files from within method main as:
public static void main(String[] args) {
System.out.println("Name of file to compress: "+args[0]);
System.out.println("Name of file storing compression codes: "+args[1]);
To get the name of the output file you can use, for example, the following code:
String outputName = args[0].substring(0, args[0].length() - 3) + "zzz";
Method substring will return the name of the input file (args[0]) minus its last 3 characters
(that should contain the extension "txt).
7.4 Class Decompress
This class also contains a main method. The input to this program will also consist of two files: a
compressed file and a file containing the compression codes. The output will be the decompressed
file. In this class you must implement the algorithm described in Section 4. To execute your program
from the command line you must type
java Decompress compressedFileName compressionCodesName
where compressedFileName is the name of the file to decompress and compressionCodesName is as
above. The decompressed file must be stored in a file with the same name as the original one, but
with extension ”.dec”.
You can implement any other methods that you want to in this class, but they must be declared
as private methods.
8 Testing your Program
We will perform two kinds of tests on your program: (1) tests for your implementation of the classes
ArrayCode and CodePair, and (2) tests for your file compressor and decompressor. For the first part
we will run a test program called TestArray which checks whether your first two classes work as
specified. We will supply you with a copy of TestArray so you can use it to test your implementation.
For testing your file compressor we will give your program some input files, check the sizes of the
compressed files, and then we will decompress them and verify that the original files are restored.
You can use, for example, the cmp Linux utility to compare the original file with the decompressed
one. By typing
cmp file1 file2
6
the utility will report the first character where the two files differ, if any. If the files are identical,
the utility terminates without producing any output. In Windows you can use the program fc to
compare two files.
9 Non-Functional Specifications
• Assignments are to be done individually and must be your own work. Software will be used to
detect cheating.
• Include comments in your code in javadoc format. Add javadoc comments at the beginning
of your classes indicating who the author of the code is and a giving a brief description of
the class. Add javadoc comments to methods and instance variables. Read information about
javadoc in the second lab for this course.
• Add comments to explain the meaning of potentially confusing parts of your code. You need
to use your own judgment here. If the meaning of a fragment of code is obvious, you do not
need to add a comment. If someone other than you reading a fragment of code might struggle
to understand how the code works, then write a comment. However, try to avoid meaningless
comments like these:
i = 1; // initialize the value of i to 1
i = i + 1; // increase the value of i
if (i == j) // compare i and j
• Use Java coding conventions and good programming techniques, for example:
1. Use meaningful variable and method names. A name should help understand what a
variable is used for or what a method does. Avoid the use of variable names without any
meaning, like xxy, or names, like flower, that do not relate to the intended purpose of the
variable or method.
2. Use consistent conventions for naming variables, methods, and classes. For example, you
might decide that names of classes should start with a capital letter, while names of
variables and methods should start with a lower case letter. Names that consist of two or
more words like symbol and table can be combined, for example, using “camelCasing” (the
words are concatenated, but the second word starts with a capital letter: symbolTable) or
they can be combined using underscores: symbol table. However, you need to be consistent.
3. Use consistent notation for naming constants. For example, you can use capital letters
to denote constants and constant names composed of several words can be joined by
underscores: TABLE SIZE.
4. Use constants where appropriate.
5. Readability. Use indentation, tabs, and white spaces in a consistent manner to improve
the readability of your code. The body of a for loop statement, for example, should have
a larger indentation than the statement itself:
for (int i = 0; i < TABLE SIZE; ++i)
table[i] = 0;
Also, positioning of brackets, ’{’ and ’}’ to delimit blocks of code should be consistent.
For example if you put an opening bracket at the end of the header of a method:
private int findPosition() {
int position;
7
then you should not put the bracket in a separate line for a method:
private String getName()
{
return personName;