Huffman Coding Project

Index

Introduction

This project consists of developing a terminal application using the C++ language. Through data structures, it was proposed to create a data compression method using Huffman coding. Huffman coding is a compression method that uses the occurrence probabilities of symbols in the data set to be compressed to determine variable-length codes for each symbol.

Methodology

For the use of coding, tree structures, queues, lists and hash tables were used. A priority queue was created to handle the reordering of the forest of trees that was contained in a list of trees. Below are the project steps and their respective codes.

Project Steps

1. Count the recurrence of each word in the file;

2. Normalize the count;

3. Build the tree with the rules presented by Huffman;

4. Replace words with binary encoding;

5. Save the file in binary format;

6. Observe and discuss space gain or loss.

When executing the program, the following options will appear:
- 1 Prints the list of words from the text and their normalized frequencies.
- 2 Prints the Huffman tree (in list form) with their associated binary values.
- 3 Creates the binary file test.dat in the /src directory.
- 9 Ends the program.
Expected results from executing the options:
- [1] Print word's frequencies.
  
  (Word, Normalized Frequency)
- [2] Print Huffman tree.
  
  (Word, Binary Value)
- [3] Create binary file.
- [9] Exit.

Results Discussion

Through the analysis of the compression result, it was possible to identify an increase in file size. This behavior, contrary to what was expected, is due to the fact that the true Huffman code (different from the proposed one) uses characters as keys for building the tree and their respective binary values. The number of characters is much lower than the number of possible words within a text, causing the binary values to grow in such a way (due to the number of words and the code having no repeated prefixes) that the file size increases.

Compilation and Execution

Command	Function
make clean	Deletes the last compilation performed contained in the build folder
make	Executes the program compilation using g++, and the result goes to the build folder
make run	Executes the program from the build folder after compilation is performed

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
img		img
src		src
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Huffman Coding Project

Index

Introduction

Methodology

Project Steps

Menu

[1] Print word's frequencies.

[2] Print Huffman tree.

[3] Create binary file.

[9] Exit.

Results Discussion

Compilation and Execution

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Huffman Coding Project

Index

Introduction

Methodology

Project Steps

Menu

[1] Print word's frequencies.

[2] Print Huffman tree.

[3] Create binary file.

[9] Exit.

Results Discussion

Compilation and Execution

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages