Skip to content

diegohat/huffman

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Huffman Coding Project

C++ Make VSCode

Index

1. Introduction

2. Methodology

3. Menu

4. Results Discussion

5. Compilation and Execution

Introduction

This project consists of developing a terminal application using the C++ language. Through data structures, it was proposed to create a data compression method using Huffman coding. Huffman coding is a compression method that uses the occurrence probabilities of symbols in the data set to be compressed to determine variable-length codes for each symbol.

Methodology

For the use of coding, tree structures, queues, lists and hash tables were used. A priority queue was created to handle the reordering of the forest of trees that was contained in a list of trees. Below are the project steps and their respective codes.

Project Steps

1. Count the recurrence of each word in the file;

2. Normalize the count;

3. Build the tree with the rules presented by Huffman;

4. Replace words with binary encoding;

5. Save the file in binary format;

6. Observe and discuss space gain or loss.

Menu

Screenshot

  • When executing the program, the following options will appear:

    • 1 Prints the list of words from the text and their normalized frequencies.
    • 2 Prints the Huffman tree (in list form) with their associated binary values.
    • 3 Creates the binary file test.dat in the /src directory.
    • 9 Ends the program.
  • Expected results from executing the options:

    • [1] Print word's frequencies.

      (Word, Normalized Frequency)

      Screenshot

    • [2] Print Huffman tree.

      (Word, Binary Value)

      Screenshot

    • [3] Create binary file.

      Screenshot

    • [9] Exit.

      Screenshot

Results Discussion

Through the analysis of the compression result, it was possible to identify an increase in file size. This behavior, contrary to what was expected, is due to the fact that the true Huffman code (different from the proposed one) uses characters as keys for building the tree and their respective binary values. The number of characters is much lower than the number of possible words within a text, causing the binary values to grow in such a way (due to the number of words and the code having no repeated prefixes) that the file size increases.

Screenshot

Compilation and Execution

Command Function
make clean Deletes the last compilation performed contained in the build folder
make Executes the program compilation using g++, and the result goes to the build folder
make run Executes the program from the build folder after compilation is performed

About

Huffman algorithm for data compression

Resources

License

Stars

Watchers

Forks

Contributors