freq-checker (CSV Duplicate Counter)

A Python script that counts the frequency of values in a CSV column and exports the results.

What It Does

Reads a CSV or Excel (.xlsx) file
Counts how many times each unique value appears in a specified column
Displays the frequency counts
Saves the results to a new CSV file
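
The four steps above can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: `count_column` is a hypothetical helper, and the in-memory sample data stands in for a real file.

```python
import csv
import io
from collections import Counter


def count_column(rows, column):
    """Count how often each value appears in `column` (hypothetical helper)."""
    return Counter(row[column] for row in rows)


# In-memory stand-in for a CSV file so the sketch runs without touching disk.
sample = io.StringIO("City\nNew York\nChicago\nNew York\nBoston\nNew York\n")
counts = count_column(csv.DictReader(sample), "City")

# Write value/count pairs, most frequent first, as a new CSV.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["City", "count"])
writer.writerows(counts.most_common())
print(out.getvalue())
```

Swapping the `io.StringIO` objects for `open(path, newline="")` handles would read from and write to real files.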

Usage

Run the script from the command line:

python main.py "path/to/file.csv" "Column name"
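
For readers curious how a command line like this could be parsed, here is a rough `argparse` sketch. It is an assumption about the interface, not the contents of main.py: the flag names come from the Options section, while `build_parser` and the destination names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("file", help="path to the input file")
    parser.add_argument("column", help="name of the column to count")
    parser.add_argument("-o", "--output", help="path to save the results")
    parser.add_argument("--ignore-case", action="store_true")
    parser.add_argument("--trim", action="store_true")
    parser.add_argument("--fuzzy", action="store_true")
    parser.add_argument("--threshold", type=int, default=90)
    parser.add_argument("--phonetic", action="store_true")
    parser.add_argument("--plot", action="store_true")
    parser.add_argument("--report", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(
    ["data.csv", "Email", "--ignore-case", "-o", "results.csv"]
)
print(args.file, args.column, args.ignore_case, args.output, args.threshold)
```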

Supported Formats

Input: .csv and .xlsx (Excel)
Output: .csv, .xlsx, .png (plot), .html (report)

Options

1. Basic Options

-o, --output: Path to save the results.
--ignore-case: Treat "Apple" and "apple" as duplicates.
--trim: Treat " Apple " and "Apple" as duplicates.
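
To make the effect of --ignore-case and --trim concrete, the sketch below normalizes values before counting them. The `normalize` helper is hypothetical, and the script may implement these options differently.

```python
from collections import Counter


def normalize(value, ignore_case=False, trim=False):
    """Apply the optional clean-up steps before counting (hypothetical helper)."""
    if trim:
        value = value.strip()  # drop leading and trailing whitespace
    if ignore_case:
        value = value.casefold()  # case-insensitive comparison key
    return value


values = ["Apple", "apple", " Apple ", "Pear"]
raw = Counter(values)
cleaned = Counter(normalize(v, ignore_case=True, trim=True) for v in values)
print(raw)
print(cleaned)
```

Without normalization the three spellings of "Apple" are counted separately; with both options enabled they collapse into a single entry.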

2. Advanced Matching

--fuzzy: Find similar strings (e.g., "John Doe" vs "Jon Doe").
--threshold <0-100>: Set the similarity threshold used by --fuzzy (default: 90).
--phonetic: Find similarly sounding strings (e.g., "Smith" vs "Smyth").
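
The idea behind a 0-100 similarity threshold can be illustrated with `difflib` from the standard library. This is only a sketch: the script may rely on a different fuzzy-matching library and scoring scheme, and `similarity` and `similar_pairs` are hypothetical names.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0-100 similarity score based on difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio() * 100


def similar_pairs(values, threshold=90):
    """Return all pairs of distinct values scoring at or above the threshold."""
    unique = sorted(set(values))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if similarity(a, b) >= threshold
    ]


names = ["John Doe", "Jon Doe", "Smith", "Smyth"]
print(similar_pairs(names, threshold=90))
print(similar_pairs(names, threshold=80))
```

With this particular scorer, "John Doe" and "Jon Doe" clear a threshold of 90 while "Smith" and "Smyth" do not, which is the kind of case phonetic matching is meant to catch.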

3. Reporting & Visualization

--plot: Generate a bar chart of the top duplicates (saved as a .png file when an output path is provided).
--report: Generate a detailed HTML report (saved as an .html file when an output path is provided).
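
As a rough picture of what an HTML report of frequency counts might contain, the sketch below renders value/count pairs as a table. The layout and the `render_report` function are assumptions for illustration; the report produced by the script is likely more detailed.

```python
from html import escape


def render_report(counts, title="Frequency report"):
    """Render (value, count) pairs as a minimal HTML page (hypothetical layout)."""
    rows = "\n".join(
        # Escape values so characters like < and & cannot break the markup.
        f"<tr><td>{escape(str(value))}</td><td>{count}</td></tr>"
        for value, count in counts
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<h1>{escape(title)}</h1>\n"
        "<table>\n<tr><th>Value</th><th>Count</th></tr>\n"
        f"{rows}\n</table>\n</body></html>"
    )


report = render_report([("New York", 15), ("Chicago", 8), ("Boston", 3)])
print(report)
```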

Examples

Exact Match (Cleaned):

python main.py data.csv "Email" --ignore-case --trim -o results.csv

Fuzzy Match (Typos):

python main.py data.xlsx "Name" --fuzzy --threshold 85 -o fuzzy_results.xlsx

Phonetic Match (Sound-alike):

python main.py data.csv "LastName" --phonetic -o phonetic.csv
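
To show what "sound-alike" grouping means, here is a compact American Soundex implementation. Soundex is used purely as an example of a phonetic key; the script may use a different algorithm, and the `soundex` function below is not taken from its code.

```python
# Map consonants to Soundex digits; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in [
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex key for `name`."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad or truncate to four characters


print(soundex("Smith"), soundex("Smyth"))
```

Values that share a key, such as "Smith" and "Smyth", would be counted together under this scheme.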

Full Report (Excel + HTML Report + Plot):

python main.py data.xlsx "City" --report --plot -o analysis.xlsx

Installation

Install dependencies:
```
pip install -r requirements.txt
```

Example Output

If your CSV has a "City" column with repeated values, the script prints counts like these:

City
New York    15
Chicago      8
Boston       3
