A Python script that counts the frequency of values in a CSV column and exports the results.
- Reads a CSV file
- Counts how many times each unique value appears in a specified column
- Displays the frequency counts
- Saves the results to a new CSV file
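The four steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in main.py: the function names `count_values` and `save_counts` and the `count` header in the output file are assumptions made for the example.

```python
import csv
from collections import Counter


def count_values(csv_path, column):
    """Count how often each value appears in one column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # fieldnames is None for an empty file; fail clearly in both cases.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"Column {column!r} not found in {csv_path}")
        return Counter(row[column] for row in reader)


def save_counts(counts, out_path, column):
    """Write value/count pairs to a new CSV, most frequent first."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # The "count" header is a hypothetical choice for this sketch.
        writer.writerow([column, "count"])
        writer.writerows(counts.most_common())
```

Calling `count_values("data.csv", "City").most_common()` returns the pairs sorted by frequency, which is the order used when displaying or saving them here.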
Run the script from the command line:
python main.py "path/to/file.csv" "Column name"
- Input: .csv and .xlsx (Excel)
- Output: .csv, .xlsx, .png (plot), .html (report)
- -o, --output: Path to save the results.
- --ignore-case: Treat "Apple" and "apple" as duplicates.
- --trim: Treat " Apple " and "Apple" as duplicates.
- --fuzzy: Find similar strings (e.g., "John Doe" vs "Jon Doe").
- --threshold <0-100>: Set similarity threshold (default: 90).
- --phonetic: Find similarly sounding strings (e.g., "Smith" vs "Smyth").
- --plot: Generate a bar chart of top duplicates (saved as .png if output path provided).
- --report: Generate a detailed HTML report (saved as .html if output path provided).
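To make the cleaning and fuzzy options more concrete, here is a rough sketch of what --ignore-case, --trim, --fuzzy, and --threshold could do. It is an assumption-laden illustration: the README does not say which matching library the script uses, so this example substitutes `difflib.SequenceMatcher` from the standard library, and the helper names `normalize`, `similarity`, and `similar_pairs` are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(value, ignore_case=False, trim=False):
    """Clean a value the way --trim and --ignore-case are described."""
    if trim:
        value = value.strip()
    if ignore_case:
        value = value.casefold()
    return value


def similarity(a, b):
    """Return a 0-100 similarity score (difflib's ratio, rescaled)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def similar_pairs(values, threshold=90):
    """Return pairs of distinct values scoring at or above the threshold."""
    unique = sorted(set(values))
    return [
        (a, b)
        for i, a in enumerate(unique)
        for b in unique[i + 1:]
        if similarity(a, b) >= threshold
    ]
```

With this scoring, "John Doe" and "Jon Doe" land above the default threshold of 90, so `similar_pairs(["John Doe", "Jon Doe", "Alice"])` reports them as one pair. A different similarity measure may score the same strings differently, so thresholds are not portable between libraries.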
Exact Match (Cleaned):
python main.py data.csv "Email" --ignore-case --trim -o results.csv
Fuzzy Match (Typos):
python main.py data.xlsx "Name" --fuzzy --threshold 85 -o fuzzy_results.xlsx
Phonetic Match (Sound-alike):
python main.py data.csv "LastName" --phonetic -o phonetic.csv
Full Report (Excel + HTML Report + Plot):
python main.py data.xlsx "City" --report --plot -o analysis.xlsx
- Install dependencies:
pip install -r requirements.txt
If your CSV has a "City" column with repeated values, the script displays counts like this:
City
New York    15
Chicago     8
Boston      3