29 commits
All by greg-randall, Dec 16, 2025:
e46934d added a few more words for the description/amount
0035603 exit if no heading row found
fde0a98 fix(utils): enhance clean_amount to handle US/EU currency formats
f2df348 feat(interpret): improve data parsing and support for YNAB exports
c99b2ba feat(interpret): sort subscriptions by estimated yearly cost
9fb8c58 feat(interpret): cluster similar subscription amounts
9f40dfc feat(interpret): allow configurable clustering threshold
d4f497f feat(interpret): improve subscription grouping via text normalization
09a2a06 feat(interpret): format Amount and Yearly_Cost to two decimal places
1b279f5 feat(interpret): filter subscriptions by recency
b2c48df fix(utils): remove hardcoded personal finance vendor patterns
20c6695 feat(interpret): implement generic fuzzy matching for description gro…
ff1d266 fix(interpret): add missing merge_similar_descriptions function
97c39f8 feat(interpret): make transaction amount filter configurable
dc42555 fix(interpret): separate outliers by clustering amounts per vendor
a1937a5 fix(interpret): correct column handling for multi-key grouping
b226230 fix(interpret): resolve column length error and properly filter outliers
f0ccf76 fix(interpret): silence DeprecationWarning in cluster_amounts apply
53163b4 Revert "fix(interpret): silence DeprecationWarning in cluster_amounts…
f7e88a1 fix(interpret): refactor amount clustering to use transform
cc5bfa6 feat(interpret): add support for ignoring vendors via external file
ea807cd feat(interpret): add support for ignoring vendors via external file
f960052 fix(interpret): add missing --ignore-file argument definition
d692e5f adding ignores functionality
2c782ee perf(interpret): optimize fuzzy matching complexity
06fe24f fix(interpret): revert bucketing optimization to restore fuzzy matchi…
b878def docs: update README with new features and CLI arguments
abc5837 docs: add CSV file format section to README
28744ae formatting
7 changes: 6 additions & 1 deletion .gitignore
@@ -16,4 +16,9 @@ venv/
.DS_Store
Thumbs.db

/reports/*
!/reports/.gitkeep

# ignore customized ignores file
!ignore_subscriptions.example.txt
ignore_subscriptions.txt

182 changes: 156 additions & 26 deletions README.md
@@ -1,42 +1,172 @@

# Subscription Finder Python Script



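This Python script scans a CSV export of bank or budgeting-app transactions (including YNAB exports) and flags charges that look like recurring subscriptions, so you can review and manage them.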



## Setup



To set up the project, follow these steps:

1. **Clone the repository:**
```bash
git clone <repository-url>
cd <repository-name>
```

2. **Create a virtual environment:**

Windows CMD:
```cmd
python -m venv venv
venv\Scripts\activate
```

Bash:
```bash
python3 -m venv venv
source venv/bin/activate
```

3. **Install the dependencies:**
```bash
pip install -r requirements.txt
```





## CSV File Format



The script expects a CSV file of transaction data. It maps the column headers it finds to three standard names (Date, Description, Amount), recognizing several languages and spellings.



The essential columns and their recognized variations are:



- **Date**: (`date`, `datum`, `fecha`, `data`) - The date of the transaction.

- **Description**: (`description`, `desc`, `descripción`, `bezeichnung`, `opis`, `payee`) - A textual description of the transaction or vendor.

- **Amount**: (`amount`, `amt`, `importe`, `betrag`, `kwota`, `sum`, `outflow`) - The transaction amount. Note: the script handles currency symbols and different decimal/thousands separators.
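
The Amount note can be illustrated with a small parser. This is a sketch under stated assumptions. `parse_amount` is a hypothetical name; the script's own helper is `clean_amount` in its utils module, and its exact rules are not shown here. The rule used below is one common heuristic, not necessarily the script's: whichever separator appears last is the decimal point, except that a lone comma followed by three digits is treated as a thousands separator.

```python
import re


def parse_amount(raw: str) -> float:
    """Parse strings such as '$1,234.56', '1.234,56 €' or '(12.99)' into a float.

    Heuristic (an assumption): the separator that appears last is the decimal
    point. A comma counts as a thousands separator instead when there is no dot
    and either several commas appear or exactly three digits follow the last one.
    """
    text = raw.strip()
    # A minus sign anywhere, or accounting-style parentheses, means negative.
    negative = "-" in text or (text.startswith("(") and text.endswith(")"))
    digits = re.sub(r"[^0-9.,]", "", text)  # drop currency symbols, spaces, signs
    if not digits:
        raise ValueError(f"no number found in {raw!r}")

    last_comma, last_dot = digits.rfind(","), digits.rfind(".")
    if last_comma > last_dot:
        tail = digits[last_comma + 1:]
        if last_dot == -1 and (digits.count(",") > 1 or len(tail) == 3):
            digits = digits.replace(",", "")                    # "1,234" -> "1234"
        else:
            digits = digits.replace(".", "").replace(",", ".")  # "1.234,56" -> "1234.56"
    else:
        digits = digits.replace(",", "")                        # "1,234.56" -> "1234.56"
        if digits.count(".") > 1:
            digits = digits.replace(".", "")                    # "1.234.567" -> "1234567"
    value = float(digits)
    return -value if negative else value
```

A string such as `1,234` is ambiguous between formats. This heuristic reads it as one thousand two hundred thirty-four, which suits US-style exports but would misread an EU value with three decimal places.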



The script also supports automatic language detection for column headers and will translate them to English before processing.
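
A simpler way to recognize the headers is a plain alias lookup over the names listed above. The sketch below uses that lookup in place of the language detection and translation the script performs. `HEADER_ALIASES`, `map_headers`, and the accent-folding step are assumptions for illustration only.

```python
import unicodedata

# Aliases copied from the list above; the script's real table may differ.
HEADER_ALIASES = {
    "Date": {"date", "datum", "fecha", "data"},
    "Description": {"description", "desc", "descripción", "bezeichnung", "opis", "payee"},
    "Amount": {"amount", "amt", "importe", "betrag", "kwota", "sum", "outflow"},
}


def _fold(text: str) -> str:
    """Lower-case and strip accents so 'Descripción' also matches 'descripcion'."""
    decomposed = unicodedata.normalize("NFKD", text.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def map_headers(headers: list[str]) -> dict[str, str]:
    """Return {original_header: standard_name} for every recognized column."""
    folded = {std: {_fold(a) for a in aliases} for std, aliases in HEADER_ALIASES.items()}
    mapping: dict[str, str] = {}
    for header in headers:
        key = _fold(header)
        for standard, aliases in folded.items():
            # The first header that matches a standard name wins; later ones are skipped.
            if key in aliases and standard not in mapping.values():
                mapping[header] = standard
                break
    return mapping
```

For example, a German export with headers `Datum`, `Bezeichnung`, `Betrag`, `Saldo` maps the first three columns and leaves `Saldo` unmapped. A YNAB export maps `Payee` to Description and `Outflow` to Amount.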



## Usage



Run the script with the path to your CSV file and any options:

```bash
python interpret.py <path_to_csv_file> [options]
```

**Example:**

```bash
python interpret.py reports/financial_reports.csv --recency-days 120 --threshold 0.2
```

Replace `reports/financial_reports.csv` with the path to your own CSV file.



### Command-line Arguments



| Argument | Short | Default | Description |
| :--- | :--- | :--- | :--- |
| `file_path` | | | Path to the CSV file to analyze (Required). |
| `--threshold` | `-t` | `0.15` | Relative difference (0.0-1.0, e.g. `0.15` = 15%) within which transaction amounts from the same vendor are clustered together. |
| `--recency-days` | `-r` | `90` | Number of days from the latest transaction date to consider a subscription "active". |
| `--min-transaction-amount` | | `10.0` | Minimum absolute transaction amount to consider. |
| `--max-transaction-amount` | | `10000.0` | Maximum absolute transaction amount to consider. |
| `--ignore-file` | | `ignore_subscriptions.txt` | Path to a text file containing vendor names to ignore. |
| `--debug` | `-d` | `False` | Enable verbose debug output. |
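
The arguments in the table could be declared with Python's standard `argparse` module roughly as follows. This sketch mirrors the documented flags and defaults. `build_parser` is a hypothetical name, and the real `interpret.py` may define help text and types differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Find likely subscriptions in a transaction CSV.")
    parser.add_argument("file_path", help="Path to the CSV file to analyze.")
    parser.add_argument("-t", "--threshold", type=float, default=0.15,
                        help="Relative amount difference (0.0-1.0) for clustering.")
    parser.add_argument("-r", "--recency-days", type=int, default=90,
                        help="Days before the latest transaction that count as active.")
    parser.add_argument("--min-transaction-amount", type=float, default=10.0,
                        help="Minimum absolute transaction amount to consider.")
    parser.add_argument("--max-transaction-amount", type=float, default=10000.0,
                        help="Maximum absolute transaction amount to consider.")
    parser.add_argument("--ignore-file", default="ignore_subscriptions.txt",
                        help="Text file of vendor names to ignore.")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Enable verbose debug output.")
    return parser


# Parse the README's example command line instead of sys.argv.
args = build_parser().parse_args(
    ["reports/financial_reports.csv", "--recency-days", "120", "--threshold", "0.2"]
)
```

`argparse` turns dashed option names into underscored attributes, so `--recency-days` is read back as `args.recency_days`.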



### Ignoring Vendors



You can exclude specific vendors or transactions by adding their names to a text file (default: `ignore_subscriptions.txt`).

- One vendor per line.

- Supports partial matching (e.g., "Grocery" will ignore "Joe's Grocery Store").

- Case-insensitive.



Example `ignore_subscriptions.txt`:

```text
Whole Foods
Starbucks
One-time transfer
```
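
The matching rules above fit in a few lines: one entry per line, case-insensitive, partial match. In this sketch, `load_ignore_patterns` and `is_ignored` are hypothetical names. Skipping blank lines and lines starting with `#` is an assumption, based on the comment lines in `ignore_subscriptions.example.txt`.

```python
from pathlib import Path


def load_ignore_patterns(path: str) -> list[str]:
    """Read one vendor per line, lower-cased; blank and '#' lines are skipped (assumption)."""
    file = Path(path)
    if not file.exists():
        return []  # in this sketch, a missing ignore file means nothing is ignored
    patterns = []
    for line in file.read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            patterns.append(entry.lower())
    return patterns


def is_ignored(description: str, patterns: list[str]) -> bool:
    """Case-insensitive partial match: 'Grocery' ignores "Joe's Grocery Store"."""
    text = description.lower()
    return any(pattern.lower() in text for pattern in patterns)
```

Because matching is by substring, a short entry such as `Target` also hides any other description that merely contains that word. Longer, more specific entries are safer.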



## How It Works



1. **Parses & Normalizes:** Reads the CSV, detects column names automatically (multilingual support), and normalizes vendor descriptions (removes location data, special characters, etc.).

2. **Fuzzy Matching:** Groups similar vendor names together (e.g., "Netflix.com" and "Netflix Inc") using sequence matching logic.

3. **Ignores:** Filters out vendors listed in the ignore file.

4. **Clusters Amounts:** Groups transactions from the same vendor that have similar amounts (within the specified `--threshold`) to handle small price variations or currency fluctuations. This also helps separate recurring payments from one-off outliers (like a large downpayment vs. a monthly fee).

5. **Identifies Candidates:** Filters for recurring transactions (count > 1) that fall within the specified amount range and recency window.

6. **Reports:** Lists each potential subscription found, with its amount and estimated yearly cost (to two decimal places), sorted by estimated yearly cost.
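
Steps 1, 2, 4, 5, and 6 can be sketched end to end with the standard library alone. The ignore step is omitted for brevity. This is not the script's implementation; the commit history mentions its own helpers, such as `merge_similar_descriptions` and `cluster_amounts`. Everything below rests on assumptions:

- All names are hypothetical.
- `difflib.SequenceMatcher` stands in for the "sequence matching logic".
- The 0.8 similarity cutoff is arbitrary.
- The yearly-cost estimate (average charge × 365 / median days between charges) is a guess at how such a figure could be derived.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher
from statistics import mean, median


@dataclass
class Candidate:
    vendor: str
    amount: float       # average charge in the cluster
    count: int          # number of charges in the cluster
    yearly_cost: float  # assumption: average charge x estimated charges per year


def normalize(description: str) -> str:
    """Lower-case, replace everything except ASCII letters with spaces, collapse spaces."""
    return " ".join(re.sub(r"[^a-z]+", " ", description.lower()).split())


def canonical_vendor(name: str, known: list[str], cutoff: float = 0.8) -> str:
    """Return the first known vendor at least `cutoff` similar to `name`, else register it."""
    for vendor in known:
        if SequenceMatcher(None, name, vendor).ratio() >= cutoff:
            return vendor
    known.append(name)
    return name


def cluster_by_amount(charges, threshold):
    """Sort (date, amount) pairs by amount; start a new cluster whenever an amount
    exceeds the current cluster's smallest amount by more than `threshold` (relative)."""
    clusters = []
    for charge in sorted(charges, key=lambda c: c[1]):
        if clusters and charge[1] <= clusters[-1][0][1] * (1 + threshold):
            clusters[-1].append(charge)
        else:
            clusters.append([charge])
    return clusters


def find_candidates(transactions, threshold=0.15, recency_days=90,
                    min_amount=10.0, max_amount=10000.0):
    """transactions: iterable of (date, description, amount) tuples."""
    transactions = list(transactions)
    if not transactions:
        return []
    latest = max(d for d, _, _ in transactions)  # recency is measured from here
    known: list[str] = []
    by_vendor: dict[str, list[tuple[date, float]]] = {}
    for d, desc, amount in transactions:
        if not min_amount <= abs(amount) <= max_amount:
            continue
        vendor = canonical_vendor(normalize(desc), known)
        by_vendor.setdefault(vendor, []).append((d, abs(amount)))

    candidates = []
    for vendor, charges in by_vendor.items():
        for cluster in cluster_by_amount(charges, threshold):
            dates = sorted(d for d, _ in cluster)
            if len(cluster) < 2 or latest - dates[-1] > timedelta(days=recency_days):
                continue  # a one-off charge, or nothing recent enough
            gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
            per_year = 365 / max(median(gaps), 1)
            avg = mean(a for _, a in cluster)
            candidates.append(Candidate(vendor, round(avg, 2), len(cluster),
                                        round(avg * per_year, 2)))
    return sorted(candidates, key=lambda c: c.yearly_cost, reverse=True)
```

Comparing each amount against the smallest value in its cluster, rather than a running average, stops a chain of small price increases from gradually pulling in unrelated amounts. The trade-off is that no cluster spans more than `threshold` above its minimum. A large one-off charge from a recurring vendor therefore lands in its own single-item cluster and is dropped.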
8 changes: 8 additions & 0 deletions ignore_subscriptions.example.txt
@@ -0,0 +1,8 @@
# Add exact or partial vendor names to ignore (case-insensitive)
# One entry per line
Whole Foods
Trader Joe's
Safeway
Publix
Walmart
Target
2 changes: 2 additions & 0 deletions ignore_subscriptions.txt
@@ -0,0 +1,2 @@
Kroger
SPOTTY DOG ICE CREAM