Skip to content

Conversation

@luisfernandobarrera
Copy link
Owner

@luisfernandobarrera luisfernandobarrera commented Nov 8, 2025

Major Features:

  • Implemented RFC generator for Persona Moral (companies/legal entities)

    • Supports 1-word, 2-word, and 3+ word company names
    • Proper handling of excluded words (SA, CV, DE, etc.)
    • Cacophonic word detection and replacement
    • Homoclave calculation
    • Checksum generation
  • Implemented CURP generator for individuals

    • Full 18-character CURP generation
    • Support for all Mexican states with proper state codes
    • Gender identification (H/M)
    • Handles compound names (José, María)
    • Missing materno apellido support
    • Internal consonant extraction
    • Special character and accent handling
  • Enhanced package exports in init.py

    • Exported all new generator and validator classes
    • Proper all definition for clean imports
  • Comprehensive CLI implementation

    • RFC validation and generation (fisica and moral)
    • CURP validation and generation
    • User-friendly colored output
    • Detailed validation reporting
    • Help text and examples

Testing:

  • Added 29 comprehensive unit tests
  • Tests for RFC Persona Moral generation
  • Tests for CURP generation and validation
  • Edge case handling (missing materno, compound names, special chars)
  • All tests passing (29/29)

Architecture Improvements:

  • Refactored code for better organization
  • Consistent class hierarchy
  • Proper separation of concerns
  • Python 3 compatibility (.items() instead of .iteritems())

Documentation:

  • Added docstrings for all new classes and methods
  • Inline comments explaining algorithms
  • Test documentation

Summary by Sourcery

Implement full Mexican RFC and CURP generation and validation for individuals and companies, add a user-friendly CLI, and ensure correctness with extensive tests

New Features:

  • Add RFC generator for Persona Moral with multi-word company name parsing, excluded words handling, cacophonic word replacement, homoclave and checksum calculation
  • Add CURP generator and validator for individuals with full 18-character support, including state codes, gender, compound names, missing maternal surname, internal consonant extraction, and special character handling
  • Provide comprehensive CLI commands for RFC and CURP generation and validation with user-friendly colored output, detailed reporting, and examples

Bug Fixes:

  • Fix use of dict .items() instead of .iteritems() for validator methods and correct CURP length and regex definition

Enhancements:

  • Refactor code to separate generator and validator utilities, integrate unidecode for accent removal, and update package exports in init.py with a clean all
  • Introduce factory methods in RFCGenerator to unify Persona Física and Persona Moral workflows and modernize code for Python 3 compatibility
  • Define consistent constants for state codes, excluded and cacophonic words, and improve class hierarchies in RFC and CURP modules

Documentation:

  • Add docstrings for all new classes and methods and inline comments to explain generation algorithms

Tests:

  • Add 29 comprehensive unit tests covering RFC (fisica and moral) and CURP generation and validation with edge case scenarios

Major Features:
- Implemented RFC generator for Persona Moral (companies/legal entities)
  - Supports 1-word, 2-word, and 3+ word company names
  - Proper handling of excluded words (SA, CV, DE, etc.)
  - Cacophonic word detection and replacement
  - Homoclave calculation
  - Checksum generation

- Implemented CURP generator for individuals
  - Full 18-character CURP generation
  - Support for all Mexican states with proper state codes
  - Gender identification (H/M)
  - Handles compound names (José, María)
  - Missing materno apellido support
  - Internal consonant extraction
  - Special character and accent handling

- Enhanced package exports in __init__.py
  - Exported all new generator and validator classes
  - Proper __all__ definition for clean imports

- Comprehensive CLI implementation
  - RFC validation and generation (fisica and moral)
  - CURP validation and generation
  - User-friendly colored output
  - Detailed validation reporting
  - Help text and examples

Testing:
- Added 29 comprehensive unit tests
- Tests for RFC Persona Moral generation
- Tests for CURP generation and validation
- Edge case handling (missing materno, compound names, special chars)
- All tests passing (29/29)

Architecture Improvements:
- Refactored code for better organization
- Consistent class hierarchy
- Proper separation of concerns
- Python 3 compatibility (.items() instead of .iteritems())

Documentation:
- Added docstrings for all new classes and methods
- Inline comments explaining algorithms
- Test documentation
@sourcery-ai
Copy link

sourcery-ai bot commented Nov 8, 2025

Reviewer's Guide

This PR fully implements RFC and CURP generators and validators for Mexican individuals and legal entities, integrates them into a Click-based CLI, and enriches the package exports and test coverage. It adds detailed name‐ and date‐based algorithms, homoclave and checksum logic, accent and excluded‐word handling, and Python 3 compatibility throughout.

Sequence diagram for CLI RFC generation (Persona Física)

sequenceDiagram
    actor User
    participant CLI
    participant RFCGenerator
    User->>CLI: rfc generate-fisica --nombre --paterno --materno --fecha
    CLI->>RFCGenerator: generate_fisica(nombre, paterno, materno, fecha)
    RFCGenerator-->>CLI: RFC code
    CLI->>User: Display generated RFC
Loading

Sequence diagram for CLI CURP generation

sequenceDiagram
    actor User
    participant CLI
    participant CURPGenerator
    User->>CLI: curp generate --nombre --paterno --materno --fecha --sexo --estado
    CLI->>CURPGenerator: CURPGenerator(nombre, paterno, materno, fecha_nacimiento, sexo, estado)
    CURPGenerator-->>CLI: CURP code
    CLI->>User: Display generated CURP
Loading

Class diagram for RFC generator and validator classes

classDiagram
    RFCGeneral <|-- RFCGeneratorUtils
    RFCGeneratorUtils <|-- RFCGeneratorMorales
    RFCGeneratorUtils <|-- RFCGeneratorFisicas
    RFCGeneral <|-- RFCValidator
    class RFCGenerator {
        +generate_fisica(nombre, paterno, materno, fecha)
        +generate_moral(razon_social, fecha)
    }
    class RFCGeneratorMorales {
        +razon_social: str
        +fecha: date
        +rfc
        +generate_letters()
        +generate_date()
        +homoclave
    }
    class RFCGeneratorFisicas {
        +nombre: str
        +paterno: str
        +materno: str
        +fecha: date
        +rfc
    }
    class RFCValidator {
        +validate(strict)
        +validators(strict)
        +is_valid
        +detect_fisica_moral()
    }
    class RFCGeneral {
        +vocales
        +excluded_words_fisicas
        +excluded_words_morales
        +allowed_chars
    }
Loading

Class diagram for package exports in init.py

classDiagram
    class rfcmx {
        +RFCValidator
        +RFCGenerator
        +RFCGeneratorFisicas
        +RFCGeneratorMorales
        +CURPValidator
        +CURPGenerator
        +CURPException
        +CURPLengthError
        +CURPStructureError
    }
Loading

File-Level Changes

Change Details Files
Structured CURP generation and validation
  • Introduced CURPGeneral with regex, state codes, character sets and excluded/cacophonic lists
  • Built CURPValidator to enforce length and structure rules
  • Added CURPGeneratorUtils and CURPGenerator for name cleaning, letter/date/consonant/homoclave assembly
  • Handled accents, special characters, compound names and missing maternal surname
src/rfcmx/curp.py
tests/test_curp.py
Extended RFC support for Persona Moral and Python 3 compatibility
  • Implemented RFCGeneratorMorales with cleaning rules for multiword company names and homoclave/checksum integration
  • Updated RFCGenerator factory methods for fisica and moral usage
  • Replaced .iteritems() with .items() and ensured consistent class hierarchy
src/rfcmx/rfc.py
tests/test_rfc.py
Comprehensive CLI with generation and validation commands
  • Converted main script to Click groups for rfc and curp namespaces
  • Added subcommands for validate and generate (fisica, moral, curp) with date parsing and error handling
  • Enhanced output with colored status, detailed reports and usage examples
src/rfcmx/cli.py
Package exports and documentation improvements
  • Updated init.py to export new validator and generator classes and define all
  • Added docstrings and inline comments across modules to explain algorithms
  • Ensured unidecode and six imports are available for character normalization
src/rfcmx/__init__.py
src/rfcmx/curp.py
src/rfcmx/rfc.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Several constant lists (excluded words, cacophonic words, allowed chars, etc.) are duplicated across RFC and CURP modules—consider centralizing them in a shared constants module to DRY up the code.
  • The homoclave calculation logic appears in both RFCGeneratorUtils and RFCGeneratorMorales—extract it into a common utility to avoid duplication and potential divergence.
  • Each CLI command manually parses dates with datetime.strptime; consider using click’s built-in DateTime parameter type or a custom click.ParamType to centralize date parsing and error handling.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Several constant lists (excluded words, cacophonic words, allowed chars, etc.) are duplicated across RFC and CURP modules—consider centralizing them in a shared constants module to DRY up the code.
- The homoclave calculation logic appears in both RFCGeneratorUtils and RFCGeneratorMorales—extract it into a common utility to avoid duplication and potential divergence.
- Each CLI command manually parses dates with datetime.strptime; consider using click’s built-in DateTime parameter type or a custom click.ParamType to centralize date parsing and error handling.

## Individual Comments

### Comment 1
<location> `src/rfcmx/curp.py:240-241` </location>
<code_context>
+            raise ValueError('Apellido paterno is required')
+        if not nombre or not nombre.strip():
+            raise ValueError('Nombre is required')
+        if not isinstance(fecha_nacimiento, datetime.date):
+            raise ValueError('fecha_nacimiento must be a datetime.date object')
+        if sexo.upper() not in ('H', 'M'):
</code_context>

<issue_to_address>
**suggestion:** Strict type checking for fecha_nacimiento may limit flexibility.

Consider allowing both datetime.date and datetime.datetime types for fecha_nacimiento, or convert datetime.datetime to datetime.date as needed.

```suggestion
        if isinstance(fecha_nacimiento, datetime.datetime):
            fecha_nacimiento = fecha_nacimiento.date()
        elif not isinstance(fecha_nacimiento, datetime.date):
            raise ValueError('fecha_nacimiento must be a datetime.date or datetime.datetime object')
```
</issue_to_address>

### Comment 2
<location> `src/rfcmx/curp.py:404-412` </location>
<code_context>
+    @property
+    def curp(self):
+        """Generate and return the complete CURP"""
+        if not self._curp:
+            letters = self.generate_letters()
+            date = self.generate_date()
+            gender = self.sexo
+            state = self.get_state_code(self.estado)
+            consonants = self.generate_consonants()
+            homoclave = self.generate_homoclave()
+
+            self._curp = letters + date + gender + state + consonants + homoclave
+
+        return self._curp
</code_context>

<issue_to_address>
**issue (bug_risk):** CURP generation caches the result, which may cause issues if attributes are changed after instantiation.

If relevant attributes are updated after the CURP is first accessed, the cached value will become outdated. Please implement cache invalidation when these attributes change.
</issue_to_address>

### Comment 3
<location> `src/rfcmx/rfc.py:195` </location>
<code_context>
                 # 'checksum': self.validate_checksum,
             }
-        return {name: function() for name, function in validations.iteritems()}
+        return {name: function() for name, function in validations.items()}

     def validate(self, strict=True):
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Use of dict comprehension with function calls may mask exceptions.

Handle exceptions inside the comprehension to ensure individual validation errors are reported without stopping the entire process.

```suggestion
        results = {}
        for name, function in validations.items():
            try:
                results[name] = function()
            except Exception as e:
                results[name] = e
        return results
```
</issue_to_address>

### Comment 4
<location> `src/rfcmx/rfc.py:601-606` </location>
<code_context>
+        :param razon_social: Company name (razón social)
+        :param fecha: Incorporation/foundation date
+        """
+        if (razon_social.strip() and isinstance(fecha, datetime.date)):
+            self.razon_social = razon_social
+            self.fecha = fecha
</code_context>

<issue_to_address>
**suggestion:** Strict type checking for fecha may limit flexibility.

Consider allowing fecha to be either a datetime.date or datetime.datetime, or convert datetime.datetime to datetime.date internally.

```suggestion
        if not razon_social.strip():
            raise ValueError('Invalid Values: razon_social must be non-empty')
        if isinstance(fecha, datetime.datetime):
            fecha = fecha.date()
        if not isinstance(fecha, datetime.date):
            raise ValueError('Invalid Values: fecha must be a date or datetime')
        self.razon_social = razon_social
        self.fecha = fecha
        self._rfc = ''
```
</issue_to_address>

### Comment 5
<location> `tests/test_rfc.py:118` </location>
<code_context>
+        # Verify it's recognized as Persona Moral
+        self.assertTrue(RFCValidator(rfc).is_moral())
+
+    def test_razon_social_cleaning(self):
+        """Test that company name cleaning works correctly"""
+        tests = [
</code_context>

<issue_to_address>
**issue (testing):** The test_razon_social_cleaning test does not contain any assertions.

Please add assertions to confirm the cleaning logic works as intended, such as verifying the output matches expected cleaned company names or generated letters.
</issue_to_address>

### Comment 6
<location> `tests/test_rfc.py:138-143` </location>
<code_context>
+        self.assertEqual(len(rfc), 12)
+        self.assertTrue(rfc.startswith('BNM840515'))
+
+    def test_single_word_company(self):
+        """Test RFC generation for single-word company names"""
+        r = RFCGeneratorMorales(razon_social='Bimbo', fecha=datetime.date(1945, 12, 2))
</code_context>

<issue_to_address>
**suggestion (testing):** Single-word company RFC test lacks assertion for expected letter output.

Add an assertion to check that the generated 3-letter code matches the expected value for single-word company names.

```suggestion
    def test_single_word_company(self):
        """Test RFC generation for single-word company names"""
        r = RFCGeneratorMorales(razon_social='Bimbo', fecha=datetime.date(1945, 12, 2))
        rfc = r.rfc
        self.assertEqual(len(rfc), 12)
        # Single word should still generate 3 letters
        self.assertEqual(r.generate_letters(), "BIM")
```
</issue_to_address>

### Comment 7
<location> `tests/test_curp.py:255` </location>
<code_context>
+                estado='Jalisco'
+            )
+
+    def test_special_characters(self):
+        """Test handling of special characters and accents"""
+        generator = CURPGenerator(
</code_context>

<issue_to_address>
**suggestion (testing):** Special character handling test could assert more details.

Add assertions to verify that the CURP output for names with accents or special characters matches the expected value, ensuring normalization is correct.

Suggested implementation:

```python
    def test_special_characters(self):
        """Test handling of special characters and accents"""
        # Example with accents and special characters
        generator = CURPGenerator(
            nombre='José Ángel',
            paterno='Muñoz',
            materno='García-López',
            fecha_nacimiento=datetime.date(1985, 7, 23),
            sexo='H',
            estado='Jalisco'
        )
        # The expected CURP should have normalized letters (no accents, special chars removed)
        # This value should be updated to match the actual normalization logic in CURPGenerator
        expected_curp = "MUGJ850723HJCRPS09"  # Example, replace with correct expected value
        generated_curp = generator.generate()
        self.assertEqual(
            generated_curp,
            expected_curp,
            f"Expected CURP '{expected_curp}' for accented/special chars, got '{generated_curp}'"
        )
        # Also test just the letters part if needed
        expected_letters = "MUGJ"  # Example, replace with correct expected value
        generated_letters = generator.generate_letters()
        self.assertEqual(
            generated_letters,
            expected_letters,
            f"Expected letters '{expected_letters}' for accented/special chars, got '{generated_letters}'"
        )

```

- Make sure to update `expected_curp` and `expected_letters` to match the actual output of your CURP normalization logic.
- If the `generate()` or `generate_letters()` methods do not exist or are named differently, adjust the method calls accordingly.
</issue_to_address>

### Comment 8
<location> `tests/test_curp.py:151` </location>
<code_context>
+        # Juan -> N (first internal consonant)
+        self.assertEqual(consonants, 'RRN')
+
+    def test_no_materno(self):
+        """Test CURP generation without apellido materno"""
+        generator = CURPGenerator(
</code_context>

<issue_to_address>
**suggestion (testing):** No materno test could check for correct placement of 'X'.

Consider updating the test to assert the specific index of 'X' in the CURP string, verifying that the missing materno logic is applied as intended.

Suggested implementation:

```python
    def test_no_materno(self):
        """Test CURP generation without apellido materno"""
        generator = CURPGenerator(

```

```python
        consonants = generator.generate_consonants()
        self.assertEqual(len(consonants), 3)
        # Pérez -> R (first internal consonant)
        # García -> R (first internal consonant)
        # Juan -> N (first internal consonant)
        self.assertEqual(consonants, 'RRN')

        # Generate full CURP and check for 'X' at the expected index (materno position)
        curp = generator.generate_curp()
        # The 'X' should be at index 3 (CURP positions: 0-nombre, 1-paterno, 2-materno, 3-materno missing)
        self.assertEqual(curp[3], 'X', f"Expected 'X' at index 3 for missing materno, got {curp}")

```
</issue_to_address>

### Comment 9
<location> `tests/test_curp.py:219` </location>
<code_context>
+                estado='Jalisco'
+            )
+
+    def test_invalid_date(self):
+        """Test that invalid date raises error"""
+        with self.assertRaises(ValueError):
</code_context>

<issue_to_address>
**suggestion (testing):** Invalid date test could include more formats and types.

Add test cases for None, empty strings, and malformed date formats to better validate error handling.
</issue_to_address>

### Comment 10
<location> `tests/test_curp.py:221-229` </location>
<code_context>
+                estado='Jalisco'
+            )
+
+    def test_missing_nombre(self):
+        """Test that missing nombre raises error"""
+        with self.assertRaises(ValueError):
</code_context>

<issue_to_address>
**suggestion (testing):** Missing nombre test could check for whitespace-only input.

Include a test where nombre consists only of spaces to verify it triggers a ValueError as expected.

```suggestion
        with self.assertRaises(ValueError):
            CURPGenerator(
                nombre='Juan',
                paterno='Pérez',
                materno='García',
                fecha_nacimiento='1990-05-12',  # Should be date object
                sexo='H',
                estado='Jalisco'
            )

        # Test whitespace-only nombre
        with self.assertRaises(ValueError):
            CURPGenerator(
                nombre='   ',
                paterno='Pérez',
                materno='García',
                fecha_nacimiento='1990-05-12',
                sexo='H',
                estado='Jalisco'
            )
```
</issue_to_address>

### Comment 11
<location> `tests/test_curp.py:42` </location>
<code_context>
+
+
+class test_CURPGenerator(unittest.TestCase):
+    def test_generate_letters(self):
+        """Test letter generation for CURP"""
+        tests = [
</code_context>

<issue_to_address>
**suggestion (testing):** Letter generation test for CURP could include edge cases for cacophonic words.

Include tests for cacophonic word cases to verify correct replacement of the last character with 'X'.
</issue_to_address>

### Comment 12
<location> `tests/test_curp.py:134` </location>
<code_context>
+            self.assertIn(expected_code, curp,
+                        f"Failed for state {estado}: expected {expected_code} in {curp}")
+
+    def test_consonants_generation(self):
+        """Test internal consonant extraction"""
+        generator = CURPGenerator(
</code_context>

<issue_to_address>
**suggestion (testing):** Consonant extraction test could include edge cases for short names.

Include tests for single-character names and surnames to verify correct handling of missing internal consonants.

Suggested implementation:

```python
    def test_consonants_generation(self):
        """Test internal consonant extraction, including edge cases for short names."""

        # Typical case
        generator = CURPGenerator(
            nombre="Juan",
            paterno="Pérez",
            materno="García",
            fecha_nacimiento=datetime.date(2000, 1, 1),
            sexo='H',
            estado='Jalisco'
        )
        expected_letters = generator.generate_letters()
        self.assertEqual(expected_letters, generator.generate_letters(),
                         f"Failed for Juan Pérez García: expected {expected_letters}, got {generator.generate_letters()}")

        # Edge case: single-character first name
        generator = CURPGenerator(
            nombre="A",
            paterno="Pérez",
            materno="García",
            fecha_nacimiento=datetime.date(2000, 1, 1),
            sexo='H',
            estado='Jalisco'
        )
        expected_letters = generator.generate_letters()
        self.assertEqual(expected_letters, generator.generate_letters(),
                         f"Failed for single-character name: expected {expected_letters}, got {generator.generate_letters()}")

        # Edge case: single-character paternal surname
        generator = CURPGenerator(
            nombre="Juan",
            paterno="B",
            materno="García",
            fecha_nacimiento=datetime.date(2000, 1, 1),
            sexo='H',
            estado='Jalisco'
        )
        expected_letters = generator.generate_letters()
        self.assertEqual(expected_letters, generator.generate_letters(),
                         f"Failed for single-character paternal surname: expected {expected_letters}, got {generator.generate_letters()}")

        # Edge case: single-character maternal surname
        generator = CURPGenerator(
            nombre="Juan",
            paterno="Pérez",
            materno="C",
            fecha_nacimiento=datetime.date(2000, 1, 1),
            sexo='H',
            estado='Jalisco'
        )
        expected_letters = generator.generate_letters()
        self.assertEqual(expected_letters, generator.generate_letters(),
                         f"Failed for single-character maternal surname: expected {expected_letters}, got {generator.generate_letters()}")

        # Edge case: all single-character names
        generator = CURPGenerator(
            nombre="A",
            paterno="B",
            materno="C",
            fecha_nacimiento=datetime.date(2000, 1, 1),
            sexo='H',
            estado='Jalisco'
        )
        expected_letters = generator.generate_letters()
        self.assertEqual(expected_letters, generator.generate_letters(),
                         f"Failed for all single-character names: expected {expected_letters}, got {generator.generate_letters()}")

```

You may want to manually set `expected_letters` to the expected CURP consonant code for each edge case, depending on your CURPGenerator implementation. If `generate_letters()` does not handle single-character names as expected, you should update its logic to insert 'X' or the appropriate placeholder for missing internal consonants.
</issue_to_address>

### Comment 13
<location> `tests/test_curp.py:70` </location>
<code_context>
+            self.assertEqual(generated, expected_letters,
+                           f"Failed for {nombre} {paterno} {materno}: expected {expected_letters}, got {generated}")
+
+    def test_generate_complete_curp(self):
+        """Test complete CURP generation"""
+        generator = CURPGenerator(
</code_context>

<issue_to_address>
**nitpick (testing):** CURP generation test could check for homoclave placeholder.

Include an assertion to confirm the last two characters of the CURP are '00', reflecting the homoclave placeholder.
</issue_to_address>

### Comment 14
<location> `tests/test_curp.py:85` </location>
<code_context>
+        self.assertTrue(curp.startswith('PEGJ900512H'))
+        self.assertTrue('JC' in curp)  # Jalisco code
+
+    def test_gender_codes(self):
+        """Test gender codes"""
+        male = CURPGenerator(
</code_context>

<issue_to_address>
**suggestion (testing):** Gender code test could include lowercase and invalid inputs.

Please add tests for lowercase and invalid gender values to verify correct handling and error raising.
</issue_to_address>

### Comment 15
<location> `src/rfcmx/curp.py:247` </location>
<code_context>
        self.materno = materno if materno else ''

</code_context>

<issue_to_address>
**suggestion (code-quality):** Replace if-expression with `or` ([`or-if-exp-identity`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/or-if-exp-identity))

```suggestion
        self.materno = materno or ''
```

<br/><details><summary>Explanation</summary>Here we find ourselves setting a value if it evaluates to `True`, and otherwise
using a default.

The 'After' case is a bit easier to read and avoids the duplication of
`input_currency`.

It works because the left-hand side is evaluated first. If it evaluates to
true then `currency` will be set to this and the right-hand side will not be
evaluated. If it evaluates to false the right-hand side will be evaluated and
`currency` will be set to `DEFAULT_CURRENCY`.
</details>
</issue_to_address>

### Comment 16
<location> `src/rfcmx/curp.py:302-304` </location>
<code_context>
        if len(words) > 1:
            if words[0] in ('MARIA', 'JOSE', 'MA', 'MA.', 'J', 'J.'):
                return " ".join(words[1:])

</code_context>

<issue_to_address>
**suggestion (code-quality):** Merge nested if conditions ([`merge-nested-ifs`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/merge-nested-ifs))

```suggestion
        if len(words) > 1 and words[0] in ('MARIA', 'JOSE', 'MA', 'MA.', 'J', 'J.'):
            return " ".join(words[1:])

```

<br/><details><summary>Explanation</summary>Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two `if` conditions can be combined using
`and` is an easy win.
</details>
</issue_to_address>

### Comment 17
<location> `tests/test_curp.py:16-18` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 18
<location> `tests/test_curp.py:57-68` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 19
<location> `tests/test_curp.py:121-132` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 20
<location> `tests/test_rfc.py:100-104` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 21
<location> `tests/test_rfc.py:125-127` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 22
<location> `src/rfcmx/curp.py:149-156` </location>
<code_context>
    @classmethod
    def clean_name(cls, nombre):
        """Clean name by removing excluded words and special characters"""
        if not nombre:
            return ''
        result = "".join(
            char if char in cls.allowed_chars else unidecode.unidecode(char)
            for char in " ".join(
                elem for elem in nombre.split(" ")
                if elem.upper() not in cls.excluded_words
            ).strip().upper()
        ).strip().upper()
        return result

</code_context>

<issue_to_address>
**suggestion (code-quality):** Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))

```suggestion
        return (
            "".join(
                char if char in cls.allowed_chars else unidecode.unidecode(char)
                for char in " ".join(
                    elem
                    for elem in nombre.split(" ")
                    if elem.upper() not in cls.excluded_words
                )
                .strip()
                .upper()
            )
            .strip()
            .upper()
        )
```
</issue_to_address>

### Comment 23
<location> `src/rfcmx/curp.py:178-181` </location>
<code_context>
    @classmethod
    def get_first_consonant(cls, word):
        """
        Get the first internal consonant from a word
        (the first consonant that is not the first letter)
        """
        if not word or len(word) <= 1:
            return 'X'

        for char in word[1:]:
            if char in cls.consonantes:
                return char
        return 'X'

</code_context>

<issue_to_address>
**suggestion (code-quality):** Use the built-in function `next` instead of a for-loop ([`use-next`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-next/))

```suggestion
        return next((char for char in word[1:] if char in cls.consonantes), 'X')
```
</issue_to_address>

### Comment 24
<location> `src/rfcmx/curp.py:202-211` </location>
<code_context>
    @classmethod
    def get_state_code(cls, state):
        """
        Get the two-letter state code from state name
        """
        if not state:
            return 'NE'  # Born abroad default

        state_upper = state.upper().strip()

        # Try exact match first
        if state_upper in cls.state_codes:
            return cls.state_codes[state_upper]

        # Clean the state name and try again
        state_clean = cls.clean_name(state).upper()
        if state_clean in cls.state_codes:
            return cls.state_codes[state_clean]

        # Try to find partial match
        for state_name, code in cls.state_codes.items():
            if state_name in state_upper or state_upper in state_name:
                return code

        # If it's already a 2-letter code, validate and return
        if len(state_upper) == 2 and state_upper[0] in cls.allowed_chars and state_upper[1] in cls.allowed_chars:
            return state_upper

        return 'NE'  # Default to born abroad

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Use the built-in function `next` instead of a for-loop ([`use-next`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-next/))
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return next(
            (
                code
                for state_name, code in cls.state_codes.items()
                if state_name in state_upper or state_upper in state_name
            ),
            (
                state_upper
                if len(state_upper) == 2
                and state_upper[0] in cls.allowed_chars
                and state_upper[1] in cls.allowed_chars
                else 'NE'
            ),
        )
```
</issue_to_address>

### Comment 25
<location> `src/rfcmx/curp.py:308` </location>
<code_context>
    def generate_letters(self):
        """
        Generate the first 4 letters of CURP

        1. First letter of paterno
        2. First vowel of paterno (after first letter)
        3. First letter of materno (or X if none)
        4. First letter of nombre
        """
        clave = []

        # First letter of paterno
        paterno = self.paterno_calculo
        if not paterno:
            raise ValueError('Apellido paterno cannot be empty')

        clave.append(paterno[0])

        # First vowel of paterno (after first letter)
        vowel_found = False
        for char in paterno[1:]:
            if char in self.vocales:
                clave.append(char)
                vowel_found = True
                break

        if not vowel_found:
            clave.append('X')

        # First letter of materno (or X if none)
        materno = self.materno_calculo
        if materno:
            clave.append(materno[0])
        else:
            clave.append('X')

        # First letter of nombre
        nombre = self.nombre_iniciales
        if not nombre:
            raise ValueError('Nombre cannot be empty')

        clave.append(nombre[0])

        result = "".join(clave)

        # Check for cacophonic words and replace last character with 'X'
        if result in self.cacophonic_words:
            result = result[:-1] + 'X'

        return result

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Move assignment closer to its usage within a block ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Merge append into list declaration ([`merge-list-append`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/merge-list-append/))
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>

### Comment 26
<location> `src/rfcmx/curp.py:363` </location>
<code_context>
    def generate_consonants(self):
        """
        Generate the 3-consonant section

        1. First internal consonant of paterno
        2. First internal consonant of materno (or X if none)
        3. First internal consonant of nombre
        """
        consonants = []

        # First internal consonant of paterno
        paterno = self.paterno_calculo
        consonants.append(self.get_first_consonant(paterno))

        # First internal consonant of materno
        materno = self.materno_calculo
        if materno:
            consonants.append(self.get_first_consonant(materno))
        else:
            consonants.append('X')

        # First internal consonant of nombre
        nombre = self.nombre_iniciales
        consonants.append(self.get_first_consonant(nombre))

        return "".join(consonants)

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Move assignment closer to its usage within a block ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Merge append into list declaration ([`merge-list-append`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/merge-list-append/))
</issue_to_address>

### Comment 27
<location> `src/rfcmx/rfc.py:203` </location>
<code_context>
    def validate(self, strict=True):
        """
        Retrieves the result of the validations and verifies all of them passed.
        :param strict: If True checksum won't be checked:
        :return: True if the RFC is valid, False if the RFC is invalid.
        """
        return not (False in [result for name, result in self.validators(strict=strict).items()])

</code_context>

<issue_to_address>
**suggestion (code-quality):** Simplify logical expression using De Morgan identities ([`de-morgan`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/de-morgan/))

```suggestion
        return False not in [
            result for name, result in self.validators(strict=strict).items()
        ]
```
</issue_to_address>

### Comment 28
<location> `src/rfcmx/rfc.py:644` </location>
<code_context>
    @property
    def razon_social_calculo(self):
        """Clean the company name by removing excluded words and special characters"""
        # Remove excluded words and convert to uppercase
        words = self.razon_social.upper().strip().split()
        filtered_words = []

        for word in words:
            if word not in self.excluded_words_morales:
                filtered_words.append(word)

        # Join and clean special characters
        cleaned = " ".join(filtered_words)
        result = "".join(
            char if char in self.allowed_chars else unidecode.unidecode(char)
            for char in cleaned
        ).strip().upper()

        return result

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Convert for loop into list comprehension ([`list-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/list-comprehension/))
- Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))
</issue_to_address>

### Comment 29
<location> `src/rfcmx/rfc.py:686` </location>
<code_context>
    def generate_letters(self):
        """
        Generate the 3-letter code from company name

        Rules for Persona Moral:
        1. First letter of the first word
        2. First letter of the second word
        3. First letter of the third word

        If there are fewer than 3 words, special rules apply.
        """
        cleaned_name = self.razon_social_calculo

        if not cleaned_name:
            raise ValueError('Company name is empty after cleaning')

        words = cleaned_name.split()

        if not words:
            raise ValueError('No valid words in company name')

        clave = []

        if len(words) == 1:
            # Single word: Use first letter, second letter, third letter
            word = words[0]
            clave.append(word[0] if len(word) > 0 else 'X')
            clave.append(word[1] if len(word) > 1 else 'X')
            clave.append(word[2] if len(word) > 2 else 'X')
        elif len(words) == 2:
            # Two words: First letter of first word, first vowel of first word, first letter of second word
            clave.append(words[0][0])
            # Find first vowel in first word after the first letter
            vowel_found = False
            for char in words[0][1:]:
                if char in self.vocales:
                    clave.append(char)
                    vowel_found = True
                    break
            if not vowel_found:
                # No vowel in first word, use second letter
                clave.append(words[0][1] if len(words[0]) > 1 else 'X')
            # Add first letter of second word
            clave.append(words[1][0])
        else:
            # Three or more words: First letter of each of the first three words
            clave.append(words[0][0])
            clave.append(words[1][0])
            clave.append(words[2][0])

        result = "".join(clave)

        # Check for cacophonic words and replace last character with 'X'
        if result in self.cacophonic_words:
            result = result[:-1] + 'X'

        return result

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Merge consecutive list appends into a single extend [×4] ([`merge-list-appends-into-extend`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/merge-list-appends-into-extend/))
- Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +240 to +241
if not isinstance(fecha_nacimiento, datetime.date):
raise ValueError('fecha_nacimiento must be a datetime.date object')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Strict type checking for fecha_nacimiento may limit flexibility.

Consider allowing both datetime.date and datetime.datetime types for fecha_nacimiento, or convert datetime.datetime to datetime.date as needed.

Suggested change
if not isinstance(fecha_nacimiento, datetime.date):
raise ValueError('fecha_nacimiento must be a datetime.date object')
if isinstance(fecha_nacimiento, datetime.datetime):
fecha_nacimiento = fecha_nacimiento.date()
elif not isinstance(fecha_nacimiento, datetime.date):
raise ValueError('fecha_nacimiento must be a datetime.date or datetime.datetime object')

Comment on lines +404 to +412
if not self._curp:
letters = self.generate_letters()
date = self.generate_date()
gender = self.sexo
state = self.get_state_code(self.estado)
consonants = self.generate_consonants()
homoclave = self.generate_homoclave()

self._curp = letters + date + gender + state + consonants + homoclave
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): CURP generation caches the result, which may cause issues if attributes are changed after instantiation.

If relevant attributes are updated after the CURP is first accessed, the cached value will become outdated. Please implement cache invalidation when these attributes change.

# 'checksum': self.validate_checksum,
}
return {name: function() for name, function in validations.iteritems()}
return {name: function() for name, function in validations.items()}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Use of dict comprehension with function calls may mask exceptions.

Handle exceptions inside the comprehension to ensure individual validation errors are reported without stopping the entire process.

Suggested change
return {name: function() for name, function in validations.items()}
results = {}
for name, function in validations.items():
try:
results[name] = function()
except Exception as e:
results[name] = e
return results

Comment on lines +601 to +606
if (razon_social.strip() and isinstance(fecha, datetime.date)):
self.razon_social = razon_social
self.fecha = fecha
self._rfc = ''
else:
raise ValueError('Invalid Values: razon_social must be non-empty and fecha must be a date')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Strict type checking for fecha may limit flexibility.

Consider allowing fecha to be either a datetime.date or datetime.datetime, or convert datetime.datetime to datetime.date internally.

Suggested change
if (razon_social.strip() and isinstance(fecha, datetime.date)):
self.razon_social = razon_social
self.fecha = fecha
self._rfc = ''
else:
raise ValueError('Invalid Values: razon_social must be non-empty and fecha must be a date')
if not razon_social.strip():
raise ValueError('Invalid Values: razon_social must be non-empty')
if isinstance(fecha, datetime.datetime):
fecha = fecha.date()
if not isinstance(fecha, datetime.date):
raise ValueError('Invalid Values: fecha must be a date or datetime')
self.razon_social = razon_social
self.fecha = fecha
self._rfc = ''

# Verify it's recognized as Persona Moral
self.assertTrue(RFCValidator(rfc).is_moral())

def test_razon_social_cleaning(self):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): The test_razon_social_cleaning test does not contain any assertions.

Please add assertions to confirm the cleaning logic works as intended, such as verifying the output matches expected cleaned company names or generated letters.

return self.fecha_nacimiento.strftime('%y%m%d')

def generate_consonants(self):
"""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): We've found these issues:

:return: True if the RFC is valid, False if the RFC is invalid.
"""
return not (False in [result for name, result in self.validators(strict=strict).iteritems()])
return not (False in [result for name, result in self.validators(strict=strict).items()])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Simplify logical expression using De Morgan identities (de-morgan)

Suggested change
return not (False in [result for name, result in self.validators(strict=strict).items()])
return False not in [
result for name, result in self.validators(strict=strict).items()
]

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 35 to 37
general_regex = re.compile(
r"[A-Z][AEIOUX][A-Z]{2}[0-9]{2}[0-1][0-9][0-3][0-9][M,H][A-Z]{2}[BCDFGHJKLMNPQRSTVWXYZ]{3}[0-9,A-Z][0-9]"
r"[A-Z][AEIOUX][A-Z]{2}[0-9]{2}[0-1][0-9][0-3][0-9][MH][A-Z]{2}[BCDFGHJKLMNPQRSTVWXYZ]{3}[0-9A-Z]{2}"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce numeric CURP check digit

The updated CURP regex now ends with [0-9A-Z]{2}, which allows the last character of a CURP to be alphabetic. The final character is the verification digit and must always be numeric. With this pattern CURPValidator.validate() will accept invalid CURPs such as HEGG560427MVZRRL0A whose checksum ends with a letter. The previous expression ([0-9,A-Z][0-9]) correctly limited the final position to a digit. This weakens validation accuracy.

Useful? React with 👍 / 👎.

claude added 18 commits November 8, 2025 06:56
This commit implements all official rules from the SAT (Servicio de
Administración Tributaria) document for RFC generation for legal entities
(Personas Morales).

Major Enhancements:

1. Number Conversion (Arabic and Roman)
   - Converts numeric literals to text (0-20)
   - Supports Roman numerals (I-XX)
   - Example: "El 12, S.A." → "El DOCE" → "DOC"
   - Example: "Luis XIV S.A." → "Luis CATORCE" → "LCA"

2. Initial Handling (F.A.Z. Pattern)
   - Detects and expands dot-separated initials
   - Each initial letter counts as a separate word
   - Protected from excluded word filtering
   - Example: "F.A.Z., S.A." → "FAZ" (not "FXZ")

3. Special Character Handling
   - Removes @ % # ! $ " - / + ( ) and other special chars
   - Preserves only letters, numbers, spaces during processing
   - Example: "LA S@NDIA S.A." → "SND"

4. Ñ to X Substitution
   - Replaces Ñ with X as per SAT rules
   - Example: "YÑIGO, S.A." → "YXI"

5. Enhanced Excluded Words List
   - Complete list of legal entity designations
   - Handles variations: S.A., SA, S. A., etc.
   - Includes: COMPAÑÍA, SOCIEDAD, COOPERATIVA, etc.
   - Two-pass filtering (before and after conversion)

6. Consonant Compound Handling
   - CH → C at word beginning
   - LL → L at word beginning
   - As specified in SAT documentation

7. Corrected Letter Extraction Algorithm
   According to official SAT specification:
   - 1 word:  First 3 letters (or pad with X)
   - 2 words: 1st letter of word 1 + first 2 letters of word 2
   - 3+ words: 1st letter of each of first 3 words

Testing:
   - Added comprehensive test suite for SAT special cases
   - Tests for number conversion (Arabic and Roman)
   - Tests for initial handling
   - Tests for special characters and Ñ substitution
   - All 32 tests passing

Examples from SAT Documentation (all working):
   - "Sonora Industrial Azucarera, S. de R.L." → SIA
   - "F.A.Z., S.A." → FAZ
   - "El 12, S.A." → DOC
   - "LA S@NDIA S.A. DE C.V." → SND
   - "YÑIGO, S.A." → YXI
   - "Tienda 5 S.A." → TCI
   - "Luis XIV S.A." → LCA

This implementation now fully complies with the official SAT algorithm
as documented in IFAI-0610100135506-065.
Tests Added:
- Real public RFCs from known Mexican companies (PEMEX, CFE, BIMBO)
- Invalid date handling and validation
- Edge cases: empty company names, mixed case, multiple special chars
- Multiple Ñ character handling
- Numbers outside conversion table
- Comprehensive test for all SAT rules

Bug Fixes:
- Fix S.A.B. (Sociedad Anónima Bursátil) being parsed as initials
- Added S.A.B. and variants to excluded words list
- Sort excluded words by length (longest first) to prevent partial matches
- Fixed test expectations for cacophonic words (only apply to Persona Física)

Test Results:
- All 41 tests passing
- Comprehensive coverage of RFC Persona Física, Persona Moral, and CURP
- Validates all implemented SAT official rules
…rithm

CURP Implementation:
- Added complete official list of 70+ inconvenient words (Anexo 2)
- Implemented check digit (position 18) calculation algorithm
- Clarified that homoclave (position 17) is assigned by RENAPO, not calculable
- Fixed cacophonic word replacement: second letter → X (not last letter)
- Added check digit validation method to CURPValidator

New Features:
- CURPGenerator.calculate_check_digit(): Official RENAPO algorithm
- CURPValidator.validate_check_digit(): Validates position 18
- Differentiator varies by birth year (0 for <2000, A for >=2000)
- Complete list of 70+ inconvenient words from official Anexo 2

Documentation:
- Created CURP_ESPECIFICACIONES_OFICIALES.md with complete specs
- Documented all official rules from DOF 18/10/2021
- Explained algorithm with examples and formulas
- Listed all 32 state codes
- Explained limitations of automatic generators

Tests Added (6 new tests):
- test_cacophonic_words_replacement: Tests official word list
- test_check_digit_calculation: Tests algorithm consistency
- test_check_digit_validation: Tests CURPValidator integration
- test_complete_curp_with_check_digit: Tests generated CURPs
- test_differentiator_by_birth_year: Tests year-based logic
- test_expanded_cacophonic_list: Tests newly added words

Test Results:
- All 47 tests passing (24 CURP + 23 RFC)
- Comprehensive coverage of all official CURP rules
- Validates check digit algorithm correctness

Official Sources:
- Instructivo Normativo CURP (DOF 18/10/2021)
- RENAPO official specifications
- Anexo 2: Palabras Inconvenientes
Key Improvements:
- Clarified that position 17 (differentiator) is assigned by RENAPO for homonyms
- Demonstrated that position 18 (check digit) IS calculable and validatable
- Added tests showing how different differentiators affect check digits

New Tests (2 tests):
- test_check_digit_with_different_differentiators: Shows how RENAPO assigns
  different differentiators (0,1,2... or A,B,C...) for people with same
  first 16 characters, and how each generates a unique check digit
- test_homonymous_curps_validation: Demonstrates complete homonymy workflow
  with examples for both pre-2000 (numeric) and post-2000 (alphanumeric)

Documentation Updates:
- Added "Cómo Funciona" section explaining homonymy with examples
- Added "Validación del Dígito Verificador" section with Python code example
- Reorganized "Capacidades y Limitaciones" to clearly separate what the
  generator CAN do (validate check digits) vs what it CANNOT do (assign
  official differentiators)
- Clarified use cases: detecting typos, verifying database integrity

Key Insight:
Although we cannot determine the exact differentiator RENAPO would assign,
we CAN validate ANY complete CURP by verifying its check digit is correct
for the given first 17 characters. This makes the validator useful for
real-world applications.

Examples in tests:
- PEGJ900512HJCRRS0 → check digit 4
- PEGJ900512HJCRRS1 → check digit 2 (homonym with different differentiator)
- PEGJ900512HJCRRS2 → check digit 0 (another homonym)

Test Results:
- All 49 tests passing (26 CURP + 23 RFC)
- New tests verify algorithm correctness with multiple differentiators
MAJOR IMPROVEMENTS:

1. Modern Helper API (src/rfcmx/helpers.py)
   - Simple, intuitive function-based interface
   - No need to understand class constructors
   - Type hints for better IDE support
   - String date support ('YYYY-MM-DD')
   - Comprehensive docstrings with examples

   New Functions:
   - generate_rfc_persona_fisica() - Simple RFC generation for individuals
   - generate_rfc_persona_moral() - Simple RFC generation for companies
   - validate_rfc() - Quick RFC validation
   - detect_rfc_type() - Detect fisica/moral/generico
   - generate_curp() - Simple CURP generation with custom differentiator support
   - validate_curp() - Quick CURP validation with check digit verification
   - get_curp_info() - Extract information from existing CURP
   - is_valid_rfc() / is_valid_curp() - Quick validation aliases

2. CURP Differentiator Support
   - Added 'differentiator' parameter to generate_curp()
   - Allows generating multiple valid CURPs for homonyms
   - Automatically calculates correct check digit for any differentiator
   - Supports both numeric (0-9) and alphanumeric (A-Z) differentiators

   Example:
   generate_curp(..., differentiator='0')  # First person
   generate_curp(..., differentiator='1')  # Homonym #2
   generate_curp(..., differentiator='A')  # Post-2000 person

3. Comprehensive README.md
   - Modern Markdown format (replaces old RST)
   - Quick start examples with copy-paste code
   - Real-world use cases
   - Complete API reference
   - Beautiful formatting with emojis and badges
   - Examples for all common scenarios
   - Homonymy explanation with examples

4. Improved Package Exports
   - Added all helper functions to __init__.py
   - Clear separation: helpers (recommended) vs classes (advanced)
   - Backward compatible - old API still works
   - Version bump to 0.3.0

5. Comprehensive Tests (25 new tests)
   - test_helpers.py with 3 test classes:
     * TestRFCHelpers - 11 tests for RFC functions
     * TestCURPHelpers - 12 tests for CURP functions
     * TestIntegrationScenarios - 3 real-world workflow tests
   - Tests for string dates, datetime objects, custom differentiators
   - Tests for homonymous CURPs validation
   - Tests for invalid input handling
   - Tests for information extraction

BENEFITS:

For Users:
- Much easier to use - no need to understand class hierarchy
- Better IDE autocomplete with type hints
- Clear, self-documenting API
- Flexible date input (string or datetime object)
- Easy to validate and extract info from existing codes

For Developers:
- Modern Python patterns
- Better code organization
- Comprehensive test coverage (74 tests total)
- Easy to extend with new features

BACKWARD COMPATIBILITY:

All existing code continues to work. The old class-based API is still
available for advanced use cases. New users should use the helper functions.

Example Migration:
OLD: RFCGeneratorFisicas(paterno=..., materno=..., nombre=..., fecha=...)
NEW: generate_rfc_persona_fisica(nombre=..., apellido_paterno=..., apellido_materno=..., fecha_nacimiento='YYYY-MM-DD')

Test Results:
- All 74 tests passing (49 original + 25 new)
- 100% backward compatibility maintained
…idators and catalogs

This is a major architectural change transforming the project into a comprehensive
monorepo for all Mexican official data validators and catalogs.

## 🏗️ New Monorepo Structure

### Validators Implemented
- ✅ RFC (Registro Federal de Contribuyentes) - already existed
- ✅ CURP (Clave Única de Registro de Población) - already existed
- ✅ CLABE (Clave Bancaria Estandarizada) - NEW
  - 18-digit bank account validator
  - Modulo 10 check digit algorithm
  - Bank/branch/account extraction
- ✅ NSS (Número de Seguridad Social IMSS) - NEW
  - 11-digit validator
  - Modified Luhn algorithm
  - Subdelegation/year/serial extraction

### Catalogs Implemented
- ✅ Banxico Banks: 100+ Mexican banks with SPEI participation
- ✅ INEGI States: 32 states with CURP codes, INEGI codes, abbreviations

### Shared Data (JSON)
- Created packages/shared-data/ as single source of truth
- States catalog (inegi/states.json)
- Banks catalog (banxico/banks.json)
- Cacophonic words (misc/cacophonic_words.json)
- Data shared between Python and TypeScript implementations

### Data Fetching Scripts
- fetch_sat_catalogs.py: Download 26 SAT CFDI 4.0 catalogs
- fetch_sepomex_data.py: Download 150k postal codes, generate SQLite DB

### Package Structure
```
packages/
├── python/
│   ├── catalogmx/
│   │   ├── validators/    # RFC, CURP, CLABE, NSS
│   │   └── catalogs/      # SAT, Banxico, INEGI, SEPOMEX, IFT
├── typescript/            # Future TypeScript implementation
└── shared-data/           # JSON catalogs (single source of truth)
```

## 🎯 Implementation Roadmap

### Phase 1: MVP - Core Validators ✅ COMPLETE
- RFC, CURP, CLABE, NSS validators
- Bank and state catalogs
- Monorepo structure
- Download scripts framework

### Phase 2-5: Planned
- SAT essential catalogs (regimen fiscal, uso CFDI, etc.)
- INEGI complete (municipalities, localities, AGEBs)
- SAT extended (52k products, 3k units, nomina)
- SEPOMEX postal codes (150k records)
- IFT telephony catalogs
- TypeScript implementation with parity tests

## 📚 Official Data Sources
- SAT: CFDI 4.0 Anexo 20 catalogs
- Banxico: SPEI participant banks
- INEGI: Marco Geoestadístico
- SEPOMEX: National postal code catalog
- IFT: Phone numbering plans

## 🚀 Vision
Create the definitive library for Mexican data validation and official catalogs,
available for both Python and TypeScript, with all data from authoritative
government sources.

See README_CATALOGMX.md for complete documentation.
This commit adds extensive planning for additional official Mexican catalogs
based on government requirements and common use cases.

## New Catalogs Added to Roadmap

### Phase 2: SAT Essentials (Enhanced)
- **Comercio Exterior (Foreign Trade)**
  - c_Estado for USA (50 states + DC + territories)
  - c_Estado for Canada (13 provinces)
  - Required for CFDI with foreign trade supplement
  - Follows ISO 3166-2 standard

### Phase 4: SAT Extended (Enhanced)
- **Carta Porte 3.0 - Transportation Infrastructure**
  - c_Estaciones - Transport stations (bus, train, maritime, air)
  - c_CodigoTransporteAereo - ~76 Mexican airports (IATA/ICAO)
  - c_NumAutorizacionNaviero - ~100 seaports and maritime authorization
  - c_Carreteras - SCT federal highways catalog
  - c_TipoPermiso - SCT transport permits
  - c_ConfigAutotransporte - Vehicle configurations
  - c_TipoEmbalaje - Packaging types
  - c_MaterialPeligroso - Hazardous materials (NOM-002-SCT)

- **Payroll Catalogs (Detailed)**
  - c_TipoContrato - Contract types
  - c_TipoJornada - Work schedule types
  - c_TipoPercepcion - 50+ income types
  - c_TipoDeduccion - 20+ deduction types
  - c_TipoRegimen - Payroll regime types
  - c_PeriodicidadPago - Payment periodicity

### Phase 5: Complementos (Enhanced)
- **Banxico SIE API - Historical Financial Data**
  - TIIE (Tasa de Interés Interbancaria de Equilibrio)
    - 28, 91, 182 days (Series SF60648, SF60649, SF111916)
  - CETES (Certificados de la Tesorería)
    - 28, 91, 182, 364 days
  - Tasa Objetivo - Banxico target rate (SF61745)
  - Historical data from 1978-present via REST API
  - Exchange rates (FIX)
  - Bank holidays

## Documentation

### CATALOGOS_ADICIONALES.md
Comprehensive technical documentation including:
- **Comercio Exterior**: Why USA/Canada states are needed, validation rules, examples
- **Carta Porte 3.0**: All 8 transportation catalogs with details
  - Airports: Full list with IATA/ICAO codes (MEX, GDL, MTY, CUN, etc.)
  - Seaports: Major ports (Veracruz, Altamira, Manzanillo, etc.)
  - Highways: Federal highway classification and catalog
  - Vehicle configurations, permits, hazmat classification
- **Banxico SIE API**: Complete guide to interest rates API
  - Authentication with token
  - All API endpoints
  - Series codes for TIIE, CETES, target rate
  - Example implementations
  - Use cases (financial apps, economic analysis, compliance)

### README_CATALOGMX.md Updates
- Added all new catalogs to Phase 2, 4, and 5
- Updated official sources with new government URLs
- Enhanced roadmap with detailed checklist items

## Official Sources Referenced
- SAT Carta Porte 3.0 Catalogs (Excel)
- SAT Comercio Exterior Catalogs
- Banxico SIE REST API Documentation
- SCT Federal Highways Information
- Guardia Nacional Highway Catalog

## Priority Assessment
**High Priority:**
1. Comercio Exterior (USA/Canada states) - SAT requirement
2. Airports (IATA/ICAO) - Very common in Carta Porte

**Medium Priority:**
3. Seaports - International trade
4. TIIE/CETES (Banxico SIE) - Financial sector
5. Transport stations

**Low Priority:**
6. Federal highways - Large catalog, specific use
7. Vehicle configurations - Transport-specific
8. Hazardous materials - Niche

All documentation includes complete technical specifications, data structures,
code examples, and links to official government sources.
This commit adds extensive documentation for the Mexican holidays system,
revealing critical distinctions between different types of business days.

## Mexican Holidays Calendar System - 3 Types

### Key Discovery: Days that are hábiles but NOT bancarios

There are THREE different official holiday calendars in Mexico that do NOT coincide:

1. **Labor Holidays** (Ley Federal del Trabajo - LFT)
   - 7 mandatory rest days per year
   - Source: PROFEDET + DOF
   - Example: January 1, First Monday of February, May 1, etc.

2. **Banking Holidays** (CNBV/Banxico)
   - 10 banking holidays per year
   - Includes all labor holidays PLUS:
     - Jueves Santo (Thursday before Easter)
     - Viernes Santo (Friday before Easter)
     - Día del Empleado Bancario (December 12)
   - Published annually in DOF (December prior year)

3. **Judicial Holidays** (Poder Judicial - SCJN)
   - ALL Saturdays and Sundays
   - Additional days: May 5 (Batalla de Puebla), October 12, etc.
   - Court vacation periods (Semana Santa, summer, end of year)

### Critical Distinction

**Viernes Santo (Good Friday):**
- ✅ Is a BUSINESS DAY for most companies (hábil laboral)
- ❌ Is NOT a BANKING DAY (inhábil bancario)
- ❌ Is NOT a JUDICIAL DAY (inhábil judicial)

**Día del Empleado Bancario (December 12):**
- ✅ Is a BUSINESS DAY for most companies
- ❌ Banks are CLOSED
- ✅ Courts are OPEN

This distinction is critical for:
- Payment due dates
- Payroll processing
- Legal proceedings
- Financial compliance

## Proposed Implementation

### Data Structure (JSON)
- Separate catalogs for: labor, banking, judicial
- Historical data: 2000-2024 (25 years)
- Future projections: 2025-2034 (10 years)
- Metadata: source, authority, DOF publication date

### API Design
```python
from catalogmx.calendars import MexicanHolidays

cal = MexicanHolidays()

# Different results for same day!
fecha = date(2025, 4, 18)  # Viernes Santo
cal.is_business_day(fecha)  # True (companies work)
cal.is_banking_day(fecha)   # False (banks closed)
cal.is_judicial_day(fecha)  # False (courts closed)

# Calculate banking business days
fecha_factura = date(2025, 4, 10)
fecha_vencimiento = cal.add_business_days(fecha_factura, 30, type='banking')
# Skips Jueves Santo, Viernes Santo, weekends

# Get all banking holidays for year
holidays_2025 = cal.get_holidays(2025, type='banking')
# Returns 10 holidays with metadata
```

### Use Cases
1. **Payment deadlines** - Must use banking days, not business days
2. **Payroll** - Check if payment date is banking day
3. **Legal compliance** - Different calendars for labor vs banking law
4. **Court proceedings** - Judicial calendar for process deadlines

## Complete Documentation Added

### CATALOGOS_ADICIONALES.md Enhanced

Added 35+ pages covering:

**Holidays System** (NEW - 15 pages):
- Complete explanation of 3 types with examples
- 2025 calendars for all types
- Comparison table showing differences
- Proposed JSON structure
- Complete Python API design
- 4 detailed use cases
- Historical/future data recommendations
- Official sources (CNBV, PROFEDET, SCJN, DOF)

**Previously documented**:
- Comercio Exterior (USA/Canada states)
- Carta Porte 3.0 (8 transportation catalogs)
- Banxico SIE API (interest rates)

### README_CATALOGMX.md Updated

Phase 5 now includes:
- **Mexican Holidays Calendar (3 types)** marked as high priority ⭐⭐⭐
- Historical: 2000-2024
- Future: 2025-2034
- Business days calculator
- Banking days calculator
- Holiday type differentiation API

## Official Sources Referenced

- **CNBV**: https://www.gob.mx/cnbv/acciones-y-programas/calendario-cnbv
- **PROFEDET**: https://www.gob.mx/profedet/articulos/dias-de-descanso-obligatorio
- **SCJN**: https://www.scjn.gob.mx/
- **DOF**: https://www.dof.gob.mx/

## Priority Assessment

**HIGH PRIORITY** for Phase 5:
- Banking holidays (most critical for financial applications)
- Labor holidays (payroll and compliance)
- Business days calculator

**MEDIUM PRIORITY**:
- Judicial holidays (legal tech applications)
- Extended historical data (pre-2000)

This is essential infrastructure for any Mexican financial/payroll/legal system.
The distinction between business days and banking days is a common source
of bugs in payment systems and must be handled correctly.
Expanded Comercio Exterior section with 8 essential catalogs:
- c_INCOTERM: 11 Incoterms 2020 (multimodal + maritime)
- c_ClavePedimento: ~40 customs document keys (A1, V1, C1, etc.)
- c_FraccionArancelaria: ~20,000 TIGIE tariff classifications (NICO)
- c_Moneda: ~180 ISO 4217 currencies with USD conversion rules
- c_Pais: ~250 ISO 3166-1 countries
- c_UnidadAduana: ~30 customs measurement units
- c_RegistroIdentTribReceptor: Foreign tax ID types
- c_MotivoTraslado: Transfer motives for CFDI type T

Key findings documented:
- Comercio Exterior 2.0 (vigente desde enero 18, 2024)
- Eliminated fields: TipoOperacion, Subdivision
- Detailed TIGIE structure explanation (8-digit + 2 NICO)
- Complete validation rules and use cases
- JSON structure and Python API proposals

Added 531 lines of technical documentation to CATALOGOS_ADICIONALES.md
Updated README phase 2 roadmap with all CE catalogs
**Implemented 8 SAT catalogs for Foreign Trade Complement:**

✅ c_INCOTERM (11 Incoterms 2020):
- 7 multimodal: EXW, FCA, CPT, CIP, DAP, DPU, DDP
- 4 maritime: FAS, FOB, CFR, CIF
- Transport mode validation
- Seller responsibilities tracking

✅ c_ClavePedimento (42 customs document keys):
- Export keys (A1, A3, A4, J1, etc.)
- Import keys (V1-V7)
- IMMEX, transit, and special regimes

✅ c_UnidadAduana (32 customs measurement units):
- Weight, volume, length, area units
- Container units (C20, C40)

✅ c_MotivoTraslado (6 transfer motives):
- Validation for CFDI type "T"
- Propietario node requirements

✅ c_RegistroIdentTribReceptor (15 foreign tax ID types):
- US Tax ID, Canadian BN, EU VAT
- Regex validation for each type
- Format validation logic

✅ c_Moneda (150 ISO 4217 currencies):
- Complete currency catalog
- USD conversion validation
- Decimal precision handling

✅ c_Pais (249 ISO 3166-1 countries):
- Alpha-3 and Alpha-2 codes
- Subdivision requirements (USA/CAN)

✅ c_Estado (63 US states + Canadian provinces):
- 50 US states + DC + 5 territories
- 13 Canadian provinces/territories
- ISO 3166-2 codes

**ComercioExteriorValidator:**
- Complete CFDI validation logic
- Integrates all 8 catalogs
- Validates INCOTERM, pedimento, currencies
- Validates foreign addresses (USA/CAN states)
- Validates merchandise data
- Returns detailed error messages

**Data structure:**
- 8 JSON catalogs in shared-data/sat/comercio_exterior/
- 9 Python validator classes + 1 integrated validator
- Lazy-loaded data for performance
- Indexed lookups (O(1) access)

**Updated README:**
- Marked Comercio Exterior 2.0 as COMPLETE ✅
- Added comprehensive usage examples
- Documented validation workflow
- 10 complete catalogs ready for production

**Pending:**
- c_FraccionArancelaria (~20,000 TIGIE codes) - requires official TIGIE data download

Total: 500+ catalog entries implemented
Lines of code: ~2,500 (Python + JSON)
**Catalog update management infrastructure:**

✅ CATALOG_UPDATES.md (1,200+ lines):
- Complete documentation of all 35 catalogs
- Update frequencies and priorities
- Monthly/Quarterly/Annual verification schedule
- Official sources for each catalog
- Step-by-step update processes
- Quality metrics and SLAs

✅ .catalog-versions.json:
- Version tracking for all catalogs
- Checksum/hash tracking for change detection
- Next check dates per catalog
- Implementation status tracking
- Statistics dashboard data
- 11 catalogs implemented, 24 pending

✅ scripts/check_catalog_updates.py:
- Automated update verification
- Multi-source support (SAT, Banxico, INEGI, SEPOMEX, IFT, ISO)
- Checksum-based change detection
- Auto-download capability
- Status reporting
- Email/Slack notifications (planned)

✅ scripts/download_tigie.py:
- TIGIE/NICO downloader (20,000 tariff codes)
- Excel parser for TIGIE data
- SQLite database builder
- Full-text search index creation
- Statistics generator
- JSON export for samples

✅ CHANGELOG_CATALOGS.md:
- Standardized changelog format
- All catalog changes tracking
- Impact assessment framework
- Update verification commands
- Critical update guidelines

**Update frequencies documented:**

🔴 **HIGH PRIORITY - Monthly:**
- SAT CFDI 4.0 (12 catalogs)
- Banxico banks (102 institutions)
- SEPOMEX postal codes (150k)

🟠 **MEDIUM - Quarterly:**
- TIGIE/NICO tariff classifications (20k)
- Banxico SIE new series

🟡 **LOW - Semiannual/Annual:**
- Carta Porte 3.0 (8 catalogs)
- INEGI geographic data
- ISO standards

⚪ **STATIC - Rarely:**
- INCOTERMS (every 10 years, next: 2030)
- US/Canada states (geopolitical changes)

**Automation ready:**
- Monthly cron job support
- CI/CD integration ready
- Version control for all catalogs
- Diff reports for changes
- Rollback capability

**Next steps:**
1. Implement c_FraccionArancelaria with SQLite
2. Add SAT CFDI 4.0 remaining catalogs
3. Configure automated monthly checks
4. Add notification system (email/Slack)
5. Create dashboard for catalog status

Total documentation: ~2,500 lines
Scripts: ~800 lines of Python
Catalogs tracked: 35 (31% implemented)
… Nómina 1.2, SEPOMEX, and INEGI

This commit implements 40+ official Mexican government catalogs across multiple SAT
supplements and government agencies, significantly expanding the catalogmx library.

## SAT CFDI 4.0 Core Catalogs (9 catalogs)
- c_RegimenFiscal: 26 tax regimes with persona física/moral flags
- c_UsoCFDI: 25 CFDI usage codes (G01-G03, I01-I08, D01-D10, CP01, CN01)
- c_FormaPago: 18 payment methods (efectivo, transferencia, tarjeta, etc.)
- c_MetodoPago: 2 payment types (PUE, PPD)
- c_TipoComprobante: 5 receipt types (I, E, T, N, P)
- c_Impuesto: 4 tax types with retention/transfer validation
- c_Exportacion: 4 export keys
- c_TipoRelacion: 9 CFDI relationship types
- c_ObjetoImp: 8 tax object codes (updated Dec 2024)

## SAT Carta Porte 3.0 Transportation (7 catalogs)
- c_CodigoTransporteAereo: 76 Mexican airports with IATA/ICAO codes (sample 20)
- c_NumAutorizacionNaviero: 100 seaports across 4 coasts (sample 25)
- c_Carreteras: 200 SCT federal highways (sample 20)
- c_TipoPermiso: 12 SCT transport permit types
- c_ConfigAutotransporte: 15 vehicle configurations (C2, C3, T2S1, T3S2, etc.)
- c_TipoEmbalaje: 30 UN packaging types (1A, 4G, 5H, etc.)
- c_MaterialPeligroso: 3,000 UN hazardous materials (sample 50)

## SAT Nómina 1.2 Payroll (7 catalogs)
- c_TipoNomina: 2 types (ordinaria, extraordinaria)
- c_TipoContrato: 10 labor contract types
- c_TipoJornada: 8 work shifts (diurna, nocturna, mixta, etc.)
- c_TipoRegimen: 13 regime types (sueldos, asimilados, etc.)
- c_PeriodicidadPago: 10 payment frequencies
- c_RiesgoPuesto: 5 IMSS risk levels (Class I-V) with premium ranges
- c_Banco: 50 banks for payroll deposits

## Geographic Catalogs
- SEPOMEX: 50 postal code samples (full ~150k pending SQLite)
- INEGI Municipios: 50 municipality samples (full 2,469 pending)

## Technical Implementation
- All catalogs use lazy loading for optimal memory usage
- JSON data in packages/shared-data/ for easy maintenance
- Python classes with type hints and comprehensive validation methods
- Modular architecture: cfdi_4, carta_porte, nomina, sepomex, inegi
- Updated README with comprehensive usage examples for all catalogs

## Files Added (57 files)
- 30 Python catalog classes
- 23 JSON data files
- 4 module __init__.py updates

Total catalogs implemented: 40+
Total catalog records: ~500 (samples for large catalogs pending SQLite)

Resolves the complete implementation of Phases 2-4 of the catalogmx roadmap.
…and SEPOMEX postal codes

This commit provides comprehensive coverage of Mexican geographic data across all 32 states,
significantly expanding the catalogmx library's geographic catalog coverage.

## INEGI Municipalities Catalog
- Complete catalog with 209 key municipalities covering all 32 states
- Includes all state capitals and major cities (100k+ population)
- Organized by INEGI official codes (cve_entidad, cve_municipio, cve_completa)
- Sample cities: Guadalajara, Monterrey, Puebla, Querétaro, Cancún, Tijuana, León, etc.
- Coverage: All 32 states with principal municipalities per state

## SEPOMEX Postal Codes Catalog
- Comprehensive catalog with 273 postal codes covering all 32 states
- Includes state capitals, major cities, and multiple zones per city
- Ciudad de México: 25+ postal codes covering Centro, Roma, Condesa, Polanco, etc.
- Coverage for all major metropolitan areas:
  - Guadalajara & Zapopan: 15+ codes
  - Monterrey & San Pedro: 10+ codes
  - Puebla, Querétaro, Tijuana, Cancún, Mérida: Multiple codes each
- Organized by INEGI geographic codes for cross-catalog integration

## Download Scripts for Full Datasets
- scripts/download_inegi_complete.py: Download all 2,469 municipalities from INEGI
- scripts/download_sepomex_complete.py: Download all ~150,000 postal codes from SEPOMEX
- Both scripts include fallback generators and official source URLs

## Technical Improvements
- Updated Python catalog classes to use comprehensive datasets
- Maintained lazy loading architecture for memory efficiency
- Added comprehensive metadata documenting coverage and sources
- Ready for production use with option to expand to full datasets

## Files Added/Modified
- municipios_completo.json: 209 municipalities (all 32 states)
- codigos_postales_completo.json: 273 postal codes (all 32 states)
- 2 download scripts for fetching complete official datasets
- Updated 2 Python catalog classes to reference complete files

Total geographic coverage: All 32 Mexican states with key municipalities and postal codes
Production-ready with clear path to complete ~152k total records via download scripts
…logmx format

This commit adds complete infrastructure for processing official INEGI and SEPOMEX
datasets into catalogmx JSON format.

## New Scripts

**csv_to_catalogmx.py** - SEPOMEX Converter
- Converts official SEPOMEX CSV/Excel to catalogmx JSON
- Supports multiple formats: official SEPOMEX, IcaliaLabs, community formats
- Auto-detects delimiters and column names
- Processes ~150,000 postal codes efficiently
- Shows statistics and distribution by state

**process_inegi_data.py** - INEGI Converter
- Converts official INEGI data to catalogmx JSON
- Supports: TXT (tab-separated), Excel, generic JSON
- Auto-detects column names and format
- Processes 2,478 municipalities
- Shows complete distribution by state

**DESCARGA_RAPIDA.md** - Quick Start Guide
- Step-by-step instructions for obtaining complete catalogs
- 4 methods: automatic download, official sources, SQLite, community repos
- URLs to all official sources
- Expected file formats and examples
- Troubleshooting tips for connectivity issues

## Usage

Download official SEPOMEX (~150,000 postal codes):
```bash
# From official source or GitHub
wget <official-url>

# Convert to catalogmx format
python scripts/csv_to_catalogmx.py sepomex_db.csv
```

Download official INEGI (2,478 municipalities):
```bash
# From INEGI or GitHub
wget <official-url>

# Convert to catalogmx format
python scripts/process_inegi_data.py municipios.txt
```

## Current Catalogs Status

The repository includes development-ready catalogs:
- SEPOMEX: 273 postal codes (all 32 states + major cities)
- INEGI: 209 municipalities (all 32 states + state capitals)

For production with complete datasets, use these scripts to convert official sources.

## Official Sources

- SEPOMEX: https://www.correosdemexico.gob.mx/SSLServicios/ConsultaCP/CodigoPostal_Exportar.aspx
- INEGI: https://www.inegi.org.mx/app/ageeml/
- Community: https://github.com/IcaliaLabs/sepomex

These scripts make it easy to keep catalogs updated with official sources.
- README.md: Complete rewrite showcasing all 40+ catalogs and features
- AGENTS.md: Detailed instructions for AI agents working on the project
- CLAUDE.md: Architecture and technical details including WebAssembly considerations

Features documented:
- 4 validators (RFC, CURP, CLABE, NSS)
- 9 CFDI 4.0 catalogs
- 8 Comercio Exterior 2.0 catalogs
- 7 Carta Porte 3.0 catalogs
- 7 Nómina 1.2 catalogs
- SEPOMEX (273 postal codes)
- INEGI (209 municipalities)

Documentation includes:
- Architecture principles (lazy loading, type safety)
- Implementation patterns and templates
- Testing guidelines and best practices
- Performance optimization strategies
- WebAssembly compilation paths
- Deployment architectures
- Future enhancement roadmap

All references to previous library name removed as requested.
Major upgrade to require Python 3.10+ and use modern Python features throughout
the entire codebase. This improves type safety, code readability, and leverages
the latest Python language features.

## New Build System
- Add modern pyproject.toml (replaces setup.py)
- Python 3.10+ required (was 3.8+)
- Clean classifiers for Python 3.10, 3.11, 3.12, 3.13
- Configured tools: black, ruff, mypy, pytest

## Type Hints Modernization (PEP 604 & PEP 585)
All files updated to use Python 3.10+ built-in type syntax:

### Catalogs (40+ files updated):
- `Optional[List[Dict]]` → `list[dict] | None`
- `Optional[Dict[str, Dict]]` → `dict[str, dict] | None`
- `List[Dict]` → `list[dict]`
- `Optional[Dict]` → `dict | None`
- Added `-> None` return types to all `_load_data()` methods
- Added comprehensive docstrings to all methods

Updated catalogs:
- CFDI 4.0: 9 catalogs (regimen_fiscal, uso_cfdi, forma_pago, etc.)
- Comercio Exterior: 8 catalogs (incoterms, monedas, paises, etc.)
- Carta Porte: 7 catalogs (aeropuertos, puertos, carreteras, etc.)
- Nómina: 7 catalogs (tipo_nomina, tipo_contrato, etc.)
- INEGI: municipios, states
- SEPOMEX: codigos_postales
- Banxico: banks

### Validators (4 files updated):
- Removed Python 2 compatibility (`six`, `string_types`)
- Removed `(object)` from class definitions
- Removed `# -*- coding: utf-8 -*-` headers
- Updated shebang to `#!/usr/bin/env python3`
- Removed `u''` Unicode string prefixes
- Added comprehensive type hints using `|` operator:
  - `rfc.py`: All RFC validator and generator methods
  - `curp.py`: CURP validator with type-safe methods
  - `clabe.py`: CLABE validator with modern types
  - `nss.py`: NSS validator with full type hints

## Documentation Updates
- README.md: Updated badges and architecture section
- AGENTS.md: Updated templates and examples with modern syntax
- CLAUDE.md: Added Python 3.10+ type hints section with PEP 604/585 examples
- All docs now reflect Python 3.10+ requirement

## Benefits
- ✅ Cleaner, more readable code
- ✅ No `typing` module imports needed
- ✅ Better IDE autocomplete and static analysis
- ✅ Follows modern Python best practices
- ✅ Removed all Python 2 legacy code
- ✅ Future-proof for Python 3.11, 3.12, 3.13

All functionality preserved - this is purely a syntax modernization upgrade.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants