Bidirectional Word-CSV Converter

A Python-based tool for converting in both directions between CSV files and Word documents that contain product specifications.

Overview

This system allows you to:

  1. Parse Word documents with a specific hierarchical structure of product information and export the data to CSV files
  2. Convert CSV files back to Word documents that maintain the original structure and formatting

The system consists of:

  1. Word Parser Module (word_parser.py): Parses Word documents to CSV
  2. CSV to Word Converter (csv_to_word.py): Converts CSV files back to Word format
  3. GUI Applications for both conversion directions
  4. Main Script (main.py): A wrapper script to run any of the modules

Installation

Prerequisites

  • Python 3.7 or higher
  • Required Python packages:
    • python-docx - For working with Word documents
    • tkinter - For the GUI applications (bundled with most Python installers; some Linux distributions package it separately, e.g. python3-tk on Debian/Ubuntu)

Install Dependencies

pip install python-docx

Usage

Main Script Options

The main script allows you to choose the conversion direction and interface:

python main.py [--mode {word2csv,csv2word}] [--gui | --cli]

Arguments:

  • --mode: Select conversion direction (word2csv or csv2word). Default is word2csv.
  • --gui: Launch the GUI application (default)
  • --cli: Run the command-line interface
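The options above map naturally onto Python's argparse module. The sketch below is an assumption about how main.py might declare them, not its actual code: build_parser and resolve_interface are hypothetical names, and --gui/--cli are modelled as a mutually exclusive group based on the [--gui | --cli] usage line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented main.py options."""
    parser = argparse.ArgumentParser(description="Bidirectional Word-CSV converter")
    parser.add_argument(
        "--mode",
        choices=["word2csv", "csv2word"],
        default="word2csv",
        help="Conversion direction (default: word2csv)",
    )
    # --gui and --cli cannot be combined; argparse exits with an error if both are given
    interface = parser.add_mutually_exclusive_group()
    interface.add_argument("--gui", action="store_true", help="Launch the GUI (default)")
    interface.add_argument("--cli", action="store_true", help="Run the command-line interface")
    return parser


def resolve_interface(args: argparse.Namespace) -> str:
    """The GUI is the default unless --cli was passed explicitly."""
    return "cli" if args.cli else "gui"


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(parsed.mode, resolve_interface(parsed))
```

With no arguments this yields word2csv in GUI mode, matching the stated defaults; passing both --gui and --cli makes argparse print a usage error and exit.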

Option 1: Word to CSV Conversion

GUI Application

python main.py --mode word2csv
# OR
python main.py  # Default is word2csv

This will open the converter application where you can:

  • Add one or more Word files for processing
  • Select an output directory for the CSV files
  • Start the conversion process
  • View the conversion log

Command-Line Parser

python main.py --mode word2csv --cli
# OR
python word_parser.py -i input.docx -o output.csv [--debug]

Command-line options:

  • -i, --input: Input Word document file path
  • -o, --output: Output CSV file path
  • --debug: Print debug information during parsing

Option 2: CSV to Word Conversion

GUI Application

python main.py --mode csv2word

This will open the converter application where you can:

  • Add one or more CSV files for processing
  • Select an output directory for the Word files
  • Start the conversion process
  • View the conversion log

Command-Line Converter

python main.py --mode csv2word --cli
# OR
python csv_to_word.py -i input.csv -o output.docx

Command-line options:

  • -i, --input: Input CSV file path
  • -o, --output: Output Word document file path (optional)
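Converting back to Word means turning flat CSV rows into the nested structure described under Data Structure. One way to do this, sketched here without python-docx so it runs on its own, is to emit a group or subgroup title only when it differs from the previous row's. outline_from_rows and the per-item column order are assumptions, not the actual logic of csv_to_word.py.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Per-item columns, in the order they are assumed to appear under each item
ITEM_COLUMNS = ("Item_title_NL", "Description_NL", "LongDescription", "Item_Number")


def outline_from_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (column, text) pairs, repeating group/subgroup titles only when they change."""
    current_group: Optional[str] = None
    current_subgroup: Optional[str] = None
    for row in rows:
        group = row.get("Group_title", "")
        subgroup = row.get("Subgroup_title", "")
        if group != current_group:
            # A new group restarts its subgroups, so the next subgroup is always emitted
            current_group, current_subgroup = group, None
            if group:
                yield ("Group_title", group)
        if subgroup != current_subgroup:
            current_subgroup = subgroup
            if subgroup:
                yield ("Subgroup_title", subgroup)
        # Empty cells are skipped rather than written as blank paragraphs
        for column in ITEM_COLUMNS:
            value = row.get(column, "")
            if value:
                yield (column, value)
```

Each (column, text) pair could then be written with python-docx, for example document.add_heading(text, level=1) for Group_title, level=2 for Subgroup_title, and document.add_paragraph(text) for the remaining columns.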

Data Structure

The converter works with the following hierarchical data structure:

  1. Group_title: Top-level category in UPPERCASE (e.g., "MECHANISCHE SLOTEN")
  2. Subgroup_title: Second level with number format "00.00.00" (e.g., "00.00.00 Mechanische éénpuntsloten")
  3. Item_title_NL: Specific product category with number and brand (e.g., "00.00.00 Standaard klavierslot... |FH| st Litto")
  4. Description_NL: Detailed text description of the product category
  5. LongDescription: Specific product description in purple text with bullet points
  6. Item_Number: Reference code (e.g., "A13E1")
  7. Brand: Brand name extracted from Item_title_NL (e.g., "Litto")
  8. Measuring_State: Special format text (e.g., "|FH| st")
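The eight fields above can be modelled as one record per product row, with Measuring_State and Brand split off the end of Item_title_NL. In this sketch ProductRow and split_measuring_state_and_brand are hypothetical names, and the regular expression assumes the title always ends in the literal "|FH| st" followed by the brand, as in the example; other measuring states would need a broader pattern.

```python
import re
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass
class ProductRow:
    """One CSV row; field order doubles as the assumed column order."""
    Group_title: str = ""
    Subgroup_title: str = ""
    Item_title_NL: str = ""
    Description_NL: str = ""
    LongDescription: str = ""
    Item_Number: str = ""
    Brand: str = ""
    Measuring_State: str = ""


# Assumed title layout: "<number> <title text> |FH| st <brand>"
ITEM_TAIL = re.compile(r"(\|FH\|\s*st)\s+(.+?)\s*$")


def split_measuring_state_and_brand(item_title: str) -> Tuple[str, str]:
    """Return (Measuring_State, Brand), or empty strings if the pattern is absent."""
    match = ITEM_TAIL.search(item_title)
    if match is None:
        return "", ""
    return match.group(1), match.group(2)


COLUMNS = [f.name for f in fields(ProductRow)]
```

Typing.Tuple is used instead of tuple[...] so the annotations still work on Python 3.7, the minimum version listed under Prerequisites.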

Document Format

Word Document Structure

For proper parsing, the Word document should follow this structure:

  • Group_title: All capital letters, in blue
  • Subgroup_title: Starts with "00.00.00", in blue
  • Item_title_NL: Starts with "00.00.00" and contains "|FH| st" followed by the brand name
  • Description_NL: Regular text below the Item_title_NL
  • LongDescription: Purple text with bullet points just above an Item_Number
  • Item_Number: Contains "REFERENTIE : " followed by a code and "OF EQUIVALENT"
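The rules above can be approximated with regular expressions. This sketch assumes the caller already knows whether a paragraph is blue (for example by inspecting the font colour of its runs), treats "00.00.00" as three two-digit groups, and assumes exactly the "REFERENTIE : <code> OF EQUIVALENT" wording. classify_paragraph and extract_item_number are hypothetical names. LongDescription is left out because recognizing it needs purple-text detection plus a look ahead to the next Item_Number, and falling back to Description_NL for everything else is a simplification.

```python
import re
from typing import Optional

# "00.00.00"-style prefix: three two-digit groups separated by dots
NUMBER_PREFIX = re.compile(r"^\d{2}\.\d{2}\.\d{2}\b")
# Assumed wording: "REFERENTIE : <code> OF EQUIVALENT"
ITEM_NUMBER = re.compile(r"REFERENTIE\s*:\s*(\S+)\s+OF EQUIVALENT")


def classify_paragraph(text: str, is_blue: bool) -> str:
    """Map one paragraph to a field name using the documented formatting rules."""
    stripped = text.strip()
    # Checked first, because reference lines are also all upper case
    if ITEM_NUMBER.search(stripped):
        return "Item_Number"
    if NUMBER_PREFIX.match(stripped):
        if "|FH| st" in stripped:
            return "Item_title_NL"
        if is_blue:
            return "Subgroup_title"
    if is_blue and stripped.isupper():
        return "Group_title"
    return "Description_NL"


def extract_item_number(text: str) -> Optional[str]:
    """Return the reference code, or None if the line is not a reference line."""
    match = ITEM_NUMBER.search(text)
    return match.group(1) if match else None
```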

CSV Format

The CSV file uses:

  • UTF-8 encoding with BOM (for Excel compatibility)
  • Semicolon (;) delimiter
  • Quoted fields for handling special characters
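A minimal sketch of writing and reading a CSV in this format with Python's standard csv module. write_rows and read_rows are hypothetical helper names, and csv.QUOTE_ALL is an assumption: the text above only says fields are quoted, so the tool might use csv.QUOTE_MINIMAL instead.

```python
import csv
from typing import Dict, Iterable, List


def write_rows(path: str, columns: List[str], rows: Iterable[Dict[str, str]]) -> None:
    """Write rows as semicolon-delimited, fully quoted UTF-8 with a BOM."""
    # "utf-8-sig" prepends the BOM that Excel uses to detect UTF-8;
    # newline="" lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, delimiter=";", quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path: str) -> List[Dict[str, str]]:
    """Read the file back; "utf-8-sig" strips the BOM from the first header."""
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle, delimiter=";")]
```

Quoting every field keeps values that contain semicolons or line breaks (such as multi-line LongDescription text) intact when the file is read back.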

Troubleshooting

Common Issues

  1. Color Detection Problems:

    • If purple text isn't being detected, check the RGB values in the is_purple_text function
    • The default purple values are RGB(112, 48, 160)
  2. Structure Recognition Issues:

    • Ensure your Word document follows the expected structure
    • Check that bullet points use standard characters (•, -, etc.)
  3. Excel Compatibility:

    • If the CSV doesn't open properly in Excel, ensure it's using UTF-8-BOM encoding and semicolon delimiter
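For color detection problems, a check that allows a small tolerance around the default purple can help when a document uses a slightly different shade. is_purple_rgb and its tolerance parameter are hypothetical; the project's own is_purple_text may compare exact values or take a different argument. The function accepts any three-integer sequence or None, so python-docx's run.font.color.rgb (an RGBColor tuple subclass, or None when the run has no explicit RGB color) can be passed directly.

```python
from typing import Optional, Sequence

# Default purple stated in this README: RGB(112, 48, 160)
DEFAULT_PURPLE = (112, 48, 160)


def is_purple_rgb(
    rgb: Optional[Sequence[int]],
    target: Sequence[int] = DEFAULT_PURPLE,
    tolerance: int = 0,
) -> bool:
    """True if every channel is within `tolerance` of the target color."""
    if rgb is None:
        # No explicit RGB color on the run (e.g. color inherited from a style)
        return False
    return all(abs(int(actual) - int(expected)) <= tolerance for actual, expected in zip(rgb, target))
```

With the default tolerance of 0 this behaves as an exact match; raising it to a few units accepts near-purple shades.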

License

This software is distributed under the MIT license.