A command-line tool for searching through Microsoft Word (*.docx) files using grep-like pattern matching.
Note: This tool will not work on:
- Old .doc (pre-Office 2007) binary files
- .dotx template files
- .docm macro-enabled files
The unit of search (the equivalent of a grep "line") is one paragraph. Each Word file is split into paragraphs, and each paragraph is searched on its own. Bulleted lists and similar structures are made up of multiple paragraphs, one per item, so a pattern is not expected to match across items. Plan complex search patterns accordingly.
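The paragraph-as-line behavior can be pictured with a minimal Python sketch. This is not the tool's actual implementation: the function name `search_paragraphs` and the sample text are hypothetical, and the paragraphs are supplied as plain strings rather than read from a .docx file.

```python
import re

def search_paragraphs(paragraphs, pattern, ignore_case=False):
    """Return (index, text) for each paragraph that matches pattern.

    Hypothetical helper: each paragraph is searched on its own,
    so a match can never span two paragraphs.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [(i, text) for i, text in enumerate(paragraphs) if regex.search(text)]

# A bulleted list is stored as one paragraph per bullet.
sample = [
    "Shopping list:",
    "apples",
    "bread",
    "apples and bread together",
]

# Only the last paragraph holds both words, so only it matches.
print(search_paragraphs(sample, r"apples.*bread"))
```

The two bullets "apples" and "bread" do not combine into a match, which is why a pattern meant to cover several list items needs to be split into separate searches.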
- Search single files or recursively through directories
- Regular expression pattern matching
- Case-sensitive and case-insensitive search options
- Colored output highlighting matches
- Output formatting options including hanging indents
- Count-only and filename-only output modes
- Hyperlinked file paths (in supported terminals)
- Progress bar for large searches
- Debug logging support
- Read paths from stdin
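Recursive search combined with the file-type restrictions noted above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the helper name `find_docx_files` is hypothetical, and matching the extension case-insensitively is a guess about the desired behavior.

```python
from pathlib import Path

def find_docx_files(root, recursive=False):
    """Return sorted .docx paths under root (hypothetical helper).

    Only the .docx suffix is accepted, so .doc, .dotx, and .docm
    files are skipped.
    """
    root = Path(root)
    if root.is_file():
        return [root] if root.suffix.lower() == ".docx" else []
    # rglob descends into subdirectories; glob stays at the top level.
    walker = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in walker if p.is_file() and p.suffix.lower() == ".docx")

if __name__ == "__main__":
    for path in find_docx_files(".", recursive=True):
        print(path)
```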
See Executables or Native Script below.
grep-docx [options] PATTERN PATH
- PATTERN: Regular expression pattern to search for (a Python-style regex)
- PATH: File or directory to search (use - to read paths from stdin)
- -C, --color, --colour: Color the prefix and highlight matches
- -c, --count: Only print a count of matching lines
- -H, --hyperlink: Print filenames as clickable hyperlinks
- -I, --hanging-indent: Line output after the 1st line starts with a tab
- -i, --ignore-case: Ignore case distinctions
- -l, --files-with-matches: Only print names of files with matches
- -L, --files-without-matches: Only print names of files without matches
- -P, --no-progress-bar: Disable the progress bar
- -q, --quiet, --silent: Suppress all normal output
- -r, --recursive: Recursively search subdirectories
- -s, --no-messages: Suppress error messages
- -T, --initial-tab: Line output starts with a tab character
- -V, --version: Show program version
- --debug: Enable debug logging
- --logfile FILE: Write logs to FILE
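Because PATTERN is a Python-style regex, a pattern can be tried out with Python's `re` module before running a search. The sketch below assumes that `-i` corresponds to `re.IGNORECASE` and that each paragraph is searched rather than matched in full; neither is confirmed here, and the helper name `preview` is hypothetical.

```python
import re

def preview(pattern, paragraphs, ignore_case=False):
    """Return the paragraphs in which a Python regex finds a match."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [p for p in paragraphs if regex.search(p)]

paragraphs = [
    "Edit the Config file first.",
    "Reconfigure the server.",
    "Nothing relevant here.",
]

# Case-sensitive, no word boundaries: only lowercase "config" inside
# "Reconfigure" matches.
print(preview("config", paragraphs))

# \b restricts the match to the whole word; ignore_case also accepts "Config".
print(preview(r"\bconfig\b", paragraphs, ignore_case=True))
```

The first call returns only the "Reconfigure" paragraph and the second only the "Config" paragraph, which shows how word boundaries and case handling change what a search reports.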
Search a single file for the text 'config':
grep-docx config document.docx
Search folders/directories recursively with case-insensitive matching:
grep-docx -ri config ./documents/
Count the paragraphs containing the partial word 'construc' (e.g., construct, misconstruction, constructible) in the *.docx files in a directory:
grep-docx -c construc ./documents/
List files where the words 'sake' and 'clarification' occur within the same paragraph:
grep-docx -l '\bsake\b.*\bclarification\b|\bclarification\b.*\bsake\b' ./documents/
Write output to a file (via STDOUT redirection):
grep-docx -ri config ./documents/ > output.txt
Compiled executables may be found in the executables branch of this repository.
Executables were made for:
- grep-docx.exe - Windows 64-bit Intel executable
- grep-docx - Apple Silicon (M1+) binary
- No Python installation required - standalone executables
- Operating system: Windows 7+, macOS 10.15+, or Linux with glibc 2.17+
- Download
- For Linux or macOS, make the file executable:
chmod +x grep-docx
- On macOS, the binary is not signed with a Developer account, so it requires special permission to run. macOS walks you through granting it if you follow the dialog box instructions.
- Run
A simple Python script, ready to run.
- Python 3.x
- python-docx library
- tqdm library
- (optional & Windows only) colorama library
- Ensure Python 3.x is installed
- Install required dependencies:
pip install python-docx tqdm
or
pip install -r requirements.txt
- Run the script.
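Before running the script, it can help to confirm that the dependencies are importable. The check below is a small sketch and not part of the project: the names `REQUIRED`, `OPTIONAL`, and `missing_packages` are hypothetical. It relies on the fact that python-docx is imported under the module name `docx`.

```python
import importlib.util

# Import names differ from package names: python-docx is imported as "docx".
REQUIRED = {"python-docx": "docx", "tqdm": "tqdm"}
OPTIONAL = {"colorama": "colorama"}

def missing_packages(packages):
    """Return the pip package names whose import module cannot be found."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing required packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
    if missing_packages(OPTIONAL):
        print("Optional: colorama (Windows only) is not installed.")
```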
This project is licensed under the MIT License - see the LICENSE file for details.