The purpose of this project is to automate the process of cleaning and restructuring HTML files and their associated directories. This includes renaming files and directories by removing redundant identifiers, replacing inline <style> tags with linked CSS files, and updating all internal references to reflect the changes.
The project provides a Python script that recursively processes HTML directories, ensuring all subdirectories and nested HTML files are handled seamlessly.
This script is designed to streamline managing large HTML directories. Use it to:
- Replace <style> tags in HTML files with <link> tags that point to a single external CSS file.
- Rename files and directories to cleaner versions by removing unnecessary identifiers (e.g., hex IDs).
- Update references within HTML files to ensure consistency after renaming files and directories.
To run the script:
python cleaner.py [directory_path] [css_file_path] [-m optional]
- directory_path: The root directory containing HTML files and subdirectories (see -m below for multiple directories).
- css_file_path: The relative or absolute path to the CSS file that will be linked to the HTML files.
- -m: (optional) Flag to enable the cleanup over multiple note exports. If specified, directory_path should be the parent directory containing multiple note folders.
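The command-line interface described above could be parsed roughly as follows. This is a minimal sketch using the standard argparse module, not the script's actual code: the function name build_parser and the attribute name multiple are hypothetical, and only the two positional arguments and the -m flag come from the usage line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented usage line (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="cleaner.py",
        description="Clean and restructure exported HTML directories.",
    )
    parser.add_argument(
        "directory_path",
        help="root directory containing HTML files and subdirectories",
    )
    parser.add_argument(
        "css_file_path",
        help="relative or absolute path to the CSS file to link",
    )
    # 'multiple' is a hypothetical attribute name chosen for this sketch.
    parser.add_argument(
        "-m",
        dest="multiple",
        action="store_true",
        help="treat directory_path as a parent of several note exports",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["./all_notes", "../styles.css", "-m"])
print(args.directory_path, args.css_file_path, args.multiple)
```

Using store_true keeps -m a pure on/off switch, which matches its description as an optional flag that takes no value.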
Example:
python cleaner.py ./notion_files ./styles.css -m
python cleaner.py ./all_notes ../styles.css -m
replace_paths(html_path, subdir_name, new_subdir_name)
- Replaces all occurrences of subdir_name with new_subdir_name in the provided HTML file.
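One way such a replacement might work is sketched below. The function reuses the documented name and parameters, but the body is an assumption rather than the project's implementation. In particular, handling a percent-encoded form of the name (as it could appear inside href attributes) is an extra step added for illustration.

```python
from pathlib import Path
from urllib.parse import quote


def replace_paths(html_path, subdir_name, new_subdir_name):
    """Rewrite references to a renamed subdirectory inside one HTML file."""
    path = Path(html_path)
    text = path.read_text(encoding="utf-8")

    # Assumption: links may carry the name percent-encoded (spaces as %20).
    # Replace that form first, and only when it differs from the raw name,
    # so the same occurrence is never rewritten twice.
    encoded_old = quote(subdir_name)
    if encoded_old != subdir_name:
        text = text.replace(encoded_old, quote(new_subdir_name))

    # Plain occurrences of the old name.
    text = text.replace(subdir_name, new_subdir_name)
    path.write_text(text, encoding="utf-8")
```

A plain string replacement is the simplest approach, though it will also touch visible text that happens to contain the old name.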
replace_style(html_path, css_path)
- Replaces inline <style> blocks in the provided HTML file with a <link> tag to the given CSS file.
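The swap from inline styles to a stylesheet link can be approximated with a regular expression. The sketch below reuses the documented function name, but the body is an assumption: the pattern, the return value, and the choice to put the link where the first style block stood are illustrative. An HTML parser would be more robust for unusual markup.

```python
import re
from pathlib import Path

# Matches a whole <style ...>...</style> block, across lines, in any letter case.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.DOTALL | re.IGNORECASE)


def replace_style(html_path, css_path):
    """Swap inline style blocks for one stylesheet link; return blocks replaced (0 or 1)."""
    path = Path(html_path)
    html = path.read_text(encoding="utf-8")
    link = f'<link rel="stylesheet" href="{css_path}">'

    # The first block becomes the link. A callable replacement avoids
    # backslashes in css_path being read as regex escape sequences.
    html, count = STYLE_BLOCK.subn(lambda _match: link, html, count=1)
    # Any remaining style blocks are dropped.
    html = STYLE_BLOCK.sub("", html)

    path.write_text(html, encoding="utf-8")
    return count
```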
convert_dir(directory: str, css_path: str)
- Recursively processes a directory and its subdirectories, performing renames and replacements.
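The renaming part of the recursive pass might look like the sketch below. It assumes the redundant identifier is a 32-character hex string appended to the name after a space, which is a guess about the export format. The helper names clean_name and rename_tree are hypothetical and do not appear in the project.

```python
import os
import re

# Assumption: the identifier is whitespace plus 32 hex characters, sitting
# just before the file extension or at the very end of a directory name.
HEX_SUFFIX = re.compile(r"\s+[0-9a-fA-F]{32}(?=\.[^.]+$|$)")


def clean_name(name: str) -> str:
    """Strip a trailing hex identifier from a file or directory name."""
    return HEX_SUFFIX.sub("", name)


def rename_tree(root: str) -> None:
    """Rename every file and directory under root to its cleaned name."""
    # Walk bottom-up so a directory is renamed only after its contents.
    for parent, dirs, files in os.walk(root, topdown=False):
        for name in files + dirs:
            cleaned = clean_name(name)
            if cleaned == name:
                continue
            target = os.path.join(parent, cleaned)
            # Skip on a name collision instead of overwriting anything.
            if not os.path.exists(target):
                os.rename(os.path.join(parent, name), target)
```

Walking bottom-up matters because renaming a parent directory first would invalidate the paths of everything beneath it.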
main(dir, css_path)
- Entry point that initiates the conversion process for a given directory.
- Python 3.8 or newer installed
- Clone the repository
  git clone https://github.com/rhudaj/HTML-Cleaner.git
- Navigate to the directory
  cd HTML-Cleaner
- Add your Notion HTML exported folder.
  - For example ./my_note
- Add your CSS file.
  - The CSS file that Notion uses is already included: ./notion_styles.css, so you can use that or add your own.
- Run the script:
  python cleaner.py ./my_note ./notion_styles.css
- Expand CSS customization options.
- Optimize performance for directories with thousands of files.
Roman Hudaj - rhudaj@uwaterloo.ca
Project Link: HTML Cleaner