Improvement Suggestions #22

@arlinalves

Hello,
I have no programming background, but I greatly enjoyed using the Chat Export tool.
I encountered some issues when running the exported HTML file with media attachments. I exported a conversation with over 17,000 messages, but starting around message #3000, the audio and video media stopped playing—the media players were inactive.
Another issue I noticed was with localization: message dates were displayed in English (e.g., 'Thu') instead of Brazilian Portuguese, which is my language.
I tried to implement these changes myself with the help of an AI assistant, but without programming knowledge I could not complete them.
Therefore, I would like to offer a few suggestions for improvements. I hope this doesn't come across as being demanding (or "a pain").
Thank you in advance for your dedication to the code. My WhatsApp: 5599981338301

🐛 Identified Issues

  • Media Playback Failure: Audio and video media fail to play (become inactive/non-functional) in the generated HTML file starting around message #3000 in very large conversations (e.g., 17,000+ messages).
  • Localization Inconsistencies: Date information (e.g., "Thu" instead of "Qui") and system messages (e.g., call information like "[call (attempt)]") are displayed in English instead of the user's expected language (Brazilian Portuguese, in this case).

💡 Improvement Suggestions

1. File and Directory Management (Output Structure)

Objective: Standardize the output directory for better organization and predictability.

  • Output Directory Location: The generated output (HTML file, media, CSS, and attachments) must be saved into a new directory created in the same parent folder where the source ZIP file is located.
    • Example: If chat.zip is in C:/Users/User/Documents/, the output directory should be created in C:/Users/User/Documents/.
  • Directory Naming and Conflict Resolution:
    • The output directory should be named after the conversation.
    • If a directory with the exact same name already exists, the existing directory should be preserved, and a new, numbered directory must be created automatically.
    • Naming Convention for Duplicates: Append a counter: [DirectoryName] (2), [DirectoryName] (3), etc.
  • Clarity: Each exported conversation should maintain all its related files (HTML, media, assets) within its single, dedicated output directory.
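
A minimal sketch of the directory rule above, assuming the tool is written in Python. The function name `resolve_output_dir` and its parameters are hypothetical and not part of chat-export; sanitizing the conversation name for the filesystem is left out.

```python
from pathlib import Path


def resolve_output_dir(zip_path: Path, chat_name: str) -> Path:
    """Create and return a fresh output directory next to the source ZIP.

    An existing directory is never reused: "Name" is tried first,
    then "Name (2)", "Name (3)", and so on.
    """
    parent = zip_path.resolve().parent
    candidate = parent / chat_name
    counter = 2
    while candidate.exists():
        candidate = parent / f"{chat_name} ({counter})"
        counter += 1
    # mkdir raises FileExistsError if another process created the
    # directory in the meantime, so nothing is silently overwritten.
    candidate.mkdir()
    return candidate
```

Calling it twice with the same ZIP and conversation name would leave the first directory untouched and create a second one with the " (2)" suffix.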

2. Detailed Message Statistics Reporting

Objective: Enhance the log/console output with a detailed breakdown of message types.

In the step where the script displays information about the ZIP file (e.g., "ZIP file is an Android export with media/attachments..."), it must also print the following message counts:

  1. Participant Message Counts: The number of messages sent by each participant.
  2. System Messages: The count of non-participant messages (e.g., "Messages and calls are end-to-end encrypted...", unparsed system notifications).
  3. Invalid/Unparseable Messages: The count of messages that cannot be definitively classified as system or participant messages. Display 0 if none are found.

The overall output structure in the console log should be updated to follow this format (in English):

Messages in this chat: [total_count]
[Participant 1 Name]: [count_1]
[Participant 2 Name]: [count_2]
_______________________________________
System messages: [system_count]
Invalid messages: [invalid_count]

Formatting Note: All message counts displayed in the console output (like the total count and the count in the user prompt for splitting, see point 4) should use a dot (.) as the thousands separator (e.g., 16.539).
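
The report could be produced roughly as follows. This is a sketch under assumptions: the parser is assumed to tag each message as "participant", "system", or "invalid" (hypothetical labels), and the total is taken to count participant messages only, since in the console example further down 8.244 + 8.295 = 16.539. The names `format_count` and `build_report` are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

SEPARATOR = "_" * 39  # width is arbitrary in this sketch


def format_count(n: int) -> str:
    """Format an integer with a dot as the thousands separator."""
    return f"{n:,}".replace(",", ".")


def build_report(messages: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build the console report from (kind, sender) pairs."""
    per_sender: Counter = Counter()
    system = invalid = 0
    for kind, sender in messages:
        if kind == "participant":
            per_sender[sender] += 1
        elif kind == "system":
            system += 1
        else:
            invalid += 1
    total = sum(per_sender.values())  # assumption: participants only
    lines = [f"Messages in this chat: {format_count(total)}"]
    lines += [f"{name}: {format_count(n)}" for name, n in per_sender.items()]
    lines.append(SEPARATOR)
    lines.append(f"System messages: {format_count(system)}")
    lines.append(f"Invalid messages: {format_count(invalid)}")
    return "\n".join(lines)
```

Participants are listed in the order they first appear in the chat, because `Counter` preserves insertion order.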


3. Localization and Internationalization (i18n)

Objective: Ensure all dynamic text in the HTML output respects the conversation's detected language.

  • Automatic Language Detection: Use the langdetect library to automatically identify the predominant language of the chat content (.txt file). Do not prompt the user for the language or display the detected language.
  • Date/Number Formatting: Use the Babel library to adjust the date and number formatting based on the detected language (e.g., dd/mm/yyyy for Portuguese, mm/dd/yyyy for English).
  • Translation of Fixed/System Strings:
    • The date string beneath each message must be translated (e.g., "Thu" → "Qui").
    • Call/system information must be translated (e.g., [call (attempt)] → chamada (tentativa)).
    • Crucial Formatting Rule: Remove the square brackets ([]) from the translated system messages like call logs, while keeping the parentheses () (e.g., [call (attempt)] → chamada (tentativa)).
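
The bracket rule for translated system strings might look like the sketch below. To stay dependency-free it leaves out language detection and date formatting: `lang` is assumed to be the language code that detection returned (langdetect's `detect` returns codes such as "pt"), and the `TRANSLATIONS` table is a hypothetical stand-in for real translation data, which Babel could supply for weekday names.

```python
import re

# Hypothetical lookup table; only the two examples from this issue.
TRANSLATIONS = {
    "pt": {
        "call (attempt)": "chamada (tentativa)",
        "Thu": "Qui",
    },
}

# Matches text fully enclosed in square brackets, e.g. "[call (attempt)]".
_BRACKETED = re.compile(r"^\[(.*)\]$")


def localize_system_text(text: str, lang: str) -> str:
    """Translate a fixed string and drop its enclosing square brackets.

    Parentheses inside the string are kept. If no translation is known
    for the language, the original text is returned unchanged.
    """
    stripped = text.strip()
    match = _BRACKETED.match(stripped)
    key = match.group(1) if match else stripped
    translated = TRANSLATIONS.get(lang, {}).get(key)
    return text if translated is None else translated
```

Falling back to the original text keeps untranslated chats looking exactly as they do today.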

4. Splitting Large Conversations (Chunking)

Objective: Resolve the media playback issue by splitting the exported HTML into smaller, manageable files.

This is a mandatory feature to address the media failure in long chats.

  1. User Prompt: After the participant selection step, the script must ask the user whether they want to split the conversation.

    • Prompt Text (in English):
    Split chat? [Y/n]:
    Check this option if your conversation contains more than 3.000 messages. In very long conversations, audio and video media may not work. This chat contains [total_count] messages
    

    (Note: [total_count] must use the dot (.) as the thousands separator, e.g., 16.539)

  2. Splitting Logic:

    • If the user chooses 'Yes' (Y), the chat must be split into multiple HTML files, with each part containing a maximum of 3,000 messages.
    • The splitting process must happen during the initial export/writing phase (read .txt, process, and write to divided HTML files directly), not by attempting to split a single large, already-written HTML file.
  3. HTML File Naming Convention (for Split Files):

    Chat Type     Naming Convention
    Individual    Conversa_com_[Name]_Part1.html, Conversa_com_[Name]_Part2.html, etc.
    Group         Conversa_grupo_[GroupName]_Part1.html, Conversa_grupo_[GroupName]_Part2.html, etc.
  4. Navigation Links:

    • In the generated HTML for each part, a simple, text-based, centered link must be added at the bottom to navigate to the subsequent part.
    • Link Text (Translated): "Clique aqui para ir para a próxima parte da conversa" (or its equivalent in the detected language).
    • Link Behavior: Clicking the link should automatically open the next HTML part in a new browser tab.
    • Final Part: The link should not appear in the last HTML file.
  5. Standard Export Link: The original functionality that exports a separate, media-linked HTML file (chat_media_linked.html) is not necessary when the chat is split, as the splitting function inherently solves the media issue.
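
The splitting, file naming, and navigation steps above can be sketched as follows. All function names are hypothetical, and the sketch assumes the parsed messages are already available as a sequence before any HTML is written.

```python
from html import escape
from typing import Iterator, Sequence
from urllib.parse import quote

CHUNK_SIZE = 3000  # maximum messages per HTML part, as requested above
LINK_TEXT_PT = "Clique aqui para ir para a próxima parte da conversa"


def split_messages(messages: Sequence, size: int = CHUNK_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` messages."""
    for start in range(0, len(messages), size):
        yield messages[start:start + size]


def part_filename(name: str, index: int, is_group: bool = False) -> str:
    """Build the file name for part `index` (1-based)."""
    prefix = "Conversa_grupo_" if is_group else "Conversa_com_"
    return f"{prefix}{name}_Part{index}.html"


def next_part_link(filenames: Sequence[str], index: int,
                   text: str = LINK_TEXT_PT) -> str:
    """Return the centered link to the part after `index` (1-based).

    Returns an empty string for the final part, which gets no link.
    """
    if index >= len(filenames):
        return ""
    href = quote(filenames[index])  # percent-encode spaces and accents
    return (f'<p style="text-align:center">'
            f'<a href="{href}" target="_blank">{escape(text)}</a></p>')
```

With the 16.539 messages from the example chat, `split_messages` would yield six parts: five with 3,000 messages and a last one with 1,539. Each part is written once, directly from the parsed messages, so no large HTML file has to be split afterwards.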


📝 Final Output Flow Example (Script Console)

The final console script execution flow should follow this order and include the requested new information:

Welcome to chat-export v1.0.3
----------------------------------------
Select the WhatsApp chat export ZIP file you want to convert to HTML.
Processing selected file: C:/Projetos/Conversa WhatsApp Neuziany.zip...

ZIP file is an Android export with media/attachments, 'Conversa do WhatsApp com Neuziany das Cunhãs.txt' is the chat text file.

Messages in this chat: 16.539
Neuziany das Cunhãs: 8.244
Arlin: 8.295
_____________________________
System messages: 3
Invalid messages: 0

Optional: Enter date range to filter messages
Supported formats: MM/DD/YYYY, DD.MM.YYYY, MM/DD/YY, DD.MM.YY
Leave empty to skip
From date (optional):
Until date (optional):

Found the following participants in the chat:
1. Neuziany das Cunhãs
2. arlin

Enter the number corresponding to your name: 2

Split chat? [Y/n]:
Check this option if your conversation contains more than 3.000 messages. In very long conversations, audio and video media may not work. This chat contains 16.539 messages
y

Exporting 16.539 messages.
Writing HTML files...
Extracting attachments/media...
Processing took 20.887 seconds
Written: C:\Projetos\chat-export-main\Conversa_com_Neuziany_Part1.html, C:\Projetos\chat-export-main\Conversa_com_Neuziany_Part2.html, ...
Done.
Would you like to open them in the browser? [Y/n]: n
Do you like the tool and want to buy me a coffee? [y/N]: y
