Spanish #2

Open
ghost wants to merge 12 commits into eventdata:main from Vikachubro21:Spanish

ghost commented Jun 22, 2022

No description provided.

ghost closed this Jun 22, 2022
ghost reopened this Jun 22, 2022
shreyasmeher requested a review from Copilot on September 9, 2025, 20:00
Copilot AI left a comment

Pull Request Overview

This PR adds Spanish-language pattern files for separating relevant from irrelevant content during wiki crawling and processing. It introduces three text files of Spanish keywords and patterns that are used to filter content during data processing.

  • Adds comprehensive Spanish vocabulary patterns for political, military, and social topics
  • Creates exclusion patterns for entertainment, sports, and cultural content
  • Establishes irrelevant keyword filters for commercial and recreational topics
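
As a rough illustration of how pattern files like these might be applied, here is a minimal sketch. It is not the repository's actual loader: the function names `compile_patterns` and `is_relevant` are hypothetical, and the file format is an assumption inferred from entries such as `asamblea` and `/\bestilos?\b/` (a line wrapped in slashes is treated as a regular expression, anything else as a literal keyword).

```python
import re


def compile_patterns(lines):
    # Assumption: a line wrapped in slashes, e.g. /\bestilos?\b/, is a regex;
    # any other non-empty line is a literal keyword. Matching is
    # case-insensitive in both cases.
    compiled = []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines, including a trailing one
        if len(entry) > 2 and entry.startswith("/") and entry.endswith("/"):
            compiled.append(re.compile(entry[1:-1], re.IGNORECASE))
        else:
            compiled.append(re.compile(re.escape(entry), re.IGNORECASE))
    return compiled


def is_relevant(text, relevant, exclude):
    # A text counts as relevant if no exclusion pattern matches
    # and at least one relevant pattern does.
    if any(p.search(text) for p in exclude):
        return False
    return any(p.search(text) for p in relevant)


# Hypothetical in-memory samples standing in for the pattern files.
relevant = compile_patterns(["asamblea", "ataque", "diplomático"])
exclude = compile_patterns(["álbum", r"/\bestilos?\b/"])

print(is_relevant("El ataque ocurrió durante la asamblea.", relevant, exclude))
print(is_relevant("El álbum habla de un ataque.", relevant, exclude))
```

Checking the exclusion list first means a single excluded term vetoes the text; whether the real pipeline gives exclusions that priority is not stated in this PR.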

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| wiki_relevant_spanish.txt | Contains 265 Spanish terms/patterns for identifying politically and socially relevant content |
| wiki_relevant_exclude_spanish.txt | Contains 23 Spanish terms/patterns to exclude entertainment and sports content |
| irelevant_keywords_spanish.txt | Contains 247 Spanish terms/patterns to filter out commercial, sports, and entertainment content |

Comments suppressed due to low confidence (2)

pretrain-corpora/Crawlers and Process/Patterns/wiki_relevant_exclude_spanish.txt:1

  • There is an empty line at the end of the file. Remove this trailing empty line to maintain consistency.
álbum

pretrain-corpora/Crawlers and Process/Patterns/irelevant_keywords_spanish.txt:1

  • There is an empty line at the end of the file. Remove this trailing empty line to maintain consistency.
/\bestilos?\b/
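
The trailing-empty-line cleanup suggested in the two suppressed comments could be done along these lines. This is only a sketch: `strip_trailing_blank_lines` is a hypothetical helper, and it assumes the files should end with exactly one newline after the last entry.

```python
def strip_trailing_blank_lines(text):
    # Drop empty or whitespace-only lines at the end of the text,
    # then terminate the last entry with a single newline.
    lines = text.split("\n")
    while lines and not lines[-1].strip():
        lines.pop()
    return ("\n".join(lines) + "\n") if lines else ""


cleaned = strip_trailing_blank_lines("álbum\n\n")
print(repr(cleaned))
```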

asamblea
asilo
ataque
attrocit

Copilot AI Sep 9, 2025

The word 'attrocit' appears to be misspelled. It should be 'atrocidad' (atrocity in Spanish).

Comment on lines +66 to +67
muerte
muerte

Copilot AI Sep 9, 2025

The word 'muerte' is duplicated on consecutive lines. Remove one of the duplicate entries.

Comment on lines +68 to +69
defensa
defensa

Copilot AI Sep 9, 2025

The word 'defensa' is duplicated on consecutive lines. Remove one of the duplicate entries.

Comment on lines +82 to +83
diplomático
diplomático

Copilot AI Sep 9, 2025

The word 'diplomático' is duplicated on consecutive lines. Remove one of the duplicate entries.

Comment on lines +183 to +184
político
político

Copilot AI Sep 9, 2025

The word 'político' is duplicated on consecutive lines. Remove one of the duplicate entries.

Comment on lines +220 to +221
Asuntos sociales
Asuntos sociales

Copilot AI Sep 9, 2025

The phrase 'Asuntos sociales' is duplicated on consecutive lines. Remove one of the duplicate entries.

Comment on lines +228 to +229
sospechoso
sospechoso

Copilot AI Sep 9, 2025

The word 'sospechoso' is duplicated on consecutive lines. Remove one of the duplicate entries.
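
The duplicate entries flagged in the comments above could be caught automatically before merging. The following is a minimal sketch, not part of the PR: `find_consecutive_duplicates` and `dedupe` are hypothetical helpers, the sample list stands in for the contents of a pattern file, and comparison is exact (case-sensitive), which is an assumption about how entries should be treated.

```python
def find_consecutive_duplicates(lines):
    # Return (line_number, entry) for each non-empty line that repeats
    # the entry on the line directly above it. Line numbers are 1-based.
    dupes = []
    previous = None
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if entry and entry == previous:
            dupes.append((number, entry))
        previous = entry
    return dupes


def dedupe(lines):
    # Keep the first occurrence of each non-empty entry, preserving order.
    seen = set()
    unique = []
    for raw in lines:
        entry = raw.strip()
        if entry and entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique


# Hypothetical sample mirroring the duplicates reported in this review.
sample = ["muerte", "muerte", "defensa", "defensa", "político"]
print(find_consecutive_duplicates(sample))
print(dedupe(sample))
```

Note that `dedupe` removes repeats anywhere in the list, not only consecutive ones, which is slightly broader than what the review comments ask for.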
