- Project Introduction
- Detailed Development Contents
- System Architecture
- User Flow Diagram
- Feature Specifications
- Execution and Testing Environment
WebCleanser is a web content filtering extension designed to protect users from various online threats and provide a safe browsing experience.
WebCleanser is compatible with most Chromium-based browsers (Chrome, Microsoft Edge, Brave, Arc, Naver Whale, etc.) and is developed as an open-source software project, allowing customization and feature extensions for different languages and purposes.
The advancement of internet technology has led to the proliferation of vast amounts of content online. However, this has also increased risks like phishing, hacking, and harmful content. Users unfamiliar with digital environments are especially vulnerable to these risks.
WebCleanser aims to reduce the digital divide by offering intuitive features that protect users from cyber threats, ensuring they can browse the internet safely and confidently.
- URL Filtering:
  - Detects and validates links using the Google Safe Browsing API. Suspicious or phishing websites are blocked, and users can only proceed if the site is verified as safe.
  - Users can report incorrectly blocked sites to Google to contribute to system improvements.
- Comment Filtering:
  - Analyzes comments on Naver News and YouTube using AI-based filtering.
  - Built on the KcElectra model, it classifies comments into categories such as political, sexual, depressive, and aggressive remarks. Users can customize filtering preferences.
  - A two-stage filtering system (Hazard Filter and Type Filter) improves accuracy by first identifying harmful comments and then categorizing their types.
- WebCleanser is licensed under the MIT License. Detailed license information can be found in each component folder (WebCleanser_backend, WebCleanser_extension, WebCleanser_model).
- Users are allowed to expand, customize, and integrate WebCleanser for various software development purposes.
The project is divided into three core components: Model, Backend, and Extension.
The comment filter is built on KcElectra, a Korean-language Transformer-based model.
The model was trained on 200,000 comments drawn from open datasets and crawled data, and reaches a classification accuracy of approximately 80%.
- Korean Hate Speech Dataset
- AIHub Korean Sentiment Data
- YouTube API Data
- Text Ethics Verification Dataset
- Korean Sentiment Dialog Corpus
The filtering system is structured in two stages:
- Hazard Filter: Classifies comments as either safe or harmful.
- Type Filter: Further categorizes harmful comments into specific types:
  - 0: Normal
  - 1: Political
  - 2: Sexual
  - 3: Depressive
  - 4: Aggressive
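The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `classify_comment`, `stub_hazard_filter`, and `stub_type_filter` are hypothetical names, and the keyword-based stubs merely stand in for the trained KcElectra classifiers. Only the label ids come from the list above.

```python
from typing import Callable

# Label ids as listed for the Type Filter.
TYPE_LABELS = {
    0: "Normal",
    1: "Political",
    2: "Sexual",
    3: "Depressive",
    4: "Aggressive",
}


def classify_comment(
    text: str,
    hazard_filter: Callable[[str], bool],
    type_filter: Callable[[str], int],
) -> str:
    """Run stage 1 first; run stage 2 only if stage 1 flags the comment."""
    if not hazard_filter(text):
        # Stage 1 (Hazard Filter) judged the comment safe.
        return TYPE_LABELS[0]
    # Stage 2 (Type Filter) assigns one of the label ids above.
    return TYPE_LABELS[type_filter(text)]


# Hypothetical stand-ins for the two trained models (assumption, demo only).
def stub_hazard_filter(text: str) -> bool:
    return "idiot" in text.lower()


def stub_type_filter(text: str) -> int:
    return 4  # pretend the model always predicts "Aggressive"
```

Passing the two stages in as callables keeps the control flow independent of how the models are loaded, so the same function would work with real model wrappers in place of the stubs.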
- Spring: Handles URL filtering requests using the Google Safe Browsing API and integrates with Flask for AI comment filtering.
- Flask: Hosts the trained KcElectra model and processes filtering requests.
- Developed in React, the extension serves as the user interface.
- Uses chrome-extension-boilerplate-react for streamlined development.
The system consists of the following components:
- Server: Hosted on AWS EC2 using Docker Compose.
  - Spring handles Google Safe Browsing API requests.
  - Flask processes AI-based comment filtering.
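For orientation, the shape of a Google Safe Browsing Lookup API (v4) request can be sketched as below. The project's Spring service is written in Java, so this Python snippet only illustrates the request body and how a response might be interpreted. The function names, the client id and version, and the chosen threat types are assumptions, and no network call is made.

```python
# Lookup API v4 endpoint; the JSON body is sent via POST with "?key=<API key>" appended.
SAFE_BROWSING_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_request(
    url: str,
    client_id: str = "webcleanser-example",  # hypothetical client id
    client_version: str = "0.0.1",           # hypothetical version
) -> dict:
    """Build the JSON body for a single-URL threatMatches.find request."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            # Threat types chosen for illustration; the project may use others.
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": url}],
        },
    }


def is_flagged(response_body: dict) -> bool:
    """The Lookup API returns an empty object when no threat list matches."""
    return bool(response_body.get("matches"))
```

A caller would serialize the result of `build_lookup_request` as JSON, send it to the endpoint, and block the page when `is_flagged` returns `True` for the parsed response.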
- Client: Built in React, the extension handles link validation and comment filtering.
- Phishing Detection Toggle: Enable/disable phishing site detection.
- Ignore Button: Bypass warnings to access a page.
- Report Incorrect Blocking: Report errors to Google.
- Filter Type Selection: Customize comment filtering categories.
- Show/Hide Harmful Text: Toggle visibility of harmful comments.
- View Statistics: Access statistics for URL and comment filtering.
- Sorting and Filtering: Sort URLs and comments by date and type.
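How the filter type selection and the show/hide toggle might combine can be sketched as follows. The extension itself is built in React, so this Python snippet is only an illustration of the decision logic. `FilterPreferences`, `render_comment`, and the placeholder text are hypothetical, and the assumption that all four harmful categories are filtered by default is not taken from the project.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FilterPreferences:
    # Categories the user has chosen to filter (assumed default: all four).
    blocked_types: Set[str] = field(
        default_factory=lambda: {"Political", "Sexual", "Depressive", "Aggressive"}
    )
    # Mirrors the Show/Hide Harmful Text toggle.
    show_harmful_text: bool = False


def render_comment(
    text: str,
    label: str,
    prefs: FilterPreferences,
    placeholder: str = "[filtered comment]",  # hypothetical placeholder text
) -> str:
    """Return the comment text, or a placeholder if it should be hidden."""
    if label in prefs.blocked_types and not prefs.show_harmful_text:
        return placeholder
    return text
```

For example, a user who only wants to hide aggressive comments would pass `FilterPreferences(blocked_types={"Aggressive"})`, and comments labeled with any other category would be shown unchanged.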
- Hardware: MacBook Pro 14-inch (Apple M3 Pro)
- Software: IntelliJ IDEA, Visual Studio Code, Node.js, Chrome
- Backend:
  - Clone `WebCleanser_backend`.
  - Install dependencies: `pip install -r requirements.txt`.
  - Run the Flask server: `app.py`.
  - Set up the Spring backend with JDK 17 and update `application.properties` with the Google API key.
  - Run the Spring server: `NetpuriServerApplication.java`.
- Extension:
  - Clone `WebCleanser_extension`.
  - Install dependencies: `npm install`.
  - Build the extension: `NODE_ENV=production npm run build`.
  - Load the unpacked extension in Chrome:
    - Navigate to `chrome://extensions`.
    - Enable Developer Mode and load the unpacked folder.
  - Pin WebCleanser to the Chrome toolbar and restart the browser.






