WebCleanser

Demo

Project Introduction
Detailed Development Contents
System Architecture
User Flow Diagram
Feature Specifications
Execution and Testing Environment

1. Project Introduction

WebCleanser is a web content filtering extension designed to protect users from various online threats and provide a safe browsing experience.

WebCleanser is compatible with most Chromium-based browsers (Chrome, Microsoft Edge, Brave, Arc, Naver Whale, etc.) and is developed as an open-source software project, allowing customization and feature extensions for different languages and purposes.

Background and Necessity

The advancement of internet technology has led to the proliferation of vast amounts of content online. However, this has also increased risks like phishing, hacking, and harmful content. Users unfamiliar with digital environments are especially vulnerable to these risks.

WebCleanser aims to reduce the digital divide by offering intuitive features that protect users from cyber threats, ensuring they can browse the internet safely and confidently.

Core Features and Workflow

URL Filtering:
- Validates links against the Google Safe Browsing API. Suspicious or phishing websites are blocked, and users proceed only once the site is verified as safe or they choose to bypass the warning.
- Users can report incorrectly blocked sites to Google to contribute to system improvements.
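As a rough illustration of the URL check, the sketch below builds a request body for the Safe Browsing v4 Lookup API (`threatMatches:find`) and reads the flagged URLs out of a response. Which API version the project calls is not stated here, so the endpoint, the threat types, and the `clientId`/`clientVersion` values are assumptions, and the helper names are hypothetical. No network call is made.

```python
# Assumption: the v4 Lookup API endpoint; the project may use a different version.
LOOKUP_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_body(urls, client_id="webcleanser-example", client_version="0.0.1"):
    """Build the JSON body for a threatMatches:find request (hypothetical helper)."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            # Assumed threat types; phishing is reported as SOCIAL_ENGINEERING.
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }


def flagged_urls(response):
    """Return the set of URLs listed under 'matches'; an empty reply means no match."""
    return {m["threat"]["url"] for m in response.get("matches", [])}
```

The body would be sent as a POST to `LOOKUP_ENDPOINT` with the API key as the `key` query parameter; a URL missing from `flagged_urls(...)` can be treated as safe to open.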

Comment Filtering:
- Analyzes comments on Naver News and YouTube using AI-based filtering.
- Built on the KcElectra model, it classifies comments into categories such as political, sexual, depressive, and aggressive remarks. Users can customize filtering preferences.
- A two-stage filtering system (Hazard Filter and Type Filter) improves accuracy by first identifying harmful comments and then categorizing their types.

Open-Source Software

WebCleanser is licensed under the MIT License. Detailed license information can be found in each component folder (WebCleanser_backend, WebCleanser_extension, WebCleanser_model).
Users may extend, customize, and integrate WebCleanser into their own software.

2. Detailed Development Contents

The project is divided into three core components: Model, Backend, and Extension.

2.1. Model

Comment filtering is built on KcElectra, a Transformer-based language model for Korean, which we trained for comment classification.

2.1.1 Dataset Overview

The model was trained on 200,000 comments drawn from open datasets and crawled data, and reaches a classification accuracy of approximately 80%. The data sources were:

Korean Hate Speech Dataset
AIHub Korean Sentiment Data
YouTube API Data
Text Ethics Verification Dataset
Korean Sentiment Dialog Corpus

2.1.2 Model Training

The filtering system is structured in two stages:

Hazard Filter: Classifies comments as either safe or harmful.
Type Filter: Further categorizes harmful comments into specific types:
- 0: Normal
- 1: Political
- 2: Sexual
- 3: Depressive
- 4: Aggressive
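The two-stage flow can be sketched as a small function that runs the Hazard Filter first and calls the Type Filter only for harmful text. This is a minimal sketch, not the project's code: the two filters are passed in as plain callables, the keyword-based stand-ins are hypothetical toys in place of the trained models, and mapping safe comments to label 0 is an assumption.

```python
from typing import Callable

# Label ids as listed above.
TYPE_LABELS = {0: "Normal", 1: "Political", 2: "Sexual", 3: "Depressive", 4: "Aggressive"}


def classify_comment(
    text: str,
    hazard_filter: Callable[[str], bool],
    type_filter: Callable[[str], int],
) -> str:
    """Run the Hazard Filter first; call the Type Filter only for harmful text."""
    if not hazard_filter(text):
        # Assumption: comments the Hazard Filter passes are reported as "Normal".
        return TYPE_LABELS[0]
    return TYPE_LABELS[type_filter(text)]


# Toy stand-ins for the two trained models (hypothetical keyword rules).
def toy_hazard_filter(text: str) -> bool:
    return "hate" in text.lower()


def toy_type_filter(text: str) -> int:
    return 4  # always "Aggressive" in this toy example
```

Keeping the stages separate means the Type Filter never runs on comments the first stage has already cleared.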

2.2. Backend

Spring: Handles URL filtering requests using the Google Safe Browsing API and integrates with Flask for AI comment filtering.
Flask: Hosts the trained KcElectra model and processes filtering requests.
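To illustrate the hand-off between the two servers, here is a framework-free sketch of what a comment-filtering handler might do with a request body. The JSON request and response shapes, the function name, and the `classify` callable are all assumptions made for this example; the actual Flask route and payload format are not described here.

```python
import json


def handle_filter_request(body: str, classify) -> str:
    """Classify every comment in a JSON request body and return a JSON reply.

    Assumed request shape:  {"comments": ["...", "..."]}
    Assumed response shape: {"results": [{"text": "...", "label": "..."}]}
    """
    comments = json.loads(body).get("comments", [])
    results = [{"text": c, "label": classify(c)} for c in comments]
    # ensure_ascii=False keeps Korean text readable in the reply.
    return json.dumps({"results": results}, ensure_ascii=False)
```

In a Flask app, a function like this could sit behind a POST route, with `classify` wrapping the trained model.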

2.3. Extension

Developed in React, the extension serves as the user interface.
It is based on chrome-extension-boilerplate-react for streamlined development.

3. System Architecture

The system consists of the following components:

Server: Hosted on AWS EC2 using Docker Compose.
- Spring handles Google Safe Browsing API requests.
- Flask processes AI-based comment filtering.
Client: Built in React, the extension handles link validation and comment filtering.

4. User Flow Diagram

5. Feature Specifications

URL Filtering

Phishing Detection Toggle: Enable/disable phishing site detection.
Ignore Button: Bypass warnings to access a page.
Report Incorrect Blocking: Report errors to Google.

Comment Filtering

Filter Type Selection: Customize comment filtering categories.
Show/Hide Harmful Text: Toggle visibility of harmful comments.
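One way the two comment settings could combine is sketched below. The `FilterSettings` class, its field names, and the rule that the show toggle overrides the type selection are assumptions for illustration, not the extension's actual state model.

```python
from dataclasses import dataclass, field


@dataclass
class FilterSettings:
    # Categories the user wants filtered (names follow the Type Filter labels).
    blocked_types: set = field(
        default_factory=lambda: {"Political", "Sexual", "Depressive", "Aggressive"}
    )
    # When True, harmful comments stay visible instead of being hidden.
    show_harmful_text: bool = False


def should_hide(label: str, settings: FilterSettings) -> bool:
    """Hide a comment only if its type is blocked and the show toggle is off."""
    return label in settings.blocked_types and not settings.show_harmful_text
```

For example, a user who blocks only "Sexual" comments would still see comments labeled "Political".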

Statistics

View Statistics: Access statistics for URL and comment filtering.
Sorting and Filtering: Sort URLs and comments by date and type.
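The sorting and filtering behavior might look like the following sketch. The `FilterRecord` fields and the `select_records` helper are hypothetical; how the extension actually stores its statistics is not specified here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FilterRecord:
    kind: str          # "url" or "comment" (hypothetical field)
    label: str         # e.g. "Phishing" or "Aggressive"
    filtered_on: date  # day the item was filtered


def select_records(records, kind: Optional[str] = None, newest_first: bool = True):
    """Keep records of one kind (or all kinds) and sort them by date."""
    chosen = [r for r in records if kind is None or r.kind == kind]
    return sorted(chosen, key=lambda r: r.filtered_on, reverse=newest_first)
```

Calling `select_records(records, kind="url")` would list only blocked URLs, most recent first.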

6. Execution and Testing Environment

6.1 Testing Environment

Hardware: MacBook Pro 14-inch (Apple M3 Pro)
Software: IntelliJ IDEA, Visual Studio Code, Node.js, Chrome

6.2 Execution Steps

6.2.1 Backend

Clone WebCleanser_backend.
Install dependencies: pip install -r requirements.txt.
Run the Flask server by executing app.py.
Set up the Spring backend with JDK 17 and update application.properties with the Google API key.
Run the Spring server: NetpuriServerApplication.java.

6.2.2 Frontend (Extension)

Clone WebCleanser_extension.
Install dependencies: npm install.
Build the extension: NODE_ENV=production npm run build.
Load the unpacked extension in Chrome:
- Navigate to chrome://extensions.
- Enable Developer Mode and load the unpacked folder.
Pin WebCleanser to the Chrome toolbar and restart the browser.
