Commit d910006

Add JSON error handling
1 parent bc70823 commit d910006

16 files changed: 3920 additions & 793 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ logs/
 data/cache/
 .php-cs-fixer.cache
 .phpstan/
+var/htmlpurifier/

ENV_CONFIGURATION.md

Lines changed: 7 additions & 0 deletions
@@ -78,6 +78,13 @@ RATE_LIMIT_API_WINDOW=60 # Seconds (1 minute)
 CACHE_ENABLED=1 # 0 = disabled, 1 = enabled
 CACHE_DRIVER=file # file or redis
 CACHE_TTL=300 # Seconds (5 minutes)
+```
+
+## HTML Sanitization
+
+```bash
+SANITIZATION_ENABLED=1 # 0 = disabled, 1 = enabled (default: enabled)
+```
 CACHE_PATH=data/cache # File cache only
 ```

HTML_SANITIZATION.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
# HTML Sanitization

VibeReader implements comprehensive HTML sanitization to prevent XSS (Cross-Site Scripting) attacks from malicious feed content.

## Overview

Feed content from RSS/Atom/JSON feeds can contain HTML, which could potentially include malicious scripts. This implementation sanitizes all feed content at multiple layers:

1. **Server-side sanitization** - HTMLPurifier sanitizes content before storing in database
2. **Client-side sanitization** - DOMPurify provides defense-in-depth when rendering content

## Server-Side Sanitization (HTMLPurifier)

### Implementation

- **Library**: `ezyang/htmlpurifier` (v4.16+)
- **Location**: `src/Utils/HtmlSanitizer.php`
- **Integration**: Automatically applied in `FeedParser` when parsing feeds

### What Gets Sanitized

- **Feed titles** - Plain text (HTML entities escaped)
- **Feed descriptions** - HTML sanitized
- **Item titles** - Plain text (HTML entities escaped)
- **Item content** - HTML sanitized (preserves formatting)
- **Item summaries** - HTML sanitized
- **Item authors** - Plain text (HTML entities escaped)

### Allowed HTML Tags

The sanitizer allows common formatting tags used in feed content:
- Text formatting: `p`, `br`, `strong`, `b`, `em`, `i`, `u`
- Links: `a[href|title|target]`
- Lists: `ul`, `ol`, `li`
- Code: `pre`, `code`
- Images: `img[src|alt|width|height]`
- Headings: `h1`, `h2`, `h3`, `h4`, `h5`, `h6`
- Structure: `div`, `span[style]`, `blockquote`
- Tables: `table`, `thead`, `tbody`, `tr`, `td`, `th`

### Allowed Attributes

- Links: `href`, `title`, `target`, `rel`
- Images: `src`, `alt`, `width`, `height`
- Styling: `style` (limited CSS properties)
  - Allowed CSS properties: `color`, `background-color`, `font-size`, `font-weight`, `font-style`, `text-align`, `text-decoration`, `margin`, `padding`, `border`

### Configuration

Sanitization can be disabled via environment variable:

```bash
SANITIZATION_ENABLED=0 # Disable sanitization (not recommended)
```

**Default**: Enabled (`SANITIZATION_ENABLED=1`)
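
As a rough illustration of the default-enabled behaviour described above, the sketch below interprets a `SANITIZATION_ENABLED`-style flag in JavaScript. The helper name `isSanitizationEnabled` is hypothetical; the project's real parsing lives in PHP (`src/Config.php`), is not shown in this commit view, and may differ. Treating every value other than an explicit `0` as enabled is an assumption, chosen so that a typo cannot silently switch sanitization off.

```javascript
// Hypothetical helper: interpret a SANITIZATION_ENABLED-style flag.
// Assumption: only an explicit "0" disables sanitization; anything else
// (unset, empty, "1", a typo) keeps it enabled.
function isSanitizationEnabled(env) {
    const raw = env.SANITIZATION_ENABLED;
    if (raw === undefined) {
        return true; // default: enabled
    }
    return String(raw).trim() !== '0';
}

console.log(isSanitizationEnabled({}));                              // true
console.log(isSanitizationEnabled({ SANITIZATION_ENABLED: '0' }));   // false
console.log(isSanitizationEnabled({ SANITIZATION_ENABLED: ' 1 ' })); // true
```

In Node.js the same helper could be called with `process.env`.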

### Cache

HTMLPurifier uses a cache directory at `var/htmlpurifier/` to improve performance. This directory is automatically created and is excluded from Git.

## Client-Side Sanitization (DOMPurify)

### Implementation

- **Library**: DOMPurify v3.3.1 (via CDN)
- **Location**: Loaded in `views/dashboard.php`
- **Integration**: Applied in `assets/js/modules/items.js` when rendering item content

### Defense in Depth

Even though content is sanitized server-side, DOMPurify provides an additional layer of protection:
- Protects against any content that might bypass server-side sanitization
- Handles edge cases in browser rendering
- Provides real-time sanitization when content is displayed

### Configuration

DOMPurify uses the same allowed tags and attributes as the server-side sanitizer for consistency.
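
Because the two allowlists are maintained in separate files, they can drift apart over time. The sketch below shows one way such drift might be detected; `findAllowlistDrift` is a hypothetical helper and not part of this commit. The two attribute lists are copied from this document.

```javascript
// Hypothetical drift check between a server-side and a client-side allowlist.
function findAllowlistDrift(serverList, clientList) {
    const server = new Set(serverList);
    const client = new Set(clientList);
    return {
        missingOnClient: [...server].filter((name) => !client.has(name)).sort(),
        missingOnServer: [...client].filter((name) => !server.has(name)).sort(),
    };
}

// Attribute lists as documented for HTMLPurifier and DOMPurify respectively.
const serverAttrs = ['href', 'title', 'target', 'rel', 'src', 'alt', 'width', 'height', 'style'];
const clientAttrs = ['href', 'title', 'target', 'src', 'alt', 'width', 'height', 'style', 'rel'];

const drift = findAllowlistDrift(serverAttrs, clientAttrs);
console.log(drift.missingOnClient.length, drift.missingOnServer.length); // 0 0
```

A check like this could run in a unit test so that a tag or attribute added on one side only is flagged early.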

## Usage

### Server-Side

```php
use PhpRss\Utils\HtmlSanitizer;

// Sanitize HTML content (preserves formatting)
$cleanHtml = HtmlSanitizer::sanitize($feedContent);

// Sanitize plain text (escapes HTML entities)
$cleanText = HtmlSanitizer::sanitizeText($feedTitle);
```
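
To make "escapes HTML entities" concrete, here is a minimal JavaScript sketch of plain-text escaping. It is an analogue for illustration only: the body of `HtmlSanitizer::sanitizeText` is not shown in this commit view, so the exact set of escaped characters below is an assumption, and `escapeHtml` is a hypothetical name.

```javascript
// Characters that are significant in HTML text and attribute values.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Hypothetical helper: turn arbitrary text into HTML-safe text.
function escapeHtml(text) {
    return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<b>Tom & "Jerry"</b>'));
// &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
```

With escaping of this kind, a title containing markup is displayed literally instead of being interpreted by the browser.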

### Client-Side

```javascript
// Sanitize HTML before setting innerHTML
const sanitized = DOMPurify.sanitize(content, {
    ALLOWED_TAGS: ['p', 'br', 'strong', 'b', 'em', 'i', 'u', 'a', 'ul', 'ol', 'li', 'blockquote', 'pre', 'code', 'img', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'span', 'table', 'thead', 'tbody', 'tr', 'td', 'th'],
    ALLOWED_ATTR: ['href', 'title', 'target', 'src', 'alt', 'width', 'height', 'style', 'rel'],
    ALLOW_DATA_ATTR: false
});

element.innerHTML = sanitized;
```
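
The `items.js` change in this commit falls back to the raw `content` when `DOMPurify` is undefined, for example if the CDN script fails to load. A fail-closed alternative is sketched below, assuming that escaped text is an acceptable degraded display; `sanitizeOrEscape` and `escapeHtml` are hypothetical names, and the stub purifier exists only so the sketch runs outside a browser.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
    const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
    return String(text).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical wrapper: use the purifier when present, otherwise escape
// everything instead of passing raw HTML through.
function sanitizeOrEscape(content, purifier, config) {
    if (purifier && typeof purifier.sanitize === 'function') {
        return purifier.sanitize(content, config);
    }
    return escapeHtml(content);
}

// Stand-in for DOMPurify; it is NOT a sanitizer, it only marks the call.
const stubPurifier = { sanitize: (html) => `[purified]${html}` };

console.log(sanitizeOrEscape('<p>Hi</p>', stubPurifier, {}));
// [purified]<p>Hi</p>
console.log(sanitizeOrEscape('<img src=x onerror=alert(1)>', null, {}));
// &lt;img src=x onerror=alert(1)&gt;
```

In the browser the second argument might be `typeof DOMPurify !== 'undefined' ? DOMPurify : null`, so a missing library results in visible markup as text, not unsanitized HTML.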

## Security Benefits

1. **Prevents Stored XSS** - Malicious scripts in feed content are removed before storage
2. **Prevents Reflected XSS** - Content is sanitized before being sent to the browser
3. **Defense in Depth** - Multiple layers of sanitization (server + client)
4. **Preserves Formatting** - Legitimate HTML formatting is maintained
5. **Configurable** - Can be disabled if needed (though not recommended)

## Performance

- **HTMLPurifier**: Uses caching to improve performance on repeated sanitization
- **DOMPurify**: Lightweight client-side library with minimal performance impact
- **Caching**: HTMLPurifier cache stored in `var/htmlpurifier/` (excluded from Git)

## Troubleshooting

### Content Appears Stripped

If legitimate content is being removed:
1. Check HTMLPurifier logs for warnings
2. Verify the content uses allowed tags/attributes
3. Review `src/Utils/HtmlSanitizer.php` configuration

### Sanitization Not Working

1. Verify `SANITIZATION_ENABLED=1` in environment
2. Check that HTMLPurifier is installed: `composer show ezyang/htmlpurifier`
3. Verify DOMPurify is loaded (check browser console)
4. Check that `var/htmlpurifier/` directory is writable

### Disabling Sanitization

**Not Recommended** - Only disable for debugging:

```bash
SANITIZATION_ENABLED=0
```

This will bypass server-side sanitization. Client-side DOMPurify will still sanitize content, provided the library has loaded; `assets/js/modules/items.js` falls back to the unsanitized content otherwise.

## Files Modified

- `src/Utils/HtmlSanitizer.php` (new) - HTML sanitization utility
- `src/FeedParser.php` - Integrated sanitization into all parsing methods
- `src/Config.php` - Added sanitization configuration
- `assets/js/modules/items.js` - Added DOMPurify client-side sanitization
- `views/dashboard.php` - Added DOMPurify CDN script
- `composer.json` - Added HTMLPurifier dependency
- `ENV_CONFIGURATION.md` - Added sanitization configuration documentation
- `.gitignore` - Added HTMLPurifier cache directory

## Testing

To test sanitization:

1. **Test with malicious content**:
   ```php
   $malicious = '<script>alert("XSS")</script><p>Safe content</p>';
   $sanitized = HtmlSanitizer::sanitize($malicious);
   // Result: '<p>Safe content</p>' (script removed)
   ```

2. **Test with legitimate HTML**:
   ```php
   $legitimate = '<p>This is <strong>bold</strong> text with a <a href="https://example.com">link</a>.</p>';
   $sanitized = HtmlSanitizer::sanitize($legitimate);
   // Result: Same content (preserved)
   ```

## References

- [HTMLPurifier Documentation](https://htmlpurifier.org/)
- [DOMPurify Documentation](https://github.com/cure53/DOMPurify)
- [OWASP XSS Prevention](https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html)

README.md

Lines changed: 30 additions & 0 deletions
@@ -229,6 +229,36 @@ The project uses static analysis and code style tools:

 See [CODE_QUALITY.md](CODE_QUALITY.md) for usage instructions.

+## Security
+
+VibeReader implements comprehensive security measures:
+
+- **HTML Sanitization** - All feed content is sanitized to prevent XSS attacks
+  - Server-side: HTMLPurifier sanitizes content before storage
+  - Client-side: DOMPurify provides defense-in-depth when rendering
+  - See [HTML_SANITIZATION.md](HTML_SANITIZATION.md) for details
+- **CSRF Protection** - All state-changing operations require CSRF tokens
+- **SSRF Protection** - Feed URLs are validated to prevent internal network access
+- **Rate Limiting** - Login and API endpoints are rate-limited
+- **Secure Sessions** - HttpOnly, Secure, SameSite cookies
+- **Input Validation** - Comprehensive server-side validation
+- **SQL Injection Prevention** - All queries use prepared statements
+
+See [SECURITY_AUDIT.md](SECURITY_AUDIT.md) for complete security details.
+
+## Security
+
+VibeReader implements comprehensive security measures:
+
+- **HTML Sanitization** - Server-side (HTMLPurifier) and client-side (DOMPurify) sanitization to prevent XSS attacks
+- **CSRF Protection** - All state-changing operations protected
+- **SSRF Protection** - Feed URL validation prevents access to internal IPs
+- **Rate Limiting** - Prevents brute force attacks
+- **Secure Sessions** - HttpOnly, Secure, SameSite cookies
+- **Input Validation** - Comprehensive validation on all inputs
+
+See [SECURITY_AUDIT.md](SECURITY_AUDIT.md) for detailed security information and [HTML_SANITIZATION.md](HTML_SANITIZATION.md) for sanitization details.
+
 ## Future Enhancements

 - Support for MySQL database (currently uses PostgreSQL in Docker)

assets/js/modules/items.js

Lines changed: 10 additions & 1 deletion
@@ -241,12 +241,21 @@ function renderItemContent(item) {
         </div>
     ` : `<div class="item-content-meta">${metaContent}</div>`;

+    // Sanitize HTML content with DOMPurify if available (defense in depth)
+    const sanitizedContent = (typeof DOMPurify !== 'undefined')
+        ? DOMPurify.sanitize(content, {
+            ALLOWED_TAGS: ['p', 'br', 'strong', 'b', 'em', 'i', 'u', 'a', 'ul', 'ol', 'li', 'blockquote', 'pre', 'code', 'img', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'span', 'table', 'thead', 'tbody', 'tr', 'td', 'th'],
+            ALLOWED_ATTR: ['href', 'title', 'target', 'src', 'alt', 'width', 'height', 'style', 'rel'],
+            ALLOW_DATA_ATTR: false
+        })
+        : content; // Fallback if DOMPurify not loaded
+
     itemContent.innerHTML = `
         <div class="item-content-body">
             ${titleRow}
             ${metaRow}
             <div class="item-content-text">
-                ${content}
+                ${sanitizedContent}
             </div>
         </div>
     `;

composer.json

Lines changed: 2 additions & 1 deletion
@@ -11,7 +11,8 @@
         "ext-json": "*",
         "ext-simplexml": "*",
         "ext-libxml": "*",
-        "monolog/monolog": "^3.0"
+        "monolog/monolog": "^3.0",
+        "ezyang/htmlpurifier": "^4.16"
     },
     "require-dev": {
         "phpunit/phpunit": "^10.0",
