A simple, fast API to extract clean article content from web pages using Mozilla's Readability library. Perfect for automation workflows, content aggregation, and data processing pipelines.
- 🚀 Fast & Lightweight - Serverless deployment on Vercel
- 📰 Clean Content Extraction - Uses Mozilla Readability for high-quality results
- 🔗 Simple API - Single endpoint with URL parameter
- 🛠️ Automation Ready - Perfect for n8n workflows and other automation tools
- 📱 Cross-Platform - Works with any web page that contains article content
GET /api/parse?url={article_url}
| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The URL of the article to parse |
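Because the article address is itself passed as a query parameter, it is safest to percent-encode it, especially when it carries its own query string. The following is a minimal sketch using only the Python standard library; `API_BASE` is the placeholder deployment URL used throughout this README, and `build_parse_url` is a hypothetical helper name, not part of this project.

```python
from urllib.parse import urlencode

# Placeholder: replace with the base URL of your own deployment.
API_BASE = "https://your-deployment.vercel.app/api/parse"


def build_parse_url(article_url: str, api_base: str = API_BASE) -> str:
    """Return the GET URL for the parse endpoint with `url` percent-encoded."""
    # urlencode escapes ':', '/', '?', '&' and '=' in the value, so the
    # article's own query string is not mistaken for parameters of the API call.
    return f"{api_base}?{urlencode({'url': article_url})}"
```

For example, `build_parse_url("https://example.com/article?id=42&lang=en")` keeps `id` and `lang` inside the encoded `url` value instead of sending them to the API as separate parameters.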
Success response:

{
  "title": "Article Title",
  "content": "Clean article content without ads, navigation, or other clutter..."
}

Error response:

{
  "error": "Error description",
  "detail": "Detailed error message"
}

# Test with a sample article
curl "https://your-deployment.vercel.app/api/parse?url=https://example.com/article"
# Test with a real article (replace with your deployment URL)
curl "https://your-deployment.vercel.app/api/parse?url=https://techcrunch.com/2024/01/15/sample-article"

const response = await fetch('https://your-deployment.vercel.app/api/parse?url=https://example.com/article');
const data = await response.json();
console.log(data.title, data.content);

import requests
url = "https://your-deployment.vercel.app/api/parse"
params = {"url": "https://example.com/article"}
response = requests.get(url, params=params)
data = response.json()
print(data["title"], data["content"])

Option 1: GitHub Integration (Recommended)
- Fork this repository
- Connect your GitHub account to Vercel
- Import this repository in Vercel
- Deploy automatically
Option 2: CLI Deployment
- Install Vercel CLI: `npm i -g vercel`
- Run: `npm run vercel:deploy`
- Follow the prompts to deploy
# Install dependencies
npm install
# Install Vercel CLI
npm i -g vercel
# Start development server
npm run vercel:dev
# Test locally
curl "http://localhost:3000/api/parse?url=https://example.com/article"

- Content Aggregation - Extract articles for newsletters or content curation
- n8n Workflows - Integrate with automation workflows for content processing
- Data Analysis - Clean article text for sentiment analysis or NLP tasks
- RSS Enhancement - Get full article content from RSS feed links
- Research Tools - Extract clean text from academic articles or blog posts
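As an illustration of the RSS use case above, the sketch below collects the `<link>` of each `<item>` in an RSS 2.0 feed and turns it into a request URL for the parse endpoint. It assumes a plain RSS 2.0 document without XML namespaces (Atom feeds would need different handling). `API_BASE`, `rss_item_links`, `parse_requests_for_feed`, and `SAMPLE_FEED` are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder: replace with the base URL of your own deployment.
API_BASE = "https://your-deployment.vercel.app/api/parse"


def rss_item_links(rss_xml: str) -> list[str]:
    """Return the non-empty <link> values of all <item> elements, in feed order."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


def parse_requests_for_feed(rss_xml: str, api_base: str = API_BASE) -> list[str]:
    """Build one percent-encoded parse-endpoint URL per feed item."""
    return [f"{api_base}?{urlencode({'url': link})}" for link in rss_item_links(rss_xml)]


# Made-up feed used only to demonstrate the helpers.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First</title><link>https://example.com/a</link></item>
<item><title>Second</title><link>https://example.com/b?ref=rss</link></item>
</channel></rss>"""
```

Each URL returned by `parse_requests_for_feed(SAMPLE_FEED)` can then be fetched with any HTTP client, such as `requests.get` as in the Python example earlier in this README.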
{
  "method": "GET",
  "url": "https://your-deployment.vercel.app/api/parse",
  "qs": {
    "url": "{{$node[\"Previous Node\"].json[\"article_url\"]}}"
  }
}

- Runtime: Node.js 18+
- Dependencies:
  - `@mozilla/readability` - Mozilla's article extraction library
  - `jsdom` - DOM implementation for Node.js
  - `node-fetch` - HTTP client for fetching web pages
- Deployment: Vercel Serverless Functions
- Response Time: Typically 1-3 seconds depending on article size
- Only works with publicly accessible URLs
- Some websites may block automated requests
- JavaScript-heavy sites might not render properly
- Rate limiting may apply based on your Vercel plan
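Given these limitations, a client will usually want to distinguish the two documented response shapes (`title`/`content` versus `error`/`detail`) and to notice when no article text came back. The sketch below works on the response body only, because this README does not state which HTTP status codes the API returns. `ParseResult` and `interpret_response` are hypothetical names, and the "Invalid JSON", "Unexpected response shape", and "Empty content" labels are client-side conventions invented for this example, not messages produced by the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseResult:
    ok: bool
    title: Optional[str] = None
    content: Optional[str] = None
    error: Optional[str] = None
    detail: Optional[str] = None


def interpret_response(body_text: str) -> ParseResult:
    """Map a raw response body onto the success or error shape shown in this README."""
    try:
        data = json.loads(body_text)
    except json.JSONDecodeError as exc:
        # For example, an HTML block page returned in place of JSON.
        return ParseResult(ok=False, error="Invalid JSON", detail=str(exc))
    if not isinstance(data, dict):
        return ParseResult(ok=False, error="Unexpected response shape")
    if "error" in data:
        return ParseResult(ok=False, error=data.get("error"), detail=data.get("detail"))
    if not data.get("content"):
        # Client-side check: the page may not have contained article text.
        return ParseResult(ok=False, error="Empty content",
                           detail="No article text was extracted")
    return ParseResult(ok=True, title=data.get("title"), content=data["content"])
```

With the `requests` example shown earlier, this could be called as `interpret_response(response.text)`, checking `result.ok` before using `result.content`.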
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - feel free to use this in your projects!
If you encounter issues or have questions:
- Open an issue on GitHub
- Check that the target URL is publicly accessible
- Verify the URL contains article content (not just navigation pages)
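The first check above can be partly automated before calling the API. The following is a rough sketch, assuming that "publicly accessible" at least rules out non-HTTP schemes, `localhost`, `.local` names, and private or loopback IP addresses. It is a syntactic check only: it does not resolve DNS names or fetch the page, so a `True` result is not a guarantee. `looks_publicly_reachable` is a hypothetical helper name.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_publicly_reachable(url: str) -> bool:
    """Heuristic pre-flight check; performs no network access."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed input such as an unterminated IPv6 literal.
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith((".localhost", ".local")):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a DNS name (not resolved here).
        return True
    # Rejects private, loopback, link-local and other non-global addresses.
    return addr.is_global
```

A URL that fails this check, such as `http://192.168.1.10/post`, cannot be reached by a hosted deployment, so it is worth fixing before opening an issue.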