Yosua1011/article-parser-api

Article Parser API

A simple, fast API to extract clean article content from web pages using Mozilla's Readability library. Perfect for automation workflows, content aggregation, and data processing pipelines.

Features

  • 🚀 Fast & Lightweight - Serverless deployment on Vercel
  • 📰 Clean Content Extraction - Uses Mozilla Readability for high-quality results
  • 🔗 Simple API - Single endpoint with URL parameter
  • 🛠️ Automation Ready - Perfect for n8n workflows and other automation tools
  • 📱 Cross-Platform - Works with most publicly accessible web pages that contain article content

API Usage

Endpoint

GET /api/parse?url={article_url}

Parameters

| Parameter | Type   | Required | Description                     |
|-----------|--------|----------|---------------------------------|
| url       | string | Yes      | The URL of the article to parse |
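
Because the article URL travels inside a query string, it should be percent-encoded, otherwise an article URL with its own `?` or `&` may be cut short. A minimal sketch of building the request URL follows; `buildParseUrl` is a hypothetical helper name, not part of this repository.

```javascript
// Hypothetical helper (not part of this repository): builds the request URL
// for GET /api/parse and percent-encodes the article URL.
function buildParseUrl(deploymentBase, articleUrl) {
  const endpoint = new URL('/api/parse', deploymentBase);
  // searchParams.set() encodes reserved characters such as ":", "/", "?" and "&"
  endpoint.searchParams.set('url', articleUrl);
  return endpoint.toString();
}

console.log(
  buildParseUrl(
    'https://your-deployment.vercel.app',
    'https://example.com/article?id=1&ref=rss'
  )
);
```

Decoding the `url` parameter on the receiving side yields the original article URL, including its own query string.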

Response Format

{
  "title": "Article Title",
  "content": "Clean article content without ads, navigation, or other clutter..."
}

Error Response

{
  "error": "Error description",
  "detail": "Detailed error message"
}
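
A client can tell the two response shapes apart by checking for the `error` field. The sketch below assumes only the fields documented above. `unwrapParseResult` is a hypothetical name, and the HTTP status code is deliberately not consulted because this README does not specify which codes the API returns.

```javascript
// Hypothetical helper: turns a decoded JSON body into an article object,
// or throws if the body matches the documented error shape.
function unwrapParseResult(body) {
  if (body && typeof body.error === 'string') {
    const detail = body.detail ? `: ${body.detail}` : '';
    throw new Error(`${body.error}${detail}`);
  }
  if (!body || typeof body.content !== 'string') {
    // Neither documented shape matched
    throw new Error('Unexpected response shape');
  }
  return { title: body.title, content: body.content };
}
```

A caller would typically pass it the result of `await response.json()` and wrap the call in `try`/`catch`.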

Testing the API

Using cURL

# Test with a sample article (-G with --data-urlencode percent-encodes the article URL)
curl -G "https://your-deployment.vercel.app/api/parse" --data-urlencode "url=https://example.com/article"

# Test with a real article (replace the deployment URL and the article URL with your own)
curl -G "https://your-deployment.vercel.app/api/parse" --data-urlencode "url=https://techcrunch.com/2024/01/15/sample-article"

Using JavaScript/Node.js

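// Requires Node.js 18+ (global fetch); run inside an ES module or an async function
const articleUrl = encodeURIComponent('https://example.com/article');
const response = await fetch(`https://your-deployment.vercel.app/api/parse?url=${articleUrl}`);
const data = await response.json();
if (data.error) throw new Error(`${data.error}: ${data.detail}`);
console.log(data.title, data.content);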

Using Python

import requests

url = "https://your-deployment.vercel.app/api/parse"
params = {"url": "https://example.com/article"}
response = requests.get(url, params=params)
data = response.json()
print(data["title"], data["content"])

Deployment

Deploy to Vercel

Option 1: GitHub Integration (Recommended)

  1. Fork this repository
  2. Connect your GitHub account to Vercel
  3. Import this repository in Vercel
  4. Deploy automatically

Option 2: CLI Deployment

  1. Install Vercel CLI: npm i -g vercel
  2. Run: npm run vercel:deploy
  3. Follow the prompts to deploy

Local Development

# Install dependencies
npm install

# Install Vercel CLI
npm i -g vercel

# Start development server
npm run vercel:dev

# Test locally (--data-urlencode percent-encodes the article URL)
curl -G "http://localhost:3000/api/parse" --data-urlencode "url=https://example.com/article"

Use Cases

  • Content Aggregation - Extract articles for newsletters or content curation
  • n8n Workflows - Integrate with automation workflows for content processing
  • Data Analysis - Clean article text for sentiment analysis or NLP tasks
  • RSS Enhancement - Get full article content from RSS feed links
  • Research Tools - Extract clean text from academic articles or blog posts
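
As an illustration of the RSS enhancement use case, the sketch below walks a list of feed items and attaches the parsed content to each. The `link` property, the `fullContent` and `parseError` field names, and the `enrichFeedItems` helper are all assumptions made for this example. `fetchImpl` is injectable so the function can be exercised without network access.

```javascript
// Hypothetical helper: requests full content for each feed item, one at a time,
// so a single failing article does not abort the whole batch.
async function enrichFeedItems(items, parseEndpoint, fetchImpl = fetch) {
  const enriched = [];
  for (const item of items) {
    const requestUrl = `${parseEndpoint}?url=${encodeURIComponent(item.link)}`;
    try {
      const response = await fetchImpl(requestUrl);
      const body = await response.json();
      enriched.push({
        ...item,
        fullContent: body.content ?? null,
        parseError: body.error ?? null,
      });
    } catch (err) {
      // Network failures and invalid JSON end up here
      enriched.push({ ...item, fullContent: null, parseError: String(err.message || err) });
    }
  }
  return enriched;
}
```

Requests are issued sequentially on purpose: it keeps the load on the deployment low, which matters if your plan applies rate limits.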

Integration Examples

n8n Workflow Node

{
  "method": "GET",
  "url": "https://your-deployment.vercel.app/api/parse",
  "qs": {
    "url": "{{$node[\"Previous Node\"].json[\"article_url\"]}}"
  }
}

Technical Details

  • Runtime: Node.js 18+
  • Dependencies:
    • @mozilla/readability - Mozilla's article extraction library
    • jsdom - DOM implementation for Node.js
    • node-fetch - HTTP client for fetching web pages
  • Deployment: Vercel Serverless Functions
  • Response Time: Typically 1-3 seconds depending on article size
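
To show how these pieces could fit together, here is a rough sketch of a serverless handler. It is not the repository's actual source: `createParseHandler`, `fetchHtml`, and `extractArticle` are hypothetical names, and the status codes and error strings are assumptions. Fetching and extraction are injected so the control flow can be read and tested on its own.

```javascript
// Hypothetical factory, not taken from this repository.
// fetchHtml(url) -> Promise<string>; extractArticle(html, url) -> { title, content } or null.
function createParseHandler({ fetchHtml, extractArticle }) {
  return async function handler(req, res) {
    const { url } = req.query;
    if (!url) {
      // Assumed status code; the README only documents the body shape
      return res.status(400).json({ error: 'Missing url parameter', detail: 'Pass ?url={article_url}' });
    }
    try {
      const html = await fetchHtml(url);
      const article = extractArticle(html, url);
      if (!article) {
        return res.status(422).json({ error: 'No article found', detail: `Could not extract content from ${url}` });
      }
      return res.status(200).json({ title: article.title, content: article.content });
    } catch (err) {
      return res.status(500).json({ error: 'Failed to parse article', detail: err.message });
    }
  };
}
```

With the listed dependencies, `fetchHtml` could wrap `node-fetch` and return the response text, and `extractArticle` could be `(html, url) => new Readability(new JSDOM(html, { url }).window.document).parse()`, which returns `null` when Readability finds no article.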

Limitations

  • Only works with publicly accessible URLs
  • Some websites may block automated requests
  • JavaScript-heavy sites that render their content client-side may return incomplete or empty results
  • Rate limiting may apply based on your Vercel plan
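
Some of these failures can be caught before a request is sent. The sketch below is a rough client-side pre-check for the first limitation. `isLikelyPublicHttpUrl` is a hypothetical helper, and it only rules out obviously non-public targets (non-HTTP schemes, localhost, private IPv4 ranges). It cannot detect logins, paywalls, or bot blocking.

```javascript
// Hypothetical helper: a heuristic only, not a guarantee that the URL is reachable.
function isLikelyPublicHttpUrl(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate);
  } catch {
    return false; // not a syntactically valid absolute URL
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return false;

  const host = parsed.hostname;
  if (host === 'localhost' || host === '[::1]' || host.endsWith('.local')) return false;

  // Reject loopback and private IPv4 ranges (10/8, 127/8, 172.16/12, 192.168/16)
  const isIPv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
  if (isIPv4 && /^(10|127)\.|^192\.168\.|^172\.(1[6-9]|2\d|3[01])\./.test(host)) return false;

  return true;
}
```

Running this check before calling the API avoids spending a serverless invocation on a URL that the deployment could never reach.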

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

MIT License - feel free to use this in your projects!

Support

If you encounter issues or have questions:

  • Open an issue on GitHub
  • Check that the target URL is publicly accessible
  • Verify the URL contains article content (not just navigation pages)