The Blog Summariser Web App is a full-stack web application built with Next.js 14+, Tailwind CSS, and Shadcn UI. It allows users to input a blog URL, scrape its content, generate a concise English summary using a keyword-based algorithm, translate the summary into Urdu, and save the data to Supabase (summaries) and MongoDB (full blog text). The app features a clean, responsive UI and is designed for ease of use, making it accessible for beginners and developers alike.
- Features
- Technologies
- Live Demo
- Project Structure
- Setup Instructions
- Usage
- How It Works
- Database Schema
- Troubleshooting
- Future Enhancements
- Contributing
- Contact
- License
- Blog Scraping: Extracts text content from a provided blog URL using Cheerio.
- AI Summary: Generates a concise English summary using a keyword-based sentence selection algorithm.
- Urdu Translation: Translates the English summary into Urdu using a custom JavaScript dictionary, leaving unmatched words as-is.
- Data Storage:
  - Saves English and Urdu summaries to a Supabase table (`summaries`).
  - Stores the full blog text in a MongoDB collection (`fullblogs`).
- Responsive UI: Built with Shadcn UI and Tailwind CSS for a modern, user-friendly interface.
- Error Handling: Displays clear error messages for invalid URLs or processing failures.
- Loading States: Shows a spinner during data processing for better UX.
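The dictionary-based Urdu translation described above can be sketched as a word-by-word lookup. This is a minimal illustration, not the project's `translateToUrdu/route.js`. The function name `translateToUrdu` and the six-entry dictionary are hypothetical, and a real dictionary would be much larger.

```javascript
// Hypothetical sample entries; the project's real dictionary lives in
// translateToUrdu/route.js and is not reproduced here.
const urduDictionary = new Map(
  Object.entries({
    blog: "بلاگ",
    summary: "خلاصہ",
    good: "اچھا",
    book: "کتاب",
    this: "یہ",
    is: "ہے",
  })
);

// Translate word by word; words missing from the dictionary are left unchanged.
function translateToUrdu(text, dictionary = urduDictionary) {
  return text
    .split(/(\s+)/) // the capture group keeps whitespace tokens, so spacing is preserved
    .map((token) => {
      // Separate leading/trailing punctuation from the word, e.g. "good." -> "good" + "."
      const match = token.match(/^(\W*)([\w'-]*)(\W*)$/);
      if (!match || match[2] === "") return token;
      const [, lead, word, trail] = match;
      const urdu = dictionary.get(word.toLowerCase());
      return urdu ? lead + urdu + trail : token;
    })
    .join("");
}

console.log(translateToUrdu("This blog is good.")); // → "یہ بلاگ ہے اچھا."
```

Because the lookup is word by word, the output keeps English word order (Urdu normally places the verb last), so the result is a gloss rather than a fluent translation. Using a `Map` instead of a plain object also avoids accidental matches on inherited keys such as `constructor`.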
- Frontend:
  - Next.js 14+ (App Router)
  - React
  - Tailwind CSS
  - Shadcn UI (Input, Button, Card, Alert components)
- Backend:
  - Next.js API routes (App Router route handlers under `app/api/`)
  - Cheerio (for web scraping)
  - Axios (for HTTP requests)
- Database:
  - Supabase (PostgreSQL) for storing summaries
  - MongoDB for storing full blog text
- Other:
  - JavaScript (no TypeScript)
  - Custom JavaScript dictionary for Urdu translation
  - Environment variables for configuration
```
/blog-summariser/
├── /app/
│   ├── /api/
│   │   ├── /scrapeBlog/
│   │   │   └── route.js          # Scrapes blog content
│   │   ├── /summariseText/
│   │   │   └── route.js          # Generates English summary
│   │   ├── /translateToUrdu/
│   │   │   └── route.js          # Translates summary to Urdu
│   │   └── /saveData/
│   │       └── route.js          # Saves data to Supabase and MongoDB
│   ├── /components/
│   │   ├── URLInputForm.jsx      # Form for entering blog URL
│   │   ├── SummaryCard.jsx       # Displays scraped text and summaries
│   │   ├── SummaryContent.jsx    # Logic for /summary/page.jsx
│   │   ├── LoadingSpinner.jsx    # Custom loading spinner
│   │   └── ErrorAlert.jsx        # Error message display
│   ├── globals.css               # Tailwind and Shadcn UI styles
│   ├── layout.jsx                # Root layout with global styles
│   ├── page.jsx                  # Homepage with URL input
│   └── /summary/
│       └── page.jsx              # Summary page with results
│
├── /components/
│   └── /ui/
│       ├── input.jsx             # Shadcn UI Input component
│       ├── button.jsx            # Shadcn UI Button component
│       ├── card.jsx              # Shadcn UI Card component
│       └── alert.jsx             # Shadcn UI Alert component
├── /lib/
│   ├── mongo.js                  # MongoDB client setup
│   ├── supabase.js               # Supabase client setup
│   └── utils.js                  # Shadcn UI class-name helper (cn)
├── /utils/                       # Optional utilities (not used)
├── /public/                      # Static assets
├── .env.local                    # Environment variables
├── components.json               # Shadcn UI configuration
├── package.json                  # Dependencies and scripts
├── next.config.js                # Next.js configuration
└── tailwind.config.js            # Tailwind CSS configuration
```
Follow these steps to set up and run the project locally.
- Node.js: Version 18 or higher
- npm: Version 8 or higher
- Supabase Account: Create a project at supabase.com to get `SUPABASE_URL` and `SUPABASE_KEY`.
- MongoDB Atlas: Set up a cluster at mongodb.com to get `MONGO_URI`.
- Clone the Repository (if applicable):
```bash
git clone <repository-url>
cd blog-summariser
```
Alternatively, create a new Next.js project:
```bash
npx create-next-app@latest blog-summariser
```
  - Select: JavaScript, ESLint, Tailwind CSS, App Router, no `src/` directory, `@/*` import alias.
- Install Dependencies:
```bash
npm install cheerio axios @supabase/supabase-js mongodb
```
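- Set Up Shadcn UI:
```bash
npx shadcn-ui@latest init
```
  Answer the prompts as follows:
  - Style: Default
  - Base color: Slate
  - CSS variables: Yes
  - TypeScript: No
  - Global CSS: `app/globals.css`
  - Tailwind config: `tailwind.config.js`
  - Import alias: Yes (`@/*`)

  Install the required components:
```bash
npx shadcn-ui@latest add input button card alert
```
  Newer releases of the CLI are published as `shadcn` (e.g., `npx shadcn@latest init`), and their prompts may differ from the list above.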
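- Configure Environment Variables:
  - Create a `.env.local` file in the root directory:
```
SUPABASE_URL=your-supabase-url
SUPABASE_KEY=your-supabase-anon-key
MONGO_URI=your-mongodb-uri
```
  - Replace the placeholders with your Supabase and MongoDB credentials.
  - Because these names have no `NEXT_PUBLIC_` prefix, Next.js exposes them only to server-side code such as the API routes.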
- Set Up Databases:
  - Supabase: Create a table named `summaries` with the following SQL:
```sql
CREATE TABLE summaries (
  id SERIAL PRIMARY KEY,
  blog_url TEXT NOT NULL,
  summary_en TEXT NOT NULL,
  summary_urdu TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
  - MongoDB: Create a database named `blogdata` with a collection named `fullblogs`.
- Copy Project Files:
  - Ensure all files from the project structure above (e.g., `app/`, `lib/`, `components.json`) are in place.
- Run the Application:
```bash
npm run dev
```
  - Open `http://localhost:3000` in your browser.
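Before starting the server, it can help to confirm that the variables from `.env.local` are actually visible. The script below is a hypothetical helper: the file name `scripts/check-env.js` and the function `missingEnvVars` are not part of the project, and the check only confirms that each required variable is non-empty, not that the credentials work.

```javascript
// Variable names taken from the .env.local step above.
const REQUIRED_ENV_VARS = ["SUPABASE_URL", "SUPABASE_KEY", "MONGO_URI"];

// Return the names of required variables that are unset or blank.
function missingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter(
    (name) => !env[name] || env[name].trim() === ""
  );
}

const missing = missingEnvVars();
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
} else {
  console.log("All required environment variables are set.");
}
```

Plain Node.js does not read `.env.local` by itself (Next.js loads it for the dev server), so on Node 20.6 or later the file can be loaded with `node --env-file=.env.local scripts/check-env.js`.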
- Homepage (`http://localhost:3000`):
  - Enter a valid blog URL (e.g., a Medium article) in the input field.
  - Click the "Submit" button.
  - The app will scrape the blog, generate summaries, and redirect to the summary page.
- Summary Page (`http://localhost:3000/summary?...`):
  - Displays:
    - A preview of the scraped blog text (truncated to 500 characters).
    - An English summary (5 sentences, keyword-based).
    - An Urdu translation of the summary.
    - A success message if the data is saved to Supabase and MongoDB.
  - If an error occurs (e.g., an invalid URL), an alert is displayed.
- Database Verification:
  - Check Supabase (`summaries` table) for the English and Urdu summaries.
  - Check MongoDB (`blogdata.fullblogs`) for the full blog text.
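The homepage should only proceed with a valid blog URL. A minimal client-side check might look like the sketch below, assuming it runs in `URLInputForm` before the request to `/api/scrapeBlog`. The name `isValidBlogUrl` is hypothetical, and the server route should still validate the URL on its own.

```javascript
// Hypothetical client-side validation for the blog URL input.
function isValidBlogUrl(input) {
  let url;
  try {
    // The WHATWG URL constructor throws on relative or malformed URLs.
    url = new URL(String(input).trim());
  } catch {
    return false;
  }
  // Only web pages can be scraped; reject mailto:, ftp:, javascript:, etc.
  return (url.protocol === "http:" || url.protocol === "https:") && url.hostname !== "";
}
```

For example, `isValidBlogUrl("medium.com/some-post")` returns `false` because the scheme is missing, so the form can show `ErrorAlert` instead of calling the API.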
- User Input:
  - The user enters a blog URL in the `URLInputForm` component on the homepage.
- Scraping:
  - The URL is sent to `/api/scrapeBlog`, which uses Cheerio to extract paragraph text.
- Summarization:
  - The scraped text is sent to `/api/summariseText`, which:
    - Splits the text into sentences.
    - Removes stopwords and extracts keywords.
    - Scores sentences based on keyword frequency.
    - Selects the top 5 sentences as the summary.
- Translation:
  - The English summary is sent to `/api/translateToUrdu`, which translates words using a custom dictionary, leaving unmatched words unchanged.
- Data Storage:
  - The `/api/saveData` route saves:
    - English and Urdu summaries to Supabase (`summaries` table).
    - The full blog text to MongoDB (`fullblogs` collection).
- UI Display:
  - The `SummaryCard` component displays the results using Shadcn UI’s `Card`.
  - Errors are shown via the `ErrorAlert` component.
  - A `LoadingSpinner` appears during processing.
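The summarization step in `/api/summariseText` can be sketched as a frequency-based extractive summarizer that follows the four steps listed above. This is an illustrative reconstruction, not the project's actual code: the function name `summarise`, the short stopword list, and the sentence-splitting regex are assumptions.

```javascript
// Small illustrative stopword list (assumption; the project's list is not shown).
const STOPWORDS = new Set([
  "a", "an", "the", "and", "or", "but", "is", "are", "was", "were", "be",
  "to", "of", "in", "on", "for", "with", "as", "at", "by", "it", "this",
  "that", "these", "those", "from", "has", "have", "had", "not", "can",
]);

// Lowercase words with stopwords removed, i.e. the candidate keywords.
function tokenize(text) {
  return (text.toLowerCase().match(/[a-z']+/g) || []).filter(
    (word) => !STOPWORDS.has(word)
  );
}

function summarise(text, sentenceCount = 5) {
  // 1. Split into sentences after ".", "!" or "?" followed by whitespace.
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);
  if (sentences.length <= sentenceCount) return sentences.join(" ");

  // 2. Count keyword frequencies across the whole text.
  const freq = new Map();
  for (const word of tokenize(text)) {
    freq.set(word, (freq.get(word) || 0) + 1);
  }

  // 3. Score each sentence by the summed frequency of its keywords.
  const scored = sentences.map((sentence, index) => ({
    sentence,
    index,
    score: tokenize(sentence).reduce((sum, w) => sum + (freq.get(w) || 0), 0),
  }));

  // 4. Keep the top-scoring sentences (ties go to the earlier one),
  //    then restore their original order so the summary reads naturally.
  return scored
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, sentenceCount)
    .sort((a, b) => a.index - b.index)
    .map((s) => s.sentence)
    .join(" ");
}
```

Raw summed scores favor longer sentences; dividing each score by the sentence's keyword count is one common variant if shorter, denser sentences should win.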
- Supabase (`summaries` table):
```
id:           SERIAL PRIMARY KEY
blog_url:     TEXT NOT NULL
summary_en:   TEXT NOT NULL
summary_urdu: TEXT NOT NULL
created_at:   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
```
- MongoDB (`fullblogs` collection):
```
{
  "_id": ObjectId,
  "blog_url": String,
  "full_text": String,
  "created_at": Date
}
```
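To show how one processed blog maps onto these two schemas, here is a hypothetical helper. `buildRecords` and its parameter names are assumptions, not the project's `saveData` code. It leaves `id` and `created_at` in Supabase to the column defaults and lets MongoDB generate `_id`.

```javascript
// Hypothetical helper: shape one processed blog into a Supabase row and a MongoDB document.
function buildRecords({ blogUrl, fullText, summaryEn, summaryUrdu }) {
  // Every field maps to a NOT NULL column or a required document field.
  const required = { blogUrl, fullText, summaryEn, summaryUrdu };
  for (const [name, value] of Object.entries(required)) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`${name} must be a non-empty string`);
    }
  }
  return {
    // Row for the Supabase `summaries` table; id and created_at use column defaults.
    summaryRow: { blog_url: blogUrl, summary_en: summaryEn, summary_urdu: summaryUrdu },
    // Document for the MongoDB `fullblogs` collection; _id is generated on insert.
    blogDoc: { blog_url: blogUrl, full_text: fullText, created_at: new Date() },
  };
}
```

In `/api/saveData`, the two objects could then be passed to `supabase.from("summaries").insert(summaryRow)` and to `insertOne(blogDoc)` on the `fullblogs` collection.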
- Styles Not Applied:
  - Ensure `app/globals.css` is imported in `app/layout.jsx`.
  - Verify that the `content` array in `tailwind.config.js` includes both `./app/**/*.{js,jsx}` and `./components/**/*.{js,jsx}`, since components live in both folders.
  - Restart `npm run dev` to rebuild.
- Urdu Text Misaligned:
  - Check that `globals.css` includes the following, with the `@import` placed at the top of the file before any other rules (including the `@tailwind` directives):
```css
@import url("https://fonts.googleapis.com/css2?family=Noto+Nastaliq+Urdu&display=swap");

[dir="rtl"] {
  direction: rtl;
  font-family: "Noto Nastaliq Urdu", sans-serif;
}
```
  - Ensure `SummaryCard.jsx` has `dir="rtl"` on the Urdu summary `<p>` tag.
- Database Errors:
  - Verify that `.env.local` has the correct `SUPABASE_URL`, `SUPABASE_KEY`, and `MONGO_URI`.
  - Check the Supabase and MongoDB dashboards to confirm that the table and collection are set up.
- Scraping Issues:
  - Some websites block scraping. Test with simple blogs (e.g., Medium articles).
  - Cheerio only parses the HTML returned by the server, so for sites that render content with JavaScript, consider a headless browser such as Puppeteer (not implemented here).
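For the "Styles Not Applied" case, the part of `tailwind.config.js` that matters is the `content` array. The snippet below is a minimal sketch assuming Tailwind CSS v3. The file generated by `npx shadcn-ui@latest init` also contains theme and plugin settings, which should be kept; only the `content` globs need to cover every folder with JSX.

```javascript
/** @type {import('tailwindcss').Config} */
module.exports = {
  // Tailwind only generates classes it finds in these files, so both
  // app/ (including app/components/) and components/ui/ must be listed.
  content: [
    "./app/**/*.{js,jsx}",
    "./components/**/*.{js,jsx}",
  ],
  theme: {
    extend: {},
  },
  plugins: [],
};
```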
- History Table: Add a `HistoryTable` component to display past summaries from Supabase using Shadcn UI’s `Table` component (`npx shadcn-ui@latest add table`).
- Expanded Dictionary: Enhance the Urdu translation dictionary in `translateToUrdu/route.js` for better coverage.
- Navigation Bar: Add a header with a "Back to Home" link in `layout.jsx` using Shadcn UI’s `Button`.
- Advanced Summarization: Integrate a more sophisticated summarization algorithm (e.g., using an external API).
- Error Retry: Add retry logic for failed API calls.
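The "Error Retry" idea could start from a small wrapper like the one below. It is a hypothetical sketch (`withRetry`, `retries`, and `baseDelayMs` are not existing project names) that retries a failing async call with exponential backoff: by default it waits 500 ms, then 1 s, then 2 s before giving up and rethrowing the last error.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // no attempts left
      // Delay doubles each time: baseDelayMs, 2x, 4x, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

`fetch` only rejects on network failures, so a wrapped API call should throw when `response.ok` is false if HTTP errors such as 500 ought to be retried as well.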
Contributions are welcome! To contribute:
- Fork the repository.
- Create a branch (`git checkout -b feature/your-feature`).
- Commit your changes (`git commit -m "Add your feature"`).
- Push to the branch (`git push origin feature/your-feature`).
- Open a pull request.
For questions, feedback, or collaboration, please contact: Rafay Adeel
This project is licensed under the Apache 2.0 License.