Do not overlook todo.md, which lists tasks that must be completed before this app is production-ready.
This is a distributed web crawler project organized as a monorepo. It includes multiple packages and apps to handle crawling, scheduling, storage, and worker management in a scalable and modular way.
The monorepo is structured using a workspace-based setup (pnpm workspaces), with the following key packages:
| Package | Description |
|---|---|
| database | Contains database entities, migrations, and ORM setup. |
| worker | Worker service responsible for crawling and processing jobs. |
| scheduler | Scheduler app that queues crawling tasks for workers. |
| shared | Shared utilities, types, and helpers used across other packages. |
| queue | Contains the BullMQ setup. |
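A pnpm workspace of this shape is declared in a `pnpm-workspace.yaml` at the repo root. The snippet below is a sketch only; the actual glob patterns depend on whether the packages live under `packages/` and `apps/`, which is an assumption here:

```yaml
# pnpm-workspace.yaml (sketch; actual directory names may differ)
packages:
  - "packages/*"
  - "apps/*"
```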
- Node.js >= 20
- pnpm >= 7
- Docker
Install dependencies:

```shell
pnpm install
```

Create a `.env` file in the root, using the `.env.example` file as a template:

```shell
cat .env.example > .env
```
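The authoritative variable list is in `.env.example` itself. Purely as an illustration, a crawler stack with Redis and PostgreSQL typically needs connection settings along these lines (the variable names below are hypothetical, not taken from the repo):

```dotenv
# Hypothetical example — use the variable names from the real .env.example
REDIS_HOST=localhost
REDIS_PORT=6379
DATABASE_URL=postgres://postgres:postgres@localhost:5432/crawler
```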
You need Redis and PostgreSQL running for development. Start them with Docker:

```shell
docker compose up redis postgres -d
```

or use local instances.
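For reference, the `redis` and `postgres` services named above correspond to entries in the project's `docker-compose.yml`. A minimal sketch of what such entries usually look like (the images, ports, and credentials here are assumptions, not the project's actual configuration):

```yaml
# Sketch of typical service definitions; check docker-compose.yml for the real ones
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: crawler
    ports:
      - "5432:5432"
```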
Run all packages in development mode (recommended):

```shell
pnpm dev
```
Optionally, you can run individual packages using `pnpm --filter`:

```shell
pnpm --filter @crawler/worker dev
pnpm --filter @crawler/scheduler dev
```
To run the application in Docker, first stop the Redis and Postgres containers if you started them for development:

```shell
docker compose down
```

Then build and start the full stack:

```shell
docker compose build
docker compose up -d
```
This app is only a first iteration; some tasks need to be completed before running it in production. See todo.md for more info.