- Models
  - `Movie(title, poster, description, director→FK, release_year, main_actors↔M2M, genres↔M2M)`
  - `Director(first_name, last_name)`
  - `Actor(first_name, last_name)`
  - `Genre(name)`
  - `Rating(user→FK, movie→FK, score)` with a unique constraint on `(user, movie)`
  - `Review(user→FK, movie→FK, title, body)` with a unique constraint on `(user, movie)`
- User Authentication
  - Standard Django `User` model for authentication.
  - Signup: `accounts/signup.html` with `UserCreationForm`.
  - Login/Logout: `accounts/login.html` and standard Django views.
  - Display of user status in the header (`base.html`).
- Frontend Views (Function-Based)
  - Movie List (`movie_list`):
    - Displays a paginated grid of movies (25 per page).
    - Supports sorting by title, release year, and newest.
    - Uses `select_related` and `prefetch_related` for efficient querying.
  - Movie Details (`movie_details`):
    - Full movie information.
    - Shows average rating, rating distribution, and community reviews.
    - AJAX-powered star rating submission.
    - Form for submitting new reviews.
    - Displays up to 5 similar movies based on shared genres.
- Admin
  - All six models registered in Django Admin for easy add/edit/browse.
- Data Collection (Jupyter Notebook)
  - Fetches popular movies from TMDB by genre.
  - Skips adult titles, deduplicates across genres, and validates poster downloads.
  - Saves posters locally (manually moved to `media/posters/`) and writes `tmdb_movies.json` with only the fields my app needs.
  - Keeps TMDB IDs only in the JSON (for debugging/re-runs); they are not stored in the DB.
- Import Endpoint (function-based DRF view)
  - `POST /api/import_tmdb_movies/` (admin-only via `IsAdminUser`).
  - Accepts either raw JSON or an uploaded JSON file.
  - Upserts movies by `(title, release_year)`.
  - Splits names and creates/links the `Director`, creates/links up to 4 `Actor`s, and attaches all `Genre`s (M2M).
  - Assigns posters by relative path if the file exists in the Django default storage (local `MEDIA_ROOT/posters/` in dev, S3 in prod).
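The upsert-by-natural-key idea can be illustrated in plain Python (a stand-in for the ORM's `update_or_create`; the field names follow the JSON, the rest of the logic is an assumption, not the real view code):

```python
# Plain-Python stand-in for the (title, release_year) upsert; the real view
# uses the ORM, this only illustrates the natural-key matching.
import json

db = {("Inception", 2010): {"description": "old blurb"}}  # pretend existing rows

payload = json.loads("""
[
  {"title": "Inception", "release_year": 2010, "description": "new blurb"},
  {"title": "Heat", "release_year": 1995, "description": "crime classic"}
]
""")

created = updated = 0
for item in payload:
    key = (item["title"], item["release_year"])
    if key in db:               # natural key matched -> update in place
        db[key]["description"] = item["description"]
        updated += 1
    else:                       # no match -> create a new row
        db[key] = {"description": item["description"]}
        created += 1

print(created, updated)  # 1 1
```

Keying on `(title, release_year)` rather than a TMDB ID is what lets the DB stay free of external identifiers while keeping the import re-runnable.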
- Result of my import run (`tmdb_movies.json`)
  - Created 119 movies, 19 genres, 112 directors, 422 actors.
- AI Chatbot (Gemini/Bedrock)
  - Backend:
    - A view `chatbot_api` that receives user messages and chat history.
    - Provider is selected by the env var `AI_PROVIDER` (dev: `gemini`, prod on EC2: `bedrock`).
    - Two-step flow: (1) parse intent/criteria, (2) query the DB and generate a short reply with up to 3 recommendations.
  - Frontend:
    - A chat widget in `base.html`; `static/js/chat.js` handles communication with the `chatbot_api` endpoint.
    - Displays AI responses and movie recommendations with links to the movie details page.
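The provider switch can be sketched like this (the helper name `select_provider` is an assumption for illustration, not the actual code):

```python
# Hypothetical sketch of the AI_PROVIDER env switch; the real chatbot_api
# would dispatch to a Gemini or Bedrock client at this point.
import os

def select_provider() -> str:
    provider = os.environ.get("AI_PROVIDER", "gemini").strip().lower()
    if provider not in {"gemini", "bedrock"}:
        raise ValueError(f"Unsupported AI_PROVIDER: {provider!r}")
    return provider

os.environ["AI_PROVIDER"] = "bedrock"   # what .env.prod would set on EC2
print(select_provider())                # bedrock
```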
- Clone the Repository

  ```bash
  git clone https://github.com/yanivraveh/mymdb
  cd mymdb
  ```
- Create and Activate a Virtual Environment
  - Windows:

    ```bash
    python -m venv .venv
    .\.venv\Scripts\activate
    ```
  - macOS/Linux:

    ```bash
    python3 -m venv .venv
    source .venv/bin/activate
    ```
- Install Dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Set Up Environment Variables
  - Navigate to the inner `mymdb` directory (the one with `manage.py`).
  - Create a file named `.env` in that same directory.
  - Add your Gemini API key to this file:

    ```
    GEMINI_API_KEY="your_api_key_here"
    ```
- Database Migrations
  - Note: the provided database `db.sqlite3` is already migrated.
    - Running this command will show "No migrations to apply."
    - It is only necessary if you delete the database file to start with an empty one.

  ```bash
  python manage.py migrate
  ```
- Create an Admin Superuser
  - You'll need an admin account to access the Django admin panel.

  ```bash
  python manage.py createsuperuser
  ```
- Run the Development Server
  - Make sure you are in the directory containing `manage.py`.

  ```bash
  python manage.py runserver
  ```

  The website will be running at `http://127.0.0.1:8000/`.
- How the Movie Data Was Imported

  Note: the following steps are for documentation only. The provided database is already populated.
  - The `fetch_tmdb_movies.ipynb` notebook was run to download posters and create the `tmdb_movies.json` file.
  - The downloaded posters were moved from their initial location to the `media/posters/` directory.
  - A tool like Postman was used to make a `POST` request to the `/api/import_tmdb_movies/` endpoint.
    - Authentication was done using the admin superuser credentials.
    - The request body contained the `tmdb_movies.json` data.
  - The data was verified via the `/admin` panel.
This section documents how the same codebase runs locally and on AWS, in the order we recommend building it.
- Two env files (not committed): `.env.dev` (local), `.env.prod` (EC2). Load via `DEPLOY_ENV` and `dotenv.load_dotenv(BASE_DIR / f".env.{DEPLOY_ENV}")`.
- Keep `DEBUG=True` per course; `ALLOWED_HOSTS=["*"]` is acceptable.
- Dockerfile (Daphne) and `.dockerignore` at the repo root; static files served by Django with `staticfiles_urlpatterns()`.
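The `DEPLOY_ENV` switch in `settings.py` looks roughly like this (a sketch assuming `python-dotenv`, guarded so the snippet also runs without it installed):

```python
# settings.py sketch: pick .env.dev or .env.prod via the DEPLOY_ENV variable.
import os
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent
DEPLOY_ENV = os.environ.get("DEPLOY_ENV", "dev")  # "dev" locally, "prod" on EC2

try:
    from dotenv import load_dotenv
    load_dotenv(BASE_DIR / f".env.{DEPLOY_ENV}")  # silently skips a missing file
except ImportError:
    pass  # fall back to the plain process environment

DEBUG = True                 # kept True per course scope
ALLOWED_HOSTS = ["*"]
```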
- EC2 role: allow S3 read/write on your media bucket, ECR pull, and `bedrock:InvokeModel` (and `InvokeModelWithResponseStream` if needed).
- Bedrock console: enable access to your chosen model (e.g., Haiku or Sonnet).
- Build/push (replace account/region):

  ```bash
  docker build --pull --no-cache -t mymdb:prod .
  docker tag mymdb:prod <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/mymdb-repo:prod
  aws ecr get-login-password --region <REGION> | docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com
  docker push <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/mymdb-repo:prod
  ```
- In prod: `ENGINE=django.db.backends.postgresql` (dev stays SQLite).
- Security Groups: RDS inbound `5432` from the EC2 instance SG.
- Console steps (quick):
  - Create database → Engine: PostgreSQL → Templates: Free tier.
  - DB instance identifier: choose a name; set the master username/password.
  - Storage: gp3 (the default is fine for a demo).
  - Connectivity: Public access = Yes; do not connect to an EC2 compute resource.
  - Create the DB and note the Endpoint (hostname) for `.env.prod`.
- `django-storages` default storage set to S3 (location `media`, `default_acl: public-read` or a public bucket policy). `MEDIA_URL` points to your bucket domain (avoid `/media/media/`).
- Console steps (quick):
  - Create bucket → General purpose → Bucket name.
  - ACLs: Enabled; Block public access: uncheck all (for the demo).
  - Upload posters from the repo path `mymdb/mymdb/media/posters/` (keys will be under `media/posters/...`).
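The `django-storages` setup described above corresponds roughly to a settings fragment like this (bucket name and domain are placeholders; note that Django ≥4.2 uses the `STORAGES` dict instead of `DEFAULT_FILE_STORAGE`):

```python
# Illustrative S3 media settings for django-storages; values are placeholders.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "your-media-bucket"
AWS_LOCATION = "media"                      # keys live under media/...
AWS_DEFAULT_ACL = "public-read"             # or rely on a public bucket policy
# Point MEDIA_URL at the bucket so paths don't double up as /media/media/
MEDIA_URL = f"https://{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com/{AWS_LOCATION}/"
```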
- Launch an Ubuntu instance with the IAM role; open inbound TCP `8000` for testing.
- Connect to the instance using EC2 Instance Connect or SSH.
- Copy the local `.env.prod` to the instance at `/home/ubuntu/mymdb/.env.prod`.
- Install Docker and the AWS CLI:
  - https://docs.docker.com/engine/install/ubuntu/
  - https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
- Allow running docker without sudo:

  ```bash
  sudo usermod -aG docker $USER
  newgrp docker
  ```
- Log in to ECR and pull the image:

  ```bash
  aws ecr get-login-password --region <REGION> \
    | docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com
  docker pull <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/mymdb-repo:prod
  ```
- Run the container:

  ```bash
  docker run -d --name mymdb-instance-01 \
    -p 8000:8000 --restart unless-stopped \
    --env-file /home/ubuntu/mymdb/.env.prod \
    -w /app/mymdb \
    <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/mymdb-repo:prod
  ```
- Enter the container, then run migrations and create an admin user inside it:

  ```bash
  docker exec -it mymdb-instance-01 bash
  python manage.py migrate
  python manage.py createsuperuser
  ```
- `POST /api/import_tmdb_movies/` (admin auth). Accepts raw JSON or a `file` upload.
- Poster keys like `posters/xyz.jpg` resolve against S3 in prod.
- Env: `AI_PROVIDER=bedrock`, `AWS_REGION`, `BEDROCK_MODEL_ID` (no quotes).
- If throttled or getting non-JSON text: use Haiku, lower `max_tokens`, set `temperature=0`, and rely on the JSON fallback in code.
- Note: Bedrock returned plain text (not valid JSON) for the conversational step, so our JSON parse failed and the fallback kicked in. To reduce this:
  - Tighten the conversational system instruction to "return ONLY a JSON object { "text": "..." }".
  - Set `temperature=0` in the Bedrock call so the model sticks to "JSON only".
  - Improve the fallback: if the JSON parse fails, use the model's text as the message and still attach your DB links, rather than the generic line.
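The improved fallback described above can be sketched in a few lines (the function name `parse_reply` is illustrative, not the actual code):

```python
# Minimal sketch of the JSON fallback: if the model's reply isn't the
# requested {"text": "..."} object, keep its raw text as the message
# instead of a generic error line.
import json

def parse_reply(raw: str) -> str:
    try:
        data = json.loads(raw)
        if isinstance(data, dict) and "text" in data:
            return data["text"]
    except json.JSONDecodeError:
        pass
    return raw.strip()  # fallback: the model's plain text survives

print(parse_reply('{"text": "Try Heat (1995)."}'))  # Try Heat (1995).
print(parse_reply("Sure! You might enjoy Heat."))   # Sure! You might enjoy Heat.
```

Recommendation links from the DB query can then be attached to whichever message text came out.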
- Create a Target Group (Instances, name it, HTTP 80, health path `/`, register the two instances on port 8000).
- Create an ALB (listener 80) → forward to the TG; restrict the EC2 SG to the ALB SG.
- WebSockets: Django Channels (Daphne) with `AuthMiddlewareStack` (login required).
- Kafka in the loop:
  - Producer: the WS consumer publishes chat events to a topic (`chat-messages`, key = room slug).
  - Broadcaster: a background Kafka consumer embedded in the ASGI process rebroadcasts to the local Channels group (works without Redis).
  - Persister: a single management-command consumer (`persist_kafka`) writes `Room` and `Message` rows to the DB using an idempotent `event_id`.
- React frontend: Vite dev server with a proxy (`/api` and `/ws` → `http://localhost:8000`), rendered inside a small iframe panel launched from `base.html`.
- `chat.Room(name, slug, created_by, created_at)` – `slug` is unique.
- `chat.Message(event_id UUID unique, room→FK, user→FK nullable, content, created_at)` – `event_id` guarantees exactly-once inserts.
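The exactly-once guarantee from the unique `event_id` can be shown with a plain-Python stand-in (a toy substitute for a `get_or_create(event_id=...)` against the unique column):

```python
# Pure-Python sketch of the idempotent insert: a redelivered Kafka event
# with the same event_id writes nothing, just like the unique constraint
# would reject a duplicate row.
import uuid

seen: set[uuid.UUID] = set()

def persist(event_id: uuid.UUID, content: str) -> bool:
    """Return True if a new row was written, False for a duplicate."""
    if event_id in seen:        # the unique constraint would reject this
        return False
    seen.add(event_id)
    return True

eid = uuid.uuid4()
print(persist(eid, "hello"))  # True  (first delivery)
print(persist(eid, "hello"))  # False (redelivery is a no-op)
```

This is what makes it safe to replay the topic or restart the persister mid-stream.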
- REST for React:
  - `GET /api/chat/rooms/` – list rooms.
  - `POST /api/chat/rooms/?name=...` – create if missing (auth required).
  - `GET /api/chat/rooms/<slug>/messages?limit=50` – recent history.
- WebSocket: `ws://<host>/ws/chat/<slug>/` (the regex allows hyphens: `[-\w]+`).
- We use the embedded broadcaster: `chat/kafka_background.py` starts a Kafka consumer inside the ASGI process (via `chat/apps.py`). It rebroadcasts to Channels groups locally, which works with the in-memory channel layer (no Redis).
- Do NOT run `python manage.py broadcast_kafka` in normal use. It lives in a separate process and cannot deliver to WebSocket groups with the in-memory layer. Keep it only for debugging, or if you implement the HTTP relay pattern (an external consumer POSTs to a Django endpoint that calls `group_send`).
- Normal dev run: Kafka → Daphne (ASGI) → embedded broadcaster → clients; plus a single `persist_kafka` process to write to the DB.
Prereqs: Docker Desktop, Node 18+, Python venv.
- Start Kafka (+ UI):

  ```bash
  docker compose -f kafka-infra/docker-compose.yml up -d
  ```
- Start Django (ASGI):

  ```bash
  daphne -b 0.0.0.0 -p 8000 mymdb.asgi:application
  ```
  (Do not run `broadcast_kafka`; the broadcaster is embedded in ASGI.)
- Start the DB writer (single process):

  ```bash
  python manage.py persist_kafka
  ```
- Start the React dev server:

  ```bash
  cd chat-frontend && npm run dev
  ```

Open MyMDB and click the 💬 bubble to open the chat panel (an iframe to the React app).
- `BOOTSTRAP_SERVERS=localhost:29092`, `TOPIC_NAME=chat-messages`.
- Embedded broadcaster: optional `GROUP_ID` (leave it empty in dev; in K8s set a unique value per pod).
- Persister: `PERSIST_GROUP_ID=chat-db-writer` (a single global writer).
If the browser refuses to load .js/.css as "text/plain", add this to `settings.py`:

```python
import mimetypes

mimetypes.add_type("application/javascript", ".js", True)
mimetypes.add_type("text/javascript", ".js", True)
mimetypes.add_type("text/css", ".css", True)
```

- The Vite proxy in `vite.config.js` routes `/api` and `/ws` to `http://localhost:8000`.
- `CSRF_TRUSTED_ORIGINS` includes `http://localhost:5173`.
- The frontend sends `X-CSRFToken` on POST to `/api/chat/rooms/` (using the `csrftoken` cookie).
- Retrieval: FAISS (RAM-only) built at process start from `Movie` rows. Embeddings via `sentence-transformers` (no API cost).
- Generation: a single Gemini call per request; no Gemini embeddings used.
- Intent gate (rule-based): "search", "more", "summary", "off_topic". Off-topic/vague input gets a clarifying question (no recs). "more" excludes already-shown movies and inherits prior genres if none are specified.
- Filters: genres and year/decade are recognized from the user message.
Env (backend):
- `RAG_TOP_K` (default 5): number of candidates to retrieve before filtering/sorting.
- `RAG_MAX_DOCS` (default 500): cap on the number of movies embedded at startup.
- `RAG_MIN_SIM` (default 0.25): minimum cosine similarity to accept RAG results; below it → no recs (a clarifying question instead).
- `RAG_EMBED_MODEL` (default `all-MiniLM-L6-v2`).

Local deps: `faiss-cpu`, `numpy`, `sentence-transformers` (installed in the venv; listed in requirements).
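The `RAG_MIN_SIM` acceptance gate works on cosine similarities. A toy stand-in for the FAISS retrieval step (the vectors are fabricated two-dimensional stand-ins; the real index holds sentence-transformer embeddings of `Movie` rows):

```python
# Toy illustration of the retrieval + RAG_MIN_SIM gate: cosine similarity
# over unit-length vectors, best matches first, low scores rejected.
import numpy as np

RAG_MIN_SIM = 0.25

docs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])  # pretend embeddings
query = np.array([1.0, 0.0])                           # pretend query embedding

sims = docs @ query                    # cosine sim (all vectors unit-length)
order = np.argsort(-sims)              # best match first
accepted = [int(i) for i in order if sims[i] >= RAG_MIN_SIM]

print(accepted)  # [0, 1] -> the third doc (sim 0.0) falls below the gate
```

When nothing clears the gate, the chatbot asks a clarifying question instead of recommending.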
Troubleshooting:
- If the first request is slow in a new pod: that's the one-time model download plus the FAISS build.
- If responses feel too eager: increase `RAG_MIN_SIM` (e.g., 0.35–0.45).
- If variety is low: raise `RAG_TOP_K` (e.g., 10–15).
- Backend: 3 replicas; each pod runs the embedded broadcaster with a unique `GROUP_ID`, so every pod receives every message and fans out to its own sockets (no Redis needed per course scope).
- Persister: a 1-replica deployment with `PERSIST_GROUP_ID=chat-db-writer`.
- React: a single deployment/service; exposed via Ingress/LoadBalancer. Kafka can run in-cluster or externally.

K8s changes after the RAG upgrade:
- No new services or components. Only add envs to the backend Deployment: `RAG_TOP_K`, `RAG_MAX_DOCS`, `RAG_MIN_SIM`, `RAG_EMBED_MODEL`.
- Prereqs: Docker Desktop, kubectl, minikube.
- Start the cluster:

  ```bash
  minikube start --driver=docker --cpus=4 --memory=8192
  kubectl create namespace mymdb
  ```
- Infra (Postgres, Kafka, Kafka UI):

  ```bash
  kubectl apply -n mymdb -f k8s/postgres.yaml
  kubectl apply -n mymdb -f k8s/kafka.yaml
  kubectl apply -n mymdb -f k8s/kafka-ui.yaml
  kubectl rollout status -n mymdb deployment/zookeeper --timeout=180s
  kubectl rollout status -n mymdb deployment/kafka --timeout=180s
  kubectl rollout status -n mymdb deployment/postgres --timeout=180s
  # Optional UI:
  # kubectl -n mymdb port-forward svc/kafka-ui 8080:8080
  ```
- Media PVC (for movie posters):
  ```bash
  kubectl apply -n mymdb -f k8s/media.yaml
  ```
- Build and push images (Docker Hub):

  ```bash
  # backend (Daphne + Channels)
  docker build -t yanivraveh/mymdb-backend:dev .
  docker push yanivraveh/mymdb-backend:dev
  # frontend (React build served by Nginx)
  docker build -t yanivraveh/mymdb-frontend:dev ./chat-frontend
  docker push yanivraveh/mymdb-frontend:dev
  ```
- Deploy the backend (3 replicas) and persister:
  ```bash
  kubectl apply -n mymdb -f k8s/backend.yaml
  kubectl apply -n mymdb -f k8s/persister.yaml
  kubectl rollout status -n mymdb deployment/mymdb-backend --timeout=180s
  kubectl rollout status -n mymdb deployment/mymdb-persister --timeout=180s
  ```
- Deploy the frontend (Nginx):

  ```bash
  kubectl apply -n mymdb -f k8s/frontend.yaml
  kubectl rollout status -n mymdb deployment/mymdb-frontend --timeout=180s
  ```
- Expose LoadBalancers (Windows requires the tunnel):

  ```bash
  minikube tunnel   # keep this terminal open
  kubectl get svc -n mymdb mymdb-backend mymdb-frontend
  # EXTERNAL-IP will be 127.0.0.1 (backend:8000, frontend:80)
  ```
- Migrate the DB (run once):
  ```bash
  POD=$(kubectl get pods -n mymdb -l app=mymdb-backend -o jsonpath='{.items[0].metadata.name}')
  kubectl exec -n mymdb $POD -- sh -lc "cd /app/mymdb && python manage.py migrate --noinput"
  ```
- Create a superuser (for admin and import):

  ```bash
  kubectl exec -n mymdb $POD -- sh -lc \
    'export DJANGO_SUPERUSER_USERNAME=admin DJANGO_SUPERUSER_EMAIL=admin@example.com DJANGO_SUPERUSER_PASSWORD=admin123 && python manage.py createsuperuser --noinput'
  ```
- Posters (copy into the media PVC):

  ```bash
  POD=$(kubectl get pods -n mymdb -l app=mymdb-backend -o jsonpath='{.items[0].metadata.name}')
  kubectl cp mymdb/media/posters $POD:/app/mymdb/media -n mymdb
  # in PowerShell use ${POD} instead of $POD
  ```
- Set the Gemini API key and iframe URL:
  ```bash
  kubectl set env -n mymdb deployment/mymdb-backend GEMINI_API_KEY="YOUR_KEY"
  # For Minikube, the iframe points to the frontend LB on 127.0.0.1:80
  kubectl set env -n mymdb deployment/mymdb-backend CHAT_IFRAME_SRC="http://127.0.0.1/"
  # Wait for the backend to roll out the new env
  kubectl rollout status -n mymdb deployment/mymdb-backend --timeout=180s
  ```
- Seed data:
  - Use Postman against `http://127.0.0.1:8000/api/import_tmdb_movies/` (file upload `tmdb_movies.json`).
- Test:
  - Open `http://127.0.0.1:8000/` (the backend).
Troubleshooting
- Auth/CSRF: use 127.0.0.1 consistently for both frontend and backend in Minikube.
- Function-Based Views (not CBVs), per instruction.
- Essential fields only (no full TMDB clone).
- No serializers for this admin-only import to keep it simple (validation handled inline).
- No external IDs in the DB (kept in JSON only).
- Four main actors enforced during import (exercise requirement).
- Posters handled as files in `media/posters/` (not uploaded via the API).
- Jupyter notebook for data collection: I could have created a Django management command to fetch data from TMDB and import it directly into the database, but since we haven't covered management commands in class yet, I went with the Jupyter notebook approach: collect the data first, then import via the API.
- Separate `accounts` app: while the current authentication features could live inside the main project, creating a dedicated `accounts` app is a forward-thinking choice. I may want to add a user profile model with additional fields.
- Project naming: I recognize that having the repository, the Django project, and the main settings application all named `mymdb` is a bit confusing. This was an unintentional result of the initial setup. I plan to fix it, and in future projects I'll use more distinct naming (e.g., `core` or `config` for the settings app, `backend` or `server` for the back end) to improve clarity.
Things I am still thinking about or haven't gotten to yet:
- Data fetching: the Jupyter notebook could be replaced with a more integrated Django management command to fetch a wider range of movie data and handle poster downloads automatically.
- AI chatbot: streaming of the chatbot's responses is still missing, and the conversational instructions need more tuning and tweaking.
- AI integration: upgraded the two-step AI intent parsing to FAISS-based RAG.
- UI/UX: the two chat widgets currently overlap each other, and other minor UI and UX enhancements could improve navigation and usability.
- Project structure: the current layout, with the `venv` and `requirements.txt` in the root `MyMDB` directory, is a result of the initial PyCharm setup. A future refactor could move these into the `mymdb` backend directory and restructure the root to accommodate a separate `frontend` application, creating a more conventional monorepo structure.
Thank you for reviewing my project.