Playwright AI Agent - Full Page Screenshotter

This project uses Playwright to automate taking full-page screenshots of web pages listed in a Google Sheet, uploads them to Google Drive, and updates the sheet with the Drive links.

Features

Reads URLs from a specified Google Sheet range.
Takes full-page screenshots using Playwright.
Handles cookie loading from a cookies.json file.
Attempts to automatically accept cookie consent banners.
Uploads screenshots to a specified Google Drive folder.
Updates the Google Sheet with direct links to the uploaded screenshots.
Logs processing details and errors to a file (playwright_processing_log.txt by default).
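
The capture step in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in main_playwright.py: the function names `screenshot_path` and `capture_full_page` and the file naming scheme are hypothetical. Only the Playwright calls themselves (`sync_playwright`, `chromium.launch`, `new_context`, `add_cookies`, `goto`, and `screenshot` with `full_page=True`) come from Playwright's sync API.

```python
import re
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def screenshot_path(url: str, directory: str = "screenshots") -> Path:
    """Derive a filesystem-safe PNG path from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/") or "page"
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw)
    return Path(directory) / f"{safe}.png"


def capture_full_page(url: str, headless: bool = True,
                      cookies: Optional[list] = None) -> Path:
    """Open the URL in Chromium and save a full-page screenshot."""
    # Imported here so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out = screenshot_path(url)
    out.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        context = browser.new_context()
        if cookies:
            # Add cookies before navigating so they are sent with the request.
            context.add_cookies(cookies)
        page = context.new_page()
        page.goto(url, wait_until="load")
        page.screenshot(path=str(out), full_page=True)
        browser.close()
    return out
```

Calling `capture_full_page("https://example.com/docs/page")` would write `screenshots/example.com_docs_page.png` under this naming scheme. Cookie banner handling, the upload, and the sheet update are left out of the sketch.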

Prerequisites

Python 3.8+
Access to Google Drive and Google Sheets APIs.
A Google Cloud Platform project with the Drive and Sheets APIs enabled.
Service account credentials (credentials.json) for accessing Google APIs.

Setup

Clone the repository (or create the project directory playwright-ai-agent).

Create a Python virtual environment (recommended):

```
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install dependencies:

```
pip install -r requirements_playwright.txt
```

Install Playwright browsers:
```
playwright install
```
This installs Playwright's default browsers (Chromium, Firefox, and WebKit). The script uses Chromium by default.

Set up Google Credentials:
- Follow the Google Cloud documentation to create a service account and download its JSON key file.
- Rename the key file to credentials.json and place it in the playwright-ai-agent root directory.
- Ensure this service account has the necessary permissions: edit access to the target Google Sheet and write access to the Google Drive folder. In practice this means sharing both with the service account's email address.
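
As a rough sketch of how the key file is typically used, the snippet below builds Sheets and Drive clients from credentials.json. It assumes the google-auth and google-api-python-client packages, which may not match what requirements_playwright.txt actually pins. The function names `build_clients` and `service_account_email` are hypothetical.

```python
import json
from typing import Tuple

# Scopes for read/write access to Sheets and Drive.
SCOPES = [
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive",
]


def build_clients(credentials_path: str = "credentials.json") -> Tuple[object, object]:
    """Return (sheets, drive) API clients authorized as the service account."""
    # Imported lazily; requires google-auth and google-api-python-client.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        credentials_path, scopes=SCOPES
    )
    sheets = build("sheets", "v4", credentials=creds)
    drive = build("drive", "v3", credentials=creds)
    return sheets, drive


def service_account_email(credentials_path: str = "credentials.json") -> str:
    """Read the address to share the sheet and folder with from the key file."""
    with open(credentials_path, encoding="utf-8") as fh:
        return json.load(fh)["client_email"]
```

`service_account_email()` returns the address to enter in the Share dialog of the sheet and the Drive folder.
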
Prepare cookies.json (optional but recommended):
- If the target websites require a login or another cookie-based state that you want to capture, log in manually and then export the cookies for the relevant domains with a browser extension (such as "Get cookies.txt" or similar). Make sure the extension can export JSON in a format compatible with Selenium/Playwright.
- Save these cookies as cookies.json in the playwright-ai-agent root directory.
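
Extension exports often carry extra fields and use different value names than Playwright's `add_cookies` accepts. The sketch below shows one way to normalize such a file. The input field names (`expirationDate`, `no_restriction`, `hostOnly`, `storeId`) are an assumption about a typical extension export, and `normalize_cookie` and `load_cookies` are hypothetical helpers, not functions from this project.

```python
import json
from typing import Any, Dict, List

# Map extension-style sameSite values to the values Playwright accepts.
_SAME_SITE = {"strict": "Strict", "lax": "Lax", "no_restriction": "None", "none": "None"}


def normalize_cookie(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields Playwright understands and fix their value formats."""
    cookie: Dict[str, Any] = {
        "name": raw["name"],
        "value": raw["value"],
        "domain": raw["domain"],
        "path": raw.get("path", "/"),
        "httpOnly": bool(raw.get("httpOnly", False)),
        "secure": bool(raw.get("secure", False)),
    }
    # Expiry is a Unix timestamp in seconds; session cookies have none.
    expires = raw.get("expirationDate", raw.get("expires"))
    if isinstance(expires, (int, float)) and expires > 0:
        cookie["expires"] = float(expires)
    # Drop unknown sameSite values (e.g. "unspecified") instead of passing them on.
    same_site = _SAME_SITE.get(str(raw.get("sameSite", "")).lower())
    if same_site:
        cookie["sameSite"] = same_site
    return cookie


def load_cookies(path: str = "cookies.json") -> List[Dict[str, Any]]:
    """Load a JSON array of exported cookies and normalize each entry."""
    with open(path, encoding="utf-8") as fh:
        return [normalize_cookie(c) for c in json.load(fh)]
```

The resulting list can be passed to `context.add_cookies(...)` on a Playwright browser context.
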
Configure Environment Variables:
- Create a .env file in the playwright-ai-agent root directory by copying .env.example:
```
cp .env.example .env
```
- Edit the .env file with your specific details:
  - SPREADSHEET_ID: The ID of your Google Sheet.
  - URL_RANGE: The range in your sheet where URLs are listed (e.g., Sheet1!B2:B). By default, is_url_processed and update_metadata assume that column B holds the URLs and column C holds the Google Drive links.
  - FOLDER_ID: The ID of the Google Drive folder where screenshots will be uploaded.
  - COOKIES_PATH: Path to your cookies file (default: cookies.json).
  - SCREENSHOTS_DIR: Temp directory for screenshots (default: screenshots).
  - LOG_FILE: Path for the log file (default: playwright_processing_log.txt).
  - GOOGLE_APPLICATION_CREDENTIALS: Path to your credentials file (default: credentials.json).
  - INTER_URL_DELAY_SECONDS: Delay between processing URLs (default: 3.0).
  - HEADLESS_BROWSER: Run Playwright in headless mode (True or False, default: True).
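
The settings above could be read into a typed object along these lines. This is a sketch under assumptions: the `Config` class and `load_config` function are hypothetical, the three settings without a documented default are treated as required, and the environment is expected to be populated already (loading the .env file itself, for instance with python-dotenv, is not shown).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    spreadsheet_id: str
    url_range: str
    folder_id: str
    cookies_path: str
    screenshots_dir: str
    log_file: str
    credentials_path: str
    inter_url_delay_seconds: float
    headless_browser: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Build a Config from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: settings without a documented default are mandatory.
    missing = [k for k in ("SPREADSHEET_ID", "URL_RANGE", "FOLDER_ID") if not env.get(k)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    return Config(
        spreadsheet_id=env["SPREADSHEET_ID"],
        url_range=env["URL_RANGE"],
        folder_id=env["FOLDER_ID"],
        cookies_path=env.get("COOKIES_PATH", "cookies.json"),
        screenshots_dir=env.get("SCREENSHOTS_DIR", "screenshots"),
        log_file=env.get("LOG_FILE", "playwright_processing_log.txt"),
        credentials_path=env.get("GOOGLE_APPLICATION_CREDENTIALS", "credentials.json"),
        inter_url_delay_seconds=float(env.get("INTER_URL_DELAY_SECONDS", "3.0")),
        # Any value other than "true" (case-insensitive) turns headless mode off.
        headless_browser=env.get("HEADLESS_BROWSER", "True").strip().lower() == "true",
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check in isolation, for example `load_config({"SPREADSHEET_ID": "...", "URL_RANGE": "Sheet1!B2:B", "FOLDER_ID": "..."})`.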

Running the Script

Once setup is complete, run the main script from the playwright-ai-agent directory:

```
python main_playwright.py
```

Project Structure

Refer to STRUCTURE.md for a detailed directory layout.

Logging

Detailed logs are saved to playwright_processing_log.txt (or as configured in .env).
Logs are also printed to the console.
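
Writing the same records to a file and to the console is commonly done by attaching two handlers to one logger. The sketch below uses only the standard library. The function name `setup_logging`, the logger name, and the message format are hypothetical and may differ from what the script does.

```python
import logging


def setup_logging(log_file: str = "playwright_processing_log.txt") -> logging.Logger:
    """Return a logger that writes each record to log_file and to the console."""
    logger = logging.getLogger("playwright_screenshotter")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate output if called more than once
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (
        logging.FileHandler(log_file, encoding="utf-8"),
        logging.StreamHandler(),  # writes to stderr by default
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A call such as `setup_logging().info("Processed %s", url)` then appends one timestamped line to the log file and prints the same line to the terminal.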

Notes

Ensure your Google Sheet is set up with URLs in the specified URL_RANGE (e.g., column B). The script expects to write GDrive links to the next column (e.g., column C).
The is_url_processed function in utils/gsheet_utils.py checks column C by default to see if a GDrive link already exists. Modify this if your sheet structure is different.
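
To make the column convention concrete, the sketch below computes the link cell that sits next to the n-th URL of a range such as Sheet1!B2:B. The helpers `next_column` and `link_cell` are hypothetical and are not taken from utils/gsheet_utils.py. They only illustrate the "next column, same row" rule and assume the range starts with an explicit cell like B2.

```python
import re

# Optional sheet name, then the column letters and row number of the first cell.
_RANGE_RE = re.compile(r"^(?:(?P<sheet>.+)!)?(?P<col>[A-Za-z]+)(?P<row>\d+)")


def next_column(col: str) -> str:
    """Return the column letter(s) immediately to the right (B -> C, Z -> AA)."""
    n = 0
    for ch in col.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    n += 1
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def link_cell(url_range: str, index: int) -> str:
    """A1 reference of the link cell for the index-th URL (0-based) in url_range."""
    m = _RANGE_RE.match(url_range)
    if m is None:
        raise ValueError(f"Unsupported range: {url_range!r}")
    sheet = f"{m['sheet']}!" if m["sheet"] else ""
    return f"{sheet}{next_column(m['col'])}{int(m['row']) + index}"
```

For example, `link_cell("Sheet1!B2:B", 3)` gives `Sheet1!C5`, the cell beside the fourth URL in the range.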
