122 changes: 121 additions & 1 deletion README.md
@@ -209,6 +209,126 @@ This project uses a Docker-based approach for AWS Lambda, which offers several a

## 🚀 Getting Started

### Function Variants

This project provides two variants of the PDF-to-JPG converter function:

#### 1. Standard Version (`pdf-to-jpg-converter`)

- Returns all converted images in a single ZIP file
- Ideal for handling multi-page PDFs where you want to keep all pages together
- Requires additional processing in n8n to unzip the contents

#### 2. Unzipped Version (`pdf-to-jpg-converter-unzipped`)

- Returns individual JPG images directly in the response (no ZIP file)
- Each image is returned as a base64-encoded string with its filename and content type
- Simplifies processing in n8n as no unzipping is required
- Ideal when you need to process each page individually
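
To make the difference concrete, here is a minimal sketch of how a caller might decode the unzipped variant's response body. It assumes the body has the shape used by the n8n snippet in this README (`images` entries with `filename`, `content`, and `content_type`, plus `total_pages`). `decodeImages` and the sample payload are hypothetical and not part of this project.

```javascript
// Hypothetical helper: turn the unzipped variant's response body into raw image bytes.
// Assumed shape: { total_pages, images: [{ filename, content, content_type }] }
function decodeImages(responseBody) {
  if (!responseBody || !Array.isArray(responseBody.images)) {
    throw new Error('Expected responseBody.images to be an array');
  }
  return responseBody.images.map((image) => ({
    filename: image.filename,
    contentType: image.content_type,
    // `content` is base64 text; Buffer.from decodes it into bytes
    bytes: Buffer.from(image.content, 'base64'),
  }));
}

// Made-up payload for illustration (the content is not a real JPG)
const samplePayload = {
  total_pages: 1,
  images: [
    {
      filename: 'page_1.jpg',
      content: Buffer.from('fake-jpg').toString('base64'),
      content_type: 'image/jpeg',
    },
  ],
};

const decodedImages = decodeImages(samplePayload);
console.log(decodedImages[0].filename, decodedImages[0].bytes.length);
```

With the standard variant, the same bytes only become available after the ZIP archive has been unpacked.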

### Deploying the Functions

#### Standard Version (ZIP output)
```bash
# Make the deployment script executable
chmod +x build_and_deploy.sh

# Run the deployment script
./build_and_deploy.sh
```

#### Unzipped Version (Individual JPGs)
```bash
# Make the deployment script executable
chmod +x build_and_deploy_unzipped.sh

# Run the deployment script
./build_and_deploy_unzipped.sh
```

### n8n Integration

#### For the Standard Version (ZIP output)
```javascript
/**
* Make sure you have installed `jszip` in your n8n environment!
* For example, in your Dockerfile or on your server:
* npm install jszip
*/

const JSZip = require('jszip');

// 1) Get the base64-encoded string of the ZIP data.
const base64Data = $input.first().json.result.body;
const binaryData = Buffer.from(base64Data, 'base64');

// 2) Load the ZIP content using jszip
const zip = new JSZip();
return zip.loadAsync(binaryData)
.then(async (contents) => {
const items = [];

// 3) Loop over each file in the ZIP
for (const fileName of Object.keys(contents.files)) {
const file = contents.files[fileName];

// If it's not a directory, read the file contents
if (!file.dir) {
const fileBuffer = await file.async('nodebuffer');

// 4) Return each unzipped file as a *separate* n8n item
items.push({
json: {
fileName
},
binary: {
// Use a property name like "data" or anything you want
data: {
data: fileBuffer.toString('base64'),
mimeType: 'image/jpeg', // the converter produces JPG files
fileName
}
}
});
}
}

// Return an array of items, each with a single unzipped file
return items;
});
```
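
If the archive could ever contain something other than JPGs, the MIME type can be derived from the file extension rather than fixed in the loop. The sketch below is one possible way to do that. `mimeTypeFor` and its lookup table are illustrative assumptions, not part of this project.

```javascript
// Hypothetical helper: choose a MIME type from a file name's extension.
const MIME_BY_EXTENSION = new Map([
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
]);

function mimeTypeFor(fileName) {
  const dotIndex = fileName.lastIndexOf('.');
  const extension = dotIndex === -1 ? '' : fileName.slice(dotIndex + 1).toLowerCase();
  // Fall back to a generic binary type when the extension is unknown
  return MIME_BY_EXTENSION.get(extension) || 'application/octet-stream';
}

console.log(mimeTypeFor('page_1.JPG'));
console.log(mimeTypeFor('notes'));
```

Inside the loop above, `mimeType: mimeTypeFor(fileName)` would take the place of the fixed string.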

#### For the Unzipped Version (Individual JPGs)
```javascript
// 1) Get the input data from the Lambda response
const lambdaResponse = $input.first().json.result;

// 2) Parse the body if it's a string, or use it directly if it's already an object
let responseBody;
try {
responseBody = typeof lambdaResponse.body === 'string'
? JSON.parse(lambdaResponse.body)
: lambdaResponse.body;
} catch (error) {
throw new Error(`Failed to parse Lambda response body: ${error.message}`);
}

// 3) Convert each image into an n8n item
return responseBody.images.map(image => ({
json: {
fileName: image.filename,
totalPages: responseBody.total_pages
},
binary: {
data: {
data: image.content,
mimeType: image.content_type,
fileName: image.filename
}
}
}));
```
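
The snippet above assumes the Lambda call succeeded and that `images` is present. A more defensive variant might check the response before mapping it. The sketch below assumes the Lambda result carries an API Gateway style `statusCode` next to `body`, and that failures put a message in an `error` field of the body. Neither is confirmed by this README, so adjust both to what the function actually returns. `assertLambdaSuccess` is a hypothetical name.

```javascript
// Hypothetical guard: fail early with a readable message when the response is unusable.
// Assumption: `statusCode` may be present on the Lambda result; if absent, it is not checked.
// Assumption: failed conversions put a message in `parsedBody.error`.
function assertLambdaSuccess(lambdaResult, parsedBody) {
  const status = lambdaResult.statusCode;
  if (status !== undefined && status !== 200) {
    const detail = parsedBody && parsedBody.error ? parsedBody.error : 'no error detail';
    throw new Error(`Lambda returned status ${status}: ${detail}`);
  }
  if (!parsedBody || !Array.isArray(parsedBody.images) || parsedBody.images.length === 0) {
    throw new Error('Lambda response contains no images');
  }
  return parsedBody.images;
}

// Example with made-up data
const checkedImages = assertLambdaSuccess(
  { statusCode: 200 },
  { total_pages: 1, images: [{ filename: 'page_1.jpg', content: '', content_type: 'image/jpeg' }] }
);
console.log(checkedImages.length);
```

In the n8n Code node, calling `assertLambdaSuccess(lambdaResponse, responseBody)` before the `map` would surface a failed conversion as a clear node error instead of a `TypeError` on `undefined`.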

### Prerequisites

- AWS Account
@@ -375,4 +495,4 @@ If you need this solution built for you or want personalized guidance, you can s

## 📄 License

MIT
MIT
161 changes: 161 additions & 0 deletions build_and_deploy_unzipped.sh
@@ -0,0 +1,161 @@
#!/bin/bash

# Set variables
LAMBDA_FUNCTION_NAME="pdf-to-jpg-converter-unzipped"
ECR_REPOSITORY_NAME="pdf-to-jpg-converter-unzipped"
LAMBDA_ROLE_NAME="lambda-execution-role"
AWS_REGION="us-east-1" # Change to your preferred region
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color

# Disable AWS CLI pager to avoid needing to press 'q'
export AWS_PAGER=""

echo -e "${YELLOW}Building and deploying Docker-based Lambda function...${NC}"

# Check if the Lambda execution role exists, create if it doesn't
echo -e "${YELLOW}Checking if Lambda execution role exists...${NC}"
if ! aws iam get-role --role-name ${LAMBDA_ROLE_NAME} &> /dev/null; then
echo -e "${YELLOW}Creating Lambda execution role: ${LAMBDA_ROLE_NAME}${NC}"
if ! aws iam create-role --role-name ${LAMBDA_ROLE_NAME} \
--assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}' \
&> /dev/null; then
echo -e "${RED}Failed to create Lambda execution role.${NC}"
exit 1
fi

if ! aws iam attach-role-policy --role-name ${LAMBDA_ROLE_NAME} \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole \
&> /dev/null; then
echo -e "${RED}Failed to attach the basic execution policy to the Lambda role.${NC}"
exit 1
fi

# IAM changes are eventually consistent, so give the new role time to propagate
echo -e "${YELLOW}Waiting for role to propagate (10 seconds)...${NC}"
sleep 10
echo -e "${GREEN}Lambda execution role created successfully.${NC}"
else
echo -e "${GREEN}Lambda execution role already exists.${NC}"
fi

# Get the role ARN
LAMBDA_ROLE_ARN=$(aws iam get-role --role-name ${LAMBDA_ROLE_NAME} --query 'Role.Arn' --output text)
echo -e "${GREEN}Using Lambda role: ${LAMBDA_ROLE_ARN}${NC}"

# Create ECR repository if it doesn't exist
echo -e "${YELLOW}Checking if ECR repository exists...${NC}"
if ! aws ecr describe-repositories --repository-names ${ECR_REPOSITORY_NAME} --region ${AWS_REGION} &> /dev/null; then
echo -e "${YELLOW}Creating ECR repository: ${ECR_REPOSITORY_NAME}${NC}"
aws ecr create-repository --repository-name ${ECR_REPOSITORY_NAME} --region ${AWS_REGION}
if [ $? -ne 0 ]; then
echo -e "${RED}Failed to create ECR repository.${NC}"
exit 1
fi
else
echo -e "${GREEN}ECR repository already exists.${NC}"
fi

# Authenticate Docker to ECR
echo -e "${YELLOW}Authenticating Docker to ECR...${NC}"
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
if [ $? -ne 0 ]; then
echo -e "${RED}Failed to authenticate Docker to ECR.${NC}"
exit 1
fi

# Create a temporary Dockerfile for the unzipped version
# (the quoted delimiter keeps backslashes and any $ in the content literal)
echo -e "${YELLOW}Creating temporary Dockerfile...${NC}"
cat > Dockerfile.unzipped << 'EOL'
FROM public.ecr.aws/lambda/python:3.9

# Install system dependencies
RUN yum update -y && \
yum install -y poppler poppler-utils && \
yum clean all

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt --no-cache-dir

# Copy function code
COPY lambda_function_unzipped.py /var/task/app.py

CMD [ "app.lambda_handler" ]
EOL

# Build Docker image
echo -e "${YELLOW}Building Docker image...${NC}"
docker build --platform linux/amd64 -t ${ECR_REPOSITORY_NAME}:latest -f Dockerfile.unzipped .
if [ $? -ne 0 ]; then
echo -e "${RED}Failed to build Docker image.${NC}"
exit 1
fi

# Tag Docker image
echo -e "${YELLOW}Tagging Docker image...${NC}"
docker tag ${ECR_REPOSITORY_NAME}:latest ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPOSITORY_NAME}:latest
if [ $? -ne 0 ]; then
echo -e "${RED}Failed to tag Docker image.${NC}"
exit 1
fi

# Push Docker image to ECR
echo -e "${YELLOW}Pushing Docker image to ECR...${NC}"
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPOSITORY_NAME}:latest
if [ $? -ne 0 ]; then
echo -e "${RED}Failed to push Docker image to ECR.${NC}"
exit 1
fi

# Check if Lambda function exists
echo -e "${YELLOW}Checking if Lambda function exists...${NC}"
if aws lambda get-function --function-name ${LAMBDA_FUNCTION_NAME} --region ${AWS_REGION} &> /dev/null; then
# Update existing Lambda function
echo -e "${YELLOW}Updating existing Lambda function...${NC}"
aws lambda update-function-code \
--function-name ${LAMBDA_FUNCTION_NAME} \
--image-uri ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPOSITORY_NAME}:latest \
--region ${AWS_REGION}

if [ $? -ne 0 ]; then
echo -e "${RED}Failed to update Lambda function.${NC}"
exit 1
fi
else
# Create new Lambda function
echo -e "${YELLOW}Creating new Lambda function...${NC}"
aws lambda create-function \
--function-name ${LAMBDA_FUNCTION_NAME} \
--package-type Image \
--code ImageUri=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPOSITORY_NAME}:latest \
--role ${LAMBDA_ROLE_ARN} \
--timeout 30 \
--memory-size 1024 \
--region ${AWS_REGION}

if [ $? -ne 0 ]; then
echo -e "${RED}Failed to create Lambda function.${NC}"
exit 1
fi

# Set up CloudWatch Logs log group with retention
echo -e "${YELLOW}Creating CloudWatch Logs log group...${NC}"
LOG_GROUP_NAME="/aws/lambda/${LAMBDA_FUNCTION_NAME}"

aws logs create-log-group --log-group-name ${LOG_GROUP_NAME} --region ${AWS_REGION}
aws logs put-retention-policy --log-group-name ${LOG_GROUP_NAME} --retention-in-days 7 --region ${AWS_REGION}
fi

# Clean up temporary Dockerfile
rm Dockerfile.unzipped

echo -e "${GREEN}Docker-based Lambda function deployed successfully!${NC}"
echo -e "${GREEN}Function ARN: $(aws lambda get-function --function-name ${LAMBDA_FUNCTION_NAME} --query 'Configuration.FunctionArn' --output text --region ${AWS_REGION})${NC}"
echo -e "${GREEN}You can invoke this Lambda directly using the AWS CLI or SDK:${NC}"
echo -e "${GREEN}aws lambda invoke --function-name ${LAMBDA_FUNCTION_NAME} --region ${AWS_REGION} --payload '<base64-encoded-pdf-or-url-json>' output.txt${NC}"
echo -e "${GREEN}Deployment complete!${NC}"