Feature: Embedded JavaScript extraction (PDF) #97

@coderabbitai

Overview

Extract embedded JavaScript code from PDF documents for security analysis.

Parent Epic

Part of #91 - Document & Office Format Awareness

Description

PDFs can contain JavaScript in various locations (actions, annotations, form fields). Extract this code for analysis.

Implementation Details

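  • Parse JavaScript actions (action dictionaries with /S /JavaScript, script stored under /JS)
  • Extract from document-level scripts (the /JavaScript name tree under /Names)
  • Extract from the document open action (/OpenAction) and page additional actions (/AA)
  • Extract from form field actions (/AA on fields and widget annotations)
  • Handle JavaScript stored either as a string or as a stream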
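The lookup steps above can be sketched with a naive byte-level scan. This is only an illustration using the Python standard library, not the project's implementation: the names `extract_javascript`, `read_indirect`, `decode_literal_string` and `to_text` are hypothetical, and the scan assumes uncompressed object bodies (no object streams, no encryption) and treats non-UTF-16 strings as Latin-1 rather than true PDFDocEncoding.

```python
import re
import zlib

# /JS followed by a literal string, a hex string, or an indirect reference "N G R".
JS_KEY = re.compile(
    rb"/JS\s*(?:(?P<lit>\()|<(?P<hex>[0-9A-Fa-f\s]*)>|(?P<num>\d+)\s+(?P<gen>\d+)\s+R)"
)
SIMPLE_ESCAPES = {ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
                  ord("b"): b"\b", ord("f"): b"\f"}


def decode_literal_string(data: bytes, start: int) -> tuple[bytes, int]:
    """Decode a PDF literal string whose '(' is at data[start]."""
    out, depth, i = bytearray(), 1, start + 1
    while i < len(data):
        c = data[i]
        if c == 0x5C and i + 1 < len(data):  # backslash escape
            i += 1
            e = data[i]
            if e in SIMPLE_ESCAPES:
                out += SIMPLE_ESCAPES[e]
            elif 0x30 <= e <= 0x37:  # octal escape, up to three digits
                j = i
                while j < len(data) and j < i + 3 and 0x30 <= data[j] <= 0x37:
                    j += 1
                out.append(int(data[i:j], 8) & 0xFF)
                i = j - 1
            elif e in b"\r\n":  # line continuation: emit nothing
                if e == 0x0D and data[i + 1:i + 2] == b"\n":
                    i += 1
            else:  # \( \) \\ and unknown escapes keep the character
                out.append(e)
        elif c == 0x28:  # nested, balanced '('
            depth += 1
            out.append(c)
        elif c == 0x29:
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
            out.append(c)
        else:
            out.append(c)
        i += 1
    return bytes(out), i  # unterminated string: return what was read


def read_indirect(data: bytes, num: int, gen: int) -> bytes:
    """Fetch the script held by object 'num gen', as a stream or a string."""
    obj = re.search(rb"(?<!\d)%d\s+%d\s+obj\b(.*?)endobj" % (num, gen), data, re.S)
    if obj is None:
        return b""
    body = obj.group(1)
    stream = re.search(rb"stream\r?\n(.*?)endstream", body, re.S)
    if stream is None:
        body = body.strip()
        return decode_literal_string(body, 0)[0] if body.startswith(b"(") else b""
    if b"/FlateDecode" in body[:stream.start()]:
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream"
            return zlib.decompressobj().decompress(stream.group(1))
        except zlib.error:
            return b""
    return stream.group(1).rstrip(b"\r\n")


def to_text(raw: bytes) -> str:
    """UTF-16BE when a BOM is present; Latin-1 as a PDFDocEncoding stand-in."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")


def extract_javascript(data: bytes) -> list[str]:
    """Return every script found under a /JS key, in file order."""
    scripts = []
    for m in JS_KEY.finditer(data):
        if m.group("lit"):
            raw, _ = decode_literal_string(data, m.start("lit"))
        elif m.group("hex") is not None:
            digits = re.sub(rb"\s+", b"", m.group("hex"))
            if len(digits) % 2:  # odd digit count: final digit is padded with 0
                digits += b"0"
            raw = bytes.fromhex(digits.decode("ascii"))
        else:
            raw = read_indirect(data, int(m.group("num")), int(m.group("gen")))
        if raw:
            scripts.append(to_text(raw))
    return scripts
```

Because the scan keys on /JS regardless of where the action dictionary is referenced from, it picks up document-level, open-action, page and form-field scripts alike, but it cannot say which location a script came from. A real implementation would likely walk the object graph with a proper PDF parser instead.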

String Sources

  • JavaScript code
  • Function names
  • Variable names
  • String literals within JavaScript
  • API calls (app., doc., etc.)
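As a rough illustration of how these string sources might be pulled out of an extracted script, here is a regex-based sketch. It is an assumption-laden approximation rather than a JavaScript parser: `extract_strings` is a hypothetical helper, the list of API root objects is illustrative only, comments and regex literals in the script can confuse it, and only the first name in a `var a, b` declaration is captured.

```python
import re

IDENT = r"[A-Za-z_$][\w$]*"
FUNC_DECL = re.compile(r"\bfunction\s+(" + IDENT + r")")
VAR_DECL = re.compile(r"\b(?:var|let|const)\s+(" + IDENT + r")")
# Single- or double-quoted literal, allowing backslash escapes inside.
STRING_LIT = re.compile(r"""(["'])((?:\\.|(?!\1)[^\\\n])*)\1""")
# Dotted call on an assumed set of Acrobat-style root objects.
API_CALL = re.compile(
    r"\b((?:app|this|doc|util|event|console|Collab|media)(?:\." + IDENT + r")+)\s*\("
)


def extract_strings(js: str) -> dict[str, list[str]]:
    """Collect candidate strings from JavaScript source, in source order."""
    return {
        "functions": FUNC_DECL.findall(js),
        "variables": VAR_DECL.findall(js),
        "string_literals": [body for _quote, body in STRING_LIT.findall(js)],
        "api_calls": API_CALL.findall(js),
    }
```

The full script text would be emitted alongside these categories, so nothing is lost when the patterns miss a construct.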

Acceptance Criteria

  • Extract document-level JavaScript
  • Extract page-level JavaScript
  • Extract form field scripts
  • Handle obfuscated JavaScript
  • Pretty-print JavaScript output
  • Tests with JS-enabled PDFs
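For the obfuscation criterion, one possible starting point is static constant folding that never executes the script. The sketch below is an assumption about what "handle obfuscated JavaScript" could mean in practice: the hypothetical `deobfuscate` helper rewrites only `String.fromCharCode(...)` calls with numeric literal arguments and `unescape(...)` calls with a plain string literal, and leaves everything else untouched. It does not cover pretty-printing.

```python
import json
import re

# Hex literal or decimal literal without a leading zero.
NUM = r"(?:0[xX][0-9a-fA-F]+|[1-9]\d*|0)"
FROM_CHAR_CODE = re.compile(
    r"String\.fromCharCode\(\s*(" + NUM + r"(?:\s*,\s*" + NUM + r")*)\s*\)"
)
UNESCAPE = re.compile(
    r"""unescape\(\s*(["'])((?:%u[0-9a-fA-F]{4}|%[0-9a-fA-F]{2}|[^%"'\\])*)\1\s*\)"""
)
PERCENT = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def _fold_char_codes(match: re.Match) -> str:
    codes = [int(tok.strip(), 0) for tok in match.group(1).split(",")]
    # fromCharCode keeps only the low 16 bits of each argument.
    return json.dumps("".join(chr(code & 0xFFFF) for code in codes))


def _fold_unescape(match: re.Match) -> str:
    text = PERCENT.sub(
        lambda p: chr(int(p.group(1) or p.group(2), 16)), match.group(2)
    )
    return json.dumps(text)


def deobfuscate(js: str, max_passes: int = 5) -> str:
    """Fold constant decoder calls into string literals until nothing changes."""
    for _ in range(max_passes):
        folded = UNESCAPE.sub(_fold_unescape, FROM_CHAR_CODE.sub(_fold_char_codes, js))
        if folded == js:
            break
        js = folded
    return js
```

Folded values are emitted with `json.dumps`, so they come out as double-quoted, ASCII-escaped string literals. Running the string-source extraction again on the folded output can surface literals that were hidden in the original script.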

Security Note

This matters for malware analysis because malicious PDFs often carry JavaScript exploits.

Related

Project: #76
