Riddle edited this page Jul 27, 2025 · 2 revisions

Yes, this is AI-generated. It's better than what you had before (nothing).


phply Documentation

Table of Contents

  1. Overview
  2. Installation
  3. Getting Started: Programmatic Usage
  4. Command-Line Tools
  5. How-To Guides
  6. API Reference
  7. Project Information

1. Overview

What is phply?

phply is a parser for the PHP programming language written entirely in Python. It uses the PLY (Python Lex-Yacc) library to implement a lexer that aims to be token-for-token compatible with PHP's own, and a parser that generates a detailed Abstract Syntax Tree (AST) for PHP source code.

This allows developers to programmatically read, analyze, and transform PHP code from within a Python environment.

Why phply?

In the author's own words, the project exists "Because I'm crazy. Because it seemed possible." It is useful for a variety of tasks that involve manipulating PHP source code:

  • Static Analysis: Build custom linters, security scanners, or code metric tools for PHP projects.
  • Code Transformation: Automatically refactor or upgrade legacy PHP codebases.
  • Cross-language Interoperability: Convert PHP code or templates into other languages or formats, such as Python or Jinja2.
  • Education: Learn about the intricacies of parsing a complex, real-world language like PHP.

Core Features

  • PHP-Compatible Lexer: The lexer is designed to produce the same tokens as PHP's own token_get_all() function.
  • Rich Abstract Syntax Tree (AST): The parser generates a comprehensive AST that mirrors the syntactic structure of the input PHP code.
  • Command-Line Tools: A suite of utilities is included for common tasks like converting PHP to JSON, Python, or Jinja2 templates.
  • Pure Python: phply is written in pure Python and depends only on ply, making it easy to install and integrate into Python projects.
  • Resilient Parsing: The lexer and parser are designed to handle many of PHP's syntactic quirks, like the optional closing tag ?> and its interaction with semicolons.

Current Status

phply is considered Production/Stable. It successfully parses most of the PHP 5.x grammar, including complex features like classes, namespaces, closures, and traits. Some newer PHP 7+ features may not be supported.

What's working:

  • Lexer matching the standard PHP lexer.
  • Parser and AST for the vast majority of the PHP grammar.
  • Scripts to convert PHP source to a JSON AST and, experimentally, to Jinja2 templates.

What's not (or incomplete):

  • Labels and goto.
  • Full support for the latest PHP 7 and 8 syntax.

2. Installation

You can install phply directly from PyPI using pip. The ply library, its only dependency, will be installed automatically.

pip install phply

To install for development and run tests:

git clone https://github.com/viraptor/phply.git
cd phply
pip install -e .[test]
pytest

3. Getting Started: Programmatic Usage

The two main components you will interact with are the lexer and the parser.

Lexing: Turning PHP Code into Tokens

The lexer, found in phply.phplex, breaks a string of PHP code into a sequence of tokens.

import phply.phplex

php_code = "<?php echo 'Hello, ' . $world; ?>"

# 1. Create a lexer instance (a fresh clone of the module-level lexer)
lexer = phply.phplex.lexer.clone()
lexer.input(php_code)

# 2. Iterate through tokens
while True:
    token = lexer.token()
    if not token:
        break
    # The FilteredLexer (phply.phplex.lexer) drops whitespace, comments and
    # the <?php tag, rewrites <?= as ECHO, and turns ?> into a SEMI token
    # only where a semicolon is missing (otherwise ?> is dropped).
    print(f"Type: {token.type}, Value: {token.value!r}, Line: {token.lineno}")

This will output:

Type: ECHO, Value: 'echo', Line: 1
Type: CONSTANT_ENCAPSED_STRING, Value: "'Hello, '", Line: 1
Type: CONCAT, Value: '.', Line: 1
Type: VARIABLE, Value: '$world', Line: 1
Type: SEMI, Value: ';', Line: 1

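Notice that the <?php open tag, the whitespace, and the closing ?> tag never reach you: phply.phplex.lexer (a FilteredLexer) drops them. Because this statement already ends with a semicolon, the ?> is simply discarded. If the semicolon were omitted (e.g. <?php echo $x ?>), the lexer would rewrite the ?> as a SEMI token instead, which is how PHP's optional semicolon before a closing tag is supported.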
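To make the filtering rules concrete, here is a simplified, self-contained model of them that works on plain (type, value) tuples instead of ply tokens. The function name filter_tokens, the tuple representation, and the exact set of tokens after which no semicolon is inserted are assumptions for illustration; phply's FilteredLexer in phplex.py is the reference and handles additional edge cases.

```python
from typing import Iterable, Iterator, Optional, Tuple

Token = Tuple[str, str]  # (type, value): a stand-in for a ply LexToken

# Tokens that never reach the parser.
NOISE = {"WHITESPACE", "COMMENT", "DOC_COMMENT"}
# Assumed: after these token types, a closing ?> needs no implied semicolon.
NO_SEMI_AFTER = {"OPEN_TAG", "SEMI", "COLON", "LBRACE", "RBRACE"}


def filter_tokens(tokens: Iterable[Token]) -> Iterator[Token]:
    """Simplified model of the FilteredLexer rules (not phply's actual code)."""
    last: Optional[str] = None  # type of the last kept token, or OPEN_TAG
    for ttype, value in tokens:
        if ttype in NOISE:
            continue  # whitespace and comments are dropped
        if ttype == "OPEN_TAG":
            last = ttype  # remembered so that "<?php ?>" yields nothing
            continue
        if ttype == "OPEN_TAG_WITH_ECHO":
            ttype = "ECHO"  # "<?= $x ?>" behaves like "echo $x;"
        elif ttype == "CLOSE_TAG":
            if last in NO_SEMI_AFTER:
                continue  # statement already terminated: drop the ?>
            ttype, value = "SEMI", ";"  # ?> stands in for the missing ';'
        last = ttype
        yield ttype, value


# <?php echo $x ?>  (no semicolon before the close tag)
tokens = [("OPEN_TAG", "<?php "), ("ECHO", "echo"), ("WHITESPACE", " "),
          ("VARIABLE", "$x"), ("WHITESPACE", " "), ("CLOSE_TAG", "?>")]
print(list(filter_tokens(tokens)))
# [('ECHO', 'echo'), ('VARIABLE', '$x'), ('SEMI', ';')]
```

Keeping the "last significant token" as the only state is what lets a single pass decide whether a ?> must become a semicolon.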

Parsing: Building the Abstract Syntax Tree (AST)

The parser, created with phply.phpparse.make_parser(), consumes tokens from the lexer to build an AST. The nodes of this tree are defined in phply.phpast.

import phply.phplex
import phply.phpparse
from pprint import pprint

php_code = "<?php if ($debug) { echo 'Debugging...'; } ?>"

# 1. Create a parser instance
parser = phply.phpparse.make_parser()

# 2. The parser's parse() method takes the code and a fresh lexer instance
nodes = parser.parse(php_code, lexer=phply.phplex.lexer.clone())

# 3. The result is a list of AST nodes
pprint(nodes)

The output will be a list containing a single If node object (the repr lists field values positionally):

[If(Variable('$debug'), Block([Echo(['Debugging...'])]), [], None)]

This tree structure can now be walked and analyzed. For a more readable view of the AST, use the php2json tool.

4. Command-Line Tools

phply comes with several helpful command-line scripts.

phplex

Tokenizes a PHP file and prints the token stream. This is useful for debugging the lexer.

Usage:

phplex /path/to/your/file.php

phpparse

Parses a PHP file and pretty-prints the resulting AST using Python's pprint.

Usage:

phpparse /path/to/your/file.php

php2json

Converts a PHP file into a JSON representation of its AST. This is the most convenient way to inspect the AST structure.

Usage:

# Read from a file
php2json < /path/to/your/file.php > output.json

# Pipe from stdin
echo '<?php $a = 1 + 2;' | php2json

Example Output:

[
  [
    "Assignment",
    {
      "lineno": 1,
      "node": [
        "Variable",
        {
          "lineno": 1,
          "name": "$a"
        }
      ],
      "expr": [
        "BinaryOp",
        {
          "lineno": 1,
          "op": "+",
          "left": 1,
          "right": 2
        }
      ],
      "is_ref": false
    }
  ]
]
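Because php2json emits every node as a two-element [class name, fields] array, its output can be post-processed with nothing but the standard library. The sketch below is a hypothetical helper, not part of phply: it walks that structure and counts node types, using the example output above as its input.

```python
import json
from collections import Counter
from typing import Any, Iterator, Tuple

# The php2json example output for: <?php $a = 1 + 2;
PHP2JSON_OUTPUT = """
[["Assignment", {"lineno": 1,
  "node": ["Variable", {"lineno": 1, "name": "$a"}],
  "expr": ["BinaryOp", {"lineno": 1, "op": "+", "left": 1, "right": 2}],
  "is_ref": false}]]
"""


def is_node(value: Any) -> bool:
    """php2json writes each node as a two-element [class name, fields] list."""
    return (isinstance(value, list) and len(value) == 2
            and isinstance(value[0], str) and isinstance(value[1], dict))


def iter_nodes(value: Any) -> Iterator[Tuple[str, dict]]:
    """Yield (class name, fields) for every node, parents before children."""
    if is_node(value):
        name, fields = value
        yield name, fields
        for child in fields.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):  # a plain list, e.g. the top-level statements
        for item in value:
            yield from iter_nodes(item)


tree = json.loads(PHP2JSON_OUTPUT)
counts = Counter(name for name, _ in iter_nodes(tree))
print(dict(counts))  # {'Assignment': 1, 'Variable': 1, 'BinaryOp': 1}
```

To run it on real files, replace the constant with json.load(sys.stdin) and pipe php2json into the script.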

php2python (Experimental)

Converts PHP source code to Python. The conversion is not perfect, as many PHP constructs do not have a direct equivalent in Python, but it can be a useful starting point.

Usage:

php2python < /path/to/your/file.php > output.py

Example input PHP:

<?php
function greet($name) {
    echo "Hello, " . $name;
}
greet("World");

Output Python:

def greet(name):
    (print("Hello, " + name, end=''))
(greet("World"))

php2jinja (Experimental)

Converts a PHP file into a Jinja2 template. This tool is primarily focused on converting PHP template files that mix HTML with PHP logic.

Usage:

php2jinja < /path/to/your/template.phtml > output.html

Mapping Logic:

  • <?php echo $var; ?> becomes {{ var }}
  • <?php if (...): ?> becomes {% if ... %}
  • <?php foreach (...): ?> becomes {% for ... %}
  • <?php include '...'; ?> becomes {% include '...' %}

phpshell

A proof-of-concept interactive PHP interpreter. It reads PHP code line by line, parses it, converts it to a Python AST, compiles it, and executes it.

Usage:

phpshell
php> echo "Hello, World!";
Hello, World!
php> $a = 10;
php> $b = 20;
php> print $a + $b;
30
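The last two steps phpshell performs, compiling a Python AST and executing it, can be done with Python's standard ast module alone. In this sketch the tree is built by hand (for print("Hello, " + name, end=''), the php2python output style shown earlier) rather than produced by phply, and a plain dict stands in for the shell's variable state; both are assumptions for illustration.

```python
import ast

# Hand-built Python AST for: print("Hello, " + name, end='')
call = ast.Call(
    func=ast.Name(id="print", ctx=ast.Load()),
    args=[ast.BinOp(left=ast.Constant("Hello, "), op=ast.Add(),
                    right=ast.Name(id="name", ctx=ast.Load()))],
    keywords=[ast.keyword(arg="end", value=ast.Constant(""))],
)
module = ast.Module(body=[ast.Expr(value=call)], type_ignores=[])
ast.fix_missing_locations(module)  # compile() needs lineno/col_offset on every node

code = compile(module, filename="<php>", mode="exec")
state = {"name": "World"}  # plays the role of the shell's PHP variables
exec(code, state)  # prints: Hello, World (no trailing newline)
```

Reusing the same dict across exec() calls is what lets variables assigned on one prompt line survive to the next.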

5. How-To Guides

How to Traverse the AST with a Visitor

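The phpast.Node objects have an accept(visitor) method that implements the visitor pattern: it calls visitor(node) on the node itself and on each of its descendant nodes. Any callable works as a visitor, so a simple class with a __call__ method is enough to walk the tree and act on specific nodes.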

This example finds all function calls in a PHP script:

import phply.phplex
import phply.phpparse
from phply.phpast import FunctionCall, MethodCall

class FunctionCallVisitor:
    def __init__(self):
        self.calls = []

    def __call__(self, node):
        # This method is called for every node in the tree
        if isinstance(node, FunctionCall):
            self.calls.append(f"Function: {node.name}")
        elif isinstance(node, MethodCall):
            self.calls.append(f"Method: {node.name}")

php_code = """<?php
some_function(1, 2);
$my_object->do_something();
"""

# Parse the code
parser = phply.phpparse.make_parser()
nodes = parser.parse(php_code, lexer=phply.phplex.lexer.clone())

# Create and use the visitor
visitor = FunctionCallVisitor()
for tree_node in nodes:
    tree_node.accept(visitor)

# Print the results
print("Function and method calls found:")
for call in visitor.calls:
    print(f"- {call}")

Output:

Function and method calls found:
- Function: some_function
- Method: do_something
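As the number of node types you care about grows, a chain of isinstance checks gets unwieldy. One common variation, sketched here as an assumption rather than something phply provides, is a callable base class that dispatches on the node's class name to visit_<ClassName> methods. Since accept() only needs a callable, an instance can be passed to it exactly like FunctionCallVisitor. The stand-in node classes at the bottom exist only so the sketch runs without phply.

```python
class DispatchVisitor:
    """Callable visitor: routes each node to visit_<ClassName> if defined."""

    def __call__(self, node):
        handler = getattr(self, "visit_" + type(node).__name__, None)
        if handler is not None:
            handler(node)


class CallCollector(DispatchVisitor):
    """Collects function and method call names, like FunctionCallVisitor."""

    def __init__(self):
        self.calls = []

    def visit_FunctionCall(self, node):
        self.calls.append(("function", node.name))

    def visit_MethodCall(self, node):
        self.calls.append(("method", node.name))


# Stand-in node classes so the sketch runs without phply. With phply,
# call tree_node.accept(collector) on each parsed node instead.
class FunctionCall:
    def __init__(self, name):
        self.name = name


class MethodCall:
    def __init__(self, name):
        self.name = name


collector = CallCollector()
for stand_in in [FunctionCall("some_function"), MethodCall("do_something")]:
    collector(stand_in)
print(collector.calls)  # [('function', 'some_function'), ('method', 'do_something')]
```

Nodes without a matching method are silently skipped, so subclasses implement only the hooks they need. Dispatching on the exact class name means a handler does not fire for subclasses of that node type; fall back to isinstance checks if that matters.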

How to Perform Simple Static Analysis

Building on the visitor pattern, you can create simple static analysis tools. For example, the following script finds all uses of PHP's dangerous eval() (a language construct rather than a function, which phply represents as an Eval node).

import phply.phplex
import phply.phpparse
from phply.phpast import Eval

class EvalVisitor:
    def __init__(self):
        self.evals_found = []

    def __call__(self, node):
        if isinstance(node, Eval):
            self.evals_found.append(node)

# In a real script, you would read this from a file
php_code_with_eval = """<?php
$code = $_GET['code'];
eval($code); // Dangerous!
"""

parser = phply.phpparse.make_parser()
nodes = parser.parse(php_code_with_eval, lexer=phply.phplex.lexer.clone())

visitor = EvalVisitor()
for tree_node in nodes:
    tree_node.accept(visitor)

if visitor.evals_found:
    for node in visitor.evals_found:
        print(f"Dangerous 'eval' call found at line: {node.lineno}")
else:
    print("'eval' not found.")

Output:

Dangerous 'eval' call found at line: 3
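The same check can also run on php2json output, which is handy when you want to scan many files from a shell pipeline without importing phply. The sketch below assumes the [class name, fields] layout and lineno values shown in the php2json section; the denylist of functions and the sample input are hand-written for illustration, not produced by the tool. Note that eval appears as an Eval node, while ordinary functions such as system appear as FunctionCall nodes.

```python
import json
from typing import Any, Iterator, List, Optional, Tuple

# Hypothetical denylist; extend it to match your own policy.
DANGEROUS_FUNCTIONS = {"system", "exec", "shell_exec", "passthru"}


def walk(value: Any) -> Iterator[Tuple[str, dict]]:
    """Yield (class name, fields) for each [name, {fields}] pair, recursively."""
    if not isinstance(value, list):
        return
    if len(value) == 2 and isinstance(value[0], str) and isinstance(value[1], dict):
        yield value[0], value[1]
        children = list(value[1].values())
    else:
        children = value
    for child in children:
        yield from walk(child)


def find_dangerous(tree: Any) -> List[Tuple[Optional[int], str]]:
    """Return (line number, construct) for eval and denylisted calls."""
    findings = []
    for name, fields in walk(tree):
        callee = fields.get("name")
        if name == "Eval":
            findings.append((fields.get("lineno"), "eval"))
        elif name == "FunctionCall" and isinstance(callee, str) and callee in DANGEROUS_FUNCTIONS:
            findings.append((fields.get("lineno"), callee))
    return findings


# Hand-written sample in the php2json layout (not real tool output).
SAMPLE = json.loads("""
[["FunctionCall", {"lineno": 2, "name": "strlen", "params": []}],
 ["Eval", {"lineno": 3, "expr": ["Variable", {"lineno": 3, "name": "$code"}]}],
 ["FunctionCall", {"lineno": 4, "name": "system", "params": []}]]
""")
print(find_dangerous(SAMPLE))  # [(3, 'eval'), (4, 'system')]
```

The isinstance(callee, str) guard skips dynamic calls such as $fn(), whose name is itself a node rather than a plain string.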

6. API Reference

phply.phpast: The PHP Abstract Syntax Tree

This module defines all the node types for the AST. All nodes inherit from phpast.Node.

Common Node Properties:

  • lineno: The line number where the node appears in the source.
  • fields: A list of attribute names for the node (e.g., ['name', 'params', 'nodes', 'is_ref'] for a Function).
  • accept(visitor): Method to accept an AST visitor.
  • generic(): Method that returns a (class name, field dictionary) pair for the node and its children; php2json serializes these pairs as the two-element arrays seen in its output.
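To show how fields, accept() and generic() fit together, here is a simplified stand-in for phpast.Node. The names MiniNode and node() belong to this sketch; phply's own phpast.py is the reference and may differ in details such as argument validation or traversal order. The sample tree mirrors the $a = 1 + 2 example from the php2json section.

```python
import json
from typing import Any, Callable, List


class MiniNode:
    """Simplified stand-in for phpast.Node (an illustration, not phply's code)."""

    fields: List[str] = []

    def __init__(self, *args: Any, lineno: Any = None) -> None:
        if len(args) != len(self.fields):
            raise TypeError(f"{type(self).__name__} takes {len(self.fields)} arguments")
        self.lineno = lineno
        for field, value in zip(self.fields, args):
            setattr(self, field, value)

    def accept(self, visitor: Callable[["MiniNode"], None]) -> None:
        # Call the visitor on this node, then recurse into child nodes.
        visitor(self)
        for field in self.fields:
            value = getattr(self, field)
            for child in value if isinstance(value, list) else [value]:
                if isinstance(child, MiniNode):
                    child.accept(visitor)

    def generic(self, with_lineno: bool = False) -> tuple:
        # Build a (class name, {field: value}) pair, converting child nodes too.
        values = {"lineno": self.lineno} if with_lineno else {}
        for field in self.fields:
            value = getattr(self, field)
            if isinstance(value, MiniNode):
                value = value.generic(with_lineno)
            elif isinstance(value, list):
                value = [v.generic(with_lineno) if isinstance(v, MiniNode) else v
                         for v in value]
            values[field] = value
        return (type(self).__name__, values)


def node(name: str, fields: List[str]) -> type:
    """Create a MiniNode subclass with the given field names."""
    return type(name, (MiniNode,), {"fields": fields})


Variable = node("Variable", ["name"])
BinaryOp = node("BinaryOp", ["op", "left", "right"])
Assignment = node("Assignment", ["node", "expr", "is_ref"])

# $a = 1 + 2;
tree = Assignment(Variable("$a", lineno=1), BinaryOp("+", 1, 2, lineno=1), False, lineno=1)
print(json.dumps(tree.generic(with_lineno=True)))  # same shape as the php2json example
```

Declaring each node type as a list of field names is what lets one generic accept() and generic() serve every class.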

Key AST Node Classes (Partial List):

  • Statements & Blocks:

    • Block(nodes): A sequence of statements, e.g., within {...}.
    • Echo(nodes): An echo statement.
    • Return(node): A return statement.
    • If(expr, node, elseifs, else_): An if statement.
    • While(expr, node): A while loop.
    • Foreach(expr, keyvar, valvar, node): A foreach loop.
    • Try(nodes, catches, finally): A try-catch-finally block.
  • Expressions:

    • Assignment(node, expr, is_ref): Assignment with =.
    • BinaryOp(op, left, right): An operation like +, ., &&.
    • UnaryOp(op, expr): An operation like !, -, ~.
    • TernaryOp(expr, iftrue, iffalse): The ?: operator.
    • FunctionCall(name, params): e.g., my_func().
    • MethodCall(node, name, params): e.g., $obj->method().
    • StaticMethodCall(class_, name, params): e.g., MyClass::method().
    • Variable(name): A variable, e.g., $foo.
    • Constant(name): A constant, e.g., FOO, true, null.
    • Array(nodes): An array literal, e.g., array(1, 2) or [1, 2].
    • ArrayOffset(node, expr): Array access, e.g., $arr[0].
    • ObjectProperty(node, name): Property access, e.g., $obj->prop.
  • Declarations:

    • Function(name, params, nodes, is_ref): A function declaration.
    • Class(name, type, extends, implements, traits, nodes): A class declaration.
    • Method(name, modifiers, params, nodes, is_ref): A method declaration within a class.
    • Namespace(name, nodes): A namespace declaration.

To explore all nodes, either inspect the phply/phpast.py file or use the php2json tool on various PHP snippets.

phply.phplex: The Lexer

This module provides the lexing functionality.

  • full_lexer: The raw ply.lex object. It emits all tokens, including whitespace and comments.
  • lexer: A FilteredLexer instance that wraps full_lexer. This is the lexer you should typically use (via lexer.clone()). It filters out noise (whitespace, comments, open and close tags) and performs the transformations that simplify the parsing grammar: <?= becomes an ECHO token, and ?> becomes a SEMI token when the preceding statement lacks a semicolon.

phply.phpparse: The Parser

This module contains the parsing logic.

  • make_parser(debug=False): This factory function constructs and returns a ply.yacc parser object. ply caches the generated parse tables in the phply.parsetab module (parsetab.py in the package directory), so later runs can skip regenerating them.
  • parser.parse(input_string, lexer, tracking=False): The main method to parse code.
    • input_string: The PHP code to parse.
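    • lexer: A phply lexer instance, typically a fresh phply.phplex.lexer.clone(), so that state such as the current line number is not carried over from a previous parse.
    • tracking: Set to True to have ply track the positions of every grammar symbol, which gives more accurate lineno values on AST nodes.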

phply.pythonast: PHP AST to Python AST Conversion

This module is responsible for the logic behind the php2python tool.

  • from_phpast(node): The main function that recursively takes a phpast node and attempts to convert it into an equivalent ast (Python's native AST) node. This conversion is best-effort and will use placeholder calls for constructs with no direct mapping.
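The recursive, best-effort strategy described above can be illustrated on a toy scale. This sketch converts hypothetical tuple-based stand-ins for PHP nodes (not phpast classes, and not from_phpast itself) into Python ast expressions, and falls back to a placeholder call, named php_unsupported here purely for illustration, when there is no mapping. It requires Python 3.9+ for ast.unparse.

```python
import ast
from typing import Any

# Hypothetical stand-ins for PHP nodes (phply uses phpast classes instead):
#   ("Variable", "$name"), ("BinaryOp", op, left, right), ("FunctionCall", name, [args])
BINARY_OPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult, ".": ast.Add}


def to_py(node: Any) -> ast.expr:
    """Recursively convert a stand-in PHP node into a Python expression node."""
    if isinstance(node, (int, float, str)):
        return ast.Constant(node)  # scalars map directly
    kind = node[0]
    if kind == "Variable":
        return ast.Name(id=node[1].lstrip("$"), ctx=ast.Load())
    if kind == "BinaryOp" and node[1] in BINARY_OPS:
        _, op, left, right = node
        return ast.BinOp(left=to_py(left), op=BINARY_OPS[op](), right=to_py(right))
    if kind == "FunctionCall":
        _, name, args = node
        return ast.Call(func=ast.Name(id=name, ctx=ast.Load()),
                        args=[to_py(arg) for arg in args], keywords=[])
    # Best effort: a placeholder call for anything without a direct mapping.
    return ast.Call(func=ast.Name(id="php_unsupported", ctx=ast.Load()),
                    args=[ast.Constant(kind)], keywords=[])


php_node = ("FunctionCall", "greet", [("BinaryOp", ".", "Hello, ", ("Variable", "$name"))])
print(ast.unparse(to_py(php_node)))  # greet('Hello, ' + name)
```

Mapping PHP's . to Python's + is only correct when both operands are already strings; PHP converts operands to strings first, so a more faithful translation would wrap each side in str().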

7. Project Information

Troubleshooting

Couldn't create 'phply.parsetab'

This error occurs when the ply library tries to regenerate its parse table but lacks permission to write it. ply caches its parsing tables in a file named parsetab.py within the package directory, so this typically happens when phply was installed as a superuser and is then run as a regular user.

Solutions:

  1. Re-install with correct permissions: The simplest fix is often to uninstall and reinstall the package as the correct user.
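  2. Regenerate the table yourself: Run the parser once (for example, by calling phply.phpparse.make_parser() from a Python session) as a user who can write to the package directory, such as the user who installed it or the owner of a development checkout.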
  3. Raise an issue: If the problem persists, open an issue on the project's GitHub page.
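Before reinstalling, you can check whether the current user is able to write the parse table at all. This helper is a hypothetical diagnostic, not part of phply; it only assumes, as described above, that ply writes parsetab.py into the package directory, and os.access gives an approximate answer (it does not account for every filesystem or ACL setup).

```python
import os


def can_write_parsetab(package_dir: str) -> bool:
    """Return True if the current user could (re)write parsetab.py in package_dir."""
    if not os.path.isdir(package_dir):
        return False
    table = os.path.join(package_dir, "parsetab.py")
    if os.path.exists(table):
        return os.access(table, os.W_OK)  # overwriting needs write access to the file
    return os.access(package_dir, os.W_OK)  # creating needs write access to the dir


# Typical use, with phply installed:
#   import phply
#   print(can_write_parsetab(os.path.dirname(phply.__file__)))
```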

Contributing

Contributions are welcome! The best way to start is to:

  1. Fork the repository on GitHub.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes.
  4. Run the tests using pytest or tox to ensure nothing has broken.
  5. Submit a pull request.

License

phply is licensed under the BSD License. See the LICENSE file for full details.

Authors

phply was originally written and maintained by Dave Benjamin, with contributions from others. It is currently maintained by Stanisław Pitucha. For a full list of contributors, see the AUTHORS file.