278 changes: 171 additions & 107 deletions BLACKBOX_RULES.md → .blackboxrules

Large diffs are not rendered by default.

18 changes: 9 additions & 9 deletions .cursor/rules/run_pipelex.mdc
@@ -55,7 +55,7 @@ async def extract_gantt(image_url: str) -> GanttChart:
# Run the pipe
pipe_output = await execute_pipeline(
pipe_code="extract_gantt_by_steps",
- input_memory={
+ inputs={
"gantt_chart_image": {
"concept": "gantt.GanttImage",
"content": ImageContent(url=image_url),
@@ -94,26 +94,26 @@ So here are a few concrete examples of calls to execute_pipeline with various wa
# If you assign a string, by default it will be considered as a TextContent.
pipe_output = await execute_pipeline(
pipe_code="master_advisory_orchestrator",
- input_memory={
+ inputs={
"user_input": problem_description,
},
)

- # Here we have a single input and it's a PDF.
- # Because PDFContent is a native concept, we can use it directly as a value,
+ # Here we have a single input and it's a document.
+ # Because DocumentContent is a native concept, we can use it directly as a value,
# the system knows what content it corresponds to:
pipe_output = await execute_pipeline(
pipe_code="power_extractor_dpe",
- input_memory={
- "document": PDFContent(url=pdf_url),
+ inputs={
+ "document": DocumentContent(url=pdf_url),
},
)

# Here we have a single input and it's an Image.
# Because ImageContent is a native concept, we can use it directly as a value:
pipe_output = await execute_pipeline(
pipe_code="fashion_variation_pipeline",
- input_memory={
+ inputs={
"fashion_photo": ImageContent(url=image_url),
},
)
@@ -123,7 +123,7 @@ So here are a few concrete examples of calls to execute_pipeline with various wa
# so we must provide it using a dict with the concept and the content:
pipe_output = await execute_pipeline(
pipe_code="extract_gantt_by_steps",
- input_memory={
+ inputs={
"gantt_chart_image": {
"concept": "gantt.GanttImage",
"content": ImageContent(url=image_url),
@@ -135,7 +135,7 @@ So here are a few concrete examples of calls to execute_pipeline with various wa
pipe_output = await execute_pipeline(
pipe_code="retrieve_then_answer",
dynamic_output_concept_code="contracts.Fees",
- input_memory={
+ inputs={
"text": load_text_from_path(path=text_path),
"question": {
"concept": "answer.Question",
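The hunks in this file rename the `input_memory` keyword argument of `execute_pipeline` to `inputs` and replace `PDFContent` with `DocumentContent`. For a project with many call sites, the same renames could be applied mechanically. The following is a minimal sketch and not part of the PR: `migrate_source` is a hypothetical helper, and a plain regex pass like this also rewrites matches inside comments and strings, so its output should be reviewed before committing.

```python
import re

# The two renames are taken from the diff; handling them with regexes is an assumption.
RENAMES = [
    # `input_memory=` used as a keyword argument (but not `input_memory ==`).
    (re.compile(r"\binput_memory(\s*)=(?!=)"), r"inputs\1="),
    # The content class name, matched as a whole word only.
    (re.compile(r"\bPDFContent\b"), "DocumentContent"),
]


def migrate_source(source: str) -> str:
    """Return `source` with the old names replaced by the new ones."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source
```

Running `migrate_source` over each `.py` file that calls `execute_pipeline` and diffing the result is usually enough to spot any unintended replacement.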
178 changes: 160 additions & 18 deletions .cursor/rules/write_pipelex.mdc
@@ -8,7 +8,10 @@ globs:
# Guide to write or edit pipelines using the Pipelex language in .plx files

- Always first write your "plan" in natural language, then transcribe it in pipelex.
- - You should ALWAYS RUN the terminal command `make validate` when you are writing or editing a `.plx` file. It will ensure the pipe is runnable. If not, iterate.
+ - You should ALWAYS RUN validation when you are writing or editing a `.plx` file. It will ensure the pipe is runnable. If not, iterate.
+ - For a specific file: `pipelex validate path_to_file.plx`
+ - For all pipelines: `pipelex validate all`
+ - **IMPORTANT**: Ensure the Python virtual environment is activated before running `pipelex` commands. For standard installations, the venv is named `.venv` - always check that first. The commands will not work without proper venv activation.
- Please use POSIX standard for files. (empty lines, no trailing whitespaces, etc.)
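
The validation rule above could be wrapped in a small script so that the virtual-environment check and the CLI call happen together. This is a sketch under stated assumptions: the two `pipelex validate` forms are the ones quoted in the rule, while `in_virtualenv`, `build_validate_command`, and `run_validation` are hypothetical helper names. The `sys.prefix` comparison only tells whether the current interpreter belongs to a venv, which is a proxy for "the venv is activated".

```python
import subprocess
import sys
from typing import List, Optional


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    return sys.prefix != sys.base_prefix


def build_validate_command(plx_path: Optional[str] = None) -> List[str]:
    # `pipelex validate <file>` for one file, `pipelex validate all` otherwise.
    return ["pipelex", "validate", plx_path if plx_path else "all"]


def run_validation(plx_path: Optional[str] = None) -> int:
    """Run the validation command and return its exit code."""
    if not in_virtualenv():
        print("warning: no virtual environment detected", file=sys.stderr)
    try:
        return subprocess.run(build_validate_command(plx_path), check=False).returncode
    except FileNotFoundError:
        # The `pipelex` executable is not on PATH (often a sign the venv is inactive).
        print("error: `pipelex` not found on PATH", file=sys.stderr)
        return 127
```

Calling `run_validation("path_to_file.plx")` returns the exit code of the CLI, so a non-zero value means another iteration on the `.plx` file is needed.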

## Pipeline File Naming
@@ -24,10 +27,10 @@ A pipeline file has three main sections:

### Domain Statement
```plx
- domain = "domain_name"
+ domain = "domain_code"
description = "Description of the domain" # Optional
```
- Note: The domain name usually matches the plx filename for single-file domains. For multi-file domains, use the subdirectory name.
+ Note: The domain code usually matches the plx filename for single-file domains. For multi-file domains, use the subdirectory name.

### Concept Definitions

@@ -42,10 +45,10 @@ ConceptName = "Description of the concept"
- Use PascalCase for concept names
- Never use plurals (no "Stories", use "Story") - lists are handled implicitly by Pipelex
- Avoid circumstantial adjectives (no "LargeText", use "Text") - focus on the essence of what the concept represents
- - Don't redefine native concepts (Text, Image, PDF, TextAndImages, Number, Page)
+ - Don't redefine native concepts (Text, Image, PDF, TextAndImages, Number, Page, JSON)
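
The naming rules above lend themselves to a quick lint pass before running validation. The sketch below is an illustration, not part of Pipelex: `check_concept_name` is a hypothetical helper, the native concept list is copied from the updated rule, and the plural check is a crude heuristic (it flags any name ending in a single "s", so names such as "Status" would be false positives).

```python
import re
from typing import List

# Native concepts as listed in the updated rule (including JSON).
NATIVE_CONCEPTS = {"Text", "Image", "PDF", "TextAndImages", "Number", "Page", "JSON"}

# Loose PascalCase test: starts with an uppercase letter, alphanumerics only.
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def check_concept_name(name: str) -> List[str]:
    """Return a list of problems found with a concept name (empty if none)."""
    problems = []
    if not PASCAL_CASE.match(name):
        problems.append("not PascalCase")
    if name in NATIVE_CONCEPTS:
        problems.append("redefines a native concept")
    # Heuristic only: a trailing single "s" often signals a plural.
    if name.endswith("s") and not name.endswith("ss"):
        problems.append("looks plural")
    return problems
```

For example, `check_concept_name("Story")` returns an empty list, while `"Stories"` is reported as looking plural and `"Text"` as redefining a native concept.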

**Native Concepts:**
- Pipelex provides built-in native concepts: `Text`, `Image`, `PDF`, `TextAndImages`, `Number`, `Page`. Use these directly or refine them when appropriate.
+ Pipelex provides built-in native concepts: `Text`, `Image`, `PDF`, `TextAndImages`, `Number`, `Page`, `JSON`. Use these directly or refine them when appropriate.

**Refining Native Concepts:**
To create a concept that specializes a native concept without adding fields:
@@ -63,7 +66,7 @@ For details on how to structure concepts with fields, see the "Structuring Model
## Pipe Base Definition

```plx
- [pipe.your_pipe_name]
+ [pipe.your_pipe_code]
type = "PipeLLM"
description = "A description of what your pipe does"
inputs = { input_1 = "ConceptName1", input_2 = "ConceptName2" }
@@ -73,7 +76,7 @@ output = "ConceptName"
The pipes will all have at least this base definition.
- `inputs`: Dictionary of key being the variable used in the prompts, and the value being the ConceptName. It should ALSO LIST THE INPUTS OF THE INTERMEDIATE STEPS (if PipeSequence) or of the conditional pipes (if PipeCondition).
So If you have this error:
- `StaticValidationError: missing_input_variable • domain='expense_validator' • pipe='validate_expense' •
+ `PipeValidationError: missing_input_variable • domain='expense_validator' • pipe='validate_expense' •
variable='['invoice']'``
That means that the pipe validate_expense is missing the input `invoice` because one of the subpipe is needing it.
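
The rule above says a controller pipe must declare every input that its sub-pipes consume. One way to picture the check, as a rough sketch and not Pipelex's actual validator: walk the steps in order, treat each step's result as available to later steps, and report whatever is still unmet. The `Step` shape and its `result` field are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Step:
    inputs: Set[str]  # variable names the sub-pipe reads
    result: str       # variable name the sub-pipe writes (assumed shape)


def missing_inputs(declared: Set[str], steps: List[Step]) -> List[str]:
    """Return the variables some step needs that are neither declared nor produced earlier."""
    available = set(declared)
    missing: Set[str] = set()
    for step in steps:
        missing |= step.inputs - available
        available.add(step.result)
    return sorted(missing)
```

With a first step reading `expense` and a second reading `category` and `invoice`, declaring only `expense` on the parent pipe leaves `invoice` unmet, which mirrors the `missing_input_variable` error quoted above.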

@@ -128,16 +131,16 @@ For concepts with structured fields, define them inline using TOML syntax:
description = "A commercial document issued by a seller to a buyer"

[concept.Invoice.structure]
- invoice_number = "The unique invoice identifier"
+ invoice_number = "The unique invoice identifier" # This will be optional by default
issue_date = { type = "date", description = "The date the invoice was issued", required = true }
total_amount = { type = "number", description = "The total invoice amount", required = true }
- vendor_name = "The name of the vendor"
- line_items = { type = "list", item_type = "text", description = "List of items", required = false }
+ vendor_name = "The name of the vendor" # This will be optional by default
+ line_items = { type = "list", item_type = "text", description = "List of items" }
```

**Supported inline field types:** `text`, `integer`, `boolean`, `number`, `date`, `list`, `dict`

- **Field properties:** `type`, `description`, `required` (default: true), `default_value`, `choices`, `item_type` (for lists), `key_type` and `value_type` (for dicts)
+ **Field properties:** `type`, `description`, `required` (default: false), `default_value`, `choices`, `item_type` (for lists), `key_type` and `value_type` (for dicts)

**Simple syntax** (creates a text field, optional by default):
```plx
@@ -146,7 +149,7 @@ field_name = "Field description"

**Detailed syntax** (with explicit properties):
```plx
- field_name = { type = "text", description = "Field description", required = false, default_value = "default" }
+ field_name = { type = "text", description = "Field description", default_value = "default" }
```
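
To make the effect of the changed `required` default concrete, the two field syntaxes can be normalized into one shape. This is a sketch under assumptions, not Pipelex's parser: `normalize_field` is a hypothetical helper, the simple string form is taken to mean a text field as stated above, and `required` falls back to `False` per the updated default in this diff.

```python
from typing import Any, Dict, Union


def normalize_field(spec: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Expand a field definition into a dict with an explicit `required` key."""
    if isinstance(spec, str):
        # Simple syntax: the string is the description of a text field.
        return {"type": "text", "description": spec, "required": False}
    # Detailed syntax: keep the given properties, default `required` to False.
    normalized: Dict[str, Any] = {"required": False}
    normalized.update(spec)
    return normalized
```

Under this reading, `required = false` no longer needs to be written out, which matches the lines removed from the examples above, while `required = true` must stay explicit.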

**3. Python StructuredContent Class (For Advanced Features)**
@@ -472,7 +475,7 @@ The PipeExtract operator is used to extract text and images from an image or a P
[pipe.extract_info]
type = "PipeExtract"
description = "extract the information"
- inputs = { document = "PDF" } # or { image = "Image" } if it's an image. This is the only input.
+ inputs = { document = "Document" } # or { image = "Image" } if it's an image. This is the only input.
output = "Page"
```

@@ -481,7 +484,7 @@ Using Extract Model Settings:
[pipe.extract_with_model]
type = "PipeExtract"
description = "Extract with specific model"
- inputs = { document = "PDF" }
+ inputs = { document = "Document" }
output = "Page"
model = "base_extract_mistral" # Use predefined extract preset or model alias
```
@@ -589,25 +592,160 @@ $sales_rep.phone | $sales_rep.email
"""
```

- ### Key Parameters
+ ### Key Parameters (Template Mode)

- - `template`: Inline template string (mutually exclusive with template_name)
+ - `template`: Inline template string (mutually exclusive with template_name and construct)
- `template_name`: Name of a predefined template (mutually exclusive with template)
- `template_category`: Template type ("llm_prompt", "html", "markdown", "mermaid", etc.)
- `templating_style`: Styling options for template rendering
- `extra_context`: Additional context variables for template

For more control, you can use a nested `template` section instead of the `template` field:

- `template.template`: The template string
- `template.category`: Template type
- `template.templating_style`: Styling options

### Template Variables

Use the same variable insertion rules as PipeLLM:

- `@variable` for block insertion (multi-line content)
- `$variable` for inline insertion (short text)

### Construct Mode (for StructuredContent Output)

PipeCompose can also generate `StructuredContent` objects using the `construct` section. This mode composes field values from fixed values, variable references, templates, or nested structures.

**When to use construct mode:**

- You need to output a structured object (not just Text)
- You want to deterministically compose fields from existing data
- No LLM is needed - just data composition and templating

#### Basic Construct Usage

```plx
[concept.SalesSummary]
description = "A structured sales summary"

[concept.SalesSummary.structure]
report_title = { type = "text", description = "Title of the report" }
customer_name = { type = "text", description = "Customer name" }
deal_value = { type = "number", description = "Deal value" }
summary_text = { type = "text", description = "Generated summary text" }

[pipe.compose_summary]
type = "PipeCompose"
description = "Compose a sales summary from deal data"
inputs = { deal = "Deal" }
output = "SalesSummary"

[pipe.compose_summary.construct]
report_title = "Monthly Sales Report"
customer_name = { from = "deal.customer_name" }
deal_value = { from = "deal.amount" }
summary_text = { template = "Deal worth $deal.amount with $deal.customer_name" }
```

#### Field Composition Methods

There are four ways to define field values in a construct:

**1. Fixed Value (literal)**

Use a literal value directly:

```plx
[pipe.compose_report.construct]
report_title = "Annual Report"
report_year = 2024
is_draft = false
```

**2. Variable Reference (`from`)**

Get a value from working memory using a dotted path:

```plx
[pipe.compose_report.construct]
customer_name = { from = "deal.customer_name" }
total_amount = { from = "order.total" }
street_address = { from = "customer.address.street" }
```
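
A dotted path such as `customer.address.street` can be read as a sequence of lookups, one per segment. The sketch below illustrates that idea only; how Pipelex actually resolves `from` against working memory is not shown in this diff, and `resolve_path` is a hypothetical helper that handles both dict-like and attribute-style objects.

```python
from collections.abc import Mapping
from typing import Any


def resolve_path(memory: Any, dotted_path: str) -> Any:
    """Follow each segment of `dotted_path`, by key for mappings, else by attribute."""
    current = memory
    for part in dotted_path.split("."):
        if isinstance(current, Mapping):
            current = current[part]          # raises KeyError if the key is absent
        else:
            current = getattr(current, part)  # raises AttributeError if absent
    return current
```

With `memory = {"customer": {"address": {"street": "1 rue de Rivoli"}}}`, resolving `"customer.address.street"` walks three levels down and yields the street string.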

**3. Template (`template`)**

Render a Jinja2 template with variable substitution:

```plx
[pipe.compose_report.construct]
invoice_number = { template = "INV-$order.id" }
summary = { template = "Deal worth $deal.amount with $deal.customer_name on {{ current_date }}" }
```

**4. Nested Construct**

For nested structures, use a TOML subsection:

```plx
[pipe.compose_invoice.construct]
invoice_number = { template = "INV-$order.id" }
total = { from = "order.total_amount" }

[pipe.compose_invoice.construct.billing_address]
street = { from = "customer.address.street" }
city = { from = "customer.address.city" }
country = "France"
```

#### Complete Construct Example

```plx
domain = "invoicing"

[concept.Address]
description = "A postal address"

[concept.Address.structure]
street = { type = "text", description = "Street address" }
city = { type = "text", description = "City name" }
country = { type = "text", description = "Country name" }

[concept.Invoice]
description = "An invoice document"

[concept.Invoice.structure]
invoice_number = { type = "text", description = "Invoice number" }
total = { type = "number", description = "Total amount" }

[pipe.compose_invoice]
type = "PipeCompose"
description = "Compose an invoice from order and customer data"
inputs = { order = "Order", customer = "Customer" }
output = "Invoice"

[pipe.compose_invoice.construct]
invoice_number = { template = "INV-$order.id" }
total = { from = "order.total_amount" }

[pipe.compose_invoice.construct.billing_address]
street = { from = "customer.address.street" }
city = { from = "customer.address.city" }
country = "France"
```

#### Key Parameters (Construct Mode)

- `construct`: Dictionary mapping field names to their composition rules
- Each field can be:
- A literal value (string, number, boolean)
- A dict with `from` key for variable reference
- A dict with `template` key for template rendering
- A nested dict for nested structures

**Note:** You must use either `template` or `construct`, not both. They are mutually exclusive.
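
The four composition methods and the mutual-exclusivity note can be summarized as a small classification routine. This is a sketch of the rules as described in this section, not Pipelex's implementation: `classify_rule` and `check_compose_pipe` are hypothetical names, the pipe is assumed to be already parsed into a plain dict, and a nested structure that happens to contain a field literally named `from` or `template` would be misclassified.

```python
from typing import Any, Dict


def classify_rule(rule: Any) -> str:
    """Name the composition method used by one construct field."""
    if isinstance(rule, dict):
        if "from" in rule:
            return "from"
        if "template" in rule:
            return "template"
        return "nested"
    return "literal"


def check_compose_pipe(pipe: Dict[str, Any]) -> Dict[str, str]:
    """Reject pipes using both modes, then classify each construct field."""
    if "template" in pipe and "construct" in pipe:
        raise ValueError("`template` and `construct` are mutually exclusive")
    return {name: classify_rule(rule) for name, rule in pipe.get("construct", {}).items()}
```

Applied to the `compose_invoice` example, `invoice_number` is classified as a template, `total` as a variable reference, and `billing_address` as a nested construct whose `country` field is a literal.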

## PipeImgGen operator

The PipeImgGen operator is used to generate images using AI image generation models.
@@ -821,7 +959,7 @@ Presets are meant to record the choice of an llm with its hyper parameters (temp

Examples:
```toml
- llm_for_complex_reasoning = { model = "base-claude", temperature = 1 }
+ llm_to_engineer = { model = "base-claude", temperature = 1 }
llm_to_extract_invoice = { model = "claude-3-7-sonnet", temperature = 0.1, max_tokens = "auto" }
```

@@ -850,6 +988,10 @@ You can override the predefined llm presets by setting them in `.pipelex/inferen

---

- ALWAYS RUN `make validate` when you are finished writing pipelines: This checks for errors. If there are errors, iterate until it works.
+ ALWAYS RUN validation when you are finished writing pipelines: This checks for errors. If there are errors, iterate until it works.
+ - For a specific bundle/file: `pipelex validate path_to_file.plx`
+ - For all pipelines: `pipelex validate all`
+ - Remember: Ensure your Python virtual environment is activated (typically `.venv` for standard installations) before running `pipelex` commands.

Then, create an example file to run the pipeline in the `examples` folder.
But don't write documentation unless asked explicitly to.
6 changes: 1 addition & 5 deletions .env.example
@@ -1,6 +1,6 @@
# [OPTIONAL] Free Pipelex Inference API key - Get yours on Discord: https://go.pipelex.com/discord
# No credit card required, limited time offer
- PIPELEX_INFERENCE_API_KEY=
+ PIPELEX_GATEWAY_API_KEY=

# OpenAI: to use models like GPT-4o and GPT-5
OPENAI_API_KEY=
@@ -21,10 +21,6 @@ ANTHROPIC_API_KEY=
# To use Mistral models
MISTRAL_API_KEY=

- # To use perplexity, including results from web search
- PERPLEXITY_API_KEY=
- PERPLEXITY_API_ENDPOINT=https://api.perplexity.ai

# To generate images from fal.ai, the service of Forest Labs
FAL_API_KEY=
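
This file renames `PIPELEX_INFERENCE_API_KEY` to `PIPELEX_GATEWAY_API_KEY`. User scripts that read the key themselves may want to accept both names while local `.env` files are being updated. The sketch below is a hypothetical helper for such scripts; whether Pipelex itself still reads the old name is not shown in this diff and should not be assumed.

```python
import os
from typing import Mapping, Optional

NEW_KEY = "PIPELEX_GATEWAY_API_KEY"    # name added in this diff
OLD_KEY = "PIPELEX_INFERENCE_API_KEY"  # name removed in this diff


def read_pipelex_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer the new variable, fall back to the old one, treat empty values as unset."""
    source = os.environ if env is None else env
    return source.get(NEW_KEY) or source.get(OLD_KEY) or None
```

Passing an explicit mapping makes the helper easy to test; calling it with no argument reads from the process environment.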
