8 changes: 8 additions & 0 deletions .claude-plugin/marketplace.json
@@ -14,6 +14,14 @@
},
"description": "CLI commands for managing browsers, deploying apps, and controlling browser instances. Use when working with the kernel command-line tool.",
"source": "./plugins/kernel-cli"
},
{
"name": "kernel-sdks",
"author": {
"name": "Kernel"
},
"description": "TypeScript and Python SDK skills for building browser automation with Kernel's TypeScript and Python SDKs. Use when writing code to control browsers programmatically.",
"source": "./plugins/kernel-sdks"
}
]
}
32 changes: 17 additions & 15 deletions README.md
@@ -12,13 +12,17 @@ Official AI agent skills from the Kernel for installing useful skills for our CL

# Install the CLI skill
/plugin install kernel-cli

# Install the SDK skills (TypeScript & Python)
/plugin install kernel-sdks
```

### Manual Installation

```bash
git clone https://github.com/kernel/skills.git
cp -r skills/plugins/kernel-cli ~/.claude/skills/
cp -r skills/plugins/kernel-sdks ~/.claude/skills/
```

## Usage Examples
@@ -34,26 +38,15 @@ Before using these skills, ensure you have:

2. **Authenticated with Kernel**:
```bash
export KERNEL_API_KEY=<api-key>
# or
kernel login
```

Once installed, your coding agent will automatically know how to use Kernel. Try prompts like:

### CLI Usage

> "Spin up a browser and take a screenshot of kernel.sh"

Your agent will respond with:

```bash
kernel browsers create -o json
# Extract session_id from output
kernel browsers computer screenshot <session_id> --to screenshot.png
```
## Available Skills

## Skill Structure
### kernel-cli

The kernel-cli skill is organized into focused sub-skills:
Command-line interface skills for using Kernel CLI commands.

| Skill | Description |
|-------|-------------|
@@ -74,6 +67,15 @@ The kernel-cli skill is organized into focused sub-skills:

Each sub-skill is loaded contextually based on your prompts, minimizing token usage while providing comprehensive Kernel knowledge.

### kernel-sdks

SDK skills for building browser automation with TypeScript and Python.

| Skill | Description |
|-------|-------------|
| **typescript-sdk** | Build automation with Kernel's TypeScript SDK |
| **python-sdk** | Build automation with Kernel's Python SDK |

## Documentation

- [Kernel Documentation](https://www.kernel.sh/docs)
11 changes: 11 additions & 0 deletions plugins/kernel-sdks/.claude-plugin/plugin.json
@@ -0,0 +1,11 @@
{
"name": "kernel-sdks",
"version": "1.0.0",
"description": "TypeScript and Python SDK skills for building browser automation with Kernel's TypeScript and Python SDKs",
"author": {
"name": "Kernel",
"url": "https://www.kernel.sh"
},
"repository": "https://github.com/kernel/skills",
"license": "MIT"
}
165 changes: 165 additions & 0 deletions plugins/kernel-sdks/skills/python-sdk/SKILL.md
@@ -0,0 +1,165 @@
---
name: kernel-python-sdk
description: Build browser automation scripts using the Kernel Python SDK with Playwright and remote browser management.
context: fork
---

## When to Use This Skill

Use the Kernel Python SDK when you need to:

- **Build browser automation scripts** - Create Python programs that control remote browsers
- **Execute server-side automation** - Run Playwright code directly in the browser VM without local dependencies
- **Manage browser sessions programmatically** - Create, configure, and control browsers from code
- **Build scalable scraping/testing tools** - Use browser pools and profiles for high-volume automation
- **Deploy automation as actions** - Package scripts as Kernel actions for invocation via API

**When NOT to use:**
- For CLI commands (e.g., `kernel browsers create`), use the `kernel-cli` skill instead
- For quick one-off tasks, the CLI may be simpler than writing code

## Core Concepts

### SDK Architecture

The SDK is organized into resource-based modules:

- `kernel.browsers` - Browser session management (create, list, delete)
- `kernel.browsers.playwright` - Server-side Playwright execution
- `kernel.browsers.computer` - OS-level controls (mouse, keyboard, screenshots)
- `kernel.browser_pools` - Pre-warmed browser pool management
- `kernel.profiles` - Persistent browser profiles (auth state)
- `kernel.proxies` - Proxy configuration
- `kernel.extensions` - Chrome extension management
- `kernel.deployments` - App deployment
- `kernel.invocations` - Action invocation

### Two Automation Approaches

**1. Server-side Execution (RECOMMENDED)**
- Execute Playwright code directly in browser VM using `kernel.browsers.playwright.execute(session_id, code="...")`
- `session_id` must be passed as a positional argument (first parameter), not as `id=` keyword
- Response accessed via `response.result` - **MUST use `return` in code to get data back**
- Best for: Most use cases, production automation, parallel execution, actions

**2. CDP Connection (Client-side)**
- Connect Playwright to browser via CDP WebSocket URL
- Code runs locally, browser runs remotely; requires local Playwright installation
- Best for: Complex debugging, specific local development needs

## Server-Side Pattern

```python
import time
from kernel import Kernel

def main():
    client = Kernel()
    kernel_browser = client.browsers.create(stealth=True, timeout_seconds=300)

    try:
        time.sleep(3)  # Browser may not be immediately ready

        # Retry logic for reliability
        max_retries = 3
        for attempt in range(max_retries):
            try:
                response = client.browsers.playwright.execute(
                    kernel_browser.session_id,
                    code="""
                        await page.goto('https://example.com', { waitUntil: 'networkidle' });
                        return await page.evaluate(() => document.title);
                    """
                )
                break
            except Exception:
                if attempt < max_retries - 1:
                    time.sleep(2)
                else:
                    raise

        if response.success and response.result:
            print(f"Result: {response.result}")
        else:
            print(f"Error: {response.error}, Stderr: {response.stderr}")

    finally:
        client.browsers.delete_by_id(kernel_browser.session_id)

if __name__ == "__main__":
    main()
```

### Critical Rules

1. **Browser Readiness**: Add `time.sleep(3)` after creation + retry logic (3 attempts, 2s delays)
2. **Return Values**: MUST use `return` in Playwright code or `response.result` will be `None`
3. **Browser Cleanup**: ALWAYS delete browser in finally block
4. **Error Handling**: Check `response.success` before accessing `response.result`
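
The retry rule above can be factored into a small reusable helper. This is a minimal sketch, not part of the Kernel SDK: `call_with_retries` and its parameters are hypothetical names, and the defaults simply mirror the attempt count and delay listed above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    delay_seconds: float = 2.0,
) -> T:
    """Call fn until it succeeds, sleeping between failed attempts.

    Hypothetical helper (not an SDK function). The exception from the
    final attempt is re-raised unchanged.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay_seconds)
    # Only reachable when max_retries is zero or negative.
    raise ValueError("max_retries must be at least 1")
```

With a client and session created as in the pattern above, the `execute` call could be wrapped in a lambda and passed as `fn`, which keeps the retry policy in one place.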

### Common Issues

| Error | Cause | Fix |
|-------|-------|-----|
| `400 - browser not found` | Browser not ready | Add sleep + retry logic |
| `response.result` is `None` | Missing `return` | Add `return` statement |
| `TypeError: 'NoneType'` | Missing success check | Check `response.success` first |
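
The last two rows of the table can be guarded against in one place. The following is a rough sketch that assumes only the response attributes used in the server-side pattern (`success`, `result`, `error`, `stderr`). `unwrap_result` and `PlaywrightExecutionError` are hypothetical names, and the stand-in response object exists only so the snippet runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any

class PlaywrightExecutionError(RuntimeError):
    """Hypothetical error type for a failed or empty execution."""

def unwrap_result(response: Any) -> Any:
    """Return response.result, raising instead of handing back None."""
    if not response.success:
        raise PlaywrightExecutionError(
            f"Execution failed: {response.error}, stderr: {response.stderr}"
        )
    if response.result is None:
        raise PlaywrightExecutionError(
            "No result returned; the Playwright code may be missing a `return`"
        )
    return response.result

# Stand-in for an SDK response, used only to demonstrate the helper.
fake_ok = SimpleNamespace(success=True, result="Example Domain", error=None, stderr=None)
print(unwrap_result(fake_ok))
```

Raising early keeps a `None` result from propagating into later code, where it would surface as a less informative `TypeError`.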

## Patterns Reference

**Import Patterns**
- Standard: `from kernel import Kernel`
- For actions: `import kernel` and `from kernel import Kernel, KernelContext`
- For typed payloads: `from typing import TypedDict`
- For CDP: `from playwright.async_api import async_playwright`

**SDK Initialization**
- `client = Kernel()` reads `KERNEL_API_KEY` from environment automatically
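
Since the key is read implicitly, a script can check for the variable up front to produce a clearer error message. This is a small standard-library sketch: `require_api_key` is a hypothetical helper, and how the SDK itself reacts to a missing key is not covered here.

```python
import os

def require_api_key(env_var: str = "KERNEL_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; export it before creating the client")
    return api_key
```

Calling `require_api_key()` before `Kernel()` makes a misconfigured environment fail immediately, with a message that names the variable.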

**Action Handler Pattern**
```python
from typing import TypedDict
import kernel
from kernel import Kernel, KernelContext

app = kernel.App("app-name")

class TaskInput(TypedDict):
    task: str

@app.action("action-name")
async def my_action(ctx: KernelContext, payload: TaskInput):
    # Access payload: payload["task"] or payload.get("task")
    ...
```

**CDP Connection Pattern (Client-side)**
```python
from playwright.async_api import async_playwright

# Runs inside an async function; kernel_browser comes from client.browsers.create()
async with async_playwright() as playwright:
    browser = await playwright.chromium.connect_over_cdp(kernel_browser.cdp_ws_url)
    context = browser.contexts[0] if browser.contexts else await browser.new_context()
    page = context.pages[0] if context.pages else await context.new_page()
```

**Binary Data Handling**

Binary data (screenshots, PDFs) is returned as a serialized Node.js Buffer: `{'data': [byte_array], 'type': 'Buffer'}`

```python
# Follow the server-side pattern above, then:
if response.success and response.result:
    data = bytes(response.result['data'])
    with open("output.png", "wb") as f:
        f.write(data)
```
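
The conversion can be made slightly more defensive by validating the shape before decoding. This is a minimal sketch that assumes the serialized form shown above; `buffer_to_bytes` is a hypothetical helper name, not an SDK function.

```python
from typing import Any

def buffer_to_bytes(result: Any) -> bytes:
    """Convert a serialized Node.js Buffer dict into Python bytes."""
    if not isinstance(result, dict) or result.get("type") != "Buffer":
        raise TypeError("expected a dict with type == 'Buffer'")
    # bytes() raises ValueError if any element falls outside 0..255.
    return bytes(result["data"])

# The first four bytes of a PNG signature, as they would arrive serialized.
sample = {"type": "Buffer", "data": [137, 80, 78, 71]}
assert buffer_to_bytes(sample) == b"\x89PNG"
```

Checking the `type` field first turns an unexpected result, such as a string returned by mistake, into an explicit error rather than a corrupt output file.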

**Installation**
- `uv pip install kernel` or `pip install kernel`
- For CDP: `uv pip install playwright`

## References

- **Kernel Documentation**: https://www.kernel.sh/docs
- **API Reference**: https://www.kernel.sh/docs/api-reference/
- **Templates**: https://www.kernel.sh/docs/reference/cli/create#available-templates
- **Quickstart Guide**: https://www.kernel.sh/docs/quickstart