Playwright-based data connectors for DataConnect. Each connector exports a user's data from a web platform using browser automation — credentials never leave the device.
| Platform | Company | Runtime | Scopes |
|---|---|---|---|
| ChatGPT | OpenAI | playwright | conversations, memories |
| Instagram | Meta | playwright | profile, posts, liked_posts |
| LinkedIn | LinkedIn | playwright | profile, experience, education, skills |
| Spotify | Spotify | playwright | savedTracks, playlists |
connectors/
├── registry.json                # Central registry (checksums, versions)
├── types/
│   └── connector.d.ts           # TypeScript type definitions
├── schemas/                     # JSON schemas for exported data
│   ├── chatgpt.conversations.json
│   └── ...
├── openai/
│   ├── chatgpt-playwright.js    # Connector script
│   └── chatgpt-playwright.json  # Metadata
├── linkedin/
│   ├── linkedin-playwright.js
│   └── linkedin-playwright.json
├── meta/
│   ├── instagram-playwright.js
│   └── instagram-playwright.json
└── spotify/
    ├── spotify-playwright.js
    └── spotify-playwright.json
Each connector consists of two files inside a <company>/ directory:
- <name>-playwright.js: the connector script (plain JS, runs inside the Playwright runner sidecar)
- <name>-playwright.json: metadata (display name, login URL, selectors, scopes)
Connectors run in a sandboxed Playwright browser managed by the DataConnect app. The runner provides a page API object (not raw Playwright). The browser starts headless; connectors call page.showBrowser() when login is needed and page.goHeadless() after.
Phase 1 — Login (visible browser)
- Navigate to the platform's login page (headless)
- Check if the user is already logged in via persistent session
- If not, show the browser so the user can log in manually
- Extract auth tokens/cookies once logged in
Phase 2 — Data collection (headless)
- Switch to headless mode (browser disappears)
- Fetch data via API calls, network capture, or DOM scraping
- Report structured progress to the UI
- Return the collected data with an export summary
| Pattern | When to use | Example connector |
|---|---|---|
| API fetch via page.evaluate() | Platform has REST/JSON APIs | openai/chatgpt-playwright.js |
| Network capture via page.captureNetwork() | Platform uses GraphQL/XHR that fires on navigation | meta/instagram-playwright.js |
| DOM scraping via page.evaluate() | No API available, data only in rendered HTML | linkedin/linkedin-playwright.js |
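As a rough illustration of the network capture pattern, the sketch below registers a capture, navigates, and polls for the result. It uses only the page methods listed in this README, but several things are assumptions: `page` is passed in as a parameter so the helper can be tried with a stub, the helper name `captureAfterNavigation` is hypothetical, the accepted types of `urlPattern` and `bodyPattern` are not specified here, and the shape of the captured response is treated as opaque.

```javascript
// Sketch only: register a capture, navigate, then poll for the response.
// The retry count and interval are illustrative defaults, not runner values.
async function captureAfterNavigation(page, options) {
  const { url, urlPattern, bodyPattern, key, attempts = 10, intervalMs = 500 } = options;

  // Register the capture before navigating so the request is not missed.
  await page.captureNetwork({ urlPattern, bodyPattern, key });
  await page.goto(url);

  for (let i = 0; i < attempts; i++) {
    const response = await page.getCapturedResponse(key);
    if (response !== null && response !== undefined) {
      return response; // shape depends on the runner; treated as opaque here
    }
    await page.sleep(intervalMs);
  }
  return null; // nothing matched within the polling window
}
```

Returning null instead of throwing leaves the caller free to fall back to another pattern, such as DOM scraping.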
Create connectors/<company>/<name>-playwright.json:
{
  "id": "<name>-playwright",
  "version": "1.0.0",
  "name": "Platform Name",
  "company": "Company",
  "description": "Exports your ... using Playwright browser automation.",
  "connectURL": "https://platform.com/login",
  "connectSelector": "css-selector-for-logged-in-state",
  "exportFrequency": "daily",
  "runtime": "playwright",
  "vectorize_config": { "documents": "field_name" }
}

- runtime must be "playwright"
- connectURL is where the browser navigates initially
- connectSelector detects whether the user is logged in (e.g. an element only visible post-login)
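The field notes above can be turned into a quick pre-commit check. This is a minimal sketch, not part of DataConnect: `validateMetadata` is a hypothetical helper, and the rule that `id` ends in `-playwright` is an assumption inferred from the naming used in this repo.

```javascript
// Sketch only: sanity-check a parsed metadata object and return a list of
// human-readable problems (an empty list means nothing was flagged).
function validateMetadata(meta) {
  const errors = [];

  if (meta.runtime !== 'playwright') {
    errors.push('runtime must be "playwright"');
  }
  // Assumption: ids follow the "<name>-playwright" convention.
  if (typeof meta.id !== 'string' || !meta.id.endsWith('-playwright')) {
    errors.push('id should end with "-playwright"');
  }
  try {
    new URL(meta.connectURL); // throws on missing or malformed URLs
  } catch {
    errors.push('connectURL must be an absolute URL');
  }
  if (typeof meta.connectSelector !== 'string' || meta.connectSelector.trim() === '') {
    errors.push('connectSelector must be a non-empty CSS selector');
  }
  return errors;
}
```

It could be run over the parsed contents of a `<name>-playwright.json` file before checksums are generated.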
Create connectors/<company>/<name>-playwright.js:
// State management
const state = { isComplete: false };

// ─── Login check ──────────────────────────────────────
const checkLoginStatus = async () => {
  try {
    return await page.evaluate(`
      (() => {
        const hasLoggedInEl = !!document.querySelector('LOGGED_IN_SELECTOR');
        const hasLoginForm = !!document.querySelector('LOGIN_FORM_SELECTOR');
        return hasLoggedInEl && !hasLoginForm;
      })()
    `);
  } catch {
    return false;
  }
};

// ─── Main flow ────────────────────────────────────────
(async () => {
  // Phase 1: Login
  await page.setData('status', 'Checking login status...');
  await page.sleep(2000);

  if (!(await checkLoginStatus())) {
    await page.showBrowser('https://platform.com/login');
    await page.setData('status', 'Please log in...');
    await page.promptUser(
      'Please log in. Click "Done" when ready.',
      async () => await checkLoginStatus(),
      2000
    );
  }

  // Phase 2: Headless data collection
  await page.goHeadless();
  await page.setProgress({
    phase: { step: 1, total: 2, label: 'Fetching profile' },
    message: 'Loading profile data...',
  });

  // ... fetch your data here ...
  const items = [];

  // Build result (exportSummary is required)
  const result = {
    items,
    exportSummary: {
      count: items.length,
      label: items.length === 1 ? 'item' : 'items',
    },
    timestamp: new Date().toISOString(),
    version: '1.0.0-playwright',
    platform: 'platform-name',
  };

  state.isComplete = true;
  await page.setData('result', result);
})();

Create connectors/schemas/<platform>.<scope>.json to describe the exported data format:
{
  "name": "Platform Items",
  "version": "1.0.0",
  "scope": "platform.items",
  "dialect": "json",
  "description": "Description of the exported data",
  "schema": {
    "type": "object",
    "properties": {
      "items": {
        "type": "array",
        "items": {
          "properties": {
            "id": { "type": "string" },
            "title": { "type": "string" }
          },
          "required": ["id", "title"]
        }
      }
    },
    "required": ["items"]
  }
}

Add your connector to registry.json. Generate checksums with:
shasum -a 256 <company>/<name>-playwright.js | awk '{print "sha256:" $1}'
shasum -a 256 <company>/<name>-playwright.json | awk '{print "sha256:" $1}'

Then add an entry to the connectors array:
{
  "id": "<name>-playwright",
  "company": "<company>",
  "version": "1.0.0",
  "name": "Platform Name",
  "description": "...",
  "files": {
    "script": "<company>/<name>-playwright.js",
    "metadata": "<company>/<name>-playwright.json"
  },
  "checksums": {
    "script": "sha256:<hash>",
    "metadata": "sha256:<hash>"
  }
}

The page object is available as a global in connector scripts:
| Method | Description |
|---|---|
| page.evaluate(jsString) | Run JS in browser context, return result |
| page.goto(url) | Navigate to URL |
| page.sleep(ms) | Wait for milliseconds |
| page.setData(key, value) | Send data to host ('status', 'error', 'result') |
| page.setProgress({phase, message, count}) | Structured progress for the UI |
| page.showBrowser(url?) | Switch to headed mode (visible browser) |
| page.goHeadless() | Switch to headless mode (invisible) |
| page.promptUser(msg, checkFn, interval) | Show prompt, poll checkFn until truthy |
| page.captureNetwork({urlPattern, bodyPattern, key}) | Register a network capture |
| page.getCapturedResponse(key) | Get captured response or null |
| page.clearNetworkCaptures() | Clear all captures |
| page.closeBrowser() | Close browser, keep process for HTTP work |
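To show how several of these methods might combine, here is a hedged sketch of a paginated API fetch through `page.evaluate()` with a pause between requests. The endpoint, the `cursor` query parameter, and the `{ items, nextCursor }` response shape are all hypothetical. It also assumes the runner resolves a promise returned by the evaluated string, as raw Playwright does. `page` is passed as a parameter so the helper can be tried with a stub.

```javascript
// Sketch only: fetch every page of a hypothetical cursor-paginated endpoint.
async function fetchAllPages(page, endpoint, { delayMs = 1000, maxPages = 50 } = {}) {
  const items = [];
  let cursor = null;

  for (let pageNo = 0; pageNo < maxPages; pageNo++) {
    const url = cursor ? `${endpoint}?cursor=${encodeURIComponent(cursor)}` : endpoint;

    // page.evaluate() takes a string, so the URL is embedded with
    // JSON.stringify to keep quoting and escaping safe.
    const body = await page.evaluate(`
      (async () => {
        const res = await fetch(${JSON.stringify(url)}, { credentials: 'include' });
        if (!res.ok) throw new Error('HTTP ' + res.status);
        return res.json();
      })()
    `);

    items.push(...(body.items || []));
    cursor = body.nextCursor || null;
    if (!cursor) break; // last page reached

    await page.sleep(delayMs); // pause between requests to stay under rate limits
  }
  return items;
}
```

The `maxPages` cap is a guard against an endpoint that keeps returning a cursor; a real connector would pick a limit that fits the platform.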
await page.setProgress({
  phase: { step: 1, total: 3, label: 'Fetching memories' },
  message: 'Downloaded 50 of 200 items...',
  count: 50,
});

- phase.step / phase.total: drives the step indicator ("Step 1 of 3")
- phase.label: short label for the current phase
- message: human-readable progress text
- count: numeric count for progress tracking
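When progress is reported from inside a loop, a small helper keeps the payload consistent. The sketch below is one possible shape; `makeProgress` is a hypothetical helper and not part of the page API.

```javascript
// Sketch only: build a setProgress() payload from a running count.
// When the expected total is unknown, the message omits it.
function makeProgress(step, total, label, done, expected) {
  const message = expected == null
    ? `Downloaded ${done} items...`
    : `Downloaded ${done} of ${expected} items...`;
  return { phase: { step, total, label }, message, count: done };
}
```

Inside a connector it would be used as `await page.setProgress(makeProgress(1, 3, 'Fetching memories', 50, 200));`.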
- DataConnect cloned and able to run (npm run tauri:dev)

- Clone this repo alongside DataConnect:

    git clone https://github.com/vana-com/data-connectors.git

- Point DataConnect to your local connectors during development:

    # From the DataConnect repo
    CONNECTORS_PATH=../data-connectors npm run tauri:dev

  The CONNECTORS_PATH environment variable tells the fetch script to skip downloading and use your local directory instead.

- After editing connector files, sync them to the app's runtime directory:

    # From the DataConnect repo
    node scripts/sync-connectors-dev.js

  This copies your connector files to ~/.dataconnect/connectors/ where the running app reads them. The app checks this directory first, so your local edits take effect without rebuilding.

- Edit your connector script
- Run node scripts/sync-connectors-dev.js (from the DataConnect repo)
- Click the connector in the app to test
- Check logs in ~/Library/Logs/DataConnect/ (macOS) for debugging
- Fork this repo
- Create a branch: git checkout -b feat/<platform>-connector
- Add your files in connectors/<company>/:
  - <name>-playwright.js: connector script
  - <name>-playwright.json: metadata
  - schemas/<platform>.<scope>.json: data schema (optional but encouraged)
- Test locally using the instructions above
- Update registry.json with your connector entry and checksums
- Open a pull request
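The registry update step can also be scripted instead of copying hashes by hand. The Node sketch below assembles an entry in the format shown earlier. `buildRegistryEntry` and its `connectorsDir` argument are hypothetical, and copying `id`, `version`, `name`, and `description` from the metadata file is an assumption about how the two files relate.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Hash a file like `shasum -a 256`, adding the "sha256:" prefix the registry uses.
function sha256File(filePath) {
  const digest = crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex');
  return `sha256:${digest}`;
}

// Sketch only: build a registry entry for <company>/<name>-playwright.{js,json}.
function buildRegistryEntry(connectorsDir, company, name) {
  const script = `${company}/${name}-playwright.js`;
  const metadata = `${company}/${name}-playwright.json`;
  const meta = JSON.parse(fs.readFileSync(path.join(connectorsDir, metadata), 'utf8'));

  return {
    id: meta.id,
    company,
    version: meta.version,
    name: meta.name,
    description: meta.description,
    files: { script, metadata },
    checksums: {
      script: sha256File(path.join(connectorsDir, script)),
      metadata: sha256File(path.join(connectorsDir, metadata)),
    },
  };
}
```

The returned object can be serialized with `JSON.stringify(entry, null, 2)` and pasted into the connectors array.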
- Fork and branch
- Make your changes to the connector script and/or metadata
- Test locally
- Update the version in the metadata JSON
- Regenerate checksums and update registry.json
- Open a pull request
- Credentials stay on-device. Connectors run in a local browser. Never send tokens or passwords to external servers.
- Use page.setProgress() to report progress. Users should see what's happening during long exports.
- Include exportSummary in the result. The UI uses it to display what was collected.
- Handle errors gracefully. Use page.setData('error', message) and provide clear error messages.
- Prefer API fetch over DOM scraping when the platform has usable APIs. APIs are more stable than DOM structure.
- Avoid relying on CSS class names, since many platforms obfuscate them. Use structural selectors, heading text, and content heuristics instead.
- Rate-limit API calls. Add page.sleep() between requests to avoid triggering rate limits.
- Test pagination edge cases: empty results, single page, large datasets.
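Two of these guidelines, graceful error handling and the required exportSummary, can be enforced in one place by wrapping the main flow. This is a sketch under assumptions: `runConnector` is a hypothetical helper, `page` is passed in so it can be tried with a stub, and the error message wording is illustrative.

```javascript
// Sketch only: run a collection function and report the outcome to the host.
// `collect` is any async function that returns the connector result object.
async function runConnector(page, collect) {
  try {
    const result = await collect();
    if (!result || !result.exportSummary) {
      throw new Error('Result is missing exportSummary');
    }
    await page.setData('result', result);
    return true;
  } catch (err) {
    // Surface a clear message through the 'error' channel.
    const message = err instanceof Error ? err.message : String(err);
    await page.setData('error', `Export failed: ${message}`);
    return false;
  }
}
```

With this wrapper the host receives either a 'result' or an 'error' for each run.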
The registry uses SHA-256 checksums to verify file integrity during OTA updates. Always regenerate checksums when modifying connector files:
shasum -a 256 <company>/<name>-playwright.js | awk '{print "sha256:" $1}'
shasum -a 256 <company>/<name>-playwright.json | awk '{print "sha256:" $1}'

DataConnect fetches registry.json from this repo on app startup and during npm postinstall. For each connector listed:
- Check if local files exist with matching checksums
- If not, download from baseUrl/<file_path> (this repo's raw GitHub URL)
- Verify SHA-256 checksums match
- Write to local connectors/ directory
This enables OTA connector updates without requiring a full app release.
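The first step of that flow, deciding which files need to be downloaded, might look roughly like the Node sketch below. This is not DataConnect's actual fetch script. `filesNeedingDownload` is a hypothetical helper that assumes registry entries have the `files` and `checksums` shape shown earlier.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Sketch only: return the keys in entry.files ("script", "metadata") whose
// local copy is missing or does not match the registry checksum.
function filesNeedingDownload(entry, localDir) {
  return Object.keys(entry.files).filter((key) => {
    const localPath = path.join(localDir, entry.files[key]);
    if (!fs.existsSync(localPath)) return true; // not present locally

    const digest = crypto.createHash('sha256').update(fs.readFileSync(localPath)).digest('hex');
    return `sha256:${digest}` !== entry.checksums[key]; // stale or modified
  });
}
```

The same comparison can be repeated after download to cover the verification step before anything is written to the connectors/ directory.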