A Python tool to extract blog posts from Kajabi-hosted websites and convert them to Sanity CMS-compatible NDJSON format for easy import.
- Crawls all blog posts with automatic pagination handling
- Extracts title, date, content, tags, and featured images
- Converts to Sanity-compatible NDJSON format
- Configurable via command-line arguments
- Progress tracking with detailed logging
- Optimized for Kajabi's blog structure
- Clone this repository:

    git clone https://github.com/nmayalais/kajabi-to-sanity.git
    cd kajabi-to-sanity

- Install dependencies:

    pip install -r requirements.txt

Extract blog posts from the default site:

    python extract_kajabi.py

This creates sanity_import.ndjson with all blog posts ready for import.
To extract from a different site or write to a different file, pass --url and --output:

    python extract_kajabi.py --url https://yourkajabidomain.com --output blog_export.ndjson

All available options (output of python extract_kajabi.py --help):

    usage: extract_kajabi.py [-h] [--url URL] [--blog-path BLOG_PATH] [--output OUTPUT]
                             [--author AUTHOR] [--log-level {DEBUG,INFO,WARNING,ERROR}]
                             [--version] [--no-images] [--no-tags]

    Extract blog posts from Kajabi and convert to Sanity NDJSON format

    optional arguments:
      -h, --help            show this help message and exit
      --url URL             Base URL of the Kajabi site (default: https://example-kajabi-site.com)
      --blog-path BLOG_PATH
                            Path to the blog section (default: /blog)
      --output OUTPUT, -o OUTPUT
                            Output NDJSON file name (default: sanity_import.ndjson)
      --author AUTHOR       Default author name for posts (default: Author Name)
      --log-level {DEBUG,INFO,WARNING,ERROR}
                            Logging level (default: INFO)
      --version             show program's version number and exit
      --no-images           Skip extracting featured images
      --no-tags             Skip extracting tags
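
If you plan to add your own flags, the following is a minimal sketch of how a subset of the options above could be declared with argparse. It is not the tool's actual source: the build_parser function and the --max-posts flag are hypothetical, and only the option names and defaults are taken from the help text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the documented help text."""
    parser = argparse.ArgumentParser(
        prog="extract_kajabi.py",
        description="Extract blog posts from Kajabi and convert to Sanity NDJSON format",
    )
    parser.add_argument("--url", default="https://example-kajabi-site.com",
                        help="Base URL of the Kajabi site")
    parser.add_argument("--blog-path", default="/blog",
                        help="Path to the blog section")
    parser.add_argument("--output", "-o", default="sanity_import.ndjson",
                        help="Output NDJSON file name")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        help="Logging level")
    parser.add_argument("--no-images", action="store_true",
                        help="Skip extracting featured images")
    parser.add_argument("--no-tags", action="store_true",
                        help="Skip extracting tags")
    # Hypothetical extra flag (not part of the tool): cap the number of posts.
    parser.add_argument("--max-posts", type=int, default=None,
                        help="Stop after this many posts")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(["--url", "https://example.com", "--no-tags"])
print(args.url, args.blog_path, args.no_tags, args.max_posts)
```

Options with dashes become attributes with underscores (--blog-path is read as args.blog_path), and store_true flags default to False when omitted.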
Extract from a custom domain with debug logging:

    python extract_kajabi.py --url https://example.com --log-level DEBUG

Extract without images and tags:

    python extract_kajabi.py --no-images --no-tags --output minimal_export.ndjson

The tool generates NDJSON (newline-delimited JSON), which stores one JSON document per line. Each record has the following structure, shown pretty-printed here for readability:
    {
      "_type": "post",
      "title": "Blog Post Title",
      "slug": {
        "_type": "slug",
        "current": "blog-post-url-slug"
      },
      "publishedAt": "2024-01-01T00:00:00Z",
      "body": "Full blog post content...",
      "sourceUrl": "https://example.com/blog/post-slug",
      "author": "Author Name",
      "tags": ["tag1", "tag2"],
      "featuredImageUrl": "https://example.com/image.jpg"
    }

After extraction, import the data using Sanity CLI:
    # Install Sanity CLI if you haven't already
    npm install -g @sanity/cli
    # Import the data
    sanity dataset import sanity_import.ndjson production

Ensure your Sanity schema includes a post document type. Here's a minimal example:
    // schemas/post.js
    export default {
      name: 'post',
      title: 'Blog Post',
      type: 'document',
      fields: [
        {
          name: 'title',
          title: 'Title',
          type: 'string',
          validation: Rule => Rule.required()
        },
        {
          name: 'slug',
          title: 'Slug',
          type: 'slug',
          options: {
            source: 'title',
            maxLength: 96
          },
          validation: Rule => Rule.required()
        },
        {
          name: 'publishedAt',
          title: 'Published at',
          type: 'datetime'
        },
        {
          name: 'body',
          title: 'Body',
          type: 'text'
        },
        {
          name: 'author',
          title: 'Author',
          type: 'string'
        },
        {
          name: 'tags',
          title: 'Tags',
          type: 'array',
          of: [{type: 'string'}]
        },
        {
          name: 'featuredImageUrl',
          title: 'Featured Image URL',
          type: 'url'
        },
        {
          name: 'sourceUrl',
          title: 'Original URL',
          type: 'url'
        }
      ]
    }

The tool is designed to be extensible. Key customization points:
- Custom Selectors: Modify the CSS selectors in the extract_post_data() method
- Additional Fields: Add new fields to the extraction logic
- Post-Processing: Add data transformation before export
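
As one illustration of the post-processing point, here is a minimal standard-library sketch that reads NDJSON records, transforms them, and serializes them again. It runs outside the tool on its output format; the helper names and the excerpt field are hypothetical, and any field you add this way also needs a matching entry in your Sanity schema.

```python
import json
from typing import Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def normalize_tags(record: dict) -> dict:
    """Trim, lower-case, and de-duplicate tags while keeping their order."""
    seen = []
    for tag in record.get("tags", []):
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return {**record, "tags": seen}


def add_excerpt(record: dict, length: int = 160) -> dict:
    """Add a hypothetical 'excerpt' field taken from the start of the body."""
    body = record.get("body", "")
    return {**record, "excerpt": body[:length].strip()}


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize records as NDJSON: one compact JSON document per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Sample input standing in for a line of sanity_import.ndjson.
raw = (
    '{"_type": "post", "title": "Hello", "body": "First post body.", '
    '"tags": ["News", " news ", "Tips"]}\n'
)
processed = [add_excerpt(normalize_tags(r)) for r in read_ndjson(raw.splitlines())]
output = to_ndjson(processed)
print(output, end="")
```

Each transform returns a new dict rather than mutating its input, so steps can be chained or reordered without side effects.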
For rich text formatting, consider post-processing the content to Sanity's Portable Text format:
    # Example transformation (not included in base tool)
    # Requires the third-party html2text package: pip install html2text
    from html2text import HTML2Text

    def convert_to_portable_text(html_content):
        # Convert HTML to markdown first
        h = HTML2Text()
        markdown = h.handle(html_content)
        # Then convert markdown to Portable Text
        # Implementation depends on your needs
        raise NotImplementedError("Markdown to Portable Text conversion is not implemented")

- No posts found: Check that the blog path is correct and the site is accessible
- Missing content: Verify the CSS selectors match your Kajabi theme
- Date parsing errors: The tool expects dates in "MMM DD, YYYY" format
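
If your theme renders dates differently, parsing can be made more tolerant. The sketch below is an assumption about how such parsing could look, not the tool's actual code: parse_post_date and the fallback format are hypothetical. It maps "MMM DD, YYYY" to the strptime pattern "%b %d, %Y" and emits the timestamp format used for publishedAt.

```python
from datetime import datetime, timezone
from typing import Optional

# First entry matches the documented "MMM DD, YYYY" format (e.g. "Jan 01, 2024").
# The second is a hypothetical fallback for full month names.
DATE_FORMATS = ("%b %d, %Y", "%B %d, %Y")


def parse_post_date(text: str) -> Optional[str]:
    """Return an ISO 8601 UTC timestamp, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next format
        # The source date has no time or zone; midnight UTC is assumed here.
        return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return None


print(parse_post_date("Jan 01, 2024"))
print(parse_post_date("January 5, 2024"))
print(parse_post_date("2024-01-01"))
```

Month names in %b and %B follow the process locale, so this assumes English month names and Python's default locale settings.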
Run with debug logging to see detailed extraction information:

    python extract_kajabi.py --log-level DEBUG

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built for migrating from Kajabi to Sanity CMS
- Uses Beautiful Soup for HTML parsing
- Progress bars powered by tqdm