Exa_py is a powerful Python client for Exa Web Search. It automatically selects the best search model (keyword or neural via embeddings) for your query and offers robust filtering and content extraction options. This guide explains how to install the package, set up your client, and use the various search endpoints with detailed parameter information.
- Installation
- Basic Setup
- Basic Search
- Advanced Search with Content Extraction
- API Reference & Parameters
- Additional Examples
- Error Handling & Best Practices
- Rate Limits & Pricing
Install the package using pip:
pip install exa-py
Before you can start searching, set your Exa API key (which can be sent via the x-api-key header or as a Bearer token in the Authorization header). The simplest way to set this up is by using an environment variable.
import os
from exa_py import Exa
# Initialize the client using your API key
exa = Exa(api_key=os.getenv('EXA_API_KEY'))
Perform a simple search by passing a query string. By default, the client chooses between a traditional keyword search and Exa’s embeddings-based neural model for relevance.
# Simple keyword search query
results = exa.search("artificial intelligence news")
# Process and display search results
for result in results.results:
    print(f"Title: {result.title}")
    print(f"URL: {result.url}")
    print(f"Published Date: {result.published_date}")
For a more granular search, you can use search_and_contents. This method lets you specify parameters such as search type, auto-prompt conversion, date filters, domain inclusion/exclusion, and options for content extraction (e.g., fetching page text, highlights, and summaries).
Below is an example that uses automatic search-type selection, a custom text limit, and options that return additional information such as subpages and extra links.
from exa_py import Exa
exa = Exa(api_key="YOUR_EXA_API_KEY")
results = exa.search_and_contents(
    "Latest research in LLMs",
    type="auto",  # "auto" lets Exa choose between neural and keyword
    category="research paper",
    num_results=10,
    text={
        "max_characters": 1000,
        "include_html_tags": False
    },
    summary={
        "query": "Main developments and key takeaways"
    },
    subpages=1,
    subpage_target="sources",
    extras={
        "links": 1,
        "image_links": 1
    },
    use_autoprompt=True,
    start_published_date="2025-02-28T20:30:00.000Z",
    start_crawl_date="2025-04-29T20:30:01.000Z",
    include_domains=["towardsdatascience.com"],
    include_text=["LLMs"],
    exclude_text=["keras"],
    livecrawl="always"
)
print(results)
The above example demonstrates:
- Search Type: Setting type to "auto" delegates the choice between keyword and neural search.
- Content Extraction: Providing a text dictionary limits text capture to a maximum of 1000 characters.
- Summarization: Automatically extract a summary based on the query provided.
- Subpages & Extras: Fetch related subpages (such as source pages) and additional links or image links.
- Filtering: Narrow down results with domain and text inclusion/exclusion, and specific crawl and publish dates.
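The parameter reference in this guide notes that include_text and exclude_text are limited to one string of up to 5 words. A small client-side check can catch an oversized filter before a request is sent. The sketch below is only an illustration: validate_text_filter is a hypothetical helper, not part of exa_py, and the limits are taken from this guide rather than enforced by the client.

```python
def validate_text_filter(phrases, max_phrases=1, max_words=5):
    """Return the phrases as a list if they fit the documented limit.

    Hypothetical helper (not part of exa_py). The defaults mirror the
    "one string of up to 5 words" limit described in this guide.
    """
    if phrases is None:
        return None
    if len(phrases) > max_phrases:
        raise ValueError(
            f"expected at most {max_phrases} phrase(s), got {len(phrases)}"
        )
    for phrase in phrases:
        word_count = len(phrase.split())
        if word_count > max_words:
            raise ValueError(
                f"phrase {phrase!r} has {word_count} words, limit is {max_words}"
            )
    return list(phrases)


# Filters from the example above pass the check unchanged.
include_text = validate_text_filter(["LLMs"])
exclude_text = validate_text_filter(["keras"])
print(include_text, exclude_text)
```

The validated lists can then be passed as include_text and exclude_text in the same way as in the example above.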
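Exa_py uses a simple POST endpoint (/search) for both search and content extraction. Most parameters below are listed under their REST API names, which are camelCase; the Python examples in this guide pass the snake_case equivalents (for example, numResults becomes num_results). Below is an overview of key parameters: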
- query (string, required):
  The search string.
  Example: "Latest developments in LLM capabilities"
- useAutoprompt (boolean, default: true):
  If enabled, autoprompt automatically converts your query into an Exa-style query. (Not available for keyword searches.)
- type (enum<string>, default: "auto"):
  Search type. Available options: "keyword", "neural", "auto".
- category (enum<string>):
  Focus the search by data category.
  Options include: "company", "research paper", "news", "pdf", "github", "tweet", "personal site", "linkedin profile", "financial report"
- numResults (integer, default: 10):
  Number of results to return. Maximum values vary by plan.
- includeDomains (string[]):
  List of domains to include in the search (results will only be from these domains).
- excludeDomains (string[]):
  List of domains to exclude from the results.
- startCrawlDate and endCrawlDate (string in ISO 8601 format):
  Filter results by the crawl date (when a link was first discovered).
- startPublishedDate and endPublishedDate (string in ISO 8601 format):
  Filter results based on their publication dates.
- includeText (string[]):
  Only return pages containing these strings in their text (limited to one string of up to 5 words).
- excludeText (string[]):
  Exclude pages that contain these strings in their text (limited to one string of up to 5 words).
- contents (object):
  Options for what content to extract. This contains:
  - text: A dictionary (max_characters, include_html_tags, etc.)
  - summary: A dictionary (e.g., with a "query" for summarization)
- subpages (integer):
  Number of subpages to retrieve.
- subpage_target (string):
  Specifies the target type for subpages (for example, "sources").
- extras (object):
  Additional options to return extra information such as additional links or image links.
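To make the relationship between the Python keyword arguments and the REST parameter names concrete, here is a minimal sketch that turns snake_case options into a camelCase JSON body. It is an illustration only: snake_to_camel and build_search_payload are hypothetical helpers, the mapping is inferred from the naming pattern in this guide (num_results and numResults, for example), and the sketch only converts top-level keys and does not send a request.

```python
import json


def snake_to_camel(name):
    """Convert a snake_case name such as 'num_results' to 'numResults'."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def build_search_payload(query, **options):
    """Build a JSON-serializable body for a search request.

    Hypothetical helper: top-level option names are converted to camelCase
    and options set to None are dropped. Nested dictionaries are passed
    through unchanged.
    """
    payload = {"query": query}
    for key, value in options.items():
        if value is None:
            continue
        payload[snake_to_camel(key)] = value
    return payload


payload = build_search_payload(
    "Latest research in LLMs",
    num_results=10,
    include_domains=["towardsdatascience.com"],
    start_published_date="2025-02-28T20:30:00.000Z",
    exclude_domains=None,  # dropped from the payload
)
print(json.dumps(payload, indent=2))
```

In normal use the client builds the request for you; a sketch like this is mainly useful for reading the parameter list above alongside the Python examples.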
A typical response from a successful call contains:
{
  "requestId": "b5947044c4b78efa9552a7c89b306d95",
  "autopromptString": "Heres a link to the latest research in LLMs:",
  "autoDate": "2024-02-08T02:15:42.180Z",
  "resolvedSearchType": "neural",
  "results": [
    {
      "title": "A Comprehensive Overview of Large Language Models",
      "url": "https://arxiv.org/pdf/2307.06435.pdf",
      "publishedDate": "2023-11-16T01:36:32.547Z",
      "author": "Author Name",
      "score": 0.4600165784358978,
      "id": "https://arxiv.org/abs/2307.06435",
      "image": "https://arxiv.org/pdf/2307.06435.pdf/page_1.png",
      "favicon": "https://arxiv.org/favicon.ico",
      "text": "Abstract Large Language Models (LLMs)...",
      "highlights": [
        "Key highlight of the text..."
      ],
      "summary": "This overview paper on LLMs highlights key developments...",
      "subpages": [
        {
          "id": "https://arxiv.org/abs/2303.17580",
          "url": "https://arxiv.org/pdf/2303.17580.pdf",
          "title": "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face",
          "author": "Researcher Name",
          "publishedDate": "2023-11-16T01:36:20.486Z",
          "text": "Detailed abstract or snippet...",
          "summary": "A brief summary of the subpage content..."
        }
      ],
      "extras": {
        "links": []
      }
    }
  ],
  "searchType": "auto",
  "costDollars": {
    "total": 0.005,
    "breakDown": [
      {
        "search": 0.005,
        "contents": 0,
        "breakdown": {
          "keywordSearch": 0,
          "neuralSearch": 0.005,
          "contentText": 0,
          "contentHighlight": 0,
          "contentSummary": 0
        }
      }
    ]
  }
}
Note: The response also includes pricing details under costDollars, which break down the cost of the search and content extraction by operation type and number of results.
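As a rough illustration of reading the costDollars section, the sketch below totals the search and contents charges from a response that has already been parsed into a Python dictionary. The helper name summarize_cost is hypothetical, and the key names (costDollars, total, breakDown, search, contents) are taken from the sample response above; the Python client's response object may expose these fields differently.

```python
def summarize_cost(response):
    """Summarize the cost fields of a parsed JSON response.

    Hypothetical helper: it assumes the dictionary layout of the sample
    response in this guide and treats missing fields as zero.
    """
    cost = response.get("costDollars", {})
    summary = {"total": cost.get("total", 0.0), "search": 0.0, "contents": 0.0}
    for entry in cost.get("breakDown", []):
        summary["search"] += entry.get("search", 0.0)
        summary["contents"] += entry.get("contents", 0.0)
    return summary


# Trimmed copy of the costDollars section from the sample response.
sample_response = {
    "costDollars": {
        "total": 0.005,
        "breakDown": [
            {
                "search": 0.005,
                "contents": 0,
                "breakdown": {"keywordSearch": 0, "neuralSearch": 0.005},
            }
        ],
    }
}
print(summarize_cost(sample_response))
```

Logging a summary like this per request is one way to keep an eye on spend while experimenting with content extraction options.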
Search for content published within the last 24 hours:
from datetime import datetime, timedelta, timezone
# Compute the start date (24 hours ago) in UTC, since the "Z" suffix denotes UTC
start_date = (datetime.now(timezone.utc) - timedelta(hours=24)).strftime("%Y-%m-%dT%H:%M:%SZ")
results = exa.search(
    "breaking news",
    start_published_date=start_date
)
Restrict search results to specific domains while excluding others:
results = exa.search(
    "technology trends",
    include_domains=["techcrunch.com", "wired.com"],
    exclude_domains=["example.com"]
)
Combine category, date, domain, and text filters with content extraction in a single call:
results = exa.search_and_contents(
    "blog post about AI",
    type="neural",
    use_autoprompt=True,
    category="tweet",
    num_results=10,
    start_published_date="2025-02-28T20:30:00.000Z",
    start_crawl_date="2025-04-29T20:30:01.000Z",
    include_domains=["towardsdatascience.com"],
    include_text=["LLMs"],
    exclude_text=["keras"],
    livecrawl="always",
    text={
        "max_characters": 1000
    },
    summary={
        "query": "main points and important parts of the content"
    }
)
print(results)
Always wrap your API calls in try/except blocks to gracefully handle network issues or API errors:
try:
    results = exa.search("your query here")
except Exception as e:
    print(f"Search failed: {e}")
- Choose the Right Search Type:
  Use keyword search for faster results when precise text matching is required. Use neural search when semantic understanding is more important.
- Limit Content Extraction Length:
  Restrict text content via the max_characters option to improve performance.
- Filter Domains:
  Narrow the scope of your search results by including only trusted domains and excluding irrelevant ones.
- Cache Frequent Queries:
  If your application performs recurring searches, consider caching results to reduce API calls.
- Respect API Rate Limits:
  Be aware of your API plan’s rate limits. Implement throttling and error handling as necessary.
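The caching and error-handling suggestions above can be sketched with two small wrappers. This is a minimal illustration under stated assumptions, not a feature of exa_py: make_cached_search and call_with_retry are hypothetical helpers, the cache lives in memory with a fixed time-to-live, and the retry loop catches every exception, which a real application would likely narrow to the errors it expects.

```python
import time


def make_cached_search(search_fn, ttl_seconds=300.0, clock=time.monotonic):
    """Wrap a search callable with an in-memory cache that expires entries.

    Hypothetical helper: results are keyed by the query and its options.
    """
    cache = {}

    def cached(query, **options):
        # repr() makes list and dict option values usable in a dictionary key.
        key = (query, repr(sorted(options.items())))
        now = clock()
        hit = cache.get(key)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]
        result = search_fn(query, **options)
        cache[key] = (now, result)
        return result

    return cached


def call_with_retry(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Stand-in for exa.search so the sketch runs without network access.
calls = []


def fake_search(query, **options):
    calls.append(query)
    return f"results for {query}"


search = make_cached_search(fake_search, ttl_seconds=60)
search("technology trends", num_results=5)
search("technology trends", num_results=5)  # served from the cache
print(f"underlying calls: {len(calls)}")
```

With a real client, the same wrappers could be applied as make_cached_search(exa.search) and call_with_retry(search, "your query here"); the time-to-live and delay values here are placeholders to tune against your plan's limits.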
Be sure to understand the cost breakdown associated with your requests. Exa_py returns detailed cost information in the response under costDollars. The pricing tiers are:
- Neural Search Pricing:
  - 1–25 results: $0.005 per request
  - 26–100 results: $0.025 per request
  - 100+ results: $1 per request
- Keyword Search Pricing:
  - 1–100 results: $0.0025 per request
  - 100+ results: $3 per request
- Content Extraction Pricing:
  - Content text: $0.001 per page
  - Highlights: $0.001 per page
  - Summaries: $0.001 per page
Make sure you build with cost efficiency in mind, particularly for high-volume or complex queries.
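To get a feel for how these tiers combine, the sketch below estimates the cost of a single request from the figures listed above. It is a back-of-the-envelope helper, not an official calculator: estimate_request_cost is a hypothetical name, it assumes content is extracted for every returned result, it treats exactly 100 results as belonging to the lower tier, and it does not price the "auto" search type because the list above only covers neural and keyword.

```python
def estimate_request_cost(search_type, num_results,
                          text=False, highlights=False, summary=False):
    """Estimate the dollar cost of one request from the tiers in this guide.

    Hypothetical helper; the prices are copied from the list above and
    may not match current pricing.
    """
    if num_results < 1:
        raise ValueError("num_results must be at least 1")

    if search_type == "neural":
        if num_results <= 25:
            search_cost = 0.005
        elif num_results <= 100:
            search_cost = 0.025
        else:
            search_cost = 1.0
    elif search_type == "keyword":
        search_cost = 0.0025 if num_results <= 100 else 3.0
    else:
        raise ValueError("search_type must be 'neural' or 'keyword'")

    # Each enabled content type is charged per page ($0.001 each).
    content_types = sum([text, highlights, summary])
    contents_cost = 0.001 * num_results * content_types
    return search_cost + contents_cost


# Ten neural results with page text and summaries.
estimate = estimate_request_cost("neural", 10, text=True, summary=True)
print(f"estimated cost: ${estimate:.4f}")
```

Comparing an estimate like this with the costDollars value in the actual response is a simple way to check the assumptions against real usage.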
This documentation should provide a comprehensive overview of how to leverage the exa_py package to perform intelligent web searches with advanced content extraction capabilities. For more detailed technical information, consult the official Exa API documentation.
Happy Searching!