cdreetz (Owner) commented Oct 22, 2025

Summary

  • allow dataset dependency graph and value resolution to work with AsyncGeneratorFunction inputs
  • add Dataset.generate_async for use inside event loops and ensure synchronous generation resolves awaitables
  • cover async dataset usage with new comprehensive tests
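
As a rough illustration of the second bullet, a synchronous code path can resolve awaitables by checking each value and running it to completion on a fresh event loop. The helper name `resolve_value` and the stand-in `fake_column` are hypothetical and not taken from this PR; this is only a sketch of the general pattern, assuming no event loop is already running in the calling thread.

```python
import asyncio
import inspect


async def _await(value):
    # Wrap any awaitable in a coroutine so asyncio.run can drive it.
    return await value


def resolve_value(value):
    """Return value unchanged, or run it to completion if it is awaitable.

    Hypothetical helper: assumes no event loop is running in this thread,
    since asyncio.run raises RuntimeError when called inside one.
    """
    if inspect.isawaitable(value):
        return asyncio.run(_await(value))
    return value


async def fake_column(topic):
    # Stand-in for an async generator call; no network access involved.
    await asyncio.sleep(0)
    return f"question about {topic}"


plain = resolve_value("Python")
resolved = resolve_value(fake_column("Rust"))
```

Inside an already running loop this pattern raises an error, which is presumably why a separate awaitable entry point such as `Dataset.generate_async` is useful.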

Testing

  • PYTHONPATH=src pytest tests/test_generator.py (fails: ModuleNotFoundError: No module named 'pandas')

https://chatgpt.com/codex/tasks/task_e_68eab295d8208320bfc2241b2d27319d

cdreetz (Owner, Author) commented Oct 22, 2025

@claude I ran a benchmark generating 100 rows with 3 columns, first with the regular generator and dataset, then with the async generator and dataset. The async version actually took longer. Why?

import chatan
from chatan.generator import generator, async_generator
from chatan import dataset
import time
import os

n = 100

gen = generator(model="gpt-4.1-mini", api_key=os.getenv("OPENAI_API_KEY"), max_tokens=100)

ds = dataset({
    "topic": chatan.sample.choice(["Python", "Javascript", "Rust"]),
    "prompt": gen("write a programming question about {topic}"),
    "response": gen("answer this question: {prompt}")
})

start = time.time()
df = ds.generate(n=n)
print(df[1:])
end = time.time()
total = end - start
print(f"Took {total} sec to generate {n} rows")


a_gen = async_generator(model="gpt-4.1-mini", api_key=os.getenv("OPENAI_API_KEY"), max_tokens=100)

a_ds = dataset({
    "topic": chatan.sample.choice(["Python", "Javascript", "Rust"]),
    "prompt": a_gen("write a programming question about {topic}"),
    "response": a_gen("answer this question: {prompt}")
})

a_start = time.time()
a_df = a_ds.generate(n=n)
print(a_df[1:])
a_end = time.time()
a_total = a_end - a_start
print(f"Took {a_total} sec to generate {n} async rows")
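
One possible explanation for the timing above, offered as an assumption rather than a diagnosis: if the synchronous `generate` call resolves each awaitable as it encounters it, the async requests still run one after another and pay event-loop overhead without gaining any concurrency. The sketch below does not use chatan at all; `fake_call` is a hypothetical stand-in that sleeps instead of calling a model, and it only shows the difference between awaiting calls one at a time and scheduling them together with `asyncio.gather`.

```python
import asyncio
import time


async def fake_call(delay=0.05):
    # Stand-in for one async model request; sleeps instead of calling an API.
    await asyncio.sleep(delay)
    return "ok"


async def one_at_a_time(n):
    # Each call is awaited before the next starts, so latencies add up.
    return [await fake_call() for _ in range(n)]


async def all_at_once(n):
    # All calls are scheduled together, so their latencies overlap.
    return await asyncio.gather(*(fake_call() for _ in range(n)))


def timed(coro_fn, n):
    # Run the coroutine function on a fresh event loop and measure wall time.
    start = time.perf_counter()
    results = asyncio.run(coro_fn(n))
    return results, time.perf_counter() - start


seq_results, seq_time = timed(one_at_a_time, 10)
con_results, con_time = timed(all_at_once, 10)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

If that assumption holds, rerunning the async half of the benchmark through `Dataset.generate_async` (the method this PR adds) would be the fairer comparison. Whether that method runs rows concurrently is not stated in this thread.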

cdreetz (Owner, Author) commented Oct 22, 2025

@claude
