24 changes: 7 additions & 17 deletions TODO.md
@@ -2,12 +2,10 @@

Meta-todo: Gradually move work items from here to repo Issues.

# Leftover TODOs from TADA.md
# Leftover TODOs from elsewhere

## Software

Minor:

- Improve load_dotenv() (don't look for `<repo>/ts/.env`, use one loop)

### Specifically for VTT import (minor)
@@ -22,14 +20,10 @@ Minor:

## Documentation

- Document how to reproduce the demos from the talk (and podcast)
- Document test/build/release process
- Document how to use gmail_dump.py (set up a project etc.)

Maybe later:

- Document how to run evaluations (but don't reveal all the data)

- Test/build/release process
- How to run evaluations (but don't share the data)
- Low-level APIs -- at least the key parts that are used directly by the
high-level APIs

# TODOs for fully implementing persistence through SQLite

@@ -79,19 +73,16 @@

# From Meeting 8/12/2025 afternoon (edited)

- Indexing (knowledge extraction) operates chunk by chunk
- TimeRange always points to a TextRange
- Always import VTT, helper to convert podcast to VTT format
(Probably not, podcast format has listeners but VTT doesn't)
- Rename "Ordinal" to "Id"

# Other stuff

### Left to do here

- Look more into why the search query schema is so instable
- Look more into why the search query schema is so unstable
- Implement at least some @-commands in query.py
- More debug options (turn on/off various debug prints dynamically)

- Use pydantic.ai for model drivers

## General: Look for things marked as incomplete in source
@@ -124,7 +115,6 @@ Maybe later:
- Review Copilot-generated tests for sanity and minimal mocking
- Add new tests for newly added classes/methods/functions
- Coverage testing (needs to use a mix of indexing and querying)
- Automated end-to-end tests using Umesh's test data files

## Tighter types

12 changes: 12 additions & 0 deletions demo/README.md
@@ -0,0 +1,12 @@
# Demo scripts

The files here are the scripts from
[Getting Started](../docs/getting-started.md).

- [ingest.py](ingest.py): The ingestion script.
- [query.py](query.py): The query script.
- [testdata.txt](testdata.txt): The test data.

Note that for any of this to work you need to acquire an OpenAI API key
and set some variables; see
[Environment Variables](../docs/env-vars.md).
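
With the key configured, a typical session runs the two scripts in order
(a sketch; the exact output will vary):

```sh
python ingest.py   # index the messages from testdata.txt into demo.db
python query.py    # ask questions about the indexed conversation
```
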
2 changes: 1 addition & 1 deletion demo/demo.py → demo/ingest.py
@@ -21,7 +21,7 @@ def read_messages(filename) -> list[TranscriptMessage]:

async def main():
conversation = await create_conversation("demo.db", TranscriptMessage)
messages = read_messages("transcript.txt")
messages = read_messages("testdata.txt")
print(f"Indexing {len(messages)} messages...")
results = await conversation.add_messages_with_indexing(messages)
print(f"Indexed {results.messages_added} messages.")
File renamed without changes.
38 changes: 36 additions & 2 deletions docs/demos.md
@@ -1,5 +1,7 @@
# How to Reproduce the Demos

All demos require [configuring](env-vars.md) an API key etc.

## How we did the Monty Python demo

The demo consisted of loading a number (specifically, 11) of popular
@@ -19,7 +21,6 @@ This is `tools/ingest_vtt.py`. You run it as follows:
```sh
python tools/ingest_vtt.py FILE1.vtt ... FILEN.vtt -d mp.db
```
(This requires [configuring](env-vars.md) an API key etc.)

The process took maybe 15 minutes for 11 sketches.

@@ -72,4 +73,37 @@ used the instructions at [GeeksForGeeks

The rest of the email ingestion pipeline doesn't care where you got
your `*.eml` files from -- every email provider has its own quirks.

## Bonus content: Podcast demo

The podcast demo is actually the easiest to run:
The "database" is included in the repo as
`testdata/Episode_53_AdrianTchaikovsky_index*`,
and this is in fact the default "database" used by `tools/query.py`
when no `-d`/`--database` flag is given.

This "database" indexes `test/Episode_53_AdrianTchaikovsky.txt`.
It was created by a one-off script that invoked
`typeagent/podcast/podcast_ingest/ingest_podcast()`
and saved the result to two files by calling the `.ingest()` method on
the returned `typeagent/podcasts/podcast/Podcast` object.
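
For reference, here is a sketch of what that one-off script might have
looked like. The import path, the function signature, and the argument
strings are assumptions inferred from the names above, not the actual API:

```py
import asyncio

# Hypothetical import path, inferred from the module names mentioned above.
from typeagent.podcast.podcast_ingest import ingest_podcast


async def main():
    # Hypothetical signature: ingest the transcript and get back a Podcast object.
    podcast = await ingest_podcast("test/Episode_53_AdrianTchaikovsky.txt")
    # Hypothetical: write the index out as the two *_index files that
    # tools/query.py loads by default.
    podcast.ingest("testdata/Episode_53_AdrianTchaikovsky_index")


asyncio.run(main())
```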

Here's a brief sample session:
```sh
$ python tools/query.py
1.318s -- Using Azure OpenAI
0.054s -- Loading podcast from 'testdata/Episode_53_AdrianTchaikovsky_index'
TypeAgent demo UI 0.2 (type 'q' to exit)
TypeAgent> What did Kevin say to Adrian about science fiction?
--------------------------------------------------
Kevin Scott expressed his admiration for Adrian Tchaikovsky as his favorite science fiction author. He mentioned that Adrian has a new trilogy called The Final Architecture, and Kevin is eagerly awaiting the third book, Lords of Uncreation, which he has had on preorder for months. Kevin praised Adrian for his impressive writing skills and his ability to produce large, interesting science fiction books at a rate of about one per year.
--------------------------------------------------
TypeAgent> How was Asimov mentioned.
--------------------------------------------------
Asimov was mentioned in the context of discussing the ethical and moral issues surrounding AI development. Adrian Tchaikovsky referenced Asimov's Laws of Robotics, noting that Asimov's stories often highlight the inadequacy of these laws in governing robots.
--------------------------------------------------
TypeAgent> q
$
```

Enjoy exploring!
20 changes: 15 additions & 5 deletions docs/env-vars.md
@@ -11,15 +11,24 @@ Typeagent currently supports two families of environment variables:

## OPENAI environment variables

The (public) OpenAI environment variables include:
The (public) OpenAI environment variables include the following:

### Required:

- `OPENAI_API_KEY`: Your secret API key that you get from the
[OpenAI dashboard](https://platform.openai.com/api-keys).
- `OPENAI_MODEL`: An environment variable introduced by
[TypeChat](https://microsoft.github.io/TypeChat/docs/examples/)
indicating the model to use (e.g. `gpt-4o`).
- `OPENAI_BASE_URL`: **Optional:** The URL for an OpenAI-compatible embedding server, e.g. [Infinity](https://github.com/michaelfeil/infinity). With this option `OPENAI_API_KEY` also needs to be set, but can be any value.
- `OPENAI_ENDPOINT`: **Optional:** The URL for an server compatible with the OpenAI Chat Completions API. Make sure the `OPENAI_MODEL` variable matches with the deployed model name, e.g. 'llama:3.2:1b'

### Optional:

- `OPENAI_BASE_URL`: The URL for an OpenAI-compatible embedding server,
e.g. [Infinity](https://github.com/michaelfeil/infinity). With this
option `OPENAI_API_KEY` also needs to be set, but can be any value.
- `OPENAI_ENDPOINT`: The URL for a server compatible with the OpenAI
Chat Completions API. Make sure the `OPENAI_MODEL` variable matches
the deployed model name, e.g. 'llama:3.2:1b'.
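
For example, a minimal `.env` for the plain OpenAI setup could look like
the following (the key is a placeholder, and the commented-out URLs are
placeholders for self-hosted servers):

```sh
# Required
OPENAI_API_KEY=sk-...your-secret-key...
OPENAI_MODEL=gpt-4o

# Optional -- only when pointing at a self-hosted, OpenAI-compatible server
# OPENAI_BASE_URL=http://localhost:7997
# OPENAI_ENDPOINT=http://localhost:11434/v1
```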

## Azure OpenAI environment variables

@@ -35,12 +44,13 @@ environment variables, starting with:
## Conflicts

If you set both `OPENAI_API_KEY` and `AZURE_OPENAI_API_KEY`,
plain `OPENAI` will win.
`OPENAI_API_KEY` will win.

## Other ways to specify environment variables

It is recommended to put your environment variables in a file named
`.env` in the current or parent directory.
To pick up these variables, call `typeagent.aitools.utils.load_dotenv()`
at the start of your program (before calling any typeagent functions).
(For simplicity this is not shown in [Getting Started](getting-started.md).)
(For simplicity this is not shown in
[Getting Started](getting-started.md).)
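
For example, a minimal startup sketch (the rest of the program is elided):

```py
from typeagent.aitools.utils import load_dotenv

load_dotenv()  # reads .env from the current directory or a parent directory

# ... only after this point should other typeagent functions be called ...
```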
10 changes: 5 additions & 5 deletions docs/getting-started.md
@@ -14,15 +14,15 @@ install wheels from [PyPI](https://pypi.org).

## "Hello world" ingestion program

### 1. Create a text file named `transcript.txt`
### 1. Create a text file named `testdata.txt`

```txt
STEVE We should really make a Python library for Structured RAG.
UMESH Who would be a good person to do the Python library?
GUIDO I volunteer to do the Python library. Give me a few months.
```

### 2. Create a Python file named `demo.py`
### 2. Create a Python file named `ingest.py`

```py
from typeagent import create_conversation
@@ -48,7 +48,7 @@ def read_messages(filename) -> list[TranscriptMessage]:

async def main():
conversation = await create_conversation("demo.db", TranscriptMessage)
messages = read_messages("transcript.txt")
messages = read_messages("testdata.txt")
print(f"Indexing {len(messages)} messages...")
results = await conversation.add_messages_with_indexing(messages)
print(f"Indexed {results.messages_added} messages.")
@@ -77,7 +77,7 @@ Azure-hosted OpenAI models.
### 4. Run your program

```sh
$ python demo.py
$ python ingest.py
```

Expected output looks like:
@@ -86,7 +86,7 @@ Expected output looks like:
0.027s -- Using OpenAI
Indexing 3 messages...
Indexed 3 messages.
Got 26 semantic refs.
Got 24 semantic refs.
```

## "Hello world" query program