Skip to content

Conversation

@robsimmons
Copy link
Collaborator

@robsimmons robsimmons commented Dec 28, 2025

This PR is a refactoring of some of the intermediate data structure created in #594. VersoDoc was an extremely generic name, and DocThunk, with a corresponding DocThunk.force : DocThunk genre -> Part genre function, better communicates what is happening here: computation of a part is getting deferred during the metaprogramming phase until compiled-program-execution time.

The structure of a DocThunk is also changed somewhat, to better represent and support the operation of replacing metadata without evaluating the thunk, and to recognize that some applications build Part data directly and so don't have a meaningful serialization step.

@robsimmons robsimmons requested a review from jcreedcmu December 28, 2025 17:18
@robsimmons robsimmons marked this pull request as ready for review December 28, 2025 17:18
/--
The result type of values created by Verso's {lit}`#doc` and {lit}`#docs` commands. A value of type
{name}`DocThunk` represents a partially-serialized document of type {lean}`Part` that can be turned
into a value by invoking the `DocThunk.force` method. The internal structure of a {name}`DocThunk`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make the constructors private instead of relying on a documentation admonishment?

| serialized
(construct : DocReconstruction → Part genre)
(docReconstructionData : String := "{}")
(replacementMetadata? : Option (Option genre.PartMetadata))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on this field that indicates why it exists would be useful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had one but don't actually know how to put a comment on a field of a branch of a datatype. Or should the comment go on serialized?

-/
| serialized
(construct : DocReconstruction → Part genre)
(docReconstructionData : String := "{}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for this constructor should say a little bit about what this string is expected to contain.

If we're saving lots of strings with serialized JSON in them, perhaps we can make things easier by defining a wrapper structure? Something like:

structure SerializedDocReconstruction where
  json : String

def SerializedDocStructure.empty : SerializedDocReconstruction where
  json := "{}"

This should make the type signatures easier to read, and may prevent some confusion.

/--
Replace the top-level metadata in a {name}`DocThunk` without forcing the thunk.
-/
public def DocThunk.withMetadata (metadata? : Option genre.PartMetadata) : DocThunk genre → DocThunk genre
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this wouldn't be more compositional with a constructor that replaces metadata instead of a field on serialized. Do we expect there to be lots of calls that might lead to a measurable overhead?

into a value by invoking the `DocThunk.force` method. The internal structure of a {name}`DocThunk`
should not be relied upon.
-/
public inductive DocThunk (genre : Genre) where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way in which this is very much unlike Thunk is that repeated calls to force will do repeated work. I think this name is better than VersoDoc, which says basically nothing, but I wonder whether DocThunk is the right name given that we don't cache the result.


namespace Verso.Doc
open Lean

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a "why" here as a module docstring would be valuable. Something like:

Suggested change
/-!
In {given}`g : Genre`, the Verso elaborator does not directly emit a {lean}`Part g`. This is because included
Lean code samples tend to be highly repetitive, especially in long proofs with growing contexts. This repetition
can lead to performance problems.
Instead, the elaborator deduplicates the included code samples via hash-consing. The hash-cons table is
saved as a string literal containing a JSON serialization, because large Lean constant data structures generate large C code compared to the string, and the overhead of deserializing it at run-time is negligible compared to the overhead of compiling very large C files (most Verso document code is run only once per compilation).
-/

namespace Verso.Doc
open Lean

public structure DocReconstruction where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring here would be nice:

Suggested change
public structure DocReconstruction where
/--
A deduplication table for code samples embedded in a document, so repeated code (e.g. hypotheses in proof
states) does not lead to code bloat.
-/
public structure DocReconstruction where

Should the constructor of this structure be private?

robsimmons and others added 2 commits January 5, 2026 10:26
Co-authored-by: David Thrane Christiansen <david@lean-fro.org>
Co-authored-by: David Thrane Christiansen <david@lean-fro.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants