Cedar syntax for entity data #104

benkehoe · 2025-08-20T14:53:35Z

cdisselkoen

I like this proposal, and I definitely think we should accept some version of this proposal.

cdisselkoen · 2025-12-05T18:45:17Z

text/0000-entity-literal-syntax.md

@@ -0,0 +1,224 @@
+# Entity literal syntax 


Nit: I don't love the name "Entity literal syntax" for this; would "Cedar syntax for entities" or "Cedar syntax for entity data" make more sense? This would be analogous to the existing "Cedar syntax" and "JSON syntax" for policies, and "Cedar syntax" and "JSON syntax" for schemas.

This goes with the question about the entity keyword. "Cedar syntax for entities" could mean the syntax for entity types that uses the entity keyword or the syntax for entity instances proposed here.

"Cedar syntax for entity data", then?

I like "Cedar syntax for entity data". It captures the intent pretty clearly.

Should the file name be changed?

I took the liberty of renaming this PR haha

cdisselkoen · 2025-12-05T18:49:01Z

text/0000-entity-literal-syntax.md

+Developing and iterating on schemas and policies involves creating and modifying lists of entities in order to test policies, which is very cumbersome with the existing entity syntax in JSON:
+
+* JSON has low information density. Cedar schema was able to provide significant increases in density and readability relative to the JSON syntax.
+* JSON does not support comments. This means that any design intent for a schema cannot be expressed inline.


nit: was this intended to say "design intent for entity data"? does a concept of design intent even make sense for entity data?

(That said, I think that allowing comments in the entity-data format is a win regardless)

Correct, the motivation was largely copied from RFC #24 for custom schema and I missed that update.

cdisselkoen · 2025-12-05T18:53:03Z

text/0000-entity-literal-syntax.md

+
+### Option 1: instance declarations
+
+The schema syntax already defines a syntax for entity types where the values of parents and attributes are types, and actions are already sort of entity literals; replacing these with values provides the basis for this option. We would select a new keyword, using `instance` as an example (`entity` would be nice, but it's taken). This should make it easy to reuse the parsing for schema syntax (and make syntax highlighting straightforward), and the parser could identify when they were incorrectly used in schemas and vice versa.


The choice to use instance to avoid conflicting with the schema keyword entity is interesting. I agree with the text here that at first glance, entity rather than instance would be nice. Could we still reuse major parsing components if the Cedar syntax for schemas and for entity data both use the keyword entity? For that matter, there are many other keywords that would be shared between the two formats, like namespace, tags, and in; why are we worried about a conflict on entity but not on those other keywords?

I think it's important that the syntax is unambiguous; if you provide entity types when Cedar is expecting entity instances or vice versa, it should be a syntactical error. namespace has the same meaning for both, so it's not a problem (entity instances can have the same namespaces as entity types). For tags and in, those occur within an entity type or instance declaration, so it's clear syntactically the difference in meaning. But if we use entity for both, how can the parser (and the human) tell the difference?

My response to this would be that it's always clear from context whether we're looking at (or parsing) part of a schema or some entity data. There would be no APIs that would accept both schema syntax and entity-data syntax. In IDEs, schemas use the .cedarschema file extension, and presumably entity data files would use some different extension.

I still think it should be unambiguously different without knowing the context (though I'm fine using the entity keyword if there's another good way of distinguishing the syntax). Inevitably someone will plug the wrong thing into the wrong API and if the parser can't tell the difference, either it will fail in confusing ways or worse it will accept it with no complaints.

This is all assuming the "instance declaration" option is the best one, the others don't necessarily suffer from the same problem—what are your thoughts on the relative merits of the other options?

I like this proposal. It addresses the concern about entity having multiple meanings, and emphasizes that this is an instance named entity_id, while still clearly marking this declaration as related to an entity.

Sometime in the future (probably not in this RFC) this would be compatible with an extension to combine the schema and entity-data syntax like

entity SomeType in OtherType { ... } instance "entity_id" in [OtherType::"blah"] { ... };

Relevant to combining schema (which can be defined only once) and entity data (there can be many instances), would we want a way to define multiple instances for a type without repeating the entity <type> instance prefix? e.g.

entity SomeType instances [ "id1" ["parent"] {}, "id2" {} ];

I think that's a good idea too.

As a small nit, I think repeating the in would increase readability especially for new users, i.e.,

entity SomeType instances [ "id1" in ["parent"] {}, "id2" {} ];

Ok, I buy that. I was thinking about the optional = before the record for entity types, and dispensing with both because = doesn't make sense in the list. So I guess for instance we'd have in and optional =, and for instances we'd have in but disallow =.

This all looks great to me! Two suggestions. First, instead of

entity SomeType instance "entity_id" [OtherNamespace::OtherType::"parent_entity_id"] ...

I'd write

entity SomeType instance "entity_id" in [OtherNamespace::OtherType::"parent_entity_id"] ...

Since as we've pointed out enum will have [...] as a list of instance IDs, putting in in front of the list of parents will hopefully communicate intent a little better, when comparing the two.

Second: If you have no attributes, instead of doing entity Color instance "green" {}; could we write entity Color instance "green"; ? This would more terse and match what you are allowed to do with enum. And, looking ahead, it would create parity in format if we ever extend enum types to have attributes (which we've also talked about for actions).

cdisselkoen · 2025-12-05T18:55:58Z

text/0104-entity-data-syntax.md

+
+Adding another custom format raises the bar for what customers need to know. They may ignore one format at first, but eventually they may need to learn both. For example, if one team uses the JSON format and another uses the custom format, but then the teams merge, they will each end up having to read/update the other's format. They may also fight about which format to move to.
+
+Mitigating this problem is that it's easy and predictable to convert between the two formats, since the syntactic concepts line up very closely. The features you lose when converting from the new format to the JSON one would be 1/ any comments, and 2/ any use of intermingling of `action`, `entity`, and `type` declarations, since they must all be in their own sections in the JSON. Otherwise the features line up very closely and the only difference is syntax. We would expect to write conversion tools as part of implementing the new syntax (which are easily achieved by parsing in the new format and pretty-printing in JSON).


I agree that easy conversion between the formats is a sufficient mitigation for this. And that everything said in the Motivation section outweighs the drawback of increased cognitive burden described here.

mwhicks1

This is looking good! I really like the direction. Before finally approving, I'd love to see a revision of the proposal that:

Proposes the change (Option 1?) in the main body, and puts alternatives in an appendix, for posterity.
Has a Detailed Design section that fills in the details, e.g., around the particular context-free grammar we want to use, or at least more examples that unambiguously flesh out the details. RFC 24 is probably more than you need, but it's a good guide.

philhassey · 2025-12-09T19:28:54Z

text/0000-entity-literal-syntax.md

+]
+```
+
+This would be more explicit that the content of the file is a set of entities and only a set of entities (like with JSON). It's a little less familiar/similar to schema syntax than Option 1, and might be harder to do useful syntax highlighting, but it is relatively close to how extension types like `datetime` are written. It opens up the possibility of composition within policies and schemas, but that is only relevant if that would ever be needed. This syntax doesn't allow for reusable records.


It'd be nice to see examples of how datetime and all the other extensions would be represented. There's also some questions around how to represent datetimes that are before 0000-00-00. (Though I think we might want to not allow those to be represented here.)

Datetimes would be represented exactly as they are in records in policies, e.g. datetime("2024-10-15")

The issue @philhassey is referring to is that it's possible, with the JSON representation of entities, to construct them with extension methods, like this:

"__extn": { "fn": "offset", "args": [ { "__extn": { "fn": "datetime", "arg": "1970-01-01" } }, { "__extn": { "fn": "duration", "arg": "11530782027359ms" } } ] }

In particular, this representation creates a datetime with a string representation that can't be correctly parsed via the datetime(...) Cedar syntax.

The question then, is how such an entity would be represented in the Cedar syntax.

My personal hope is that this leads us to conclude that we shouldn't allow the offset-based JSON representation either. :)

+1 to hoping we shouldn't allow it, but my suspicion is that we will not want to break backward compatibility.

How would we feel about not being able to express this in the Cedar syntax, while still allowing it in the JSON version? The two are no longer equivalent, but maybe that's Ok? What use-case might require inter-conversion? More likely I could imagine using the natural syntax, converting to JSON, and using that moving forward.

I suppose that requires serialization of entities to be a fallible operation? Are there any other known instances of entities that would fail to serialize due only to their value and not a failure of the serialization medium?

I guess the broader point is that, so long as Entity::new() continues to allows pass extension method calls to be passed as attribute values (as opposed to just extension function calls like datetime(...)), it'll be possible to construct entities that can't be serialized in this format.

Swolebrain · 2025-12-09T20:04:53Z

text/0104-entity-data-syntax.md

+
+a more compact syntax could be
+
+```


Can you walk me through a real world use case for this?

One use case I can imagine based on what I've seen some Cedar users do is to export the top n levels of an org hierarchy, since those rarely change. The top n levels of the org hierarchy is something that customers commonly cache in their application layer.

At build time, you can pull this information from your LDAP server, for instance, and then commit it to the repo in cedar entity format. This would make it more human-legible than having it in JSON (but you're not likely to be able to support comments here since they'd get wiped by the import process).

But if you did this, you probably would want to convert the entities back to json (or protobuf when it's available) to use in authz decisions because of performance.

Given that, why wouldn't you store the entities in regular JSON (as they are in the DB, not cedarJSON) or YAML instead of converting them to cedarJson or to cedar entity instance format? Wouldn't that achieve readability too?

Whats the use case you're imagining?

This can probably be added to the motivation section; the use case is not really for deployed systems, but rather the initial iterative development phase. When you're in, for example, the Cedar playground, iterating on your schema and example polices, you're also iterating on the example entities. I've seen the JSON syntax for entities be a significant drag on that process. A Cedar syntax really lowers the barrier to adoption there, and even for production systems it's a good option for display of entities to humans.

benkehoe · 2025-12-09T20:57:13Z

This is looking good! I really like the direction. Before finally approving, I'd love to see a revision of the proposal that:

Proposes the change (Option 1?) in the main body, and puts alternatives in an appendix, for posterity.

Has a Detailed Design section that fills in the details, e.g., around the particular context-free grammar we want to use, or at least more examples that unambiguously flesh out the details. RFC 24 is probably more than you need, but it's a good guide.

@mwhicks1 will do

patjakdev · 2026-01-08T23:47:49Z

I'm strongly in favor of this RFC and would love for it to be accepted and implemented.

One request which might come into play when choosing between the implementation options: I would actually love to be able to have a single, human-readable document (i.e. not JSON) that describes all my entities AND policies. It would also be great to get template instantiations in there, but I won't hold my breath on that one.

At first glance, I can't see anything in any of the options that would prevent such a thing but I wanted to put my two cents in in case it changes the desirability of any of the options.

mwhicks1 · 2026-01-09T14:38:31Z

One request which might come into play when choosing between the implementation options: I would actually love to be able to have a single, human-readable document (i.e. not JSON) that describes all my entities AND policies.

I agree. We should make sure that the proposed grammar for this RFC is compatible with the policy grammar, so that it would enable this option. I think it is, but acceptance of the RFC could be contingent on this requirement. We can see when it's implemented if some adjustments might be needed.

benkehoe · 2026-01-15T12:08:54Z

text/0104-entity-data-syntax.md

+Record := '{' [KeyValues] '}'
+KeyValues := Key ':' Value [',' | ',' KeyValues]
+Key := IDENT | STR
+Value := // TODO: scoped down from policy grammar, also need some bits from schema grammar


I could use some help here. The policy grammar contains the grammar for values, but with a lot of stuff that does not make sense for instance declarations (computed expressions, variables, etc.) and my grasp of it is not strong enough to be confident that I'm pulling in the right things. I'm also not clear on what reserved identifiers are needed for this syntax, because they differ between policy and schema grammars.

benkehoe · 2026-01-15T12:09:54Z

text/0104-entity-data-syntax.md

+Key := IDENT | STR
+Value := // TODO: scoped down from policy grammar, also need some bits from schema grammar
+
+Tags := Record


The values in tags need to be all the same type; can/should the grammar capture this?

Initial draft of entity literal syntax RFC

71a2895

benkehoe changed the title ~~Initial draft of entity literal syntax RFC~~ Entity literal syntax Aug 20, 2025

benkehoe mentioned this pull request Aug 20, 2025

Cedar syntax for entity collections #102

Closed

cdisselkoen reviewed Dec 5, 2025

View reviewed changes

rename 0000-entity-literal-syntax.md to 0104-entity-literal-syntax.md

d0c9aaa

mwhicks1 reviewed Dec 9, 2025

View reviewed changes

philhassey reviewed Dec 9, 2025

View reviewed changes

Swolebrain reviewed Dec 9, 2025

View reviewed changes

cdisselkoen changed the title ~~Entity literal syntax~~ Cedar syntax for entity data Jan 13, 2026

Update RF104 for selection of instance declaration, add grammar

b8d0b39

benkehoe commented Jan 15, 2026

View reviewed changes


		### Option 1: instance declarations

		The schema syntax already defines a syntax for entity types where the values of parents and attributes are types, and actions are already sort of entity literals; replacing these with values provides the basis for this option. We would select a new keyword, using `instance` as an example (`entity` would be nice, but it's taken). This should make it easy to reuse the parsing for schema syntax (and make syntax highlighting straightforward), and the parser could identify when they were incorrectly used in schemas and vice versa.


		Adding another custom format raises the bar for what customers need to know. They may ignore one format at first, but eventually they may need to learn both. For example, if one team uses the JSON format and another uses the custom format, but then the teams merge, they will each end up having to read/update the other's format. They may also fight about which format to move to.

		Mitigating this problem is that it's easy and predictable to convert between the two formats, since the syntactic concepts line up very closely. The features you lose when converting from the new format to the JSON one would be 1/ any comments, and 2/ any use of intermingling of `action`, `entity`, and `type` declarations, since they must all be in their own sections in the JSON. Otherwise the features line up very closely and the only difference is syntax. We would expect to write conversion tools as part of implementing the new syntax (which are easily achieved by parsing in the new format and pretty-printing in JSON).

Cedar syntax for entity data #104

Are you sure you want to change the base?

Cedar syntax for entity data #104

Conversation

benkehoe commented Aug 20, 2025 • edited by cdisselkoen Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cdisselkoen left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

benkehoe Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

benkehoe Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mwhicks1 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

patjakdev Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

benkehoe commented Dec 9, 2025

Uh oh!

patjakdev commented Jan 8, 2026

benkehoe commented Aug 20, 2025 •

edited by cdisselkoen

Loading

benkehoe Dec 9, 2025 •

edited

Loading

benkehoe Dec 9, 2025 •

edited

Loading

patjakdev Dec 16, 2025 •

edited

Loading