From 71a28953d8411ce15ec5095e2a2124ebdff8d5dd Mon Sep 17 00:00:00 2001 From: Ben Kehoe Date: Wed, 20 Aug 2025 16:52:22 +0200 Subject: [PATCH 1/3] Initial draft of entity literal syntax RFC --- text/0000-entity-literal-syntax.md | 224 +++++++++++++++++++++++++++++ 1 file changed, 224 insertions(+) create mode 100644 text/0000-entity-literal-syntax.md diff --git a/text/0000-entity-literal-syntax.md b/text/0000-entity-literal-syntax.md new file mode 100644 index 00000000..44b053ca --- /dev/null +++ b/text/0000-entity-literal-syntax.md @@ -0,0 +1,224 @@ +# Entity literal syntax + +## Related issues and PRs + +- Reference Issues: [#102](https://github.com/cedar-policy/rfcs/issues/102), [RFC24](https://github.com/cedar-policy/rfcs/blob/main/text/0024-schema-syntax.md) +- Implementation PR(s): (leave this empty) + +## Timeline + +- Started: 2025-08-20 +- Accepted: TBD +- Stabilized: TBD + +## Summary + +This document proposes a custom syntax for entity literals. This follows RFC24, which did the same for schemas. The summary, motivation, drawbacks and alternatives are largely copied from RFC24 as this RFC is very similar in these respects. + +The syntax was developed with the following goals in mind: + +* Use the same concepts in both the JSON-based and custom syntax, allowing them to be naturally inter-converted +* Reuse Cedar policy/schema syntax as much as possible, to help with intuition +* When no Cedar policy/schema syntax analogue exists, leverage ideas from other programming languages that favor readability +* Use as few syntactic concepts as possible; similar ideas in the schema should look similar syntactically +* Support converting back and forth between the JSON-based schema without losing information + +This is _not_ a proposal to _replace_ the JSON-based syntax; the aim is to provide an additional syntax that better satisfies the goals above. 
The JSON syntax is appropriate in many circumstances, such as when constructing entities programmatically (the usual case for a runtime system). Adopting this approach would allow for development-time ease of use, especially during exploration and iteration on schemas. + +https://github.com/cedar-policy/cedar-examples/blob/85df878b68b0986d97433be14728858ae2a545ab/tinytodo/entities.json + +## Basic example + +The default list of entities that's populated in the Cedar playground for the photo sharing app is 57 lines of JSON: + +```json +[ + { + "uid": { + "type": "PhotoApp::User", + "id": "alice" + }, + "attrs": { + "userId": "897345789237492878", + "personInformation": { + "age": 25, + "name": "alice" + } + }, + "parents": [ + { + "type": "PhotoApp::UserGroup", + "id": "alice_friends" + }, + { + "type": "PhotoApp::UserGroup", + "id": "AVTeam" + } + ] + }, + { + "uid": { + "type": "PhotoApp::Photo", + "id": "vacationPhoto.jpg" + }, + "attrs": { + "private": false, + "account": { + "__entity": { + "type": "PhotoApp::Account", + "id": "ahmad" + } + } + }, + "parents": [] + }, + { + "uid": { + "type": "PhotoApp::UserGroup", + "id": "alice_friends" + }, + "attrs": {}, + "parents": [] + }, + { + "uid": { + "type": "PhotoApp::UserGroup", + "id": "AVTeam" + }, + "attrs": {}, + "parents": [] + } +] +``` + +a more compact syntax could be + +``` +namespace PhotoApp { + instance User::"alice" in [UserGroup::"alice_friends", UserGroup::"AVTeam"] = { + userId: "897345789237492878", + personInformation: { + age: 25, + name: "alice" + } + }; + + instance Photo::"vacationPhoto.jpg" = { + private: false, + account: Account::"ahmad" + }; + + instance UserGroup::"alice_friends" = {}; + instance UserGroup::"AVTeam" = {}; +} +``` + +## Motivation + +Developing and iterating on schemas and policies involves creating and modifying lists of entities in order to test policies, which is very cumbersome with the existing entity syntax in JSON: + +* JSON has low information density. 
Cedar schema was able to provide significant increases in density and readability relative to the JSON syntax. +* JSON does not support comments. This means that any design intent for a schema cannot be expressed inline. + +We believe that a custom syntax for entities can help. It can be more information-dense, support comments, and leverage familiar syntactic constructs. The success of Cedar syntax for schemas points to the value of this change could have. + +## Detailed design + +To be precise, we are defining a syntax that would allow for `Entities::from_cedar_str()` to parallel `Entities::from_json_str()` and so on. Some design goals are: + +* Easy to write +* Consistency with policy and especially schema syntax +* Easy for parsers to differentiate from policy and schema +* Optionally, allow for reusable record (and maybe primitive) literals +* Optionally, if use cases exist for it, reusability of the syntax for inline literals in schemas and policies + +Two potential options for syntax are listed below. + +### Option 1: instance declarations + +The schema syntax already defines a syntax for entity types where the values of parents and attributes are types, and actions are already sort of entity literals; replacing these with values provides the basis for this option. We would select a new keyword, using `instance` as an example (`entity` would be nice, but it's taken). This should make it easy to reuse the parsing for schema syntax (and make syntax highlighting straightforward), and the parser could identify when they were incorrectly used in schemas and vice versa. + +``` +namespace Namespace { + instance SomeType::"entity_id" in [OtherNamespace::OtherType::"parent_entity_id"] = { + attribute: "value" + } tags { + tagName: "tagValue" + }; +} + +// subsequent declarations +``` + +The declaration style would make it possible to define and reference reusable records (which would need another keyword). 
The primary drawback is that the thing that is being defined, which is a set of entities, is only implicitly defined through the file itself. This makes sense for schema, but here it's maybe a bit weirder. On the other hand, while there is not currently a need for other information in the file, but this option would allow for that. This option also doesn't allow composition, if that's desirable. + +### Option 2: instance expressions + +The main alternative would be an inline syntax. As a separate syntax, it could look like this: + +``` +[ + Namespace::SomeType::"entity_id"([Namespace::OtherType::"parent_entity_id"], { + attribute: "value" + }, { + tagName: "value" + }), + //subsequent entities +] +``` + +This would be more explicit that the content of the file is a set of entities and only a set of entities (like with JSON). It's a little less familiar/similar to schema syntax than Option 1, and might be harder to do useful syntax highlighting, but it is relatively close to how extension types like `datetime` are written. It opens up the possibility of composition within policies and schemas, but that is only relevant if that would ever be needed. This syntax doesn't allow for reusable records. + +### Option 3: instance set declaration + +A third alternative that would allow for inline literals _and_ reusable records would be to define a keyword for the set, say `entities`: + +``` +entities [ + Namespace::SomeType::"entity_id"([Namespace::OtherType::"parent_entity_id"], { + attribute: "value", + recordAttr: myRecord + }, { + tagName: "value" + }), + //subsequent entities +] + +record myRecord = { + foo: "bar" +}; +``` + +This could allow for namespace blocks, but a complication is whether there can be multiple `entities` declarations in a file or only one. + +## Drawbacks + +There are several reasons not to develop a custom syntax: + +### Multiple formats can raise cognitive burden + +Adding another custom format raises the bar for what customers need to know. 
They may ignore one format at first, but eventually they may need to learn both. For example, if one team uses the JSON format and another uses the custom format, but then the teams merge, they will each end up having to read/update the other's format. They may also fight about which format to move to. + +Mitigating this problem is that it's easy and predictable to convert between the two formats, since the syntactic concepts line up very closely. The features you lose when converting from the new format to the JSON one would be 1/ any comments, and 2/ any use of intermingling of `action`, `entity`, and `type` declarations, since they must all be in their own sections in the JSON. Otherwise the features line up very closely and the only difference is syntax. We would expect to write conversion tools as part of implementing the new syntax (which are easily achieved by parsing in the new format and pretty-printing in JSON). + +### Requirement of good tooling + +As a practical matter, having a custom entity literal syntax will require that we develop high-quality tools to help authors write in the syntax. + +Parse error messages need to be of good quality, and report common issues such as missing curly braces, missing semi-colons, incorrect keywords, etc. A formatting tool can help ensure standardized presentation. An IDE plugin can put these things together to provide interactive feedback. For the JSON syntax, IDE plugins already exist to flag parse errors and to format documents uniformly. + +TODO Note that the problem of matching up curly braces in JSON-formatted schemas is more acute than in the custom syntax, since it's common to have deep nesting levels where matching is hard to eyeball. For example, in the `TinyTodo` JSON example, we have up to seven levels of direct nesting of curly braces, whereas the custom syntax has only three, and these levels are more visually isolated because of the other syntax in between them. 
+ +### Greater implementation cost + +Supporting a new syntax is an extra implementation cost, including the new tools mentioned above, now and going forward. More code/tools means more potential for bugs. + +## Alternatives + +One alternative would be to _replace_ the current JSON-based sytnax with the one in this proposal. This proposal would avoid the "cognitive burden" drawback mentioned above, but would be a disruptive, backward-incompatible change, and would lose the JSON format's benefits of existing tooling and easier programmatic schema construction. + +Another alternative would be to adopt a [Yaml](https://en.wikipedia.org/wiki/YAML)-based syntax. This approach would meet our goals of greater information density and support for comments, and it would come with some existing tooling (such as IDE extensions). A downside of Yaml is that it provides _more_ than we need, with a lack of conciseness leading to confusing. We could make our own parser for a subset of Yaml we wish to support for schemas, but that may lead to a confusing user experience. Yaml's indentation-sensitive parsing also means that an indentation mistake will be silently accepted, leading to a confusing user experience. Our custom syntax is whitespace-insensitive, and having total control over the grammar means better context for error messages. + +## Unresolved questions + +The syntax design needs feedback and iteration. 
From d0c9aaae9291074d47682587210032f597208015 Mon Sep 17 00:00:00 2001 From: Ben Kehoe Date: Mon, 8 Dec 2025 17:30:41 -0500 Subject: [PATCH 2/3] rename 0000-entity-literal-syntax.md to 0104-entity-literal-syntax.md --- ...000-entity-literal-syntax.md => 0104-entity-literal-syntax.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{0000-entity-literal-syntax.md => 0104-entity-literal-syntax.md} (100%) diff --git a/text/0000-entity-literal-syntax.md b/text/0104-entity-literal-syntax.md similarity index 100% rename from text/0000-entity-literal-syntax.md rename to text/0104-entity-literal-syntax.md From b8d0b3919ae74b71d1f70775d49165c604a4bf13 Mon Sep 17 00:00:00 2001 From: Ben Kehoe Date: Thu, 15 Jan 2026 13:05:30 +0100 Subject: [PATCH 3/3] Update RF104 for selection of instance declaration, add grammar --- ...l-syntax.md => 0104-entity-data-syntax.md} | 207 +++++++++++++----- 1 file changed, 148 insertions(+), 59 deletions(-) rename text/{0104-entity-literal-syntax.md => 0104-entity-data-syntax.md} (56%) diff --git a/text/0104-entity-literal-syntax.md b/text/0104-entity-data-syntax.md similarity index 56% rename from text/0104-entity-literal-syntax.md rename to text/0104-entity-data-syntax.md index 44b053ca..b93a8452 100644 --- a/text/0104-entity-literal-syntax.md +++ b/text/0104-entity-data-syntax.md @@ -1,4 +1,4 @@ -# Entity literal syntax +# Cedar syntax for entity data ## Related issues and PRs @@ -25,8 +25,6 @@ The syntax was developed with the following goals in mind: This is _not_ a proposal to _replace_ the JSON-based syntax; the aim is to provide an additional syntax that better satisfies the goals above. The JSON syntax is appropriate in many circumstances, such as when constructing entities programmatically (the usual case for a runtime system). Adopting this approach would allow for development-time ease of use, especially during exploration and iteration on schemas. 
-https://github.com/cedar-policy/cedar-examples/blob/85df878b68b0986d97433be14728858ae2a545ab/tinytodo/entities.json - ## Basic example The default list of entities that's populated in the Cedar playground for the photo sharing app is 57 lines of JSON: @@ -91,25 +89,27 @@ The default list of entities that's populated in the Cedar playground for the ph ] ``` -a more compact syntax could be +With this proposal, it could be represented compactly as follows: ``` namespace PhotoApp { - instance User::"alice" in [UserGroup::"alice_friends", UserGroup::"AVTeam"] = { + entity User instance "alice" in [UserGroup::"alice_friends", UserGroup::"AVTeam"] { userId: "897345789237492878", personInformation: { age: 25, name: "alice" } }; - - instance Photo::"vacationPhoto.jpg" = { + + entity Photo instance "vacationPhoto.jpg" { private: false, account: Account::"ahmad" }; - - instance UserGroup::"alice_friends" = {}; - instance UserGroup::"AVTeam" = {}; + + entity UserGroup instances [ + "alice_friends", + "AVTeam" + ]; } ``` @@ -118,7 +118,13 @@ namespace PhotoApp { Developing and iterating on schemas and policies involves creating and modifying lists of entities in order to test policies, which is very cumbersome with the existing entity syntax in JSON: * JSON has low information density. Cedar schema was able to provide significant increases in density and readability relative to the JSON syntax. -* JSON does not support comments. This means that any design intent for a schema cannot be expressed inline. +* JSON does not support comments, which can be useful when defining a collection of entities and their relationships. + +This stands as an obstacle to Cedar adoption. New users want to experiment with Cedar before they use it, and a critical step before a successful test of an authorization request is defining the entities for the request. Doing this in JSON is cumbersome and imposing. Smoothing this process will create better first impressions of Cedar. 
+ +The JSON syntax for entities also creates friction during the development of a Cedar schema. To test a schema, entities that conform to the schema must be defined. JSON is inconvenient enough for doing this once by hand, but when iterating on the schema, the entities often need to be updated, and the JSON syntax again makes this cumbersome. + +The JSON syntax is _not_ an impediment for the final implementation of a system using Cedar, but the provision of a more human-readable format can still be of use when debugging such systems. We believe that a custom syntax for entities can help. It can be more information-dense, support comments, and leverage familiar syntactic constructs. The success of Cedar syntax for schemas points to the value of this change could have. @@ -129,67 +135,122 @@ To be precise, we are defining a syntax that would allow for `Entities::from_ced * Easy to write * Consistency with policy and especially schema syntax * Easy for parsers to differentiate from policy and schema -* Optionally, allow for reusable record (and maybe primitive) literals -* Optionally, if use cases exist for it, reusability of the syntax for inline literals in schemas and policies -Two potential options for syntax are listed below. +The schema syntax already defines a syntax for entity types in which the values of parents and attributes are types, and actions are already close to entity literals; replacing these types with values provides the basis for this design. -### Option 1: instance declarations +### Comments -The schema syntax already defines a syntax for entity types where the values of parents and attributes are types, and actions are already sort of entity literals; replacing these with values provides the basis for this option. We would select a new keyword, using `instance` as an example (`entity` would be nice, but it's taken). 
This should make it easy to reuse the parsing for schema syntax (and make syntax highlighting straightforward), and the parser could identify when they were incorrectly used in schemas and vice versa. +Instance declarations can use line-ending comments in the same style as Cedar schemas and policies, e.g., `// this is a comment`. + +### Namespace format + +Namespace blocks use the same syntax as in schemas. Nested namespaces should be allowed if and when they are allowed in schemas in the future. ``` -namespace Namespace { - instance SomeType::"entity_id" in [OtherNamespace::OtherType::"parent_entity_id"] = { - attribute: "value" - } tags { - tagName: "tagValue" - }; +namespace My::Namespace { + ... } +``` + +### Instance declarations + +Instance declarations use the `instance` keyword (or `instances`, see below) after an entity type written as `entity {entityType}`. This leaves room for a potential future where instances could be declared after a full entity type declaration. + +Instance declarations look largely like entity type declarations, but use literals rather than types as the values. -// subsequent declarations +``` +entity SomeType1 instance "entity_id_1" = { + attribute: "value" +}; ``` -The declaration style would make it possible to define and reference reusable records (which would need another keyword). The primary drawback is that the thing that is being defined, which is a set of entities, is only implicitly defined through the file itself. This makes sense for schema, but here it's maybe a bit weirder. On the other hand, while there is not currently a need for other information in the file, but this option would allow for that. This option also doesn't allow composition, if that's desirable. +As in entity type declarations, the `=` is optional, and instances without attributes can be declared without a record. 
-### Option 2: instance expressions +``` +entity SomeType1 instance "entity_id_2" { + attribute: "value" +}; + +entity SomeType1 instance "entity_id_3"; +``` -The main alternative would be an inline syntax. As a separate syntax, it could look like this: +Parents are defined as in entity type declarations, but use entity identifiers rather than types. Tags likewise follow the `tags` keyword, but are written as a record of tag values rather than a type. ``` -[ - Namespace::SomeType::"entity_id"([Namespace::OtherType::"parent_entity_id"], { - attribute: "value" - }, { - tagName: "value" - }), - //subsequent entities -] +entity SomeType1 instance "entity_id_4" in [OtherType::"parent_entity_id"] { + attribute: "value" +} tags { + tagName: "tagValue" +}; ``` -This would be more explicit that the content of the file is a set of entities and only a set of entities (like with JSON). It's a little less familiar/similar to schema syntax than Option 1, and might be harder to do useful syntax highlighting, but it is relatively close to how extension types like `datetime` are written. It opens up the possibility of composition within policies and schemas, but that is only relevant if that would ever be needed. This syntax doesn't allow for reusable records. +### Multiple instances in one declaration -### Option 3: instance set declaration +For convenience, multiple instances can be defined without repeating the entity type by using the `instances` keyword followed by a list of instances. These instances follow the same syntax, except that `=` is not permitted. 
-A third alternative that would allow for inline literals _and_ reusable records would be to define a keyword for the set, say `entities`: +``` +entity SomeType1 instances [ + "entity_id_5" { + attribute: "value" + }, + "entity_id_6" in [OtherType::"parent_entity_id"] { + attribute: "value" + } tags { + tagName: "tagValue" + }, + "entity_id_7" {}, + "entity_id_8" +]; +``` ``` -entities [ - Namespace::SomeType::"entity_id"([Namespace::OtherType::"parent_entity_id"], { - attribute: "value", - recordAttr: myRecord - }, { - tagName: "value" - }), - //subsequent entities -] +// syntax error +entity SomeType2 instances [ + "entity_id_9" = { + attribute: "value" + } +]; +``` + +### Grammar -record myRecord = { - foo: "bar" -}; ``` +Entities := {NamespaceOrEntityDeclaration} +NamespaceOrEntityDeclaration := Namespace | EntityDeclaration +Namespace := 'namespace' Path '{' {EntityDeclaration} '}' +EntityDeclaration := EntityInstanceDeclaration | EntityInstancesDeclaration -This could allow for namespace blocks, but a complication is whether there can be multiple `entities` declarations in a file or only one. 
+EntityInstanceDeclaration := 'entity' Path 'instance' EntityInstance ';' +EntityInstance := Name ['in' EntityRefOrRefs] [['='] Record] ['tags' Tags] + +EntityInstancesDeclaration := 'entity' Path 'instances' '[' EntityInstanceNoEqualsList ']' ';' +EntityInstanceNoEqualsList := EntityInstanceNoEquals {',' EntityInstanceNoEquals} +EntityInstanceNoEquals := Name ['in' EntityRefOrRefs] [Record] ['tags' Tags] + +Path := IDENT {'::' IDENT} +Name := STR +EntityRefOrRefs := EntityRef | '[' [EntityRef {',' EntityRef}] ']' +EntityRef := Path '::' STR + +Record := '{' [KeyValues] '}' +KeyValues := Key ':' Value [',' | ',' KeyValues] +Key := IDENT | STR +Value := // TODO: scoped down from policy grammar, also need some bits from schema grammar + +Tags := Record + +IDENT := ['_''a'-'z''A'-'Z']['_''a'-'z''A'-'Z''0'-'9']* +STR := Fully-escaped Unicode surrounded by '"'s +``` + +## Potential future additions + +Not addressed in this proposal but open for future improvement: + +* Deduplication of values, e.g., the ability to define a record once and use it in multiple instances, similar to common types in schemas +* Annotations +* Intermixing of policies, instances, and/or schema in a single file + * Note the syntax leaves open the possibility for instances to be declared inline with a full entity type declaration ## Drawbacks @@ -203,22 +264,50 @@ Mitigating this problem is that it's easy and predictable to convert between the ### Requirement of good tooling -As a practical matter, having a custom entity literal syntax will require that we develop high-quality tools to help authors write in the syntax. - -Parse error messages need to be of good quality, and report common issues such as missing curly braces, missing semi-colons, incorrect keywords, etc. A formatting tool can help ensure standardized presentation. An IDE plugin can put these things together to provide interactive feedback. For the JSON syntax, IDE plugins already exist to flag parse errors and to format documents uniformly. 
+As a practical matter, having a custom entity literal syntax will require that we develop high-quality tools to help authors write in the syntax. -TODO Note that the problem of matching up curly braces in JSON-formatted schemas is more acute than in the custom syntax, since it's common to have deep nesting levels where matching is hard to eyeball. For example, in the `TinyTodo` JSON example, we have up to seven levels of direct nesting of curly braces, whereas the custom syntax has only three, and these levels are more visually isolated because of the other syntax in between them. +Parse error messages need to be of good quality, and report common issues such as missing curly braces, missing semi-colons, incorrect keywords, etc. A formatting tool can help ensure standardized presentation. An IDE plugin can put these things together to provide interactive feedback. For the JSON syntax, IDE plugins already exist to flag parse errors and to format documents uniformly. ### Greater implementation cost -Supporting a new syntax is an extra implementation cost, including the new tools mentioned above, now and going forward. More code/tools means more potential for bugs. +Supporting a new syntax is an extra implementation cost, including the new tools mentioned above, now and going forward. More code/tools means more potential for bugs. -## Alternatives +## Appendix A: Alternatives -One alternative would be to _replace_ the current JSON-based sytnax with the one in this proposal. This proposal would avoid the "cognitive burden" drawback mentioned above, but would be a disruptive, backward-incompatible change, and would lose the JSON format's benefits of existing tooling and easier programmatic schema construction. +### Replacing the JSON syntax + +One alternative would be to _replace_ the current JSON-based syntax with the one in this proposal. 
This alternative would avoid the "cognitive burden" drawback mentioned above, but would be a disruptive, backward-incompatible change, and would lose the JSON format's benefits of existing tooling and easier programmatic entity instance construction. Another alternative would be to adopt a [Yaml](https://en.wikipedia.org/wiki/YAML)-based syntax. This approach would meet our goals of greater information density and support for comments, and it would come with some existing tooling (such as IDE extensions). A downside of Yaml is that it provides _more_ than we need, with a lack of conciseness leading to confusing. We could make our own parser for a subset of Yaml we wish to support for schemas, but that may lead to a confusing user experience. Yaml's indentation-sensitive parsing also means that an indentation mistake will be silently accepted, leading to a confusing user experience. Our custom syntax is whitespace-insensitive, and having total control over the grammar means better context for error messages. -## Unresolved questions +### Instance expressions + +Instances could be defined using expressions. This approach is a little less similar to schema syntax than the selected design, and useful syntax highlighting might be harder, but it is relatively close to how extension types like `datetime` are written. It opens up the possibility of composition within policies and schemas, though that matters only if such composition is ever needed. On its own, this syntax doesn't allow for reusable records. + +The syntax could be a file containing a simple list of instances. This would make it more explicit that the content of the file is a set of entities and only a set of entities (as with JSON). + +``` +[ + Namespace::SomeType::"entity_id"([Namespace::OtherType::"parent_entity_id"], { + attribute: "value" + }, { + tagName: "value" + }), + //subsequent entities +] +``` + +The syntax could alternatively use a keyword to make it a declaration, which would allow for reusable records. 
+ +``` +entities [ + Namespace::SomeType::"entity_id"([Namespace::OtherType::"parent_entity_id"], { + attribute: "value", + }, { + tagName: "value" + }), + //subsequent entities +] +``` -The syntax design needs feedback and iteration. +This could allow for namespace blocks, but a complication is whether there could be multiple `entities` declarations in a file or only one.