-
Notifications
You must be signed in to change notification settings - Fork 15
Cedar syntax for entity data #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cdisselkoen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this proposal, and I definitely think we should accept some version of this proposal.
text/0000-entity-literal-syntax.md
Outdated
| @@ -0,0 +1,224 @@ | |||
| # Entity literal syntax | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I don't love the name "Entity literal syntax" for this; would "Cedar syntax for entities" or "Cedar syntax for entity data" make more sense? This would be analogous to the existing "Cedar syntax" and "JSON syntax" for policies, and "Cedar syntax" and "JSON syntax" for schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes with the question about the entity keyword. "Cedar syntax for entities" could mean the syntax for entity types that uses the entity keyword or the syntax for entity instances proposed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Cedar syntax for entity data", then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like "Cedar syntax for entity data". It captures the intent pretty clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the file name be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty of renaming this PR haha
text/0000-entity-literal-syntax.md
Outdated
| Developing and iterating on schemas and policies involves creating and modifying lists of entities in order to test policies, which is very cumbersome with the existing entity syntax in JSON: | ||
|
|
||
| * JSON has low information density. Cedar schema was able to provide significant increases in density and readability relative to the JSON syntax. | ||
| * JSON does not support comments. This means that any design intent for a schema cannot be expressed inline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: was this intended to say "design intent for entity data"? does a concept of design intent even make sense for entity data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(That said, I think that allowing comments in the entity-data format is a win regardless)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the motivation was largely copied from RFC #24 for custom schema and I missed that update.
text/0000-entity-literal-syntax.md
Outdated
|
|
||
| ### Option 1: instance declarations | ||
|
|
||
| The schema syntax already defines a syntax for entity types where the values of parents and attributes are types, and actions are already sort of entity literals; replacing these with values provides the basis for this option. We would select a new keyword, using `instance` as an example (`entity` would be nice, but it's taken). This should make it easy to reuse the parsing for schema syntax (and make syntax highlighting straightforward), and the parser could identify when they were incorrectly used in schemas and vice versa. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The choice to use instance to avoid conflicting with the schema keyword entity is interesting. I agree with the text here that at first glance, entity rather than instance would be nice. Could we still reuse major parsing components if the Cedar syntax for schemas and for entity data both use the keyword entity? For that matter, there are many other keywords that would be shared between the two formats, like namespace, tags, and in; why are we worried about a conflict on entity but not on those other keywords?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important that the syntax is unambiguous; if you provide entity types when Cedar is expecting entity instances or vice versa, it should be a syntactical error. namespace has the same meaning for both, so it's not a problem (entity instances can have the same namespaces as entity types). For tags and in, those occur within an entity type or instance declaration, so it's clear syntactically the difference in meaning. But if we use entity for both, how can the parser (and the human) tell the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My response to this would be that it's always clear from context whether we're looking at (or parsing) part of a schema or some entity data. There would be no APIs that would accept both schema syntax and entity-data syntax. In IDEs, schemas use the .cedarschema file extension, and presumably entity data files would use some different extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it should be unambiguously different without knowing the context (though I'm fine using the entity keyword if there's another good way of distinguishing the syntax). Inevitably someone will plug the wrong thing into the wrong API and if the parser can't tell the difference, either it will fail in confusing ways or worse it will accept it with no complaints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all assuming the "instance declaration" option is the best one, the others don't necessarily suffer from the same problem—what are your thoughts on the relative merits of the other options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this proposal. It addresses the concern about entity having multiple meanings, and emphasizes that this is an instance named entity_id, while still clearly marking this declaration as related to an entity.
Sometime in the future (probably not in this RFC) this would be compatible with an extension to combine the schema and entity-data syntax like
entity SomeType in OtherType {
...
} instance "entity_id" in [OtherType::"blah"] {
...
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relevant to combining schema (which can be defined only once) and entity data (there can be many instances), would we want a way to define multiple instances for a type without repeating the entity <type> instance prefix? e.g.
entity SomeType instances [
"id1" ["parent"] {},
"id2" {}
];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good idea too.
As a small nit, I think repeating the in would increase readability especially for new users, i.e.,
entity SomeType instances [
"id1" in ["parent"] {},
"id2" {}
];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I buy that. I was thinking about the optional = before the record for entity types, and dispensing with both because = doesn't make sense in the list. So I guess for instance we'd have in and optional =, and for instances we'd have in but disallow =.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great to me! Two suggestions. First, instead of
entity SomeType instance "entity_id" [OtherNamespace::OtherType::"parent_entity_id"] ...
I'd write
entity SomeType instance "entity_id" in [OtherNamespace::OtherType::"parent_entity_id"] ...
Since as we've pointed out enum will have [...] as a list of instance IDs, putting in in front of the list of parents will hopefully communicate intent a little better, when comparing the two.
Second: If you have no attributes, instead of doing entity Color instance "green" {}; could we write entity Color instance "green"; ? This would more terse and match what you are allowed to do with enum. And, looking ahead, it would create parity in format if we ever extend enum types to have attributes (which we've also talked about for actions).
|
|
||
| Adding another custom format raises the bar for what customers need to know. They may ignore one format at first, but eventually they may need to learn both. For example, if one team uses the JSON format and another uses the custom format, but then the teams merge, they will each end up having to read/update the other's format. They may also fight about which format to move to. | ||
|
|
||
| Mitigating this problem is that it's easy and predictable to convert between the two formats, since the syntactic concepts line up very closely. The features you lose when converting from the new format to the JSON one would be 1/ any comments, and 2/ any use of intermingling of `action`, `entity`, and `type` declarations, since they must all be in their own sections in the JSON. Otherwise the features line up very closely and the only difference is syntax. We would expect to write conversion tools as part of implementing the new syntax (which are easily achieved by parsing in the new format and pretty-printing in JSON). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that easy conversion between the formats is a sufficient mitigation for this. And that everything said in the Motivation section outweighs the drawback of increased cognitive burden described here.
mwhicks1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good! I really like the direction. Before finally approving, I'd love to see a revision of the proposal that:
- Proposes the change (Option 1?) in the main body, and puts alternatives in an appendix, for posterity.
- Has a Detailed Design section that fills in the details, e.g., around the particular context-free grammar we want to use, or at least more examples that unambiguously flesh out the details. RFC 24 is probably more than you need, but it's a good guide.
text/0000-entity-literal-syntax.md
Outdated
| ] | ||
| ``` | ||
|
|
||
| This would be more explicit that the content of the file is a set of entities and only a set of entities (like with JSON). It's a little less familiar/similar to schema syntax than Option 1, and might be harder to do useful syntax highlighting, but it is relatively close to how extension types like `datetime` are written. It opens up the possibility of composition within policies and schemas, but that is only relevant if that would ever be needed. This syntax doesn't allow for reusable records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to see examples of how datetime and all the other extensions would be represented. There's also some questions around how to represent datetimes that are before 0000-00-00. (Though I think we might want to not allow those to be represented here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Datetimes would be represented exactly as they are in records in policies, e.g. datetime("2024-10-15")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue @philhassey is referring to is that it's possible, with the JSON representation of entities, to construct them with extension methods, like this:
"__extn": {
"fn": "offset",
"args": [
{
"__extn": {
"fn": "datetime",
"arg": "1970-01-01"
}
},
{
"__extn": {
"fn": "duration",
"arg": "11530782027359ms"
}
}
]
}
In particular, this representation creates a datetime with a string representation that can't be correctly parsed via the datetime(...) Cedar syntax.
The question then, is how such an entity would be represented in the Cedar syntax.
My personal hope is that this leads us to conclude that we shouldn't allow the offset-based JSON representation either. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to hoping we shouldn't allow it, but my suspicion is that we will not want to break backward compatibility.
How would we feel about not being able to express this in the Cedar syntax, while still allowing it in the JSON version? The two are no longer equivalent, but maybe that's Ok? What use-case might require inter-conversion? More likely I could imagine using the natural syntax, converting to JSON, and using that moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that requires serialization of entities to be a fallible operation? Are there any other known instances of entities that would fail to serialize due only to their value and not a failure of the serialization medium?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the broader point is that, so long as Entity::new() continues to allows pass extension method calls to be passed as attribute values (as opposed to just extension function calls like datetime(...)), it'll be possible to construct entities that can't be serialized in this format.
|
|
||
| a more compact syntax could be | ||
|
|
||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you walk me through a real world use case for this?
One use case I can imagine based on what I've seen some Cedar users do is to export the top n levels of an org hierarchy, since those rarely change. The top n levels of the org hierarchy is something that customers commonly cache in their application layer.
At build time, you can pull this information from your LDAP server, for instance, and then commit it to the repo in cedar entity format. This would make it more human-legible than having it in JSON (but you're not likely to be able to support comments here since they'd get wiped by the import process).
But if you did this, you probably would want to convert the entities back to json (or protobuf when it's available) to use in authz decisions because of performance.
Given that, why wouldn't you store the entities in regular JSON (as they are in the DB, not cedarJSON) or YAML instead of converting them to cedarJson or to cedar entity instance format? Wouldn't that achieve readability too?
Whats the use case you're imagining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably be added to the motivation section; the use case is not really for deployed systems, but rather the initial iterative development phase. When you're in, for example, the Cedar playground, iterating on your schema and example polices, you're also iterating on the example entities. I've seen the JSON syntax for entities be a significant drag on that process. A Cedar syntax really lowers the barrier to adoption there, and even for production systems it's a good option for display of entities to humans.
@mwhicks1 will do |
|
I'm strongly in favor of this RFC and would love for it to be accepted and implemented. One request which might come into play when choosing between the implementation options: I would actually love to be able to have a single, human-readable document (i.e. not JSON) that describes all my entities AND policies. It would also be great to get template instantiations in there, but I won't hold my breath on that one. At first glance, I can't see anything in any of the options that would prevent such a thing but I wanted to put my two cents in in case it changes the desirability of any of the options. |
I agree. We should make sure that the proposed grammar for this RFC is compatible with the policy grammar, so that it would enable this option. I think it is, but acceptance of the RFC could be contingent on this requirement. We can see when it's implemented if some adjustments might be needed. |
| Record := '{' [KeyValues] '}' | ||
| KeyValues := Key ':' Value [',' | ',' KeyValues] | ||
| Key := IDENT | STR | ||
| Value := // TODO: scoped down from policy grammar, also need some bits from schema grammar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could use some help here. The policy grammar contains the grammar for values, but with a lot of stuff that does not make sense for instance declarations (computed expressions, variables, etc.) and my grasp of it is not strong enough to be confident that I'm pulling in the right things. I'm also not clear on what reserved identifiers are needed for this syntax, because they differ between policy and schema grammars.
| Key := IDENT | STR | ||
| Value := // TODO: scoped down from policy grammar, also need some bits from schema grammar | ||
|
|
||
| Tags := Record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values in tags need to be all the same type; can/should the grammar capture this?
Rendered