Phase Overview
JSON Schema is used for describing and validating JSON data. Phase provides a simple, intuitive schema language that is easy to write and easy to process. Type information declared in Phase can be transformed into formal JSON Schema, which is then used for validating JSON data.
Phase provides annotations that deal with core JSON Schema transformations, but developers can register custom annotations to augment runtime validation and formatting needs.
This document covers using Phase for generating JSON Schema. Custom annotations will be covered separately.
JSON is an effective format for representing and exchanging structured data. One of its virtues is that not only is it easy for machines to generate and parse, it is relatively easy for humans to read and write data as well.
JSON Schema is intended to be a clear, human- and machine-readable format for describing what constitutes valid JSON data. In other words, JSON defines the rules for formatting structured data; JSON Schema defines the constraints on what is valid data and how that data can be structured.
JSON Schema is itself specified in JSON format. This means a system needs only one type of parser: when processing data, a JSON validator uses the same parser for both the data and the schema (the constraints to enforce when validating the data).
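As a minimal illustration of the shared parser, assuming a JavaScript runtime, the built-in `JSON.parse` can read both a schema and a data document. The variable names and the tiny schema here are hypothetical, and the hand-written check stands in for a real validator.

```javascript
// Both inputs are plain JSON text, so a single parser handles them.
const schemaText = '{"type": "object", "required": ["name"]}';
const dataText = '{"name": "Fido", "age": 1}';

const schema = JSON.parse(schemaText);
const data = JSON.parse(dataText);

// A real validator would walk `schema` to check `data`. Here we only
// check the single "required" constraint by hand.
const missing = schema.required.filter((key) => !(key in data));
console.log(missing.length === 0 ? "valid" : `missing: ${missing.join(", ")}`);
```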
This is efficient from a runtime processing perspective. However, expressing constraints in terms of structured data isn't ideal from a human perspective. Potentially easier formats for expressing rules can be transformed into valid JSON Schema.
For example, consider the following constraint written as JSON data:
"id": {
"type": "integer",
"format": "int64"
}
The following hypothetical declaration conveys the constraints in a more programmer-friendly format:
id (integer, format=int64)
This is the goal of Phase: to provide the ability to express constraints in a programmer-friendly format.
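To show what such a transformation could involve, here is a small sketch that converts the hypothetical declaration above into the equivalent JSON Schema fragment. The function name `declarationToSchema` and the parsing rules are assumptions made for illustration; this is neither Phase syntax nor Phase's implementation.

```javascript
// Hypothetical helper: converts "name (type, key=value, ...)" into a
// JSON Schema property fragment. Not Phase syntax or Phase code.
function declarationToSchema(declaration) {
  const match = /^\s*(\w+)\s*\(([^)]*)\)\s*$/.exec(declaration);
  if (!match) {
    throw new SyntaxError(`Cannot parse declaration: ${declaration}`);
  }
  const [, name, body] = match;
  // The first item is the type; the remaining items are key=value options.
  const [type, ...options] = body.split(",").map((part) => part.trim());
  const constraint = { type };
  for (const option of options) {
    const [key, value] = option.split("=").map((part) => part.trim());
    constraint[key] = value;
  }
  return { [name]: constraint };
}

console.log(JSON.stringify(declarationToSchema("id (integer, format=int64)")));
// {"id":{"type":"integer","format":"int64"}}
```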
For a brief overview of JSON, see [here](JSON Overview). For a brief overview of JSON Schema, see [here](JSON Schema Overview).
[JSON](JSON Overview) is a lightweight, text-based format for representing data. For example, consider the following information:
Fido is a one-year-old dog that obeys the following commands: come, sit, heel, fetch, and play dead.
This can be expressed as a JSON document:
{
"name": "Fido",
"category": "dog",
"age": 1,
"commands": [ "come", "sit", "heel", "fetch", "play dead" ]
}
The interesting thing about this example is that the JSON representation of the data is probably easier to mentally parse than the English language paragraph above it (at least for most experienced programmers).
JSON is a reasonably human-readable format for data. That doesn't mean it's the best way to convey semantic meaning.
For example, what does the following abstract syntax tree represent?
{
  "statement": {
    "operation": {
      "operator": "*",
      "operands": {
        "left": {
          "operation": {
            "operator": "+",
            "operands": {
              "left": {
                "variable": {
                  "name": "a"
                }
              },
              "right": {
                "variable": {
                  "name": "b"
                }
              }
            }
          }
        },
        "right": {
          "literal": {
            "type": "integer",
            "value": 2
          }
        }
      }
    }
  }
}
The preceding document has been formatted with indentation to make parsing easier for humans. A machine is perfectly fine parsing the same thing without any unnecessary whitespace:
{"statement":{"operation":{"operator":"*","operands":{"left":{"operation":{"operator":"+","operands":{"left":{"variable":{"name":"a"}},"right":{"variable":{"name":"b"}}}}},"right":{"literal":{"type":"integer","value":2}}}}}}
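The claim that whitespace between tokens does not matter to a parser can be checked directly. The following sketch assumes a JavaScript runtime and uses a shortened, hypothetical version of the Fido document.

```javascript
const pretty = `{
  "name": "Fido",
  "commands": [ "come", "sit" ]
}`;
const compact = '{"name":"Fido","commands":["come","sit"]}';

// JSON.stringify without an indent argument emits no optional whitespace,
// so parsing the indented text and re-serializing yields the compact text.
console.log(JSON.stringify(JSON.parse(pretty)) === compact); // true

// Passing an indent width goes the other way, for human readers.
console.log(JSON.stringify(JSON.parse(compact), null, 2));
```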
If we know that humans will be editing this type of abstract syntax tree by hand, we could probably come up with a simpler format, like the following:
{
"*": [
{ "+": ["a", "b"] },
2
]
}
Of course, for a machine, this can also be expressed more compactly:
{"*":[{"+":["a","b"]},2]}
Manually editing JSON can be tedious, requiring fastidious attention to balancing object braces and array brackets. When editing JSON by hand, we tend to use newlines and indentation not only to make it easier to parse visually, but also to help keep braces and brackets balanced. As a consequence, JSON structures tend to grow quickly in the vertical dimension, which presents its own challenges to mental processing. Even with careful alignment, it can become difficult to keep track of node relationships in modestly sized data structures.
Consider that all four of the previous JSON structures represent the following expression, which most humans would certainly find easier to both write and comprehend.
(a + b) * 2
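To confirm that the verbose and compact trees carry the same meaning as the expression, here is a sketch of two small evaluators. The function names are hypothetical, and the compact evaluator assumes that strings name variables and numbers are literals.

```javascript
const OPS = {
  "+": (x, y) => x + y,
  "*": (x, y) => x * y,
};

// Evaluates the verbose tree: nodes are "operation", "variable" or "literal".
function evalVerbose(node, env) {
  if (node.statement) return evalVerbose(node.statement, env);
  if (node.variable) return env[node.variable.name];
  if (node.literal) return node.literal.value;
  const { operator, operands } = node.operation;
  return OPS[operator](
    evalVerbose(operands.left, env),
    evalVerbose(operands.right, env)
  );
}

// Evaluates the compact tree. Assumption: strings name variables and
// numbers are literals.
function evalCompact(node, env) {
  if (typeof node === "number") return node;
  if (typeof node === "string") return env[node];
  const [operator, [left, right]] = Object.entries(node)[0];
  return OPS[operator](evalCompact(left, env), evalCompact(right, env));
}

const verbose = JSON.parse('{"statement":{"operation":{"operator":"*","operands":{"left":{"operation":{"operator":"+","operands":{"left":{"variable":{"name":"a"}},"right":{"variable":{"name":"b"}}}}},"right":{"literal":{"type":"integer","value":2}}}}}}');
const compact = JSON.parse('{"*":[{"+":["a","b"]},2]}');
const env = { a: 3, b: 4 };

console.log(evalVerbose(verbose, env)); // 14
console.log(evalCompact(compact, env)); // 14
```

Both calls compute (3 + 4) * 2, which is the point of the comparison: the three notations differ only in how much structure the writer has to spell out.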
One of the primary reasons that humans use high level programming languages is to make it easier to express and comprehend the semantic meaning of various statements and expressions, reducing both cognitive overhead and editing burden by providing an abstraction over the way a machine expects instructions and data to be represented.
For example, JavaScript has an object literal syntax that inspired JSON. Using the first example we presented for Fido the dog, the data is represented in JavaScript like this:
{
"name": "Fido",
"category": "dog",
"age": 1,
"commands": [ "come", "sit", "heel", "fetch", "play dead" ]
}
In other words, it's identical. For editing convenience, JavaScript allows us to omit the quotes for the property names as long as they are valid identifiers according to JavaScript. Also, single quotes can be substituted for double, according to programmer preference.
{
name: 'Fido',
category: 'dog',
age: 1,
commands: [ 'come', 'sit', 'heel', 'fetch', 'play dead' ]
}
Obviously, JavaScript has much more to offer than object literal syntax, which is why (a + b) * 2 is a valid expression in the language. Otherwise we would need to evaluate code as data using object literal syntax like {"*":[{"+":[a,b]},2]}.
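The relationship between the relaxed object literal and strict JSON can be checked with the standard `JSON` functions. This is a small sketch, assuming a JavaScript runtime; the variable names are hypothetical.

```javascript
// Valid JavaScript, but not valid JSON text: unquoted keys, single quotes.
const pet = {
  name: 'Fido',
  category: 'dog',
  age: 1,
  commands: [ 'come', 'sit', 'heel', 'fetch', 'play dead' ]
};

// Serializing produces strict JSON with double-quoted keys and strings.
const json = JSON.stringify(pet);
console.log(json);

// The relaxed literal syntax is rejected by a JSON parser.
let relaxedAccepted = true;
try {
  JSON.parse("{ name: 'Fido' }");
} catch (error) {
  relaxedAccepted = false; // JSON.parse throws a SyntaxError
}
console.log(relaxedAccepted); // false
```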
We choose the level of abstraction that is convenient for humans, not computers. This is one of the goals of Phase.
Given the following example JSON representation of a specific pet:
{
"id": 101,
"name": "fido",
"category": "dog",
"photoUrls": [ "https://images.petsbest.com/marketing/blog/Dogs-Bite.jpg" ],
"tags": [{ "id": 1, "name": "goofy" }],
"status": "available"
}
The following is a snippet of the complete petstore JSON Schema for specifying valid Pet data:
{
"definitions": {
"Tag": {
"type": "object",
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"name": {
"type": "string"
}
}
},
"Pet": {
"type": "object",
"required": [
"name",
"photoUrls"
],
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"category": {},
"name": {
"type": "string"
},
"photoUrls": {
"type": "array",
"items": {
"type": "string"
}
},
"tags": {
"type": "array",
"items": {
"$ref": "#/definitions/Tag"
}
},
"status": {
"type": "string",
"description": "pet status in the store",
"enum": [
"available",
"pending",
"sold"
]
}
}
}
}
}
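To make the role of this schema concrete, the following is a minimal sketch of a validator that understands only the keywords used in the snippet (`type`, `required`, `properties`, `items`, `enum`, and local `$ref`). The `validate` function is a hypothetical helper written for this example, not the API of any real JSON Schema library, and it ignores `format` and `description`.

```javascript
// Subset of the petstore schema shown above, as a JavaScript object.
const schema = {
  definitions: {
    Tag: {
      type: "object",
      properties: {
        id: { type: "integer", format: "int64" },
        name: { type: "string" },
      },
    },
    Pet: {
      type: "object",
      required: ["name", "photoUrls"],
      properties: {
        id: { type: "integer", format: "int64" },
        category: {},
        name: { type: "string" },
        photoUrls: { type: "array", items: { type: "string" } },
        tags: { type: "array", items: { $ref: "#/definitions/Tag" } },
        status: { type: "string", enum: ["available", "pending", "sold"] },
      },
    },
  },
};

const TYPE_CHECKS = {
  object: (v) => typeof v === "object" && v !== null && !Array.isArray(v),
  array: (v) => Array.isArray(v),
  string: (v) => typeof v === "string",
  integer: (v) => Number.isInteger(v),
};

// Hypothetical helper. Returns a list of error messages; an empty list
// means the value satisfies the schema. Unknown keywords are ignored.
function validate(value, node, root, path = "$") {
  if (node.$ref) {
    // Only local references of the form "#/definitions/Name" are handled.
    const name = node.$ref.replace("#/definitions/", "");
    return validate(value, root.definitions[name], root, path);
  }
  const check = TYPE_CHECKS[node.type];
  if (check && !check(value)) return [`${path}: expected ${node.type}`];

  const errors = [];
  if (node.enum && !node.enum.includes(value)) {
    errors.push(`${path}: must be one of ${node.enum.join(", ")}`);
  }
  if (TYPE_CHECKS.object(value)) {
    for (const key of node.required || []) {
      if (!(key in value)) errors.push(`${path}.${key}: required`);
    }
    for (const [key, child] of Object.entries(node.properties || {})) {
      if (key in value) {
        errors.push(...validate(value[key], child, root, `${path}.${key}`));
      }
    }
  }
  if (Array.isArray(value) && node.items) {
    value.forEach((item, index) => {
      errors.push(...validate(item, node.items, root, `${path}[${index}]`));
    });
  }
  return errors;
}

const pet = {
  id: 101,
  name: "fido",
  category: "dog",
  photoUrls: ["https://images.petsbest.com/marketing/blog/Dogs-Bite.jpg"],
  tags: [{ id: 1, name: "goofy" }],
  status: "available",
};

console.log(validate(pet, schema.definitions.Pet, schema)); // []
console.log(validate({ id: "101", photoUrls: [] }, schema.definitions.Pet, schema));
```

The second call reports a missing `name` and a non-integer `id`, which shows how the `required` and `type` constraints in the schema translate into concrete checks.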
The following is the same thing expressed as a Phase schema:
tag {
id integer @format('int64')
name string
}
pet {
id integer @format('int64')
name string @required
category {}
photoUrls [string] @required
tags [$tag]
status string @description('pet status in the store')
@enum('available', 'pending', 'sold')
}