Understanding JSON Schema Lexical and Dynamic Scopes
Most of the keywords defined by the JSON Schema organization can be evaluated either on their own or by considering only the values of their adjacent keywords. For example, the type keyword is independent of any other keyword, while the additionalProperties keyword depends on the properties and patternProperties keywords defined in the same schema object.
If you want to learn more about keyword dependencies, check out the Static Analysis of JSON Schema article by Greg Dennis.
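To make the dependency between adjacent keywords concrete, here is a minimal Python sketch of how additionalProperties needs its sibling properties and patternProperties keywords to decide which instance properties it applies to. The function name, schema, and instance are hypothetical, and Python's regular expression dialect is only an approximation of the one JSON Schema recommends.

```python
import re


def additional_property_names(schema: dict, instance: dict) -> list[str]:
    """Return the instance properties that additionalProperties applies to."""
    # Names matched by the adjacent "properties" keyword are excluded
    declared = set(schema.get("properties", {}))
    # Names matched by the adjacent "patternProperties" keyword are excluded
    patterns = [re.compile(p) for p in schema.get("patternProperties", {})]
    return [
        name
        for name in instance
        if name not in declared and not any(p.search(name) for p in patterns)
    ]


# Hypothetical schema and instance, for illustration only
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "patternProperties": {"^x-": True},
    "additionalProperties": False,
}
instance = {"name": "Jane", "x-note": "hi", "age": 30}

print(additional_property_names(schema, instance))  # ['age']
```

By contrast, a keyword like type would need nothing but its own value and the instance.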
However, there is a small set of keywords whose evaluation depends on the scope they are in. These keywords are $ref, $dynamicRef, unevaluatedItems, and unevaluatedProperties. Additionally, there is a set of keywords that affect the scope they are declared in. These keywords are $id, $schema, $anchor, and $dynamicAnchor.
JSON Schema defines two types of scopes for the purpose of URI resolution: the lexical scope and the dynamic scope. Understanding how these scopes work is essential for mastering some of the most advanced (and often confusing!) features of JSON Schema, such as dynamic referencing.
Schema Resources
Before we jump into lexical and dynamic scopes, let's review some JSON Schema fundamentals.
The $id keyword defines the URI of a schema. While this keyword is typically set at the top level, any subschema may declare it to distinguish itself with a different URI. For example, the following schema defines 4 identifiers, some relative and some absolute:
In JSON Schema parlance, we say that the $id keyword introduces a new schema resource, and that the top level schema resource is referred to as the root schema resource.
Consider the following example. This schema consists of 3 schema resources, each highlighted using a different color: the root schema resource (red), the schema resource at /properties/foo (blue), and the schema resource at /properties/bar (green). Note that the subschema at /properties/baz is part of the root schema resource, as it does not introduce a new identifier:
Note that child schema resources are not considered part of the parent schema resource. For example, in the previous figure, https://example.com/foo and https://example.com/bar are separate schema resources and not part of the root schema resource, despite their structural relationship.
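As a rough illustration of how $id partitions a schema into schema resources, the following Python sketch walks a schema and lists each resource with its resolved URI and location. The schema is a hypothetical stand-in for the example described above (the root identifier https://example.com/root is an assumption), and the walker only descends into the properties applicator.

```python
from urllib.parse import urljoin


def schema_resources(schema, base="", pointer=""):
    """Yield (URI, JSON Pointer) for every schema resource in a schema."""
    if not isinstance(schema, dict):
        return
    if "$id" in schema:
        # A relative identifier resolves against the enclosing resource URI
        base = urljoin(base, schema["$id"])
        yield base, pointer
    # Simplification: only descend into the "properties" applicator
    for name, subschema in schema.get("properties", {}).items():
        yield from schema_resources(subschema, base, f"{pointer}/properties/{name}")


# Hypothetical schema: one absolute nested identifier and one relative one
schema = {
    "$id": "https://example.com/root",
    "properties": {
        "foo": {"$id": "https://example.com/foo"},
        "bar": {"$id": "bar"},
        "baz": {"type": "string"},
    },
}

for uri, location in schema_resources(schema):
    print(uri, repr(location))
```

The subschema at /properties/baz does not appear in the output because it declares no identifier and therefore belongs to the root schema resource.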
Schemas as Directed Graphs
JSON Schema is a recursive data structure. In the context of schema resources, this means that a schema resource may introduce nested schema resources (like we saw in the previous section) and use referencing keywords (like $ref) to point to external schema resources, creating a directed graph of schema resources.
Consider the following example. In the top left, a root schema resource named https://example.com/origin that declares a nested schema resource named https://example.com/nested (at /properties/bar) and references an external schema resource named https://example.com/destination (from /properties/foo/$ref). In the bottom left, a root schema resource named https://example.com/destination that references a nested schema resource called https://example.com/nested-string (from /items/$ref). On the right, a directed graph representation of the relationship between these schema resources:
As you will see, thinking of a schema as a directed graph of schema resources greatly helps in understanding both lexical and dynamic scopes.
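The graph view can be approximated in a few lines of Python. This sketch builds an adjacency map where each node is a schema resource URI and each edge is either a nested resource or a reference that crosses resources. The two schemas loosely follow the description above, but their exact contents are assumptions (in particular, placing the nested-string resource under $defs), and the walker naively treats every nested object as a potential subschema.

```python
from urllib.parse import urldefrag, urljoin


def resource_graph(schemas):
    """Map each schema resource URI to the set of resources it points to."""
    graph = {}

    def walk(node, base):
        if "$id" in node:
            parent, base = base, urljoin(base, node["$id"])
            graph.setdefault(base, set())
            if parent:
                graph[parent].add(base)  # edge to a nested schema resource
        if "$ref" in node:
            # Simplification: the fragment is ignored when finding the target
            target, _fragment = urldefrag(urljoin(base, node["$ref"]))
            if target != base:
                graph.setdefault(base, set()).add(target)  # reference edge
        # Simplification: treat every nested object as a potential subschema
        for value in node.values():
            if isinstance(value, dict):
                walk(value, base)

    for schema in schemas:
        walk(schema, "")
    return graph


# Hypothetical schemas modeled after the description above
origin = {
    "$id": "https://example.com/origin",
    "properties": {"foo": {"$ref": "destination"}, "bar": {"$id": "nested"}},
}
destination = {
    "$id": "https://example.com/destination",
    "items": {"$ref": "nested-string"},
    "$defs": {"string": {"$id": "nested-string", "type": "string"}},
}

for node, edges in resource_graph([origin, destination]).items():
    print(node, "->", sorted(edges))
```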
Lexical Scope
Under the graph analogy from the previous section, the lexical scope of a schema consists of the node being evaluated. In other words, the lexical scope of a schema consists of the entire schema resource to which it belongs.
Consider the following sequence of examples. On the left, a JSON Schema with a single nested schema resource. On the right, the corresponding directed graph representations for the root schema resource called https://example.com/person and the nested schema resource called https://example.com/surname. At each step of the evaluation process, we gray out the parts of the schema and of the directed graph that are not part of the lexical scope.
The evaluation process starts with the top level schema. The lexical scope at that point is the root schema resource, and the nested schema resource is out of scope.
Then, we enter the properties applicator, and if the instance defines a firstName property, we get into the subschema at /properties/firstName. This subschema is part of the root schema resource (as it does not declare its own identifier), so the lexical scope remains the same as the previous step.
Finally, if the instance defines a lastName property, we follow the properties applicator into the subschema at /properties/lastName. This subschema defines a new schema resource, so the lexical scope at this point is the nested schema resource, and the root schema resource is out of scope.
Note that by definition, the lexical scope of any subschema can be statically determined without taking instances into account, just as we did here.
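Because no instance is involved, the lexical scope of any schema location can be computed from the schema alone. The Python sketch below does so for a JSON Pointer by tracking the most recent $id seen along the path. The function name and the schema contents are hypothetical, and JSON Pointer escaping and array indexes are not handled.

```python
from urllib.parse import urljoin


def lexical_scope(schema: dict, pointer: str) -> str:
    """Return the URI of the schema resource that encloses a JSON Pointer."""
    scope = schema.get("$id", "")
    node = schema
    # Simplification: no JSON Pointer escaping (~0, ~1) and no array indexes
    for token in pointer.split("/")[1:]:
        node = node[token]
        if isinstance(node, dict) and "$id" in node:
            # A nested identifier starts a new schema resource
            scope = urljoin(scope, node["$id"])
    return scope


# Hypothetical schema modeled after the person and surname example
person = {
    "$id": "https://example.com/person",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"$id": "surname", "type": "string"},
    },
}

print(lexical_scope(person, ""))                       # https://example.com/person
print(lexical_scope(person, "/properties/firstName"))  # https://example.com/person
print(lexical_scope(person, "/properties/lastName"))   # https://example.com/surname
```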
Lexical Scope and Anchors
As another practical example, consider the $anchor keyword that defines a location-independent identifier for a schema. This keyword not only affects the schema object it is defined in, but also its lexical scope. This is why declaring the same anchor identifier more than once in the same schema resource is an error (a clash in the lexical scope), while it is possible to declare the same anchor identifier on different schema resources (as the lexical scopes are different):
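The original figure is not reproduced here, so the following Python sketch stands in for it under stated assumptions. It registers every anchor under the pair of its enclosing schema resource URI and its name, which makes a clash within one lexical scope detectable while allowing the same name in different scopes. The helper name and both schemas are hypothetical, and the walker naively treats every nested object as a potential subschema.

```python
from urllib.parse import urljoin


def collect_anchors(schema, base="", anchors=None):
    """Index anchors by (schema resource URI, anchor name)."""
    anchors = {} if anchors is None else anchors
    if "$id" in schema:
        base = urljoin(base, schema["$id"])
    if "$anchor" in schema:
        key = (base, schema["$anchor"])
        if key in anchors:
            # Same name twice in one schema resource: a lexical scope clash
            raise ValueError(f"Duplicate anchor {schema['$anchor']!r} in {base}")
        anchors[key] = schema
    # Simplification: treat every nested object as a potential subschema
    for value in schema.values():
        if isinstance(value, dict):
            collect_anchors(value, base, anchors)
    return anchors


# Hypothetical: the same anchor name in two different schema resources
valid = {
    "$id": "https://example.com/root",
    "properties": {
        "foo": {"$anchor": "item"},
        "bar": {"$id": "bar", "$anchor": "item"},
    },
}
# Hypothetical: the same anchor name twice in one schema resource
invalid = {
    "$id": "https://example.com/root",
    "properties": {"foo": {"$anchor": "item"}, "bar": {"$anchor": "item"}},
}

print(sorted(collect_anchors(valid)))
try:
    collect_anchors(invalid)
except ValueError as error:
    print(error)
```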
Following References
When the evaluation process encounters a reference keyword, it abandons the lexical scope of the origin schema and enters the lexical scope of the destination schema.
If the reference points to a subschema within the same schema resource, the lexical scope remains the same. Coming back to the graph analogy, each node represents a schema resource, so the evaluation process remains at the same node. However, if the reference points to a subschema on a different schema resource, the schema resource of the destination becomes the new lexical scope. In the graph analogy, the evaluation process follows an arrow to another node.
Within Schema Resources
In the following example, the reference at /items/$ref points to /$defs/person-name. The destination schema is part of the same schema resource (the root schema resource), so the lexical scope remains the same:
Across Schema Resources
Now consider the following sequence of examples. On the left, a JSON Schema called https://example.com/point-in-time with a nested schema resource (at /$defs/timestamp) and a reference to an external schema called https://example.com/epoch (from /anyOf/1/$ref). On the right, the corresponding directed graph representations of the root schema resource, the nested schema resource, and the external schema resource. Like before, at each step of the evaluation process, we gray out the parts of the schema and of the directed graph that are not part of the lexical scope.
The evaluation process starts with the top level schema. The lexical scope at that point is the root schema resource, and both the nested schema resource and the external schema resource are out of scope:
Then, we enter the first branch of the anyOf logic applicator and follow the reference at /anyOf/0/$ref (highlighted in red) into /$defs/timestamp. This subschema has its own identifier, so the lexical scope becomes the nested schema resource and both the root schema resource and the external schema resource go out of scope:
Finally, we go back to the root schema resource, enter the second branch of the anyOf logic applicator, and follow the remote reference at /anyOf/1/$ref (highlighted in red) into https://example.com/epoch. This external schema is by definition a separate schema resource. Therefore, it becomes the new lexical scope. This time, both the root schema resource and its nested schema resource are out of scope:
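The walkthrough above can be sketched in Python as a function that resolves a reference against the current lexical scope and reports the lexical scope of its destination. Everything here is an assumption for illustration: the registry of root schemas, the schema contents, and the use of a JSON Pointer fragment for the first reference. Only references to registered root schemas with optional JSON Pointer fragments are supported.

```python
from urllib.parse import urldefrag, urljoin


def follow_reference(registry: dict, scope: str, ref: str) -> str:
    """Return the lexical scope entered by following ref from scope."""
    uri, fragment = urldefrag(urljoin(scope, ref))
    node = registry[uri]  # Simplification: only root schemas are registered
    # Simplification: the fragment is assumed to be an unescaped JSON Pointer
    for token in fragment.split("/")[1:]:
        node = node[token]
        if isinstance(node, dict) and "$id" in node:
            # The destination sits inside a nested schema resource
            uri = urljoin(uri, node["$id"])
    return uri


# Hypothetical schemas modeled after the point-in-time example
point_in_time = {
    "$id": "https://example.com/point-in-time",
    "anyOf": [{"$ref": "#/$defs/timestamp"}, {"$ref": "epoch"}],
    "$defs": {"timestamp": {"$id": "timestamp", "type": "string"}},
}
epoch = {"$id": "https://example.com/epoch", "type": "integer"}
registry = {schema["$id"]: schema for schema in (point_in_time, epoch)}

root = "https://example.com/point-in-time"
print(follow_reference(registry, root, "#/$defs/timestamp"))
print(follow_reference(registry, root, "epoch"))
```

Note that the returned value replaces the previous lexical scope rather than extending it, which is the key difference from the dynamic scope.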
Dynamic Scope
To recap, the lexical scope of a schema consists of its enclosing schema resource. In comparison, the dynamic scope of a schema consists of the stack of schema resources evaluated so far. Coming back to our analogy of a schema as a graph, the dynamic scope corresponds to the ordered sequence of nodes that were visited by the evaluation process.
Consider the following sequence of examples. In the top left, a root schema resource named https://example.com/person that declares two nested schema resources: https://example.com/name (at /properties/name) and https://example.com/age (at /properties/age). In the bottom left, an example instance that successfully validates against the schema. Note that the instance does not declare the optional age property. On the right, a directed graph representation of the relationship between these schema resources. Similar to how we did before, we gray out the parts of the schema and of the directed graph that are not part of the dynamic scope.
The evaluation process starts with the top level schema. The dynamic scope at that point is the root schema resource, and the nested schema resources are out of scope. So far the lexical and dynamic scope align:
Because the instance defines a name property, we enter the properties applicator into the subschema at /properties/name. This subschema introduces a new schema resource. Therefore, the dynamic scope now consists of both the root schema resource and the nested schema resource called https://example.com/name, in order:
In contrast to the lexical scope, the dynamic scope of a schema cannot always be statically determined, as the evaluation path often depends on the instance. For example, for schemas that make use of logic applicator keywords such as if or oneOf, the ordered sequence of schema resources in scope may vary depending on the characteristics of the instance.
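To see the instance dependence in action, here is a small Python sketch that records the dynamic scope every time evaluation enters a subschema. It only understands $id and properties, and the function name and schema contents are hypothetical stand-ins for the person example above.

```python
from urllib.parse import urljoin


def trace_dynamic_scope(schema, instance, scope=(), trace=None):
    """Record the dynamic scope each time evaluation enters a subschema."""
    trace = [] if trace is None else trace
    if "$id" in schema:
        base = scope[-1] if scope else ""
        # Entering a new schema resource extends the dynamic scope
        scope = scope + (urljoin(base, schema["$id"]),)
    trace.append(scope)
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            # Only properties present in the instance are evaluated
            if name in instance:
                trace_dynamic_scope(subschema, instance[name], scope, trace)
    return trace


# Hypothetical schema modeled after the person, name, and age example
person = {
    "$id": "https://example.com/person",
    "properties": {
        "name": {"$id": "name", "type": "string"},
        "age": {"$id": "age", "type": "integer"},
    },
}

for snapshot in trace_dynamic_scope(person, {"name": "Jane"}):
    print(snapshot)
```

With an instance that also declares age, the trace gains a third snapshot that includes https://example.com/age, so the same schema yields different dynamic scopes for different instances.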
Following References
So far, we've learned that for the lexical scope, following a reference consists of abandoning the lexical scope of the origin schema and entering the lexical scope of the destination schema. In comparison, for the dynamic scope, following a reference to another schema resource involves retaining the current dynamic scope and pushing the destination schema resource to the top of the stack.
Within Schema Resources
Just like with the lexical scope, if a reference points to a subschema within the same schema resource, the dynamic scope remains the same. In other words, if the destination schema resource is the same as the schema resource at the top of the stack, the dynamic scope is not modified. Therefore, until the evaluation process encounters a reference to another schema resource (either local or remote), the lexical scope and the dynamic scope align:
Across Schema Resources
Leaving the simple case behind, let's consider an example consisting of local and remote references across schema resources. In the top left, an example instance and a root schema resource named https://example.com that declares two nested schema resources: https://example.com/name (at /properties/name) and https://example.com/person (at /$defs/person) where the former references the latter (from /properties/name/$ref). Furthermore, https://example.com/person references an anchored schema called item (from /$defs/person/$ref) that is part of an external schema resource called https://example.com/people shown in the bottom left. On the right, a directed graph representation of the relationship between these schema resources and the dynamic scope.
Like the other examples so far, the evaluation process starts with the top level schema. The dynamic scope at that point is the root schema resource, and all other schema resources are out of scope:
Because the instance defines a name property, we enter the properties applicator into the subschema at /properties/name. This subschema introduces a new schema resource. Therefore, the dynamic scope now consists of https://example.com (the root schema resource) followed by https://example.com/name (the nested schema resource at /properties/name):
The https://example.com/name schema resource references the other nested schema resource: https://example.com/person. After following this reference, the dynamic scope now consists of https://example.com (the root schema resource), followed by https://example.com/name (the nested schema resource at /properties/name), followed by https://example.com/person (the nested schema resource at /$defs/person):
Now comes an interesting case. We are currently evaluating the nested schema resource called https://example.com/person. This schema resource points to the remote schema called https://example.com/people (the people part of the people#item URI reference), but does not land at its root. Instead, it lands at the subschema in /items (where the item anchor from the people#item URI reference is located). This subschema is part of the root schema resource of https://example.com/people, so the dynamic scope now consists of https://example.com (the root schema resource), followed by https://example.com/name (the nested schema resource at /properties/name), followed by https://example.com/person (the nested schema resource at /$defs/person), followed by https://example.com/people:
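The sequence of pushes in this walkthrough can be replayed with a short Python sketch. The enter function is hypothetical: it resolves a URI reference against the schema resource at the top of the stack and pushes the destination resource unless it is already at the top. It assumes the destination resource is the resolved URI without its fragment, which holds for the people#item anchor reference here but not for a JSON Pointer that lands inside a nested resource.

```python
from urllib.parse import urldefrag, urljoin


def enter(dynamic_scope: list[str], uri_reference: str) -> list[str]:
    """Return the dynamic scope after entering the referenced schema resource."""
    base = dynamic_scope[-1] if dynamic_scope else ""
    # Assumption: the destination resource is the URI minus its fragment
    resource, _fragment = urldefrag(urljoin(base, uri_reference))
    if dynamic_scope and resource == dynamic_scope[-1]:
        return dynamic_scope  # same schema resource: nothing is pushed
    return dynamic_scope + [resource]


scope = enter([], "https://example.com")  # root schema resource
scope = enter(scope, "name")              # nested resource at /properties/name
scope = enter(scope, "person")            # reference to the resource at /$defs/person
scope = enter(scope, "people#item")       # reference to an anchor in a remote resource
print(scope)
```

Following a reference such as #/items from the last state would leave the stack untouched, since the destination is the resource already at the top.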
The Dynamic Scope as a Stack
At the beginning of this section, we said that the dynamic scope of a schema consists of the stack of schema resources evaluated so far. However, our examples so far only considered pushing schema resources to the top of the stack.
In traditional programming languages, program execution typically involves procedures calling other procedures, creating what is referred to in Computer Science as a call stack. Eventually, a procedure will not call any other procedures. When such leaf procedures finish executing, the call stack will unwind (a pop operation) and control will return to the caller frame.
If you are having trouble understanding the previous paragraph, you might enjoy watching Call Stacks - CS50 Shorts by Harvard University.
The JSON Schema dynamic scope works in the same way. At some point, a schema resource will not reference any other schema resource. Then, the dynamic scope will unwind, popping the last schema resource from the stack.
Consider the following sequence of examples. In the top left, a root schema resource named https://example.com/integer that makes use of the if, then, and else logic applicators to check whether a positive integer is even or odd and produce a corresponding title annotation. Note that each subschema is a separate schema resource: https://example.com/check (at /if), https://example.com/even (at /then), and https://example.com/odd (at /else). In the bottom left, the even integer instance 42. On the right, a directed graph representation of the relationship between these schema resources and the dynamic scope.
As usual, the evaluation process starts with the top level schema. The dynamic scope at that point is the root schema resource, and all other schema resources are out of scope:
Next, we enter the if applicator that checks whether the integer instance is even or odd. This subschema declares a new schema resource called https://example.com/check, which is pushed onto the stack. Therefore the dynamic scope consists of https://example.com/integer followed by https://example.com/check:
The https://example.com/check nested schema resource does not reference any other schema resource. When the evaluation process completes and determines that the instance is an even integer, the stack unwinds, the https://example.com/check schema resource is popped, and the evaluation process returns to the root schema resource. Therefore the dynamic scope is back to just https://example.com/integer:
Because the if subschema successfully validated the instance, we enter the then applicator. This subschema declares a new schema resource called https://example.com/even, which is pushed onto the stack. Therefore the dynamic scope consists of https://example.com/integer followed by https://example.com/even:
Like before, the https://example.com/even nested schema resource does not reference any other schema resource. Therefore, the evaluation process returns once more to the root schema resource, the dynamic scope is back to just https://example.com/integer, and the evaluation process completes:
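The push and pop behavior described in this walkthrough can be reproduced with a toy evaluator in Python. It is a sketch under stated assumptions, not how any particular implementation works: it only understands $id, multipleOf, if, then, and else, and the schema is a hypothetical reconstruction of the even and odd example that leaves out any other keywords.

```python
from urllib.parse import urljoin


def evaluate(schema, instance, stack, trace):
    """Evaluate a tiny keyword subset while recording the dynamic scope."""
    pushed = "$id" in schema
    if pushed:
        # Entering a schema resource pushes it onto the stack
        stack.append(urljoin(stack[-1] if stack else "", schema["$id"]))
        trace.append(list(stack))
    valid = True
    if "multipleOf" in schema:
        valid = instance % schema["multipleOf"] == 0
    if "if" in schema:
        branch = "then" if evaluate(schema["if"], instance, stack, trace) else "else"
        if branch in schema:
            valid = evaluate(schema[branch], instance, stack, trace) and valid
    if pushed:
        # Leaving a schema resource pops it, returning to the caller
        stack.pop()
        if stack:
            trace.append(list(stack))
    return valid


# Hypothetical reconstruction of the even and odd example
integer_schema = {
    "$id": "https://example.com/integer",
    "if": {"$id": "check", "multipleOf": 2},
    "then": {"$id": "even", "title": "Even"},
    "else": {"$id": "odd", "title": "Odd"},
}

trace = []
evaluate(integer_schema, 42, [], trace)
for snapshot in trace:
    print(snapshot)
```

For the instance 42, the recorded snapshots follow the same five steps as the walkthrough: the root alone, root plus check, the root alone, root plus even, and the root alone again.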
Summary
Understanding how lexical and dynamic scopes work is essential for gaining a deeper understanding of JSON Schema. The most important points to remember are summarized in the following table:
Comparison Point | Lexical Scope | Dynamic Scope |
---|---|---|
Definition | Consists of the schema resource being evaluated | Consists of the stack of schema resources evaluated so far |
Determining the scope | Can be statically determined without taking instances into account | Cannot always be statically determined. It may vary depending on the instance |
Following references | Consists of abandoning the lexical scope of the origin schema and entering the lexical scope of the destination schema | Consists of pushing the destination schema resource to the top of the dynamic scope stack |
In a future post, we will build on top of the concepts introduced in this article to demystify how dynamic referencing ($dynamicRef and $dynamicAnchor) works.
If you enjoyed this content and want to put your JSON Schema skills into practice in the data industry, check out my O'Reilly book: Unifying Business, Data, and Code: Designing Data Products using JSON Schema. You can also connect with me on LinkedIn.