Structuring a complex schema¶
When writing computer programs of even moderate complexity, it’s commonly accepted that “structuring” the program into reusable functions is better than copying-and-pasting duplicate bits of code everywhere they are used. Likewise in JSON Schema, for anything but the most trivial schema, it’s really useful to structure the schema into parts that can be reused in a number of places. This chapter will present the tools available for reusing and structuring schemas as well as some practical examples that use those tools.
Schema Identification¶
Like any other code, schemas are easier to maintain if they can be broken down into logical units that reference each other as necessary. In order to reference a schema, we need a way to identify a schema. Schema documents are identified by non-relative URIs.
Schema documents are not required to have an identifier, but you will need one if you want to reference one schema from another. In this document, we will refer to schemas with no identifier as “anonymous schemas”.
In the following sections we will see how the “identifier” for a schema is determined.
Note
URI terminology can sometimes be unintuitive. In this document, the following definitions are used.
- URI [1] or non-relative URI: A full URI containing a scheme (https). It may contain a URI fragment (#foo). Sometimes this document will use “non-relative URI” to make it extra clear that relative URIs are not allowed.
- relative reference [2]: A partial URI that does not contain a scheme (https). It may contain a fragment (#foo).
- URI-reference [3]: A relative reference or non-relative URI. It may contain a URI fragment (#foo).
- absolute URI [4]: A full URI containing a scheme (https) but not a URI fragment (#foo).
Note
Even though schemas are identified by URIs, those identifiers are not necessarily network-addressable. They are just identifiers. Generally, implementations don’t make HTTP requests (https://) or read from the file system (file://) to fetch schemas. Instead, they provide a way to load schemas into an internal schema database. When a schema is referenced by its URI identifier, the schema is retrieved from the internal schema database.
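The idea of an internal schema database can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `registry`, `add_schema`, and `get_schema` are hypothetical and do not correspond to the API of any particular implementation.

```python
# Hypothetical in-memory schema "database": maps a URI identifier to a schema.
# Nothing in this sketch touches the network or the file system.
registry = {}

def add_schema(uri, schema):
    """Register a schema under the given non-relative URI."""
    registry[uri] = schema

def get_schema(uri):
    """Return a previously registered schema; fail if it was never loaded."""
    try:
        return registry[uri]
    except KeyError:
        raise LookupError(f"schema not loaded: {uri}") from None

add_schema("https://example.com/schemas/address", {"type": "object"})
address = get_schema("https://example.com/schemas/address")
```

The URI acts purely as a dictionary key here, which is why it does not need to be reachable over the network.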
JSON Pointer¶
In addition to identifying a schema document, you can also identify subschemas. The most common way to do that is to use a JSON Pointer in the URI fragment that points to the subschema.
A JSON Pointer describes a slash-separated path to traverse the keys in the objects in the document. Therefore, /properties/street_address means:
- find the value of the key properties
- within that object, find the value of the key street_address
The URI https://example.com/schemas/address#/properties/street_address identifies the street_address subschema in the following schema.
{
  "$id": "https://example.com/schemas/address",
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}
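As a rough illustration of the traversal described above, the following Python sketch splits the URI into its document part and its fragment, then walks the fragment as a JSON Pointer. The helper name `resolve_pointer` is an assumption made for this example; it handles the `~0` and `~1` escapes from RFC 6901 but is not a complete implementation.

```python
from urllib.parse import unquote, urldefrag

def resolve_pointer(document, pointer):
    """Follow a JSON Pointer through nested dicts and lists."""
    if pointer == "":
        return document  # the empty pointer identifies the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in pointer.split("/")[1:]:
        # Undo the JSON Pointer escapes: "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

address_schema = {
    "$id": "https://example.com/schemas/address",
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}

uri = "https://example.com/schemas/address#/properties/street_address"
document_uri, fragment = urldefrag(uri)
subschema = resolve_pointer(address_schema, unquote(fragment))
```

Here `document_uri` identifies the schema document and `subschema` is the value found at `/properties/street_address`.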
Named Anchors¶
A less common way to identify a subschema is to create a named anchor in the schema using the $id keyword and using that name in the URI fragment. When the $id keyword contains a URI fragment, the fragment defines a named anchor using the value of the fragment. Named anchors must start with a letter followed by any number of letters, digits, hyphens (-), underscores (_), colons (:), or periods (.).
In Draft 4, $id is just id (without the dollar sign).
Note
If a named anchor is defined that doesn’t follow these naming rules, then behavior is undefined. Your anchors might work in some implementations, but not others.
The URI https://example.com/schemas/address#street_address identifies the street_address subschema (the one that declares the anchor) in the following schema.
{
  "$id": "https://example.com/schemas/address",
  "type": "object",
  "properties": {
    "street_address": {
      "$id": "#street_address",
      "type": "string"
    },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}
Note
JSON Schema doesn’t define how $id should be interpreted when it contains both fragment and non-fragment URI parts. Therefore, when setting a named anchor, you should not use non-fragment URI parts in the URI-reference.
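To make the naming rules and the lookup concrete, here is a minimal Python sketch. The names `is_valid_anchor` and `find_anchor` are hypothetical, and the search is deliberately naive: it walks every nested value rather than only the locations where a schema is expected.

```python
import re

# Naming rule from the text: a letter, then letters, digits, "-", "_", ":" or ".".
ANCHOR_NAME = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")

def is_valid_anchor(name):
    """Check a candidate anchor name against the naming rule."""
    return ANCHOR_NAME.fullmatch(name) is not None

def find_anchor(schema, name):
    """Depth-first search for a subschema whose $id is exactly '#<name>'."""
    if isinstance(schema, dict):
        if schema.get("$id") == "#" + name:
            return schema
        children = list(schema.values())
    elif isinstance(schema, list):
        children = schema
    else:
        return None
    for child in children:
        found = find_anchor(child, name)
        if found is not None:
            return found
    return None

address_schema = {
    "$id": "https://example.com/schemas/address",
    "type": "object",
    "properties": {
        "street_address": {"$id": "#street_address", "type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
}

anchored = find_anchor(address_schema, "street_address")
```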
Base URI¶
Using non-relative URIs can be cumbersome, so any URIs used in JSON Schema can be URI-references that resolve against the schema’s base URI resulting in a non-relative URI. This section describes how a schema’s base URI is determined.
Note
Base URI determination and relative reference resolution are defined by RFC 3986. If you are familiar with how this works in HTML, this section should feel very familiar.
Retrieval URI¶
The URI used to fetch a schema is known as the “retrieval URI”. It’s often possible to pass an anonymous schema to an implementation, in which case that schema would have no retrieval URI.
Let’s assume a schema is referenced using the URI https://example.com/schemas/address and the following schema is retrieved.
{
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}
The base URI for this schema is the same as the retrieval URI, https://example.com/schemas/address.
$id¶
You can set the base URI using the $id keyword. The value of $id is a URI-reference that resolves against the Retrieval URI. The resulting URI is the base URI for the schema.
In Draft 4, $id is just id (without the dollar sign).
Note
This is analogous to the <base> tag in HTML.
Let’s assume the URIs https://example.com/schema/address and https://example.com/schema/billing-address both identify the following schema.
{
  "$id": "/schemas/address",
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}
No matter which of the two URIs is used to retrieve this schema, the base URI will be https://example.com/schemas/address, which is the result of the $id URI-reference resolving against the Retrieval URI.
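The resolution step can be checked with Python's standard `urllib.parse.urljoin`, which applies the RFC 3986 reference resolution rules. This is a small sketch for illustration only; the variable names are made up for this example.

```python
from urllib.parse import urljoin

schema_id = "/schemas/address"  # the relative $id from the schema above
retrieval_uris = [
    "https://example.com/schema/address",
    "https://example.com/schema/billing-address",
]

# Resolve the $id against each retrieval URI to obtain the base URI.
base_uris = [urljoin(retrieval_uri, schema_id) for retrieval_uri in retrieval_uris]
```

Because the `$id` is an absolute path, it replaces the whole path of the retrieval URI, so both entries in `base_uris` are `https://example.com/schemas/address`.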
However, using a relative reference when setting a base URI can be problematic. For example, we couldn’t use this schema as an anonymous schema because there would be no Retrieval URI and you can’t resolve a relative reference against nothing. For this and other reasons, it’s recommended that you always use an absolute URI when declaring a base URI with $id.
The base URI of the following schema will always be https://example.com/schemas/address no matter what the Retrieval URI was or if it’s used as an anonymous schema.
{
  "$id": "https://example.com/schemas/address",
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}
Note
Setting a base URI that contains a URI fragment results in undefined behavior and should be avoided, because implementations may treat such URIs differently.
$ref¶
A schema can reference another schema using the $ref keyword. The value of $ref is a URI-reference that is resolved against the schema’s Base URI. When evaluating a schema, an implementation uses the resolved identifier to retrieve the referenced schema and evaluation is continued from the retrieved schema.
$ref can be used anywhere a schema is expected. When an object contains a $ref property, the object is considered a reference, not a schema. Therefore, any other properties you put there will not be treated as JSON Schema keywords and will be ignored by the validator.
For this example, let’s say we want to define a customer record, where each customer may have both a shipping and a billing address. Addresses are always the same—they have a street address, city and state—so we don’t want to duplicate that part of the schema everywhere we want to store an address. Not only would that make the schema more verbose, but it makes updating it in the future more difficult. If our imaginary company were to start doing international business in the future and we wanted to add a country field to all the addresses, it would be better to do this in a single place rather than everywhere that addresses are used.
{
  "$id": "https://example.com/schemas/customer",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"]
}
The URI-references in $ref resolve against the schema’s Base URI (https://example.com/schemas/customer) which results in https://example.com/schemas/address. The implementation retrieves that schema and uses it to evaluate the “shipping_address” and “billing_address” properties.
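The two steps, resolving the reference and then retrieving the target, might look like the following in Python. This is a sketch under assumptions: `loaded_schemas` and `resolve_ref` are hypothetical names, and the store stands in for whatever mechanism a real implementation uses to hold preloaded schemas.

```python
from urllib.parse import urljoin

# Hypothetical store of preloaded schemas, keyed by non-relative URI.
loaded_schemas = {
    "https://example.com/schemas/address": {
        "type": "object",
        "required": ["street_address", "city", "state"],
    },
}

def resolve_ref(base_uri, ref, store):
    """Resolve a $ref against the base URI, then fetch the target from the store."""
    target_uri = urljoin(base_uri, ref)
    return target_uri, store[target_uri]

target_uri, target = resolve_ref(
    "https://example.com/schemas/customer",  # base URI of the customer schema
    "/schemas/address",                      # value of $ref
    loaded_schemas,
)
```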
Note
When using $ref in an anonymous schema, relative references may not be resolvable. Let’s assume this example is used as an anonymous schema.
{
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "https://example.com/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"]
}
The $ref at /properties/shipping_address can resolve just fine without a non-relative base URI to resolve against, but the $ref at /properties/billing_address can’t resolve to a non-relative URI and therefore can’t be used to retrieve the address schema.
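The difference between the two references can be demonstrated with a short Python sketch. The helper `to_non_relative` is a hypothetical name; it treats a missing base URI as an empty string and reports failure by returning `None` when the result has no scheme.

```python
from urllib.parse import urljoin, urlsplit

def to_non_relative(base_uri, ref):
    """Resolve ref against base_uri; return None if the result lacks a scheme."""
    resolved = urljoin(base_uri or "", ref)
    return resolved if urlsplit(resolved).scheme else None

# An anonymous schema has no base URI, modeled here as None.
shipping = to_non_relative(None, "https://example.com/schemas/address")
billing = to_non_relative(None, "/schemas/address")
```

`shipping` is already a non-relative URI and passes through unchanged, while `billing` comes back as `None` because there is nothing to resolve the relative reference against.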
definitions¶
Sometimes we have small subschemas that are only intended for use in the current schema and it doesn’t make sense to define them as separate schemas. Although we can identify any subschema using JSON Pointers or named anchors, the definitions keyword gives us a standardized place to keep subschemas intended for reuse in the current schema document.
Let’s extend the previous customer schema example to use a common schema for the name properties. It doesn’t make sense to define a new schema for this and it will only be used in this schema, so it’s a good candidate for using definitions.
{
  "$id": "https://example.com/schemas/customer",
  "type": "object",
  "properties": {
    "first_name": { "$ref": "#/definitions/name" },
    "last_name": { "$ref": "#/definitions/name" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "definitions": {
    "name": { "type": "string" }
  }
}
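A fragment-only reference such as `#/definitions/name` resolves to the current document plus a JSON Pointer. The Python sketch below shows one way that lookup could work on a trimmed copy of the customer schema. The function `lookup_local` is a hypothetical helper, and its pointer handling is simplified (no escape sequences, objects only).

```python
from urllib.parse import urldefrag, urljoin

customer = {
    "$id": "https://example.com/schemas/customer",
    "type": "object",
    "properties": {
        "first_name": {"$ref": "#/definitions/name"},
        "last_name": {"$ref": "#/definitions/name"},
    },
    "definitions": {"name": {"type": "string"}},
}

def lookup_local(root, ref):
    """Resolve a $ref that points inside the same schema document."""
    resolved = urljoin(root["$id"], ref)
    document_uri, fragment = urldefrag(resolved)
    if document_uri != root["$id"]:
        raise ValueError(f"{ref!r} points outside this document")
    node = root
    for key in fragment.split("/")[1:]:  # simplified JSON Pointer walk
        node = node[key]
    return node

name_schema = lookup_local(customer, customer["properties"]["first_name"]["$ref"])
```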
$ref isn’t just good for avoiding duplication. It can also be useful for writing schemas that are easier to read and maintain. Complex parts of the schema can be defined in definitions with descriptive names and referenced where they’re needed. This allows readers of the schema to more quickly and easily understand the schema at a high level before diving into the more complex parts.
Note
It’s possible to reference an external subschema, but generally you want to limit a $ref to referencing either an external schema or an internal subschema defined in definitions.
Recursion¶
The $ref keyword may be used to create recursive schemas that refer to themselves. For example, you might have a person schema that has an array of children, each of which is also a person instance.
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "children": {
      "type": "array",
      "items": { "$ref": "#" }
    }
  }
}
A snippet of the British royal family tree:
{
  "name": "Elizabeth",
  "children": [
    {
      "name": "Charles",
      "children": [
        {
          "name": "William",
          "children": [
            { "name": "George" },
            { "name": "Charlotte" }
          ]
        },
        {
          "name": "Harry"
        }
      ]
    }
  ]
}
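To see how a validator can follow the self-reference, here is a deliberately tiny Python sketch. It is not a JSON Schema implementation: the hypothetical function `is_valid` understands only the `type`, `properties`, and `items` keywords plus a `$ref` of `#`, which is just enough for the person schema.

```python
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
}

PYTHON_TYPES = {"object": dict, "array": list, "string": str}

def is_valid(instance, schema, root):
    """Validate instance against a tiny subset of JSON Schema keywords."""
    if schema.get("$ref") == "#":
        # The reference points back at the root schema: recurse from there.
        return is_valid(instance, root, root)
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, PYTHON_TYPES[expected]):
        return False
    if isinstance(instance, dict):
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not is_valid(instance[key], subschema, root):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(is_valid(item, schema["items"], root) for item in instance)
    return True

family = {
    "name": "Elizabeth",
    "children": [{"name": "Charles", "children": [{"name": "William"}]}],
}
family_ok = is_valid(family, person_schema, person_schema)
```

The recursion ends because each recursive call consumes one level of the instance, and the instance is finite.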
Above, we created a schema that refers to itself, effectively creating a “loop” in the validator, which is both allowed and useful. Note, however, that a $ref referring to another $ref could cause an infinite loop in the resolver, and is explicitly disallowed.
{
  "definitions": {
    "alice": { "$ref": "#/definitions/bob" },
    "bob": { "$ref": "#/definitions/alice" }
  }
}
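A resolver can guard against the circular case above by remembering which references it has already followed. The following Python sketch shows one possible approach. The function `resolve_chain` is a hypothetical name, and it handles only fragment-only references with simplified JSON Pointer parsing.

```python
def resolve_chain(root, ref):
    """Follow $ref to $ref links until a real schema is found; detect cycles."""
    seen = set()
    while True:
        if ref in seen:
            raise ValueError(f"circular $ref detected at {ref}")
        seen.add(ref)
        node = root
        # Simplified pointer walk: strip the leading "#" and split on "/".
        for key in ref.lstrip("#").split("/")[1:]:
            node = node[key]
        if not (isinstance(node, dict) and "$ref" in node):
            return node
        ref = node["$ref"]

circular = {
    "definitions": {
        "alice": {"$ref": "#/definitions/bob"},
        "bob": {"$ref": "#/definitions/alice"},
    }
}

try:
    resolve_chain(circular, "#/definitions/alice")
    cycle_found = False
except ValueError:
    cycle_found = True
```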
Bundling¶
Working with multiple schema documents is convenient for development, but it is often more convenient for distribution to bundle all of your schemas into a single schema document. This can be done using the $id keyword in a subschema. When $id is used in a subschema, it creates a new Base URI that any references in that subschema and any descendant subschemas will resolve against. The new Base URI is the value of $id resolved against the Base URI of the schema it appears in.
In Draft 4, $id is just id (without the dollar sign).
This example shows the customer schema example and the address schema example bundled into a single schema document.
{
  "$id": "https://example.com/schemas/customer",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "definitions": {
    "address": {
      "$id": "/schemas/address",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "$ref": "#/definitions/state" }
      },
      "required": ["street_address", "city", "state"],
      "definitions": {
        "state": { "enum": ["CA", "NY", "... etc ..."] }
      }
    }
  }
}
Notice that the $ref keywords from the customer schema resolve the same way they did before except that the address schema is now defined at /definitions/address instead of in a separate schema document. You should also see that "$ref": "#/definitions/state" resolves to the definitions keyword in the address schema rather than the one in the top-level schema, as it would if the subschema $id weren’t used.
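One way to picture how an implementation might process a bundled document is to walk it once and record the base URI established by each embedded `$id`. The Python sketch below does this with a hypothetical `index_ids` function. As a simplification it visits every nested value, and it skips fragment-only `$id` values because those define named anchors rather than new base URIs.

```python
from urllib.parse import urljoin

def index_ids(schema, base_uri, found=None):
    """Map every base URI declared by $id to the subschema that declares it."""
    found = {} if found is None else found
    if isinstance(schema, dict):
        schema_id = schema.get("$id")
        if isinstance(schema_id, str) and not schema_id.startswith("#"):
            # A subschema $id resolves against the enclosing base URI.
            base_uri = urljoin(base_uri, schema_id)
            found[base_uri] = schema
        for value in schema.values():
            index_ids(value, base_uri, found)
    elif isinstance(schema, list):
        for item in schema:
            index_ids(item, base_uri, found)
    return found

bundled = {
    "$id": "https://example.com/schemas/customer",
    "properties": {"shipping_address": {"$ref": "/schemas/address"}},
    "definitions": {
        "address": {
            "$id": "/schemas/address",
            "properties": {"state": {"$ref": "#/definitions/state"}},
            "definitions": {"state": {"enum": ["CA", "NY"]}},
        }
    },
}

embedded = index_ids(bundled, "")
address = embedded["https://example.com/schemas/address"]
# "#/definitions/state" resolves inside the address schema, not the top level.
state = address["definitions"]["state"]
```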
You might notice that this creates a situation where there are multiple ways to identify a schema. Instead of referencing /schemas/address (https://example.com/schemas/address), you could have used #/definitions/address (https://example.com/schemas/customer#/definitions/address). While both of these will work, the one shown in the example is preferred.
Note
It is unusual to use $id in a subschema when developing schemas. It’s generally best not to use this feature explicitly and use schema bundling tools to construct bundled schemas if such a thing is needed.