GraphQL Schema Modeling: Part 1

The power of GraphQL is that it allows a client to query all parts of a system simultaneously, flexibly and safely. The difficulty of GraphQL is that you now need to publish a schema that describes every public-facing part of your system while being consistent, coherent and understandable. This is often a positive challenge — the need to publish a consistent public API can guide backend development to better patterns — but it is a challenge nevertheless.

At Findmypast we’ve been early adopters of GraphQL and have recently become early adopters of schema stitching as a way to keep large schemata manageable. This is Part 1 in a series of posts about patterns and pitfalls we have seen when authoring schemata.

Pattern: Entities and Value Objects

An Entity is an object defined by its identity. The data associated with the identity might change but the entity is still the same “thing” and is treated by the system as the same. For example, a User would be an Entity because you can imagine properties of that User changing (email address, name, address) while it still refers to the same individual. Entities are modeled as GraphQL Objects with an id field:

type User {
    id: ID!
    name: String!
    email: String!
    address: Address
    activeSubscription: Subscription
    familyTree: FamilyTree
}

Many GraphQL client frameworks will leverage the id field for automatic cache updating and normalisation. Even if they don’t, it’s useful to ensure that clients can treat two entities with the same ID as the same object by having entities follow a few rules:

An ID must uniquely identify an entity for that specific entity type. Watch out for accidental ID clashes, or for two parts of the system using the same entity type to represent data from non-synced data sources.
Given an ID and entity type the client should be able to fetch any field from that entity. It’s possible to work with entities that don’t fulfil this requirement but you will quickly run into trouble designing resolvers if entities aren’t independently queryable.

Value objects, meanwhile, are objects which are identified not by an identity field but by their value. Two value objects are the same object if they contain the same data and different if they do not. The most important consequence of this is that it is impossible to mutate a value object: if you change any data, it is no longer the same object.

All scalars are by necessity value objects but GraphQL Objects may also be value objects. Take the Address object above, modeled as follows:

type Address {
    streetAddress: String!
    postcode: String!
    county: String!
    country: String!
}

This is a composite value object. The only operation you can perform with an Address is to create a new one. Do not be fooled by the fact it has child fields. Those fields might be other composite value objects or even entities! That doesn’t mean it needs to be converted into an entity.
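To make "the only operation is to create a new one" concrete, here is a minimal sketch of how changing a user's address could look in the schema. The AddressInput type, the setUserAddress mutation and its payload are hypothetical names invented for illustration; the User type is cut down to the fields the sketch needs.

```graphql
# Composite value object from above: no id field, identified by its data.
type Address {
    streetAddress: String!
    postcode: String!
    county: String!
    country: String!
}

# Cut-down User entity, just enough for this sketch.
type User {
    id: ID!
    address: Address
}

# Hypothetical input type mirroring Address field for field.
input AddressInput {
    streetAddress: String!
    postcode: String!
    county: String!
    country: String!
}

type Mutation {
    # Hypothetical mutation. It takes a complete new Address and swaps it in;
    # there is deliberately no "change the postcode of this Address" operation.
    setUserAddress(userId: ID!, address: AddressInput!): SetUserAddressPayload
}

# Hypothetical payload. The thing that changed is the User entity,
# so that is what the mutation hands back.
type SetUserAddressPayload {
    success: Boolean!
    message: String
    updatedUser: User
}
```

Note that the mutation is addressed to the User, the nearest entity, and replaces the value wholesale. That is the natural shape for a value object and it is why Address never needs an id of its own.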

Pitfall: Entities masquerading as Value Objects

A common pitfall is to express something as a value object in the schema when it is an entity in the backend model. Take FamilyTree above. This is a simplified model in which a user can create only one family tree on the service (it’s the kind of feature we provide at Findmypast). Let’s say that when we first created it there was no use case that required the tree to be separate from the user object, so we modeled FamilyTree as a value object and were done with it. After all, that saved us writing code to fetch a tree by ID that we didn’t need.

However, design smells quickly start appearing when designing mutations. Say we want users to be able to add family members to their tree:

type Mutation {
    addPersonToTree(userId: ID!): AddPersonToTreePayload
}

type AddPersonToTreePayload {
    success: Boolean!
    message: String
    updatedUser: User
}

Having to use the user ID to update a tree already feels janky, and then we realise that the mutation has to return a whole User to update the client. If we tolerate this and keep going down this path, we will eventually need to support multiple trees per account, or sharing trees with other users, and we will be stuck with a model that cannot express either and is by then very expensive to refactor. A little extra work making entities up front can save you a lot in the long run.
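For comparison, here is a rough sketch of the same mutation with FamilyTree modeled as an entity. The people field, the Person and PersonInput types and the updatedTree payload field are all hypothetical, added only so the sketch hangs together; they are not our production schema.

```graphql
# FamilyTree as an entity: it now carries its own identity.
type FamilyTree {
    id: ID!
    # Hypothetical field listing the members of the tree.
    people: [Person!]!
}

# Hypothetical entity for a family member.
type Person {
    id: ID!
    name: String!
}

# Hypothetical input describing the person to add.
input PersonInput {
    name: String!
}

type Mutation {
    # The mutation is addressed to the tree itself, not to its owner.
    addPersonToTree(treeId: ID!, person: PersonInput!): AddPersonToTreePayload
}

type AddPersonToTreePayload {
    success: Boolean!
    message: String
    # Only the entity that changed is returned, not a whole User.
    updatedTree: FamilyTree
}
```

Because nothing in this shape ties a tree to exactly one user, multiple trees per account or shared trees become additions to the schema rather than a rewrite of it.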

Pitfall: Non-idempotent queries

GraphQL queries should be idempotent. That is, if executed with the same variables and the same context they should return identical results. What does this have to do with entities and value objects? Well, remember that entities are identical if they share an ID while value objects are identical only if they contain the exact same data. Errors in modeling objects can easily introduce dangerous non-idempotency to your queries.

Imagine that in the family tree example above we still resist remodeling FamilyTree as an entity. To share a tree, we just give it a sharing ID and make a Query that fetches a tree to display to a guest user:

type Query {
    sharedFamilyTree(sharingId: ID!): FamilyTree
}

We have now introduced non-idempotency into the system. The tree owner could be making modifications to their tree, causing the same query with the same variables to return different results. In the client we would need to have a way to trigger a refetch to update the view. However, since the object does not have an ID it is not enough to have a reference to the FamilyTree to request an update. The UI component needs to know what sharingId was used originally to fetch it. This can lead to incredibly messy client code where pieces of additional information not logically part of any object have to be passed around everywhere. It can also make caching and schema stitching harder in subtle ways.
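If FamilyTree had an ID, the client side of this would get a lot simpler. The following is only a sketch: the familyTree(id:) query and the name field are hypothetical, and whether a guest is allowed to call familyTree directly is an access-control question it leaves out.

```graphql
type FamilyTree {
    id: ID!
    # Hypothetical display field.
    name: String!
}

type Query {
    # Entry point for a guest holding a sharing link.
    sharedFamilyTree(sharingId: ID!): FamilyTree
    # Hypothetical query-by-ID entry point.
    familyTree(id: ID!): FamilyTree
}
```

With that schema, the initial fetch and a later refresh could look like this:

```graphql
# Initial fetch: the page that handles the sharing link knows the sharingId.
query SharedTree($sharingId: ID!) {
    sharedFamilyTree(sharingId: $sharingId) {
        id
        name
    }
}

# Refresh: any component holding the tree can refetch it from its id alone.
query RefreshTree($id: ID!) {
    familyTree(id: $id) {
        id
        name
    }
}
```

The sharingId stays in the one place that cares about it, and everything further down the component tree works with the FamilyTree and its id.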

Pitfall: Entity cannot be fetched by ID

This represents the case where you do make an object an entity but decide against making it queryable by ID. This often happens when interacting with backend services that don’t expose their entities, making it very laborious to add that kind of query. Saving yourself the effort can sometimes be justified but I’ve more often seen such a short-sighted decision come back to bite the team that took it.

If an entity cannot be fetched independently:

Returning an updated object from mutations must now be done manually in every case, rather than relying on the query-by-ID endpoint.
There’s no easy way of resolving a reference to the entity elsewhere. If a Record can only be fetched as part of a Search then what do we do when a User contains a list of Record IDs as ViewedRecords? With the query-by-ID endpoint we could easily resolve those IDs into Record objects during a query.
If a UI component wants to provide the user an option to refresh the data, it cannot do so with just a reference to the entity; it also needs the full query the entity was originally fetched with (the same problem we saw with non-idempotent queries).
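As a sketch of what the query-by-ID endpoint buys you in the Record example: the record query, the title field and the viewedRecords field name below are assumptions made for illustration, and the User type is cut down to what the sketch needs.

```graphql
type Record {
    id: ID!
    # Hypothetical field standing in for the record's real data.
    title: String!
}

type User {
    id: ID!
    # The backend stores only a list of Record IDs for this user. The resolver
    # for this field can turn each ID into a Record using the same lookup that
    # backs Query.record below.
    viewedRecords: [Record!]!
}

type Query {
    # Hypothetical query-by-ID endpoint for the Record entity.
    record(id: ID!): Record
}
```

The same lookup also serves the other two cases in the list: a mutation can return the updated Record by fetching it by ID, and a UI component can refresh a Record knowing nothing but its id.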

Conclusion

Spend some time thinking about your design before you start writing a GraphQL API. Make things entities with unique IDs even if it’s a little extra work. And stay tuned for Part 2 where I’ll talk about pagination strategies.