Factoshi is pleased to announce the beta release of Graff, a GraphQL wrapper for the factomd RPC API. Graff is a GraphQL API layer that can be placed in front of a factomd instance to allow developers to make complex, graph-based queries to factomd. In layman’s terms, it is a new way to interact with factomd that can reduce latency and bandwidth, whilst providing additional features that make it easier for developers to create an interactive user experience.

What is GraphQL?

In computer science, a graph is a type of abstract data structure that represents a network of smaller data structures known as nodes. Factomd's data structures can naturally be described in terms of a graph. For instance, a factoid block is one type of data structure and a transaction is another. The implicit graph is the relationship between these two data structures: a factoid block has transactions, and a transaction belongs to a factoid block.

These relationships are everywhere in factomd: a directory block has an entry credit block; an entry credit block has commits; commits have an entry; an entry has an entry block; an entry block was referenced by a directory block; and so on. A graph is a description of these data structures and their relationships to each other.

Factomd graph represented by the Graff schema. Almost every node in the graph is a factomd data structure.
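To make the idea concrete, here is a minimal sketch of the relationships described above as a plain adjacency list, with a breadth-first search that finds a chain of relationships between two node types. The type names and the `findPath` helper are my own illustration and are not taken from the Graff schema, which may name and link things differently.

```typescript
// Node types named after the factomd structures mentioned in the text.
type NodeType =
  | "DirectoryBlock"
  | "EntryCreditBlock"
  | "Commit"
  | "Entry"
  | "EntryBlock"
  | "FactoidBlock"
  | "Transaction";

// Directed edges: each node type points to the node types it is related to.
// Illustrative only; this is an assumption, not the real Graff schema.
const edges: Record<NodeType, NodeType[]> = {
  DirectoryBlock: ["EntryCreditBlock", "FactoidBlock", "EntryBlock"],
  EntryCreditBlock: ["Commit"],
  Commit: ["Entry"],
  Entry: ["EntryBlock"],
  EntryBlock: ["DirectoryBlock"],
  FactoidBlock: ["Transaction"],
  Transaction: ["FactoidBlock"],
};

// Breadth-first search for the shortest chain of relationships between two types.
// Returns null when no chain exists in this simplified graph.
function findPath(from: NodeType, to: NodeType): NodeType[] | null {
  const queue: NodeType[][] = [[from]];
  const seen = new Set<NodeType>([from]);
  while (queue.length > 0) {
    const path = queue.shift() as NodeType[];
    const last = path[path.length - 1] as NodeType;
    if (last === to) {
      return path;
    }
    for (const next of edges[last]) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(path.concat(next));
      }
    }
  }
  return null;
}
```

Calling `findPath("DirectoryBlock", "Transaction")` walks from the directory block through its factoid block to a transaction, which is the same route a nested GraphQL query would take.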

So that's the graph part, but what about the QL? QL stands for query language. A query language is a type of language that enables you to query data. For example, SQL (note the QL) is a query language that lets you make queries to an SQL database. GraphQL, therefore, is a query language that allows you to make queries expressed in terms of a graph.

Why use a GraphQL API for factomd?

There are several advantages to using GraphQL to query factomd. To understand these advantages, it is useful to understand the problems associated with the current RPC API.

Overfetching

Currently, when a developer wants to fetch data from factomd, they send a request using one of a finite number of methods, each of which instructs factomd to send back a fixed data structure containing the data they need.

For example, say I want the factoidBlockRef from the directory block at the tip of the chain. Using the current API, I would get that by sending factomd a request using the directory-block-head method. This works! Factomd would send me back the directory block head, which includes within it the factoidBlockRef.

But there is a problem here. Factomd not only sent me the factoidBlockRef, it also sent me lots of other stuff too. I got the keyMR, the height, the entryCreditBlockRef and a potentially very long list of entryBlockRefs. That is a lot of stuff that I didn't ask for and do not need, and it is using up my precious bandwidth.
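To get a feel for the waste, here is a rough sketch that compares the size of a whole directory block head with the size of the one field I wanted. The object below is a hypothetical stand-in that reuses the field names from this article; the values are placeholders, and the count of 50 entry block references is an arbitrary assumption.

```typescript
// Hypothetical directory block head, using the field names from the text.
// The values are placeholders, not real chain data.
interface DirectoryBlockHead {
  keyMR: string;
  height: number;
  factoidBlockRef: string;
  entryCreditBlockRef: string;
  entryBlockRefs: string[];
}

// A 64-character hex-like placeholder standing in for a real hash.
const placeholderHash = "0".repeat(64);

const fullResponse: DirectoryBlockHead = {
  keyMR: placeholderHash,
  height: 0,
  factoidBlockRef: placeholderHash,
  entryCreditBlockRef: placeholderHash,
  // Assumption: 50 entry blocks in this directory block.
  entryBlockRefs: Array.from({ length: 50 }, () => placeholderHash),
};

// The only part of the response the application actually uses.
const needed = { factoidBlockRef: fullResponse.factoidBlockRef };

// Length of the serialised JSON in characters (equal to bytes here, as the data is ASCII).
const sizeOf = (value: unknown): number => JSON.stringify(value).length;

console.log(`full response: ${sizeOf(fullResponse)} characters`);
console.log(`needed field:  ${sizeOf(needed)} characters`);
```

The gap grows with the number of entry block references, so the busier the block, the more bandwidth is spent on data that gets thrown away.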

Enter GraphQL. The beauty of GraphQL is that it lets you request exactly what fields you need, and you will get only those fields back. Nothing more, nothing less. So, to fetch the factoidBlockRef, I might send a query that looks like:

query {
  directoryBlockHead {
    factoidBlockRef
  }
}

In this query, I first state that I would like data contained in the directory block head. I then specify that I would like the field factoidBlockRef to be included in the response. The data I get back might look something like:

{
  "directoryBlockHead": {
    "factoidBlockRef": "538b1396a83811ac32d938fbdb4126af7ede9834559ea948f6c3294d3b626e67"
  }
}

Huzzah! I have reduced the bandwidth requirements for my application and everyone lived happily ever after.
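For anyone wondering how a query like this travels over the wire, here is a minimal sketch. GraphQL servers conventionally accept a POST whose JSON body carries the query text, and reply with JSON that wraps the result in a `data` field. The `GRAFF_URL` value and the `buildRequest` and `unwrap` helpers are hypothetical names of my own, not part of Graff.

```typescript
// Shape of a GraphQL-over-HTTP POST body: the query text plus optional variables.
interface GraphQLRequestBody {
  query: string;
  variables?: Record<string, unknown>;
}

// Hypothetical endpoint; replace it with the address of a real Graff instance.
const GRAFF_URL = "https://graff.example.com/graphql";

const HEAD_QUERY = `
  query {
    directoryBlockHead {
      factoidBlockRef
    }
  }
`;

// Build everything an HTTP client needs to send the query.
function buildRequest(query: string, variables?: Record<string, unknown>) {
  const body: GraphQLRequestBody = variables ? { query, variables } : { query };
  return {
    url: GRAFF_URL,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Pull `data` out of a GraphQL response, throwing if the server reported errors.
function unwrap<T>(responseText: string): T {
  const parsed = JSON.parse(responseText) as {
    data?: T;
    errors?: { message: string }[];
  };
  if (parsed.errors && parsed.errors.length > 0) {
    throw new Error(parsed.errors.map((e) => e.message).join("; "));
  }
  if (parsed.data === undefined) {
    throw new Error("response contained no data");
  }
  return parsed.data;
}
```

The object returned by `buildRequest(HEAD_QUERY)` can be handed to whichever HTTP client you prefer, and the response text passed through `unwrap` to get at the fields you asked for.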

Underfetching

But wait. Now I have a new problem. I don't really want the factoidBlockRef. What I really want is the factoid block head; I want the factoid block at the tip of the factoid chain. There is no factoid-block-head method on factomd, so I first have to get a reference to it using the directory-block-head method, and then I need to use the information returned from that request to make a second request for the factoid block itself.

Why is this an issue? The problem is that I need to make two sequential network requests in order to fetch the data that I need from factomd. Simply put, I need to wait for one network request to complete before I can issue the second. That is two full round trips. So, whilst I have solved the problem of factomd sending back more information than I need, known as overfetching, I have not solved a second (and potentially more serious) problem, which is that a single request might send back less information than I need. This is known as underfetching, and it is closely related to the N+1 problem.
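The dependency between the two requests can be sketched as follows. To keep the example runnable without a network, the RPC transport is a synchronous in-memory fake that records each call; a real client would make asynchronous HTTP requests instead. The response shapes, the `factoid-block` method with its `keymr` parameter, and the `abc123` reference are simplifying assumptions for illustration.

```typescript
// A transport takes an RPC method name and params and returns the result.
// It is injected so the sketch runs offline; a real one would POST to factomd.
type Rpc = (
  method: string,
  params?: Record<string, string>
) => Record<string, unknown>;

// In-memory stand-in for factomd. Response shapes are simplified assumptions.
function makeFakeRpc(): { rpc: Rpc; calls: string[] } {
  const calls: string[] = [];
  const rpc: Rpc = (method, params) => {
    calls.push(method);
    if (method === "directory-block-head") {
      return { factoidBlockRef: "abc123" };
    }
    if (method === "factoid-block" && params !== undefined && params["keymr"] === "abc123") {
      return { keyMR: "abc123", entryCreditRate: 22000 };
    }
    throw new Error(`unexpected call: ${method}`);
  };
  return { rpc, calls };
}

// The second call cannot start until the first has returned its reference,
// so the total latency is at least two full round trips.
function fetchFactoidBlockHead(rpc: Rpc): Record<string, unknown> {
  const head = rpc("directory-block-head");
  const ref = String(head["factoidBlockRef"]);
  return rpc("factoid-block", { keymr: ref });
}

const fake = makeFakeRpc();
const factoidBlock = fetchFactoidBlockHead(fake.rpc);
console.log(factoidBlock, `requests made: ${fake.calls.length}`);
```

However fast each individual request is, the client always pays for both of them in sequence, because the input to the second only exists once the first has come back.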

Earlier on in this article I described how factomd's data structures have relationships between each other. These relationships between data structures can be described in terms of a graph: a directory block has a factoid block, and a factoid block belongs to a directory block. How can this knowledge be used to solve the problem of underfetching?

Well, what if, in the example above, we did not request the factoidBlockRef? What if, instead, we could request the factoid block itself? Then our query might look something like:

query {
  directoryBlockHead {
    factoidBlock {
      keyMR
      entryCreditRate
    }
  }
}

Here, we are specifying that we want the factoidBlock belonging to the directoryBlockHead, and within that factoid block, we want the fields keyMR and entryCreditRate. This is a real Graff query! Go ahead and paste it into the left-hand panel of the Graff playground, then click the play button in the middle of the window. The response might look something like:

{
  "data": {
    "directoryBlockHead": {
      "factoidBlock": {
        "keyMR": "538b1396a83811ac32d938fbdb4126af7ede9834559ea948f6c3294d3b626e67",
        "entryCreditRate": 22000
      }
    }
  }
}

And there you have it: all the data you need, none of the data you don't, and only one request. You have successfully reduced the bandwidth requirements of your application and overall application latency.

Polling for new events

Say you have a Factom web wallet. For the sake of argument, we'll call it YourFactomWallet, or YFW.

YFW is an application that lets users send and receive factoids. It has a live UI, meaning that the UI will change automatically in response to certain events without the user needing to refresh the page. For example, when one of your addresses receives a transaction, it will increment the balance. Similarly, when one of your outgoing transactions is confirmed, it will mark the transaction as confirmed in the UI and decrement the balance.

How can this be achieved using the current factomd RPC API? Well, one way to do it is to ask factomd for the directory block head every x seconds, then compare that to a directory block you hold locally. When the directory block you receive back from factomd has a greater height than the directory block you already have, you can be sure the directory block is new. You can then use that information to grab all the transactions associated with that new height as per the overfetching and underfetching sections above. If those data contain anything relevant to the user, you can render the updates on the screen.

This is effective, but it is also costly. If you were worried about overfetching before, you can now multiply that problem by the number of useless blocks you need to discard. Factom.js has a feature called FactomEventEmitter (implemented by yours truly) which does exactly that. Every 7.5 seconds it asks for new data, which works out to roughly 80 requests per block on Factom mainnet, where a new block arrives every 10 minutes. Only one of those requests will actually get something interesting back.

This is a naive implementation of a polling algorithm which does not make any assumptions about block times. Something more sophisticated for Factom mainnet might make probabilistic requests based on the time since the last new block. Nevertheless, the problem remains: if you want to update your data in response to blockchain events, you are very likely going to discard much more data than you actually use.
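Here is a small sketch of the polling logic described above, written as pure functions so the arithmetic is easy to check. The function names are my own and are not taken from Factom.js. Note that `nextPollDelay` is a simple deterministic back-off rather than the probabilistic approach mentioned above: it stays quiet for a configurable period after each block, then falls back to the fast interval.

```typescript
interface BlockSummary {
  height: number;
}

// A block is new only if its height exceeds the one held locally.
function isNewBlock(local: BlockSummary | null, remote: BlockSummary): boolean {
  return local === null || remote.height > local.height;
}

// Number of polls made per block under naive fixed-interval polling.
function pollsPerBlock(blockTimeMs: number, intervalMs: number): number {
  return Math.ceil(blockTimeMs / intervalMs);
}

// Deterministic alternative to fixed-interval polling (an assumption of this
// sketch, not how FactomEventEmitter works): wait out the remainder of a quiet
// period after the last block, then poll at the fast interval.
function nextPollDelay(
  msSinceLastBlock: number,
  quietPeriodMs: number,
  fastIntervalMs: number
): number {
  const remaining = quietPeriodMs - msSinceLastBlock;
  return remaining > fastIntervalMs ? remaining : fastIntervalMs;
}

const TEN_MINUTES_MS = 10 * 60 * 1000;
console.log(pollsPerBlock(TEN_MINUTES_MS, 7500)); // 80
```

With a 10 minute block time and a 7.5 second interval, the naive approach makes 80 requests per block and throws 79 of them away. The back-off reduces that number, but it cannot remove the waste entirely, because the client still has to guess when the next block will land.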

Graff solves this problem by utilising GraphQL subscriptions. The client is able to subscribe to the factomd events that it is interested in, and those events will be pushed from the server to the client over websockets. So, if YFW wanted to learn about new transactions for a given address, it could set up a subscription to do so:

subscription {
  newFactoidTransaction(address: "FA2MZs5wASMo9cCiKezdiQKCd8KA6Zbg2xKXKGmYEZBqon9J3ZKv") {
    totalInputs
    totalFactoidOutputs
    totalEntryCreditOutputs
    fees
  }
}

Using the above subscription, the client can inform Graff that it wants to be notified of all new transactions involving the address FA2MZs5wASMo9cCiKezdiQKCd8KA6Zbg2xKXKGmYEZBqon9J3ZKv, and that it would like totalInputs, totalFactoidOutputs, totalEntryCreditOutputs and fees for the transaction. No more polling, no more unused data.
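For the curious, here is a rough sketch of what a subscription client exchanges over the websocket. It assumes the server speaks the graphql-transport-ws protocol used by the `graphql-ws` library; I have not confirmed which protocol Graff uses, and older servers often speak the `subscriptions-transport-ws` protocol instead, which has different message types. The helper names are hypothetical.

```typescript
// Message shapes from the graphql-transport-ws protocol.
// Assumption: the server speaks this protocol; Graff may use a different one.
interface InitMessage {
  type: "connection_init";
}

interface SubscribeMessage {
  id: string;
  type: "subscribe";
  payload: { query: string };
}

type ServerMessage =
  | { type: "connection_ack" }
  | { id: string; type: "next"; payload: { data?: unknown } }
  | { id: string; type: "error"; payload: unknown }
  | { id: string; type: "complete" };

// Build the subscription text from the article for a given address.
function subscriptionQuery(address: string): string {
  return `subscription {
  newFactoidTransaction(address: ${JSON.stringify(address)}) {
    totalInputs
    totalFactoidOutputs
    totalEntryCreditOutputs
    fees
  }
}`;
}

// First frame the client sends after the socket opens.
function initMessage(): InitMessage {
  return { type: "connection_init" };
}

// Sent once the server has replied with connection_ack.
function subscribeMessage(id: string, address: string): SubscribeMessage {
  return { id, type: "subscribe", payload: { query: subscriptionQuery(address) } };
}

// Route one incoming frame: hand pushed data for our subscription to the
// callback and ignore everything else.
function handleFrame(raw: string, id: string, onData: (data: unknown) => void): void {
  const message = JSON.parse(raw) as ServerMessage;
  if (message.type === "next" && message.id === id) {
    onData(message.payload.data);
  }
}
```

In a browser, these helpers would be wired to a `WebSocket`: send `JSON.stringify(initMessage())` when the socket opens, send the subscribe message after the acknowledgement arrives, and pass each incoming frame to `handleFrame`. In practice a client library would handle all of this for you.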

Other good stuff

For fear of making this article painfully long, I will briefly summarise a few other advantages of using Graff to query factomd:

  1. Static typing: The schema Graff uses creates a contract between the server and the client that describes the shape of the response and scalar values for every type on the API.
  2. Upgradeability: It's super easy to upgrade. Fields can be deprecated and new fields can be added without disrupting users. Versioning is actively discouraged.
  3. Declarative queries: This has already been covered indirectly, but it is worth making it explicit. The client can declare what data it needs, and the API will do the heavy lifting to get it done. No need to work out how to navigate your way between factomd's data structures then attempt to implement that in code on the client.

Want to go deeper? Check out the official GraphQL website.

Summary

Graff is a GraphQL wrapper around the factomd RPC API. It can be used to make graph-based queries to factomd, which can reduce the bandwidth requirements of your application and make it feel snappier for the end user.

Head over to Factoshi's courtesy Graff playground to try it out.

Coming soon: a fullstack tutorial on setting up Graff and integrating it into your React application.