Cayley: graphs in Go

Wednesday, June 25, 2014


Four years ago this July, Google acquired Metaweb, bringing Freebase and linked open data to Google. It’s been astounding to watch the growth of the Knowledge Graph and how it has improved Google search to delight users every day.

When I moved to New York last year, I saw just how far the concepts of Freebase and its data had spread through Google’s worldwide offices. I began to wonder how the concepts would advance if developers everywhere could work with similar tools. However, there wasn’t a graph store available that was fast, free, and easy to get started with.

With the Freebase data already public and universally accessible, it was time to make it useful, and that meant writing some code as a side project.

So today we are excited to release Cayley, an open source graph database.

Cayley is a spiritual successor to graphd; it shares a similar query strategy for speed. While not an exact replica of its predecessor, it brings its own features to the table:
RESTful API
Multiple (modular) backend stores, such as LevelDB and MongoDB
Multiple (modular) query languages
Easy to get started
Simple to build on top of as a library
and of course
Open Source
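
As a rough illustration of the RESTful API item above, here is a minimal JavaScript sketch of sending a query to a Cayley instance over HTTP. The host, port, and endpoint path are assumptions about a default local setup and should be checked against the project's documentation; `buildQueryRequest` and `runQuery` are hypothetical helper names, not part of Cayley.

```javascript
// Assumed defaults for a locally running Cayley instance.
// Verify both values against the project's documentation.
const DEFAULT_HOST = "http://localhost:64210";
const QUERY_PATH = "/api/v1/query/gremlin";

// Hypothetical helper: build the URL and fetch options for a query.
// The query text is sent as the raw POST body.
function buildQueryRequest(query, host = DEFAULT_HOST) {
  return {
    url: host + QUERY_PATH,
    options: { method: "POST", body: query },
  };
}

// Hypothetical helper: send the query and return the parsed JSON response.
// Requires a runtime with a global fetch (for example, a recent Node.js).
async function runQuery(query, host = DEFAULT_HOST) {
  const { url, options } = buildQueryRequest(query, host);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error("Query failed with HTTP status " + response.status);
  }
  return response.json();
}

// Example call (not executed here, since it needs a running server):
// runQuery('g.V("Humphrey Bogart").In("name").All()').then(console.log);
```

Keeping request construction separate from the network call makes the sketch easy to check without a server.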

Cayley is written in Go, which was a natural choice: a backend service that depends on speed and concurrent access plays to Go’s strengths. Go did not disappoint; with a fantastic standard library and easy access to open source libraries from the community, the necessary building blocks were already there. Combined with Go’s concurrency patterns, which are more effective than those available in C, this made a performance-competitive successor to graphd a reality.

To get a sense of Cayley, check out the I/O Bytes video we created where we “Build A Small Knowledge Graph”. The video includes a quick introduction to graph stores as well as an example of processing Freebase and Schema.org linked data.


You can also check out the demo dataset in a live instance running on Google App Engine. It’s running with the sample dataset in the repository: 30,000 movies and their actors, roles, and directors, using the Freebase film schema. For a more-than-trivial query, try running the following code, both as a query and as a visualization. What you’ll see is the neighborhood of the given actor and how the actors who co-star with that actor interact with each other:

costar = g.M().In("/film/performance/actor").In("/film/film/starring")


function getCostars(x) {
 return g.V(x).As("source").In("name")
         .Follow(costar).FollowR(costar)
         .Out("name").As("target")
}


function getActorNeighborhood(primary_actor) {
 actors = getCostars(primary_actor).TagArray()
 seen = {}
 for (a in actors) {
   g.Emit(actors[a])
   seen[actors[a].target] = true
 }
 seen[primary_actor] = false
 actor_list = []
 for (actor in seen) {
   if (seen[actor]) {
     actor_list.push(actor)
   }
 }
 getCostars(actor_list).Intersect(g.V(actor_list)).ForEach(function(d) {
   if (d.source < d.target) {
     g.Emit(d)
   }
 })
}

getActorNeighborhood("Humphrey Bogart")
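
To follow what the query above does without a running Cayley instance, here is a minimal, self-contained JavaScript sketch of the same traversal over an in-memory list of triples. It is not Cayley's implementation: the toy data, the node identifiers, and the helpers `outbound`, `inbound`, `costarNames`, and `neighborhoodEdges` are all invented for illustration. Only the predicate names and the `source < target` filter come from the query itself.

```javascript
// Hypothetical toy data: [subject, predicate, object] triples.
const triples = [
  ["film1", "/film/film/starring", "perf1"],
  ["film1", "/film/film/starring", "perf2"],
  ["film2", "/film/film/starring", "perf3"],
  ["film2", "/film/film/starring", "perf4"],
  ["film3", "/film/film/starring", "perf5"],
  ["film3", "/film/film/starring", "perf6"],
  ["perf1", "/film/performance/actor", "actorA"],
  ["perf2", "/film/performance/actor", "actorB"],
  ["perf3", "/film/performance/actor", "actorA"],
  ["perf4", "/film/performance/actor", "actorC"],
  ["perf5", "/film/performance/actor", "actorB"],
  ["perf6", "/film/performance/actor", "actorC"],
  ["actorA", "name", "Actor A"],
  ["actorB", "name", "Actor B"],
  ["actorC", "name", "Actor C"],
];

function unique(values) {
  return [...new Set(values)];
}

// Follow a predicate forward (subject to object), like Out() in the query.
function outbound(nodes, predicate) {
  const from = new Set(nodes);
  return unique(
    triples.filter(([s, p]) => p === predicate && from.has(s)).map(([, , o]) => o)
  );
}

// Follow a predicate backward (object to subject), like In() in the query.
function inbound(nodes, predicate) {
  const to = new Set(nodes);
  return unique(
    triples.filter(([, p, o]) => p === predicate && to.has(o)).map(([s]) => s)
  );
}

// Same shape as getCostars: name, actor, performances, films,
// then back out to every actor in those films and their names.
// The result includes the starting actor.
function costarNames(name) {
  const actors = inbound([name], "name");
  const performances = inbound(actors, "/film/performance/actor");
  const films = inbound(performances, "/film/film/starring");
  const castPerformances = outbound(films, "/film/film/starring");
  const castActors = outbound(castPerformances, "/film/performance/actor");
  return outbound(castActors, "name");
}

// Same shape as the second half of getActorNeighborhood: links among the
// co-stars, keeping each undirected pair once via source < target.
function neighborhoodEdges(primaryActor) {
  const others = costarNames(primaryActor).filter((n) => n !== primaryActor);
  const edges = [];
  for (const source of others) {
    for (const target of costarNames(source)) {
      if (others.includes(target) && source < target) {
        edges.push({ source, target });
      }
    }
  }
  return edges;
}

console.log(costarNames("Actor A"));
console.log(neighborhoodEdges("Actor A"));
```

The `source < target` comparison is what keeps a pair such as B and C from being emitted twice, once in each direction, and it also drops self-links.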

To get involved, check out the project on GitHub and join the mailing list. But most importantly, have fun building your own graphs!

By Barak Michener, Software Engineer, Knowledge NYC
