Uber Case Study
Conexus automates manual work on which tens of billions are spent annually.
Conexus also automates tasks that ideally should have been done but were not, because the existing system configuration was too slow and unreliable.
Sometimes, however, Conexus automates work that had never been done because it was simply too hard. Take Uber.
Background on Uber IT
Uber needs to understand the relationship between riders, drivers, and prices. Its data now covers more than 3.5 billion rides in over 200 cities. To manage all of this, Uber has created a very large number of models and an order of magnitude more databases. These databases have different codebases and connect with one another internally. They also interact with public data and models, such as maps and traffic data.
Why is Uber’s tech like this?
Uber’s IT infrastructure evolved this way partly because of the loose coupling initially required between the cities Uber served. For example, to start up quickly, Uber Denver developed IT operations separate from those of Uber Seattle. Such highly distributed IT infrastructures are common in large enterprises, where speed trumps integration.
What did they want to solve?
To draw the most useful conclusions from its data, Uber wanted to model the whole system, so that it could compare regions and run experiments.
They wanted to answer questions such as:
- How many customers travel between cities?
- What kind of cars are most in demand on weekends?
- How does weather affect pricing by region?
What was the problem?
Because of how Uber’s systems evolved, this data was stored in silos. Analytics could therefore be done on individual cities, but not on regions or on the global system as a whole. Even with an effectively open-ended technology budget and the ability to attract top technologists, Uber was unable to model the whole system: its data lived in different systems with different architectures and could not function as one database.
How did Conexus help?
We created a meta model in our platform (CQL) for a new knowledge “graph” with ten properties specified by Uber.
- 8 of the 10 specified properties were satisfied in one hour, with just one hundred lines of CQL.
- The remaining 2 properties were immediately found to be not well-defined (traditional methods cannot even detect such failures).
- Uber liked Conexus CQL so much that it has advocated for a subset of CQL to become version 4 of TinkerPop, the Apache Software Foundation’s standard for graph models. (TinkerPop is a consortium of leading graph DB vendors, including Google, Amazon, and Facebook.)
- Conexus and Uber co-authored a technical paper, which has been submitted for publication.
- Licensing of CQL to deploy the knowledge graph at scale at Uber is now underway.
The technical take
Uber’s problem is that although there is a high degree of “semantic” overlap among its many database schemas (a ride in Denver is much like a ride in Seattle), queries written against one schema cannot run against another. Moreover, queries on the same schema cannot be run against two different underlying implementations (e.g., SQL on Oracle vs. SQL on IBM).
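A minimal Python sketch of this problem, assuming two hypothetical city schemas (all table, field, and function names are invented for illustration and are not Uber’s): both describe rides, but a query written against the Denver fields fails on Seattle rows until a hand-written schema mapping translates one into the other. Here the mapping is applied to the data; rewriting the query itself against the other schema is the alternative approach.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical schemas: two cities model the same concept ("a ride")
# with different field names and different units for the fare.
denver_rides: List[Row] = [
    {"rider_id": "r1", "driver_id": "d1", "fare_usd": 12.5},
    {"rider_id": "r2", "driver_id": "d1", "fare_usd": 8.0},
]
seattle_rides: List[Row] = [
    {"passenger": "r1", "chauffeur": "d7", "price_cents": 1500},
]


def total_fare_by_rider(rows: List[Row]) -> Dict[str, float]:
    """A query written against the Denver schema: total fare per rider."""
    totals: Dict[str, float] = {}
    for row in rows:
        rider = row["rider_id"]  # raises KeyError on Seattle-shaped rows
        totals[rider] = totals.get(rider, 0.0) + row["fare_usd"]
    return totals


def seattle_to_denver(row: Row) -> Row:
    """Hypothetical hand-written mapping from the Seattle schema to Denver's."""
    return {
        "rider_id": row["passenger"],
        "driver_id": row["chauffeur"],
        "fare_usd": row["price_cents"] / 100,  # cents -> dollars
    }


try:
    total_fare_by_rider(seattle_rides)
except KeyError as missing:
    print("query fails on the Seattle schema; missing field:", missing)

combined = denver_rides + [seattle_to_denver(r) for r in seattle_rides]
print(total_fare_by_rider(combined))  # {'r1': 27.5, 'r2': 8.0}
```

The hand-written `seattle_to_denver` function is the kind of per-pair translation that has to be repeated for every pair of schemas.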
Traditionally, Uber addressed this problem by tasking some of its 1000+ engineers with rewriting queries by hand, aided by various ad hoc software toolkits. Finding that this did not scale, Uber developed a “meta data model” called Dragon for defining data models such as relational and RDF, together with definitions of about a dozen data models (Apache Avro, Thrift, relational, etc.). This lets Uber convert a schema from one implementation, say Oracle SQL, to another, say RDF. That saved so much time that the team began looking for ways to automate more of the rewriting process (for example, from schema to schema rather than from implementation to implementation). That is when they discovered Conexus CQL.
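The idea of a meta data model can be sketched as follows: a schema is described once, in implementation-neutral terms, and separate functions render it into each target data model. This is a toy illustration only, assuming a meta model with nothing but entities and primitively typed fields; it is not Dragon’s actual design, and every class, function, and type mapping here is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Field:
    name: str
    prim_type: str  # meta-model primitive: "string", "double", or "int"


@dataclass(frozen=True)
class Entity:
    name: str
    fields: Tuple[Field, ...]


# Hypothetical mappings from meta-model primitives to each target data model.
SQL_TYPES = {"string": "VARCHAR(255)", "double": "DOUBLE PRECISION", "int": "INTEGER"}
XSD_TYPES = {"string": "xsd:string", "double": "xsd:double", "int": "xsd:integer"}


def to_sql_ddl(entity: Entity) -> str:
    """Render an entity as a relational CREATE TABLE statement."""
    cols = ",\n  ".join(f"{f.name} {SQL_TYPES[f.prim_type]}" for f in entity.fields)
    return f"CREATE TABLE {entity.name} (\n  {cols}\n);"


def to_rdf_triples(entity: Entity, prefix: str = "ex:") -> List[Tuple[str, str, str]]:
    """Render the same entity as RDFS class and property declarations."""
    cls = f"{prefix}{entity.name}"
    triples = [(cls, "rdf:type", "rdfs:Class")]
    for f in entity.fields:
        prop = f"{prefix}{entity.name}_{f.name}"
        triples += [
            (prop, "rdf:type", "rdf:Property"),
            (prop, "rdfs:domain", cls),
            (prop, "rdfs:range", XSD_TYPES[f.prim_type]),
        ]
    return triples


ride = Entity("Ride", (Field("rider_id", "string"), Field("fare_usd", "double")))
print(to_sql_ddl(ride))
for s, p, o in to_rdf_triples(ride):
    print(s, p, o, ".")
```

Note that this only converts between implementations of the same schema; mapping one schema onto a different schema requires another kind of translation.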
Upon engagement, Conexus discovered that a large part of Dragon was a fragment of Conexus CQL. Conexus then worked with Uber to formalize that part, resulting in a co-authored paper entitled “Algebraic Property Graphs” (APG), which describes how Conexus CQL can be used to address many of the rewriting tasks Uber wants to automate, such as migrating queries from one schema to another. Conexus is now working with Uber to deploy this APG technology at Uber and to extend it to other applications both inside and outside of Uber. As part of this, Uber has committed to open-sourcing its internal Dragon codebase and to proposing it as part of version 4 of the Apache TinkerPop graph project. (Uber’s press release announcing this is on hold as part of its COVID-19 PR freeze.)
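To give a feel for what “algebraic” means here, the following toy checker builds types from primitives, references to labeled elements, products (records), and sums (tagged alternatives), assigns a type to each label, and verifies that every element’s value fits its label’s type. It is a simplified illustration loosely inspired by the APG idea, not the paper’s formalism or CQL syntax; the encoding, the labels, and the data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Prim:
    name: str  # primitive type, e.g. "string" or "double"


@dataclass(frozen=True)
class Ref:
    label: str  # reference to another element carrying this label


@dataclass(frozen=True)
class Product:
    fields: Tuple[Tuple[str, "Type"], ...]  # every field present (a record)


@dataclass(frozen=True)
class Sum:
    cases: Tuple[Tuple[str, "Type"], ...]  # exactly one case present (a tagged union)


Type = Union[Prim, Ref, Product, Sum]

PRIM_CHECKS = {"string": lambda v: isinstance(v, str), "double": lambda v: isinstance(v, float)}


def conforms(value: Any, ty: Type, labels_of: Dict[str, str]) -> bool:
    """Check a value against a type; labels_of maps element ids to their labels."""
    if isinstance(ty, Prim):
        return PRIM_CHECKS[ty.name](value)
    if isinstance(ty, Ref):
        return isinstance(value, str) and labels_of.get(value) == ty.label
    if isinstance(ty, Product):
        names = {name for name, _ in ty.fields}
        return (isinstance(value, dict) and set(value) == names
                and all(conforms(value[n], t, labels_of) for n, t in ty.fields))
    if isinstance(ty, Sum):
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        case, inner = value
        cases = dict(ty.cases)
        return case in cases and conforms(inner, cases[case], labels_of)
    raise TypeError(f"unknown type constructor: {ty!r}")


def validate(schema: Dict[str, Type], instance: Dict[str, Tuple[str, Any]]) -> List[str]:
    """Return the ids of elements whose value does not fit their label's type."""
    labels_of = {eid: label for eid, (label, _) in instance.items()}
    return [eid for eid, (label, value) in instance.items()
            if not conforms(value, schema[label], labels_of)]


# Hypothetical schema: riders and drivers are vertices; a ride is an edge-like
# record referencing both, with a fare and a sum-typed vehicle.
schema: Dict[str, Type] = {
    "Rider": Prim("string"),
    "Driver": Prim("string"),
    "Ride": Product((
        ("rider", Ref("Rider")),
        ("driver", Ref("Driver")),
        ("fare", Prim("double")),
        ("vehicle", Sum((("car", Prim("string")), ("bike", Prim("string"))))),
    )),
}
instance = {
    "r1": ("Rider", "Alice"),
    "d1": ("Driver", "Bob"),
    "e1": ("Ride", {"rider": "r1", "driver": "d1", "fare": 12.5, "vehicle": ("car", "sedan")}),
}
print(validate(schema, instance))  # []

bad = dict(instance, e2=("Ride", {"rider": "d1", "driver": "d1", "fare": 9.0,
                                  "vehicle": ("bike", "ebike")}))
print(validate(schema, bad))  # ['e2']: its "rider" points at a Driver
```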