Uber Data Consolidation

Uber’s incredible growth was partially enabled by an organic data infrastructure strategy, in which each city was treated like its own company, free to decide how to build its own technology stack. This later created significant challenges when Uber wanted to interoperate and unify data across all cities.


“Current solutions for enterprise data modeling are often informal and ad hoc. Conexus’ use of [categorical mathematics] as an interlingua for data can dramatically increase productivity by removing barriers to data integration.”

– Joshua Shinavier, Uber

The Problem

Uber’s most complex engineered system is its data infrastructure. As Uber grew, each city was allowed to decide how to architect its own technology stack. The result was rapid growth for the company, but also an overwhelmingly complex data integration challenge.

Microservices

7,000+ Protocol Buffers and Thrift IDLs

Datasets

260,000+ Hive tables
15,000+ Kafka topics
6,000+ MySQL and PostgreSQL tables
2,000+ Vertica tables
1,000+ Schemaless, Cassandra, Gairos, Apollo, Pilot and other tables

The Objective

Uber had developed many models and tools for forecasting and for risk and fraud analysis, but in most cases each one worked only for the city it was developed in. Uber wanted to be able to apply any model or analysis to any city, and to consolidate data across all cities to answer questions on a global scale. In short, Uber needed data interoperability and consolidation across all of its systems and cities.

The Approach

Conexus worked with Uber to develop a global metadata model and a way to translate schemas across all of Uber’s different data technologies. The work was formalized as Algebraic Property Graphs, an extension of Categorical Data Integration and the Conexus platform.
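To make the idea of translating schemas through a shared metadata model more concrete, here is a minimal Python sketch of the general hub-and-spoke pattern: each technology gets one adapter into a common intermediate model and one adapter out of it, so a schema described in one technology can be rendered in another. This is not Algebraic Property Graphs or the Conexus platform. The intermediate model, the simplified Protocol Buffers message format, the `from_proto`, `to_hive_type` and `to_hive_ddl` helpers, and the `Trip` schema are all hypothetical, and only a handful of scalar types are covered.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical shared intermediate model (illustrative only; not APG or Conexus).
@dataclass(frozen=True)
class Prim:
    name: str  # one of "string", "bytes", "bool", "int32", "int64", "float", "double"

@dataclass(frozen=True)
class Field:
    name: str
    dtype: "Type"
    repeated: bool = False

@dataclass(frozen=True)
class Record:
    name: str
    fields: tuple  # tuple of Field

Type = Union[Prim, Record]

# Inbound adapter: simplified Protocol Buffers scalar names -> shared primitives.
PROTO_SCALARS = {
    "string": "string", "bytes": "bytes", "bool": "bool",
    "int32": "int32", "int64": "int64", "float": "float", "double": "double",
}

def from_proto(msg: dict) -> Record:
    """Translate a simplified (hypothetical) message description into the shared model.

    Each field is a (name, type, repeated) tuple, where type is either a
    scalar name or a nested message dict.
    """
    fields = []
    for name, ftype, repeated in msg["fields"]:
        dtype = from_proto(ftype) if isinstance(ftype, dict) else Prim(PROTO_SCALARS[ftype])
        fields.append(Field(name, dtype, repeated))
    return Record(msg["name"], tuple(fields))

# Outbound adapter: shared primitives -> Hive column types.
HIVE_PRIMS = {
    "string": "STRING", "bytes": "BINARY", "bool": "BOOLEAN",
    "int32": "INT", "int64": "BIGINT", "float": "FLOAT", "double": "DOUBLE",
}

def to_hive_type(dtype: Type, repeated: bool = False) -> str:
    """Render a shared-model type as a Hive type string (STRUCT for nested records)."""
    if isinstance(dtype, Prim):
        base = HIVE_PRIMS[dtype.name]
    else:
        members = ", ".join(f"{f.name}:{to_hive_type(f.dtype, f.repeated)}" for f in dtype.fields)
        base = f"STRUCT<{members}>"
    return f"ARRAY<{base}>" if repeated else base

def to_hive_ddl(rec: Record) -> str:
    """Render a top-level record as a Hive CREATE TABLE statement."""
    cols = ",\n  ".join(f"{f.name} {to_hive_type(f.dtype, f.repeated)}" for f in rec.fields)
    return f"CREATE TABLE {rec.name.lower()} (\n  {cols}\n)"

# Hypothetical example schema, written in the simplified Protocol Buffers form.
TRIP = {
    "name": "Trip",
    "fields": [
        ("trip_id", "string", False),
        ("city_id", "int32", False),
        ("fare", "double", False),
        ("pickup", {"name": "Location",
                    "fields": [("lat", "double", False), ("lng", "double", False)]}, False),
        ("tags", "string", True),
    ],
}

ddl = to_hive_ddl(from_proto(TRIP))
print(ddl)
```

With a shared model in the middle, supporting an additional technology takes one adapter in and one adapter out, rather than a separate translator for every pair of technologies.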



White Paper

Dive into the technical details of how Conexus enabled Uber to interoperate between over a dozen different data technologies.


The Results

Uber was able to leverage its new data interoperability capabilities to construct multiple knowledge graphs and metadata repositories. Data consumers in risk and fraud analysis, as well as application developers, engineers, and business analysts, could more easily find and consolidate the data they needed to achieve their respective goals.

Smash Data Bottlenecks

Unleash the value of all your data, faster. The future of your company depends on it.

