Informal Data Transformation Considered Harmful

Abstract

We take the popular position that AI systems are limited more by the integrity of the data they learn from than by the sophistication of their algorithms. We also take the less common position that the data lake approach of cleaning data after it is loaded does not overcome this limitation. Instead, we propose formal, automatic guarantees that data integrity is preserved during migration, integration, and other enterprise transformations. Such guarantees can avoid the need to constantly revalidate programs and data for each particular use case.
