Robert Crowe - TensorFlow Developer Advocate

Taking Machine Learning Research to Production: Solving Real Problems

Most of the focus in the ML community is on research, which is exciting and important. Equally important, however, is bringing that research into production applications that solve real-world problems, yet the issues and approaches involved are often poorly understood.
An ML application in production must address all of the issues of modern software development methodology, as well as issues unique to ML and data science. ML applications are often developed and trained using tools like notebooks, and as a result suffer from inherent limitations in testability, scalability across clusters, training/serving skew, and the modularity and reusability of components. In addition, ML application measurement often emphasizes top-level metrics, which can hide problems in model fairness and in predictive performance across user segments. Each user’s experience of an ML application depends on how the model performs on that user’s input data, so if the model performs poorly on that particular data segment, the user has a poor experience.
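The point about top-level metrics can be illustrated with a minimal sketch. It uses plain Python rather than any TFX API, and the function name `accuracy_by_slice`, the segment names, and the data are all hypothetical, chosen only to show how a strong aggregate number can coexist with a weak segment.

```python
from collections import defaultdict


def accuracy_by_slice(examples):
    """Return (overall_accuracy, {segment: accuracy}).

    `examples` is an iterable of (segment, label, prediction) triples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for segment, label, prediction in examples:
        totals[segment] += 1
        correct[segment] += int(label == prediction)
    overall = sum(correct.values()) / sum(totals.values())
    per_slice = {s: correct[s] / totals[s] for s in totals}
    return overall, per_slice


# Hypothetical evaluation set: segment "a" has 90 examples, all predicted
# correctly; segment "b" has 10 examples, only 2 predicted correctly.
examples = [("a", 1, 1)] * 90 + [("b", 1, 1)] * 2 + [("b", 1, 0)] * 8

overall, per_slice = accuracy_by_slice(examples)
print(overall)    # 0.92
print(per_slice)  # {'a': 1.0, 'b': 0.2}
```

In this made-up case the aggregate accuracy of 0.92 looks healthy, while users in segment "b" see a model that is right only 20% of the time.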
We discuss the use of ML pipeline architectures for implementing production ML applications, and in particular we review Google’s experience with TensorFlow Extended (TFX). Google uses TFX for large-scale ML applications and offers an open-source version to the community. TFX scales to very large training sets and very high request volumes, and enables strong software methodology, including testability, hot versioning, and deep performance analysis. Robert Crowe is a data scientist and TFX Developer Advocate at Google and will discuss how developers can move their ML [...]

