How Criteo optimized and sped up its TensorFlow models by 10x and served them under 5 ms

"When you access a web page, bidders such as Criteo must determine in a few dozens of milliseconds if they want to purchase the advertising space on the page. At that moment, a real-time auction takes place, and once you remove all the communication exchange delays, it leaves a handful of milli...

Descripción completa

Bibliographic Details
Corporate Author: O'Reilly TensorFlow World Conference
Other Authors: Kowalski, Nicolas, on-screen presenter; Antoniotti, Axel, on-screen presenter
Format: Online video
Language: English
Published: [Place of publication not identified] : O'Reilly Media, 2020.
Subjects:
View in the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009822832006719
Description
Summary: "When you access a web page, bidders such as Criteo must determine in a few dozens of milliseconds if they want to purchase the advertising space on the page. At that moment, a real-time auction takes place, and once you remove all the communication exchange delays, it leaves a handful of milliseconds to compute exactly how much to bid. In the past year, Criteo has put a large amount of effort into reshaping its in-house machine learning stack responsible for making such predictions--in particular, opening it to new technologies such as TensorFlow. Unfortunately, even for simple logistic regression models and small neural networks, Criteo's initial TensorFlow implementations saw inference time increase by a factor of 100, going from 300 microseconds to 30 milliseconds. Nicolas Kowalski and Axel Antoniotti outline how Criteo approached this issue, discussing how Criteo profiled its model to understand its bottleneck; why commonly shared solutions such as optimizing the TensorFlow build for the target hardware, freezing and cleaning up the model, and using accelerated linear algebra (XLA) ended up being lackluster; and how Criteo rewrote its models from scratch, reimplementing cross-features and hashing functions using low-level TF operations in order to factorize the TensorFlow nodes in its model as much as possible."--Resource description page.
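The summary mentions reimplementing cross-features and hashing functions with low-level TF operations. As a minimal illustrative sketch only (the bucket count, feature names, and choice of tf.strings ops are assumptions, not taken from the talk), a hashed cross-feature can be built directly from low-level string ops instead of high-level feature columns:

    import tensorflow as tf

    NUM_BUCKETS = 2 ** 20  # hypothetical hash space size, not from the talk

    @tf.function
    def hashed_cross_feature(feature_a, feature_b):
        # Concatenate the two categorical features into a single cross string,
        # then hash the result into a fixed number of buckets.
        crossed = tf.strings.join([feature_a, feature_b], separator="_x_")
        return tf.strings.to_hash_bucket_fast(crossed, NUM_BUCKETS)

    # Example: cross a publisher feature with an ad feature for a batch of two.
    ids = hashed_cross_feature(
        tf.constant(["site_1", "site_2"]),
        tf.constant(["ad_42", "ad_7"]),
    )

Because the cross and the hash are each a single vectorized op over the whole batch, a graph built this way tends to have far fewer nodes than an equivalent feature-column pipeline, which is in the spirit of the node factorization the speakers describe.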
Notes: Title from resource description page (viewed July 21, 2020).
This session is from the 2019 O'Reilly TensorFlow World Conference in Santa Clara, CA.
Physical Description: 1 online resource (1 streaming video file (38 min., 29 sec.)) : digital, sound, color