Deploying Spark ML Pipelines in Production on AWS

Bibliographic Details
Other Authors: Slepicka, Jason, author
Format: Video
Language: English
Published: O'Reilly Media, Inc., 2017.
Edition: 1st edition
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630968706719
Description
Summary: Moving a Spark application from a local environment to a production cluster in the cloud requires several critical steps, including publishing artifacts, installing dependencies, and defining the steps in a pipeline. This video is a hands-on guide to deploying Spark ML pipelines in production. You'll learn how to create a pipeline that supports model reproducibility, which makes your machine learning models more reliable, and how to update the pipeline incrementally as the underlying data changes.

Prerequisites: learners should have basic familiarity with Scala or Python; Hadoop, Spark, or pandas; SBT or Maven; Amazon Web Services such as S3, EMR, and EC2; and Bash, Docker, and REST.

What you'll learn:
- Understand how cloud ecosystem components (e.g., Amazon S3, EMR, and EC2) interact
- Learn how to architect the components of a cloud ecosystem into an end-to-end model pipeline
- Explore the capabilities and limitations of Spark in building an end-to-end model pipeline
- Learn to write, publish, deploy, and schedule an ETL process with Spark on AWS using EMR
- Understand how to create a pipeline that supports model reproducibility and reliability

Jason Slepicka is a senior data engineer with Los Angeles-based DataScience, where he builds pipelines and data science platform infrastructure. He has a decade of experience integrating data to support efforts such as fighting human trafficking for DARPA, exploring the evolution of evolvability in yeast, and tracking intruders in computer networks. Jason holds a Bachelor's and a Master's in Computer Science from the University of Arizona and is working on his PhD in Computer Science at the University of Southern California Information Sciences Institute.
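The summary mentions writing, publishing, deploying, and scheduling a Spark ETL process on AWS using EMR. As a minimal sketch (not taken from the video), one common way to run a published Spark job on an existing EMR cluster is to add a step that invokes `spark-submit` through `command-runner.jar`. The helper name `build_spark_step`, the bucket, the script path, and the `--run-date` argument are hypothetical; only the step dictionary layout follows the EMR `AddJobFlowSteps` request format used by boto3's `add_job_flow_steps`.

```python
from typing import Any, Dict, List, Optional


def build_spark_step(
    name: str,
    app_uri: str,
    app_args: Optional[List[str]] = None,
    deploy_mode: str = "cluster",
    action_on_failure: str = "CONTINUE",
) -> Dict[str, Any]:
    """Build one EMR step that runs spark-submit via command-runner.jar.

    Hypothetical helper: the returned dict has the shape expected in the
    Steps list of the EMR AddJobFlowSteps API.
    """
    if deploy_mode not in ("cluster", "client"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    # spark-submit options come before the application; app args come after it.
    args = ["spark-submit", "--deploy-mode", deploy_mode, app_uri]
    args.extend(app_args or [])
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


if __name__ == "__main__":
    # Hypothetical bucket, script, and arguments, for illustration only.
    step = build_spark_step(
        name="nightly-etl",
        app_uri="s3://example-bucket/jobs/etl.py",
        app_args=["--run-date", "2018-01-15"],
    )
    print(step["HadoopJarStep"]["Args"])
    # Submitting the step requires boto3 and AWS credentials, roughly:
    # import boto3
    # emr = boto3.client("emr")
    # emr.add_job_flow_steps(JobFlowId="j-XXXXXXXX", Steps=[step])
```

Keeping the step definition as plain data makes it easy to version alongside the published artifact, and the same dictionary can be passed to a scheduler that calls `add_job_flow_steps` on a recurring basis.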
Notes: Title from title screen (Safari, viewed January 15, 2018).
Release date from resource description page (Safari, viewed January 15, 2018).
Physical Description: 1 online resource (1 video file, approximately 23 min.)