Spark: The Definitive Guide: Big Data Processing Made Simple
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections.
| | |
|---|---|
| Other authors: | |
| Format: | eBook |
| Language: | English |
| Published: | Sebastopol, CA : O'Reilly, February 2018 |
| Edition: | First edition |
| Subjects: | |
| View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630073506719 |
Table of Contents:
- Part 1. Gentle overview of big data and Spark. What is Apache Spark?
- A gentle introduction to Spark
- A tour of Spark's toolset
- Part 2. Structured APIs: DataFrames, SQL, and datasets. Structured API overview
- Basic structured operations
- Working with different types of data
- Aggregations
- Joins
- Data sources
- Spark SQL
- Datasets
- Part 3. Low-level APIs. Resilient distributed datasets (RDDs)
- Advanced RDDs
- Distributed shared variables
- Part 4. Production applications. How Spark runs on a cluster
- Developing Spark applications
- Deploying Spark
- Monitoring and debugging
- Performance tuning
- Part 5. Streaming. Stream processing fundamentals
- Structured streaming basics
- Event-time and stateful processing
- Structured streaming in production
- Part 6. Advanced analytics and machine learning. Advanced analytics and machine learning overview
- Preprocessing and feature engineering
- Classification
- Regression
- Recommendation
- Unsupervised learning
- Graph analytics
- Deep learning
- Part 7. Ecosystem. Language specifics: Python (PySpark) and R (SparkR and sparklyr)
- Ecosystem and community.