Data algorithms
If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massi...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Beijing, China : O'Reilly, 2015. |
Edition: | 1st edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629802006719 |
Table of Contents:
- "Copyright"; "Table of Contents"; "Foreword"; "Preface"; "What Is MapReduce?"; "Simple Explanation of MapReduce"; "When to Use MapReduce"; "What MapReduce Isn't"; "Why Use MapReduce?"; "Hadoop and Spark"; "What Is in This Book?"; "What Is the Focus of This Book?"; "Who Is This Book For?"; "Online Resources"; "What Software Is Used in This Book?"; "Conventions Used in This Book"; "Using Code Examples"; "Safari® Books Online"; "How to Contact Us"; "Acknowledgments"; "Comments and Questions for This Book"; "Chapter 1. Secondary Sort: Introduction"
- "Solutions to the Secondary Sort Problem"; "Implementation Details"; "Data Flow Using Plug-in Classes"; "MapReduce/Hadoop Solution to Secondary Sort"; "Input"; "Expected Output"; "map() Function"; "reduce() Function"; "Hadoop Implementation Classes"; "Sample Run of Hadoop Implementation"; "How to Sort in Ascending or Descending Order"; "Spark Solution to Secondary Sort"; "Time Series as Input"; "Expected Output"; "Option 1: Secondary Sorting in Memory"; "Spark Sample Run"; "Option #2: Secondary Sorting Using the Spark Framework"
- "Further Reading on Secondary Sorting"; "Chapter 2. Secondary Sort: A Detailed Example"; "Secondary Sorting Technique"; "Complete Example of Secondary Sorting"; "Input Format"; "Output Format"; "Composite Key"; "Sample Run-Old Hadoop API"; "Input"; "Running the MapReduce Job"; "Output"; "Sample Run-New Hadoop API"; "Input"; "Running the MapReduce Job"; "Output"; "Chapter 3. Top 10 List"; "Top N, Formalized"; "MapReduce/Hadoop Implementation: Unique Keys"; "Implementation Classes in MapReduce/Hadoop"; "Top 10 Sample Run"; "Finding the Top 5"
- "Finding the Bottom 10"; "Spark Implementation: Unique Keys"; "RDD Refresher"; "Spark's Function Classes"; "Review of the Top N Pattern for Spark"; "Complete Spark Top 10 Solution"; "Sample Run: Finding the Top 10"; "Parameterizing Top N"; "Finding the Bottom N"; "Spark Implementation: Nonunique Keys"; "Complete Spark Top 10 Solution"; "Sample Run"; "Spark Top 10 Solution Using takeOrdered()"; "Complete Spark Implementation"; "Finding the Bottom N"; "Alternative to Using takeOrdered()"; "MapReduce/Hadoop Top 10 Solution: Nonunique Keys"; "Sample Run"
- "Chapter 4. Left Outer Join"; "Left Outer Join Example"; "Example Queries"; "Implementation of Left Outer Join in MapReduce"; "MapReduce Phase 1: Finding Product Locations"; "MapReduce Phase 2: Counting Unique Locations"; "Implementation Classes in Hadoop"; "Sample Run"; "Spark Implementation of Left Outer Join"; "Spark Program"; "Running the Spark Solution"; "Running Spark on YARN"; "Spark Implementation with leftOuterJoin()"; "Spark Program"; "Sample Run on YARN"; "Chapter 5. Order Inversion"; "Example of the Order Inversion Pattern"
- "MapReduce/Hadoop Implementation of the Order Inversion Pattern"