Hadoop application architectures

Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete...


Bibliographic Details
Other Authors: Grover, Mark, author; Malaska, Ted, author; Seidman, Jonathan, author; Shapira, Gwen, author
Format: E-book
Language: English
Published: Sebastopol, California : O'Reilly Media, Inc., 2015.
Edition: First edition
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009628181006719
Table of Contents:
  • Copyright; Table of Contents; Foreword; Preface; A Note About the Code Examples; Who Should Read This Book; Why We Wrote This Book; Navigating This Book; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments; Part I. Architectural Considerations for Hadoop Applications; Chapter 1. Data Modeling in Hadoop; Data Storage Options; Standard File Formats; Hadoop File Types; Serialization Formats; Columnar Formats; Compression; HDFS Schema Design
  • Location of HDFS Files; Advanced HDFS Schema Design; HDFS Schema Design Summary; HBase Schema Design; Row Key; Timestamp; Hops; Tables and Regions; Using Columns; Using Column Families; Time-to-Live; Managing Metadata; What Is Metadata?; Why Care About Metadata?; Where to Store Metadata?; Examples of Managing Metadata; Limitations of the Hive Metastore and HCatalog; Other Ways of Storing Metadata; Conclusion; Chapter 2. Data Movement; Data Ingestion Considerations; Timeliness of Data Ingestion
  • Incremental Updates; Access Patterns; Original Source System and Data Structure; Transformations; Network Bottlenecks; Network Security; Push or Pull; Failure Handling; Level of Complexity; Data Ingestion Options; File Transfers; Considerations for File Transfers versus Other Ingest Methods; Sqoop: Batch Transfer Between Hadoop and Relational Databases; Flume: Event-Based Data Collection and Processing; Kafka; Data Extraction; Conclusion; Chapter 3. Processing Data in Hadoop; MapReduce; MapReduce Overview
  • Example for MapReduce; When to Use MapReduce; Spark; Spark Overview; Overview of Spark Components; Basic Spark Concepts; Benefits of Using Spark; Spark Example; When to Use Spark; Abstractions; Pig; Pig Example; When to Use Pig; Crunch; Crunch Example; When to Use Crunch; Cascading; Cascading Example; When to Use Cascading; Hive; Hive Overview; Example of Hive Code; When to Use Hive; Impala; Impala Overview; Speed-Oriented Design; Impala Example; When to Use Impala; Conclusion
  • Chapter 4. Common Hadoop Processing Patterns; Pattern: Removing Duplicate Records by Primary Key; Data Generation for Deduplication Example; Code Example: Spark Deduplication in Scala; Code Example: Deduplication in SQL; Pattern: Windowing Analysis; Data Generation for Windowing Analysis Example; Code Example: Peaks and Valleys in Spark; Code Example: Peaks and Valleys in SQL; Pattern: Time Series Modifications; Use HBase and Versioning; Use HBase with a RowKey of RecordKey and StartTime; Use HDFS and Rewrite the Whole Table
  • Use Partitions on HDFS for Current and Historical Records