Scalable Data Architecture with Java: Build Efficient Enterprise-Grade Data Architecting Solutions Using Java

Orchestrate data architecting solutions using Java and related technologies to evaluate, recommend and present the most suitable solution to leadership and clients.

Key Features:
  • Learn how to adapt to the ever-evolving data architecture technology landscape
  • Understand how to choose the best suited techn...

Bibliographic Details
Main author: Banerjee, Sinchan
Format: E-book
Language: English
Published: Birmingham: Packt Publishing, Limited, 2022.
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009686282506719
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credits
  • Contributors
  • About the reviewers
  • Table of Contents
  • Preface
  • Section 1: Foundation of Data Systems
  • Chapter 1: Basics of Modern Data Architecture
  • Exploring the landscape of data engineering
  • What is data engineering?
  • Dimensions of data
  • Types of data engineering problems
  • Responsibilities and challenges of a Java data architect
  • Data architect versus data engineer
  • Challenges of a data architect
  • Techniques to mitigate those challenges
  • Summary
  • Chapter 2: Data Storage and Databases
  • Understanding data types, formats, and encodings
  • Data types
  • Data formats
  • Understanding file, block, and object storage
  • File storage
  • Block storage
  • Object storage
  • The data lake, data warehouse, and data mart
  • Data lake
  • Data warehouse
  • Data marts
  • Databases and their types
  • Relational database
  • NoSQL database
  • Data model design considerations
  • Summary
  • Chapter 3: Identifying the Right Data Platform
  • Technical requirements
  • Virtualization and containerization platforms
  • Benefits of virtualization
  • Containerization
  • Benefits of containerization
  • Kubernetes
  • Hadoop platforms
  • Hadoop architecture
  • Cloud platforms
  • Benefits of cloud computing
  • Choosing the correct platform
  • When to choose virtualization versus containerization
  • When to use big data
  • Choosing between on-premise versus cloud-based solutions
  • Choosing between various cloud vendors
  • Summary
  • Section 2: Building Data Processing Pipelines
  • Chapter 4: ETL Data Load - A Batch-Based Solution to Ingesting Data in a Data Warehouse
  • Technical requirements
  • Understanding the problem and source data
  • Problem statement
  • Understanding the source data
  • Building an effective data model
  • Relational data warehouse schemas
  • Evaluation of the schema design
  • Designing the solution
  • Implementing and unit testing the solution
  • Summary
  • Chapter 5: Architecting a Batch Processing Pipeline
  • Technical requirements
  • Developing the architecture and choosing the right tools
  • Problem statement
  • Analyzing the problem
  • Architecting the solution
  • Factors that affect your choice of storage
  • Determining storage based on cost
  • The cost factor in the processing layer
  • Implementing the solution
  • Profiling the source data
  • Writing the Spark application
  • Deploying and running the Spark application
  • Developing and testing a Lambda trigger
  • Performance tuning a Spark job
  • Querying the ODL using AWS Athena
  • Summary
  • Chapter 6: Architecting a Real-Time Processing Pipeline
  • Technical requirements
  • Understanding and analyzing the streaming problem
  • Problem statement
  • Analyzing the problem
  • Architecting the solution
  • Implementing and verifying the design
  • Setting up Apache Kafka on your local machine
  • Developing the Kafka streaming application
  • Unit testing a Kafka Streams application