Scalable Data Architecture with Java: Build Efficient Enterprise-Grade Data Architecting Solutions Using Java

Orchestrate data architecting solutions using Java and related technologies to evaluate, recommend, and present the most suitable solution to leadership and clients.

Key Features:
Learn how to adapt to the ever-evolving data architecture technology landscape
Understand how to choose the best-suited techn...

Bibliographic Details
Main Author:	Banerjee, Sinchan
Format:	eBook
Language:	English
Published:	Birmingham: Packt Publishing, Limited, 2022.
Subjects:	Software architecture; Java (Computer program language)
View in Biblioteca Universitat Ramon Llull:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009686282506719

Table of Contents:

Cover
Title Page
Copyright and Credits
Contributors
About the reviewers
Table of Contents
Preface
Section 1: Foundation of Data Systems
Chapter 1: Basics of Modern Data Architecture
Exploring the landscape of data engineering
What is data engineering?
Dimensions of data
Types of data engineering problems
Responsibilities and challenges of a Java data architect
Data architect versus data engineer
Challenges of a data architect
Techniques to mitigate those challenges
Summary
Chapter 2: Data Storage and Databases
Understanding data types, formats, and encodings
Data types
Data formats
Understanding file, block, and object storage
File storage
Block storage
Object storage
The data lake, data warehouse, and data mart
Data lake
Data warehouse
Data marts
Databases and their types
Relational database
NoSQL database
Data model design considerations
Summary
Chapter 3: Identifying the Right Data Platform
Technical requirements
Virtualization and containerization platforms
Benefits of virtualization
Containerization
Benefits of containerization
Kubernetes
Hadoop platforms
Hadoop architecture
Cloud platforms
Benefits of cloud computing
Choosing the correct platform
When to choose virtualization versus containerization
When to use big data
Choosing between on-premise versus cloud-based solutions
Choosing between various cloud vendors
Summary
Section 2: Building Data Processing Pipelines
Chapter 4: ETL Data Load - A Batch-Based Solution to Ingesting Data in a Data Warehouse
Technical requirements
Understanding the problem and source data
Problem statement
Understanding the source data
Building an effective data model
Relational data warehouse schemas
Evaluation of the schema design
Designing the solution
Implementing and unit testing the solution
Summary
Chapter 5: Architecting a Batch Processing Pipeline
Technical requirements
Developing the architecture and choosing the right tools
Problem statement
Analyzing the problem
Architecting the solution
Factors that affect your choice of storage
Determining storage based on cost
The cost factor in the processing layer
Implementing the solution
Profiling the source data
Writing the Spark application
Deploying and running the Spark application
Developing and testing a Lambda trigger
Performance tuning a Spark job
Querying the ODL using AWS Athena
Summary
Chapter 6: Architecting a Real-Time Processing Pipeline
Technical requirements
Understanding and analyzing the streaming problem
Problem statement
Analyzing the problem
Architecting the solution
Implementing and verifying the design
Setting up Apache Kafka on your local machine
Developing the Kafka streaming application
Unit testing a Kafka Streams application
