Scalable Data Architecture with Java: Build Efficient Enterprise-Grade Data Architecting Solutions Using Java
Orchestrate data architecting solutions using Java and related technologies to evaluate, recommend and present the most suitable solution to leadership and clients. Key Features: Learn how to adapt to the ever-evolving data architecture technology landscape. Understand how to choose the best suited techn...
Main author: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham: Packt Publishing, Limited, 2022. |
Subjects: | |
View in the Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009686282506719 |
Table of Contents:
- Cover
- Title Page
- Copyright and Credits
- Contributors
- About the reviewers
- Table of Contents
- Preface
- Section 1: Foundation of Data Systems
- Chapter 1: Basics of Modern Data Architecture
- Exploring the landscape of data engineering
- What is data engineering?
- Dimensions of data
- Types of data engineering problems
- Responsibilities and challenges of a Java data architect
- Data architect versus data engineer
- Challenges of a data architect
- Techniques to mitigate those challenges
- Summary
- Chapter 2: Data Storage and Databases
- Understanding data types, formats, and encodings
- Data types
- Data formats
- Understanding file, block, and object storage
- File storage
- Block storage
- Object storage
- The data lake, data warehouse, and data mart
- Data lake
- Data warehouse
- Data marts
- Databases and their types
- Relational database
- NoSQL database
- Data model design considerations
- Summary
- Chapter 3: Identifying the Right Data Platform
- Technical requirements
- Virtualization and containerization platforms
- Benefits of virtualization
- Containerization
- Benefits of containerization
- Kubernetes
- Hadoop platforms
- Hadoop architecture
- Cloud platforms
- Benefits of cloud computing
- Choosing the correct platform
- When to choose virtualization versus containerization
- When to use big data
- Choosing between on-premise versus cloud-based solutions
- Choosing between various cloud vendors
- Summary
- Section 2: Building Data Processing Pipelines
- Chapter 4: ETL Data Load - A Batch-Based Solution to Ingesting Data in a Data Warehouse
- Technical requirements
- Understanding the problem and source data
- Problem statement
- Understanding the source data
- Building an effective data model
- Relational data warehouse schemas
- Evaluation of the schema design
- Designing the solution
- Implementing and unit testing the solution
- Summary
- Chapter 5: Architecting a Batch Processing Pipeline
- Technical requirements
- Developing the architecture and choosing the right tools
- Problem statement
- Analyzing the problem
- Architecting the solution
- Factors that affect your choice of storage
- Determining storage based on cost
- The cost factor in the processing layer
- Implementing the solution
- Profiling the source data
- Writing the Spark application
- Deploying and running the Spark application
- Developing and testing a Lambda trigger
- Performance tuning a Spark job
- Querying the ODL using AWS Athena
- Summary
- Chapter 6: Architecting a Real-Time Processing Pipeline
- Technical requirements
- Understanding and analyzing the streaming problem
- Problem statement
- Analyzing the problem
- Architecting the solution
- Implementing and verifying the design
- Setting up Apache Kafka on your local machine
- Developing the Kafka streaming application
- Unit testing a Kafka Streams application