Architecting data-intensive applications: develop scalable, data-intensive, and robust applications the smart way

Architect and design data-intensive applications and, in the process, learn how to collect, process, store, govern, and expose data for a variety of use cases. Key features: Integrate the data-intensive approach into your application architecture. Create a robust application layout with effective messa...

Bibliographic Details
Other Authors:	Kumar, Anuj, author
Format:	eBook
Language:	English
Published:	Birmingham, England : Packt, 2018.
Edition:	1st edition
Subjects:	Application software > Development. Software architecture.
See on Biblioteca Universitat Ramon Llull:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630737306719

Table of Contents:

Cover
Title Page
Copyright and Credits
Packt Upsell
Contributors
Table of Contents
Preface
Chapter 1: Exploring the Data Ecosystem
What is a data ecosystem?
A complex set of interconnected data
Data environment
What constitutes a data ecosystem?
Data sharing
Traffic light protocol
Information exchange policy
Handling policy statements
Action policy statements
Sharing policy statements
Licensing policy statements
Metadata policy statements
The 3 V's
Volume
Variety
Velocity
Use cases
Use case 1 - Security
Use case 2 - Modem data collection
Summary
Chapter 2: Defining a Reference Architecture for Data-Intensive Systems
What is a reference architecture?
Problem statement
Reference architecture for a data-intensive system
Component view
Data ingest
Data preparation
Data processing
Workflow management
Data access
Data insight
Data governance
Data pipeline
Oracle's information management conceptual reference architecture
Conceptual view
Oracle's information management reference architecture
Data process view
Reference architecture - business view
Real-life use case examples
Machine learning use case
Data enrichment use case
Extract, transform, load use case
Desired properties of a data-intensive system
Defining architectural principles
Principle 1
Principle 2
Principle 3
Principle 4
Principle 5
Principle 6
Principle 7
Listing architectural assumptions
Architectural capabilities
UI capabilities
Content mashup
Multi-channel support
User workflow
AR/VR support
Service gateway/API gateway capabilities
Security
Traffic control
Mediation
Caching
Routing
Service orchestration
Business service capabilities
Microservices
Messaging
Distributed (batch/stream) processing
Data capabilities
Data partitioning
Data replication
Summary
Chapter 3: Patterns of the Data-Intensive Architecture
Application styles
API Platform
Message-oriented application style
Microservices application style
Communication styles
Combining different application styles
Architectural patterns
The retry pattern
The circuit breaker
Throttling
Bulkheads
Event sourcing
Command and Query Responsibility Segregation
Summary
Chapter 4: Discussing Data-Centric Architectures
Coordination service
Reliable messaging
Distributed processing
Distributed storage
Lambda architecture
Kappa architecture
A brief comparison of different leading NoSQL data stores
Summary
Chapter 5: Understanding Data Collection and Normalization Requirements and Techniques
Data lineage
Apache Atlas
Apache Atlas high-level architecture
Apache Falcon
Data quality
Types of data sources
Data collection system requirements
Data collection system architecture principles
High-level component architecture
High-level architecture
Service gateway
Discovery server
Architecture technology mapping
An introduction to etcd
Scheduler
Designing the microservice
Summary
Chapter 6: Creating a Data Pipeline for Consistent Data Collection, Processing, and Dissemination
Query-Data pipelines
Event-Data Pipelines
Topology 1
Topology 2
Topology 3
Resilience
High availability
Availability Chart
Clustering
Clustering and Network Partitions
Mirrored queues
Persistent Messages
Data Manipulation and Security
Use Case 1
Use Case 2
Exchanges
Guidelines on choosing the right Exchange Type
Headers versus Topic Exchanges
Routing
Header-Based Content Routing
Topic-Based Content Routing
Alternate Exchanges
Dead-Letter Exchanges
Summary
Chapter 7: Building a Robust and Fault-Tolerant Data Collection System
Apache Flume
Flume event flow reliability
Flume multi-agent flow
Flow multiplexer
Apache Sqoop
ELK
Beats
Load balancing
Logstash
Back pressure
High availability
Centralized collection of distributed data
Apache NiFi
Summary
Chapter 8: Challenges of Data Processing
Making sense of the data
What is data processing?
The 3 + 1 Vs and how they affect choice in data processing design
Cost associated with latency
Classic way of doing things
Sharing resources among processing applications
How to perform the processing
Where to perform the processing
Quality of data
Networks are everywhere
Effective consumption of the data
Summary
Chapter 9: Let Us Process Data in Batches
What do we mean by batch processing?
Lambda architecture and batch processing
Batch layer components and subcomponents
Read/extract component
Normalizer component
Validation component
Processing component
Writer/formatter component
Basic shell component
Scheduler/executor component
Processing strategy
Data partitioning
Range-based partitioning
Hash-based partitioning
Distributed processing
What are Hadoop and HDFS?
NameNode
DataNode
MapReduce
Data pipeline
Luigi
Azkaban
Oozie
Airflow
Summary
Chapter 10: Handling Streams of Data
What is a streaming system?
Capabilities (and non-capabilities) of a streaming application
Lambda architecture's speed layer
Computing real-time views
High-level reference architecture
Samza architecture
Architectural concepts
Event-streaming layer
Apache Kafka as an event bus
Message persistence
Persistent Queue Design
Message batch
Kafka and the sendfile operation
Compression
Kafka Streams
Stream processing topology
Notion of time in stream processing
Samza's stream processing API
The scheduler/executor component of the streaming architecture
Processing concepts and tradeoffs
Processing guarantees
Micro-batch stream processing
Windowing
Types of windows
Summary
References
Chapter 11: Let Us Store the Data
The data explosion problem
Relational Database Management Systems and Big data
Introducing Hadoop, the Big Elephant
Apache YARN
Hadoop Distributed Filesystem
HDFS architecture principles (and assumptions)
High-level architecture of HDFS
HDFS file formats
HBase
Understanding the basics of HBase
HBase data model
HBase architecture
Horizontal scaling with automatic sharding of HBase tables
HMaster, region assignment, and balancing
Components of Apache HBase architecture
Tips for improved performance from your HBase cluster
Graph stores
Background of the use case
Scenario
Solution discussion
Bank fraud data model (as can be designed in a property graph data store such as Neo4j)
Semantic graph
Linked data
Vocabularies
Semantic Query Language
Inference
Stardog
GraphQL queries
Gremlin
Virtual Graphs - a Unifying DAO
Structured data
CSV
BITES - Unstructured/Semistructured document store
Structured data extraction
Text extraction
Document queries
Highly-available clusters
Guarantees
Scaling up
Integration with SPARQL
Data Formats
Data integrity and validating constraints
Strict parsing of RDF
Integrity Constraint Validation
Monitoring and operation
Performance
Summary
Further reading
Chapter 12: When Data Dissemination is as Important as Data Itself
Data dissemination
Communication protocol
Target audience
Use case
Response schema
Communication channel
Data dissemination architecture in a threat intel sharing system
Threat intel share - backend
RT query processor
View builder
Threat intel share - frontend
AWS Lambda
AWS API gateway
Cache population
Cache eviction
Discussing the non-functional aspects of the preceding architecture
Non-functional use cases for dissemination architecture
Elasticsearch and free-text search queries
Summary
Other Books You May Enjoy
Index

Similar Items