Architecting data-intensive applications: develop scalable, data-intensive, and robust applications the smart way
Architect and design data-intensive applications and, in the process, learn how to collect, process, store, govern, and expose data for a variety of use cases. Key Features: Integrate the data-intensive approach into your application architecture. Create a robust application layout with effective messa...
Other Authors: | |
---|---|
Format: | Electronic book |
Language: | English |
Published: | Birmingham, England: Packt, 2018. |
Edition: | 1st edition |
Subjects: | |
View in Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630737306719 |
Table of Contents:
- Cover
- Title Page
- Copyright and Credits
- Packt Upsell
- Contributors
- Table of Contents
- Preface
- Chapter 1: Exploring the Data Ecosystem
- What is a data ecosystem?
- A complex set of interconnected data
- Data environment
- What constitutes a data ecosystem?
- Data sharing
- Traffic light protocol
- Information exchange policy
- Handling policy statements
- Action policy statements
- Sharing policy statements
- Licensing policy statements
- Metadata policy statements
- The 3 V's
- Volume
- Variety
- Velocity
- Use cases
- Use case 1 - Security
- Use case 2 - Modem data collection
- Summary
- Chapter 2: Defining a Reference Architecture for Data-Intensive Systems
- What is a reference architecture?
- Problem statement
- Reference architecture for a data-intensive system
- Component view
- Data ingest
- Data preparation
- Data processing
- Workflow management
- Data access
- Data insight
- Data governance
- Data pipeline
- Oracle's information management conceptual reference architecture
- Conceptual view
- Oracle's information management reference architecture
- Data process view
- Reference architecture - business view
- Real-life use case examples
- Machine learning use case
- Data enrichment use case
- Extract transform load use case
- Desired properties of a data-intensive system
- Defining architectural principles
- Principle 1
- Principle 2
- Principle 3
- Principle 4
- Principle 5
- Principle 6
- Principle 7
- Listing architectural assumptions
- Architectural capabilities
- UI capabilities
- Content mashup
- Multi-channel support
- User workflow
- AR/VR support
- Service gateway/API gateway capabilities
- Security
- Traffic control
- Mediation
- Caching
- Routing
- Service orchestration
- Business service capabilities
- Microservices
- Messaging
- Distributed (batch/stream) processing
- Data capabilities
- Data partitioning
- Data replication
- Summary
- Chapter 3: Patterns of the Data Intensive Architecture
- Application styles
- API Platform
- Message-oriented application style
- Micro Services application styles
- Communication styles
- Combining different application styles
- Architectural patterns
- The retry pattern
- The circuit breaker
- Throttling
- Bulkheads
- Event-sourcing
- Command and Query Responsibility Segregation
- Summary
- Chapter 4: Discussing Data-Centric Architectures
- Coordination service
- Reliable messaging
- Distributed processing
- Distributed storage
- Lambda architecture
- Kappa architecture
- A brief comparison of different leading No-Sql data stores
- Summary
- Chapter 5: Understanding Data Collection and Normalization Requirements and Techniques
- Data lineage
- Apache Atlas
- Apache Atlas high-level architecture
- Apache Falcon
- Data quality
- Types of data sources
- Data collection system requirements
- Data collection system architecture principles
- High-level component architecture
- High-level architecture
- Service gateway
- Discovery server
- Architecture technology mapping
- An introduction to ETCD
- Scheduler
- Designing the Micro Service
- Summary
- Chapter 6: Creating a Data Pipeline for Consistent Data Collection, Processing, and Dissemination
- Query-Data pipelines
- Event-Data Pipelines
- Topology 1
- Topology 2
- Topology 3
- Resilience
- High-availability
- Availability Chart
- Clustering
- Clustering and Network Partitions
- Mirrored queues
- Persistent Messages
- Data Manipulation and Security
- Use Case 1
- Use Case 2
- Exchanges
- Guidelines on choosing the right Exchange Type
- Headers versus Topic Exchanges
- Routing
- Header-Based Content Routing
- Topic-Based Content Routing
- Alternate Exchanges
- Dead-Letter Exchanges
- Summary
- Chapter 7: Building a Robust and Fault-Tolerant Data Collection System
- Apache Flume
- Flume event flow reliability
- Flume multi-agent flow
- Flow multiplexer
- Apache Sqoop
- ELK
- Beats
- Load-balancing
- Logstash
- Back pressure
- High-availability
- Centralized collection of distributed data
- Apache Nifi
- Summary
- Chapter 8: Challenges of Data Processing
- Making sense of the data
- What is data processing?
- The 3 + 1 Vs and how they affect choice in data processing design
- Cost associated with latency
- Classic way of doing things
- Sharing resources among processing applications
- How to perform the processing
- Where to perform the processing
- Quality of data
- Networks are everywhere
- Effective consumption of the data
- Summary
- Chapter 9: Let Us Process Data in Batches
- What do we mean by batch processing
- Lambda architecture and batch processing
- Batch layer components and subcomponents
- Read/extract component
- Normalizer component
- Validation component
- Processing component
- Writer/formatter component
- Basic shell component
- Scheduler/executor component
- Processing strategy
- Data partitioning
- Range-based partitioning
- Hash-based partitioning
- Distributed processing
- What are Hadoop and HDFS
- NameNode
- DataNode
- MapReduce
- Data pipeline
- Luigi
- Azkaban
- Oozie
- AirFlow
- Summary
- Chapter 10: Handling Streams of Data
- What is a streaming system?
- Capabilities (and non-capabilities) of a streaming application
- Lambda architecture's speed layer
- Computing real time views
- High-level reference architecture
- Samza architecture
- Architectural concepts
- Event-streaming layer
- Apache Kafka as an event bus
- Message persistence
- Persistent Queue Design
- Message batch
- Kafka and the sendfile operation
- Compression
- Kafka streams
- Stream processing topology
- Notion of time in stream processing
- Samza's stream processing API
- The scheduler/executor component of the streaming architecture
- Processing concepts and tradeoffs
- Processing guarantees
- Micro-batch stream processing
- Windowing
- Types of windows
- Summary
- References
- Chapter 11: Let Us Store the Data
- The data explosion problem
- Relational Database Management Systems and Big data
- Introducing Hadoop, the Big Elephant
- Apache YARN
- Hadoop Distributed Filesystem
- HDFS architecture principles (and assumptions)
- High-level architecture of HDFS
- HDFS file formats
- HBase
- Understanding the basics of HBase
- HBase data model
- HBase architecture
- Horizontal scaling with automatic sharding of HBase tables
- HMaster, region assignment, and balancing
- Components of Apache HBase architecture
- Tips for improved performance from your HBase cluster
- Graph stores
- Background of the use case
- Scenario
- Solution discussion
- Bank fraud data model (as can be designed in a property graph data store such as Neo4J)
- Semantic graph
- Linked data
- Vocabularies
- Semantic Query Language
- Inference
- Stardog
- GraphQL queries
- Gremlin
- Virtual Graphs - a Unifying DAO
- Structured data
- CSV
- BITES - Unstructured/Semistructured document store
- Structured data extraction
- Text extraction
- Document queries
- Highly-available clusters
- Guarantees
- Scaling up
- Integration with SPARQL
- Data Formats
- Data integrity and validating constraints
- Strict parsing of RDF
- Integrity Constraint Validation
- Monitoring and operation
- Performance
- Summary
- Further reading
- Chapter 12: When Data Dissemination is as Important as Data Itself
- Data dissemination
- Communication protocol
- Target audience
- Use case
- Response schema
- Communication channel
- Data dissemination architecture in a threat intel sharing system
- Threat intel share - backend
- RT query processor
- View builder
- Threat intel share - frontend
- AWS Lambda
- AWS API gateway
- Cache population
- Cache eviction
- Discussing the non-functional aspects of the preceding architecture
- Non-functional use cases for dissemination architecture
- Elasticsearch and free text search queries
- Summary
- Other Books You May Enjoy
- Index