Architecting data-intensive applications: develop scalable, data-intensive, and robust applications the smart way

Architect and design data-intensive applications and, in the process, learn how to collect, process, store, govern, and expose data for a variety of use cases. Key Features: Integrate the data-intensive approach into your application architecture. Create a robust application layout with effective messa...

Bibliographic Details
Other Authors: Kumar, Anuj (author)
Format: E-book
Language: English
Published: Birmingham, England : Packt 2018.
Edition: 1st edition
Subjects:
View in the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630737306719
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credits
  • Packt Upsell
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Exploring the Data Ecosystem
  • What is a data ecosystem?
  • A complex set of interconnected data
  • Data environment
  • What constitutes a data ecosystem?
  • Data sharing
  • Traffic light protocol
  • Information exchange policy
  • Handling policy statements
  • Action policy statements
  • Sharing policy statements
  • Licensing policy statements
  • Metadata policy statements
  • The 3 V's
  • Volume
  • Variety
  • Velocity
  • Use cases
  • Use case 1 - Security
  • Use case 2 - Modem data collection
  • Summary
  • Chapter 2: Defining a Reference Architecture for Data-Intensive Systems
  • What is a reference architecture?
  • Problem statement
  • Reference architecture for a data-intensive system
  • Component view
  • Data ingest
  • Data preparation
  • Data processing
  • Workflow management
  • Data access
  • Data insight
  • Data governance
  • Data pipeline
  • Oracle's information management conceptual reference architecture
  • Conceptual view
  • Oracle's information management reference architecture
  • Data process view
  • Reference architecture - business view
  • Real-life use case examples
  • Machine learning use case
  • Data enrichment use case
  • Extract transform load use case
  • Desired properties of a data-intensive system
  • Defining architectural principles
  • Principle 1
  • Principle 2
  • Principle 3
  • Principle 4
  • Principle 5
  • Principle 6
  • Principle 7
  • Listing architectural assumptions
  • Architectural capabilities
  • UI capabilities
  • Content mashup
  • Multi-channel support
  • User workflow
  • AR/VR support
  • Service gateway/API gateway capabilities
  • Security
  • Traffic control
  • Mediation
  • Caching
  • Routing
  • Service orchestration
  • Business service capabilities
  • Microservices
  • Messaging
  • Distributed (batch/stream) processing
  • Data capabilities
  • Data partitioning
  • Data replication
  • Summary
  • Chapter 3: Patterns of the Data Intensive Architecture
  • Application styles
  • API Platform
  • Message-oriented application style
  • Micro Services application styles
  • Communication styles
  • Combining different application styles
  • Architectural patterns
  • The retry pattern
  • The circuit breaker
  • Throttling
  • Bulkheads
  • Event-sourcing
  • Command and Query Responsibility Segregation
  • Summary
  • Chapter 4: Discussing Data-Centric Architectures
  • Coordination service
  • Reliable messaging
  • Distributed processing
  • Distributed storage
  • Lambda architecture
  • Kappa architecture
  • A brief comparison of different leading NoSQL data stores
  • Summary
  • Chapter 5: Understanding Data Collection and Normalization Requirements and Techniques
  • Data lineage
  • Apache Atlas
  • Apache Atlas high-level architecture
  • Apache Falcon
  • Data quality
  • Types of data sources
  • Data collection system requirements
  • Data collection system architecture principles
  • High-level component architecture
  • High-level architecture
  • Service gateway
  • Discovery server
  • Architecture technology mapping
  • An introduction to ETCD
  • Scheduler
  • Designing the Micro Service
  • Summary
  • Chapter 6: Creating a Data Pipeline for Consistent Data Collection, Processing, and Dissemination
  • Query-Data pipelines
  • Event-Data Pipelines
  • Topology 1
  • Topology 2
  • Topology 3
  • Resilience
  • High-availability
  • Availability Chart
  • Clustering
  • Clustering and Network Partitions
  • Mirrored queues
  • Persistent Messages
  • Data Manipulation and Security
  • Use Case 1
  • Use Case 2
  • Exchanges
  • Guidelines on choosing the right Exchange Type
  • Headers versus Topic Exchanges
  • Routing
  • Header-Based Content Routing
  • Topic-Based Content Routing
  • Alternate Exchanges
  • Dead-Letter Exchanges
  • Summary
  • Chapter 7: Building a Robust and Fault-Tolerant Data Collection System
  • Apache Flume
  • Flume event flow reliability
  • Flume multi-agent flow
  • Flow multiplexer
  • Apache Sqoop
  • ELK
  • Beats
  • Load-balancing
  • Logstash
  • Back pressure
  • High-availability
  • Centralized collection of distributed data
  • Apache NiFi
  • Summary
  • Chapter 8: Challenges of Data Processing
  • Making sense of the data
  • What is data processing?
  • The 3 + 1 Vs and how they affect choice in data processing design
  • Cost associated with latency
  • Classic way of doing things
  • Sharing resources among processing applications
  • How to perform the processing
  • Where to perform the processing
  • Quality of data
  • Networks are everywhere
  • Effective consumption of the data
  • Summary
  • Chapter 9: Let Us Process Data in Batches
  • What do we mean by batch processing
  • Lambda architecture and batch processing
  • Batch layer components and subcomponents
  • Read/extract component
  • Normalizer component
  • Validation component
  • Processing component
  • Writer/formatter component
  • Basic shell component
  • Scheduler/executor component
  • Processing strategy
  • Data partitioning
  • Range-based partitioning
  • Hash-based partitioning
  • Distributed processing
  • What are Hadoop and HDFS
  • NameNode
  • DataNode
  • MapReduce
  • Data pipeline
  • Luigi
  • Azkaban
  • Oozie
  • AirFlow
  • Summary
  • Chapter 10: Handling Streams of Data
  • What is a streaming system?
  • Capabilities (and non-capabilities) of a streaming application
  • Lambda architecture's speed layer
  • Computing real time views
  • High-level reference architecture
  • Samza architecture
  • Architectural concepts
  • Event-streaming layer
  • Apache Kafka as an event bus
  • Message persistence
  • Persistent Queue Design
  • Message batch
  • Kafka and the sendfile operation
  • Compression
  • Kafka streams
  • Stream processing topology
  • Notion of time in stream processing
  • Samza's stream processing API
  • The scheduler/executor component of the streaming architecture
  • Processing concepts and tradeoffs
  • Processing guarantees
  • Micro-batch stream processing
  • Windowing
  • Types of windows
  • Summary
  • References
  • Chapter 11: Let Us Store the Data
  • The data explosion problem
  • Relational Database Management Systems and Big data
  • Introducing Hadoop, the Big Elephant
  • Apache YARN
  • Hadoop Distributed Filesystem
  • HDFS architecture principles (and assumptions)
  • High-level architecture of HDFS
  • HDFS file formats
  • HBase
  • Understanding the basics of HBase
  • HBase data model
  • HBase architecture
  • Horizontal scaling with automatic sharding of HBase tables
  • HMaster, region assignment, and balancing
  • Components of Apache HBase architecture
  • Tips for improved performance from your HBase cluster
  • Graph stores
  • Background of the use case
  • Scenario
  • Solution discussion
  • Bank fraud data model (as can be designed in a property graph data store such as Neo4J)
  • Semantic graph
  • Linked data
  • Vocabularies
  • Semantic Query Language
  • Inference
  • Stardog
  • GraphQL queries
  • Gremlin
  • Virtual Graphs - a Unifying DAO
  • Structured data
  • CSV
  • BITES - Unstructured/Semistructured document store
  • Structured data extraction
  • Text extraction
  • Document queries
  • Highly-available clusters
  • Guarantees
  • Scaling up
  • Integration with SPARQL
  • Data Formats
  • Data integrity and validating constraints
  • Strict parsing of RDF
  • Integrity Constraint Validation
  • Monitoring and operation
  • Performance
  • Summary
  • Further reading
  • Chapter 12: When Data Dissemination is as Important as Data Itself
  • Data dissemination
  • Communication protocol
  • Target audience
  • Use case
  • Response schema
  • Communication channel
  • Data dissemination architecture in a threat intel sharing system
  • Threat intel share - backend
  • RT query processor
  • View builder
  • Threat intel share - frontend
  • AWS Lambda
  • AWS API gateway
  • Cache population
  • Cache eviction
  • Discussing the non-functional aspects of the preceding architecture
  • Non-functional use cases for dissemination architecture
  • Elasticsearch and free text search queries
  • Summary
  • Other Books You May Enjoy
  • Index