Building a Scalable Data Warehouse with Data Vault 2.0
The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of all sizes, from small businesses to large corporations. Due to its simplified design, which is adapted from nature, the Data...
Other Authors: | Michael Olschimke |
---|---|
Format: | eBook |
Language: | English |
Published: | Amsterdam, [Netherlands] : Morgan Kaufmann, 2016 |
Edition: | 1st edition |
See on Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629896106719 |
Table of Contents:
- Cover
- Title Page
- Copyright Page
- Contents
- Authors Biography
  - Daniel Linstedt
  - Michael Olschimke
- Foreword
- Preface
- Acknowledgments
  - Daniel Linstedt
  - Michael Olschimke
- Chapter 1 - Introduction to Data Warehousing
- 1.1 - History of Data Warehousing
- 1.1.1 - Decision Support Systems
- 1.1.2 - Data Warehouse Systems
- 1.2 - The Enterprise Data Warehouse Environment
- 1.2.1 - Access
- 1.2.2 - Multiple Subject Areas
- 1.2.3 - Single Version of Truth
- 1.2.4 - Single Version of Facts
- 1.2.5 - Mission Criticality
- 1.2.6 - Scalability
- 1.2.7 - Big Data
- 1.2.8 - Performance Issues
- 1.2.9 - Complexity
- 1.2.10 - Auditing and Compliance
- 1.2.11 - Costs
- 1.2.12 - Other Business Requirements
- 1.3 - Introduction to Data Vault 2.0
- 1.4 - Data Warehouse Architecture
- 1.4.1 - Typical Two-Layer Architecture
- 1.4.2 - Typical Three-Layer Architecture
- References
- Chapter 2 - Scalable Data Warehouse Architecture
- 2.1 - Dimensions of Scalable Data Warehouse Architectures
- 2.1.1 - Workload
- 2.1.2 - Data Complexity
- 2.1.3 - Analytical Complexity
- 2.1.4 - Query Complexity
- 2.1.5 - Availability
- 2.1.6 - Security
- 2.2 - Data Vault 2.0 Architecture
- 2.2.1 - Business Rules Definition
- 2.2.2 - Business Rules Application
- 2.2.3 - Staging Area Layer
- 2.2.4 - Data Warehouse Layer
- 2.2.5 - Information Mart Layer
- 2.2.6 - Metrics Vault
- 2.2.7 - Business Vault
- 2.2.8 - Operational Vault
- 2.2.9 - Managed Self-Service BI
- 2.2.10 - Other Features
- References
- Chapter 3 - The Data Vault 2.0 Methodology
- 3.1 - Project Planning
- 3.1.1 - Capability Maturity Model Integration
- 3.1.1.1 - Capability Levels
- 3.1.1.2 - Maturity Levels
- 3.1.1.3 - Advancing to Maturity Level 5
- 3.1.1.4 - Integrating CMMI in the Data Vault 2.0 Methodology
- 3.1.2 - Managing the Project
- 3.1.2.1 - Scrum
- 3.1.2.2 - Iterative Approach
- 3.1.2.3 - Product and Sprint Backlog
- 3.1.2.4 - Integrating Scrum with the Data Vault 2.0 Methodology
- 3.1.3 - Defining the Project
- 3.1.3.1 - Agile Requirements Gathering
- 3.1.4 - Estimation of the Project
- 3.1.4.1 - Function Point Analysis
- 3.1.4.2 - Measuring with Function Points
- 3.1.4.3 - Function Point Analysis for Data Warehousing
- 3.1.4.4 - Boundaries in Data Warehousing
- 3.1.4.5 - Estimating Size
- 3.1.4.6 - Assessing ETL Complexity Factors
- 3.1.4.7 - Applying Function Point Analysis to Data Warehousing
- 3.1.4.8 - Function Points for Enterprise Data Warehouse
- 3.2 - Project Execution
- 3.2.1 - Traditional Software Development Life-Cycle
- 3.2.1.1 - Requirements Engineering
- 3.2.1.2 - Design
- 3.2.1.3 - Implementation and Unit Testing
- 3.2.1.4 - Integration and System Testing
- 3.2.1.5 - Operation and Maintenance
- 3.2.2 - Applying Software Development Life-Cycle to the Data Vault 2.0 Methodology
- 3.2.3 - Parallel Teams
- 3.2.4 - Technical Numbering
- 3.3 - Review and Improvement
- 3.3.1 - Six Sigma
- 3.3.1.1 - Applying Six Sigma to Software
- 3.3.1.2 - Six Sigma Framework
- 3.3.1.3 - DMAIC Improvement
- 3.3.1.4 - Applying Six Sigma to Data Warehousing
- 3.3.2 - Total Quality Management
- 3.3.2.1 - Data Quality Dimensions
- 3.3.2.2 - Total Data Quality Management
- 3.3.2.3 - Data Warehouse Quality
- 3.3.2.4 - Integrating TQM with the Data Vault 2.0 Methodology
- References
- Chapter 4 - Data Vault 2.0 Modeling
- 4.1 - Introduction to Data Vault Modeling
- 4.2 - Data Vault Modeling Vocabulary
- 4.2.1 - Hub Entities
- 4.2.2 - Link Entities
- 4.2.3 - Satellite Entities
- 4.3 - Hub Definition
- 4.3.1 - Definition of a Business Key
- 4.3.1.1 - Composite Keys (aka Smart Keys, Intelligent Keys)
- 4.3.1.2 - Business Key Identification Process
- 4.3.1.3 - Scope of Business Keys
- 4.3.1.4 - Difference between Business Keys and Surrogate Keys
- 4.3.2 - Hub Entity Structure
- 4.3.2.1 - Hash Key
- 4.3.2.2 - Business Key
- 4.3.2.3 - Load Date
- 4.3.2.4 - Record Source
- 4.3.2.5 - Last Seen Date
- 4.3.3 - Hub Examples
- 4.4 - Link Definition
- 4.4.1 - Reasons for Many-to-Many Relationships
- 4.4.2 - Flexibility of Links
- 4.4.3 - Granularity of Links
- 4.4.4 - Link Unit-of-Work
- 4.4.5 - Link Entity Structure
- 4.4.5.1 - Hash Key
- 4.4.5.2 - Dependent Child Key
- 4.4.6 - Link Examples
- 4.5 - Satellite Definition
- 4.5.1 - Importance of Keeping History
- 4.5.2 - Splitting Satellites
- 4.5.2.1 - Splitting by Source System
- 4.5.2.2 - Splitting by Rate of Change
- 4.5.3 - Satellite Entity Structure
- 4.5.3.1 - Parent Hash Key
- 4.5.3.2 - Load Date
- 4.5.3.3 - Load End Date
- 4.5.3.4 - Hash Difference
- 4.5.3.5 - Extract Date
- 4.5.4 - Satellite Examples
- 4.5.5 - Link Driving Key
- References
- Chapter 5 - Intermediate Data Vault Modeling
- 5.1 - Hub Applications
- 5.1.1 - Business Key Consolidation
- 5.2 - Link Applications
- 5.2.1 - Link-on-link
- 5.2.2 - Same-as Links
- 5.2.3 - Hierarchical Links
- 5.2.4 - Nonhistorized Links
- 5.2.5 - Nondescriptive Links
- 5.2.6 - Computed Aggregate Links
- 5.2.7 - Exploration Links
- 5.3 - Satellite Applications
- 5.3.1 - Overloaded Satellites
- 5.3.2 - Multi-Active Satellites
- 5.3.3 - Status Tracking Satellites
- 5.3.4 - Effectivity Satellites
- 5.3.5 - Record Tracking Satellites
- 5.3.6 - Computed Satellites
- References
- Chapter 6 - Advanced Data Vault Modeling
- 6.1 - Point-in-Time Tables
- 6.1.1 - Point-in-Time Table Structure
- 6.1.2 - Managed PIT Window
- 6.2 - Bridge Tables
- 6.2.1 - Bridge Table Structure
- 6.2.2 - Comparing PIT Tables with Bridge Tables
- 6.3 - Reference Tables
- 6.3.1 - No-History Reference Tables
- 6.3.2 - History-Based Reference Tables
- 6.3.3 - Code and Descriptions
- 6.3.3.1 - Code and Descriptions with History
- Reference
- Chapter 7 - Dimensional Modeling
- 7.1 - Introduction
- 7.2 - Star Schemas
- 7.2.1 - Fact Tables
- 7.2.1.1 - The Grain of a Fact Table
- 7.2.2 - Dimension Tables
- 7.2.3 - Querying Star Schemas
- 7.3 - Multiple Stars
- 7.3.1 - Conformed Dimensions
- 7.4 - Dimension Design
- 7.4.1 - Slowly Changing Dimensions
- 7.4.2 - Hierarchies
- 7.4.3 - Snowflake Design
- References
- Chapter 8 - Physical Data Warehouse Design
- 8.1 - Database Workloads
- 8.1.1 - Workload Characteristics
- 8.2 - Separate Environments for Development, Testing, and Production
- 8.2.1 - Blue-Green Deployment
- 8.3 - Microsoft Azure Cloud Computing Platform
- 8.4 - Physical Data Warehouse Architecture on Premise
- 8.4.1 - Hardware Architectures and Databases
- 8.4.2 - Processor Options
- 8.4.3 - Memory Options
- 8.4.4 - Storage Options
- 8.4.5 - Network Options
- 8.5 - Database Options
- 8.5.1 - tempdb Options
- 8.5.2 - Partitioning
- 8.5.3 - Filegroups
- 8.5.4 - Data Compression
- 8.6 - Setting up the Data Warehouse
- 8.6.1 - Setting up the Stage Area
- 8.6.1.1 - Hardware Considerations for Stage Area
- 8.6.1.2 - Stage Database Setup
- 8.6.2 - Setting up the Data Vault
- 8.6.2.1 - Hardware Considerations for Data Vault Layer
- 8.6.2.2 - Backing Up the Data Vault
- 8.6.2.3 - Data Vault Database Setup
- 8.6.3 - Setting up Information Marts
- 8.6.3.1 - Hardware Considerations for Information Marts
- 8.6.3.2 - Information Mart Database Setup
- 8.6.4 - Setting up the Meta, Metrics, and Error Marts
- 8.6.4.1 - Hardware Considerations for Meta, Metrics, and Error Marts
- 8.6.4.2 - Meta, Metrics, and Error Marts Database Setup
- References
- Chapter 9 - Master Data Management
- 9.1 - Definitions
- 9.1.1 - Master Data
- 9.1.2 - Data Management
- 9.1.3 - Master Data Management
- 9.2 - Master Data Management Goals
- 9.3 - Drivers for Managing Master Data
- 9.4 - Operational vs. Analytical Master Data Management
- 9.5 - Master Data Management as an Enabler for Managed Self-Service BI
- 9.6 - Master Data Management as an Enabler for Total Quality Management
- 9.6.1 - MDS Object Model
- 9.6.1.1 - Models
- 9.6.1.2 - Entities
- 9.6.1.3 - Members
- 9.6.1.4 - Attributes
- 9.6.1.5 - Attribute Groups
- 9.6.1.6 - Hierarchies
- 9.6.2 - Master Data Manager
- 9.6.3 - Explorer
- 9.6.3.1 - Microsoft Excel and MDS
- 9.6.4 - Version Management
- 9.6.5 - Integration Management
- 9.6.6 - System Administration
- 9.6.7 - User and Group Permissions
- 9.7 - Creating a Model
- 9.7.1 - Creating Entities
- 9.7.2 - Creating Business Rules
- 9.8 - Importing a Model
- 9.9 - Integrating MDS with the Data Vault and Operational Systems
- 9.9.1 - Stage Tables
- 9.9.1.1 - Using T-SQL to Stage Master Data into Microsoft Master Data Services
- 9.9.1.2 - Using SQL Server Integration Services to Stage Master Data into Microsoft Master Data Services
- 9.9.2 - Subscription Views
- References
- Chapter 10 - Metadata Management
- 10.1 - What Is Metadata?
- 10.1.1 - Business Metadata
- 10.1.2 - Technical Metadata
- 10.1.3 - Process Execution Metadata
- 10.2 - Implementing the Meta Mart
- 10.2.1 - SQL Server BI Metadata Toolkit
- 10.2.2 - Naming Conventions
- 10.2.3 - Capturing Source System Definitions
- 10.2.4 - Capturing Hard Rules
- 10.2.5 - Capturing Metadata for the Staging Area
- 10.2.6 - Capturing Requirements to Source Tables
- 10.2.7 - Capturing Source Tables to Data Vault Tables
- 10.2.7.1 - Metadata for Loading Hub Entities
- 10.2.7.2 - Metadata for Loading Link Entities
- 10.2.7.3 - Metadata for Loading Satellite Entities on Hubs