Data architecture: a primer for the data scientist

Over the past 5 years, the concept of big data has matured, data science has grown exponentially, and data architecture has become a standard part of organizational decision-making. Throughout all this change, the basic principles that shape the architecture of data have remained the same. There rem...

Bibliographic Details
Other Authors:	Inmon, William H., author, Linstedt, Daniel, author, Levins, Mary, author
Format:	eBook
Language:	English
Published:	London, England : Academic Press [2019]
Edition:	Second edition
Subjects:	Data warehousing. Big data. Electronic data processing
See on Biblioteca Universitat Ramon Llull:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630748606719

Table of Contents:

Front Cover
Data Architecture: A Primer for the Data Scientist
Copyright
Dedication
Contents
Chapter 1.1: An Introduction to Data Architecture
Subdividing Data
Repetitive/Nonrepetitive Unstructured Data
The Great Divide of Data
Textual/Nontextual Data
The Different Forms of Data
Business Value
Chapter 1.2: The Data Infrastructure
Two Types of Repetitive Data
Repetitive Structured Data
Repetitive Big Data
The Two Infrastructures
What's Being Optimized?
Comparing the Two Infrastructures
Chapter 1.3: The "Great Divide"
Classifying Corporate Data
The "Great Divide"
Repetitive Unstructured Data
Nonrepetitive Unstructured Data
Different Worlds
Chapter 1.4: Demographics of Corporate Data
Chapter 1.5: Corporate Data Analysis
Chapter 1.6: The Life Cycle of Data: Understanding Data Over Time
Chapter 1.7: A Brief History of Data
Paper Tape and Punch Cards
Magnetic Tapes
Disk Storage
Data Base Management System (DBMS)
Coupled Processors
Online Transaction Processing
Data Warehouse
Parallel Data Management
Data Vault
Big Data
The Great Divide
Chapter 2.1: The End-State Architecture-The "World Map"
Architectural Components
Different Kinds of Data in the End State Architecture
Shaping the Data Through Models
Where Is the Data Warehouse?
Where Different Types of Questions Are Answered Across the End State Architecture
Data in the Data Lake
Metadata in the End State Architecture
Networked Metadata
An Evolutionary Experience
The Data Lake Architecture
Chapter 3.1: Transformations in the End-State Architecture
Redundant Data
Transformations
Customizing Data
Transforming Text
Transforming Application Data
Transforming Data Into a Customized State
Transforming Data Into Bulk Storage
Transforming Data Generated Automatically
Transforming Bulk Data
Transformation and Redundancy
Chapter 4.1: A Brief History of Big Data
An Analogy-Taking the High Ground
Taking the High Ground
Standardization With the 360
Online Transaction Processing
Enter Teradata and MPP Processing
Then Came Hadoop and Big Data
IBM and Hadoop
Holding the High Ground
Chapter 4.2: What Is Big Data?
Another Definition
Large Volumes
Inexpensive Storage
The Roman Census Approach
Unstructured Data
Data in Big Data
Context in Repetitive Data
Nonrepetitive Data
Context in Nonrepetitive Data
Chapter 4.3: Parallel Processing
Chapter 4.4: Unstructured Data
Textual Information-Everywhere
Decisions Based on Structured Data
The Business Value Proposition
Repetitive and Nonrepetitive Unstructured Information
Ease of Analysis
Contextualization
Some Approaches to Contextualization
Map Reduce
Manual Analysis
Chapter 4.5: Contextualizing Repetitive Unstructured Data
Parsing Repetitive Unstructured Data
Recasting the Output Data
Chapter 4.6: Textual Disambiguation
From Narrative Into an Analytical Data Base
Input Into Textual Disambiguation
Mapping
Input/Output
Document Fracturing/Named Value Processing
Preprocessing a Document
E-mails-A Special Case
Spreadsheets
Report Decompilation
Chapter 4.7: Taxonomies
Data Models/Taxonomies
Applicability of Taxonomies
What Is a Taxonomy?
Taxonomies in Multiple Languages
Commercial or Private Taxonomies?
Dynamics of Taxonomies and Textual Disambiguation
Taxonomies and Textual Disambiguation-Separate Technologies
Different Types of Taxonomies
Taxonomies-Maintenance Over Time
Chapter 5.1: The Siloed Application Environment
The Challenge of Siloed Applications
Building Siloed Applications
What Does a Siloed Application Look Like?
Current Valued Data
Minimal Historical Data
High Availability
Overlap Between Siloed Applications
Frozen Business Requirements
Dismantling Siloed Applications
Chapter 6.1: Introduction to Data Vault 2.0
Data Vault Origins and Background
The "Old" Data Vault 1.0
The New and Updated Data Vault 2.0
What Is Data Vault 2.0 Modeling?
A Business View
A Technical View
How Is Data Vault 2.0 Methodology Defined?
A Business View
A Technical View
Why Do We Need a Data Vault 2.0 Architecture?
Where Does Data Vault 2.0 Implementation Fit?
What Are the Business Benefits of Data Vault 2.0?
What Is Data Vault 1.0?
Chapter 6.2: Introduction to Data Vault Modeling
What Is a Data Vault Model Concept?
Data Vault Model Defined
Components of a Data Vault Model
What Makes Business Keys So Interesting?
What Does This Have to Do With Data Vault and Data Warehousing?
How Does This Translate to Data Vault Modeling?
Why Restructure the Data From the Staging Area?
What Are the Basic Rules of the Data Vault Model?
Why Do We Need Many to Many Link Structures?
Primary Key Options for Data Vault 2.0
Sequence Numbers
Hash Keys
Business Keys
Source System Sequence Business Keys
Multipart Source Business Keys
Chapter 6.3: Introduction to Data Vault Architecture
What Is a Data Vault 2.0 Architecture?
How Does NoSQL Fit in to the Architecture?
What Are the Objectives of the Data Vault 2.0 Architecture?
What Is the Objective of the Data Vault 2.0 Model?
What Are Hard and Soft Business Rules?
How Does Managed Self Service BI Fit in the Architecture?
Chapter 6.4: Introduction to Data Vault Methodology
Data Vault 2.0 Methodology Overview
How Does CMMI Contribute to the Methodology?
If CMMI Is So Great, Why Should We Care About Agility Then?
Why Include PMP, SDLC If CMMI and Agile Should Be All That's Needed?
So Then, What Does Six Sigma Contribute to the Data Vault 2 Methodology?
Where Does TQM (Total Quality Management) Fit in to All of This?
Chapter 6.5: Introduction to Data Vault Implementation
Implementation Overview
What's So Important About Patterns?
Why Does Reengineering Happen Because of Big Data?
Why Do We Need to Virtualize Our Data Marts?
What Is Managed Self-Service BI?
Chapter 7.1: The Operational Environment: A Short History
Commercial Uses of the Computer
The First Applications
Ed Yourdon and the Structured Revolution
The SDLC
Disk Technology
Enter the DBMS
Response Time and Availability
Corporate Computing Today
Chapter 7.2: The Standard Work Unit
Elements of Response Time
An Hourglass Analogy
The Racetrack Analogy
Your Vehicle Runs as Fast as the Vehicle in Front of It
The Standard Work Unit
The SLA
Chapter 7.3: Data Modeling for the Structured Environment
The Purpose of the Roadmap
Granular Data Only
The ERD
The DIS
Physical Data Base Design
Relating the Different Levels of the Data Model
An Example of the Linkage
Generic Data Models
Operational Data Models/Data Warehouse Data Models
Chapter 8.1: A Brief History of Data Architecture
Chapter 8.2: Big Data/Existing System Interface
The Big Data/Existing Systems Interface
The Repetitive Raw Big Data/Existing Systems Interface
Exception Based Data
The Nonrepetitive Raw Big Data/Existing Systems Interface
Into the Existing Systems Environment
The "Context Enriched" Big Data Environment
Analyzing Structured Data/Unstructured Data Together
Chapter 8.3: The Data Warehouse/Operational Environment Interface
The Operational/Data Warehouse Interface
The Classical ETL Interface
The ODS and the ETL Interface
The Staging Area
Changed Data Capture
Inline Transformation
ELT Processing
Chapter 8.4: Data Architecture: A High-Level Perspective
A High Level Perspective
Redundancy
The System of Record
Different Types of Questions
Different Communities
Chapter 9.1: Repetitive Analytics: Some Basics
Different Kinds of Analysis
Looking for Patterns
Heuristic Processing
Freezing Data
The Sandbox
The "Normal" Profile
Distillation, Filtering
Subsetting Data
Bias of the Sample
Filtering Data
Repetitive Data and Context
Linking Repetitive Records
Log Tape Records
Analyzing Points of Data
Outliers
Data Over Time
Chapter 9.2: Analyzing Repetitive Data
Log Data
Active/Passive Indexing of Data
Summary/Detailed Data
Metadata in Big Data
Linking Data
Chapter 9.3: Repetitive Analysis
Internal, External Data
Universal Identifiers
Security
Filtering, Distillation
Archiving Results
Metrics
Chapter 10.1: Nonrepetitive Data
Inline Contextualization
Taxonomy/Ontology Processing
Custom Variables
Homographic Resolution
Acronym Resolution
Negation Analysis
Numeric Tagging
Date Tagging
Date Standardization
List Processing
Associative Word Processing
Stop Word Processing
Word Stemming
Document Metadata
Document Classification
Proximity Analysis
Functional Sequencing Within Textual ETL
Internal Referential Integrity
Preprocessing, Postprocessing
Chapter 10.2: Mapping
Chapter 10.3: Analytics From Nonrepetitive Data
Call Center Information
Medical Records
Chapter 11.1: Operational Analytics: Response Time
Transaction Response Time
Chapter 12.1: Operational Analytics
Different Perspectives of Data
