Principles and practice of big data: preparing, sharing, and analyzing complex information
Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information, Second Edition updates and expands on the first edition, bringing a set of techniques and algorithms that are tailored to Big Data projects. The book stresses the point that most data analyses conducted on la...
Other Authors: | |
---|---|
Format: | Electronic book |
Language: | English |
Published: | London, United Kingdom: Academic Press, imprint of Elsevier [2018] |
Edition: | Second edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630758606719 |
Table of Contents:
- Front Cover
- Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information
- Copyright
- Other Books by Jules J. Berman
- Dedication
- Contents
- About the Author
- Author's Preface to Second Edition
- Author's Preface to First Edition
- References
- Chapter 1: Introduction
- Section 1.1. Definition of Big Data
- Section 1.2. Big Data Versus Small Data
- Section 1.3. Whence Comest Big Data?
- Section 1.4. The Most Common Purpose of Big Data Is to Produce Small Data
- Section 1.5. Big Data Sits at the Center of the Research Universe
- Glossary
- References
- Chapter 2: Providing Structure to Unstructured Data
- Section 2.1. Nearly All Data Is Unstructured and Unusable in Its Raw Form
- Section 2.2. Concordances
- Section 2.3. Term Extraction
- Section 2.4. Indexing
- Section 2.5. Autocoding
- Section 2.6. Case Study: Instantly Finding the Precise Location of Any Atom in the Universe (Some Assembly Required)
- Section 2.7. Case Study (Advanced): A Complete Autocoder (in 12 Lines of Python Code)
- Section 2.8. Case Study: Concordances as Transformations of Text
- Section 2.9. Case Study (Advanced): Burrows Wheeler Transform (BWT)
- Glossary
- References
- Chapter 3: Identification, Deidentification, and Reidentification
- Section 3.1. What Are Identifiers?
- Section 3.2. Difference Between an Identifier and an Identifier System
- Section 3.3. Generating Unique Identifiers
- Section 3.4. Really Bad Identifier Methods
- Section 3.5. Registering Unique Object Identifiers
- Section 3.6. Deidentification and Reidentification
- Section 3.7. Case Study: Data Scrubbing
- Section 3.8. Case Study (Advanced): Identifiers in Image Headers
- Section 3.9. Case Study: One-Way Hashes
- Glossary
- References
- Chapter 4: Metadata, Semantics, and Triples
- Section 4.1. Metadata
- Section 4.2. eXtensible Markup Language
- Section 4.3. Semantics and Triples
- Section 4.4. Namespaces
- Section 4.5. Case Study: A Syntax for Triples
- Section 4.6. Case Study: Dublin Core
- Glossary
- References
- Chapter 5: Classifications and Ontologies
- Section 5.1. It's All About Object Relationships
- Section 5.2. Classifications, the Simplest of Ontologies
- Section 5.3. Ontologies, Classes With Multiple Parents
- Section 5.4. Choosing a Class Model
- Section 5.5. Class Blending
- Section 5.6. Common Pitfalls in Ontology Development
- Section 5.7. Case Study: An Upper Level Ontology
- Section 5.8. Case Study (Advanced): Paradoxes
- Section 5.9. Case Study (Advanced): RDF Schemas and Class Properties
- Section 5.10. Case Study (Advanced): Visualizing Class Relationships
- Glossary
- References
- Chapter 6: Introspection
- Section 6.1. Knowledge of Self
- Section 6.2. Data Objects: The Essential Ingredient of Every Big Data Collection
- Section 6.3. How Big Data Uses Introspection
- Section 6.4. Case Study: Time Stamping Data
- Section 6.5. Case Study: A Visit to the TripleStore
- Section 6.6. Case Study (Advanced): Proof That Big Data Must Be Object-Oriented
- Glossary
- References
- Chapter 7: Standards and Data Integration
- Section 7.1. Standards
- Section 7.2. Specifications Versus Standards
- Section 7.3. Versioning
- Section 7.4. Compliance Issues
- Section 7.5. Case Study: Standardizing the Chocolate Teapot
- Glossary
- References
- Chapter 8: Immutability and Immortality
- Section 8.1. The Importance of Data That Cannot Change
- Section 8.2. Immutability and Identifiers
- Section 8.3. Coping With the Data That Data Creates
- Section 8.4. Reconciling Identifiers Across Institutions
- Section 8.5. Case Study: The Trusted Timestamp
- Section 8.6. Case Study: Blockchains and Distributed Ledgers
- Section 8.7. Case Study (Advanced): Zero-Knowledge Reconciliation
- Glossary
- References
- Chapter 9: Assessing the Adequacy of a Big Data Resource
- Section 9.1. Looking at the Data
- Section 9.2. The Minimal Necessary Properties of Big Data
- Section 9.3. Data That Comes With Conditions
- Section 9.4. Case Study: Utilities for Viewing and Searching Large Files
- Section 9.5. Case Study: Flattened Data
- Glossary
- References
- Chapter 10: Measurement
- Section 10.1. Accuracy and Precision
- Section 10.2. Data Range
- Section 10.3. Counting
- Section 10.4. Normalizing and Transforming Your Data
- Section 10.5. Reducing Your Data
- Section 10.6. Understanding Your Control
- Section 10.7. Statistical Significance Without Practical Significance
- Section 10.8. Case Study: Gene Counting
- Section 10.9. Case Study: Early Biometrics, and the Significance of Narrow Data Ranges
- Glossary
- References
- Chapter 11: Indispensable Tips for Fast and Simple Big Data Analysis
- Section 11.1. Speed and Scalability
- Section 11.2. Fast Operations, Suitable for Big Data, That Every Computer Supports
- Section 11.3. The Dot Product, a Simple and Fast Correlation Method
- Section 11.4. Clustering
- Section 11.5. Methods for Data Persistence (Without Using a Database)
- Section 11.6. Case Study: Climbing a Classification
- Section 11.7. Case Study (Advanced): A Database Example
- Section 11.8. Case Study (Advanced): NoSQL
- Glossary
- References
- Chapter 12: Finding the Clues in Large Collections of Data
- Section 12.1. Denominators
- Section 12.2. Word Frequency Distributions
- Section 12.3. Outliers and Anomalies
- Section 12.4. Back-of-Envelope Analyses
- Section 12.5. Case Study: Predicting User Preferences
- Section 12.6. Case Study: Multimodality in Population Data
- Section 12.7. Case Study: Big and Small Black Holes
- Glossary
- References
- Chapter 13: Using Random Numbers to Knock Your Big Data Analytic Problems Down to Size
- Section 13.1. The Remarkable Utility of (Pseudo)Random Numbers
- Section 13.2. Repeated Sampling
- Section 13.3. Monte Carlo Simulations
- Section 13.4. Case Study: Proving the Central Limit Theorem
- Section 13.5. Case Study: Frequency of Unlikely String of Occurrences
- Section 13.6. Case Study: The Infamous Birthday Problem
- Section 13.7. Case Study (Advanced): The Monty Hall Problem
- Section 13.8. Case Study (Advanced): A Bayesian Analysis
- Glossary
- References
- Chapter 14: Special Considerations in Big Data Analysis
- Section 14.1. Theory in Search of Data
- Section 14.2. Data in Search of Theory
- Section 14.3. Bigness Biases
- Section 14.4. Data Subsets in Big Data: Neither Additive Nor Transitive
- Section 14.5. Additional Big Data Pitfalls
- Section 14.6. Case Study (Advanced): Curse of Dimensionality
- Glossary
- References
- Chapter 15: Big Data Failures and How to Avoid (Some of) Them
- Section 15.1. Failure Is Common
- Section 15.2. Failed Standards
- Section 15.3. Blaming Complexity
- Section 15.4. An Approach to Big Data That May Work for You
- Section 15.5. After Failure
- Section 15.6. Case Study: Cancer Biomedical Informatics Grid, a Bridge Too Far
- Section 15.7. Case Study: The Gaussian Copula Function
- Glossary
- References
- Chapter 16: Data Reanalysis: Much More Important Than Analysis
- Section 16.1. First Analysis (Nearly) Always Wrong
- Section 16.2. Why Reanalysis Is More Important Than Analysis
- Section 16.3. Case Study: Reanalysis of Old JADE Collider Data
- Section 16.4. Case Study: Vindication Through Reanalysis
- Section 16.5. Case Study: Finding New Planets From Old Data
- Glossary
- References
- Chapter 17: Repurposing Big Data
- Section 17.1. What Is Data Repurposing?
- Section 17.2. Dark Data, Abandoned Data, and Legacy Data
- Section 17.3. Case Study: From Postal Code to Demographic Keystone
- Section 17.4. Case Study: Scientific Inferencing From a Database of Genetic Sequences
- Section 17.5. Case Study: Linking Global Warming to High-Intensity Hurricanes
- Section 17.6. Case Study: Inferring Climate Trends With Geologic Data
- Section 17.7. Case Study: Lunar Orbiter Image Recovery Project
- Glossary
- References
- Chapter 18: Data Sharing and Data Security
- Section 18.1. What Is Data Sharing, and Why Don't We Do More of It?
- Section 18.2. Common Complaints
- Section 18.3. Data Security and Cryptographic Protocols
- Section 18.4. Case Study: Life on Mars
- Section 18.5. Case Study: Personal Identifiers
- Glossary
- References
- Chapter 19: Legalities
- Section 19.1. Responsibility for the Accuracy and Legitimacy of Data
- Section 19.2. Rights to Create, Use, and Share the Resource
- Section 19.3. Copyright and Patent Infringements Incurred by Using Standards
- Section 19.4. Protections for Individuals
- Section 19.5. Consent
- Section 19.6. Unconsented Data
- Section 19.7. Privacy Policies
- Section 19.8. Case Study: Timely Access to Big Data
- Section 19.9. Case Study: The Havasupai Story
- Glossary
- References
- Chapter 20: Societal Issues
- Section 20.1. How Big Data Is Perceived by the Public
- Section 20.2. Reducing Costs and Increasing Productivity With Big Data
- Section 20.3. Public Mistrust
- Section 20.4. Saving Us From Ourselves
- Section 20.5. Who Is Big Data?
- Section 20.6. Hubris and Hyperbole
- Section 20.7. Case Study: The Citizen Scientists
- Section 20.8. Case Study: 1984, by George Orwell
- Glossary
- References
- Index
- Back Cover