Principles and practice of big data: preparing, sharing, and analyzing complex information

Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information, Second Edition updates and expands on the first edition, bringing a set of techniques and algorithms that are tailored to Big Data projects. The book stresses the point that most data analyses conducted on la...

Bibliographic Details
Other Authors: Berman, Jules J. (author)
Format: E-book
Language: English
Published: London, United Kingdom : Academic Press, imprint of Elsevier [2018]
Edition: Second edition
Subjects:
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630758606719
Table of Contents:
  • Front Cover
  • Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information
  • Copyright
  • Other Books by Jules J. Berman
  • Dedication
  • Contents
  • About the Author
  • Author's Preface to Second Edition
  • Author's Preface to First Edition
  • References
  • Chapter 1: Introduction
  • Section 1.1. Definition of Big Data
  • Section 1.2. Big Data Versus Small Data
  • Section 1.3. Whence Comest Big Data?
  • Section 1.4. The Most Common Purpose of Big Data Is to Produce Small Data
  • Section 1.5. Big Data Sits at the Center of the Research Universe
  • Glossary
  • References
  • Chapter 2: Providing Structure to Unstructured Data
  • Section 2.1. Nearly All Data Is Unstructured and Unusable in Its Raw Form
  • Section 2.2. Concordances
  • Section 2.3. Term Extraction
  • Section 2.4. Indexing
  • Section 2.5. Autocoding
  • Section 2.6. Case Study: Instantly Finding the Precise Location of Any Atom in the Universe (Some Assembly Required)
  • Section 2.7. Case Study (Advanced): A Complete Autocoder (in 12 Lines of Python Code)
  • Section 2.8. Case Study: Concordances as Transformations of Text
  • Section 2.9. Case Study (Advanced): Burrows Wheeler Transform (BWT)
  • Glossary
  • References
  • Chapter 3: Identification, Deidentification, and Reidentification
  • Section 3.1. What Are Identifiers?
  • Section 3.2. Difference Between an Identifier and an Identifier System
  • Section 3.3. Generating Unique Identifiers
  • Section 3.4. Really Bad Identifier Methods
  • Section 3.5. Registering Unique Object Identifiers
  • Section 3.6. Deidentification and Reidentification
  • Section 3.7. Case Study: Data Scrubbing
  • Section 3.8. Case Study (Advanced): Identifiers in Image Headers
  • Section 3.9. Case Study: One-Way Hashes
  • Glossary
  • References
  • Chapter 4: Metadata, Semantics, and Triples
  • Section 4.1. Metadata
  • Section 4.2. eXtensible Markup Language
  • Section 4.3. Semantics and Triples
  • Section 4.4. Namespaces
  • Section 4.5. Case Study: A Syntax for Triples
  • Section 4.6. Case Study: Dublin Core
  • Glossary
  • References
  • Chapter 5: Classifications and Ontologies
  • Section 5.1. It's All About Object Relationships
  • Section 5.2. Classifications, the Simplest of Ontologies
  • Section 5.3. Ontologies, Classes With Multiple Parents
  • Section 5.4. Choosing a Class Model
  • Section 5.5. Class Blending
  • Section 5.6. Common Pitfalls in Ontology Development
  • Section 5.7. Case Study: An Upper Level Ontology
  • Section 5.8. Case Study (Advanced): Paradoxes
  • Section 5.9. Case Study (Advanced): RDF Schemas and Class Properties
  • Section 5.10. Case Study (Advanced): Visualizing Class Relationships
  • Glossary
  • References
  • Chapter 6: Introspection
  • Section 6.1. Knowledge of Self
  • Section 6.2. Data Objects: The Essential Ingredient of Every Big Data Collection
  • Section 6.3. How Big Data Uses Introspection
  • Section 6.4. Case Study: Time Stamping Data
  • Section 6.5. Case Study: A Visit to the TripleStore
  • Section 6.6. Case Study (Advanced): Proof That Big Data Must Be Object-Oriented
  • Glossary
  • References
  • Chapter 7: Standards and Data Integration
  • Section 7.1. Standards
  • Section 7.2. Specifications Versus Standards
  • Section 7.3. Versioning
  • Section 7.4. Compliance Issues
  • Section 7.5. Case Study: Standardizing the Chocolate Teapot
  • Glossary
  • References
  • Chapter 8: Immutability and Immortality
  • Section 8.1. The Importance of Data That Cannot Change
  • Section 8.2. Immutability and Identifiers
  • Section 8.3. Coping With the Data That Data Creates
  • Section 8.4. Reconciling Identifiers Across Institutions
  • Section 8.5. Case Study: The Trusted Timestamp
  • Section 8.6. Case Study: Blockchains and Distributed Ledgers
  • Section 8.7. Case Study (Advanced): Zero-Knowledge Reconciliation
  • Glossary
  • References
  • Chapter 9: Assessing the Adequacy of a Big Data Resource
  • Section 9.1. Looking at the Data
  • Section 9.2. The Minimal Necessary Properties of Big Data
  • Section 9.3. Data That Comes With Conditions
  • Section 9.4. Case Study: Utilities for Viewing and Searching Large Files
  • Section 9.5. Case Study: Flattened Data
  • Glossary
  • References
  • Chapter 10: Measurement
  • Section 10.1. Accuracy and Precision
  • Section 10.2. Data Range
  • Section 10.3. Counting
  • Section 10.4. Normalizing and Transforming Your Data
  • Section 10.5. Reducing Your Data
  • Section 10.6. Understanding Your Control
  • Section 10.7. Statistical Significance Without Practical Significance
  • Section 10.8. Case Study: Gene Counting
  • Section 10.9. Case Study: Early Biometrics, and the Significance of Narrow Data Ranges
  • Glossary
  • References
  • Chapter 11: Indispensable Tips for Fast and Simple Big Data Analysis
  • Section 11.1. Speed and Scalability
  • Section 11.2. Fast Operations, Suitable for Big Data, That Every Computer Supports
  • Section 11.3. The Dot Product, a Simple and Fast Correlation Method
  • Section 11.4. Clustering
  • Section 11.5. Methods for Data Persistence (Without Using a Database)
  • Section 11.6. Case Study: Climbing a Classification
  • Section 11.7. Case Study (Advanced): A Database Example
  • Section 11.8. Case Study (Advanced): NoSQL
  • Glossary
  • References
  • Chapter 12: Finding the Clues in Large Collections of Data
  • Section 12.1. Denominators
  • Section 12.2. Word Frequency Distributions
  • Section 12.3. Outliers and Anomalies
  • Section 12.4. Back-of-Envelope Analyses
  • Section 12.5. Case Study: Predicting User Preferences
  • Section 12.6. Case Study: Multimodality in Population Data
  • Section 12.7. Case Study: Big and Small Black Holes
  • Glossary
  • References
  • Chapter 13: Using Random Numbers to Knock Your Big Data Analytic Problems Down to Size
  • Section 13.1. The Remarkable Utility of (Pseudo)Random Numbers
  • Section 13.2. Repeated Sampling
  • Section 13.3. Monte Carlo Simulations
  • Section 13.4. Case Study: Proving the Central Limit Theorem
  • Section 13.5. Case Study: Frequency of Unlikely String of Occurrences
  • Section 13.6. Case Study: The Infamous Birthday Problem
  • Section 13.7. Case Study (Advanced): The Monty Hall Problem
  • Section 13.8. Case Study (Advanced): A Bayesian Analysis
  • Glossary
  • References
  • Chapter 14: Special Considerations in Big Data Analysis
  • Section 14.1. Theory in Search of Data
  • Section 14.2. Data in Search of Theory
  • Section 14.3. Bigness Biases
  • Section 14.4. Data Subsets in Big Data: Neither Additive Nor Transitive
  • Section 14.5. Additional Big Data Pitfalls
  • Section 14.6. Case Study (Advanced): Curse of Dimensionality
  • Glossary
  • References
  • Chapter 15: Big Data Failures and How to Avoid (Some of) Them
  • Section 15.1. Failure Is Common
  • Section 15.2. Failed Standards
  • Section 15.3. Blaming Complexity
  • Section 15.4. An Approach to Big Data That May Work for You
  • Section 15.5. After Failure
  • Section 15.6. Case Study: Cancer Biomedical Informatics Grid, a Bridge Too Far
  • Section 15.7. Case Study: The Gaussian Copula Function
  • Glossary
  • References
  • Chapter 16: Data Reanalysis: Much More Important Than Analysis
  • Section 16.1. First Analysis (Nearly) Always Wrong
  • Section 16.2. Why Reanalysis Is More Important Than Analysis
  • Section 16.3. Case Study: Reanalysis of Old JADE Collider Data
  • Section 16.4. Case Study: Vindication Through Reanalysis
  • Section 16.5. Case Study: Finding New Planets From Old Data
  • Glossary
  • References
  • Chapter 17: Repurposing Big Data
  • Section 17.1. What Is Data Repurposing?
  • Section 17.2. Dark Data, Abandoned Data, and Legacy Data
  • Section 17.3. Case Study: From Postal Code to Demographic Keystone
  • Section 17.4. Case Study: Scientific Inferencing From a Database of Genetic Sequences
  • Section 17.5. Case Study: Linking Global Warming to High-Intensity Hurricanes
  • Section 17.6. Case Study: Inferring Climate Trends With Geologic Data
  • Section 17.7. Case Study: Lunar Orbiter Image Recovery Project
  • Glossary
  • References
  • Chapter 18: Data Sharing and Data Security
  • Section 18.1. What Is Data Sharing, and Why Don't We Do More of It?
  • Section 18.2. Common Complaints
  • Section 18.3. Data Security and Cryptographic Protocols
  • Section 18.4. Case Study: Life on Mars
  • Section 18.5. Case Study: Personal Identifiers
  • Glossary
  • References
  • Chapter 19: Legalities
  • Section 19.1. Responsibility for the Accuracy and Legitimacy of Data
  • Section 19.2. Rights to Create, Use, and Share the Resource
  • Section 19.3. Copyright and Patent Infringements Incurred by Using Standards
  • Section 19.4. Protections for Individuals
  • Section 19.5. Consent
  • Section 19.6. Unconsented Data
  • Section 19.7. Privacy Policies
  • Section 19.8. Case Study: Timely Access to Big Data
  • Section 19.9. Case Study: The Havasupai Story
  • Glossary
  • References
  • Chapter 20: Societal Issues
  • Section 20.1. How Big Data Is Perceived by the Public
  • Section 20.2. Reducing Costs and Increasing Productivity With Big Data
  • Section 20.3. Public Mistrust
  • Section 20.4. Saving Us From Ourselves
  • Section 20.5. Who Is Big Data?
  • Section 20.6. Hubris and Hyperbole
  • Section 20.7. Case Study: The Citizen Scientists
  • Section 20.8. Case Study: 1984, by George Orwell
  • Glossary
  • References
  • Index
  • Back Cover