Finding ghosts in your data: anomaly detection techniques with examples in Python

Discover key information buried in the noise of data by learning a variety of anomaly detection techniques and using the Python programming language to build a robust service for anomaly detection against a variety of data types. The book starts with an overview of what anomalies and outliers are an...


Bibliographic Details
Other Authors: Feasel, Kevin (author)
Format: E-book
Language: English
Published: New York, New York: Apress, [2022]
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009701173306719
Table of Contents:
  • Intro
  • Table of Contents
  • About the Author
  • About the Technical Reviewer
  • Introduction
  • Part I: What Is an Anomaly?
  • Chapter 1: The Importance of Anomalies and Anomaly Detection
  • Defining Anomalies
  • Outlier
  • Noise vs. Anomalies
  • Diagnosing an Example
  • What If We're Wrong?
  • Anomalies in the Wild
  • Finance
  • Medicine
  • Sports Analytics
  • A $23 Million Mistake
  • A Persistent Anomaly
  • Web Analytics
  • And Many More
  • Classes of Anomaly Detection
  • Statistical Anomaly Detection
  • Clustering Anomaly Detection
  • Model-Based Anomaly Detection
  • Building an Anomaly Detector
  • Key Goals
  • How Do Humans Handle Anomalies?
  • Known Unknowns
  • Conclusion
  • Chapter 2: Humans Are Pattern Matchers
  • A Primer on the Gestalt School
  • Key Findings of the Gestalt School
  • Emergence
  • Reification
  • Invariance
  • Multistability
  • Principles Implied in the Key Findings
  • Meaningfulness
  • Conciseness
  • Closure
  • Similarity
  • Good Continuation
  • Figure and Ground
  • Proximity
  • Connectedness
  • Common Region
  • Symmetry
  • Common Fate
  • Synchrony
  • Helping People Find Anomalies
  • Use Color As a Signal
  • Limit Nonmeaningful Information
  • Enable "Connecting the Dots"
  • Conclusion
  • Chapter 3: Formalizing Anomaly Detection
  • The Importance of Formalization
  • "I'll Know It When I See It" Isn't Enough
  • Human Fallibility
  • Marginal Outliers
  • The Limits of Visualization
  • The First Formal Tool: Univariate Analysis
  • Distributions and Histograms
  • The Normal Distribution
  • Mean, Variance, and Standard Deviation
  • Additional Distributions
  • Log-Normal
  • Uniform
  • Cauchy
  • Robustness and the Mean
  • The Susceptibility of Outliers
  • The Median and "Robust" Statistics
  • Beyond the Median: Calculating Percentiles
  • Control Charts
  • Conclusion
  • Chapter 4: Laying Out the Framework
  • Tools of the Trade
  • Choosing a Programming Language
  • Making Plumbing Choices
  • Reducing Architectural Variables
  • Developing an Initial Framework
  • Battlespace Preparation
  • Framing the API
  • Input and Output Signatures
  • Defining a Common Signature
  • Defining an Outlier
  • Sensitivity and Fraction of Anomalies
  • Single Solution
  • Combined Arms
  • Framing the Solution
  • Containerizing the Solution
  • Conclusion
  • Chapter 5: Building a Test Suite
  • Tools of the Trade
  • Unit Test Library
  • Integration Testing
  • Writing Testable Code
  • Keep Methods Separated
  • Emphasize Use Cases
  • Functional or Clean: Your Choice
  • Creating the Initial Tests
  • Unit Tests
  • Integration Tests
  • Conclusion
  • Chapter 6: Implementing the First Methods
  • A Motivating Example
  • Ensembling As a Technique
  • Sequential Ensembling
  • Independent Ensembling
  • Choosing Between Sequential and Independent Ensembling
  • Implementing the First Checks
  • Standard Deviations from the Mean
  • Median Absolute Deviations from the Median
  • Distance from the Interquartile Range
  • Completing the run_tests() Function
  • Building a Scoreboard
  • Weighting Results
  • Determining Outliers
  • Updating Tests
  • Updating Unit Tests
  • Updating Integration Tests
  • Conclusion
  • Chapter 7: Extending the Ensemble
  • Adding New Tests
  • Checking for Normality
  • Approaching Normality
  • A Framework for New Tests
  • Grubbs' Test for Outliers
  • Generalized ESD Test for Outliers
  • Dixon's Q Test
  • Calling the Tests
  • Updating Tests
  • Updating Unit Tests
  • Updating Integration Tests
  • Multi-peaked Data
  • A Hidden Assumption
  • The Solution: A Sneak Peek
  • Conclusion
  • Chapter 8: Visualize the Results
  • Building a Plan
  • What Do We Want to Show?
  • How Do We Want to Show It?
  • Developing a Visualization App
  • Getting Started with Streamlit
  • Building the Initial Screen
  • Displaying Results and Details
  • Conclusion
  • Chapter 9: Clustering and Anomalies
  • What Is Clustering?
  • Common Cluster Terminology
  • K-Means Clustering
  • K-Nearest Neighbors
  • When Clustering Makes Sense
  • Gaussian Mixture Modeling
  • Implementing a Univariate Version
  • Updating Tests
  • Common Problems with Clusters
  • Choosing the Correct Number of Clusters
  • Clustering Is Nondeterministic
  • Alternative Approaches
  • Tree-Based Approaches
  • The Problem with Trees
  • Conclusion
  • Chapter 10: Connectivity-Based Outlier Factor (COF)
  • Distance or Density?
  • Local Outlier Factor
  • Connectivity-Based Outlier Factor
  • Introducing Multivariate Support
  • Laying the Groundwork
  • Implementing COF
  • Test and Website Updates
  • Unit Test Updates
  • Integration Test Updates
  • Website Updates
  • Conclusion
  • Chapter 11: Local Correlation Integral (LOCI)
  • Local Correlation Integral
  • Discovering the Neighborhood
  • Multi-granularity Deviation Factor (MDEF)
  • Multivariate Algorithm Ensembles
  • Ensemble Types
  • COF Combinations
  • Incorporating LOCI
  • Test and Website Updates
  • Unit Test Updates
  • Website Updates
  • Conclusion
  • Chapter 12: Copula-Based Outlier Detection (COPOD)
  • Copula-Based Outlier Detection
  • What's a Copula?
  • Intuition Behind COPOD
  • Implementing COPOD
  • Test and Website Updates
  • Unit Test Updates
  • Integration Test Updates
  • Website Updates
  • Conclusion
  • Part IV: Time Series Anomaly Detection
  • Chapter 13: Time and Anomalies
  • What Is Time Series?
  • Time Series Changes Our Thinking
  • Autocorrelation
  • Smooth Movement
  • The Nature of Change
  • Data Requirements
  • Time Series Modeling
  • (Weighted) Moving Average
  • Exponential Smoothing
  • Autoregressive Models
  • What Constitutes an Outlier?
  • Local Outlier
  • Behavioral Changes over Time
  • Local Non-outlier in a Global Change
  • Differences from Peer Groups
  • Common Classes of Technique
  • Conclusion
  • Chapter 14: Change Point Detection
  • What Is Change Point Detection?
  • Benefits of Change Point Detection
  • Change Point Detection with ruptures
  • Dynamic Programming
  • PELT
  • Implementing Change Point Detection
  • Test and Website Updates
  • Unit Tests
  • Integration Tests
  • Website Updates
  • Avenues of Further Improvement
  • Conclusion
  • Chapter 15: An Introduction to Multi-series Anomaly Detection
  • What Is Multi-series Time Series?
  • Key Aspects of Multi-series Time Series
  • What Needs to Change?
  • What's the Difference?
  • Leading and Lagging Factors
  • Available Processes
  • Cross-Euclidean Distance
  • Cross-Correlation Coefficient
  • SameTrend (STREND)
  • Common Problems
  • Conclusion
  • Chapter 16: Standard Deviation of Differences (DIFFSTD)
  • What Is DIFFSTD?
  • Calculating DIFFSTD
  • Key Assumptions
  • Writing DIFFSTD
  • Series Processing
  • Segmentation
  • Comparing the Norm
  • Determining Outliers
  • Test and Website Updates
  • Unit Tests
  • Integration Tests
  • Website Updates
  • Conclusion
  • Chapter 17: Symbolic Aggregate Approximation (SAX)
  • What Is SAX?
  • Motifs and Discords
  • Subsequences and Matches
  • Discretizing the Data
  • Implementing SAX
  • Segmentation and Blocking
  • Making SAX Multi-series
  • Scoring Outliers
  • Test and Website Updates
  • Unit and Integration Tests
  • Website Updates
  • Conclusion
  • Part V: Stacking Up to the Competition
  • Chapter 18: Configuring Azure Cognitive Services Anomaly Detector
  • Gathering Market Intelligence
  • Amazon Web Services: SageMaker
  • Microsoft Azure: Cognitive Services
  • Google Cloud: AI Services
  • Configuring Azure Cognitive Services
  • Set Up an Account
  • Using the Demo Application
  • Conclusion
  • Chapter 19: Performing a Bake-Off
  • Preparing the Comparison
  • Supervised vs. Unsupervised Learning
  • Choosing Datasets
  • Scoring Results
  • Performing the Bake-Off
  • Accessing Cognitive Services via Python
  • Accessing Our API via Python
  • Dataset Comparisons
  • Lessons Learned
  • Making a Better Anomaly Detector
  • Increasing Robustness
  • Extending the Ensembles
  • Training Parameter Values
  • Conclusion
  • Appendix
  • Index