Finding ghosts in your data: anomaly detection techniques with examples in Python
Discover key information buried in the noise of data by learning a variety of anomaly detection techniques and using the Python programming language to build a robust service for anomaly detection against a variety of data types. The book starts with an overview of what anomalies and outliers are an...
Other Authors: | |
---|---|
Format: | Electronic book |
Language: | English |
Published: | New York, New York : Apress, [2022] |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009701173306719 |
Table of Contents:
- Intro
- Table of Contents
- About the Author
- About the Technical Reviewer
- Introduction
- Part I: What Is an Anomaly?
- Chapter 1: The Importance of Anomalies and Anomaly Detection
- Defining Anomalies
- Outlier
- Noise vs. Anomalies
- Diagnosing an Example
- What If We're Wrong?
- Anomalies in the Wild
- Finance
- Medicine
- Sports Analytics
- A 23 Million Mistake
- A Persistent Anomaly
- Web Analytics
- And Many More
- Classes of Anomaly Detection
- Statistical Anomaly Detection
- Clustering Anomaly Detection
- Model-Based Anomaly Detection
- Building an Anomaly Detector
- Key Goals
- How Do Humans Handle Anomalies?
- Known Unknowns
- Conclusion
- Chapter 2: Humans Are Pattern Matchers
- A Primer on the Gestalt School
- Key Findings of the Gestalt School
- Emergence
- Reification
- Invariance
- Multistability
- Principles Implied in the Key Findings
- Meaningfulness
- Conciseness
- Closure
- Similarity
- Good Continuation
- Figure and Ground
- Proximity
- Connectedness
- Common Region
- Symmetry
- Common Fate
- Synchrony
- Helping People Find Anomalies
- Use Color As a Signal
- Limit Nonmeaningful Information
- Enable "Connecting the Dots"
- Conclusion
- Chapter 3: Formalizing Anomaly Detection
- The Importance of Formalization
- "I'll Know It When I See It" Isn't Enough
- Human Fallibility
- Marginal Outliers
- The Limits of Visualization
- The First Formal Tool: Univariate Analysis
- Distributions and Histograms
- The Normal Distribution
- Mean, Variance, and Standard Deviation
- Additional Distributions
- Log-Normal
- Uniform
- Cauchy
- Robustness and the Mean
- The Susceptibility of Outliers
- The Median and "Robust" Statistics
- Beyond the Median: Calculating Percentiles
- Control Charts
- Conclusion
- Chapter 4: Laying Out the Framework
- Tools of the Trade
- Choosing a Programming Language
- Making Plumbing Choices
- Reducing Architectural Variables
- Developing an Initial Framework
- Battlespace Preparation
- Framing the API
- Input and Output Signatures
- Defining a Common Signature
- Defining an Outlier
- Sensitivity and Fraction of Anomalies
- Single Solution
- Combined Arms
- Framing the Solution
- Containerizing the Solution
- Conclusion
- Chapter 5: Building a Test Suite
- Tools of the Trade
- Unit Test Library
- Integration Testing
- Writing Testable Code
- Keep Methods Separated
- Emphasize Use Cases
- Functional or Clean: Your Choice
- Creating the Initial Tests
- Unit Tests
- Integration Tests
- Conclusion
- Chapter 6: Implementing the First Methods
- A Motivating Example
- Ensembling As a Technique
- Sequential Ensembling
- Independent Ensembling
- Choosing Between Sequential and Independent Ensembling
- Implementing the First Checks
- Standard Deviations from the Mean
- Median Absolute Deviations from the Median
- Distance from the Interquartile Range
- Completing the run_tests() Function
- Building a Scoreboard
- Weighting Results
- Determining Outliers
- Updating Tests
- Updating Unit Tests
- Updating Integration Tests
- Conclusion
- Chapter 7: Extending the Ensemble
- Adding New Tests
- Checking for Normality
- Approaching Normality
- A Framework for New Tests
- Grubbs' Test for Outliers
- Generalized ESD Test for Outliers
- Dixon's Q Test
- Calling the Tests
- Updating Tests
- Updating Unit Tests
- Updating Integration Tests
- Multi-peaked Data
- A Hidden Assumption
- The Solution: A Sneak Peek
- Conclusion
- Chapter 8: Visualize the Results
- Building a Plan
- What Do We Want to Show?
- How Do We Want to Show It?
- Developing a Visualization App
- Getting Started with Streamlit
- Building the Initial Screen
- Displaying Results and Details
- Conclusion
- Chapter 9: Clustering and Anomalies
- What Is Clustering?
- Common Cluster Terminology
- K-Means Clustering
- K-Nearest Neighbors
- When Clustering Makes Sense
- Gaussian Mixture Modeling
- Implementing a Univariate Version
- Updating Tests
- Common Problems with Clusters
- Choosing the Correct Number of Clusters
- Clustering Is Nondeterministic
- Alternative Approaches
- Tree-Based Approaches
- The Problem with Trees
- Conclusion
- Chapter 10: Connectivity-Based Outlier Factor (COF)
- Distance or Density?
- Local Outlier Factor
- Connectivity-Based Outlier Factor
- Introducing Multivariate Support
- Laying the Groundwork
- Implementing COF
- Test and Website Updates
- Unit Test Updates
- Integration Test Updates
- Website Updates
- Conclusion
- Chapter 11: Local Correlation Integral (LOCI)
- Local Correlation Integral
- Discovering the Neighborhood
- Multi-granularity Deviation Factor (MDEF)
- Multivariate Algorithm Ensembles
- Ensemble Types
- COF Combinations
- Incorporating LOCI
- Test and Website Updates
- Unit Test Updates
- Website Updates
- Conclusion
- Chapter 12: Copula-Based Outlier Detection (COPOD)
- Copula-Based Outlier Detection
- What's a Copula?
- Intuition Behind COPOD
- Implementing COPOD
- Test and Website Updates
- Unit Test Updates
- Integration Test Updates
- Website Updates
- Conclusion
- Part IV: Time Series Anomaly Detection
- Chapter 13: Time and Anomalies
- What Is Time Series?
- Time Series Changes Our Thinking
- Autocorrelation
- Smooth Movement
- The Nature of Change
- Data Requirements
- Time Series Modeling
- (Weighted) Moving Average
- Exponential Smoothing
- Autoregressive Models
- What Constitutes an Outlier?
- Local Outlier
- Behavioral Changes over Time
- Local Non-outlier in a Global Change
- Differences from Peer Groups
- Common Classes of Technique
- Conclusion
- Chapter 14: Change Point Detection
- What Is Change Point Detection?
- Benefits of Change Point Detection
- Change Point Detection with ruptures
- Dynamic Programming
- PELT
- Implementing Change Point Detection
- Test and Website Updates
- Unit Tests
- Integration Tests
- Website Updates
- Avenues of Further Improvement
- Conclusion
- Chapter 15: An Introduction to Multi-series Anomaly Detection
- What Is Multi-series Time Series?
- Key Aspects of Multi-series Time Series
- What Needs to Change?
- What's the Difference?
- Leading and Lagging Factors
- Available Processes
- Cross-Euclidean Distance
- Cross-Correlation Coefficient
- SameTrend (STREND)
- Common Problems
- Conclusion
- Chapter 16: Standard Deviation of Differences (DIFFSTD)
- What Is DIFFSTD?
- Calculating DIFFSTD
- Key Assumptions
- Writing DIFFSTD
- Series Processing
- Segmentation
- Comparing the Norm
- Determining Outliers
- Test and Website Updates
- Unit Tests
- Integration Tests
- Website Updates
- Conclusion
- Chapter 17: Symbolic Aggregate Approximation (SAX)
- What Is SAX?
- Motifs and Discords
- Subsequences and Matches
- Discretizing the Data
- Implementing SAX
- Segmentation and Blocking
- Making SAX Multi-series
- Scoring Outliers
- Test and Website Updates
- Unit and Integration Tests
- Website Updates
- Conclusion
- Part V: Stacking Up to the Competition
- Chapter 18: Configuring Azure Cognitive Services Anomaly Detector
- Gathering Market Intelligence
- Amazon Web Services: SageMaker
- Microsoft Azure: Cognitive Services
- Google Cloud: AI Services
- Configuring Azure Cognitive Services
- Set Up an Account
- Using the Demo Application
- Conclusion
- Chapter 19: Performing a Bake-Off
- Preparing the Comparison
- Supervised vs. Unsupervised Learning
- Choosing Datasets
- Scoring Results
- Performing the Bake-Off
- Accessing Cognitive Services via Python
- Accessing Our API via Python
- Dataset Comparisons
- Lessons Learned
- Making a Better Anomaly Detector
- Increasing Robustness
- Extending the Ensembles
- Training Parameter Values
- Conclusion
- Appendix
- Index