15 Math Concepts Every Data Scientist Should Know: Understand and Learn How to Apply the Math Behind Data Science Algorithms
Create more effective and powerful data science solutions by learning when, where, and how to apply key math principles that drive most data science algorithms.
Key Features:
- Understand key data science algorithms with Python-based examples
- Increase the impact of your data science solutions by learnin...
Other authors:
Format: eBook
Language: English
Published: Birmingham, England : Packt Publishing, [2024]
Edition: First edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009843337806719
Table of Contents:
- Cover
- Copyright
- Contributors
- Table of Contents
- Preface
- Part 1: Essential Concepts
- Chapter 1: Recap of Mathematical Notation and Terminology
- Technical requirements
- Number systems
- Notation for numbers and fields
- Complex numbers
- What we learned
- Linear algebra
- Vectors
- Matrices
- What we learned
- Sums, products, and logarithms
- Sums and the Σ notation
- Products and the Π notation
- Logarithms
- What we learned
- Differential and integral calculus
- Differentiation
- Finding maxima and minima
- Integration
- What we learned
- Analysis
- Limits
- Order notation
- Taylor series expansions
- What we learned
- Combinatorics
- Binomial coefficients
- What we learned
- Summary
- Notes and further reading
- Chapter 2: Random Variables and Probability Distributions
- Technical requirements
- All data is random
- A little example
- Systematic variation can be learned - random variation can't
- Random variation is not just measurement error
- What are the consequences of data being random?
- What we learned
- Random variables and probability distributions
- A new concept - random variables
- Summarizing probability distributions
- Continuous distributions
- Transforming and combining random variables
- Named distributions
- What we learned
- Sampling from distributions
- How datasets relate to random variables and probability distributions
- How big is the population from which a dataset is sampled?
- How to sample
- Generating your own random numbers code example
- Sampling from numpy distributions code example
- What we learned
- Understanding statistical estimators
- Consistency, bias, and efficiency
- The empirical distribution function
- What we learned
- The Central Limit Theorem
- Sums of random variables
- CLT code example
- CLT example with discrete variables
- Computational estimation of a PDF from data
- KDE code example
- What we learned
- Summary
- Exercises
- Chapter 3: Matrices and Linear Algebra
- Technical requirements
- Inner and outer products of vectors
- Inner product of two vectors
- Outer product of two vectors
- What we learned
- Matrices as transformations
- Matrix multiplication
- The identity matrix
- The inverse matrix
- More examples of matrices as transformations
- Matrix transformation code example
- What we learned
- Matrix decompositions
- Eigen-decompositions
- Eigenvectors and eigenvalues
- Eigen-decomposition of a square matrix
- Eigen-decomposition code example
- Singular value decomposition
- The SVD of a complex matrix
- What we learned
- Matrix properties
- Trace
- Determinant
- What we learned
- Matrix factorization and dimensionality reduction
- Dimensionality reduction
- Principal component analysis
- Non-negative matrix factorization
- What we learned
- Summary
- Exercises
- Notes and further reading
- Chapter 4: Loss Functions and Optimization
- Technical requirements
- Loss functions - what are they?
- Risk functions
- There are many loss functions
- Different loss functions = different end results
- Loss functions for anything
- A loss function by any other name
- What we learned
- Least squares
- The squared-loss function
- OLS regression
- OLS, outliers, and robust regression
- What we learned
- Linear models
- Practical issues
- The model residuals
- OLS regression code example
- What we learned
- Gradient descent
- Locating the minimum of a simple risk function
- Gradient descent code example
- Gradient descent is a general technique
- Beyond simple gradient descent
- What we learned
- Summary
- Exercises
- Chapter 5: Probabilistic Modeling
- Technical requirements
- Likelihood
- A simple probabilistic model
- Log likelihood
- Maximum likelihood estimation
- What we have learned
- Bayes' theorem
- Conditional probability and Bayes' theorem
- Priors
- The posterior
- What we have learned
- Bayesian modeling
- Bayesian model averaging
- MAP estimation
- As N becomes large the prior becomes irrelevant
- Least squares as an approximation to Bayesian modeling
- What we have learned
- Bayesian modeling in practice
- Analytic approximation of the posterior
- Computational sampling
- MCMC code example
- Probabilistic programming languages
- What we have learned
- Summary
- Exercises
- Part 2: Intermediate Concepts
- Chapter 6: Time Series and Forecasting
- Technical requirements
- What is time series data?
- What does auto-correlation mean for modeling time series data?
- The auto-correlation function (ACF)
- The partial auto-correlation function (PACF)
- Other data science implications of time series data
- What we have learned
- ARIMA models
- Integrated
- Auto-regression
- Moving average
- Combining the AR(p), I(d), and MA(q) into an ARIMA model
- Variants of ARIMA modeling
- What we have learned
- ARIMA modeling in practice
- Unit root testing
- Interpreting ACF and PACF plots
- auto.arima
- What we have learned
- Machine learning approaches to time series analysis
- Routine application of machine learning to time series analysis
- Deep learning approaches to time series analysis
- AutoML approaches to time series analysis
- What we have learned
- Summary
- Exercises
- Notes and further reading
- Chapter 7: Hypothesis Testing
- Technical requirements
- What is a hypothesis test?
- Example
- The general form of a hypothesis test
- The p-value
- The effect of increasing sample size
- The effect of decreasing noise
- One-tailed and two-tailed tests
- Using sample variances in the test statistic - the t-test
- Computationally intensive methods for p-value estimation
- Parametric versus non-parametric hypothesis tests
- What we learned
- Confidence intervals
- What does a confidence interval really represent?
- Confidence intervals for any parameter
- A confidence interval code example
- What we learned
- Type I and Type II errors, and power
- What we learned
- Summary
- Exercises
- Notes and further reading
- Chapter 8: Model Complexity
- Technical requirements
- Generalization, overfitting, and the role of model complexity
- Overfitting
- Why overfitting is bad
- Overfitting increases the variability of predictions
- Underfitting is also a problem
- Measuring prediction error
- What we learned
- The bias-variance trade-off
- Proof of the bias-variance trade-off formula
- Double descent - a modern twist on the generalization error diagram
- What we learned
- Model complexity measures for model selection
- Selecting between classes of models
- Akaike Information Criterion
- Bayesian Information Criterion
- What we learned
- Summary
- Notes and further reading
- Chapter 9: Function Decomposition
- Technical requirements
- Why do we want to decompose a function?
- What is a decomposition of a function?
- Example 1 - decomposing a one-dimensional function into symmetric and anti-symmetric parts
- Example 2 - decomposing a time series into its seasonal and non-seasonal components
- What we've learned
- Expanding a function in terms of basis functions
- What we've learned
- Fourier series
- What we've learned
- Fourier transforms
- The multi-dimensional Fourier transform
- What we've learned
- The discrete Fourier transform
- DFT code example
- Uses of the DFT
- What is the difference between the DFT, Fourier series, and the Fourier transform?
- What we've learned
- Summary
- Exercises
- Chapter 10: Network Analysis
- Technical requirements
- Graphs and network data
- Network data is about relationships
- Example 1 - substituting goods in a supermarket
- Example 2 - international trade
- What is a graph?
- What we've learned
- Basic characteristics of graphs
- Undirected and directed edges
- The adjacency matrix
- In-degree and out-degree
- Centrality
- What we've learned
- Different types of graphs
- Fully connected graphs
- Disconnected graphs
- Directed acyclic graphs
- Small-world networks
- Scale-free networks
- What we've learned
- Community detection and decomposing graphs
- What is a community?
- How to do community detection
- Community detection algorithms
- Community detection code example
- What we've learned
- Summary
- Exercises
- Notes and further reading
- Part 3: Selected Advanced Concepts
- Chapter 11: Dynamical Systems
- Technical requirements
- What is a dynamical system and what is an evolution equation?
- Time can be discrete or continuous
- Time does not have to mean chronological time
- Evolution equations
- What we learned
- First-order discrete Markov processes
- Variations of first-order Markov processes
- A Markov process is a probabilistic model
- The transition probability matrix
- Properties of the transition probability matrix
- Epidemic modeling with a first-order discrete Markov process
- The transition probability matrix is a network
- Using the transition matrix to generate state trajectories
- Evolution of the state probability distribution
- Stationary distributions and limiting distributions
- First-order discrete Markov processes are memoryless
- Likelihood of the state sequence
- What we learned
- Higher-order discrete Markov processes