Principles of Data Science: A Beginner's Guide to Essential Math and Coding Skills for Data Fluency and Machine Learning
Transform your data into insights with must-know techniques and mathematical concepts to unravel the secrets hidden within your data. Key Features: Learn practical data science combined with data theory to gain maximum insights from data. Discover methods for deploying actionable machine learning pipel...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England : Packt Publishing [2024] |
Edition: | Third edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009799143806719 |
Table of Contents:
- Intro
- Title Page
- Copyright and Credits
- Dedication
- Contributor
- Table of Contents
- Preface
- Chapter 1: Data Science Terminology
- What is data science?
- Understanding basic data science terminology
- Why data science?
- Example - predicting COVID-19 with machine learning
- The data science Venn diagram
- The math
- Computer programming
- Example - parsing a single tweet
- Domain knowledge
- Some more terminology
- Data science case studies
- Case study - automating government paper pushing
- Case study - what's in a job description?
- Summary
- Chapter 2: Types of Data
- Structured versus unstructured data
- Quantitative versus qualitative data
- Digging deeper
- The four levels of data
- The nominal level
- Measures of center
- The ordinal level
- The interval level
- The ratio level
- Data is in the eye of the beholder
- Summary
- Questions and answers
- Chapter 3: The Five Steps of Data Science
- Introduction to data science
- Overview of the five steps
- Exploring the data
- Guiding questions for data exploration
- DataFrames
- Series
- Exploration tips for qualitative data
- Summary
- Chapter 4: Basic Mathematics
- Basic symbols and terminology
- Vectors and matrices
- Arithmetic symbols
- Summation
- Logarithms/exponents
- Set theory
- Linear algebra
- Matrix multiplication
- How to multiply matrices together
- Summary
- Chapter 5: Impossible or Improbable - A Gentle Introduction to Probability
- Basic definitions
- What do we mean by "probability"?
- Bayesian versus frequentist
- Frequentist approach
- The law of large numbers
- Compound events
- Conditional probability
- How to utilize the rules of probability
- The addition rule
- Mutual exclusivity
- The multiplication rule
- Independence
- Complementary events
- Introduction to binary classifiers
- Summary
- Chapter 6: Advanced Probability
- Bayesian ideas revisited
- Bayes' theorem
- More applications of Bayes' theorem
- Random variables
- Discrete random variables
- Continuous random variables
- Summary
- Chapter 7: What Are the Chances? An Introduction to Statistics
- What are statistics?
- How do we obtain and sample data?
- Obtaining data
- Observational
- Experimental
- Sampling data
- How do we measure statistics?
- Measures of center
- Measures of variation
- The coefficient of variation
- Measures of relative standing
- The insightful part - correlations in data
- The empirical rule
- Example - exam scores
- Summary
- Chapter 8: Advanced Statistics
- Understanding point estimates
- Sampling distributions
- Confidence intervals
- Hypothesis tests
- Conducting a hypothesis test
- One-sample t-tests
- Type I and Type II errors
- Hypothesis testing for categorical variables
- Chi-square goodness of fit test
- Chi-square test for association/independence
- Summary
- Chapter 9: Communicating Data
- Why does communication matter?
- Identifying effective visualizations
- Scatter plots
- Line graphs
- Bar charts
- Histograms
- Box plots
- When graphs and statistics lie
- Correlation versus causation
- Simpson's paradox
- If correlation doesn't imply causation, then what does?
- Verbal communication
- It's about telling a story
- On the more formal side of things
- The why/how/what strategy for presenting
- Summary
- Chapter 10: How to Tell if Your Toaster is Learning - Machine Learning Essentials
- Introducing ML
- Example - facial recognition
- ML isn't perfect
- How does ML work?
- Types of ML
- SL
- UL
- RL
- Overview of the types of ML
- ML paradigms - pros and cons
- Predicting continuous variables with linear regression
- Correlation versus causation
- Causation
- Adding more predictors
- Regression metrics
- Summary
- Chapter 11: Predictions Don't Grow on Trees, or Do They?
- Performing naïve Bayes classification
- Classification metrics
- Understanding decision trees
- Measuring purity
- Exploring the Titanic dataset
- Dummy variables
- Diving deep into UL
- When to use UL
- k-means clustering
- The Silhouette Coefficient
- Feature extraction and PCA
- Summary
- Chapter 12: Introduction to Transfer Learning and Pre-Trained Models
- Understanding pre-trained models
- Benefits of using pre-trained models
- Commonly used pre-trained models
- Decoding BERT's pre-training
- TL
- Different types of TL
- Inductive TL
- Transductive TL
- Unsupervised TL - feature extraction
- TL with BERT and GPT
- Examples of TL
- Example - Fine-tuning a pre-trained model for text classification
- Summary
- Chapter 13: Mitigating Algorithmic Bias and Tackling Model and Data Drift
- Understanding algorithmic bias
- Types of bias
- Sources of algorithmic bias
- Measuring bias
- Consequences of unaddressed bias and the importance of fairness
- Mitigating algorithmic bias
- Mitigation during data preprocessing
- Mitigation during model in-processing
- Mitigation during model postprocessing
- Bias in LLMs
- Uncovering bias in GPT-2
- Emerging techniques in bias and fairness in ML
- Understanding model drift and decay
- Model drift
- Data drift
- Mitigating drift
- Understanding the context
- Continuous monitoring
- Regular model retraining
- Implementing feedback systems
- Model adaptation techniques
- Summary
- Chapter 14: AI Governance
- Mastering data governance
- Current hurdles in data governance
- Data management: crafting the bedrock
- Data ingestion - the gateway to information
- Data integration - from collection to delivery
- Data warehouses and entity resolution
- The quest for data quality
- Documentation and cataloging - the unsung heroes of governance
- Understanding the path of data
- Regulatory compliance and audit preparedness
- Change management and impact analysis
- Upholding data quality
- Troubleshooting and analysis
- Navigating the intricacy and the anatomy of ML governance
- ML governance pillars
- Model interpretability
- The many facets of ML development
- Beyond training - model deployment and monitoring
- A guide to architectural governance
- The five pillars of architectural governance
- Transformative architectural principles
- Zooming in on architectural dimensions
- Summary
- Chapter 15: Navigating Real-World Data Science Case Studies in Action
- Introduction to the COMPAS dataset case study
- Understanding the task/outlining success
- Preliminary data exploration
- Preparing the data for modeling
- Final thoughts
- Text embeddings using pretrained models and OpenAI
- Setting up and importing necessary libraries
- Data collection - fetching the textbook data
- Converting text to embeddings
- Querying - searching for relevant information
- Concluding thoughts - the power of modern pre-trained models
- Summary
- Index
- About Packt
- Other Books You May Enjoy