Experimentation for Engineers: From A/B testing to Bayesian optimization
Experimentation for Engineers: From A/B testing to Bayesian optimization is a toolbox of techniques for evaluating new features and fine-tuning parameters. You'll start with a deep dive into methods like A/B testing, and then graduate to advanced techniques used to measure performance in industry.
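As a taste of the book's starting point, here is a minimal A/B-test sketch in Python (a generic illustration, not code from the book; all numbers and names are hypothetical): simulate measurements of a business metric under a control system A and a modified system B, then use a two-sample t-test to ask whether the observed difference is larger than measurement noise alone would produce.

```python
import numpy as np
from scipy import stats

# Hypothetical data: per-session values of a business metric under
# control (A) and treatment (B). In a real A/B test these would be
# logged measurements, not simulated draws.
rng = np.random.default_rng(17)
a = rng.normal(loc=1.00, scale=0.5, size=1000)  # control measurements
b = rng.normal(loc=1.05, scale=0.5, size=1000)  # treatment measurements

# Welch's two-sample t-test: compare the means without assuming
# equal variances between the two groups.
t_stat, p_value = stats.ttest_ind(b, a, equal_var=False)
print(f"mean(A)={a.mean():.3f}  mean(B)={b.mean():.3f}")
print(f"t={t_stat:.2f}  p={p_value:.4f}")
# A small p-value (e.g., < 0.05) suggests B's improvement is real
# rather than an artifact of variation between measurements.
```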
| Field | Value |
| --- | --- |
| Format | eBook |
| Language | English |
| Published | Shelter Island, NY : Manning Publications Co., [2023] |
| Edition | [First edition] |
| See on Biblioteca Universitat Ramon Llull | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009724209506719 |
Table of Contents:
- Intro
- inside front cover
- Experimentation for Engineers
- Copyright
- dedication
- contents
- front matter
- preface
- acknowledgments
- about this book
- Who should read this book
- How this book is organized: A road map
- About the code
- liveBook discussion forum
- about the author
- about the cover illustration
- 1 Optimizing systems by experiment
- 1.1 Examples of engineering workflows
- 1.1.1 Machine learning engineer's workflow
- 1.1.2 Quantitative trader's workflow
- 1.1.3 Software engineer's workflow
- 1.2 Measuring by experiment
- 1.2.1 Experimental methods
- 1.2.2 Practical problems and pitfalls
- 1.3 Why are experiments necessary?
- 1.3.1 Domain knowledge
- 1.3.2 Offline model quality
- 1.3.3 Simulation
- Summary
- 2 A/B testing: Evaluating a modification to your system
- 2.1 Take an ad hoc measurement
- 2.1.1 Simulate the trading system
- 2.1.2 Compare execution costs
- 2.2 Take a precise measurement
- 2.2.1 Mitigate measurement variation with replication
- 2.3 Run an A/B test
- 2.3.1 Analyze your measurements
- 2.3.2 Design the A/B test
- 2.3.3 Measure and analyze
- 2.3.4 Recap of A/B test stages
- Summary
- 3 Multi-armed bandits: Maximizing business metrics while experimenting
- 3.1 Epsilon-greedy: Account for the impact of evaluation on business metrics
- 3.1.1 A/B testing as a baseline
- 3.1.2 The epsilon-greedy algorithm
- 3.1.3 Deciding when to stop
- 3.2 Evaluating multiple system changes simultaneously
- 3.3 Thompson sampling: A more efficient MAB algorithm
- 3.3.1 Estimate the probability that an arm is the best
- 3.3.2 Randomized probability matching
- 3.3.3 The complete algorithm
- Summary
- 4 Response surface methodology: Optimizing continuous parameters
- 4.1 Optimize a single continuous parameter
- 4.1.1 Design: Choose parameter values to measure
- 4.1.2 Take the measurements
- 4.1.3 Analyze I: Interpolate between measurements
- 4.1.4 Analyze II: Optimize the business metric
- 4.1.5 Validate the optimal parameter value
- 4.2 Optimizing two or more continuous parameters
- 4.2.1 Design the two-parameter experiment
- 4.2.2 Measure, analyze, and validate the 2D experiment
- Summary
- 5 Contextual bandits: Making targeted decisions
- 5.1 Model a business metric offline to make decisions online
- 5.1.1 Model the business-metric outcome of a decision
- 5.1.2 Add the decision-making component
- 5.1.3 Run and evaluate the greedy recommender
- 5.2 Explore actions with epsilon-greedy
- 5.2.1 Missing counterfactuals degrade predictions
- 5.2.2 Explore with epsilon-greedy to collect counterfactuals
- 5.3 Explore parameters with Thompson sampling
- 5.3.1 Create an ensemble of prediction models
- 5.3.2 Randomized probability matching
- 5.4 Validate the contextual bandit
- Summary
- 6 Bayesian optimization: Automating experimental optimization
- 6.1 Optimizing a single compiler parameter, a visual explanation
- 6.1.1 Simulate the compiler
- 6.1.2 Run the initial experiment
- 6.1.3 Analyze: Model the response surface
- 6.1.4 Design: Select the parameter value to measure next
- 6.1.5 Design: Balance exploration with exploitation
- 6.2 Model the response surface with Gaussian process regression
- 6.2.1 Estimate the expected CPU time
- 6.2.2 Estimate uncertainty with GPR
- 6.3 Optimize over an acquisition function
- 6.3.1 Minimize the acquisition function
- 6.4 Optimize all seven compiler parameters
- 6.4.1 Random search
- 6.4.2 A complete Bayesian optimization
- Summary
- 7 Managing business metrics
- 7.1 Focus on the business
- 7.1.1 Don't evaluate a model
- 7.1.2 Evaluate the product
- 7.2 Define business metrics
- 7.2.1 Be specific to your business
- 7.2.2 Update business metrics periodically
- 7.2.3 Business metric timescales
- 7.3 Trade off multiple business metrics
- 7.3.1 Reduce negative side effects
- 7.3.2 Evaluate with multiple metrics
- Summary
- 8 Practical considerations
- 8.1 Violations of statistical assumptions
- 8.1.1 Violation of the iid assumption
- 8.1.2 Nonstationarity
- 8.2 Don't stop early
- 8.3 Control family-wise error
- 8.3.1 Cherry-picking increases the false-positive rate
- 8.3.2 Control false positives with the Bonferroni correction
- 8.4 Be aware of common biases
- 8.4.1 Confounder bias
- 8.4.2 Small-sample bias
- 8.4.3 Optimism bias
- 8.4.4 Experimenter bias
- 8.5 Replicate to validate results
- 8.5.1 Validate complex experiments
- 8.5.2 Monitor changes with a reverse A/B test
- 8.5.3 Measure quarterly changes with holdouts
- 8.6 Wrapping up
- Summary
- Appendix A Linear regression and the normal equations
- A.1 Univariate linear regression
- A.2 Multivariate linear regression
- Appendix B One factor at a time
- Appendix C Gaussian process regression
- index
- inside back cover