Experimentation for Engineers: From A/B testing to Bayesian optimization

Experimentation for Engineers: From A/B testing to Bayesian optimization is a toolbox of techniques for evaluating new features and fine-tuning parameters. You'll start with a deep dive into methods like A/B testing, and then graduate to advanced techniques used to measure performance in industry…


Bibliographic Details
Author: Sweet, David
Format: eBook
Language: English
Published: Shelter Island, NY: Manning Publications Co., [2023]
Edition: [First edition]
See on Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009724209506719
Table of Contents:
  • Intro
  • inside front cover
  • Experimentation for Engineers
  • Copyright
  • dedication
  • contents
  • front matter
  • preface
  • acknowledgments
  • about this book
  • Who should read this book
  • How this book is organized: A road map
  • About the code
  • liveBook discussion forum
  • about the author
  • about the cover illustration
  • 1 Optimizing systems by experiment
  • 1.1 Examples of engineering workflows
  • 1.1.1 Machine learning engineer's workflow
  • 1.1.2 Quantitative trader's workflow
  • 1.1.3 Software engineer's workflow
  • 1.2 Measuring by experiment
  • 1.2.1 Experimental methods
  • 1.2.2 Practical problems and pitfalls
  • 1.3 Why are experiments necessary?
  • 1.3.1 Domain knowledge
  • 1.3.2 Offline model quality
  • 1.3.3 Simulation
  • Summary
  • 2 A/B testing: Evaluating a modification to your system
  • 2.1 Take an ad hoc measurement
  • 2.1.1 Simulate the trading system
  • 2.1.2 Compare execution costs
  • 2.2 Take a precise measurement
  • 2.2.1 Mitigate measurement variation with replication
  • 2.3 Run an A/B test
  • 2.3.1 Analyze your measurements
  • 2.3.2 Design the A/B test
  • 2.3.3 Measure and analyze
  • 2.3.4 Recap of A/B test stages
  • Summary
  • 3 Multi-armed bandits: Maximizing business metrics while experimenting
  • 3.1 Epsilon-greedy: Account for the impact of evaluation on business metrics
  • 3.1.1 A/B testing as a baseline
  • 3.1.2 The epsilon-greedy algorithm
  • 3.1.3 Deciding when to stop
  • 3.2 Evaluating multiple system changes simultaneously
  • 3.3 Thompson sampling: A more efficient MAB algorithm
  • 3.3.1 Estimate the probability that an arm is the best
  • 3.3.2 Randomized probability matching
  • 3.3.3 The complete algorithm
  • Summary
  • 4 Response surface methodology: Optimizing continuous parameters
  • 4.1 Optimize a single continuous parameter
  • 4.1.1 Design: Choose parameter values to measure
  • 4.1.2 Take the measurements
  • 4.1.3 Analyze I: Interpolate between measurements
  • 4.1.4 Analyze II: Optimize the business metric
  • 4.1.5 Validate the optimal parameter value
  • 4.2 Optimizing two or more continuous parameters
  • 4.2.1 Design the two-parameter experiment
  • 4.2.2 Measure, analyze, and validate the 2D experiment
  • Summary
  • 5 Contextual bandits: Making targeted decisions
  • 5.1 Model a business metric offline to make decisions online
  • 5.1.1 Model the business-metric outcome of a decision
  • 5.1.2 Add the decision-making component
  • 5.1.3 Run and evaluate the greedy recommender
  • 5.2 Explore actions with epsilon-greedy
  • 5.2.1 Missing counterfactuals degrade predictions
  • 5.2.2 Explore with epsilon-greedy to collect counterfactuals
  • 5.3 Explore parameters with Thompson sampling
  • 5.3.1 Create an ensemble of prediction models
  • 5.3.2 Randomized probability matching
  • 5.4 Validate the contextual bandit
  • Summary
  • 6 Bayesian optimization: Automating experimental optimization
  • 6.1 Optimizing a single compiler parameter, a visual explanation
  • 6.1.1 Simulate the compiler
  • 6.1.2 Run the initial experiment
  • 6.1.3 Analyze: Model the response surface
  • 6.1.4 Design: Select the parameter value to measure next
  • 6.1.5 Design: Balance exploration with exploitation
  • 6.2 Model the response surface with Gaussian process regression
  • 6.2.1 Estimate the expected CPU time
  • 6.2.2 Estimate uncertainty with GPR
  • 6.3 Optimize over an acquisition function
  • 6.3.1 Minimize the acquisition function
  • 6.4 Optimize all seven compiler parameters
  • 6.4.1 Random search
  • 6.4.2 A complete Bayesian optimization
  • Summary
  • 7 Managing business metrics
  • 7.1 Focus on the business
  • 7.1.1 Don't evaluate a model
  • 7.1.2 Evaluate the product
  • 7.2 Define business metrics
  • 7.2.1 Be specific to your business
  • 7.2.2 Update business metrics periodically
  • 7.2.3 Business metric timescales
  • 7.3 Trade off multiple business metrics
  • 7.3.1 Reduce negative side effects
  • 7.3.2 Evaluate with multiple metrics
  • Summary
  • 8 Practical considerations
  • 8.1 Violations of statistical assumptions
  • 8.1.1 Violation of the iid assumption
  • 8.1.2 Nonstationarity
  • 8.2 Don't stop early
  • 8.3 Control family-wise error
  • 8.3.1 Cherry-picking increases the false-positive rate
  • 8.3.2 Control false positives with the Bonferroni correction
  • 8.4 Be aware of common biases
  • 8.4.1 Confounder bias
  • 8.4.2 Small-sample bias
  • 8.4.3 Optimism bias
  • 8.4.4 Experimenter bias
  • 8.5 Replicate to validate results
  • 8.5.1 Validate complex experiments
  • 8.5.2 Monitor changes with a reverse A/B test
  • 8.5.3 Measure quarterly changes with holdouts
  • 8.6 Wrapping up
  • Summary
  • Appendix A Linear regression and the normal equations
  • A.1 Univariate linear regression
  • A.2 Multivariate linear regression
  • Appendix B One factor at a time
  • Appendix C Gaussian process regression
  • index
  • inside back cover