Deep reinforcement learning with Python: with PyTorch, TensorFlow and OpenAI Gym
Other Authors: | |
---|---|
Format: | Electronic book |
Language: | English |
Published: | [Place of publication not identified] : Apress, [2021] |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631718806719 |
Table of Contents:
- Intro
- Table of Contents
- About the Author
- About the Technical Reviewer
- Acknowledgments
- Introduction
- Chapter 1: Introduction to Reinforcement Learning
- Reinforcement Learning
- Machine Learning Branches
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Core Elements
- Deep Learning with Reinforcement Learning
- Examples and Case Studies
- Autonomous Vehicles
- Robots
- Recommendation Systems
- Finance and Trading
- Healthcare
- Game Playing
- Libraries and Environment Setup
- Alternate Way to Install Local Environment
- Summary
- Chapter 2: Markov Decision Processes
- Definition of Reinforcement Learning
- Agent and Environment
- Rewards
- Markov Processes
- Markov Chains
- Markov Reward Processes
- Markov Decision Processes
- Policies and Value Functions
- Bellman Equations
- Optimality Bellman Equations
- Types of Solution Approaches with a Mind-Map
- Summary
- Chapter 3: Model-Based Algorithms
- OpenAI Gym
- Dynamic Programming
- Policy Evaluation/Prediction
- Policy Improvement and Iterations
- Value Iteration
- Generalized Policy Iteration
- Asynchronous Backups
- Summary
- Chapter 4: Model-Free Approaches
- Estimation/Prediction with Monte Carlo
- Bias and Variance of MC Prediction Methods
- Control with Monte Carlo
- Off-Policy MC Control
- Temporal Difference Learning Methods
- Temporal Difference Control
- On-Policy SARSA
- Q-Learning: An Off-Policy TD Control
- Maximization Bias and Double Learning
- Expected SARSA Control
- Replay Buffer and Off-Policy Learning
- Q-Learning for Continuous State Spaces
- n-Step Returns
- Eligibility Traces and TD(λ)
- Relationships Between DP, MC, and TD
- Summary
- Chapter 5: Function Approximation
- Introduction
- Theory of Approximation
- Coarse Coding
- Tile Encoding
- Challenges in Approximation
- Incremental Prediction: MC, TD, TD(λ)
- Incremental Control
- Semi-gradient N-step SARSA Control
- Semi-gradient SARSA(λ) Control
- Convergence in Functional Approximation
- Gradient Temporal Difference Learning
- Batch Methods (DQN)
- Linear Least Squares Method
- Deep Learning Libraries
- Summary
- Chapter 6: Deep Q-Learning
- Deep Q Networks
- Atari Game-Playing Agent Using DQN
- Prioritized Replay
- Double Q-Learning
- Dueling DQN
- NoisyNets DQN
- Categorical 51-Atom DQN (C51)
- Quantile Regression DQN
- Hindsight Experience Replay
- Summary
- Chapter 7: Policy Gradient Algorithms
- Introduction
- Pros and Cons of Policy-Based Methods
- Policy Representation
- Discrete Case
- Continuous Case
- Policy Gradient Derivation
- Objective Function
- Derivative Update Rule
- Intuition Behind the Update Rule
- REINFORCE Algorithm
- Variance Reduction with Reward to Go
- Further Variance Reduction with Baselines
- Actor-Critic Methods
- Defining Advantage
- Advantage Actor Critic
- Implementation of the A2C Algorithm
- Asynchronous Advantage Actor Critic
- Trust Region Policy Optimization Algorithm
- Proximal Policy Optimization Algorithm
- Summary
- Chapter 8: Combining Policy Gradient and Q-Learning
- Trade-Offs in Policy Gradient and Q-Learning
- General Framework to Combine Policy Gradient with Q-Learning
- Deep Deterministic Policy Gradient
- Q-Learning in DDPG (Critic)
- Policy Learning in DDPG (Actor)
- Pseudocode and Implementation
- Gym Environments Used in Code
- Code Listing
- Policy Network Actor (PyTorch)
- Policy Network Actor (TensorFlow)
- Q-Network Critic Implementation
- PyTorch
- TensorFlow
- Combined Model-Actor Critic Implementation
- Experience Replay
- Q-Loss Implementation
- PyTorch
- TensorFlow
- Policy Loss Implementation
- One-Step Update Implementation
- DDPG: Main Loop
- Twin Delayed DDPG
- Target-Policy Smoothing
- Q-Loss (Critic)
- Policy Loss (Actor)
- Delayed Update
- Pseudocode and Implementation
- Code Implementation
- Combined Model-Actor Critic Implementation
- Q-Loss Implementation
- Policy-Loss Implementation
- One-Step Update Implementation
- TD3 Main Loop
- Reparameterization Trick
- Score/Reinforce Way
- Reparameterization Trick and Pathwise Derivatives
- Experiment
- Entropy Explained
- Soft Actor Critic
- SAC vs. TD3
- Q-Loss with Entropy-Regularization
- Policy Loss with Reparameterization Trick
- Pseudocode and Implementation
- Code Implementation
- Policy Network-Actor Implementation
- Q-Network, Combined Model, and Experience Replay
- Q-Loss and Policy-Loss Implementation
- One-Step Update and SAC Main Loop
- Summary
- Chapter 9: Integrated Planning and Learning
- Model-Based Reinforcement Learning
- Planning with a Learned Model
- Integrating Learning and Planning (Dyna)
- Dyna Q and Changing Environments
- Dyna Q+
- Expected vs. Sample Updates
- Exploration vs. Exploitation
- Multi-arm Bandit
- Regret: Measure of Quality of Exploration
- Epsilon Greedy Exploration
- Upper Confidence Bound Exploration
- Thompson Sampling Exploration
- Comparing Different Exploration Strategies
- Planning at Decision Time and Monte Carlo Tree Search
- AlphaGo Walk-Through
- Summary
- Chapter 10: Further Exploration and Next Steps
- Model-Based RL: Additional Approaches
- World Models
- Imagination-Augmented Agents (I2A)
- Model-Based RL with Model-Free Fine-Tuning (MBMF)
- Model-Based Value Expansion (MBVE)
- Imitation Learning and Inverse Reinforcement Learning
- Derivative-Free Methods
- Transfer Learning and Multitask Learning
- Meta-Learning
- Popular RL Libraries
- How to Continue Studying
- Summary
- Index.