Deep reinforcement learning with Python: with PyTorch, TensorFlow and OpenAI Gym

Bibliographic Details
Other Authors: Sanghi, Nimish (author)
Format: E-book
Language: English
Published: [Place of publication not identified] : Apress [2021]
Subjects:
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631718806719
Table of Contents:
  • Intro
  • Table of Contents
  • About the Author
  • About the Technical Reviewer
  • Acknowledgments
  • Introduction
  • Chapter 1: Introduction to Reinforcement Learning
  • Reinforcement Learning
  • Machine Learning Branches
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • Core Elements
  • Deep Learning with Reinforcement Learning
  • Examples and Case Studies
  • Autonomous Vehicles
  • Robots
  • Recommendation Systems
  • Finance and Trading
  • Healthcare
  • Game Playing
  • Libraries and Environment Setup
  • Alternate Way to Install Local Environment
  • Summary
  • Chapter 2: Markov Decision Processes
  • Definition of Reinforcement Learning
  • Agent and Environment
  • Rewards
  • Markov Processes
  • Markov Chains
  • Markov Reward Processes
  • Markov Decision Processes
  • Policies and Value Functions
  • Bellman Equations
  • Optimality Bellman Equations
  • Types of Solution Approaches with a Mind-Map
  • Summary
  • Chapter 3: Model-Based Algorithms
  • OpenAI Gym
  • Dynamic Programming
  • Policy Evaluation/Prediction
  • Policy Improvement and Iterations
  • Value Iteration
  • Generalized Policy Iteration
  • Asynchronous Backups
  • Summary
  • Chapter 4: Model-Free Approaches
  • Estimation/Prediction with Monte Carlo
  • Bias and Variance of MC Prediction Methods
  • Control with Monte Carlo
  • Off-Policy MC Control
  • Temporal Difference Learning Methods
  • Temporal Difference Control
  • On-Policy SARSA
  • Q-Learning: An Off-Policy TD Control
  • Maximization Bias and Double Learning
  • Expected SARSA Control
  • Replay Buffer and Off-Policy Learning
  • Q-Learning for Continuous State Spaces
  • n-Step Returns
  • Eligibility Traces and TD(λ)
  • Relationships Between DP, MC, and TD
  • Summary
  • Chapter 5: Function Approximation
  • Introduction
  • Theory of Approximation
  • Coarse Coding
  • Tile Encoding
  • Challenges in Approximation
  • Incremental Prediction: MC, TD, TD(λ)
  • Incremental Control
  • Semi-gradient N-step SARSA Control
  • Semi-gradient SARSA(λ) Control
  • Convergence in Functional Approximation
  • Gradient Temporal Difference Learning
  • Batch Methods (DQN)
  • Linear Least Squares Method
  • Deep Learning Libraries
  • Summary
  • Chapter 6: Deep Q-Learning
  • Deep Q Networks
  • Atari Game-Playing Agent Using DQN
  • Prioritized Replay
  • Double Q-Learning
  • Dueling DQN
  • NoisyNets DQN
  • Categorical 51-Atom DQN (C51)
  • Quantile Regression DQN
  • Hindsight Experience Replay
  • Summary
  • Chapter 7: Policy Gradient Algorithms
  • Introduction
  • Pros and Cons of Policy-Based Methods
  • Policy Representation
  • Discrete Case
  • Continuous Case
  • Policy Gradient Derivation
  • Objective Function
  • Derivative Update Rule
  • Intuition Behind the Update Rule
  • REINFORCE Algorithm
  • Variance Reduction with Reward to Go
  • Further Variance Reduction with Baselines
  • Actor-Critic Methods
  • Defining Advantage
  • Advantage Actor Critic
  • Implementation of the A2C Algorithm
  • Asynchronous Advantage Actor Critic
  • Trust Region Policy Optimization Algorithm
  • Proximal Policy Optimization Algorithm
  • Summary
  • Chapter 8: Combining Policy Gradient and Q-Learning
  • Trade-Offs in Policy Gradient and Q-Learning
  • General Framework to Combine Policy Gradient with Q-Learning
  • Deep Deterministic Policy Gradient
  • Q-Learning in DDPG (Critic)
  • Policy Learning in DDPG (Actor)
  • Pseudocode and Implementation
  • Gym Environments Used in Code
  • Code Listing
  • Policy Network Actor (PyTorch)
  • Policy Network Actor (TensorFlow)
  • Q-Network Critic Implementation
  • PyTorch
  • TensorFlow
  • Combined Model-Actor Critic Implementation
  • Experience Replay
  • Q-Loss Implementation
  • PyTorch
  • TensorFlow
  • Policy Loss Implementation
  • One-Step Update Implementation
  • DDPG: Main Loop
  • Twin Delayed DDPG
  • Target-Policy Smoothing
  • Q-Loss (Critic)
  • Policy Loss (Actor)
  • Delayed Update
  • Pseudocode and Implementation
  • Code Implementation
  • Combined Model-Actor Critic Implementation
  • Q-Loss Implementation
  • Policy-Loss Implementation
  • One-Step Update Implementation
  • TD3 Main Loop
  • Reparameterization Trick
  • Score/Reinforce Way
  • Reparameterization Trick and Pathwise Derivatives
  • Experiment
  • Entropy Explained
  • Soft Actor Critic
  • SAC vs. TD3
  • Q-Loss with Entropy-Regularization
  • Policy Loss with Reparameterization Trick
  • Pseudocode and Implementation
  • Code Implementation
  • Policy Network-Actor Implementation
  • Q-Network, Combined Model, and Experience Replay
  • Q-Loss and Policy-Loss Implementation
  • One-Step Update and SAC Main Loop
  • Summary
  • Chapter 9: Integrated Planning and Learning
  • Model-Based Reinforcement Learning
  • Planning with a Learned Model
  • Integrating Learning and Planning (Dyna)
  • Dyna Q and Changing Environments
  • Dyna Q+
  • Expected vs. Sample Updates
  • Exploration vs. Exploitation
  • Multi-arm Bandit
  • Regret: Measure of Quality of Exploration
  • Epsilon Greedy Exploration
  • Upper Confidence Bound Exploration
  • Thompson Sampling Exploration
  • Comparing Different Exploration Strategies
  • Planning at Decision Time and Monte Carlo Tree Search
  • AlphaGo Walk-Through
  • Summary
  • Chapter 10: Further Exploration and Next Steps
  • Model-Based RL: Additional Approaches
  • World Models
  • Imagination-Augmented Agents (I2A)
  • Model-Based RL with Model-Free Fine-Tuning (MBMF)
  • Model-Based Value Expansion (MBVE)
  • Imitation Learning and Inverse Reinforcement Learning
  • Derivative-Free Methods
  • Transfer Learning and Multitask Learning
  • Meta-Learning
  • Popular RL Libraries
  • How to Continue Studying
  • Summary
  • Index