Modern Computer Vision with PyTorch: A Practical Roadmap from Deep Learning Fundamentals to Advanced Applications and Generative AI
Whether you are a beginner or are looking to progress in your computer vision career, this book guides you through the fundamentals of neural networks (NNs) and PyTorch and how to implement state-of-the-art architectures for real-world tasks. The second edition of Modern Computer Vision with PyTorch...
Main author: | |
---|---|
Other authors: | |
Format: | E-book |
Language: | English |
Published: | Birmingham: Packt Publishing, Limited, 2023. |
Edition: | 2nd ed. |
Series: | Expert insight. |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009828023106719 |
Table of Contents:
- Cover
- Copyright
- Contributors
- Table of Contents
- Preface
- Section 1: Fundamentals of Deep Learning for Computer Vision
- Chapter 1: Artificial Neural Network Fundamentals
- Comparing AI and traditional machine learning
- Learning about the ANN building blocks
- Implementing feedforward propagation
- Calculating the hidden layer unit values
- Applying the activation function
- Calculating the output layer values
- Calculating loss values
- Calculating loss during continuous variable prediction
- Calculating loss during categorical variable prediction
- Feedforward propagation in code
- Activation functions in code
- Loss functions in code
- Implementing backpropagation
- Gradient descent in code
- Implementing backpropagation using the chain rule
- Putting feedforward propagation and backpropagation together
- Understanding the impact of the learning rate
- Learning rate of 0.01
- Learning rate of 0.1
- Learning rate of 1
- Summarizing the training process of a neural network
- Summary
- Questions
- Chapter 2: PyTorch Fundamentals
- Installing PyTorch
- PyTorch tensors
- Initializing a tensor
- Operations on tensors
- Auto gradients of tensor objects
- Advantages of PyTorch's tensors over NumPy's ndarrays
- Building a neural network using PyTorch
- Dataset, DataLoader, and batch size
- Predicting on new data points
- Implementing a custom loss function
- Fetching the values of intermediate layers
- Using a sequential method to build a neural network
- Saving and loading a PyTorch model
- Using state_dict
- Saving
- Loading
- Summary
- Questions
- Chapter 3: Building a Deep Neural Network with PyTorch
- Representing an image
- Converting images into structured arrays and scalars
- Creating a structured array for colored images
- Why leverage neural networks for image analysis?
- Preparing our data for image classification
- Training a neural network
- Scaling a dataset to improve model accuracy
- Understanding the impact of varying the batch size
- Batch size of 32
- Batch size of 10,000
- Understanding the impact of varying the loss optimizer
- Building a deeper neural network
- Understanding the impact of batch normalization
- Very small input values without batch normalization
- Very small input values with batch normalization
- The concept of overfitting
- Impact of adding dropout
- Impact of regularization
- L1 regularization
- L2 regularization
- Summary
- Questions
- Section 2: Object Classification and Detection
- Chapter 4: Introducing Convolutional Neural Networks
- The problem with traditional deep neural networks
- Building blocks of a CNN
- Convolution
- Filters
- Strides and padding
- Strides
- Padding
- Pooling
- Putting them all together
- How convolution and pooling help in image translation
- Implementing a CNN
- Classifying images using deep CNNs
- Visualizing the outcome of feature learning
- Building a CNN for classifying real-world images
- Impact on the number of images used for training
- Summary
- Questions
- Chapter 5: Transfer Learning for Image Classification
- Introducing transfer learning
- Understanding the VGG16 architecture
- Implementing VGG16
- Understanding the ResNet architecture
- Implementing ResNet18
- Implementing facial keypoint detection
- 2D and 3D facial keypoint detection
- Implementing age estimation and gender classification
- Introducing the torch_snippets library
- Summary
- Questions
- Chapter 6: Practical Aspects of Image Classification
- Generating CAMs
- Understanding the impact of data augmentation and batch normalization
- Coding up road sign detection
- Practical aspects to take care of during model implementation
- Imbalanced data
- The size of the object within an image
- The difference between training and validation data
- The number of nodes in the flatten layer
- Image size
- OpenCV utilities
- Summary
- Questions
- Chapter 7: Basics of Object Detection
- Introducing object detection
- Creating a bounding-box ground truth for training
- Understanding region proposals
- Leveraging SelectiveSearch to generate region proposals
- Implementing SelectiveSearch to generate region proposals
- Understanding IoU
- Non-max suppression
- Mean average precision
- Training R-CNN-based custom object detectors
- Working details of R-CNN
- Implementing R-CNN for object detection on a custom dataset
- Downloading the dataset
- Preparing the dataset
- Fetching region proposals and the ground truth of offset
- Creating the training data
- R-CNN network architecture
- Predicting on a new image
- Training Fast R-CNN-based custom object detectors
- Working details of Fast R-CNN
- Implementing Fast R-CNN for object detection on a custom dataset
- Summary
- Questions
- Chapter 8: Advanced Object Detection
- Components of modern object detection algorithms
- Anchor boxes
- Region proposal network
- Classification and regression
- Training Faster R-CNN on a custom dataset
- Working details of YOLO
- Training YOLO on a custom dataset
- Installing Darknet
- Setting up the dataset format
- Configuring the architecture
- Training and testing the model
- Working details of SSD
- Components in SSD code
- SSD300
- MultiBoxLoss
- Training SSD on a custom dataset
- Summary
- Questions
- Chapter 9: Image Segmentation
- Exploring the U-Net architecture
- Performing upscaling
- Implementing semantic segmentation using U-Net
- Exploring the Mask R-CNN architecture
- RoI Align
- Mask head
- Implementing instance segmentation using Mask R-CNN
- Predicting multiple instances of multiple classes
- Summary
- Questions
- Chapter 10: Applications of Object Detection and Segmentation
- Multi-object instance segmentation
- Fetching and preparing data
- Training the model for instance segmentation
- Making inferences on a new image
- Human pose detection
- Crowd counting
- Implementing crowd counting
- Image colorization
- 3D object detection with point clouds
- Theory
- Input encoding
- Output encoding
- Training the YOLO model for 3D object detection
- Data format
- Data inspection
- Training
- Testing
- Action recognition from video
- Identifying an action in a given video
- Training a recognizer on a custom dataset
- Summary
- Questions
- Section 3: Image Manipulation
- Chapter 11: Autoencoders and Image Manipulation
- Understanding autoencoders
- How autoencoders work
- Implementing vanilla autoencoders
- Implementing convolutional autoencoders
- Grouping similar images using t-SNE
- Understanding variational autoencoders
- The need for VAEs
- How VAEs work
- KL divergence
- Building a VAE
- Performing an adversarial attack on images
- Understanding neural style transfer
- How neural style transfer works
- Performing neural style transfer
- Understanding deepfakes
- How deepfakes work
- Generating a deepfake
- Summary
- Questions
- Chapter 12: Image Generation Using GANs
- Introducing GANs
- Using GANs to generate handwritten digits
- Using DCGANs to generate face images
- Implementing conditional GANs
- Summary
- Questions
- Chapter 13: Advanced GANs to Manipulate Images
- Leveraging the Pix2Pix GAN
- Leveraging CycleGAN
- How CycleGAN works
- Implementing CycleGAN
- Leveraging StyleGAN on custom images
- The evolution of StyleGAN
- Implementing StyleGAN
- Introducing SRGAN
- Architecture
- Coding SRGAN
- Summary
- Questions
- Section 4: Combining Computer Vision with Other Techniques
- Chapter 14: Combining Computer Vision and Reinforcement Learning
- Learning the basics of reinforcement learning
- Calculating the state value
- Calculating the state-action value
- Implementing Q-learning
- Defining the Q-value
- Understanding the Gym environment
- Building a Q-table
- Leveraging exploration-exploitation
- Implementing deep Q-learning
- Understanding the CartPole environment
- Performing CartPole balancing
- Implementing deep Q-learning with the fixed targets model
- Understanding the use case
- Coding up an agent to play Pong
- Implementing an agent to perform autonomous driving
- Setting up the CARLA environment
- Installing the CARLA binaries
- Installing the CARLA Gym environment
- Training a self-driving agent
- Creating model.py
- Creating actor.py
- Training a DQN with fixed targets
- Summary
- Questions
- Chapter 15: Combining Computer Vision and NLP Techniques
- Introducing transformers
- Basics of transformers
- Encoder block
- Decoder block
- How ViTs work
- Implementing ViTs
- Transcribing handwritten images
- Handwriting transcription workflow
- Handwriting transcription in code
- Document layout analysis
- Understanding LayoutLM
- Implementing LayoutLMv3
- Visual question answering
- Introducing BLIP2
- Representation learning
- Generative learning
- Implementing BLIP2
- Summary
- Questions
- Chapter 16: Foundation Models in Computer Vision
- Introducing CLIP
- How CLIP works
- Building a CLIP model from scratch
- Leveraging OpenAI CLIP
- Introducing SAM
- How SAM works
- Implementing SAM
- How FastSAM works
- All-instance segmentation
- Prompt-guided selection
- Implementing FastSAM
- Introducing diffusion models
- How diffusion models work
- Diffusion model architecture