Modern Computer Vision with PyTorch: A Practical Roadmap from Deep Learning Fundamentals to Advanced Applications and Generative AI

Whether you are a beginner or are looking to progress in your computer vision career, this book guides you through the fundamentals of neural networks (NNs) and PyTorch and how to implement state-of-the-art architectures for real-world tasks. The second edition of Modern Computer Vision with PyTorch...

Bibliographic Details
Main author: Ayyadevara, V. Kishore (-)
Other authors: Reddy, Yeshwanth
Format: E-book
Language: English
Published: Birmingham : Packt Publishing, Limited 2023.
Edition: 2nd ed
Series: Expert insight.
Subjects:
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009828023106719
Table of Contents:
  • Cover
  • Copyright
  • Contributors
  • Table of Contents
  • Preface
  • Section 1: Fundamentals of Deep Learning for Computer Vision
  • Chapter 1: Artificial Neural Network Fundamentals
  • Comparing AI and traditional machine learning
  • Learning about the ANN building blocks
  • Implementing feedforward propagation
  • Calculating the hidden layer unit values
  • Applying the activation function
  • Calculating the output layer values
  • Calculating loss values
  • Calculating loss during continuous variable prediction
  • Calculating loss during categorical variable prediction
  • Feedforward propagation in code
  • Activation functions in code
  • Loss functions in code
  • Implementing backpropagation
  • Gradient descent in code
  • Implementing backpropagation using the chain rule
  • Putting feedforward propagation and backpropagation together
  • Understanding the impact of the learning rate
  • Learning rate of 0.01
  • Learning rate of 0.1
  • Learning rate of 1
  • Summarizing the training process of a neural network
  • Summary
  • Questions
  • Chapter 2: PyTorch Fundamentals
  • Installing PyTorch
  • PyTorch tensors
  • Initializing a tensor
  • Operations on tensors
  • Auto gradients of tensor objects
  • Advantages of PyTorch's tensors over NumPy's ndarrays
  • Building a neural network using PyTorch
  • Dataset, DataLoader, and batch size
  • Predicting on new data points
  • Implementing a custom loss function
  • Fetching the values of intermediate layers
  • Using a sequential method to build a neural network
  • Saving and loading a PyTorch model
  • Using state_dict
  • Saving
  • Loading
  • Summary
  • Questions
  • Chapter 3: Building a Deep Neural Network with PyTorch
  • Representing an image
  • Converting images into structured arrays and scalars
  • Creating a structured array for colored images
  • Why leverage neural networks for image analysis?
  • Preparing our data for image classification
  • Training a neural network
  • Scaling a dataset to improve model accuracy
  • Understanding the impact of varying the batch size
  • Batch size of 32
  • Batch size of 10,000
  • Understanding the impact of varying the loss optimizer
  • Building a deeper neural network
  • Understanding the impact of batch normalization
  • Very small input values without batch normalization
  • Very small input values with batch normalization
  • The concept of overfitting
  • Impact of adding dropout
  • Impact of regularization
  • L1 regularization
  • L2 regularization
  • Summary
  • Questions
  • Section 2: Object Classification and Detection
  • Chapter 4: Introducing Convolutional Neural Networks
  • The problem with traditional deep neural networks
  • Building blocks of a CNN
  • Convolution
  • Filters
  • Strides and padding
  • Strides
  • Padding
  • Pooling
  • Putting them all together
  • How convolution and pooling help in image translation
  • Implementing a CNN
  • Classifying images using deep CNNs
  • Visualizing the outcome of feature learning
  • Building a CNN for classifying real-world images
  • Impact on the number of images used for training
  • Summary
  • Questions
  • Chapter 5: Transfer Learning for Image Classification
  • Introducing transfer learning
  • Understanding the VGG16 architecture
  • Implementing VGG16
  • Understanding the ResNet architecture
  • Implementing ResNet18
  • Implementing facial keypoint detection
  • 2D and 3D facial keypoint detection
  • Implementing age estimation and gender classification
  • Introducing the torch_snippets library
  • Summary
  • Questions
  • Chapter 6: Practical Aspects of Image Classification
  • Generating CAMs
  • Understanding the impact of data augmentation and batch normalization
  • Coding up road sign detection
  • Practical aspects to take care of during model implementation
  • Imbalanced data
  • The size of the object within an image
  • The difference between training and validation data
  • The number of nodes in the flatten layer
  • Image size
  • OpenCV utilities
  • Summary
  • Questions
  • Chapter 7: Basics of Object Detection
  • Introducing object detection
  • Creating a bounding-box ground truth for training
  • Understanding region proposals
  • Leveraging SelectiveSearch to generate region proposals
  • Implementing SelectiveSearch to generate region proposals
  • Understanding IoU
  • Non-max suppression
  • Mean average precision
  • Training R-CNN-based custom object detectors
  • Working details of R-CNN
  • Implementing R-CNN for object detection on a custom dataset
  • Downloading the dataset
  • Preparing the dataset
  • Fetching region proposals and the ground truth of offset
  • Creating the training data
  • R-CNN network architecture
  • Predicting on a new image
  • Training Fast R-CNN-based custom object detectors
  • Working details of Fast R-CNN
  • Implementing Fast R-CNN for object detection on a custom dataset
  • Summary
  • Questions
  • Chapter 8: Advanced Object Detection
  • Components of modern object detection algorithms
  • Anchor boxes
  • Region proposal network
  • Classification and regression
  • Training Faster R-CNN on a custom dataset
  • Working details of YOLO
  • Training YOLO on a custom dataset
  • Installing Darknet
  • Setting up the dataset format
  • Configuring the architecture
  • Training and testing the model
  • Working details of SSD
  • Components in SSD code
  • SSD300
  • MultiBoxLoss
  • Training SSD on a custom dataset
  • Summary
  • Questions
  • Chapter 9: Image Segmentation
  • Exploring the U-Net architecture
  • Performing upscaling
  • Implementing semantic segmentation using U-Net
  • Exploring the Mask R-CNN architecture
  • RoI Align
  • Mask head
  • Implementing instance segmentation using Mask R-CNN
  • Predicting multiple instances of multiple classes
  • Summary
  • Questions
  • Chapter 10: Applications of Object Detection and Segmentation
  • Multi-object instance segmentation
  • Fetching and preparing data
  • Training the model for instance segmentation
  • Making inferences on a new image
  • Human pose detection
  • Crowd counting
  • Implementing crowd counting
  • Image colorization
  • 3D object detection with point clouds
  • Theory
  • Input encoding
  • Output encoding
  • Training the YOLO model for 3D object detection
  • Data format
  • Data inspection
  • Training
  • Testing
  • Action recognition from video
  • Identifying an action in a given video
  • Training a recognizer on a custom dataset
  • Summary
  • Questions
  • Section 3: Image Manipulation
  • Chapter 11: Autoencoders and Image Manipulation
  • Understanding autoencoders
  • How autoencoders work
  • Implementing vanilla autoencoders
  • Implementing convolutional autoencoders
  • Grouping similar images using t-SNE
  • Understanding variational autoencoders
  • The need for VAEs
  • How VAEs work
  • KL divergence
  • Building a VAE
  • Performing an adversarial attack on images
  • Understanding neural style transfer
  • How neural style transfer works
  • Performing neural style transfer
  • Understanding deepfakes
  • How deepfakes work
  • Generating a deepfake
  • Summary
  • Questions
  • Chapter 12: Image Generation Using GANs
  • Introducing GANs
  • Using GANs to generate handwritten digits
  • Using DCGANs to generate face images
  • Implementing conditional GANs
  • Summary
  • Questions
  • Chapter 13: Advanced GANs to Manipulate Images
  • Leveraging the Pix2Pix GAN
  • Leveraging CycleGAN
  • How CycleGAN works
  • Implementing CycleGAN
  • Leveraging StyleGAN on custom images
  • The evolution of StyleGAN
  • Implementing StyleGAN
  • Introducing SRGAN
  • Architecture
  • Coding SRGAN
  • Summary
  • Questions
  • Section 4: Combining Computer Vision with Other Techniques
  • Chapter 14: Combining Computer Vision and Reinforcement Learning
  • Learning the basics of reinforcement learning
  • Calculating the state value
  • Calculating the state-action value
  • Implementing Q-learning
  • Defining the Q-value
  • Understanding the Gym environment
  • Building a Q-table
  • Leveraging exploration-exploitation
  • Implementing deep Q-learning
  • Understanding the CartPole environment
  • Performing CartPole balancing
  • Implementing deep Q-learning with the fixed targets model
  • Understanding the use case
  • Coding up an agent to play Pong
  • Implementing an agent to perform autonomous driving
  • Setting up the CARLA environment
  • Installing the CARLA binaries
  • Installing the CARLA Gym environment
  • Training a self-driving agent
  • Creating model.py
  • Creating actor.py
  • Training a DQN with fixed targets
  • Summary
  • Questions
  • Chapter 15: Combining Computer Vision and NLP Techniques
  • Introducing transformers
  • Basics of transformers
  • Encoder block
  • Decoder block
  • How ViTs work
  • Implementing ViTs
  • Transcribing handwritten images
  • Handwriting transcription workflow
  • Handwriting transcription in code
  • Document layout analysis
  • Understanding LayoutLM
  • Implementing LayoutLMv3
  • Visual question answering
  • Introducing BLIP2
  • Representation learning
  • Generative learning
  • Implementing BLIP2
  • Summary
  • Questions
  • Chapter 16: Foundation Models in Computer Vision
  • Introducing CLIP
  • How CLIP works
  • Building a CLIP model from scratch
  • Leveraging OpenAI CLIP
  • Introducing SAM
  • How SAM works
  • Implementing SAM
  • How FastSAM works
  • All-instance segmentation
  • Prompt-guided selection
  • Implementing FastSAM
  • Introducing diffusion models
  • How diffusion models work
  • Diffusion model architecture