Synthetic Data for Machine Learning: Revolutionize Your Approach to Machine Learning with This Comprehensive Conceptual Guide

Conquer data hurdles, supercharge your ML journey, and become a leader in your field with synthetic data generation techniques, best practices, and case studies.

Key Features: Avoid common data issues by identifying and solving them using synthetic data-based solutions. Master synthetic data generation...

Bibliographic Details
Main author: Kerim, Abdulrahman
Format: E-book
Language: English
Published: Birmingham : Packt Publishing, Limited 2023.
Edition: 1st ed
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009781238006719
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credits
  • Dedications
  • Contributors
  • Table of Contents
  • Part 1: Real Data Issues, Limitations, and Challenges
  • Chapter 1: Machine Learning and the Need for Data
  • Technical requirements
  • Artificial intelligence, machine learning, and deep learning
  • Artificial intelligence (AI)
  • Machine learning (ML)
  • Deep learning (DL)
  • Why are ML and DL so powerful?
  • Feature engineering
  • Transfer across tasks
  • Training ML models
  • Collecting and annotating data
  • Designing and training an ML model
  • Validating and testing an ML model
  • Iterations in the ML development process
  • Summary
  • Chapter 2: Annotating Real Data
  • Annotating data for ML
  • Learning from data
  • Training your ML model
  • Testing your ML model
  • Issues with the annotation process
  • The annotation process is expensive
  • The annotation process is error-prone
  • The annotation process is biased
  • Optical flow and depth estimation
  • Ground truth generation for computer vision
  • Optical flow estimation
  • Depth estimation
  • Summary
  • Chapter 3: Privacy Issues in Real Data
  • Why is privacy an issue in ML?
  • ML task
  • Dataset size
  • Regulations
  • What exactly is the privacy problem in ML?
  • Copyright and intellectual property infringement
  • Privacy and reproducibility of experiments
  • Privacy issues and bias
  • Privacy-preserving ML
  • Approaches for privacy-preserving datasets
  • Approaches for privacy-preserving ML
  • Real data challenges and issues
  • Summary
  • Part 2: An Overview of Synthetic Data for Machine Learning
  • Chapter 4: An Introduction to Synthetic Data
  • Technical requirements
  • What is synthetic data?
  • Synthetic and real data
  • Data-centric and architecture-centric approaches in ML
  • History of synthetic data
  • Random number generators
  • Generative Adversarial Networks (GANs)
  • Synthetic data for privacy issues
  • Synthetic data in computer vision
  • Synthetic data and ethical considerations
  • Synthetic data types
  • Data augmentation
  • Geometric transformations
  • Noise injection
  • Text replacement, deletion, and injection
  • Summary
  • Chapter 5: Synthetic Data as a Solution
  • The main advantages of synthetic data
  • Unbiased
  • Diverse
  • Controllable
  • Scalable
  • Automatic data labeling
  • Annotation quality
  • Low cost
  • Solving privacy issues with synthetic data
  • Using synthetic data to solve time and efficiency issues
  • Synthetic data as a revolutionary solution for rare data
  • Synthetic data generation methods
  • Summary
  • Part 3: Synthetic Data Generation Approaches
  • Chapter 6: Leveraging Simulators and Rendering Engines to Generate Synthetic Data
  • Introduction to simulators and rendering engines
  • Simulators
  • Rendering and game engines
  • History and evolution of simulators and game engines
  • Generating synthetic data
  • Identify the task and ground truth to generate
  • Create the 3D virtual world in the game engine
  • Setting up the virtual camera
  • Adding noise and anomalies
  • Setting up the labeling pipeline
  • Generating the training data with the ground truth
  • Challenges and limitations
  • Realism
  • Diversity
  • Complexity
  • Looking at two case studies
  • AirSim
  • CARLA
  • Summary
  • Chapter 7: Exploring Generative Adversarial Networks
  • Technical requirements
  • What is a GAN?
  • Training a GAN
  • GAN training algorithm
  • Training loss
  • Challenges
  • Utilizing GANs to generate synthetic data
  • Hands-on GANs in practice
  • Variations of GANs
  • Conditional GAN (cGAN)
  • CycleGAN
  • Conditional Tabular GAN (CTGAN)
  • Wasserstein GAN (WGAN) and Wasserstein GAN with Gradient Penalty (WGAN-GP)
  • f-GAN
  • DragGAN
  • Summary
  • Chapter 8: Video Games as a Source of Synthetic Data
  • The impact of the video game industry
  • Photorealism and the real-synthetic domain shift
  • Time, effort, and cost
  • Generating synthetic data using video games
  • Utilizing games for general data collection
  • Utilizing games for social studies
  • Utilizing simulation games for data generation
  • Challenges and limitations
  • Controllability
  • Game genres and limitations on synthetic data generation
  • Realism
  • Ethical issues
  • Intellectual property
  • Summary
  • Chapter 9: Exploring Diffusion Models for Synthetic Data
  • Technical requirements
  • An introduction to diffusion models
  • The training process of DMs
  • Applications of DMs
  • Diffusion models - the pros and cons
  • The pros of using DMs
  • The cons of using DMs
  • Hands-on diffusion models in practice
  • Context
  • Dataset
  • ML model
  • Training
  • Testing
  • Diffusion models - ethical issues
  • Copyright
  • Bias
  • Inappropriate content
  • Responsibility
  • Privacy
  • Fraud and identity theft
  • Summary
  • Part 4: Case Studies and Best Practices
  • Chapter 10: Case Study 1 - Computer Vision
  • Transforming industries - the power of computer vision
  • The four waves of the industrial revolution
  • Industry 4.0 and computer vision
  • Synthetic data and computer vision - examples from industry
  • Neurolabs using synthetic data in retail
  • Microsoft using synthetic data alone for face analysis
  • Synthesis AI using synthetic data for virtual try-on
  • Summary
  • Chapter 11: Case Study 2 - Natural Language Processing
  • A brief introduction to NLP
  • Applications of NLP in practice
  • The need for large-scale training datasets in NLP
  • Human language complexity
  • Contextual dependence
  • Generalization
  • Hands-on practical example with ChatGPT
  • Synthetic data as a solution for NLP problems
  • SYSTRAN Soft's use of synthetic data
  • Telefónica's use of synthetic data
  • Clinical text mining utilizing synthetic data
  • The Alexa virtual assistant model
  • Summary
  • Chapter 12: Case Study 3 - Predictive Analytics
  • What is predictive analytics?
  • Applications of predictive analytics
  • Predictive analytics issues with real data
  • Partial and scarce training data
  • Bias
  • Cost
  • Case studies of utilizing synthetic data for predictive analytics
  • Provinzial and synthetic data
  • Healthcare benefits from synthetic data in predictive analytics
  • Amazon fraud transaction prediction using synthetic data
  • Summary
  • Chapter 13: Best Practices for Applying Synthetic Data
  • Unveiling the challenges of generating and utilizing synthetic data
  • Domain gap
  • Data representation
  • Privacy, security, and validation
  • Trust and credibility
  • Domain-specific issues limiting the usability of synthetic data
  • Healthcare
  • Finance
  • Autonomous cars
  • Best practices for the effective utilization of synthetic data
  • Summary
  • Part 5: Current Challenges and Future Perspectives
  • Chapter 14: Synthetic-to-Real Domain Adaptation
  • The domain gap problem in ML
  • Sensitivity to sensors' variations
  • Discrepancy in class and feature distributions
  • Concept drift
  • Approaches for synthetic-to-real domain adaptation
  • Domain randomization
  • Adversarial domain adaptation
  • Feature-based domain adaptation
  • Synthetic-to-real domain adaptation - issues and challenges
  • Unseen domain
  • Limited real data
  • Computational complexity
  • Synthetic data limitations
  • Multimodal data complexity
  • Summary
  • Chapter 15: Diversity Issues in Synthetic Data
  • The need for diverse data in ML
  • Transferability
  • Better problem modeling
  • Security
  • Process of debugging
  • Robustness to anomalies
  • Creativity
  • Inclusivity
  • Generating diverse synthetic datasets
  • Latent space variations
  • Ensemble synthetic data generation
  • Diversity regularization
  • Incorporating external knowledge
  • Progressive training
  • Procedural content generation with game engines
  • Diversity issues in the synthetic data realm
  • Balancing diversity and realism
  • Privacy and confidentiality concerns
  • Validation and evaluation challenges
  • Summary
  • Chapter 16: Photorealism in Computer Vision
  • Synthetic data photorealism for computer vision
  • Feature extraction
  • Domain gap
  • Robustness
  • Benchmarking performance
  • Photorealism approaches
  • Physically Based Rendering (PBR)
  • Neural style transfer
  • Photorealism evaluation metrics
  • Structural Similarity Index Measure (SSIM)
  • Learned Perceptual Image Patch Similarity (LPIPS)
  • Expert evaluation
  • Challenges and limitations of photorealistic synthetic data
  • Creating hyper-realistic scenes
  • Resources versus photorealism trade-off
  • Summary
  • Chapter 17: Conclusion
  • Real data and its problems
  • Synthetic data as a solution
  • Real-world case studies
  • Challenges and limitations
  • Future perspectives
  • Summary
  • Index
  • Other Books You May Enjoy