Synthetic Data for Machine Learning: Revolutionize Your Approach to Machine Learning with This Comprehensive Conceptual Guide
Conquer data hurdles, supercharge your ML journey, and become a leader in your field with synthetic data generation techniques, best practices, and case studies. Key Features: Avoid common data issues by identifying and solving them using synthetic data-based solutions. Master synthetic data generation...
Main author: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham : Packt Publishing, Limited, 2023. |
Edition: | 1st ed |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009781238006719 |
Table of Contents:
- Cover
- Title Page
- Copyright and Credits
- Dedications
- Contributors
- Table of Contents
- Part 1: Real Data Issues, Limitations, and Challenges
- Chapter 1: Machine Learning and the Need for Data
- Technical requirements
- Artificial intelligence, machine learning, and deep learning
- Artificial intelligence (AI)
- Machine learning (ML)
- Deep learning (DL)
- Why are ML and DL so powerful?
- Feature engineering
- Transfer across tasks
- Training ML models
- Collecting and annotating data
- Designing and training an ML model
- Validating and testing an ML model
- Iterations in the ML development process
- Summary
- Chapter 2: Annotating Real Data
- Annotating data for ML
- Learning from data
- Training your ML model
- Testing your ML model
- Issues with the annotation process
- The annotation process is expensive
- The annotation process is error-prone
- The annotation process is biased
- Optical flow and depth estimation
- Ground truth generation for computer vision
- Optical flow estimation
- Depth estimation
- Summary
- Chapter 3: Privacy Issues in Real Data
- Why is privacy an issue in ML?
- ML task
- Dataset size
- Regulations
- What exactly is the privacy problem in ML?
- Copyright and intellectual property infringement
- Privacy and reproducibility of experiments
- Privacy issues and bias
- Privacy-preserving ML
- Approaches for privacy-preserving datasets
- Approaches for privacy-preserving ML
- Real data challenges and issues
- Summary
- Part 2: An Overview of Synthetic Data for Machine Learning
- Chapter 4: An Introduction to Synthetic Data
- Technical requirements
- What is synthetic data?
- Synthetic and real data
- Data-centric and architecture-centric approaches in ML
- History of synthetic data
- Random number generators
- Generative Adversarial Networks (GANs)
- Synthetic data for privacy issues
- Synthetic data in computer vision
- Synthetic data and ethical considerations
- Synthetic data types
- Data augmentation
- Geometric transformations
- Noise injection
- Text replacement, deletion, and injection
- Summary
- Chapter 5: Synthetic Data as a Solution
- The main advantages of synthetic data
- Unbiased
- Diverse
- Controllable
- Scalable
- Automatic data labeling
- Annotation quality
- Low cost
- Solving privacy issues with synthetic data
- Using synthetic data to solve time and efficiency issues
- Synthetic data as a revolutionary solution for rare data
- Synthetic data generation methods
- Summary
- Part 3: Synthetic Data Generation Approaches
- Chapter 6: Leveraging Simulators and Rendering Engines to Generate Synthetic Data
- Introduction to simulators and rendering engines
- Simulators
- Rendering and game engines
- History and evolution of simulators and game engines
- Generating synthetic data
- Identify the task and ground truth to generate
- Create the 3D virtual world in the game engine
- Setting up the virtual camera
- Adding noise and anomalies
- Setting up the labeling pipeline
- Generating the training data with the ground truth
- Challenges and limitations
- Realism
- Diversity
- Complexity
- Looking at two case studies
- AirSim
- CARLA
- Summary
- Chapter 7: Exploring Generative Adversarial Networks
- Technical requirements
- What is a GAN?
- Training a GAN
- GAN training algorithm
- Training loss
- Challenges
- Utilizing GANs to generate synthetic data
- Hands-on GANs in practice
- Variations of GANs
- Conditional GAN (cGAN)
- CycleGAN
- Conditional Tabular GAN (CTGAN)
- Wasserstein GAN (WGAN) and Wasserstein GAN with Gradient Penalty (WGAN-GP)
- f-GAN
- DragGAN
- Summary
- Chapter 8: Video Games as a Source of Synthetic Data
- The impact of the video game industry
- Photorealism and the real-synthetic domain shift
- Time, effort, and cost
- Generating synthetic data using video games
- Utilizing games for general data collection
- Utilizing games for social studies
- Utilizing simulation games for data generation
- Challenges and limitations
- Controllability
- Game genres and limitations on synthetic data generation
- Realism
- Ethical issues
- Intellectual property
- Summary
- Chapter 9: Exploring Diffusion Models for Synthetic Data
- Technical requirements
- An introduction to diffusion models
- The training process of DMs
- Applications of DMs
- Diffusion models - the pros and cons
- The pros of using DMs
- The cons of using DMs
- Hands-on diffusion models in practice
- Context
- Dataset
- ML model
- Training
- Testing
- Diffusion models - ethical issues
- Copyright
- Bias
- Inappropriate content
- Responsibility
- Privacy
- Fraud and identity theft
- Summary
- Part 4: Case Studies and Best Practices
- Chapter 10: Case Study 1 - Computer Vision
- Transforming industries - the power of computer vision
- The four waves of the industrial revolution
- Industry 4.0 and computer vision
- Synthetic data and computer vision - examples from industry
- Neurolabs using synthetic data in retail
- Microsoft using synthetic data alone for face analysis
- Synthesis AI using synthetic data for virtual try-on
- Summary
- Chapter 11: Case Study 2 - Natural Language Processing
- A brief introduction to NLP
- Applications of NLP in practice
- The need for large-scale training datasets in NLP
- Human language complexity
- Contextual dependence
- Generalization
- Hands-on practical example with ChatGPT
- Synthetic data as a solution for NLP problems
- SYSTRAN Soft's use of synthetic data
- Telefónica's use of synthetic data
- Clinical text mining utilizing synthetic data
- The Alexa virtual assistant model
- Summary
- Chapter 12: Case Study 3 - Predictive Analytics
- What is predictive analytics?
- Applications of predictive analytics
- Predictive analytics issues with real data
- Partial and scarce training data
- Bias
- Cost
- Case studies of utilizing synthetic data for predictive analytics
- Provinzial and synthetic data
- Healthcare benefits from synthetic data in predictive analytics
- Amazon fraud transaction prediction using synthetic data
- Summary
- Chapter 13: Best Practices for Applying Synthetic Data
- Unveiling the challenges of generating and utilizing synthetic data
- Domain gap
- Data representation
- Privacy, security, and validation
- Trust and credibility
- Domain-specific issues limiting the usability of synthetic data
- Healthcare
- Finance
- Autonomous cars
- Best practices for the effective utilization of synthetic data
- Summary
- Part 5: Current Challenges and Future Perspectives
- Chapter 14: Synthetic-to-Real Domain Adaptation
- The domain gap problem in ML
- Sensitivity to sensors' variations
- Discrepancy in class and feature distributions
- Concept drift
- Approaches for synthetic-to-real domain adaptation
- Domain randomization
- Adversarial domain adaptation
- Feature-based domain adaptation
- Synthetic-to-real domain adaptation - issues and challenges
- Unseen domain
- Limited real data
- Computational complexity
- Synthetic data limitations
- Multimodal data complexity
- Summary
- Chapter 15: Diversity Issues in Synthetic Data
- The need for diverse data in ML
- Transferability
- Better problem modeling
- Security
- Process of debugging
- Robustness to anomalies
- Creativity
- Inclusivity
- Generating diverse synthetic datasets
- Latent space variations
- Ensemble synthetic data generation
- Diversity regularization
- Incorporating external knowledge
- Progressive training
- Procedural content generation with game engines
- Diversity issues in the synthetic data realm
- Balancing diversity and realism
- Privacy and confidentiality concerns
- Validation and evaluation challenges
- Summary
- Chapter 16: Photorealism in Computer Vision
- Synthetic data photorealism for computer vision
- Feature extraction
- Domain gap
- Robustness
- Benchmarking performance
- Photorealism approaches
- Physically Based Rendering (PBR)
- Neural style transfer
- Photorealism evaluation metrics
- Structural Similarity Index Measure (SSIM)
- Learned Perceptual Image Patch Similarity (LPIPS)
- Expert evaluation
- Challenges and limitations of photorealistic synthetic data
- Creating hyper-realistic scenes
- Resources versus photorealism trade-off
- Summary
- Chapter 17: Conclusion
- Real data and its problems
- Synthetic data as a solution
- Real-world case studies
- Challenges and limitations
- Future perspectives
- Summary
- Index
- Other Books You May Enjoy