Vector Search for Practitioners with Elastic: A Toolkit for Building NLP Solutions for Search, Observability, and Security Using Vector Search
Optimize your search capabilities in Elastic by operationalizing and fine-tuning vector search and enhance your search relevance while improving overall search performance.
Key Features:
- Install, configure, and optimize the ChatGPT-Elasticsearch plugin with a focus on vector data
- Learn how to load tra...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England : Packt Publishing Ltd [2023] |
Edition: | First edition |
Subjects: | |
View in the Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009785405406719 |
Table of Contents:
- Cover
- Title Page
- Copyright and Credit
- Dedication
- Foreword
- Contributors
- Table of Contents
- Preface
- Part 1: Fundamentals of Vector Search
- Chapter 1: Introduction to Vectors and Embeddings
- Exploring the roles of supervised and unsupervised learning in vector search
- What's an embedding/vector?
- What challenges are vectors solving?
- The developer experience
- Hugging Face
- The market landscape and how it has accelerated the developer experience
- Use cases and domains of application
- AI-based search
- Named Entity Recognition (NER)
- Sentiment analysis
- Text classification
- Question-answering (QA)
- Text summarization
- How is Elastic playing a role in this space?
- A primer on observability and cybersecurity
- Summary
- Chapter 2: Getting Started with Vector Search in Elastic
- Search experience in Elastic before vectors
- Data type and its impact on relevancy
- The relevancy model
- Evolution of search experience
- The limits of keyword-based search
- Vector representation
- The new vector data type and the vector search query API
- Sparse and dense vectors
- An Elastic Cloud quick start
- Dense vector mapping
- Brute-force KNN search
- KNN search
- Summary
- Part 2: Advanced Applications and Performance Optimization
- Chapter 3: Model Management and Vector Considerations in Elastic
- Technical requirements
- Hugging Face
- Model Hub
- Datasets
- Spaces
- Eland
- Loading a Sentence Transformer from Hugging Face into Elasticsearch
- Configuring Elasticsearch authentication
- Loading a model from the Hugging Face Hub
- Downloading the model
- Loading the model into Elasticsearch
- Starting the model
- Deploying the model
- Generating a vector for a query
- Generating vectors in Elasticsearch
- Planning for cluster capacity and resources
- CPU and memory requirements
- Disk requirements
- Analyze Index Disk Usage API
- ML node capacity
- Storage efficiency strategies
- Reducing dimensionality
- Quantization
- Excluding dense_vector from _source
- Summary
- Chapter 4: Performance Tuning - Working with Data
- Deploying an NLP model
- Loading a model into Elasticsearch
- Model deployment configurations
- Load testing
- Rally
- RAM estimation
- Troubleshooting slowdown
- Summary
- Part 3: Specialized Use Cases
- Chapter 5: Image Search
- Overview of image search
- The evolution of image search
- The mechanism behind image search
- The role of vector similarity search
- Image search in practice
- Vector search with images
- Image vectorization
- Indexing image vectors in Elasticsearch
- k-Nearest Neighbor (kNN) search
- Challenges and limitations with image search
- Multi-modal models for vector search
- Introduction and rationale
- Understanding the concept of vector space in multi-modal models
- Introduction to the OpenAI clip-ViT-B-32-multilingual-v1 model
- Implementing vector search for diverse media types
- Summary
- Chapter 6: Redacting Personal Identifiable Information Using Elasticsearch
- Overview of PII and redaction
- Types of data that may contain PII
- Risks of storing PII in logs
- How PII is leaked or lost
- Redacting PII with NER models and regex patterns
- NER models
- Regex patterns
- Combining NER models and regex (or grok) patterns for PII redaction
- PII redaction pipeline in Elasticsearch
- Generating synthetic PII
- Installing the default pipeline
- Expected results
- Expanding and customizing options for the PII redaction pipeline in Elasticsearch
- Customizing the default PII example
- Cloning the pipeline to create different versions for different data streams
- Fine-tuning NER models for particular datasets
- Logic for contextual awareness
- Summary
- Chapter 7: Next Generation of Observability Powered by Vectors
- Introduction to observability and its importance in modern software systems
- Observability: main pillars
- Log analytics and its role in observability
- A new approach: applying vectors and embeddings to log analytics
- Approach 1: training or fine-tuning an existing model for logs
- Approach 2: generating human-understandable descriptions and vectorizing these descriptions
- Log vectorization
- Synthetic log
- Expanding logs at write with OpenAI
- Semantic search on our logs
- Building a query using log vectorization
- Loading a model
- Ingest pipeline
- Semantic search
- Summary
- Chapter 8: The Power of Vectors and Embedding in Bolstering Cybersecurity
- Technical requirements
- Understanding the importance of email phishing detection
- What is phishing?
- Different types of phishing attacks
- Statistics on the frequency of phishing attacks
- Challenges in detecting phishing emails
- Role of automated detection
- Augmenting existing techniques with natural language processing
- Introducing ELSER
- The role of ELSER in GenAI
- Introduction to the Enron email dataset (ham or spam)
- Seeing ELSER in action
- Hardware consideration
- Downloading the ELSER model in Elastic
- Setting up the index and ingestion pipeline
- Semantic search with ELSER
- Limitations of ELSER
- Summary
- Part 4: Innovative Integrations and Future Directions
- Chapter 9: Retrieval Augmented Generation with Elastic
- Preparing for RAG-enhanced search with ELSER and RRF
- Semantic search with ELSER
- A recap of essential considerations for RAG
- Integrating ELSER with RRF
- Language models and RAG
- In-depth case study: implementing a RAG-enhanced CookBot
- Dataset overview: an introduction to the Allrecipes.com dataset
- Preparing data for RAG-enhanced search
- Building the retriever: RRF with ELSER
- Leveraging the retriever and implementing the generator
- Summary
- Chapter 10: Building an Elastic Plugin for ChatGPT
- Contextual foundations
- The paradigm of dynamic context
- Dynamic Context Layer plugin vision: architecture and flow
- Building the DCL
- Fetching the latest information from Elastic documentation
- Elevating data with Embedchain
- Integrating with ChatGPT: creating a real-time conversationalist
- Deployment
- Summary
- Index
- Other Books You May Enjoy