Vector Search for Practitioners with Elastic: A Toolkit for Building NLP Solutions for Search, Observability, and Security Using Vector Search

Optimize your search capabilities in Elastic by operationalizing and fine-tuning vector search, and enhance your search relevance while improving overall search performance. Key Features: Install, configure, and optimize the ChatGPT-Elasticsearch plugin with a focus on vector data. Learn how to load tra...

Bibliographic Details
Other Authors: Azarmi, Bahaaldine (author); Vestal, Jeff (author); Banon, Shay (author)
Format: E-book
Language: English
Published: Birmingham, England : Packt Publishing Ltd [2023]
Edition: First edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009785405406719
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credit
  • Dedication
  • Foreword
  • Contributors
  • Table of Contents
  • Preface
  • Part 1: Fundamentals of Vector Search
  • Chapter 1: Introduction to Vectors and Embeddings
  • Exploring the roles of supervised and unsupervised learning in vector search
  • What's an embedding/vector?
  • What challenges are vectors solving?
  • The developer experience
  • Hugging Face
  • The market landscape and how it has accelerated the developer experience
  • Use cases and domains of application
  • AI-based search
  • Named Entity Recognition (NER)
  • Sentiment analysis
  • Text classification
  • Question-answering (QA)
  • Text summarization
  • How is Elastic playing a role in this space?
  • A primer on observability and cybersecurity
  • Summary
  • Chapter 2: Getting Started with Vector Search in Elastic
  • Search experience in Elastic before vectors
  • Data type and its impact on relevancy
  • The relevancy model
  • Evolution of search experience
  • The limits of keyword-based search
  • Vector representation
  • The new vector data type and the vector search query API
  • Sparse and dense vectors
  • An Elastic Cloud quick start
  • Dense vector mapping
  • Brute-force KNN search
  • KNN search
  • Summary
  • Part 2: Advanced Applications and Performance Optimization
  • Chapter 3: Model Management and Vector Considerations in Elastic
  • Technical requirements
  • Hugging Face
  • Model Hub
  • Datasets
  • Spaces
  • Eland
  • Loading a Sentence Transformer from Hugging Face into Elasticsearch
  • Configuring Elasticsearch authentication
  • Loading a model from the Hugging Face Hub
  • Downloading the model
  • Loading the model into Elasticsearch
  • Starting the model
  • Deploying the model
  • Generating a vector for a query
  • Generating vectors in Elasticsearch
  • Planning for cluster capacity and resources
  • CPU and memory requirements
  • Disk requirements
  • Analyze Index Disk Usage API
  • ML node capacity
  • Storage efficiency strategies
  • Reducing dimensionality
  • Quantization
  • Excluding dense_vector from _source
  • Summary
  • Chapter 4: Performance Tuning - Working with Data
  • Deploying an NLP model
  • Loading a model into Elasticsearch
  • Model deployment configurations
  • Load testing
  • Rally
  • RAM estimation
  • Troubleshooting slowdown
  • Summary
  • Part 3: Specialized Use Cases
  • Chapter 5: Image Search
  • Overview of image search
  • The evolution of image search
  • The mechanism behind image search
  • The role of vector similarity search
  • Image search in practice
  • Vector search with images
  • Image vectorization
  • Indexing image vectors in Elasticsearch
  • k-Nearest Neighbor (kNN) search
  • Challenges and limitations with image search
  • Multi-modal models for vector search
  • Introduction and rationale
  • Understanding the concept of vector space in multi-modal models
  • Introduction to the OpenAI clip-ViT-B-32-multilingual-v1 model
  • Implementing vector search for diverse media types
  • Summary
  • Chapter 6: Redacting Personal Identifiable Information Using Elasticsearch
  • Overview of PII and redaction
  • Types of data that may contain PII
  • Risks of storing PII in logs
  • How PII is leaked or lost
  • Redacting PII with NER models and regex patterns
  • NER models
  • Regex patterns
  • Combining NER models and regex (or grok) patterns for PII redaction
  • PII redaction pipeline in Elasticsearch
  • Generating synthetic PII
  • Installing the default pipeline
  • Expected results
  • Expanding and customizing options for the PII redaction pipeline in Elasticsearch
  • Customizing the default PII example
  • Cloning the pipeline to create different versions for different data streams
  • Fine-tuning NER models for particular datasets
  • Logic for contextual awareness
  • Summary
  • Chapter 7: Next Generation of Observability Powered by Vectors
  • Introduction to observability and its importance in modern software systems
  • Observability: main pillars
  • Log analytics and its role in observability
  • A new approach - applying vectors and embeddings to log analytics
  • Approach 1 - training or fine-tuning an existing model for logs
  • Approach 2 - generating human-understandable descriptions and vectorizing these descriptions
  • Log vectorization
  • Synthetic log
  • Expanding logs at write with OpenAI
  • Semantic search on our logs
  • Building a query using log vectorization
  • Loading a model
  • Ingest pipeline
  • Semantic search
  • Summary
  • Chapter 8: The Power of Vectors and Embedding in Bolstering Cybersecurity
  • Technical requirements
  • Understanding the importance of email phishing detection
  • What is phishing?
  • Different types of phishing attacks
  • Statistics on the frequency of phishing attacks
  • Challenges in detecting phishing emails
  • Role of automated detection
  • Augmenting existing techniques with natural language processing
  • Introducing ELSER
  • The role of ELSER in GenAI
  • Introduction to the Enron email dataset (ham or spam)
  • Seeing ELSER in action
  • Hardware consideration
  • Downloading the ELSER model in Elastic
  • Setting up the index and ingestion pipeline
  • Semantic search with ELSER
  • Limitations of ELSER
  • Summary
  • Part 4: Innovative Integrations and Future Directions
  • Chapter 9: Retrieval Augmented Generation with Elastic
  • Preparing for RAG-enhanced search with ELSER and RRF
  • Semantic search with ELSER
  • A recap of essential considerations for RAG
  • Integrating ELSER with RRF
  • Language models and RAG
  • In-depth case study - implementing a RAG-enhanced CookBot
  • Dataset overview - an introduction to the Allrecipes.com dataset
  • Preparing data for RAG-enhanced search
  • Building the retriever - RRF with ELSER
  • Leveraging the retriever and implementing the generator
  • Summary
  • Chapter 10: Building an Elastic Plugin for ChatGPT
  • Contextual foundations
  • The paradigm of dynamic context
  • Dynamic Context Layer plugin vision - architecture and flow
  • Building the DCL
  • Fetching the latest information from Elastic documentation
  • Elevating data with Embedchain
  • Integrating with ChatGPT - creating a real-time conversationalist
  • Deployment
  • Summary
  • Index
  • Other Books You May Enjoy