Vector Search for Practitioners with Elastic: A Toolkit for Building NLP Solutions for Search, Observability, and Security Using Vector Search

Optimize your search capabilities in Elastic by operationalizing and fine-tuning vector search, and enhance your search relevance while improving overall search performance. Key Features: Install, configure, and optimize the ChatGPT-Elasticsearch plugin with a focus on vector data. Learn how to load tra...

Bibliographic Details
Other Authors:	Azarmi, Bahaaldine, author, Vestal, Jeff, author, Banon, Shay, author
Format:	E-book
Language:	English
Published:	Birmingham, England : Packt Publishing Ltd [2023]
Edition:	First edition
Subjects:	Natural language processing (Computer science)
View at Universitat Ramon Llull Library:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009785405406719

Table of Contents:

Cover
Title Page
Copyright and Credit
Dedication
Foreword
Contributors
Table of Contents
Preface
Part 1: Fundamentals of Vector Search
Chapter 1: Introduction to Vectors and Embeddings
Exploring the roles of supervised and unsupervised learning in vector search
What's an embedding/vector?
What challenges are vectors solving?
The developer experience
Hugging Face
The market landscape and how it has accelerated the developer experience
Use cases and domains of application
AI-based search
Named Entity Recognition (NER)
Sentiment analysis
Text classification
Question-answering (QA)
Text summarization
How is Elastic playing a role in this space?
A primer on observability and cybersecurity
Summary
Chapter 2: Getting Started with Vector Search in Elastic
Search experience in Elastic before vectors
Data type and its impact on relevancy
The relevancy model
Evolution of search experience
The limits of keyword-based search
Vector representation
The new vector data type and the vector search query API
Sparse and dense vectors
An Elastic Cloud quick start
Dense vector mapping
Brute-force KNN search
KNN search
Summary
Part 2: Advanced Applications and Performance Optimization
Chapter 3: Model Management and Vector Considerations in Elastic
Technical requirements
Hugging Face
Model Hub
Datasets
Spaces
Eland
Loading a Sentence Transformer from Hugging Face into Elasticsearch
Configuring Elasticsearch authentication
Loading a model from the Hugging Face Hub
Downloading the model
Loading the model into Elasticsearch
Starting the model
Deploying the model
Generating a vector for a query
Generating vectors in Elasticsearch
Planning for cluster capacity and resources
CPU and memory requirements
Disk requirements
Analyze Index Disk Usage API
ML node capacity
Storage efficiency strategies
Reducing dimensionality
Quantization
Excluding dense_vector from _source
Summary
Chapter 4: Performance Tuning - Working with Data
Deploying an NLP model
Loading a model into Elasticsearch
Model deployment configurations
Load testing
Rally
RAM estimation
Troubleshooting slowdown
Summary
Part 3: Specialized Use Cases
Chapter 5: Image Search
Overview of image search
The evolution of image search
The mechanism behind image search
The role of vector similarity search
Image search in practice
Vector search with images
Image vectorization
Indexing image vectors in Elasticsearch
k-Nearest Neighbor (kNN) search
Challenges and limitations with image search
Multi-modal models for vector search
Introduction and rationale
Understanding the concept of vector space in multi-modal models
Introduction to the OpenAI clip-ViT-B-32-multilingual-v1 model
Implementing vector search for diverse media types
Summary
Chapter 6: Redacting Personal Identifiable Information Using Elasticsearch
Overview of PII and redaction
Types of data that may contain PII
Risks of storing PII in logs
How PII is leaked or lost
Redacting PII with NER models and regex patterns
NER models
Regex patterns
Combining NER models and regex (or grok) patterns for PII redaction
PII redaction pipeline in Elasticsearch
Generating synthetic PII
Installing the default pipeline
Expected results
Expanding and customizing options for the PII redaction pipeline in Elasticsearch
Customizing the default PII example
Cloning the pipeline to create different versions for different data streams
Fine-tuning NER models for particular datasets
Logic for contextual awareness
Summary
Chapter 7: Next Generation of Observability Powered by Vectors
Introduction to observability and its importance in modern software systems
Observability: main pillars
Log analytics and its role in observability
A new approach: applying vectors and embeddings to log analytics
Approach 1: training or fine-tuning an existing model for logs
Approach 2: generating human-understandable descriptions and vectorizing these descriptions
Log vectorization
Synthetic log
Expanding logs at write with OpenAI
Semantic search on our logs
Building a query using log vectorization
Loading a model
Ingest pipeline
Semantic search
Summary
Chapter 8: The Power of Vectors and Embedding in Bolstering Cybersecurity
Technical requirements
Understanding the importance of email phishing detection
What is phishing?
Different types of phishing attacks
Statistics on the frequency of phishing attacks
Challenges in detecting phishing emails
Role of automated detection
Augmenting existing techniques with natural language processing
Introducing ELSER
The role of ELSER in GenAI
Introduction to the Enron email dataset (ham or spam)
Seeing ELSER in action
Hardware consideration
Downloading the ELSER model in Elastic
Setting up the index and ingestion pipeline
Semantic search with ELSER
Limitations of ELSER
Summary
Part 4: Innovative Integrations and Future Directions
Chapter 9: Retrieval Augmented Generation with Elastic
Preparing for RAG-enhanced search with ELSER and RRF
Semantic search with ELSER
A recap of essential considerations for RAG
Integrating ELSER with RRF
Language models and RAG
In-depth case study: implementing a RAG-enhanced CookBot
Dataset overview: an introduction to the Allrecipes.com dataset
Preparing data for RAG-enhanced search
Building the retriever: RRF with ELSER
Leveraging the retriever and implementing the generator
Summary
Chapter 10: Building an Elastic Plugin for ChatGPT
Contextual foundations
The paradigm of dynamic context
Dynamic Context Layer plugin vision: architecture and flow
Building the DCL
Fetching the latest information from Elastic documentation
Elevating data with Embedchain
Integrating with ChatGPT: creating a real-time conversationalist
Deployment
Summary
Index
Other Books You May Enjoy
