Elasticsearch: The Definitive Guide
Whether you need full-text search or real-time analytics of structured data—or both—the Elasticsearch distributed search engine is an ideal way to put your data to work. This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with...
Other Authors: | , , |
---|---|
Format: | E-book |
Language: | English |
Published: | Sebastopol, California : O'Reilly Media, 2010. |
Edition: | 1st edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629585406719 |
Table of Contents:
- Intro
- Table of Contents
- Foreword
- Preface
- Who Should Read This Book
- Why We Wrote This Book
- Elasticsearch Version
- How to Read This Book
- Navigating This Book
- Online Resources
- Conventions Used in This Book
- Using Code Examples
- Safari® Books Online
- How to Contact Us
- Acknowledgments
- Part I. Getting Started
- Chapter 1. You Know, for Search...
- Installing Elasticsearch
- Installing Marvel
- Running Elasticsearch
- Viewing Marvel and Sense
- Talking to Elasticsearch
- Java API
- RESTful API with JSON over HTTP
- Document Oriented
- JSON
- Finding Your Feet
- Let's Build an Employee Directory
- Indexing Employee Documents
- Retrieving a Document
- Search Lite
- Search with Query DSL
- More-Complicated Searches
- Full-Text Search
- Phrase Search
- Highlighting Our Searches
- Analytics
- Tutorial Conclusion
- Distributed Nature
- Next Steps
- Chapter 2. Life Inside a Cluster
- An Empty Cluster
- Cluster Health
- Add an Index
- Add Failover
- Scale Horizontally
- Then Scale Some More
- Coping with Failure
- Chapter 3. Data In, Data Out
- What Is a Document?
- Document Metadata
- _index
- _type
- _id
- Other Metadata
- Indexing a Document
- Using Our Own ID
- Autogenerating IDs
- Retrieving a Document
- Retrieving Part of a Document
- Checking Whether a Document Exists
- Updating a Whole Document
- Creating a New Document
- Deleting a Document
- Dealing with Conflicts
- Optimistic Concurrency Control
- Using Versions from an External System
- Partial Updates to Documents
- Using Scripts to Make Partial Updates
- Updating a Document That May Not Yet Exist
- Updates and Conflicts
- Retrieving Multiple Documents
- Cheaper in Bulk
- Don't Repeat Yourself
- How Big Is Too Big?
- Chapter 4. Distributed Document Store
- Routing a Document to a Shard
- How Primary and Replica Shards Interact
- Creating, Indexing, and Deleting a Document
- Retrieving a Document
- Partial Updates to a Document
- Multidocument Patterns
- Why the Funny Format?
- Chapter 5. Searching-The Basic Tools
- The Empty Search
- hits
- took
- shards
- timeout
- Multi-index, Multitype
- Pagination
- Search Lite
- The _all Field
- More Complicated Queries
- Chapter 6. Mapping and Analysis
- Exact Values Versus Full Text
- Inverted Index
- Analysis and Analyzers
- Built-in Analyzers
- When Analyzers Are Used
- Testing Analyzers
- Specifying Analyzers
- Mapping
- Core Simple Field Types
- Viewing the Mapping
- Customizing Field Mappings
- Updating a Mapping
- Testing the Mapping
- Complex Core Field Types
- Multivalue Fields
- Empty Fields
- Multilevel Objects
- Mapping for Inner Objects
- How Inner Objects Are Indexed
- Arrays of Inner Objects
- Chapter 7. Full-Body Search
- Empty Search
- Query DSL
- Structure of a Query Clause
- Combining Multiple Clauses
- Queries and Filters
- Performance Differences
- When to Use Which
- Most Important Queries and Filters
- term Filter
- terms Filter
- range Filter
- exists and missing Filters
- bool Filter
- match_all Query
- match Query
- multi_match Query
- bool Query
- Combining Queries with Filters
- Filtering a Query
- Just a Filter
- A Query as a Filter
- Validating Queries
- Understanding Errors
- Understanding Queries
- Chapter 8. Sorting and Relevance
- Sorting
- Sorting by Field Values
- Multilevel Sorting
- Sorting on Multivalue Fields
- String Sorting and Multifields
- What Is Relevance?
- Understanding the Score
- Understanding Why a Document Matched
- Fielddata
- Chapter 9. Distributed Search Execution
- Query Phase
- Fetch Phase
- Search Options
- preference
- timeout
- routing
- search_type
- scan and scroll
- Chapter 10. Index Management
- Creating an Index
- Deleting an Index
- Index Settings
- Configuring Analyzers
- Custom Analyzers
- Creating a Custom Analyzer
- Types and Mappings
- How Lucene Sees Documents
- How Types Are Implemented
- Avoiding Type Gotchas
- The Root Object
- Properties
- Metadata: _source Field
- Metadata: _all Field
- Metadata: Document Identity
- Dynamic Mapping
- Customizing Dynamic Mapping
- date_detection
- dynamic_templates
- Default Mapping
- Reindexing Your Data
- Index Aliases and Zero Downtime
- Chapter 11. Inside a Shard
- Making Text Searchable
- Immutability
- Dynamically Updatable Indices
- Deletes and Updates
- Near Real-Time Search
- refresh API
- Making Changes Persistent
- flush API
- Segment Merging
- optimize API
- Part II. Search in Depth
- Chapter 12. Structured Search
- Finding Exact Values
- term Filter with Numbers
- term Filter with Text
- Internal Filter Operation
- Combining Filters
- Bool Filter
- Nesting Boolean Filters
- Finding Multiple Exact Values
- Contains, but Does Not Equal
- Equals Exactly
- Ranges
- Ranges on Dates
- Ranges on Strings
- Dealing with Null Values
- exists Filter
- missing Filter
- exists/missing on Objects
- All About Caching
- Independent Filter Caching
- Controlling Caching
- Filter Order
- Chapter 13. Full-Text Search
- Term-Based Versus Full-Text
- The match Query
- Index Some Data
- A Single-Word Query
- Multiword Queries
- Improving Precision
- Controlling Precision
- Combining Queries
- Score Calculation
- Controlling Precision
- How match Uses bool
- Boosting Query Clauses
- Controlling Analysis
- Default Analyzers
- Configuring Analyzers in Practice
- Relevance Is Broken!
- Chapter 14. Multifield Search
- Multiple Query Strings
- Prioritizing Clauses
- Single Query String
- Know Your Data
- Best Fields
- dis_max Query
- Tuning Best Fields Queries
- tie_breaker
- multi_match Query
- Using Wildcards in Field Names
- Boosting Individual Fields
- Most Fields
- Multifield Mapping
- Cross-fields Entity Search
- A Naive Approach
- Problems with the most_fields Approach
- Field-Centric Queries
- Problem 1: Matching the Same Word in Multiple Fields
- Problem 2: Trimming the Long Tail
- Problem 3: Term Frequencies
- Solution
- Custom _all Fields
- cross-fields Queries
- Per-Field Boosting
- Exact-Value Fields
- Chapter 15. Proximity Matching
- Phrase Matching
- Term Positions
- What Is a Phrase
- Mixing It Up
- Multivalue Fields
- Closer Is Better
- Proximity for Relevance
- Improving Performance
- Rescoring Results
- Finding Associated Words
- Producing Shingles
- Multifields
- Searching for Shingles
- Performance
- Chapter 16. Partial Matching
- Postcodes and Structured Data
- prefix Query
- wildcard and regexp Queries
- Query-Time Search-as-You-Type
- Index-Time Optimizations
- Ngrams for Partial Matching
- Index-Time Search-as-You-Type
- Preparing the Index
- Querying the Field
- Edge n-grams and Postcodes
- Ngrams for Compound Words
- Chapter 17. Controlling Relevance
- Theory Behind Relevance Scoring
- Boolean Model
- Term Frequency/Inverse Document Frequency (TF/IDF)
- Vector Space Model
- Lucene's Practical Scoring Function
- Query Normalization Factor
- Query Coordination
- Index-Time Field-Level Boosting
- Query-Time Boosting
- Boosting an Index
- t.getBoost()
- Manipulating Relevance with Query Structure
- Not Quite Not
- boosting Query
- Ignoring TF/IDF
- constant_score Query
- function_score Query
- Boosting by Popularity
- modifier
- factor
- boost_mode
- max_boost
- Boosting Filtered Subsets
- filter Versus query
- functions
- score_mode
- Random Scoring
- The Closer, The Better
- Understanding the price Clause
- Scoring with Scripts
- Pluggable Similarity Algorithms
- Okapi BM25
- Changing Similarities
- Configuring BM25
- Relevance Tuning Is the Last 10%
- Part III. Dealing with Human Language
- Chapter 18. Getting Started with Languages
- Using Language Analyzers
- Configuring Language Analyzers
- Pitfalls of Mixing Languages
- At Index Time
- At Query Time
- Identifying Language
- One Language per Document
- Foreign Words
- One Language per Field
- Mixed-Language Fields
- Split into Separate Fields
- Analyze Multiple Times
- Use n-grams
- Chapter 19. Identifying Words
- standard Analyzer
- standard Tokenizer
- Installing the ICU Plug-in
- icu_tokenizer
- Tidying Up Input Text
- Tokenizing HTML
- Tidying Up Punctuation
- Chapter 20. Normalizing Tokens
- In That Case
- You Have an Accent
- Retaining Meaning
- Living in a Unicode World
- Unicode Case Folding
- Unicode Character Folding
- Sorting and Collations
- Case-Insensitive Sorting
- Differences Between Languages
- Unicode Collation Algorithm
- Unicode Sorting
- Specifying a Language
- Customizing Collations
- Chapter 21. Reducing Words to Their Root Form
- Algorithmic Stemmers
- Using an Algorithmic Stemmer
- Dictionary Stemmers
- Hunspell Stemmer
- Installing a Dictionary
- Per-Language Settings
- Creating a Hunspell Token Filter
- Hunspell Dictionary Format
- Choosing a Stemmer
- Stemmer Performance
- Stemmer Quality
- Stemmer Degree
- Making a Choice
- Controlling Stemming
- Preventing Stemming
- Customizing Stemming
- Stemming in situ
- Is Stemming in situ a Good Idea
- Chapter 22. Stopwords: Performance Versus Precision
- Pros and Cons of Stopwords
- Using Stopwords
- Stopwords and the Standard Analyzer
- Maintaining Positions
- Specifying Stopwords
- Using the stop Token Filter