Natural language processing with Java: techniques for building machine learning and neural network models for NLP
Explore various approaches to organize and extract useful text from unstructured data using Java.

Key Features:
- Use deep learning and NLP techniques in Java to discover hidden insights in text
- Work with popular Java libraries such as CoreNLP, OpenNLP, and Mallet
- Explore machine translation, identifyin...
Other authors:
Format: E-book
Language: English
Published: Birmingham ; Mumbai : Packt, 2018
Edition: Second edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630746606719
Table of Contents:
- Cover
- Title Page
- Copyright and Credits
- Dedication
- Packt Upsell
- Contributors
- Table of Contents
- Preface
- Chapter 1: Introduction to NLP
- What is NLP?
- Why use NLP?
- Why is NLP so hard?
- Survey of NLP tools
- Apache OpenNLP
- Stanford NLP
- LingPipe
- GATE
- UIMA
- Apache Lucene Core
- Deep learning for Java
- Overview of text-processing tasks
- Finding parts of text
- Finding sentences
- Feature-engineering
- Finding people and things
- Detecting parts of speech
- Classifying text and documents
- Extracting relationships
- Using combined approaches
- Understanding NLP models
- Identifying the task
- Selecting a model
- Building and training the model
- Verifying the model
- Using the model
- Preparing data
- Summary
- Chapter 2: Finding Parts of Text
- Understanding the parts of text
- What is tokenization?
- Uses of tokenizers
- Simple Java tokenizers
- Using the Scanner class
- Specifying the delimiter
- Using the split method
- Using the BreakIterator class
- Using the StreamTokenizer class
- Using the StringTokenizer class
- Performance considerations with Java core tokenization
- NLP tokenizer APIs
- Using the OpenNLPTokenizer class
- Using the SimpleTokenizer class
- Using the WhitespaceTokenizer class
- Using the TokenizerME class
- Using the Stanford tokenizer
- Using the PTBTokenizer class
- Using the DocumentPreprocessor class
- Using a pipeline
- Using LingPipe tokenizers
- Training a tokenizer to find parts of text
- Comparing tokenizers
- Understanding normalization
- Converting to lowercase
- Removing stopwords
- Creating a StopWords class
- Using LingPipe to remove stopwords
- Using stemming
- Using the Porter Stemmer
- Stemming with LingPipe
- Using lemmatization
- Using the StanfordLemmatizer class
- Using lemmatization in OpenNLP
- Normalizing using a pipeline
- Summary
- Chapter 3: Finding Sentences
- The SBD process
- What makes SBD difficult?
- Understanding the SBD rules of LingPipe's HeuristicSentenceModel class
- Simple Java SBDs
- Using regular expressions
- Using the BreakIterator class
- Using NLP APIs
- Using OpenNLP
- Using the SentenceDetectorME class
- Using the sentPosDetect method
- Using the Stanford API
- Using the PTBTokenizer class
- Using the DocumentPreprocessor class
- Using the StanfordCoreNLP class
- Using LingPipe
- Using the IndoEuropeanSentenceModel class
- Using the SentenceChunker class
- Using the MedlineSentenceModel class
- Training a sentence-detector model
- Using the trained model
- Evaluating the model using the SentenceDetectorEvaluator class
- Summary
- Chapter 4: Finding People and Things
- Why is NER difficult?
- Techniques for name recognition
- Lists and regular expressions
- Statistical classifiers
- Using regular expressions for NER
- Using Java's regular expressions to find entities
- Using the RegExChunker class of LingPipe
- Using NLP APIs
- Using OpenNLP for NER
- Determining the accuracy of the entity
- Using other entity types
- Processing multiple entity types
- Using the Stanford API for NER
- Using LingPipe for NER
- Using LingPipe's named entity models
- Using the ExactDictionaryChunker class
- Building a new dataset with the NER annotation tool
- Training a model
- Evaluating a model
- Summary
- Chapter 5: Detecting Part of Speech
- The tagging process
- The importance of POS taggers
- What makes POS difficult?
- Using the NLP APIs
- Using OpenNLP POS taggers
- Using the OpenNLP POSTaggerME class for POS taggers
- Using OpenNLP chunking
- Using the POSDictionary class
- Obtaining the tag dictionary for a tagger
- Determining a word's tags
- Changing a word's tags
- Adding a new tag dictionary
- Creating a dictionary from a file
- Using Stanford POS taggers
- Using Stanford MaxentTagger
- Using the MaxentTagger class to tag textese
- Using the Stanford pipeline to perform tagging
- Using LingPipe POS taggers
- Using the HmmDecoder class with Best_First tags
- Using the HmmDecoder class with NBest tags
- Determining tag confidence with the HmmDecoder class
- Training the OpenNLP POSModel
- Summary
- Chapter 6: Representing Text with Features
- N-grams
- Word embedding
- GloVe
- Word2vec
- Dimensionality reduction
- Principal component analysis
- t-distributed stochastic neighbor embedding
- Summary
- Chapter 7: Information Retrieval
- Boolean retrieval
- Dictionaries and tolerant retrieval
- Wildcard queries
- Spelling correction
- Soundex
- Vector space model
- Scoring and term weighting
- Inverse document frequency
- TF-IDF weighting
- Evaluation of information retrieval systems
- Summary
- Chapter 8: Classifying Texts and Documents
- How classification is used
- Understanding sentiment analysis
- Text-classifying techniques
- Using APIs to classify text
- Using OpenNLP
- Training an OpenNLP classification model
- Using DocumentCategorizerME to classify text
- Using the Stanford API
- Using the ColumnDataClassifier class for classification
- Using the Stanford pipeline to perform sentiment analysis
- Using LingPipe to classify text
- Training text using the Classified class
- Using other training categories
- Classifying text using LingPipe
- Sentiment analysis using LingPipe
- Language identification using LingPipe
- Summary
- Chapter 9: Topic Modeling
- What is topic modeling?
- The basics of LDA
- Topic modeling with MALLET
- Training
- Evaluation
- Summary
- Chapter 10: Using Parsers to Extract Relationships
- Relationship types
- Understanding parse trees
- Using extracted relationships
- Extracting relationships
- Using NLP APIs
- Using OpenNLP
- Using the Stanford API
- Using the LexicalizedParser class
- Using the TreePrint class
- Finding word dependencies using the GrammaticalStructure class
- Finding coreference resolution entities
- Extracting relationships for a question-answer system
- Finding the word dependencies
- Determining the question type
- Searching for the answer
- Summary
- Chapter 11: Combined Pipeline
- Preparing data
- Using boilerpipe to extract text from HTML
- Using POI to extract text from Word documents
- Using PDFBox to extract text from PDF documents
- Using Apache Tika for content analysis and extraction
- Pipelines
- Using the Stanford pipeline
- Using multiple cores with the Stanford pipeline
- Creating a pipeline to search text
- Summary
- Chapter 12: Creating a Chatbot
- Chatbot architecture
- Artificial Linguistic Internet Computer Entity
- Understanding AIML
- Developing a chatbot using ALICE and AIML
- Summary
- Other Books You May Enjoy
- Index