Mastering data mining with Python: find patterns hidden in your data

Learn how to create more powerful data mining applications with this comprehensive Python guide to advanced data analytics techniques. About This Book: Dive deeper into data mining with Python; don't be complacent, sharpen your skills! From the most common elements of data mining to cutting-edge...

Bibliographic Details
Other Authors:	Squire, Megan (author)
Format:	E-book
Language:	English
Published:	Birmingham, England ; Mumbai, India : Packt Publishing, 2016.
Edition:	1st edition
Series:	Community experience distilled.
Subjects:	Data mining; Python (Computer program language); Business planning > Data processing.
View in the Universitat Ramon Llull Library:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630314406719

Table of Contents:

Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Table of Contents
Preface
Expanding Your Data Mining Toolbox
What is data mining?
How do we do data mining?
The Fayyad et al. KDD process
The Han et al. KDD process
The CRISP-DM process
The Six Steps process
Which data mining methodology is the best?
What are the techniques used in data mining?
What techniques are we going to use in THIS book?
How do we set up our data mining work environment?
Summary
Association Rule Mining
What are frequent itemsets?
The diapers and beer urban legend
Frequent itemset mining basics
Towards association rules
Support
Confidence
Association rules
An example with data
Added value - fixing a flaw in the plan
Methods for finding frequent itemsets
A project - discovering association rules in software project tags
Summary
Entity Matching
What is entity matching?
Merging data
Merging datasets vertically
Merging datasets horizontally
Techniques for matching
Attribute-based similarity matching
Be careful of pairwise comparisons
Leverage rare values
Methods for matching attributes
Range-based or distance from target
String edit distance
Hamming distance
Levenshtein distance
Soundex
Leveraging disjoint sets
Context-based similarity matching
Machine learning-based entity matching
Evaluation of entity matching techniques
Efficiency - how long does it take to do the matching?
Effectiveness - how accurate are the matches that we generate?
Usefulness - how practical is the matching procedure to use?
Entity matching project
Difficulties with matching software projects
Two examples
Matching on project names
Matching on people names
Matching on URLs
Matching on topics and description keywords
The dataset
The code
The results
How many entity matches did we find?
How good are the pairs we found?
Summary
Network Analysis
What is a network?
Measuring a network
Degree of a network
Diameter of a network
Walks, paths, and trails in a network
Components of a network
Centrality of a network
Closeness centrality
Degree centrality
Betweenness centrality
Other measures of centrality
Representing graph data
Adjacency matrix
Edge lists and adjacency lists
Differences between graph data structures
Importing data into a graph structure
Adjacency list format
Edge list format
GEXF and GraphML
GDF
Python pickle
JSON
JSON node and link series
JSON trees
Pajek format
A real project
Exploring the data
Generating the network files
Understanding our data as a network
Generating simple network metrics
Playing with the parameters of a network
Analyzing subgraphs
Analyzing cliques and centrality in the subgraphs
Looking for change over time
Summary
Sentiment Analysis
What is sentiment analysis?
The basics of sentiment analysis
The structure of an opinion
Document-level and sentence-level analysis
Important features of opinions
Sentiment analysis algorithms
General-purpose data collections
Hu and Liu's sentiment analysis lexicon
SentiWordNet
Vader sentiment
Sentiment mining application
Motivating the project
Data preparation
Data analysis of chat messages
Data analysis of e-mail messages
Summary
Named Entity Recognition in Text
Why look for named entities?
Techniques for named entity recognition
Tagging parts of speech
Classes of named entities
Building and evaluating NER systems
NER and partial matches
Handling partial matches
Named entity recognition project
A simple NER tool
Apache Board meeting minutes
Django IRC chat
GnuIRC summaries
LKML e-mails
Summary
Automatic Text Summarization
What is automatic text summarization?
Tools for text summarization
Naive text summarization using NLTK
Text summarization using Gensim
Text summarization using Sumy
Sumy's Luhn summarizer
Sumy's TextRank summarizer
Sumy's LSA summarizer
Sumy's Edmundson summarizer
Summary
Topic Modeling
What is topic modeling?
Latent Dirichlet Allocation
Gensim for topic modeling
Understanding Gensim LDA topics
Understanding Gensim LDA passes
Applying a Gensim LDA model to new documents
Serializing Gensim LDA objects
Serializing a dictionary
Serializing a corpus
Serializing a model
Gensim LDA for a larger project
Summary
Mining for Data Anomalies
What are data anomalies?
Missing data
Locating missing data
Zero values
Fixing missing data
Ignore the problem rows
Fix the problem manually
Use a fabricated value
Use a central measure
Use Last Observation Carried Forward
Use a similar value
Use the most likely value
Data errors
Truncated fields
Data type and character set errors
Logic or semantic errors
Outliers
Visual mining for outliers
Statistical detection of outliers
Summary
Index
