Learning Data Mining with Python: Use Python to manipulate data and build predictive models

Harness the power of Python to develop data mining applications, analyze data, delve into machine learning, explore object detection using Deep Neural Networks, and create insightful predictive models.

About This Book: Use a wide variety of Python libraries for practical data mining purposes. Learn h...


Bibliographic Details
Other Authors: Layton, Robert, 1986- (author)
Format: Electronic book
Language: English
Published: Birmingham, [England]; Mumbai, [India]: Packt Publishing, 2017.
Edition: Second edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630063006719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewer
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: Getting Started with Data Mining
  • Introducing data mining
  • Using Python and the Jupyter Notebook
  • Installing Python
  • Installing Jupyter Notebook
  • Installing scikit-learn
  • A simple affinity analysis example
  • What is affinity analysis?
  • Product recommendations
  • Loading the dataset with NumPy
  • Downloading the example code
  • Implementing a simple ranking of rules
  • Ranking to find the best rules
  • A simple classification example
  • What is classification?
  • Loading and preparing the dataset
  • Implementing the OneR algorithm
  • Testing the algorithm
  • Summary
  • Chapter 2: Classifying with scikit-learn Estimators
  • scikit-learn estimators
  • Nearest neighbors
  • Distance metrics
  • Loading the dataset
  • Moving towards a standard workflow
  • Running the algorithm
  • Setting parameters
  • Preprocessing
  • Standard pre-processing
  • Putting it all together
  • Pipelines
  • Summary
  • Chapter 3: Predicting Sports Winners with Decision Trees
  • Loading the dataset
  • Collecting the data
  • Using pandas to load the dataset
  • Cleaning up the dataset
  • Extracting new features
  • Decision trees
  • Parameters in decision trees
  • Using decision trees
  • Sports outcome prediction
  • Putting it all together
  • Random forests
  • How do ensembles work?
  • Setting parameters in Random Forests
  • Applying random forests
  • Engineering new features
  • Summary
  • Chapter 4: Recommending Movies Using Affinity Analysis
  • Affinity analysis
  • Algorithms for affinity analysis
  • Overall methodology
  • Dealing with the movie recommendation problem
  • Obtaining the dataset
  • Loading with pandas
  • Sparse data formats
  • Understanding the Apriori algorithm and its implementation
  • Looking into the basics of the Apriori algorithm
  • Implementing the Apriori algorithm
  • Extracting association rules
  • Evaluating the association rules
  • Summary
  • Chapter 5: Features and scikit-learn Transformers
  • Feature extraction
  • Representing reality in models
  • Common feature patterns
  • Creating good features
  • Feature selection
  • Selecting the best individual features
  • Feature creation
  • Principal Component Analysis
  • Creating your own transformer
  • The transformer API
  • Implementing a Transformer
  • Unit testing
  • Putting it all together
  • Summary
  • Chapter 6: Social Media Insight using Naive Bayes
  • Disambiguation
  • Downloading data from a social network
  • Loading and classifying the dataset
  • Creating a replicable dataset from Twitter
  • Text transformers
  • Bag-of-words models
  • n-gram features
  • Other text features
  • Naive Bayes
  • Understanding Bayes' theorem
  • Naive Bayes algorithm
  • How it works
  • Applying Naive Bayes
  • Extracting word counts
  • Converting dictionaries to a matrix
  • Putting it all together
  • Evaluation using the F1-score
  • Getting useful features from models
  • Summary
  • Chapter 7: Follow Recommendations Using Graph Mining
  • Loading the dataset
  • Classifying with an existing model
  • Getting follower information from Twitter
  • Building the network
  • Creating a graph
  • Creating a similarity graph
  • Finding subgraphs
  • Connected components
  • Optimizing criteria
  • Summary
  • Chapter 8: Beating CAPTCHAs with Neural Networks
  • Artificial neural networks
  • An introduction to neural networks
  • Creating the dataset
  • Drawing basic CAPTCHAs
  • Splitting the image into individual letters
  • Creating a training dataset
  • Training and classifying
  • Back-propagation
  • Predicting words
  • Improving accuracy using a dictionary
  • Ranking mechanisms for word similarity
  • Putting it all together
  • Summary
  • Chapter 9: Authorship Attribution
  • Attributing documents to authors
  • Applications and use cases
  • Authorship attribution
  • Getting the data
  • Using function words
  • Counting function words
  • Classifying with function words
  • Support Vector Machines
  • Classifying with SVMs
  • Kernels
  • Character n-grams
  • Extracting character n-grams
  • The Enron dataset
  • Accessing the Enron dataset
  • Creating a dataset loader
  • Putting it all together
  • Evaluation
  • Summary
  • Chapter 10: Clustering News Articles
  • Trending topic discovery
  • Using a web API to get data
  • Reddit as a data source
  • Getting the data
  • Extracting text from arbitrary websites
  • Finding the stories in arbitrary websites
  • Extracting the content
  • Grouping news articles
  • The k-means algorithm
  • Evaluating the results
  • Extracting topic information from clusters
  • Using clustering algorithms as transformers
  • Clustering ensembles
  • Evidence accumulation
  • How it works
  • Implementation
  • Online learning
  • Implementation
  • Summary
  • Chapter 11: Object Detection in Images using Deep Neural Networks
  • Object classification
  • Use cases
  • Application scenario
  • Deep neural networks
  • Intuition
  • Implementing deep neural networks
  • An Introduction to TensorFlow
  • Using Keras
  • Convolutional Neural Networks
  • GPU optimization
  • When to use GPUs for computation
  • Running our code on a GPU
  • Setting up the environment
  • Application
  • Getting the data
  • Creating the neural network
  • Putting it all together
  • Summary
  • Chapter 12: Working with Big Data
  • Big data
  • Applications of big data
  • MapReduce
  • The intuition behind MapReduce
  • A word count example
  • Hadoop MapReduce
  • Applying MapReduce
  • Getting the data
  • Naive Bayes prediction
  • The mrjob package
  • Extracting the blog posts
  • Training Naive Bayes
  • Putting it all together
  • Training on Amazon's EMR infrastructure
  • Summary
  • Appendix: Next Steps...
  • Getting Started with Data Mining
  • Scikit-learn tutorials
  • Extending the Jupyter Notebook
  • More datasets
  • Other Evaluation Metrics
  • More application ideas
  • Classifying with scikit-learn Estimators
  • Scalability with the nearest neighbor
  • More complex pipelines
  • Comparing classifiers
  • Automated Learning
  • Predicting Sports Winners with Decision Trees
  • More complex features
  • Dask
  • Research
  • Recommending Movies Using Affinity Analysis
  • New datasets
  • The Eclat algorithm
  • Collaborative Filtering
  • Extracting Features with Transformers
  • Adding noise
  • Vowpal Wabbit
  • word2vec
  • Social Media Insight Using Naive Bayes
  • Spam detection
  • Natural language processing and part-of-speech tagging
  • Discovering Accounts to Follow Using Graph Mining
  • More complex algorithms
  • NetworkX
  • Beating CAPTCHAs with Neural Networks
  • Better (worse?) CAPTCHAs
  • Deeper networks
  • Reinforcement learning
  • Authorship Attribution
  • Increasing the sample size
  • Blogs dataset
  • Local n-grams
  • Clustering News Articles
  • Clustering Evaluation
  • Temporal analysis
  • Real-time clusterings
  • Classifying Objects in Images Using Deep Learning
  • Mahotas
  • Magenta
  • Working with Big Data
  • Courses on Hadoop
  • Pydoop
  • Recommendation engine
  • W.I.L.L
  • More resources
  • Kaggle competitions
  • Coursera
  • Index.