Machine Learning with R

R gives you access to the cutting-edge software you need to prepare data for machine learning. No previous knowledge required – this book takes you methodically through every stage of applying machine learning.
  • Harness the power of R for statistical computing and data science
  • Use R to apply comm...


Bibliographic Details
Main Author: Lantz, Brett
Format: eBook
Language: English
Published: Birmingham: Packt Publishing, 2013.
Edition: 1st edition
Series: Community experience distilled.
See on Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009627865406719
Table of Contents:
  • Intro
  • Machine Learning with R
  • Table of Contents
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Support files, eBooks, discount offers and more
  • Why Subscribe?
  • Free Access for Packt account holders
  • Preface
  • What this book covers
  • What you need for this book
  • Who this book is for
  • Conventions
  • Reader feedback
  • Customer support
  • Downloading the example code
  • Errata
  • Piracy
  • Questions
  • 1. Introducing Machine Learning
  • The origins of machine learning
  • Uses and abuses of machine learning
  • Ethical considerations
  • How do machines learn?
  • Abstraction and knowledge representation
  • Generalization
  • Assessing the success of learning
  • Steps to apply machine learning to your data
  • Choosing a machine learning algorithm
  • Thinking about the input data
  • Thinking about types of machine learning algorithms
  • Matching your data to an appropriate algorithm
  • Using R for machine learning
  • Installing and loading R packages
  • Installing an R package
  • Installing a package using the point-and-click interface
  • Loading an R package
  • Summary
  • 2. Managing and Understanding Data
  • R data structures
  • Vectors
  • Factors
  • Lists
  • Data frames
  • Matrixes and arrays
  • Managing data with R
  • Saving and loading R data structures
  • Importing and saving data from CSV files
  • Importing data from SQL databases
  • Exploring and understanding data
  • Exploring the structure of data
  • Exploring numeric variables
  • Measuring the central tendency - mean and median
  • Measuring spread - quartiles and the five-number summary
  • Visualizing numeric variables - boxplots
  • Visualizing numeric variables - histograms
  • Understanding numeric data - uniform and normal distributions
  • Measuring spread - variance and standard deviation
  • Exploring categorical variables
  • Measuring the central tendency - the mode
  • Exploring relationships between variables
  • Visualizing relationships - scatterplots
  • Examining relationships - two-way cross-tabulations
  • Summary
  • 3. Lazy Learning - Classification Using Nearest Neighbors
  • Understanding classification using nearest neighbors
  • The kNN algorithm
  • Calculating distance
  • Choosing an appropriate k
  • Preparing data for use with kNN
  • Why is the kNN algorithm lazy?
  • Diagnosing breast cancer with the kNN algorithm
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Transformation - normalizing numeric data
  • Data preparation - creating training and test datasets
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Transformation - z-score standardization
  • Testing alternative values of k
  • Summary
  • 4. Probabilistic Learning - Classification Using Naive Bayes
  • Understanding naive Bayes
  • Basic concepts of Bayesian methods
  • Probability
  • Joint probability
  • Conditional probability with Bayes' theorem
  • The naive Bayes algorithm
  • The naive Bayes classification
  • The Laplace estimator
  • Using numeric features with naive Bayes
  • Example - filtering mobile phone spam with the naive Bayes algorithm
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Data preparation - processing text data for analysis
  • Data preparation - creating training and test datasets
  • Visualizing text data - word clouds
  • Data preparation - creating indicator features for frequent words
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Summary
  • 5. Divide and Conquer - Classification Using Decision Trees and Rules
  • Understanding decision trees
  • Divide and conquer
  • The C5.0 decision tree algorithm
  • Choosing the best split
  • Pruning the decision tree
  • Example - identifying risky bank loans using C5.0 decision trees
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Data preparation - creating random training and test datasets
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Boosting the accuracy of decision trees
  • Making some mistakes more costly than others
  • Understanding classification rules
  • Separate and conquer
  • The One Rule algorithm
  • The RIPPER algorithm
  • Rules from decision trees
  • Example - identifying poisonous mushrooms with rule learners
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Summary
  • 6. Forecasting Numeric Data - Regression Methods
  • Understanding regression
  • Simple linear regression
  • Ordinary least squares estimation
  • Correlations
  • Multiple linear regression
  • Example - predicting medical expenses using linear regression
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Exploring relationships among features - the correlation matrix
  • Visualizing relationships among features - the scatterplot matrix
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Model specification - adding non-linear relationships
  • Transformation - converting a numeric variable to a binary indicator
  • Model specification - adding interaction effects
  • Putting it all together - an improved regression model
  • Understanding regression trees and model trees
  • Adding regression to trees
  • Example - estimating the quality of wines with regression trees and model trees
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Step 3 - training a model on the data
  • Visualizing decision trees
  • Step 4 - evaluating model performance
  • Measuring performance with mean absolute error
  • Step 5 - improving model performance
  • Summary
  • 7. Black Box Methods - Neural Networks and Support Vector Machines
  • Understanding neural networks
  • From biological to artificial neurons
  • Activation functions
  • Network topology
  • The number of layers
  • The direction of information travel
  • The number of nodes in each layer
  • Training neural networks with backpropagation
  • Modeling the strength of concrete with ANNs
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Understanding Support Vector Machines
  • Classification with hyperplanes
  • Finding the maximum margin
  • The case of linearly separable data
  • The case of non-linearly separable data
  • Using kernels for non-linear spaces
  • Performing OCR with SVMs
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Summary
  • 8. Finding Patterns - Market Basket Analysis Using Association Rules
  • Understanding association rules
  • The Apriori algorithm for association rule learning
  • Measuring rule interest - support and confidence
  • Building a set of rules with the Apriori principle
  • Example - identifying frequently purchased groceries with association rules
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Data preparation - creating a sparse matrix for transaction data
  • Visualizing item support - item frequency plots
  • Visualizing transaction data - plotting the sparse matrix
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Sorting the set of association rules
  • Taking subsets of association rules
  • Saving association rules to a file or data frame
  • Summary
  • 9. Finding Groups of Data - Clustering with k-means
  • Understanding clustering
  • Clustering as a machine learning task
  • The k-means algorithm for clustering
  • Using distance to assign and update clusters
  • Choosing the appropriate number of clusters
  • Finding teen market segments using k-means clustering
  • Step 1 - collecting data
  • Step 2 - exploring and preparing the data
  • Data preparation - dummy coding missing values
  • Data preparation - imputing missing values
  • Step 3 - training a model on the data
  • Step 4 - evaluating model performance
  • Step 5 - improving model performance
  • Summary
  • 10. Evaluating Model Performance
  • Measuring performance for classification
  • Working with classification prediction data in R
  • A closer look at confusion matrices
  • Using confusion matrices to measure performance
  • Beyond accuracy - other measures of performance
  • The kappa statistic
  • Sensitivity and specificity
  • Precision and recall
  • The F-measure
  • Visualizing performance tradeoffs
  • ROC curves
  • Estimating future performance
  • The holdout method
  • Cross-validation
  • Bootstrap sampling
  • Summary
  • 11. Improving Model Performance
  • Tuning stock models for better performance
  • Using caret for automated parameter tuning
  • Creating a simple tuned model
  • Customizing the tuning process
  • Improving model performance with meta-learning
  • Understanding ensembles
  • Bagging
  • Boosting
  • Random forests