Machine learning with R
R gives you access to the cutting-edge software you need to prepare data for machine learning. No previous knowledge is required: this book takes you methodically through every stage of applying machine learning. Harness the power of R for statistical computing and data science. Use R to apply comm...
Main Author: | |
---|---|
Format: | eBook |
Language: | English |
Published: | Birmingham : Packt Publishing, 2013. |
Edition: | 1st edition |
Series: | Community experience distilled. |
Subjects: | |
See on Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009627865406719 |
Table of Contents:
- Intro
- Machine Learning with R
- Table of Contents
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Support files, eBooks, discount offers and more
- Why Subscribe?
- Free Access for Packt account holders
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Errata
- Piracy
- Questions
- 1. Introducing Machine Learning
- The origins of machine learning
- Uses and abuses of machine learning
- Ethical considerations
- How do machines learn?
- Abstraction and knowledge representation
- Generalization
- Assessing the success of learning
- Steps to apply machine learning to your data
- Choosing a machine learning algorithm
- Thinking about the input data
- Thinking about types of machine learning algorithms
- Matching your data to an appropriate algorithm
- Using R for machine learning
- Installing and loading R packages
- Installing an R package
- Installing a package using the point-and-click interface
- Loading an R package
- Summary
- 2. Managing and Understanding Data
- R data structures
- Vectors
- Factors
- Lists
- Data frames
- Matrixes and arrays
- Managing data with R
- Saving and loading R data structures
- Importing and saving data from CSV files
- Importing data from SQL databases
- Exploring and understanding data
- Exploring the structure of data
- Exploring numeric variables
- Measuring the central tendency - mean and median
- Measuring spread - quartiles and the five-number summary
- Visualizing numeric variables - boxplots
- Visualizing numeric variables - histograms
- Understanding numeric data - uniform and normal distributions
- Measuring spread - variance and standard deviation
- Exploring categorical variables
- Measuring the central tendency - the mode
- Exploring relationships between variables
- Visualizing relationships - scatterplots
- Examining relationships - two-way cross-tabulations
- Summary
- 3. Lazy Learning - Classification Using Nearest Neighbors
- Understanding classification using nearest neighbors
- The kNN algorithm
- Calculating distance
- Choosing an appropriate k
- Preparing data for use with kNN
- Why is the kNN algorithm lazy?
- Diagnosing breast cancer with the kNN algorithm
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Transformation - normalizing numeric data
- Data preparation - creating training and test datasets
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Transformation - z-score standardization
- Testing alternative values of k
- Summary
- 4. Probabilistic Learning - Classification Using Naive Bayes
- Understanding naive Bayes
- Basic concepts of Bayesian methods
- Probability
- Joint probability
- Conditional probability with Bayes' theorem
- The naive Bayes algorithm
- The naive Bayes classification
- The Laplace estimator
- Using numeric features with naive Bayes
- Example - filtering mobile phone spam with the naive Bayes algorithm
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Data preparation - processing text data for analysis
- Data preparation - creating training and test datasets
- Visualizing text data - word clouds
- Data preparation - creating indicator features for frequent words
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Summary
- 5. Divide and Conquer - Classification Using Decision Trees and Rules
- Understanding decision trees
- Divide and conquer
- The C5.0 decision tree algorithm
- Choosing the best split
- Pruning the decision tree
- Example - identifying risky bank loans using C5.0 decision trees
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Data preparation - creating random training and test datasets
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Boosting the accuracy of decision trees
- Making some mistakes more costly than others
- Understanding classification rules
- Separate and conquer
- The One Rule algorithm
- The RIPPER algorithm
- Rules from decision trees
- Example - identifying poisonous mushrooms with rule learners
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Summary
- 6. Forecasting Numeric Data - Regression Methods
- Understanding regression
- Simple linear regression
- Ordinary least squares estimation
- Correlations
- Multiple linear regression
- Example - predicting medical expenses using linear regression
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Exploring relationships among features - the correlation matrix
- Visualizing relationships among features - the scatterplot matrix
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Model specification - adding non-linear relationships
- Transformation - converting a numeric variable to a binary indicator
- Model specification - adding interaction effects
- Putting it all together - an improved regression model
- Understanding regression trees and model trees
- Adding regression to trees
- Example - estimating the quality of wines with regression trees and model trees
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Step 3 - training a model on the data
- Visualizing decision trees
- Step 4 - evaluating model performance
- Measuring performance with mean absolute error
- Step 5 - improving model performance
- Summary
- 7. Black Box Methods - Neural Networks and Support Vector Machines
- Understanding neural networks
- From biological to artificial neurons
- Activation functions
- Network topology
- The number of layers
- The direction of information travel
- The number of nodes in each layer
- Training neural networks with backpropagation
- Modeling the strength of concrete with ANNs
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Understanding Support Vector Machines
- Classification with hyperplanes
- Finding the maximum margin
- The case of linearly separable data
- The case of non-linearly separable data
- Using kernels for non-linear spaces
- Performing OCR with SVMs
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Summary
- 8. Finding Patterns - Market Basket Analysis Using Association Rules
- Understanding association rules
- The Apriori algorithm for association rule learning
- Measuring rule interest - support and confidence
- Building a set of rules with the Apriori principle
- Example - identifying frequently purchased groceries with association rules
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Data preparation - creating a sparse matrix for transaction data
- Visualizing item support - item frequency plots
- Visualizing transaction data - plotting the sparse matrix
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Sorting the set of association rules
- Taking subsets of association rules
- Saving association rules to a file or data frame
- Summary
- 9. Finding Groups of Data - Clustering with k-means
- Understanding clustering
- Clustering as a machine learning task
- The k-means algorithm for clustering
- Using distance to assign and update clusters
- Choosing the appropriate number of clusters
- Finding teen market segments using k-means clustering
- Step 1 - collecting data
- Step 2 - exploring and preparing the data
- Data preparation - dummy coding missing values
- Data preparation - imputing missing values
- Step 3 - training a model on the data
- Step 4 - evaluating model performance
- Step 5 - improving model performance
- Summary
- 10. Evaluating Model Performance
- Measuring performance for classification
- Working with classification prediction data in R
- A closer look at confusion matrices
- Using confusion matrices to measure performance
- Beyond accuracy - other measures of performance
- The kappa statistic
- Sensitivity and specificity
- Precision and recall
- The F-measure
- Visualizing performance tradeoffs
- ROC curves
- Estimating future performance
- The holdout method
- Cross-validation
- Bootstrap sampling
- Summary
- 11. Improving Model Performance
- Tuning stock models for better performance
- Using caret for automated parameter tuning
- Creating a simple tuned model
- Customizing the tuning process
- Improving model performance with meta-learning
- Understanding ensembles
- Bagging
- Boosting
- Random forests