R data analysis cookbook: a journey from data computation to data-driven insights

Over 80 recipes to help you breeze through your data analysis projects using R. About This Book: Analyse your data using the popular R packages like ggplot2 with ready-to-use and customizable recipes. Find meaningful insights from your data and generate dynamic reports. A practical guide to help you put...

Bibliographic Details
Other Authors:	Ganguly, Kuntal, author
Format:	E-book
Language:	English
Published:	Birmingham, England ; Mumbai, [India] : Packt 2017.
Edition:	Second edition
Subjects:	R (Computer program language)
View at Universitat Ramon Llull Library:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630746506719

Table of Contents:

Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Acquire and Prepare the Ingredients - Your Data
Introduction
Working with data
Reading data from CSV files
Getting ready
How to do it...
How it works...
There's more...
Handling different column delimiters
Handling column headers/variable names
Handling missing values
Reading strings as characters and not as factors
Reading data directly from a website
Reading XML data
Getting ready
How to do it...
How it works...
There's more...
Extracting HTML table data from a web page
Extracting a single HTML table from a web page
Reading JSON data
Getting ready
How to do it...
How it works...
Reading data from fixed-width formatted files
Getting ready
How to do it...
How it works...
There's more...
Files with headers
Excluding columns from data
Reading data from R files and R libraries
Getting ready
How to do it...
How it works...
There's more...
Saving all objects in a session
Saving objects selectively in a session
Attaching/detaching R data files to an environment
Listing all datasets in loaded packages
Removing cases with missing values
Getting ready
How to do it...
How it works...
There's more...
Eliminating cases with NA for selected variables
Finding cases that have no missing values
Converting specific values to NA
Excluding NA values from computations
Replacing missing values with the mean
Getting ready
How to do it...
How it works...
There's more...
Imputing random values sampled from non-missing values
Removing duplicate cases
Getting ready
How to do it...
How it works...
There's more...
Identifying duplicates without deleting them
Rescaling a variable to specified min-max range
Getting ready
How to do it...
How it works...
There's more...
Rescaling many variables at once
See also
Normalizing or standardizing data in a data frame
Getting ready
How to do it...
How it works...
There's more...
Standardizing several variables simultaneously
See also
Binning numerical data
Getting ready
How to do it...
How it works...
There's more...
Creating a specified number of intervals automatically
Creating dummies for categorical variables
Getting ready
How to do it...
How it works...
There's more...
Choosing which variables to create dummies for
Handling missing data
Getting ready
How to do it...
How it works...
There's more...
Understanding missing data pattern
Correcting data
Getting ready
How to do it...
How it works...
There's more...
Combining multiple columns to single columns
Splitting single column to multiple columns
Imputing data
Getting ready
How to do it...
How it works...
There's more...
Detecting outliers
Getting ready
How to do it...
How it works...
There's more...
Treating the outliers with mean/median imputation
Handling extreme values with capping
Transforming and binning values
Outlier detection with LOF
Chapter 2: What's in There - Exploratory Data Analysis
Introduction
Creating standard data summaries
Getting ready
How to do it...
How it works...
There's more...
Using the str() function for an overview of a data frame
Computing the summary and the str() function for a single variable
Finding other measures
Extracting a subset of a dataset
Getting ready
How to do it...
How it works...
There's more...
Excluding columns
Selecting based on multiple values
Selecting using logical vector
Splitting a dataset
Getting ready
How to do it...
How it works...
Creating random data partitions
Getting ready
How to do it...
Case 1 - Numerical target variable and two partitions
Case 2 - Numerical target variable and three partitions
Case 3 - Categorical target variable and two partitions
Case 4 - Categorical target variable and three partitions
How it works...
There's more...
Using a convenience function for partitioning
Sampling from a set of values
Generating standard plots, such as histograms, boxplots, and scatterplots
Getting ready
How to do it...
Creating histograms
Creating boxplots
Creating scatterplots
Creating scatterplot matrices
How it works...
Histograms
Boxplots
There's more...
Overlay a density plot on a histogram
Overlay a regression line on a scatterplot
Color specific points on a scatterplot
Generating multiple plots on a grid
Getting ready
How to do it...
How it works...
Graphics parameters
Creating plots with the lattice package
Getting ready
How to do it...
How it works...
There's more...
Adding flair to your graphs
See also
Creating charts that facilitate comparisons
Getting ready
How to do it...
Using base plotting system
How it works...
There's more...
Creating beanplots with the beanplot package
See also
Creating charts that help to visualize possible causality
Getting ready
How to do it...
How it works...
See also
Chapter 3: Where Does It Belong? Classification
Introduction
Generating error/classification confusion matrices
Getting ready
How to do it...
How it works...
There's more...
Visualizing the error/classification confusion matrix
Comparing the model's performance for different classes
Principal Component Analysis
Getting ready
How to do it...
How it works...
Generating receiver operating characteristic charts
Getting ready
How to do it...
How it works...
There's more...
Using arbitrary class labels
Building, plotting, and evaluating with classification trees
Getting ready
How to do it...
How it works...
There's more...
Computing raw probabilities
Creating the ROC chart
See also
Using random forest models for classification
Getting ready
How to do it...
How it works...
There's more...
Computing raw probabilities
Generating the ROC chart
Specifying cutoffs for classification
See also
Classifying using the support vector machine approach
Getting ready
How to do it...
How it works...
There's more...
Controlling the scaling of variables
Determining the type of SVM model
Assigning weights to the classes
Choosing the cost of SVM
Tuning the SVM
See also
Classifying using the Naive Bayes approach
Getting ready
How to do it...
How it works...
See also
Classifying using the KNN approach
Getting ready
How to do it...
How it works...
There's more...
Automating the process of running KNN for many k values
Selecting appropriate values of k using caret
Using KNN to compute raw probabilities instead of classifications
Using neural networks for classification
Getting ready
How to do it...
How it works...
There's more...
Exercising greater control over nnet
Generating raw probabilities and plotting the ROC curve
Classifying using linear discriminant function analysis
Getting ready
How to do it...
How it works...
There's more...
Using the formula interface for lda
See also
Classifying using logistic regression
Getting ready
How to do it...
How it works...
Text classification for sentiment analysis
Getting ready
How to do it...
How it works...
Chapter 4: Give Me a Number - Regression
Introduction
Computing the root-mean-square error
Getting ready
How to do it...
How it works...
There's more...
Using a convenience function to compute the RMS error
Building KNN models for regression
Getting ready
How to do it...
How it works...
There's more...
Running KNN with cross-validation in place of a validation partition
Using a convenience function to run KNN
Using a convenience function to run KNN for multiple k values
See also
Performing linear regression
Getting ready
How to do it...
How it works...
There's more...
Forcing lm to use a specific factor level as the reference
Using other options in the formula expression for linear models
See also
Performing variable selection in linear regression
Getting ready
How to do it...
How it works...
See also
Building regression trees
Getting ready
How to do it...
How it works...
There's more...
Generating regression trees for data with categorical predictors
Generating regression trees using the ensemble method - Bagging and Boosting
See also
Building random forest models for regression
Getting ready
How to do it...
How it works...
There's more...
Controlling forest generation
See also
Using neural networks for regression
Getting ready
How to do it...
How it works...
See also
Performing k-fold cross-validation
Getting ready
How to do it...
How it works...
See also
Performing leave-one-out cross-validation to limit overfitting
How to do it...
How it works...
