R data analysis cookbook: a journey from data computation to data-driven insights

Over 80 recipes to help you breeze through your data analysis projects using R. About This Book: Analyse your data using popular R packages like ggplot2 with ready-to-use and customizable recipes. Find meaningful insights from your data and generate dynamic reports. A practical guide to help you put...


Bibliographic Details
Other Authors: Ganguly, Kuntal, author
Format: E-book
Language: English
Published: Birmingham, England ; Mumbai, [India] : Packt, 2017.
Edition: Second edition
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630746506719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: Acquire and Prepare the Ingredients - Your Data
  • Introduction
  • Working with data
  • Reading data from CSV files
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Handling different column delimiters
  • Handling column headers/variable names
  • Handling missing values
  • Reading strings as characters and not as factors
  • Reading data directly from a website
  • Reading XML data
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Extracting HTML table data from a web page
  • Extracting a single HTML table from a web page
  • Reading JSON data
  • Getting ready
  • How to do it...
  • How it works...
  • Reading data from fixed-width formatted files
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Files with headers
  • Excluding columns from data
  • Reading data from R files and R libraries
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Saving all objects in a session
  • Saving objects selectively in a session
  • Attaching/detaching R data files to an environment
  • Listing all datasets in loaded packages
  • Removing cases with missing values
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Eliminating cases with NA for selected variables
  • Finding cases that have no missing values
  • Converting specific values to NA
  • Excluding NA values from computations
  • Replacing missing values with the mean
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Imputing random values sampled from non-missing values
  • Removing duplicate cases
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Identifying duplicates without deleting them
  • Rescaling a variable to specified min-max range
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Rescaling many variables at once
  • See also
  • Normalizing or standardizing data in a data frame
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Standardizing several variables simultaneously
  • See also
  • Binning numerical data
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Creating a specified number of intervals automatically
  • Creating dummies for categorical variables
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Choosing which variables to create dummies for
  • Handling missing data
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Understanding missing data pattern
  • Correcting data
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Combining multiple columns to single columns
  • Splitting single column to multiple columns
  • Imputing data
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Detecting outliers
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Treating the outliers with mean/median imputation
  • Handling extreme values with capping
  • Transforming and binning values
  • Outlier detection with LOF
  • Chapter 2: What's in There - Exploratory Data Analysis
  • Introduction
  • Creating standard data summaries
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Using the str() function for an overview of a data frame
  • Computing the summary and the str() function for a single variable
  • Finding other measures
  • Extracting a subset of a dataset
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Excluding columns
  • Selecting based on multiple values
  • Selecting using logical vector
  • Splitting a dataset
  • Getting ready
  • How to do it...
  • How it works...
  • Creating random data partitions
  • Getting ready
  • How to do it...
  • Case 1 - Numerical target variable and two partitions
  • Case 2 - Numerical target variable and three partitions
  • Case 3 - Categorical target variable and two partitions
  • Case 4 - Categorical target variable and three partitions
  • How it works...
  • There's more...
  • Using a convenience function for partitioning
  • Sampling from a set of values
  • Generating standard plots, such as histograms, boxplots, and scatterplots
  • Getting ready
  • How to do it...
  • Creating histograms
  • Creating boxplots
  • Creating scatterplots
  • Creating scatterplot matrices
  • How it works...
  • Histograms
  • Boxplots
  • There's more...
  • Overlay a density plot on a histogram
  • Overlay a regression line on a scatterplot
  • Color specific points on a scatterplot
  • Generating multiple plots on a grid
  • Getting ready
  • How to do it...
  • How it works...
  • Graphics parameters
  • Creating plots with the lattice package
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Adding flair to your graphs
  • See also
  • Creating charts that facilitate comparisons
  • Getting ready
  • How to do it...
  • Using base plotting system
  • How it works...
  • There's more...
  • Creating beanplots with the beanplot package
  • See also
  • Creating charts that help to visualize possible causality
  • Getting ready
  • How to do it...
  • How it works...
  • See also
  • Chapter 3: Where Does It Belong? Classification
  • Introduction
  • Generating error/classification confusion matrices
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Visualizing the error/classification confusion matrix
  • Comparing the model's performance for different classes
  • Principal Component Analysis
  • Getting ready
  • How to do it...
  • How it works...
  • Generating receiver operating characteristic charts
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Using arbitrary class labels
  • Building, plotting, and evaluating with classification trees
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Computing raw probabilities
  • Creating the ROC chart
  • See also
  • Using random forest models for classification
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Computing raw probabilities
  • Generating the ROC chart
  • Specifying cutoffs for classification
  • See also
  • Classifying using the support vector machine approach
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Controlling the scaling of variables
  • Determining the type of SVM model
  • Assigning weights to the classes
  • Choosing the cost of SVM
  • Tuning the SVM
  • See also
  • Classifying using the Naive Bayes approach
  • Getting ready
  • How to do it...
  • How it works...
  • See also
  • Classifying using the KNN approach
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Automating the process of running KNN for many k values
  • Selecting appropriate values of k using caret
  • Using KNN to compute raw probabilities instead of classifications
  • Using neural networks for classification
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Exercising greater control over nnet
  • Generating raw probabilities and plotting the ROC curve
  • Classifying using linear discriminant function analysis
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Using the formula interface for lda
  • See also
  • Classifying using logistic regression
  • Getting ready
  • How to do it...
  • How it works...
  • Text classification for sentiment analysis
  • Getting ready
  • How to do it...
  • How it works...
  • Chapter 4: Give Me a Number - Regression
  • Introduction
  • Computing the root-mean-square error
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Using a convenience function to compute the RMS error
  • Building KNN models for regression
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Running KNN with cross-validation in place of a validation partition
  • Using a convenience function to run KNN
  • Using a convenience function to run KNN for multiple k values
  • See also
  • Performing linear regression
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Forcing lm to use a specific factor level as the reference
  • Using other options in the formula expression for linear models
  • See also
  • Performing variable selection in linear regression
  • Getting ready
  • How to do it...
  • How it works...
  • See also
  • Building regression trees
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Generating regression trees for data with categorical predictors
  • Generating regression trees using the ensemble method - Bagging and Boosting
  • See also
  • Building random forest models for regression
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Controlling forest generation
  • See also
  • Using neural networks for regression
  • Getting ready
  • How to do it...
  • How it works...
  • See also
  • Performing k-fold cross-validation
  • Getting ready
  • How to do it...
  • How it works...
  • See also
  • Performing leave-one-out cross-validation to limit overfitting
  • How to do it...
  • How it works...