R data analysis cookbook: a journey from data computation to data-driven insights
Over 80 recipes to help you breeze through your data analysis projects using R. About This Book: Analyse your data using the popular R packages like ggplot2 with ready-to-use and customizable recipes. Find meaningful insights from your data and generate dynamic reports. A practical guide to help you put...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England ; Mumbai, [India] : Packt, 2017. |
Edition: | Second edition |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630746506719 |
Table of Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Acquire and Prepare the Ingredients - Your Data
- Introduction
- Working with data
- Reading data from CSV files
- Getting ready
- How to do it...
- How it works...
- There's more...
- Handling different column delimiters
- Handling column headers/variable names
- Handling missing values
- Reading strings as characters and not as factors
- Reading data directly from a website
- Reading XML data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Extracting HTML table data from a web page
- Extracting a single HTML table from a web page
- Reading JSON data
- Getting ready
- How to do it...
- How it works...
- Reading data from fixed-width formatted files
- Getting ready
- How to do it...
- How it works...
- There's more...
- Files with headers
- Excluding columns from data
- Reading data from R files and R libraries
- Getting ready
- How to do it...
- How it works...
- There's more...
- Saving all objects in a session
- Saving objects selectively in a session
- Attaching/detaching R data files to an environment
- Listing all datasets in loaded packages
- Removing cases with missing values
- Getting ready
- How to do it...
- How it works...
- There's more...
- Eliminating cases with NA for selected variables
- Finding cases that have no missing values
- Converting specific values to NA
- Excluding NA values from computations
- Replacing missing values with the mean
- Getting ready
- How to do it...
- How it works...
- There's more...
- Imputing random values sampled from non-missing values
- Removing duplicate cases
- Getting ready
- How to do it...
- How it works...
- There's more...
- Identifying duplicates without deleting them
- Rescaling a variable to specified min-max range
- Getting ready
- How to do it...
- How it works...
- There's more...
- Rescaling many variables at once
- See also
- Normalizing or standardizing data in a data frame
- Getting ready
- How to do it...
- How it works...
- There's more...
- Standardizing several variables simultaneously
- See also
- Binning numerical data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Creating a specified number of intervals automatically
- Creating dummies for categorical variables
- Getting ready
- How to do it...
- How it works...
- There's more...
- Choosing which variables to create dummies for
- Handling missing data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Understanding missing data pattern
- Correcting data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Combining multiple columns to single columns
- Splitting single column to multiple columns
- Imputing data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Detecting outliers
- Getting ready
- How to do it...
- How it works...
- There's more...
- Treating the outliers with mean/median imputation
- Handling extreme values with capping
- Transforming and binning values
- Outlier detection with LOF
- Chapter 2: What's in There - Exploratory Data Analysis
- Introduction
- Creating standard data summaries
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using the str() function for an overview of a data frame
- Computing the summary and the str() function for a single variable
- Finding other measures
- Extracting a subset of a dataset
- Getting ready
- How to do it...
- How it works...
- There's more...
- Excluding columns
- Selecting based on multiple values
- Selecting using logical vector
- Splitting a dataset
- Getting ready
- How to do it...
- How it works...
- Creating random data partitions
- Getting ready
- How to do it...
- Case 1 - Numerical target variable and two partitions
- Case 2 - Numerical target variable and three partitions
- Case 3 - Categorical target variable and two partitions
- Case 4 - Categorical target variable and three partitions
- How it works...
- There's more...
- Using a convenience function for partitioning
- Sampling from a set of values
- Generating standard plots, such as histograms, boxplots, and scatterplots
- Getting ready
- How to do it...
- Creating histograms
- Creating boxplots
- Creating scatterplots
- Creating scatterplot matrices
- How it works...
- Histograms
- Boxplots
- There's more...
- Overlay a density plot on a histogram
- Overlay a regression line on a scatterplot
- Color specific points on a scatterplot
- Generating multiple plots on a grid
- Getting ready
- How to do it...
- How it works...
- Graphics parameters
- Creating plots with the lattice package
- Getting ready
- How to do it...
- How it works...
- There's more...
- Adding flair to your graphs
- See also
- Creating charts that facilitate comparisons
- Getting ready
- How to do it...
- Using base plotting system
- How it works...
- There's more...
- Creating beanplots with the beanplot package
- See also
- Creating charts that help to visualize possible causality
- Getting ready
- How to do it...
- How it works...
- See also
- Chapter 3: Where Does It Belong? Classification
- Introduction
- Generating error/classification confusion matrices
- Getting ready
- How to do it...
- How it works...
- There's more...
- Visualizing the error/classification confusion matrix
- Comparing the model's performance for different classes
- Principal Component Analysis
- Getting ready
- How to do it...
- How it works...
- Generating receiver operating characteristic charts
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using arbitrary class labels
- Building, plotting, and evaluating with classification trees
- Getting ready
- How to do it...
- How it works...
- There's more...
- Computing raw probabilities
- Creating the ROC chart
- See also
- Using random forest models for classification
- Getting ready
- How to do it...
- How it works...
- There's more...
- Computing raw probabilities
- Generating the ROC chart
- Specifying cutoffs for classification
- See also
- Classifying using the support vector machine approach
- Getting ready
- How to do it...
- How it works...
- There's more...
- Controlling the scaling of variables
- Determining the type of SVM model
- Assigning weights to the classes
- Choosing the cost of SVM
- Tuning the SVM
- See also
- Classifying using the Naive Bayes approach
- Getting ready
- How to do it...
- How it works...
- See also
- Classifying using the KNN approach
- Getting ready
- How to do it...
- How it works...
- There's more...
- Automating the process of running KNN for many k values
- Selecting appropriate values of k using caret
- Using KNN to compute raw probabilities instead of classifications
- Using neural networks for classification
- Getting ready
- How to do it...
- How it works...
- There's more...
- Exercising greater control over nnet
- Generating raw probabilities and plotting the ROC curve
- Classifying using linear discriminant function analysis
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using the formula interface for lda
- See also
- Classifying using logistic regression
- Getting ready
- How to do it...
- How it works...
- Text classification for sentiment analysis
- Getting ready
- How to do it...
- How it works...
- Chapter 4: Give Me a Number - Regression
- Introduction
- Computing the root-mean-square error
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using a convenience function to compute the RMS error
- Building KNN models for regression
- Getting ready
- How to do it...
- How it works...
- There's more...
- Running KNN with cross-validation in place of a validation partition
- Using a convenience function to run KNN
- Using a convenience function to run KNN for multiple k values
- See also
- Performing linear regression
- Getting ready
- How to do it...
- How it works...
- There's more...
- Forcing lm to use a specific factor level as the reference
- Using other options in the formula expression for linear models
- See also
- Performing variable selection in linear regression
- Getting ready
- How to do it...
- How it works...
- See also
- Building regression trees
- Getting ready
- How to do it...
- How it works...
- There's more...
- Generating regression trees for data with categorical predictors
- Generating regression trees using the ensemble method - Bagging and Boosting
- See also
- Building random forest models for regression
- Getting ready
- How to do it...
- How it works...
- There's more...
- Controlling forest generation
- See also
- Using neural networks for regression
- Getting ready
- How to do it...
- How it works...
- See also
- Performing k-fold cross-validation
- Getting ready
- How to do it...
- How it works...
- See also
- Performing leave-one-out cross-validation to limit overfitting
- How to do it...
- How it works...