A data scientist's guide to acquiring, cleaning and managing data in R

The only how-to guide offering a unified, systemic approach to acquiring, cleaning, and managing data in R. Every experienced practitioner knows that preparing data for modeling is a painstaking, time-consuming process. Adding to the difficulty is that most modelers learn the steps involved in cleani...


Bibliographic Details
Other Authors: Buttrey, Samuel, author; Whitaker, Lyn R., author
Format: E-book
Language: English
Published: Hoboken, New Jersey: Wiley, 2017.
Edition: 1st edition
Collection: THEi Wiley ebooks.
View in Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631545006719
Table of Contents:
  • Intro
  • Title Page
  • Copyright
  • Dedication
  • Table of Contents
  • About the Authors
  • Preface
  • Acknowledgments
  • About the Companion Website
  • Chapter 1: R
  • 1.1 Introduction
  • 1.2 Data
  • 1.3 The Very Basics of R
  • 1.4 Running an R Session
  • 1.5 Getting Help
  • 1.6 How to Use This Book
  • Chapter 2: R Data, Part 1: Vectors
  • 2.1 Vectors
  • 2.2 Data Types
  • 2.3 Subsets of Vectors
  • 2.4 Missing Data (NA) and Other Special Values
  • 2.5 The table() Function
  • 2.6 Other Actions on Vectors
  • 2.7 Long Vectors and Big Data
  • 2.8 Chapter Summary and Critical Data Handling Tools
  • Chapter 3: R Data, Part 2: More Complicated Structures
  • 3.1 Introduction
  • 3.2 Matrices
  • 3.3 Lists
  • 3.4 Data Frames
  • 3.5 Operating on Lists and Data Frames
  • 3.6 Date and Time Objects
  • 3.7 Other Actions on Data Frames
  • 3.8 Handling Big Data
  • 3.9 Chapter Summary and Critical Data Handling Tools
  • Chapter 4: R Data, Part 3: Text and Factors
  • 4.1 Character Data
  • 4.2 Converting Numbers into Text
  • 4.3 Constructing Character Strings: Paste in Action
  • 4.4 Regular Expressions
  • 4.5 UTF-8 and Other Non-ASCII Characters
  • 4.6 Factors
  • 4.7 R Object Names and Commands as Text
  • 4.8 Chapter Summary and Critical Data Handling Tools
  • Chapter 5: Writing Functions and Scripts
  • 5.1 Functions
  • 5.2 Scripts and Shell Scripts
  • 5.3 Error Handling and Debugging
  • 5.4 Interacting with the Operating System
  • 5.5 Speeding Things Up
  • 5.6 Chapter Summary and Critical Data Handling Tools
  • Chapter 6: Getting Data into and out of R
  • 6.1 Reading Tabular ASCII Data into Data Frames
  • 6.2 Reading Large, Non-Tabular, or Non-ASCII Data
  • 6.3 Reading Data From Relational Databases
  • 6.4 Handling Large Numbers of Input Files
  • 6.5 Other Formats
  • 6.6 Reading and Writing R Data Directly
  • 6.7 Chapter Summary and Critical Data Handling Tools
  • Chapter 7: Data Handling in Practice
  • 7.1 Acquiring and Reading Data
  • 7.2 Cleaning Data
  • 7.3 Combining Data
  • 7.4 Transactional Data
  • 7.5 Preparing Data
  • 7.6 Documentation and Reproducibility
  • 7.7 The Role of Judgment
  • 7.8 Data Cleaning in Action
  • 7.9 Chapter Summary and Critical Data Handling Tools
  • Chapter 8: Extended Exercise
  • 8.1 Introduction to the Problem
  • 8.2 The Data
  • 8.3 Five Important Fields
  • 8.4 Loan and Application Portfolios
  • 8.5 Scores
  • 8.6 Co-borrower Scores
  • 8.7 Updated KScores
  • 8.8 Loans to Be Excluded
  • 8.9 Response Variable
  • 8.10 Assembling the Final Data Sets
  • Appendix A: Hints and Pseudocode
  • A.1 Loan Portfolios
  • A.2 Scores Database
  • A.3 Co-borrower Scores
  • A.4 Updated KScores
  • A.5 Excluder Files
  • A.6 Payment Matrix
  • A.7 Starting the Modeling Process
  • Bibliography
  • Index
  • End User License Agreement