A data scientist's guide to acquiring, cleaning and managing data in R
The only how-to guide offering a unified, systemic approach to acquiring, cleaning, and managing data in R. Every experienced practitioner knows that preparing data for modeling is a painstaking, time-consuming process. Adding to the difficulty is that most modelers learn the steps involved in cleani...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Hoboken, New Jersey : Wiley, 2017. |
Edition: | 1st edition |
Series: | THEi Wiley ebooks. |
Subjects: | |
View in Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631545006719 |

Table of Contents:
- Intro
- Title Page
- Copyright
- Dedication
- Table of Contents
- About the Authors
- Preface
- Acknowledgments
- About the Companion Website
- Chapter 1: R
- 1.1 Introduction
- 1.2 Data
- 1.3 The Very Basics of R
- 1.4 Running an R Session
- 1.5 Getting Help
- 1.6 How to Use This Book
- Chapter 2: R Data, Part 1: Vectors
- 2.1 Vectors
- 2.2 Data Types
- 2.3 Subsets of Vectors
- 2.4 Missing Data (NA) and Other Special Values
- 2.5 The table() Function
- 2.6 Other Actions on Vectors
- 2.7 Long Vectors and Big Data
- 2.8 Chapter Summary and Critical Data Handling Tools
- Chapter 3: R Data, Part 2: More Complicated Structures
- 3.1 Introduction
- 3.2 Matrices
- 3.3 Lists
- 3.4 Data Frames
- 3.5 Operating on Lists and Data Frames
- 3.6 Date and Time Objects
- 3.7 Other Actions on Data Frames
- 3.8 Handling Big Data
- 3.9 Chapter Summary and Critical Data Handling Tools
- Chapter 4: R Data, Part 3: Text and Factors
- 4.1 Character Data
- 4.2 Converting Numbers into Text
- 4.3 Constructing Character Strings: Paste in Action
- 4.4 Regular Expressions
- 4.5 UTF-8 and Other Non-ASCII Characters
- 4.6 Factors
- 4.7 R Object Names and Commands as Text
- 4.8 Chapter Summary and Critical Data Handling Tools
- Chapter 5: Writing Functions and Scripts
- 5.1 Functions
- 5.2 Scripts and Shell Scripts
- 5.3 Error Handling and Debugging
- 5.4 Interacting with the Operating System
- 5.5 Speeding Things Up
- 5.6 Chapter Summary and Critical Data Handling Tools
- Chapter 6: Getting Data into and out of R
- 6.1 Reading Tabular ASCII Data into Data Frames
- 6.2 Reading Large, Non-Tabular, or Non-ASCII Data
- 6.3 Reading Data From Relational Databases
- 6.4 Handling Large Numbers of Input Files
- 6.5 Other Formats
- 6.6 Reading and Writing R Data Directly
- 6.7 Chapter Summary and Critical Data Handling Tools
- Chapter 7: Data Handling in Practice
- 7.1 Acquiring and Reading Data
- 7.2 Cleaning Data
- 7.3 Combining Data
- 7.4 Transactional Data
- 7.5 Preparing Data
- 7.6 Documentation and Reproducibility
- 7.7 The Role of Judgment
- 7.8 Data Cleaning in Action
- 7.9 Chapter Summary and Critical Data Handling Tools
- Chapter 8: Extended Exercise
- 8.1 Introduction to the Problem
- 8.2 The Data
- 8.3 Five Important Fields
- 8.4 Loan and Application Portfolios
- 8.5 Scores
- 8.6 Co-borrower Scores
- 8.7 Updated KScores
- 8.8 Loans to Be Excluded
- 8.9 Response Variable
- 8.10 Assembling the Final Data Sets
- Appendix A: Hints and Pseudocode
- A.1 Loan Portfolios
- A.2 Scores Database
- A.3 Co-borrower Scores
- A.4 Updated KScores
- A.5 Excluder Files
- A.6 Payment Matrix
- A.7 Starting the Modeling Process
- Bibliography
- Index
- End User License Agreement