Mastering data analysis with R: gain clear insights into your data and solve real-world data science problems with R, from data munging to modeling and visualization
Gain sharp insights into your data and solve real-world data science problems with R, from data munging to modeling and visualization. About This Book: Handle your data with precision and care for optimal business intelligence. Restructure and transform your data to inform decision-making. Packed with pr...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham : Packt Publishing, 2015. |
Edition: | 1st edition |
Series: | Community experience distilled. |
Subjects: | |
View at the Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629794206719 |
Table of Contents:
- Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Hello, Data!; Loading text files of a reasonable size; Data files larger than the physical memory; Benchmarking text file parsers; Loading a subset of text files; Filtering flat files before loading to R; Loading data from databases; Setting up the test environment; MySQL and MariaDB; PostgreSQL; Oracle database; ODBC database access; Using a graphical user interface to connect to databases; Other database backends; Importing data from other statistical systems
- Loading Excel spreadsheets; Summary; Chapter 2: Getting Data from the Web; Loading datasets from the Internet; Other popular online data formats; Reading data from HTML tables; Reading tabular data from static Web pages; Scraping data from other online sources; R packages to interact with data source APIs; Socrata Open Data API; Finance APIs; Fetching time series with Quandl; Google documents and analytics; Online search trends; Historical weather data; Other online data sources; Summary; Chapter 3: Filtering and Summarizing Data; Drop needless data; Drop needless data in an efficient way
- Drop needless data in another efficient way; Aggregation; Quicker aggregation with base R commands; Convenient helper functions; High-performance helper functions; Aggregate with data.table; Running benchmarks; Summary functions; Adding up the number of cases in subgroups; Summary; Chapter 4: Restructuring Data; Transposing matrices; Filtering data by string matching; Rearranging data; dplyr versus data.table; Computing new variables; Memory profiling; Creating multiple variables at a time; Computing new variables with dplyr; Merging datasets; Reshaping data in a flexible way
- Converting wide tables to the long table format; Converting long tables to the wide table format; Tweaking performance; The evolution of the reshape packages; Summary; Chapter 5: Building Models (authored by Renata Nemeth and Gergely Toth); The motivation behind multivariate models; Linear regression with continuous predictors; Model interpretation; Multiple predictors; Model assumptions; How well does the line fit in the data?; Discrete predictors; Summary; Chapter 6: Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth); The modeling workflow; Logistic regression
- Data considerations; Goodness of model fit; Model comparison; Models for count data; Poisson regression; Negative binomial regression; Multivariate non-linear models; Summary; Chapter 7: Unstructured Data; Importing the corpus; Cleaning the corpus; Visualizing the most frequent words in the corpus; Further cleanup; Stemming words; Lemmatisation; Analyzing the associations among terms; Some other metrics; The segmentation of documents; Summary; Chapter 8: Polishing Data; The types and origins of missing data; Identifying missing data; By-passing missing values
- Overriding the default arguments of a function