R programming by example: practical, hands-on projects to help you get started with R
This step-by-step guide demonstrates how to build simple-to-advanced applications through examples in R using modern tools. About This Book: Get a firm hold on the fundamentals of R through practical hands-on examples. Get started with good R programming fundamentals for data science. Exploit the diffe...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England: Packt, 2017 |
Edition: | 1st edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630715506719 |
Table of Contents:
- Cover
- Title Page
- Copyright
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Introduction to R
- What R is and what it isn't
- The inspiration for R - the S language
- R is a high-quality statistical computing system
- R is a flexible programming language
- R is free, as in freedom and as in free beer
- What R is not good for
- Comparing R with other software
- The interpreter and the console
- Tools to work efficiently with R
- Pick an IDE or a powerful editor
- The send to console functionality
- The efficient write-execute loop
- Executing R code in non-interactive sessions
- How to use this book
- Tracking state with symbols and variables
- Working with data types and data structures
- Numerics
- Special values
- Characters
- Logicals
- Vectors
- Factors
- Matrices
- Lists
- Data frames
- Divide and conquer with functions
- Optional arguments
- Functions as arguments
- Operators are functions
- Coercion
- Complex logic with control structures
- If… else conditionals
- For loops
- While loops
- The examples in this book
- Summary
- Chapter 2: Understanding Votes with Descriptive Statistics
- This chapter's required packages
- The Brexit votes example
- Cleaning and setting up the data
- Summarizing the data into a data frame
- Getting intuition with graphs and correlations
- Visualizing variable distributions
- Using matrix scatter plots for a quick overview
- Getting a better look with detailed scatter plots
- Understanding interactions with correlations
- Creating a new dataset with what we've learned
- Building new variables with principal components
- Putting it all together into high-quality code
- Planning before programming
- Understanding the fundamentals of high-quality code
- Programming by visualizing the big picture
- Summary
- Chapter 3: Predicting Votes with Linear Models
- Required packages
- Setting up the data
- Training and testing datasets
- Predicting votes with linear models
- Checking model assumptions
- Checking linearity with scatter plots
- Checking normality with histograms and quantile-quantile plots
- Checking homoscedasticity with residual plots
- Checking no collinearity with correlations
- Measuring accuracy with score functions
- Programmatically finding the best model
- Generating model combinations
- Predicting votes from wards with unknown data
- Summary
- Chapter 4: Simulating Sales Data and Working with Databases
- Required packages
- Designing our data tables
- The basic variables
- Simplifying assumptions
- Potential pitfalls
- The too-much-empty-space problem
- The too-much-repeated-data problem
- Simulating the sales data
- Simulating numeric data according to distribution assumptions
- Simulating categorical values using factors
- Simulating dates within a range
- Simulating numbers under shared restrictions
- Simulating strings for complex identifiers
- Putting everything together
- Simulating the client data
- Simulating the client messages data
- Working with relational databases
- Summary
- Chapter 5: Communicating Sales with Visualizations
- Required packages
- Extending our data with profit metrics
- Building blocks for reusable high-quality graphs
- Starting with simple applications for bar graphs
- Adding a third dimension with colors
- Graphing top performers with bar graphs
- Graphing disaggregated data with boxplots
- Scatter plots with joint and marginal distributions
- Pricing and profitability by protein source and continent
- Client birth dates, gender, and ratings
- Developing our own graph type - radar graphs
- Exploring with interactive 3D scatter plots
- Looking at dynamic data with time-series
- Looking at geographical data with static maps
- Navigating geographical data with interactive maps
- Maps you can navigate and zoom in to
- High-tech-looking interactive globe
- Summary
- Chapter 6: Understanding Reviews with Text Analysis
- This chapter's required packages
- What is text analysis and how does it work?
- Preparing, training, and testing data
- Building the corpus with tokenization and data cleaning
- Document feature matrices
- Training models with cross validation
- Training our first predictive model
- Improving speed with parallelization
- Computing predictive accuracy and confusion matrices
- Improving our results with TF-IDF
- Adding flexibility with N-grams
- Reducing dimensionality with SVD
- Extending our analysis with cosine similarity
- Digging deeper with sentiment analysis
- Testing our predictive model with unseen data
- Retrieving text data from Twitter
- Summary
- Chapter 7: Developing Automatic Presentations
- Required packages
- Why invest in automation?
- Literate programming as a content creation methodology
- Reproducibility as a benefit of literate programming
- The basic tools for an automation pipeline
- A gentle introduction to Markdown
- Text
- Headers
- Header Level 1
- Header Level 2
- Header Level 3
- Header Level 4
- Lists
- Tables
- Links
- Images
- Quotes
- Code
- Mathematics
- Extending Markdown with R Markdown
- Code chunks
- Tables
- Graphs
- Chunk options
- Global chunk options
- Caching
- Producing the final output with knitr
- Developing graphs and analysis as we normally would
- Building our presentation with R Markdown
- Summary
- Chapter 8: Object-Oriented System to Track Cryptocurrencies
- This chapter's required packages
- The cryptocurrencies example
- A brief introduction to object-oriented programming
- The purpose of object-oriented programming
- Important concepts behind object-oriented languages
- Encapsulation
- Polymorphism
- Hierarchies
- Classes and constructors
- Public and private methods
- Interfaces, factories, and patterns in general
- Introducing three object models in R - S3, S4, and R6
- The first source of confusion - various object models
- The second source of confusion - generic functions
- The S3 object model
- Classes, constructors, and composition
- Public methods and polymorphism
- Encapsulation and mutability
- Inheritance
- The S4 object model
- Classes, constructors, and composition
- Public methods and polymorphism
- Encapsulation and mutability
- Inheritance
- The R6 object model
- Classes, constructors, and composition
- Public methods and polymorphism
- Encapsulation and mutability
- Inheritance
- Active bindings
- Finalizers
- The architecture behind our cryptocurrencies system
- Starting simple with timestamps using S3 classes
- Implementing cryptocurrency assets using S4 classes
- Implementing our storage layer with R6 classes
- Communicating available behavior with a database interface
- Implementing a database-like storage system with CSV files
- Easily allowing new database integration with a factory
- Encapsulating multiple databases with a storage layer
- Retrieving live data for markets and wallets with R6 classes
- Creating a very simple requester to isolate API calls
- Developing our exchanges infrastructure
- Developing our wallets infrastructure
- Implementing our wallet requesters
- Finally introducing users with S3 classes
- Helping ourselves with a centralized settings file
- Saving our initial user data into the system
- Activating our system with two simple functions
- Some advice when working with object-oriented systems
- Summary
- Chapter 9: Implementing an Efficient Simple Moving Average
- Required packages
- Starting by using good algorithms
- Just how much impact can algorithm selection have?
- How fast is fast enough?
- Calculating simple moving averages inefficiently
- Simulating the time-series
- Our first (very inefficient) attempt at an SMA
- Understanding why R can be slow
- Object immutability
- Interpreted dynamic typings
- Memory-bound processes
- Single-threaded processes
- Measuring by profiling and benchmarking
- Profiling fundamentals with Rprof()
- Benchmarking manually with system.time()
- Benchmarking automatically with microbenchmark()
- Easily achieving high benefit-cost improvements
- Using the simple data structure for the job
- Vectorizing as much as possible
- Removing unnecessary logic
- Moving checks out of iterative processes
- If you can, avoid iterating at all
- Using R's way of iterating efficiently
- Avoiding sending data structures with overheads
- Using parallelization to divide and conquer
- How deep does the parallelization rabbit hole go?
- Practical parallelization with R
- Using C++ and Fortran to accelerate calculations
- Using an old-school approach with Fortran
- Using a modern approach with C++
- Looking back at what we have achieved
- Other topics of interest to enhance performance
- Preallocating memory to avoid duplication
- Making R code a bit faster with byte code compilation
- Just-in-time (JIT) compilation of R code
- Using memoization or cache layers
- Improving our data and memory management
- Using specialized packages for performance
- Flexibility and power with cloud computing
- Specialized R distributions
- Summary
- Chapter 10: Adding Interactivity with Dashboards
- Required packages
- Introducing the Shiny application architecture and reactivity