Big Data Analytics with R: Utilize R to uncover hidden patterns in your Big Data
Utilize R to uncover hidden patterns in your Big Data. About This Book: Perform computational analyses on Big Data to generate meaningful results. Get a practical knowledge of the R programming language while working on Big Data platforms like Hadoop, Spark, H2O and SQL/NoSQL databases. Explore fast, strea...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham : Packt Publishing, 2016. |
Edition: | 1st edition |
Series: | Community experience distilled. |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630261606719 |
Table of Contents:
- Cover
- Copyright
- Credits
- About the Author
- Acknowledgement
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: The Era of Big Data
- Big Data - The monster re-defined
- Big Data toolbox - dealing with the giant
- Hadoop - the elephant in the room
- Databases
- Hadoop Spark-ed up
- R - The unsung Big Data hero
- Summary
- Chapter 2: Introduction to R Programming Language and Statistical Environment
- Learning R
- Revisiting R basics
- Getting R and RStudio ready
- Setting the URLs to R repositories
- R data structures
- Vectors
- Scalars
- Matrices
- Arrays
- Data frames
- Lists
- Exporting R data objects
- Applied data science with R
- Importing data from different formats
- Exploratory Data Analysis
- Data aggregations and contingency tables
- Hypothesis testing and statistical inference
- Tests of differences
- Independent t-test example (with power and effect size estimates)
- ANOVA example
- Tests of relationships
- An example of Pearson's r correlations
- Multiple regression example
- Data visualization packages
- Summary
- Chapter 3: Unleashing the Power of R from Within
- Traditional limitations of R
- Out-of-memory data
- Processing speed
- To the memory limits and beyond
- Data transformations and aggregations with the ff and ffbase packages
- Generalized linear models with the ff and ffbase packages
- Logistic regression example with ffbase and biglm
- Expanding memory with the bigmemory package
- Parallel R
- From bigmemory to faster computations
- An apply() example with the big.matrix object
- A for() loop example with the ffdf object
- Using apply() and for() loop examples on a data.frame
- A parallel package example
- A foreach package example
- The future of parallel processing in R
- Utilizing Graphics Processing Units with R
- Multi-threading with Microsoft R Open distribution
- Parallel machine learning with H2O and R
- Boosting R performance with the data.table package and other tools
- Fast data import and manipulation with the data.table package
- Data import with data.table
- Lightning-fast subsets and aggregations on data.table
- Chaining, more complex aggregations, and pivot tables with data.table
- Writing better R code
- Summary
- Chapter 4: Hadoop and MapReduce Framework for R
- Hadoop architecture
- Hadoop Distributed File System
- MapReduce framework
- A simple MapReduce word count example
- Other Hadoop native tools
- Learning Hadoop
- A single-node Hadoop in Cloud
- Deploying Hortonworks Sandbox on Azure
- A word count example in Hadoop using Java
- A word count example in Hadoop using the R language
- RStudio Server on a Linux RedHat/CentOS virtual machine
- Installing and configuring RHadoop packages
- HDFS management and MapReduce in R - a word count example
- HDInsight - a multi-node Hadoop cluster on Azure
- Creating your first HDInsight cluster
- Creating a new Resource Group
- Deploying a Virtual Network
- Creating a Network Security Group
- Setting up and configuring an HDInsight cluster
- Starting the cluster and exploring Ambari
- Connecting to the HDInsight cluster and installing RStudio Server
- Adding a new inbound security rule for port 8787
- Editing the Virtual Network's public IP address for the head node
- Smart energy meter readings analysis example - using R on HDInsight cluster
- Summary
- Chapter 5: R with Relational Database Management Systems (RDBMSs)
- Relational Database Management Systems (RDBMSs)
- A short overview of used RDBMSs
- Structured Query Language (SQL)
- SQLite with R
- Preparing and importing data into a local SQLite database
- Connecting to SQLite from RStudio
- MariaDB with R on an Amazon EC2 instance
- Preparing the EC2 instance and RStudio Server for use
- Preparing MariaDB and data for use
- Working with MariaDB from RStudio
- PostgreSQL with R on Amazon RDS
- Launching an Amazon RDS database instance
- Preparing and uploading data to Amazon RDS
- Remotely querying PostgreSQL on Amazon RDS from RStudio
- Summary
- Chapter 6: R with Non-Relational (NoSQL) Databases
- Introduction to NoSQL databases
- Review of leading non-relational databases
- MongoDB with R
- Introduction to MongoDB
- MongoDB data models
- Installing MongoDB with R on Amazon EC2
- Processing Big Data using MongoDB with R
- Importing data into MongoDB and basic MongoDB commands
- MongoDB with R using the rmongodb package
- MongoDB with R using the RMongo package
- MongoDB with R using the mongolite package
- HBase with R
- Azure HDInsight with HBase and RStudio Server
- Importing the data to HDFS and HBase
- Reading and querying HBase using the rhbase package
- Summary
- Chapter 7: Faster than Hadoop - Spark with R
- Spark for Big Data analytics
- Spark with R on a multi-node HDInsight cluster
- Launching HDInsight with Spark and R/RStudio
- Reading the data into HDFS and Hive
- Getting the data into HDFS
- Importing data from HDFS to Hive
- Bay Area Bike Share analysis using SparkR
- Summary
- Chapter 8: Machine Learning Methods for Big Data in R
- What is machine learning?
- Supervised and unsupervised machine learning methods
- Classification and clustering algorithms
- Machine learning methods with R
- Big Data machine learning tools
- GLM example with Spark and R on the HDInsight cluster
- Preparing the Spark cluster and reading the data from HDFS
- Logistic regression in Spark with R
- Naive Bayes with H2O on Hadoop with R
- Running an H2O instance on Hadoop with R
- Reading and exploring the data in H2O
- Naive Bayes on H2O with R
- Neural Networks with H2O on Hadoop with R
- How do Neural Networks work?
- Running Deep Learning models on H2O
- Summary
- Chapter 9: The Future of R - Big, Fast, and Smart Data
- The current state of Big Data analytics with R
- Out-of-memory data on a single machine
- Faster data processing with R
- Hadoop with R
- Spark with R
- R with databases
- Machine learning with R
- The future of R
- Big Data
- Fast data
- Smart data
- Where to go next
- Summary
- Index