Big data analytics with R: utilize R to uncover hidden patterns in your big data

Utilize R to uncover hidden patterns in your Big Data. About This Book: Perform computational analyses on Big Data to generate meaningful results. Get a practical knowledge of R programming language while working on Big Data platforms like Hadoop, Spark, H2O and SQL/NoSQL databases. Explore fast, strea...

Bibliographic Details
Other Authors: Walkowiak, Simon (author)
Format: E-book
Language: English
Published: Birmingham: Packt Publishing, 2016.
Edition: 1st edition
Series: Community experience distilled.
Subjects:
View in Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630261606719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • Acknowledgement
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: The Era of Big Data
  • Big Data - The monster re-defined
  • Big Data toolbox - dealing with the giant
  • Hadoop - the elephant in the room
  • Databases
  • Hadoop Spark-ed up
  • R - The unsung Big Data hero
  • Summary
  • Chapter 2: Introduction to R Programming Language and Statistical Environment
  • Learning R
  • Revisiting R basics
  • Getting R and RStudio ready
  • Setting the URLs to R repositories
  • R data structures
  • Vectors
  • Scalars
  • Matrices
  • Arrays
  • Data frames
  • Lists
  • Exporting R data objects
  • Applied data science with R
  • Importing data from different formats
  • Exploratory Data Analysis
  • Data aggregations and contingency tables
  • Hypothesis testing and statistical inference
  • Tests of differences
  • Independent t-test example (with power and effect size estimates)
  • ANOVA example
  • Tests of relationships
  • An example of Pearson's r correlations
  • Multiple regression example
  • Data visualization packages
  • Summary
  • Chapter 3: Unleashing the Power of R from Within
  • Traditional limitations of R
  • Out-of-memory data
  • Processing speed
  • To the memory limits and beyond
  • Data transformations and aggregations with the ff and ffbase packages
  • Generalized linear models with the ff and ffbase packages
  • Logistic regression example with ffbase and biglm
  • Expanding memory with the bigmemory package
  • Parallel R
  • From bigmemory to faster computations
  • An apply() example with the big.matrix object
  • A for() loop example with the ffdf object
  • Using apply() and for() loop examples on a data.frame
  • A parallel package example
  • A foreach package example
  • The future of parallel processing in R
  • Utilizing Graphics Processing Units with R
  • Multi-threading with Microsoft R Open distribution
  • Parallel machine learning with H2O and R
  • Boosting R performance with the data.table package and other tools
  • Fast data import and manipulation with the data.table package
  • Data import with data.table
  • Lightning-fast subsets and aggregations on data.table
  • Chaining, more complex aggregations, and pivot tables with data.table
  • Writing better R code
  • Summary
  • Chapter 4: Hadoop and MapReduce Framework for R
  • Hadoop architecture
  • Hadoop Distributed File System
  • MapReduce framework
  • A simple MapReduce word count example
  • Other Hadoop native tools
  • Learning Hadoop
  • A single-node Hadoop in Cloud
  • Deploying Hortonworks Sandbox on Azure
  • A word count example in Hadoop using Java
  • A word count example in Hadoop using the R language
  • RStudio Server on a Linux RedHat/CentOS virtual machine
  • Installing and configuring RHadoop packages
  • HDFS management and MapReduce in R - a word count example
  • HDInsight - a multi-node Hadoop cluster on Azure
  • Creating your first HDInsight cluster
  • Creating a new Resource Group
  • Deploying a Virtual Network
  • Creating a Network Security Group
  • Setting up and configuring an HDInsight cluster
  • Starting the cluster and exploring Ambari
  • Connecting to the HDInsight cluster and installing RStudio Server
  • Adding a new inbound security rule for port 8787
  • Editing the Virtual Network's public IP address for the head node
  • Smart energy meter readings analysis example - using R on HDInsight cluster
  • Summary
  • Chapter 5: R with Relational Database Management Systems (RDBMSs)
  • Relational Database Management Systems (RDBMSs)
  • A short overview of used RDBMSs
  • Structured Query Language (SQL)
  • SQLite with R
  • Preparing and importing data into a local SQLite database
  • Connecting to SQLite from RStudio
  • MariaDB with R on an Amazon EC2 instance
  • Preparing the EC2 instance and RStudio Server for use
  • Preparing MariaDB and data for use
  • Working with MariaDB from RStudio
  • PostgreSQL with R on Amazon RDS
  • Launching an Amazon RDS database instance
  • Preparing and uploading data to Amazon RDS
  • Remotely querying PostgreSQL on Amazon RDS from RStudio
  • Summary
  • Chapter 6: R with Non-Relational (NoSQL) Databases
  • Introduction to NoSQL databases
  • Review of leading non-relational databases
  • MongoDB with R
  • Introduction to MongoDB
  • MongoDB data models
  • Installing MongoDB with R on Amazon EC2
  • Processing Big Data using MongoDB with R
  • Importing data into MongoDB and basic MongoDB commands
  • MongoDB with R using the rmongodb package
  • MongoDB with R using the RMongo package
  • MongoDB with R using the mongolite package
  • HBase with R
  • Azure HDInsight with HBase and RStudio Server
  • Importing the data to HDFS and HBase
  • Reading and querying HBase using the rhbase package
  • Summary
  • Chapter 7: Faster than Hadoop - Spark with R
  • Spark for Big Data analytics
  • Spark with R on a multi-node HDInsight cluster
  • Launching HDInsight with Spark and R/RStudio
  • Reading the data into HDFS and Hive
  • Getting the data into HDFS
  • Importing data from HDFS to Hive
  • Bay Area Bike Share analysis using SparkR
  • Summary
  • Chapter 8: Machine Learning Methods for Big Data in R
  • What is machine learning?
  • Supervised and unsupervised machine learning methods
  • Classification and clustering algorithms
  • Machine learning methods with R
  • Big Data machine learning tools
  • GLM example with Spark and R on the HDInsight cluster
  • Preparing the Spark cluster and reading the data from HDFS
  • Logistic regression in Spark with R
  • Naive Bayes with H2O on Hadoop with R
  • Running an H2O instance on Hadoop with R
  • Reading and exploring the data in H2O
  • Naive Bayes on H2O with R
  • Neural Networks with H2O on Hadoop with R
  • How do Neural Networks work?
  • Running Deep Learning models on H2O
  • Summary
  • Chapter 9: The Future of R - Big, Fast, and Smart Data
  • The current state of Big Data analytics with R
  • Out-of-memory data on a single machine
  • Faster data processing with R
  • Hadoop with R
  • Spark with R
  • R with databases
  • Machine learning with R
  • The future of R
  • Big Data
  • Fast data
  • Smart data
  • Where to go next
  • Summary
  • Index