Apache Spark for data science cookbook: over insightful 90 recipes to get lightning-fast analytics with Apache Spark

Over insightful 90 recipes to get lightning-fast analytics with Apache Spark. About This Book: Use Apache Spark for data processing with these hands-on recipes. Implement end-to-end, large-scale data analysis better than ever before. Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas t...

Bibliographic Details
Other Authors: Chitturi, Padma Priya (author)
Format: E-book
Language: English
Published: Birmingham, England ; Mumbai, India : Packt Publishing, 2016.
Edition: 1st edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630274806719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewer
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: Big Data Analytics with Spark
  • Introduction
  • Initializing SparkContext
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Working with Spark's Python and Scala shells
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Building standalone applications
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Working with the Spark programming model
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Working with pair RDDs
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Persisting RDDs
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Loading and saving data
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Creating broadcast variables and accumulators
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Submitting applications to a cluster
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Working with DataFrames
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Working with Spark Streaming
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Chapter 2: Tricky Statistics with Spark
  • Introduction
  • Working with Pandas
  • Variable identification
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Sampling data
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Summary and descriptive statistics
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Generating frequency tables
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Installing Pandas on Linux
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Installing Pandas from source
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Using IPython with PySpark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Creating Pandas DataFrames over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Splitting, slicing, sorting, filtering, and grouping DataFrames over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing covariance and correlation using Pandas
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Concatenating and merging operations over DataFrames
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Complex operations over DataFrames
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Sparkling Pandas
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Chapter 3: Data Analysis with Spark
  • Introduction
  • Univariate analysis
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Bivariate analysis
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Missing value treatment
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Outlier detection
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Use case - analyzing the MovieLens dataset
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Use case - analyzing the Uber dataset
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Chapter 4: Clustering, Classification, and Regression
  • Introduction
  • Supervised learning
  • Unsupervised learning
  • Applying regression analysis for sales data
  • Variable identification
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Data exploration
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Feature engineering
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Applying linear regression
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Applying logistic regression on bank marketing data
  • Variable identification
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Data exploration
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Feature engineering
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Applying logistic regression
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Real-time intrusion detection using streaming k-means
  • Variable identification
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Simulating real-time data
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Applying streaming k-means
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Chapter 5: Working with Spark MLlib
  • Introduction
  • Working with Spark ML pipelines
  • Implementing Naive Bayes classification
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing decision trees
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Building a recommendation system
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing logistic regression using Spark ML pipelines
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Chapter 6: NLP with Spark
  • Introduction
  • Installing NLTK on Linux
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Installing Anaconda on Linux
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Anaconda for cluster management
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • POS tagging with PySpark on an Anaconda cluster
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • NER with IPython over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing OpenNLP - chunker over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing OpenNLP - sentence detector over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing Stanford NLP - lemmatization over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing sentiment analysis using Stanford NLP over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Chapter 7: Working with Sparkling Water - H2O
  • Introduction
  • Features
  • Working with H2O on Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing k-means using H2O over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing spam detection with Sparkling Water
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Deep learning with airlines and weather data
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Implementing a crime detection application
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Running SVM with H2O over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Chapter 8: Data Visualization with Spark
  • Introduction
  • Visualization using Zeppelin
  • Getting ready
  • How to do it…
  • Installing Zeppelin
  • Customizing Zeppelin's server and websocket port
  • Visualizing data on HDFS - parameterizing inputs
  • Running custom functions
  • Adding external dependencies to Zeppelin
  • Pointing to an external Spark Cluster
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Creating scatter plots with Bokeh-Scala
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Creating a time series MultiPlot with Bokeh-Scala
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Creating plots with the lightning visualization server
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Visualize machine learning models with Databricks notebook
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Chapter 9: Deep Learning on Spark
  • Introduction
  • Installing CaffeOnSpark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Working with CaffeOnSpark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Running a feed-forward neural network with DeepLearning4j over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Running an RBM with DeepLearning4j over Spark
  • Getting ready
  • How to do it…
  • How it works…
  • There's more…
  • See also
  • Running a CNN for learning MNIST with DeepLearning4j over Spark
  • Getting ready
  • How to do it…
  • How it works…