Apache Spark for data science cookbook: over 90 insightful recipes to get lightning-fast analytics with Apache Spark
Over 90 insightful recipes to get lightning-fast analytics with Apache Spark. About This Book: Use Apache Spark for data processing with these hands-on recipes. Implement end-to-end, large-scale data analysis better than ever before. Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas t...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England ; Mumbai, India : Packt Publishing, 2016. |
Edition: | 1st edition |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630274806719 |
Table of Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Big Data Analytics with Spark
- Introduction
- Initializing SparkContext
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Working with Spark's Python and Scala shells
- How to do it…
- How it works…
- There's more…
- See also
- Building standalone applications
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Working with the Spark programming model
- How to do it…
- How it works…
- There's more…
- See also
- Working with pair RDDs
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Persisting RDDs
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Loading and saving data
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Creating broadcast variables and accumulators
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Submitting applications to a cluster
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Working with DataFrames
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Working with Spark Streaming
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Chapter 2: Tricky Statistics with Spark
- Introduction
- Working with Pandas
- Variable identification
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Sampling data
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Summary and descriptive statistics
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Generating frequency tables
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Installing Pandas on Linux
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Installing Pandas from source
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Using IPython with PySpark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Creating Pandas DataFrames over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Splitting, slicing, sorting, filtering, and grouping DataFrames over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing covariance and correlation using Pandas
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Concatenating and merging operations over DataFrames
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Complex operations over DataFrames
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Sparkling Pandas
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Chapter 3: Data Analysis with Spark
- Introduction
- Univariate analysis
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Bivariate analysis
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Missing value treatment
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Outlier detection
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Use case - analyzing the MovieLens dataset
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Use case - analyzing the Uber dataset
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Chapter 4: Clustering, Classification, and Regression
- Introduction
- Supervised learning
- Unsupervised learning
- Applying regression analysis for sales data
- Variable identification
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Data exploration
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Feature engineering
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Applying linear regression
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Applying logistic regression on bank marketing data
- Variable identification
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Data exploration
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Feature engineering
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Applying logistic regression
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Real-time intrusion detection using streaming k-means
- Variable identification
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Simulating real-time data
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Applying streaming k-means
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Chapter 5: Working with Spark MLlib
- Introduction
- Working with Spark ML pipelines
- Implementing Naive Bayes' classification
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing decision trees
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Building a recommendation system
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing logistic regression using Spark ML pipelines
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Chapter 6: NLP with Spark
- Introduction
- Installing NLTK on Linux
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Installing Anaconda on Linux
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Anaconda for cluster management
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- POS tagging with PySpark on an Anaconda cluster
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- NER with IPython over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing OpenNLP - chunker over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing OpenNLP - sentence detector over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing Stanford NLP - lemmatization over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing sentiment analysis using Stanford NLP over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Chapter 7: Working with Sparkling Water - H2O
- Introduction
- Features
- Working with H2O on Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing k-means using H2O over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing spam detection with Sparkling Water
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Deep learning with airlines and weather data
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Implementing a crime detection application
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Running SVM with H2O over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Chapter 8: Data Visualization with Spark
- Introduction
- Visualization using Zeppelin
- Getting ready
- How to do it…
- Installing Zeppelin
- Customizing Zeppelin's server and websocket port
- Visualizing data on HDFS - parameterizing inputs
- Running custom functions
- Adding external dependencies to Zeppelin
- Pointing to an external Spark Cluster
- How to do it…
- How it works…
- There's more…
- See also
- Creating scatter plots with Bokeh-Scala
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Creating a time series MultiPlot with Bokeh-Scala
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Creating plots with the lightning visualization server
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Visualize machine learning models with Databricks notebook
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Chapter 9: Deep Learning on Spark
- Introduction
- Installing CaffeOnSpark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Working with CaffeOnSpark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Running a feed-forward neural network with DeepLearning4j over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Running an RBM with DeepLearning4j over Spark
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Running a CNN for learning MNIST with DeepLearning4j over Spark
- Getting ready
- How to do it…
- How it works…