Big data analytics with Java big data analytics - massive, predictive, social and self-driving
Learn the basics of analytics on big data using Java, machine learning and other big data tools. About This Book: Acquire a real-world set of tools for building enterprise-level data science applications. Surpass the barrier of other languages in data science and learn to create useful object-oriented cod...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England ; Mumbai, India : Packt Publishing, 2017. |
Edition: | 1st edition |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630734306719 |
Table of Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Big Data Analytics with Java
- Why data analytics on big data?
- Big data for analytics
- Big data - a bigger pay package for Java developers
- Basics of Hadoop - a Java sub-project
- Distributed computing on Hadoop
- HDFS concepts
- Design and architecture of HDFS
- Main components of HDFS
- HDFS simple commands
- Apache Spark
- Concepts
- Transformations
- Actions
- Spark Java API
- Spark samples using Java 8
- Loading data
- Data operations - cleansing and munging
- Analyzing data - count, projection, grouping, aggregation, and max/min
- Actions on RDDs
- Paired RDDs
- Saving data
- Collecting and printing results
- Executing Spark programs on Hadoop
- Apache Spark sub-projects
- Spark machine learning modules
- Mahout - a popular Java ML library
- Deeplearning4j - a deep learning library
- Summary
- Chapter 2: First Steps in Data Analysis
- Datasets
- Data cleaning and munging
- Basic analysis of data with Spark SQL
- Building SparkConf and context
- Dataframe and datasets
- Load and parse data
- Analyzing data - the Spark-SQL way
- Spark SQL for data exploration and analytics
- Market basket analysis - Apriori algorithm
- Implementation of the Apriori algorithm in Apache Spark
- Efficient market basket analysis using FP-Growth algorithm
- Running FP-Growth on Apache Spark
- Summary
- Chapter 3: Data Visualization
- Data visualization with Java JFreeChart
- Using charts in big data analytics
- Time series chart
- All India seasonal and annual average temperature series dataset
- Simple single Time Series chart
- Multiple Time Series on a single chart window
- Bar charts
- Histograms
- When would you use a histogram?
- How to make histograms using JFreeChart?
- Line charts
- Scatter plots
- Box plots
- Advanced visualization technique
- Prefuse
- IVTK Graph toolkit
- Other libraries
- Summary
- Chapter 4: Basics of Machine Learning
- What is machine learning?
- Real-life examples of machine learning
- Types of machine learning
- A small sample case study of supervised and unsupervised learning
- Steps for machine learning problems
- Choosing the machine learning model
- What are the feature types that can be extracted from the datasets?
- How do you select the best features to train your models?
- How do you run machine learning analytics on big data?
- Getting and preparing data in Hadoop
- Training and storing models on big data
- Apache Spark machine learning API
- Summary
- Chapter 5: Regression on Big Data
- Linear regression
- What is simple linear regression?
- Where is linear regression used?
- Logistic regression
- Which mathematical functions does logistic regression use?
- Where is logistic regression used?
- Predicting heart disease using logistic regression
- Summary
- Chapter 6: Naive Bayes and Sentiment Analysis
- Conditional probability
- Bayes theorem
- Naïve Bayes algorithm
- Advantages of naïve Bayes
- Disadvantages of naïve Bayes
- Sentiment analysis
- Concepts for sentiment analysis
- Tokenization
- Stop words removal
- Stemming
- N-grams
- Term presence and Term Frequency
- TF-IDF
- Bag of words
- Dataset
- Data exploration of text data
- Sentiment analysis on this dataset
- SVM or Support Vector Machine
- Summary
- Chapter 7: Decision Trees
- What is a decision tree?
- Building a decision tree
- Choosing the best features for splitting the datasets
- Dataset
- Data exploration
- Cleaning and munging the data
- Training and testing the model
- Summary
- Chapter 8: Ensembling on Big Data
- Ensembling
- Types of ensembling
- Bagging
- Boosting
- Advantages and disadvantages of ensembling
- Random forests
- Gradient boosted trees (GBTs)
- Classification problem and dataset used
- Data exploration
- Training and testing our random forest model
- Training and testing our gradient boosted tree model
- Summary
- Chapter 9: Recommendation Systems
- Recommendation systems and their types
- Content-based recommendation systems
- Dataset
- Content-based recommender on MovieLens dataset
- Collaborative recommendation systems
- Advantages
- Disadvantages
- Alternating least squares - collaborative filtering
- Summary
- Chapter 10: Clustering and Customer Segmentation on Big Data
- Clustering
- Types of clustering
- Hierarchical clustering
- K-means clustering
- Bisecting k-means clustering
- Customer segmentation
- Dataset
- Data exploration
- Clustering for customer segmentation
- Changing the clustering algorithm
- Summary
- Chapter 11: Massive Graphs on Big Data
- Refresher on graphs
- Representing graphs
- Common terminology on graphs
- Common algorithms on graphs
- Plotting graphs
- Massive graphs on big data
- Graph analytics
- GraphFrames
- Building a graph using GraphFrames
- Graph analytics on airports and their flights
- Datasets
- Graph analytics on flights data
- Summary
- Chapter 12: Real-Time Analytics on Big Data
- Real-time analytics
- Big data stack for real-time analytics
- Real-time SQL queries on big data
- Real-time data ingestion and storage
- Real-time data processing
- Real-time SQL queries using Impala
- Flight delay analysis using Impala
- Apache Kafka
- Spark Streaming
- Trending videos
- Summary
- Chapter 13: Deep Learning Using Big Data
- Introduction to neural networks
- Perceptron
- Problems with perceptrons
- Sigmoid neuron
- Multi-layer perceptrons
- Accuracy of multi-layer perceptrons
- Deep learning
- Advantages and use cases of deep learning
- Flower species classification using multi-layer perceptrons
- Deeplearning4j
- Handwritten digit recognition using CNN
- Diving into the code
- Summary
- Index