Big data analytics with Java: big data analytics - massive, predictive, social and self-driving

Learn the basics of analytics on big data using Java, machine learning, and other big data tools. About This Book: Acquire a real-world set of tools for building enterprise-level data science applications. Surpass the barrier of other languages in data science and learn to create useful object-oriented cod...

Bibliographic Details
Other Authors: Mehta, Rajat (author)
Format: E-book
Language: English
Published: Birmingham, England ; Mumbai, India : Packt Publishing, 2017.
Edition: 1st edition
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630734306719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: Big Data Analytics with Java
  • Why data analytics on big data?
  • Big data for analytics
  • Big data - a bigger pay package for Java developers
  • Basics of Hadoop - a Java sub-project
  • Distributed computing on Hadoop
  • HDFS concepts
  • Design and architecture of HDFS
  • Main components of HDFS
  • HDFS simple commands
  • Apache Spark
  • Concepts
  • Transformations
  • Actions
  • Spark Java API
  • Spark samples using Java 8
  • Loading data
  • Data operations - cleansing and munging
  • Analyzing data - count, projection, grouping, aggregation, and max/min
  • Actions on RDDs
  • Paired RDDs
  • Saving data
  • Collecting and printing results
  • Executing Spark programs on Hadoop
  • Apache Spark sub-projects
  • Spark machine learning modules
  • Mahout - a popular Java ML library
  • Deeplearning4j - a deep learning library
  • Summary
  • Chapter 2: First Steps in Data Analysis
  • Datasets
  • Data cleaning and munging
  • Basic analysis of data with Spark SQL
  • Building SparkConf and context
  • Dataframe and datasets
  • Load and parse data
  • Analyzing data - the Spark-SQL way
  • Spark SQL for data exploration and analytics
  • Market basket analysis - Apriori algorithm
  • Implementation of the Apriori algorithm in Apache Spark
  • Efficient market basket analysis using FP-Growth algorithm
  • Running FP-Growth on Apache Spark
  • Summary
  • Chapter 3: Data Visualization
  • Data visualization with Java JFreeChart
  • Using charts in big data analytics
  • Time series chart
  • All India seasonal and annual average temperature series dataset
  • Simple single Time Series chart
  • Multiple Time Series on a single chart window
  • Bar charts
  • Histograms
  • When would you use a histogram?
  • How to make histograms using JFreeChart?
  • Line charts
  • Scatter plots
  • Box plots
  • Advanced visualization technique
  • Prefuse
  • IVTK Graph toolkit
  • Other libraries
  • Summary
  • Chapter 4: Basics of Machine Learning
  • What is machine learning?
  • Real-life examples of machine learning
  • Type of machine learning
  • A small sample case study of supervised and unsupervised learning
  • Steps for machine learning problems
  • Choosing the machine learning model
  • What are the feature types that can be extracted from the datasets?
  • How do you select the best features to train your models?
  • How do you run machine learning analytics on big data?
  • Getting and preparing data in Hadoop
  • Training and storing models on big data
  • Apache Spark machine learning API
  • Summary
  • Chapter 5: Regression on Big Data
  • Linear regression
  • What is simple linear regression?
  • Where is linear regression used?
  • Logistic regression
  • Which mathematical functions does logistic regression use?
  • Where is logistic regression used?
  • Predicting heart disease using logistic regression
  • Summary
  • Chapter 6: Naive Bayes and Sentiment Analysis
  • Conditional probability
  • Bayes theorem
  • Naïve Bayes algorithm
  • Advantages of Naïve Bayes
  • Disadvantages of Naïve Bayes
  • Sentimental analysis
  • Concepts for sentimental analysis
  • Tokenization
  • Stop words removal
  • Stemming
  • N-grams
  • Term presence and Term Frequency
  • TF-IDF
  • Bag of words
  • Dataset
  • Data exploration of text data
  • Sentimental analysis on this dataset
  • SVM or Support Vector Machine
  • Summary
  • Chapter 7: Decision Trees
  • What is a decision tree?
  • Building a decision tree
  • Choosing the best features for splitting the datasets
  • Dataset
  • Data exploration
  • Cleaning and munging the data
  • Training and testing the model
  • Summary
  • Chapter 8: Ensembling on Big Data
  • Ensembling
  • Types of ensembling
  • Bagging
  • Boosting
  • Advantages and disadvantages of ensembling
  • Random forests
  • Gradient boosted trees (GBTs)
  • Classification problem and dataset used
  • Data exploration
  • Training and testing our random forest model
  • Training and testing our gradient boosted tree model
  • Summary
  • Chapter 9: Recommendation Systems
  • Recommendation systems and their types
  • Content-based recommendation systems
  • Dataset
  • Content-based recommender on MovieLens dataset
  • Collaborative recommendation systems
  • Advantages
  • Disadvantages
  • Alternating least square - collaborative filtering
  • Summary
  • Chapter 10: Clustering and Customer Segmentation on Big Data
  • Clustering
  • Types of clustering
  • Hierarchical clustering
  • K-means clustering
  • Bisecting k-means clustering
  • Customer segmentation
  • Dataset
  • Data exploration
  • Clustering for customer segmentation
  • Changing the clustering algorithm
  • Summary
  • Chapter 11: Massive Graphs on Big Data
  • Refresher on graphs
  • Representing graphs
  • Common terminology on graphs
  • Common algorithms on graphs
  • Plotting graphs
  • Massive graphs on big data
  • Graph analytics
  • GraphFrames
  • Building a graph using GraphFrames
  • Graph analytics on airports and their flights
  • Datasets
  • Graph analytics on flights data
  • Summary
  • Chapter 12: Real-Time Analytics on Big Data
  • Real-time analytics
  • Big data stack for real-time analytics
  • Real-time SQL queries on big data
  • Real-time data ingestion and storage
  • Real-time data processing
  • Real-time SQL queries using Impala
  • Flight delay analysis using Impala
  • Apache Kafka
  • Spark Streaming
  • Trending videos
  • Summary
  • Chapter 13: Deep Learning Using Big Data
  • Introduction to neural networks
  • Perceptron
  • Problems with perceptrons
  • Sigmoid neuron
  • Multi-layer perceptrons
  • Accuracy of multi-layer perceptrons
  • Deep learning
  • Advantages and use cases of deep learning
  • Flower species classification using multi-layer perceptrons
  • Deeplearning4j
  • Handwritten digit recognition using CNN
  • Diving into the code:
  • Summary
  • Index