Java data analysis: data mining, big data analysis, NoSQL, and data visualization

Get the most out of the popular Java libraries and tools to perform efficient data analysis. About This Book: Get your basics right for data analysis with Java and make sense of your data through effective visualizations. Use various Java APIs and tools such as RapidMiner and WEKA for effective data a...


Bibliographic Details
Other Authors: Hubbard, John R., author
Format: E-book
Language: English
Published: Birmingham, England ; Mumbai, [India] : Packt Publishing, 2017.
Edition: 1st edition
Subjects:
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630710106719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: Introduction to Data Analysis
  • Origins of data analysis
  • The scientific method
  • Actuarial science
  • Calculated by steam
  • A spectacular example
  • Herman Hollerith
  • ENIAC
  • VisiCalc
  • Data, information, and knowledge
  • Why Java?
  • Java Integrated Development Environments
  • Summary
  • Chapter 2: Data Preprocessing
  • Data types
  • Variables
  • Data points and datasets
  • Null values
  • Relational database tables
  • Key fields
  • Key-value pairs
  • Hash tables
  • File formats
  • Microsoft Excel data
  • XML and JSON data
  • Generating test datasets
  • Metadata
  • Data cleaning
  • Data scaling
  • Data filtering
  • Sorting
  • Merging
  • Hashing
  • Summary
  • Chapter 3: Data Visualization
  • Tables and graphs
  • Scatter plots
  • Line graphs
  • Bar charts
  • Histograms
  • Time series
  • Java implementation
  • Moving average
  • Data ranking
  • Frequency distributions
  • The normal distribution
  • A thought experiment
  • The exponential distribution
  • Java example
  • Summary
  • Chapter 4: Statistics
  • Descriptive statistics
  • Random sampling
  • Random variables
  • Probability distributions
  • Cumulative distributions
  • The binomial distribution
  • Multivariate distributions
  • Conditional probability
  • The independence of probabilistic events
  • Contingency tables
  • Bayes' theorem
  • Covariance and correlation
  • The standard normal distribution
  • The central limit theorem
  • Confidence intervals
  • Hypothesis testing
  • Summary
  • Chapter 5: Relational Databases
  • The relation data model
  • Relational databases
  • Foreign keys
  • Relational database design
  • Creating a database
  • SQL commands
  • Inserting data into the database
  • Database queries
  • SQL data types
  • JDBC
  • Using a JDBC PreparedStatement
  • Batch processing
  • Database views
  • Subqueries
  • Table indexes
  • Summary
  • Chapter 6: Regression Analysis
  • Linear regression
  • Linear regression in Excel
  • Computing the regression coefficients
  • Variation statistics
  • Java implementation of linear regression
  • Anscombe's quartet
  • Polynomial regression
  • Multiple linear regression
  • The Apache Commons implementation
  • Curve fitting
  • Summary
  • Chapter 7: Classification Analysis
  • Decision trees
  • What does entropy have to do with it?
  • The ID3 algorithm
  • Java implementation of the ID3 algorithm
  • The Weka platform
  • The ARFF filetype for data
  • Java implementation with Weka
  • Bayesian classifiers
  • Java implementation with Weka
  • Support vector machine algorithms
  • Logistic regression
  • K-Nearest Neighbors
  • Fuzzy classification algorithms
  • Summary
  • Chapter 8: Cluster Analysis
  • Measuring distances
  • The curse of dimensionality
  • Hierarchical clustering
  • Weka implementation
  • K-means clustering
  • K-medoids clustering
  • Affinity propagation clustering
  • Summary
  • Chapter 9: Recommender Systems
  • Utility matrices
  • Similarity measures
  • Cosine similarity
  • A simple recommender system
  • Amazon's item-to-item collaborative filtering recommender
  • Implementing user ratings
  • Large sparse matrices
  • Using random access files
  • The Netflix prize
  • Summary
  • Chapter 10: NoSQL Databases
  • The Map data structure
  • SQL versus NoSQL
  • The Mongo database system
  • The Library database
  • Java development with MongoDB
  • The MongoDB extension for geospatial databases
  • Indexing in MongoDB
  • Why NoSQL and why MongoDB?
  • Other NoSQL database systems
  • Summary
  • Chapter 11: Big Data Analysis with Java
  • Scaling, data striping, and sharding
  • Google's PageRank algorithm
  • Google's MapReduce framework
  • Some examples of MapReduce applications
  • The WordCount example
  • Scalability
  • Matrix multiplication with MapReduce
  • MapReduce in MongoDB
  • Apache Hadoop
  • Hadoop MapReduce
  • Summary
  • Appendix: Java Tools
  • The command line
  • Java
  • NetBeans
  • MySQL
  • MySQL Workbench
  • Accessing the MySQL database from NetBeans
  • The Apache Commons Math Library
  • The javax JSON Library
  • The Weka libraries
  • MongoDB
  • Index