Hadoop Real-World Solutions Cookbook: over 90 hands-on recipes to help you learn and master the intricacies of Apache Hadoop 2.X, YARN, Hive, Pig, Oozie, Flume, Sqoop, Apache Spark, and Mahout
About This Book: Implement outstanding Machine Learning use cases on your own analytics models and processes. Solutions to common problems when wo...
| | |
|---|---|
| Other authors: | |
| Format: | Electronic book |
| Language: | English |
| Published: | Birmingham : Packt Publishing, [2016] |
| Edition: | 2nd ed. |
| Series: | Quick answers to common problems |
| Subjects: | |
| View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630156406719 |
Table of Contents:
- Cover; Copyright; Credits; About the Author; Acknowledgements; About the Reviewer; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Getting Started with Hadoop 2.X; Chapter 2: Exploring HDFS; Chapter 3: Mastering Map Reduce Programs; Chapter 4: Data Analysis Using Hive, Pig, and HBase; Chapter 5: Advanced Data Analysis Using Hive; Chapter 6: Data Import/Export Using Sqoop and Flume; Chapter 7: Automation of Hadoop Tasks Using Oozie; Chapter 8: Machine Learning and Predictive Analytics Using Mahout and R; Chapter 9: Integration with Apache Spark; Chapter 10: Hadoop Use Cases; Index
- Introduction; Installing a Single Node Hadoop Cluster; Installing a multi-node Hadoop cluster; Adding new nodes to existing Hadoop clusters; Executing the balancer command for uniform data distribution; Entering and exiting from the safe mode in a Hadoop cluster; Decommissioning DataNodes; Performing benchmarking on a Hadoop cluster; Introduction; Loading data from a local machine to HDFS; Exporting data from HDFS to a local machine; Changing the replication factor of an existing file in HDFS; Setting the HDFS block size for all the files in a cluster
- Setting the HDFS block size for a specific file in a cluster; Enabling transparent encryption for HDFS; Importing data from another Hadoop cluster; Recycling deleted data from trash to HDFS; Saving compressed data in HDFS; Introduction; Writing the Map Reduce program in Java to analyze web log data; Executing the Map Reduce program in a Hadoop cluster; Adding support for a new writable data type in Hadoop; Implementing a user-defined counter in a Map Reduce program; Map Reduce program to find the top X; Map Reduce program to find distinct values
- Map Reduce program to partition data using a custom partitioner; Writing Map Reduce results to multiple output files; Performing Reduce side Joins using Map Reduce; Unit testing the Map Reduce code using MRUnit; Introduction; Storing and processing Hive data in a sequential file format; Storing and processing Hive data in the ORC file format; Storing and processing Hive data in the Parquet file format; Performing FILTER By queries in Pig; Performing Group By queries in Pig; Performing Order By queries in Pig; Performing JOINS in Pig
- Writing a user-defined function in Pig; Analyzing web log data using Pig; Performing HBase operations in the CLI; Performing HBase operations in Java; Executing a MapReduce program with an HBase table; Introduction; Processing JSON data using Hive JSON SerDe; Processing XML data using Hive XML SerDe; Processing Hive data in Avro format; Writing user-defined functions in Hive; Performing table joins in Hive; Executing map side joins in Hive; Performing context Ngram in Hive; Call Data Record Analytics using Hive; Twitter sentiment analysis using Hive
- Implementing Change Data Capture using Hive