Hadoop 2.x administration cookbook: administer and maintain large Apache Hadoop clusters

Over 100 practical recipes to help you become an expert Hadoop administrator. About This Book: Become an expert Hadoop administrator and perform tasks to optimize your Hadoop cluster. Import and export data into Hive and use Oozie to manage workflow. Practical recipes will help you plan and secure your...


Bibliographic Details
Other Authors: Singh, Gurmukh (author)
Format: E-book
Language: English
Published: Birmingham, England; Mumbai, [India]: Packt Publishing, 2017.
Edition: 1st edition
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630198506719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: Hadoop Architecture and Deployment
  • Introduction
  • Building and compiling Hadoop
  • Installation methods
  • Setting up host resolution
  • Installing a single-node cluster - HDFS components
  • Installing a single-node cluster - YARN components
  • Installing a multi-node cluster
  • Configuring the Hadoop Gateway node
  • Decommissioning nodes
  • Adding nodes to the cluster
  • Chapter 2: Maintaining Hadoop Cluster HDFS
  • Introduction
  • Configuring HDFS block size
  • Setting up Namenode metadata location
  • Loading data in HDFS
  • Configuring HDFS replication
  • HDFS balancer
  • Quota configuration
  • HDFS health and FSCK
  • Configuring rack awareness
  • Recycle or trash bin configuration
  • Distcp usage
  • Control block report storm
  • Configuring Datanode heartbeat
  • Chapter 3: Maintaining Hadoop Cluster - YARN and MapReduce
  • Introduction
  • Running a simple MapReduce program
  • Hadoop streaming
  • Configuring YARN history server
  • Job history web interface and metrics
  • Configuring ResourceManager components
  • YARN containers and resource allocations
  • ResourceManager Web UI and JMX metrics
  • Preserving ResourceManager states
  • Chapter 4: High Availability
  • Introduction
  • Namenode HA using shared storage
  • ZooKeeper configuration
  • Namenode HA using Journal node
  • Resourcemanager HA using ZooKeeper
  • Rolling upgrade with HA
  • Configure shared cache manager
  • Configure HDFS cache
  • HDFS snapshots
  • Configuring storage based policies
  • Configuring HA for Edge nodes
  • Chapter 5: Schedulers
  • Introduction
  • Configuring users and groups
  • Fair Scheduler configuration
  • Fair Scheduler pools
  • Configuring job queues
  • Job queue ACLs
  • Configuring Capacity Scheduler
  • Queuing mappings in Capacity Scheduler
  • YARN and Mapred commands
  • YARN label-based scheduling
  • YARN SLS
  • Chapter 6: Backup and Recovery
  • Introduction
  • Initiating Namenode saveNamespace
  • Using HDFS Image Viewer
  • Fetching parameters which are in-effect
  • Configuring HDFS and YARN logs
  • Backing up and recovering Namenode
  • Configuring Secondary Namenode
  • Promoting Secondary Namenode to Primary
  • Namenode recovery
  • Namenode roll edits - online mode
  • Namenode roll edits - offline mode
  • Datanode recovery - disk full
  • Configuring NFS gateway to serve HDFS
  • Recovering deleted files
  • Chapter 7: Data Ingestion and Workflow
  • Introduction
  • Hive server modes and setup
  • Using MySQL for Hive metastore
  • Operating Hive with ZooKeeper
  • Loading data into Hive
  • Partitioning and Bucketing in Hive
  • Hive metastore database
  • Designing Hive with credential store
  • Configuring Flume
  • Configure Oozie and workflows
  • Chapter 8: Performance Tuning
  • Tuning the operating system
  • Tuning the disk
  • Tuning the network
  • Tuning HDFS
  • Tuning Namenode
  • Tuning Datanode
  • Configuring YARN for performance
  • Configuring MapReduce for performance
  • Hive performance tuning
  • Benchmarking Hadoop cluster
  • Chapter 9: HBase Administration
  • Introduction
  • Setting up single node HBase cluster
  • Setting up multi-node HBase cluster
  • Inserting data into HBase
  • Integration with Hive
  • HBase administration commands
  • HBase backup and restore
  • Tuning HBase
  • HBase upgrade
  • Migrating data from MySQL to HBase using Sqoop
  • Chapter 10: Cluster Planning
  • Introduction
  • Disk space calculations
  • Nodes needed in the cluster
  • Memory requirements
  • Sizing the cluster as per SLA
  • Network design
  • Estimating the cost of the Hadoop cluster
  • Hardware and software options
  • Chapter 11: Troubleshooting, Diagnostics, and Best Practices
  • Introduction
  • Namenode troubleshooting
  • Datanode troubleshooting
  • Resourcemanager troubleshooting
  • Diagnose communication issues
  • Parse logs for errors
  • Hive troubleshooting
  • HBase troubleshooting
  • Hadoop best practices
  • Chapter 12: Security
  • Introduction
  • Encrypting disk using LUKS
  • Configuring Hadoop users
  • HDFS encryption at Rest
  • Configuring SSL in Hadoop
  • In-transit encryption
  • Enabling service level authorization
  • Securing ZooKeeper
  • Configuring auditing
  • Configuring Kerberos server
  • Configuring and enabling Kerberos for Hadoop
  • Index