Hadoop: The Definitive Guide
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build re...
Main author: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Sebastopol, California : O'Reilly Media, Inc., 2009. |
Edition: | First edition |
Subjects: | |
View in Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009627554206719 |
Table of Contents:
- Table of Contents; Foreword; Preface; Administrative Notes; What's in This Book?; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments; Chapter 1. Meet Hadoop; Data!; Data Storage and Analysis; Comparison with Other Systems; RDBMS; Grid Computing; Volunteer Computing; A Brief History of Hadoop; The Apache Hadoop Project; Chapter 2. MapReduce; A Weather Dataset; Data Format; Analyzing the Data with Unix Tools; Analyzing the Data with Hadoop; Map and Reduce; Java MapReduce; A test run; The new Java MapReduce API; Scaling Out; Data Flow
- Combiner Functions; Specifying a combiner function; Running a Distributed MapReduce Job; Hadoop Streaming; Ruby; Python; Hadoop Pipes; Compiling and Running; Chapter 3. The Hadoop Distributed Filesystem; The Design of HDFS; HDFS Concepts; Blocks; Namenodes and Datanodes; The Command-Line Interface; Basic Filesystem Operations; Hadoop Filesystems; Interfaces; Thrift; C; FUSE; WebDAV; Other HDFS Interfaces; The Java Interface; Reading Data from a Hadoop URL; Reading Data Using the FileSystem API; FSDataInputStream; Writing Data; FSDataOutputStream; Directories; Querying the Filesystem
- File metadata: FileStatus; Listing files; File patterns; PathFilter; Deleting Data; Data Flow; Anatomy of a File Read; Anatomy of a File Write; Coherency Model; Consequences for application design; Parallel Copying with distcp; Keeping an HDFS Cluster Balanced; Hadoop Archives; Using Hadoop Archives; Limitations; Chapter 4. Hadoop I/O; Data Integrity; Data Integrity in HDFS; LocalFileSystem; ChecksumFileSystem; Compression; Codecs; Compressing and decompressing streams with CompressionCodec; Inferring CompressionCodecs using CompressionCodecFactory; Native libraries
- Compression and Input Splits; Using Compression in MapReduce; Compressing map output; Serialization; The Writable Interface; WritableComparable and comparators; Writable Classes; Writable wrappers for Java primitives; Text; BytesWritable; NullWritable; ObjectWritable and GenericWritable; Writable collections; Implementing a Custom Writable; Implementing a RawComparator for speed; Custom comparators; Serialization Frameworks; Serialization IDL; File-Based Data Structures; SequenceFile; Writing a SequenceFile; Reading a SequenceFile; Displaying a SequenceFile with the command-line interface
- Sorting and merging SequenceFiles; The SequenceFile Format; MapFile; Writing a MapFile; Reading a MapFile; Converting a SequenceFile to a MapFile; Chapter 5. Developing a MapReduce Application; The Configuration API; Combining Resources; Variable Expansion; Configuring the Development Environment; Managing Configuration; GenericOptionsParser, Tool, and ToolRunner; Writing a Unit Test; Mapper; Reducer; Running Locally on Test Data; Running a Job in a Local Job Runner; Fixing the mapper; Testing the Driver; Running on a Cluster; Packaging; Launching a Job; The MapReduce Web UI
- The jobtracker page