Hadoop

Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers...


Bibliographic Details
Main author: White, Tom
Other authors: Cutting, Doug (contributor)
Format: E-book
Language: English
Published: Sebastopol: O'Reilly Media, 2010.
Edition: 2nd ed.
Series: O'Reilly short cuts.
View in the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009628174506719
Table of Contents:
  • Table of Contents; Foreword; Preface; Administrative Notes; What's in This Book?; What's New in the Second Edition?; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments; Chapter 1. Meet Hadoop; Data!; Data Storage and Analysis; Comparison with Other Systems; RDBMS; Grid Computing; Volunteer Computing; A Brief History of Hadoop; Apache Hadoop and the Hadoop Ecosystem; Chapter 2. MapReduce; A Weather Dataset; Data Format; Analyzing the Data with Unix Tools; Analyzing the Data with Hadoop; Map and Reduce; Java MapReduce; A test run
  • The new Java MapReduce API; Scaling Out; Data Flow; Combiner Functions; Specifying a combiner function; Running a Distributed MapReduce Job; Hadoop Streaming; Ruby; Python; Hadoop Pipes; Compiling and Running; Chapter 3. The Hadoop Distributed Filesystem; The Design of HDFS; HDFS Concepts; Blocks; Namenodes and Datanodes; The Command-Line Interface; Basic Filesystem Operations; Hadoop Filesystems; Interfaces; Thrift; C; FUSE; WebDAV; Other HDFS Interfaces; The Java Interface; Reading Data from a Hadoop URL; Reading Data Using the FileSystem API; FSDataInputStream; Writing Data
  • FSDataOutputStream; Directories; Querying the Filesystem; File metadata: FileStatus; Listing files; File patterns; PathFilter; Deleting Data; Data Flow; Anatomy of a File Read; Anatomy of a File Write; Coherency Model; Consequences for application design; Parallel Copying with distcp; Keeping an HDFS Cluster Balanced; Hadoop Archives; Using Hadoop Archives; Limitations; Chapter 4. Hadoop I/O; Data Integrity; Data Integrity in HDFS; LocalFileSystem; ChecksumFileSystem; Compression; Codecs; Compressing and decompressing streams with CompressionCodec
  • Inferring CompressionCodecs using CompressionCodecFactory; Native libraries; CodecPool; Compression and Input Splits; Using Compression in MapReduce; Compressing map output; Serialization; The Writable Interface; WritableComparable and comparators; Writable Classes; Writable wrappers for Java primitives; Text; Indexing; Unicode; Iteration; BytesWritable; Mutability; Resorting to String; NullWritable; ObjectWritable and GenericWritable; Writable collections; Implementing a Custom Writable; Implementing a RawComparator for speed; Custom comparators; Serialization Frameworks; Serialization IDL
  • Avro; Avro data types and schemas; In-memory serialization and deserialization; Avro data files; Interoperability; Python API; C API; Schema resolution; Sort order; Avro MapReduce; File-Based Data Structures; SequenceFile; Writing a SequenceFile; Reading a SequenceFile; Displaying a SequenceFile with the command-line interface; Sorting and merging SequenceFiles; The SequenceFile format; MapFile; Writing a MapFile; Reading a MapFile; Converting a SequenceFile to a MapFile; Chapter 5. Developing a MapReduce Application; The Configuration API; Combining Resources; Variable Expansion
  • Configuring the Development Environment