Big data for chimps
Finding patterns in massive event streams can be difficult, but learning how to find them doesn’t have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop. You’ll gain a p...
Other Authors: | , |
---|---|
Format: | eBook |
Language: | English |
Published: | Sebastopol, CA : O'Reilly, 2015. |
Edition: | First edition |
See on Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629798406719 |
Table of Contents:
- Copyright; Table of Contents; Preface; What This Book Covers; Who This Book Is For; Who This Book Is Not For; What This Book Does Not Cover; Theory: Chimpanzee and Elephant; Practice: Hadoop; Example Code; A Note on Python and MrJob; Helpful Reading; Feedback; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Part I. Introduction: Theory and Tools; Chapter 1. Hadoop Basics; Chimpanzee and Elephant Start a Business; Map-Only Jobs: Process Records Individually; Pig Latin Map-Only Job; Setting Up a Docker Hadoop Cluster; Run the Job; Wrapping Up
- Chapter 2. MapReduce; Chimpanzee and Elephant Save Christmas; Trouble in Toyland; Chimpanzees Process Letters into Labeled Toy Forms; Pygmy Elephants Carry Each Toy Form to the Appropriate Workbench; Example: Reindeer Games; UFO Data; Group the UFO Sightings by Reporting Delay; Mapper; Reducer; Plot the Data; Reindeer Conclusion; Hadoop Versus Traditional Databases; The MapReduce Haiku; Map Phase, in Light Detail; Group-Sort Phase, in Light Detail; Reduce Phase, in Light Detail; Wrapping Up; Chapter 3. A Quick Look into Baseball; The Data; Acronyms and Terminology; The Rules and Goals
- Performance Metrics; Wrapping Up; Chapter 4. Introduction to Pig; Pig Helps Hadoop Work with Tables, Not Records; Wikipedia Visitor Counts; Fundamental Data Operations; Control Operations; Pipelinable Operations; Structural Operations; LOAD Locates and Describes Your Data; Simple Types; Complex Type 1, Tuples: Fixed-Length Sequence of Typed Fields; Complex Type 2, Bags: Unbounded Collection of Tuples; Defining the Schema of a Transformed Record; STORE Writes Data to Disk; Development Aid Commands; DESCRIBE; DUMP; SAMPLE; ILLUSTRATE; EXPLAIN; Pig Functions; Piggybank; Apache DataFu; Wrapping Up
- Part II. Tactics: Analytic Patterns; Chapter 5. Map-Only Operations; Pattern in Use; Eliminating Data; Selecting Records That Satisfy a Condition: FILTER and Friends; Selecting Records That Satisfy Multiple Conditions; Selecting or Rejecting Records with a null Value; Selecting Records That Match a Regular Expression (MATCHES); Matching Records Against a Fixed List of Lookup Values; Project Only Chosen Columns by Name; Using a FOREACH to Select, Rename, and Reorder fields; Extracting a Random Sample of Records; Extracting a Consistent Sample of Records by Key
- Sampling Carelessly by Only Loading Some part- Files; Selecting a Fixed Number of Records with LIMIT; Other Data Elimination Patterns; Transforming Records; Transforming Records Individually Using FOREACH; A Nested FOREACH Allows Intermediate Expressions; Formatting a String According to a Template; Assembling Literals with Complex Types; Manipulating the Type of a Field; Ints and Floats and Rounding, Oh My!; Calling a User-Defined Function from an External Package; Operations That Break One Table into Many; Directing Data Conditionally into Multiple Dataflows (SPLIT)
- Operations That Treat the Union of Several Tables as One