Agile data science

Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig,...


Bibliographic Details
Main author: Jurney, Russell
Other authors: Loukides, Michael Kosta (editor); Treseler, Mary (editor)
Format: E-book
Language: English
Published: Beijing: O'Reilly Media, 2013.
Edition: First edition
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629368706719
Table of Contents:
  • Intro
  • Copyright
  • Table of Contents
  • Preface
  • Who This Book Is For
  • How This Book Is Organized
  • Conventions Used in This Book
  • Using Code Examples
  • Safari® Books Online
  • How to Contact Us
  • Part I. Setup
  • Chapter 1. Theory
  • Agile Big Data
  • Big Words Defined
  • Agile Big Data Teams
  • Recognizing the Opportunity and Problem
  • Adapting to Change
  • Agile Big Data Process
  • Code Review and Pair Programming
  • Agile Environments: Engineering Productivity
  • Collaboration Space
  • Private Space
  • Personal Space
  • Realizing Ideas with Large-Format Printing
  • Chapter 2. Data
  • Email
  • Working with Raw Data
  • Raw Email
  • Structured Versus Semistructured Data
  • SQL
  • NoSQL
  • Serialization
  • Extracting and Exposing Features in Evolving Schemas
  • Data Pipelines
  • Data Perspectives
  • Networks
  • Time Series
  • Natural Language
  • Probability
  • Conclusion
  • Chapter 3. Agile Tools
  • Scalability = Simplicity
  • Agile Big Data Processing
  • Setting Up a Virtual Environment for Python
  • Serializing Events with Avro
  • Avro for Python
  • Collecting Data
  • Data Processing with Pig
  • Installing Pig
  • Publishing Data with MongoDB
  • Installing MongoDB
  • Installing MongoDB's Java Driver
  • Installing mongo-hadoop
  • Pushing Data to MongoDB from Pig
  • Searching Data with ElasticSearch
  • Installation
  • ElasticSearch and Pig with Wonderdog
  • Reflecting on our Workflow
  • Lightweight Web Applications
  • Python and Flask
  • Presenting Our Data
  • Installing Bootstrap
  • Booting Bootstrap
  • Visualizing Data with D3.js and nvd3.js
  • Conclusion
  • Chapter 4. To the Cloud!
  • Introduction
  • GitHub
  • dotCloud
  • Echo on dotCloud
  • Python Workers
  • Amazon Web Services
  • Simple Storage Service
  • Elastic MapReduce
  • MongoDB as a Service
  • Instrumentation
  • Google Analytics
  • Mortar Data
  • Part II. Climbing the Pyramid
  • Chapter 5. Collecting and Displaying Records
  • Putting It All Together
  • Collect and Serialize Our Inbox
  • Process and Publish Our Emails
  • Presenting Emails in a Browser
  • Serving Emails with Flask and pymongo
  • Rendering HTML5 with Jinja2
  • Agile Checkpoint
  • Listing Emails
  • Listing Emails with MongoDB
  • Anatomy of a Presentation
  • Searching Our Email
  • Indexing Our Email with Pig, ElasticSearch, and Wonderdog
  • Searching Our Email on the Web
  • Conclusion
  • Chapter 6. Visualizing Data with Charts
  • Good Charts
  • Extracting Entities: Email Addresses
  • Extracting Emails
  • Visualizing Time
  • Conclusion
  • Chapter 7. Exploring Data with Reports
  • Building Reports with Multiple Charts
  • Linking Records
  • Extracting Keywords from Emails with TF-IDF
  • Conclusion
  • Chapter 8. Making Predictions
  • Predicting Response Rates to Emails
  • Personalization
  • Conclusion
  • Chapter 9. Driving Actions
  • Properties of Successful Emails
  • Better Predictions with Naive Bayes
  • P(Reply | From & To)
  • P(Reply | Token)
  • Making Predictions in Real Time
  • Logging Events
  • Conclusion
  • Index
  • About the Author