Agile data science
Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig,...
Main author: | |
---|---|
Other authors: | , |
Format: | E-book |
Language: | English |
Published: | Beijing : O'Reilly Media, 2013. |
Edition: | First edition |
Subjects: | |
View at the Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629368706719 |
Table of Contents:
- Intro
- Copyright
- Table of Contents
- Preface
- Who This Book Is For
- How This Book Is Organized
- Conventions Used in This Book
- Using Code Examples
- Safari® Books Online
- How to Contact Us
- Part I. Setup
- Chapter 1. Theory
- Agile Big Data
- Big Words Defined
- Agile Big Data Teams
- Recognizing the Opportunity and Problem
- Adapting to Change
- Agile Big Data Process
- Code Review and Pair Programming
- Agile Environments: Engineering Productivity
- Collaboration Space
- Private Space
- Personal Space
- Realizing Ideas with Large-Format Printing
- Chapter 2. Data
- Working with Raw Data
- Raw Email
- Structured Versus Semistructured Data
- SQL
- NoSQL
- Serialization
- Extracting and Exposing Features in Evolving Schemas
- Data Pipelines
- Data Perspectives
- Networks
- Time Series
- Natural Language
- Probability
- Conclusion
- Chapter 3. Agile Tools
- Scalability = Simplicity
- Agile Big Data Processing
- Setting Up a Virtual Environment for Python
- Serializing Events with Avro
- Avro for Python
- Collecting Data
- Data Processing with Pig
- Installing Pig
- Publishing Data with MongoDB
- Installing MongoDB
- Installing MongoDB's Java Driver
- Installing mongo-hadoop
- Pushing Data to MongoDB from Pig
- Searching Data with ElasticSearch
- Installation
- ElasticSearch and Pig with Wonderdog
- Reflecting on our Workflow
- Lightweight Web Applications
- Python and Flask
- Presenting Our Data
- Installing Bootstrap
- Booting Bootstrap
- Visualizing Data with D3.js and nvd3.js
- Conclusion
- Chapter 4. To the Cloud!
- Introduction
- GitHub
- dotCloud
- Echo on dotCloud
- Python Workers
- Amazon Web Services
- Simple Storage Service
- Elastic MapReduce
- MongoDB as a Service
- Instrumentation
- Google Analytics
- Mortar Data
- Part II. Climbing the Pyramid
- Chapter 5. Collecting and Displaying Records
- Putting It All Together
- Collect and Serialize Our Inbox
- Process and Publish Our Emails
- Presenting Emails in a Browser
- Serving Emails with Flask and pymongo
- Rendering HTML5 with Jinja2
- Agile Checkpoint
- Listing Emails
- Listing Emails with MongoDB
- Anatomy of a Presentation
- Searching Our Email
- Indexing Our Email with Pig, ElasticSearch, and Wonderdog
- Searching Our Email on the Web
- Conclusion
- Chapter 6. Visualizing Data with Charts
- Good Charts
- Extracting Entities: Email Addresses
- Extracting Emails
- Visualizing Time
- Conclusion
- Chapter 7. Exploring Data with Reports
- Building Reports with Multiple Charts
- Linking Records
- Extracting Keywords from Emails with TF-IDF
- Conclusion
- Chapter 8. Making Predictions
- Predicting Response Rates to Emails
- Personalization
- Conclusion
- Chapter 9. Driving Actions
- Properties of Successful Emails
- Better Predictions with Naive Bayes
- P(Reply | From & To)
- P(Reply | Token)
- Making Predictions in Real Time
- Logging Events
- Conclusion
- Index
- About the Author