Building Python real-time applications with Storm: learn to process massive real-time data streams using Storm and Python, no Java required

Learn to process massive real-time data streams using Storm and Python - no Java required! About This Book: Learn to use Apache Storm and the Python Petrel library to build distributed applications that process large streams of data. Explore sample applications in real-time and analyze them in the pop...

Bibliographic Details
Other Authors: Bhatnagar, Kartik (author); Hart, Barry (author)
Format: E-book
Language: English
Published: Birmingham: Packt Publishing, 2015.
Edition: 1st edition
Series: Community experience distilled.
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629674606719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Authors
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Getting Acquainted with Storm
  • Overview of Storm
  • Before the Storm era
  • Key features of Storm
  • Storm cluster modes
  • Developer mode
  • Single-machine Storm cluster
  • Multimachine Storm cluster
  • The Storm client
  • Prerequisites for a Storm installation
  • Zookeeper installation
  • Storm installation
  • Enabling native (Netty only) dependency
  • Netty configuration
  • Starting daemons
  • Playing with optional configurations
  • Summary
  • Chapter 2: The Storm Anatomy
  • Storm processes
  • Supervisor
  • Zookeeper
  • The Storm UI
  • Storm-topology-specific terminologies
  • The worker process, executor, and task
  • Worker processes
  • Executors
  • Tasks
  • Interprocess communication
  • A physical view of a Storm cluster
  • Stream grouping
  • Fault tolerance in Storm
  • Guaranteed tuple processing in Storm
  • XOR magic in acking
  • Tuning parallelism in Storm - scaling a distributed computation
  • Summary
  • Chapter 3: Introducing Petrel
  • What is Petrel?
  • Building a topology
  • Packaging a topology
  • Logging events and errors
  • Managing third-party dependencies
  • Installing Petrel
  • Creating your first topology
  • Sentence spout
  • Splitter bolt
  • Word counting bolt
  • Defining a topology
  • Running the topology
  • Troubleshooting
  • Productivity tips with Petrel
  • Improving startup performance
  • Enabling and using logging
  • Automatic logging of fatal errors
  • Summary
  • Chapter 4: Example Topology - Twitter
  • Twitter analysis
  • Twitter's Streaming API
  • Creating a Twitter app to use the Streaming API
  • The topology configuration file
  • The Twitter stream spout
  • Splitter bolt
  • Rolling word count bolt
  • The intermediate rankings bolt
  • The total rankings bolt
  • Defining the topology
  • Running the topology
  • Summary
  • Chapter 5: Persistence Using Redis and MongoDB
  • Finding the top n ranked topics using Redis
  • The topology configuration file - the Redis case
  • Rolling word count bolt - the Redis case
  • Total rankings bolt - the Redis case
  • Defining the topology - the Redis case
  • Running the topology - the Redis case
  • Finding the hourly count of tweets by city name using MongoDB
  • Defining the topology - the MongoDB case
  • Running the topology - the MongoDB case
  • Summary
  • Chapter 6: Petrel in Practice
  • Testing a bolt
  • Example - testing SplitSentenceBolt
  • Example - testing SplitSentenceBolt with WordCountBolt
  • Debugging
  • Installing Winpdb
  • Add Winpdb breakpoint
  • Launching and attaching the debugger
  • Profiling your topology's performance
  • Split sentence bolt log
  • Word count bolt log
  • Summary
  • Appendix: Managing Storm Using Supervisord
  • Storm administration over a cluster
  • Introducing supervisord
  • Supervisord components
  • Supervisord installation
  • Summary
  • Index