Big data architect's handbook: a guide to building proficiency in tools and systems used by leading big data experts

A comprehensive end-to-end guide that gives hands-on practice in big data and artificial intelligence. About This Book: Learn to build and run a big data application with sample code. Explore examples to implement activities that a big data architect performs. Use Machine Learning and AI for structured...

Bibliographic Details
Other Authors: Fahad Akhtar, Syed Muhammad (author)
Format: E-book
Language: English
Published: Birmingham, England: Packt Publishing, 2018.
Edition: 1st edition
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630633306719
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credits
  • Packt Upsell
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Why Big Data?
  • What is big data?
  • Characteristics of big data
  • Volume
  • Velocity
  • Variety
  • Veracity
  • Variability
  • Value
  • Solution-based approach for data
  • Data - the most valuable asset
  • Traditional approaches to data storage
  • Clustered computing
  • High availability
  • Resource pooling
  • Easy scalability
  • Big data - how does it make a difference?
  • Big data solutions - cloud versus on-premises infrastructure
  • Cost
  • Security
  • Current capabilities
  • Scalability
  • Big data glossary
  • Big data
  • Batch processing
  • Cluster computing
  • Data warehouse
  • Data lake
  • Data mining
  • ETL
  • Hadoop
  • In-memory computing
  • Machine learning
  • MapReduce
  • NoSQL
  • Stream processing
  • Summary
  • Chapter 2: Big Data Environment Setup
  • Oracle VM VirtualBox installation
  • Ubuntu installation
  • Hadoop prerequisite installation
  • Java installation
  • SSH installation and configuration
  • Hadoop system user
  • Apache Hadoop installation
  • Hadoop configuration
  • Path configuration for Hadoop commands
  • Hadoop server start and stop
  • Summary
  • Chapter 3: Hadoop Ecosystem
  • Apache Hadoop
  • Hadoop Distributed File System
  • HDFS hands-on
  • Creating a directory in HDFS
  • Copying files from a local file system to HDFS
  • Copying files from HDFS to a local file system
  • Deleting files and folders in HDFS
  • Hadoop MapReduce
  • Job Tracker and Task Tracker
  • The execution flow of MapReduce
  • Mapper
  • Shuffle and Sort
  • Reducer
  • Example program
  • Preparing the data file for analysis
  • Program code
  • Driver program
  • Mapper program
  • Reducer program
  • Observations and results
  • YARN
  • Resource Manager
  • Node Manager
  • Container
  • Application Master
  • Apache Projects related to big data
  • Apache Zookeeper
  • Apache Kafka
  • Apache Flume
  • Apache Cassandra
  • Apache HBase
  • Apache Spark
  • Summary
  • Chapter 4: NoSQL Database
  • What is NoSQL?
  • Benefits of NoSQL databases
  • NoSQL versus RDBMS
  • The CAP theorem
  • The ACID properties
  • Data models in NoSQL
  • Key-value data stores
  • Document store
  • Column stores
  • Graph stores
  • Apache Cassandra
  • Installation
  • Starting Cassandra
  • The Cassandra Query Language - CQL
  • The help command
  • Basic commands
  • Data manipulation
  • Creating, altering, and deleting a keyspace
  • Creating, altering, and deleting tables
  • Inserting, updating, and deleting data
  • The MongoDB database
  • Installing MongoDB
  • Starting MongoDB
  • Working on MongoDB
  • The help command
  • Basic commands
  • Data manipulation
  • Creating and deleting databases
  • Creating and deleting collections
  • The create, retrieve, update, delete operations
  • Neo4j database
  • Installing Neo4j
  • Starting Neo4j
  • The cypher query language
  • Help
  • Basic operations in Cypher
  • Creating nodes, relationships, and properties
  • Updating nodes, relationships, and properties
  • Deleting nodes, relationships, and properties
  • Reading nodes, relationships, and properties
  • Summary
  • Chapter 5: Off-the-Shelf Commercial Tools
  • Microsoft Azure
  • Building a practical application
  • Microsoft Azure account
  • The Azure Event Hub
  • IoT simulation application
  • Setting up an Azure Stream Analytics job
  • Input
  • Query
  • Output
  • Dashboard in Power BI
  • Summary
  • Chapter 6: Containerization
  • Virtualization
  • Hypervisors
  • Hardware-based hypervisors
  • Software-based hypervisors
  • What is containerization?
  • Benefits of containers
  • Docker
  • Docker workflow
  • Installation
  • Basic commands
  • Docker images
  • Building a Docker image
  • Running and verifying Docker images
  • Importing and exporting Docker images
  • Docker Swarm
  • Setting up Docker Swarm
  • Creating service containers
  • Replicating containers
  • Removing container services
  • Kubernetes
  • Key components
  • Pods
  • ReplicaSets
  • Deployments
  • PetSets
  • Installation
  • Deployment
  • Kubernetes Dashboard
  • Summary
  • Chapter 7: Network Infrastructure
  • Network
  • Local area networks
  • Metropolitan area networks
  • Wide area networks
  • Network connectivity
  • Wired
  • Wireless
  • Network visualization
  • Gephi
  • Installation
  • Java installation
  • First run
  • Practical example
  • Summary
  • Chapter 8: Cloud Infrastructure
  • Companies moving to cloud
  • Driving factors
  • Infrastructure
  • Locality of data
  • Requirements
  • Design considerations
  • Open source versus commercial
  • Commodity hardware versus purpose build
  • Cloud versus on-premises
  • Scale up and down
  • Application architecture
  • Cost decision
  • Summary
  • Chapter 9: Security and Monitoring
  • Simple Network Management Protocol
  • Benefits of SNMP
  • Security
  • Agents and Traps
  • Netflow
  • Nagios
  • Key benefits
  • Security Onion
  • Deployment scenarios
  • The Standalone model
  • The Server-Sensor model
  • Hybrid model
  • Preconfigured tools
  • Wireshark
  • Key features
  • Summary
  • Chapter 10: Frontend Architecture
  • React JS
  • Key concepts
  • Node.js
  • JSX
  • Unidirectional dataflow
  • Getting started with ReactJS
  • Single page application
  • React application project
  • React app directory structure
  • Components
  • Properties
  • Event handling
  • State
  • Redux
  • Architecture of Redux
  • Key concepts
  • Single store
  • Action
  • Reducers
  • Guestbook application
  • Installation
  • Create a store
  • Setting up Reducer
  • Setting up Dispatcher
  • Connect function
  • Setting up Subscribers
  • Final output
  • Summary
  • Chapter 11: Backend Architecture
  • API
  • RESTful API
  • HTTP request methods
  • GET
  • POST
  • PUT
  • DELETE
  • Authentication
  • Basic authentication
  • JSON Web Token
  • Header
  • Payload
  • Signature
  • Practical
  • RESTful web service
  • Java client
  • Redis
  • Installation
  • Redis server
  • Redis client
  • Working with Redis
  • Redis data types and structures
  • String
  • HashMap
  • List
  • Set
  • Redis Publish/Subscribe
  • Common key operations
  • Summary
  • Chapter 12: Machine Learning
  • Machine learning
  • Types of algorithms
  • Parametric algorithms
  • Non-parametric algorithms
  • Supervised learning
  • The classification model
  • Binary classification
  • Multi-class classification
  • The regression model
  • Linear regression
  • Polynomial regression
  • Unsupervised learning
  • Clustering, k-means
  • Neural networks
  • Feedforward neural network
  • Recurrent neural network
  • Symmetrically connected neural network
  • Deep neural networks
  • Decision tree classifiers
  • Summary
  • Chapter 13: Artificial Intelligence
  • Artificial intelligence
  • Convolutional neural networks
  • Deep learning using TensorFlow
  • TensorFlow
  • Installation
  • TensorFlow program
  • Uninstalling TensorFlow
  • TensorBoard
  • Program
  • Launching TensorBoard
  • TensorBoard graph
  • Object detection using YOLO
  • Installation
  • Compiling YOLO library
  • Trained weights
  • Detecting objects in an image
  • Summary
  • Chapter 14: Elasticsearch
  • Installing Elasticsearch
  • Starting the Elasticsearch server
  • Auto starting the Elasticsearch service
  • Stopping the Elasticsearch server
  • Uninstalling Elasticsearch
  • Kibana
  • Installation
  • Starting Kibana
  • Uninstalling Kibana
  • Security
  • Securing Elasticsearch
  • Securing Kibana
  • Understanding queries - CRUD commands
  • Creating
  • Reading
  • Updating
  • Deleting
  • Summary
  • Chapter 15: Structured Data
  • Data analysis
  • Installing MySQL
  • Importing data
  • Analyzing the data model
  • HBase
  • Installation
  • Starting an HBase instance
  • Stopping a HBase instance
  • Preparing an HBase for migration
  • Sqoop
  • Installation
  • Verifying the installation
  • MySQL JDBC driver
  • Importing data
  • Verifying the imported data
  • Summary
  • Chapter 16: Unstructured Data
  • Moving data into Hadoop
  • Downloading Flume
  • Environment configuration
  • Configuring agent and sink
  • Running Apache Flume
  • Transferring a log file
  • Converting images into text for analysis
  • Tesseract OCR
  • Installing Tesseract
  • Practical example
  • Complete code
  • Program execution
  • Summary
  • Chapter 17: Data Visualization
  • Matplotlib
  • Installing Matplotlib
  • Line chart
  • Bar charts
  • Stack charts
  • Scatter charts
  • Pie charts
  • Geographic projections
  • D3.js
  • Installation
  • Practical example
  • Output
  • Summary
  • Chapter 18: Financial Trading System
  • What is algorithmic trading?
  • Benefits of algorithmic trading
  • Big data in the financial market
  • Algorithmic trading strategies
  • Building an Expert Advisor
  • MetaTrader
  • Downloading and setting up MetaTrader
  • MetaQuotes language
  • Trading bot objective
  • Practical
  • Trading pattern - moving average
  • Decision time: buy or sell
  • Complete program
  • Backtesting in MetaTrader 4
  • Summary
  • Chapter 19: Retail Recommendation System
  • Types of recommendation system
  • Collaborative filtering
  • Content-based filtering
  • Demographic-based system
  • Utility-based system
  • Knowledge-based system
  • Hybrid model
  • Commercial tools
  • Barilliance
  • Softcube
  • Strands
  • Monetate
  • Nosto
  • Book recommendation system
  • Dataset
  • Directory structure
  • Code
  • Reading the dataset
  • Verifying the dataset
  • Data analysis
  • Age group
  • Commutative rating
  • Algorithms