Big data architect's handbook: a guide to build proficiency in tools and systems used by leading big data experts
A comprehensive end-to-end guide that gives hands-on practice in big data and Artificial Intelligence. About This Book: Learn to build and run a big data application with sample code; Explore examples to implement activities that a big data architect performs; Use Machine Learning and AI for structured...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England : Packt Publishing, 2018. |
Edition: | 1st edition |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630633306719 |
Table of Contents:
- Cover
- Title Page
- Copyright and Credits
- Packt Upsell
- Contributors
- Table of Contents
- Preface
- Chapter 1: Why Big Data?
- What is big data?
- Characteristics of big data
- Volume
- Velocity
- Variety
- Veracity
- Variability
- Value
- Solution-based approach for data
- Data - the most valuable asset
- Traditional approaches to data storage
- Clustered computing
- High availability
- Resource pooling
- Easy scalability
- Big data - how does it make a difference?
- Big data solutions - cloud versus on-premises infrastructure
- Cost
- Security
- Current capabilities
- Scalability
- Big data glossary
- Big data
- Batch processing
- Cluster computing
- Data warehouse
- Data lake
- Data mining
- ETL
- Hadoop
- In-memory computing
- Machine learning
- MapReduce
- NoSQL
- Stream processing
- Summary
- Chapter 2: Big Data Environment Setup
- Oracle VM VirtualBox installation
- Ubuntu installation
- Hadoop prerequisite installation
- Java installation
- SSH installation and configuration
- Hadoop system user
- Apache Hadoop installation
- Hadoop configuration
- Path configuration for Hadoop commands
- Hadoop server start and stop
- Summary
- Chapter 3: Hadoop Ecosystem
- Apache Hadoop
- Hadoop Distributed File System
- HDFS hands-on
- Creating a directory in HDFS
- Copying files from a local file system to HDFS
- Copying files from HDFS to a local file system
- Deleting files and folders in HDFS
- Hadoop MapReduce
- Job Tracker and Task Tracker
- The execution flow of MapReduce
- Mapper
- Shuffle and Sort
- Reducer
- Example program
- Preparing the data file for analysis
- Program code
- Driver program
- Mapper program
- Reducer program
- Observations and results
- YARN
- Resource Manager
- Node Manager
- Container
- Application Master
- Apache Projects related to big data
- Apache Zookeeper
- Apache Kafka
- Apache Flume
- Apache Cassandra
- Apache HBase
- Apache Spark
- Summary
- Chapter 4: NoSQL Database
- What is NoSQL?
- Benefits of NoSQL databases
- NoSQL versus RDBMS
- The CAP theorem
- The ACID properties
- Data models in NoSQL
- Key-value data stores
- Document store
- Column stores
- Graph stores
- Apache Cassandra
- Installation
- Starting Cassandra
- The Cassandra Query Language - CQL
- The help command
- Basic commands
- Data manipulation
- Creating, altering, and deleting a keyspace
- Creating, altering, and deleting tables
- Inserting, updating, and deleting data
- The MongoDB database
- Installing MongoDB
- Starting MongoDB
- Working on MongoDB
- The help command
- Basic commands
- Data manipulation
- Creating and deleting databases
- Creating and deleting collections
- The create, retrieve, update, delete operations
- Neo4j database
- Installing Neo4j
- Starting Neo4j
- The cypher query language
- Help
- Basic operations in Cypher
- Creating nodes, relationships, and properties
- Updating nodes, relationships, and properties
- Deleting nodes, relationships, and properties
- Reading nodes, relationships, and properties
- Summary
- Chapter 5: Off-the-Shelf Commercial Tools
- Microsoft Azure
- Building a practical application
- Microsoft Azure account
- The Azure Event Hub
- IoT simulation application
- Setting up an Azure Stream Analytics job
- Input
- Query
- Output
- Dashboard in Power BI
- Summary
- Chapter 6: Containerization
- Virtualization
- Hypervisors
- Hardware-based hypervisors
- Software-based hypervisors
- What is containerization?
- Benefits of containers
- Docker
- Docker workflow
- Installation
- Basic commands
- Docker images
- Building a Docker image
- Running and verifying Docker images
- Importing and exporting Docker images
- Docker Swarm
- Setting up Docker Swarm
- Creating service containers
- Replicating containers
- Removing container services
- Kubernetes
- Key components
- Pods
- ReplicaSets
- Deployments
- PetSets
- Installation
- Deployment
- Kubernetes Dashboard
- Summary
- Chapter 7: Network Infrastructure
- Network
- Local area networks
- Metropolitan area networks
- Wide area networks
- Network connectivity
- Wired
- Wireless
- Network visualization
- Gephi
- Installation
- Java installation
- First run
- Practical example
- Summary
- Chapter 8: Cloud Infrastructure
- Companies moving to cloud
- Driving factors
- Infrastructure
- Locality of data
- Requirements
- Design considerations
- Open source versus commercial
- Commodity hardware versus purpose build
- Cloud versus on-premises
- Scale up and down
- Application architecture
- Cost decision
- Summary
- Chapter 9: Security and Monitoring
- Simple Network Management Protocol
- Benefits of SNMP
- Security
- Agents and Traps
- Netflow
- Nagios
- Key benefits
- Security Onion
- Deployment scenarios
- The Standalone model
- The Server-Sensor model
- Hybrid model
- Preconfigured tools
- Wireshark
- Key features
- Summary
- Chapter 10: Frontend Architecture
- React JS
- Key concepts
- Node.js
- JSX
- Unidirectional dataflow
- Getting started with ReactJS
- Single page application
- React application project
- React app directory structure
- Components
- Properties
- Event handling
- State
- Redux
- Architecture of Redux
- Key concepts
- Single store
- Action
- Reducers
- Guestbook application
- Installation
- Create a store
- Setting up Reducer
- Setting up Dispatcher
- Connect function
- Setting up Subscribers
- Final output
- Summary
- Chapter 11: Backend Architecture
- API
- RESTful API
- HTTP request methods
- GET
- POST
- PUT
- DELETE
- Authentication
- Basic authentication
- JSON Web Token
- Header
- Payload
- Signature
- Practical
- RESTful web service
- Java client
- Redis
- Installation
- Redis server
- Redis client
- Working with Redis
- Redis data types and structures
- String
- HashMap
- List
- Set
- Redis Publish/Subscribe
- Common key operations
- Summary
- Chapter 12: Machine Learning
- Machine learning
- Types of algorithms
- Parametric algorithms
- Non-parametric algorithms
- Supervised learning
- The classification model
- Binary classification
- Multi-class classification
- The regression model
- Linear regression
- Polynomial regression
- Unsupervised learning
- Clustering, k-means
- Neural networks
- Feedforward neural network
- Recurrent neural network
- Symmetrically connected neural network
- Deep neural networks
- Decision tree classifiers
- Summary
- Chapter 13: Artificial Intelligence
- Artificial intelligence
- Convolutional neural networks
- Deep learning using TensorFlow
- TensorFlow
- Installation
- TensorFlow program
- Uninstalling TensorFlow
- TensorBoard
- Program
- Launching TensorBoard
- TensorBoard graph
- Object detection using YOLO
- Installation
- Compiling YOLO library
- Trained weights
- Detecting objects in an image
- Summary
- Chapter 14: Elasticsearch
- Installing Elasticsearch
- Starting the Elasticsearch server
- Auto starting the Elasticsearch service
- Stopping the Elasticsearch server
- Uninstalling Elasticsearch
- Kibana
- Installation
- Starting Kibana
- Uninstalling Kibana
- Security
- Securing Elasticsearch
- Securing Kibana
- Understanding queries - CRUD commands
- Creating
- Reading
- Updating
- Deleting
- Summary
- Chapter 15: Structured Data
- Data analysis
- Installing MySQL
- Importing data
- Analyzing the data model
- HBase
- Installation
- Starting an HBase instance
- Stopping a HBase instance
- Preparing an HBase for migration
- Sqoop
- Installation
- Verifying the installation
- MySQL JDBC driver
- Importing data
- Verifying the imported data
- Summary
- Chapter 16: Unstructured Data
- Moving data into Hadoop
- Downloading Flume
- Environment configuration
- Configuring agent and sink
- Running Apache Flume
- Transferring a log file
- Converting images into text for analysis
- Tesseract OCR
- Installing Tesseract
- Practical example
- Complete code
- Program execution
- Summary
- Chapter 17: Data Visualization
- Matplotlib
- Installing Matplotlib
- Line chart
- Bar charts
- Stack charts
- Scatter charts
- Pie charts
- Geographic projections
- D3.js
- Installation
- Practical example
- Output
- Summary
- Chapter 18: Financial Trading System
- What is algorithmic trading?
- Benefits of algorithmic trading
- Big data in the financial market
- Algorithmic trading strategies
- Building an Expert Advisor
- MetaTrader
- Downloading and setting up MetaTrader
- MetaQuotes language
- Trading bot objective
- Practical
- Trading pattern - moving average
- Decision time: buy or sell
- Complete program
- Backtesting in MetaTrader 4
- Summary
- Chapter 19: Retail Recommendation System
- Types of recommendation system
- Collaborative filtering
- Content-based filtering
- Demographic-based system
- Utility-based system
- Knowledge-based system
- Hybrid model
- Commercial tools
- Barilliance
- Softcube
- Strands
- Monetate
- Nosto
- Book recommendation system
- Dataset
- Directory structure
- Code
- Reading the dataset
- Verifying the dataset
- Data analysis
- Age group
- Commutative rating
- Algorithms