Apache Hive essentials: essential techniques to help you process, and get unique insights from, big data
This book takes you on a fantastic journey to discover the attributes of big data using Apache Hive. About This Book: Grasp the skills needed to write efficient Hive queries to analyze the Big Data. Discover how Hive can coexist and work with other tools within the Hadoop ecosystem. Uses practical, exa...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, UK: Packt Publishing Ltd, [2018] |
Edition: | Second edition |
Subjects: | |
View at the Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630433906719 |
Table of Contents:
- Cover
- Title Page
- Copyright and Credits
- Dedication
- Packt Upsell
- Contributors
- Table of Contents
- Preface
- Chapter 1: Overview of Big Data and Hive
- A short history
- Introducing big data
- The relational and NoSQL databases versus Hadoop
- Batch, real-time, and stream processing
- Overview of the Hadoop ecosystem
- Hive overview
- Summary
- Chapter 2: Setting Up the Hive Environment
- Installing Hive from Apache
- Installing Hive from vendors
- Using Hive in the cloud
- Using the Hive command
- Using the Hive IDE
- Summary
- Chapter 3: Data Definition and Description
- Understanding data types
- Data type conversions
- Data Definition Language
- Database
- Tables
- Table creation
- Table description
- Table cleaning
- Table alteration
- Partitions
- Buckets
- Views
- Summary
- Chapter 4: Data Correlation and Scope
- Project data with SELECT
- Filtering data with conditions
- Linking data with JOIN
- INNER JOIN
- OUTER JOIN
- Special joins
- Combining data with UNION
- Summary
- Chapter 5: Data Manipulation
- Data exchanging with LOAD
- Data exchange with INSERT
- Data exchange with [EX|IM]PORT
- Data sorting
- Functions
- Function tips for collections
- Function tips for date and string
- Virtual column functions
- Transactions and locks
- Transactions
- UPDATE statement
- DELETE statement
- MERGE statement
- Locks
- Summary
- Chapter 6: Data Aggregation and Sampling
- Basic aggregation
- Enhanced aggregation
- Grouping sets
- Rollup and Cube
- Aggregation condition
- Window functions
- Window aggregate functions
- Window sort functions
- Window analytics functions
- Window expression
- Sampling
- Random sampling
- Bucket table sampling
- Block sampling
- Summary
- Chapter 7: Performance Considerations
- Performance utilities
- EXPLAIN statement
- ANALYZE statement
- Logs
- Design optimization
- Partition table design
- Bucket table design
- Index design
- Use skewed/temporary tables
- Data optimization
- File format
- Compression
- Storage optimization
- Job optimization
- Local mode
- JVM reuse
- Parallel execution
- Join optimization
- Common join
- Map join
- Bucket map join
- Sort merge bucket (SMB) join
- Sort merge bucket map (SMBM) join
- Skew join
- Job engine
- Optimizer
- Vectorization optimization
- Cost-based optimization
- Summary
- Chapter 8: Extensibility Considerations
- User-defined functions
- UDF code template
- UDAF code template
- UDTF code template
- Development and deployment
- HPL/SQL
- Streaming
- SerDe
- Summary
- Chapter 9: Security Considerations
- Authentication
- Metastore authentication
- Hiveserver2 authentication
- Authorization
- Legacy mode
- Storage-based mode
- SQL standard-based mode
- Mask and encryption
- The data-hashing function
- The data-masking function
- The data-encryption function
- Other methods
- Summary
- Chapter 10: Working with Other Tools
- The JDBC/ODBC connector
- NoSQL
- The Hue/Ambari Hive view
- HCatalog
- Oozie
- Spark
- Hivemall
- Summary
- Other Books You May Enjoy
- Index