Apache Hive Essentials: immerse yourself on a fantastic journey to discover the attributes of big data by using Hive
If you are a data analyst, developer, or simply someone who wants to use Hive to explore and analyze data in Hadoop, this is the book for you. Whether you are new to big data or an expert, with this book, you will be able to master both the basic and the advanced features of Hive. Since Hive is an S...
| Other Authors: | |
|---|---|
| Format: | eBook |
| Language: | English |
| Published: | Birmingham, England ; Mumbai, [India] : Packt Publishing, 2015. |
| Edition: | 1st edition |
| Series: | Community experience distilled. |
| Subjects: | |
| View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009628897406719 |
Table of Contents:
- Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
- Chapter 1: Overview of Big Data and Hive; A short history; Introducing big data; Relational and NoSQL database versus Hadoop; Batch, real-time, and stream processing; Overview of the Hadoop ecosystem; Hive overview; Summary
- Chapter 2: Setting Up the Hive Environment; Installing Hive from Apache; Installing Hive from vendor packages; Starting Hive in the cloud; Using the Hive command line and Beeline; The Hive-integrated development environment; Summary
- Chapter 3: Data Definition and Description; Understanding Hive data types; Data type conversions; Hive Data Definition Language; Hive database; Hive internal and external tables; Hive partitions; Hive buckets; Hive views; Summary
- Chapter 4: Data Selection and Scope; The SELECT statement; The INNER JOIN statement; The OUTER JOIN and CROSS JOIN statements; Special JOIN - MAPJOIN; Set operation - UNION ALL; Summary
- Chapter 5: Data Manipulation; Data exchange - LOAD; Data exchange - INSERT; Data exchange - EXPORT and IMPORT; ORDER and SORT; Operators and functions; Transactions; Summary
- Chapter 6: Data Aggregation and Sampling; Basic aggregation - GROUP BY; Advanced aggregation - GROUPING SETS; Advanced aggregation - ROLLUP and CUBE; Aggregation condition - HAVING; Analytic functions; Sampling; Summary
- Chapter 7: Performance Considerations; Performance utilities; The EXPLAIN statement; The ANALYZE statement; Design optimization; Partition tables; Bucket tables; Index; Data file optimization; File format; Compression; Storage optimization; Job and query optimization; Local mode; JVM reuse; Parallel execution; Join optimization; Common join; Map join; Bucket map join; Sort merge bucket (SMB) join; Sort merge bucket map (SMBM) join; Skew join; Summary
- Chapter 8: Extensibility Considerations; User-defined functions; The UDF code template; The UDAF code template; The UDTF code template; Development and deployment; Streaming; SerDe; Summary
- Chapter 9: Security Considerations; Authentication; Metastore server authentication; HiveServer2 authentication; Authorization; Legacy mode; Storage-based mode; SQL standard-based mode; Encryption; Summary
- Chapter 10: Working with Other Tools; JDBC/ODBC connector; HBase; Hue; HCatalog; ZooKeeper; Oozie; Hive roadmap; Summary
- Index