Mastering Apache Cassandra 3.x: an expert guide to improving database scalability and availability without compromising performance
Build, manage, and configure a high-performing, reliable NoSQL database for your applications with Cassandra. Key Features: Write programs more efficiently using Cassandra's features with the help of examples. Configure Cassandra and fine-tune its parameters depending on your needs. Integrate Cassand...
Format: | eBook |
---|---|
Language: | English |
Published: | Birmingham : Packt, 2018. |
Edition: | Third edition |
See on Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631843006719 |
Table of Contents:
- Cover
- Title Page
- Copyright and Credits
- Packt Upsell
- Foreword
- Contributors
- Table of Contents
- Preface
- Chapter 1: Quick Start
- Introduction to Cassandra
- High availability
- Distributed
- Partitioned row store
- Installation
- Configuration
- cassandra.yaml
- cassandra-rackdc.properties
- Starting Cassandra
- Cassandra Cluster Manager
- A quick introduction to the data model
- Using Cassandra with cqlsh
- Shutting down Cassandra
- Summary
- Chapter 2: Cassandra Architecture
- Why was Cassandra created?
- RDBMS and problems at scale
- Cassandra and the CAP theorem
- Cassandra's ring architecture
- Partitioners
- ByteOrderedPartitioner
- RandomPartitioner
- Murmur3Partitioner
- Single token range per node
- Vnodes
- Cassandra's write path
- Cassandra's read path
- On-disk storage
- SSTables
- How data was structured in prior versions
- How data is structured in newer versions
- Additional components of Cassandra
- Gossiper
- Snitch
- Phi failure-detector
- Tombstones
- Hinted handoff
- Compaction
- Repair
- Merkle tree calculation
- Streaming data
- Read repair
- Security
- Authentication
- Authorization
- Managing roles
- Client-to-node SSL
- Node-to-node SSL
- Summary
- Chapter 3: Effective CQL
- An overview of Cassandra data modeling
- Cassandra storage model for versions 3.0 and beyond
- Data cells
- cqlsh
- Logging into cqlsh
- Problems connecting to cqlsh
- Local cluster without security enabled
- Remote cluster with user security enabled
- Remote cluster with auth and SSL enabled
- Connecting with cqlsh over SSL
- Converting the Java keyStore into a PKCS12 keyStore
- Exporting the certificate from the PKCS12 keyStore
- Modifying your cqlshrc file
- Testing your connection via cqlsh
- Getting started with CQL
- Creating a keyspace
- Single data center example
- Multi-data center example
- Creating a table
- Simple table example
- Clustering key example
- Composite partition key example
- Table options
- Data types
- Type conversion
- The primary key
- Designing a primary key
- Selecting a good partition key
- Selecting a good clustering key
- Querying data
- The IN operator
- Writing data
- Inserting data
- Updating data
- Deleting data
- Lightweight transactions
- Executing a BATCH statement
- The expiring cell
- Altering a keyspace
- Dropping a keyspace
- Altering a table
- Truncating a table
- Dropping a table
- Truncate versus drop
- Creating an index
- Caution with implementing secondary indexes
- Dropping an index
- Creating a custom data type
- Altering a custom type
- Dropping a custom type
- User management
- Creating a user and role
- Altering a user and role
- Dropping a user and role
- Granting permissions
- Revoking permissions
- Other CQL commands
- COUNT
- DISTINCT
- LIMIT
- STATIC
- User-defined functions
- cqlsh commands
- CONSISTENCY
- COPY
- DESCRIBE
- TRACING
- Summary
- Chapter 4: Configuring a Cluster
- Evaluating instance requirements
- RAM
- CPU
- Disk
- Solid state drives
- Cloud storage offerings
- SAN and NAS
- Network
- Public cloud networks
- Firewall considerations
- Strategy for many small instances versus few large instances
- Operating system optimizations
- Disable swap
- XFS
- Limits
- limits.conf
- sysctl.conf
- Time synchronization
- Configuring the JVM
- Garbage collection
- CMS
- G1GC
- Garbage collection with Cassandra
- Installation of JVM
- JCE
- Configuring Cassandra
- cassandra.yaml
- cassandra-env.sh
- cassandra-rackdc.properties
- dc
- rack
- dc_suffix
- prefer_local
- cassandra-topology.properties
- jvm.options
- logback.xml
- Managing a deployment pipeline
- Orchestration tools
- Configuration management tools
- Recommended approach
- Local repository for downloadable files
- Summary
- Chapter 5: Performance Tuning
- Cassandra-Stress
- The Cassandra-Stress YAML file
- name
- size
- population
- cluster
- Cassandra-Stress results
- Write performance
- Commitlog mount point
- Scaling out
- Scaling out a data center
- Read performance
- Compaction strategy selection
- Optimizing read throughput for time-series models
- Optimizing tables for read-heavy models
- Cache settings
- Appropriate uses for row-caching
- Compression
- Chunk size
- The bloom filter configuration
- Read performance issues
- Other performance considerations
- JVM configuration
- Cassandra anti-patterns
- Building a queue
- Query flexibility
- Querying an entire table
- Incorrect use of BATCH
- Network
- Summary
- Chapter 6: Managing a Cluster
- Revisiting nodetool
- A warning about using nodetool
- Scaling up
- Adding nodes to a cluster
- Cleaning up the original nodes
- Adding a new data center
- Adjusting the cassandra-rackdc.properties file
- A warning about SimpleStrategy
- Streaming data
- Scaling down
- Removing nodes from a cluster
- Removing a live node
- Removing a dead node
- Other removenode options
- When removenode doesn't work (nodetool assassinate)
- Assassinating a node on an older version
- Removing a data center
- Backing up and restoring data
- Taking snapshots
- Enabling incremental backups
- Recovering from snapshots
- Maintenance
- Replacing a node
- Repair
- A warning about incremental repairs
- Cassandra Reaper
- Forcing read repairs at consistency - ALL
- Clearing snapshots and incremental backups
- Snapshots
- Incremental backups
- Compaction
- Why you should never invoke compaction manually
- Adjusting compaction throughput due to available resources
- Summary
- Chapter 7: Monitoring
- JMX interface
- MBean packages exposed by Cassandra
- JConsole (GUI)
- Connection and overview
- Viewing metrics
- Performing an operation
- JMXTerm (CLI)
- Connection and domains
- Getting a metric
- Performing an operation
- The nodetool utility
- Monitoring using nodetool
- describecluster
- gcstats
- getcompactionthreshold
- getcompactionthroughput
- getconcurrentcompactors
- getendpoints
- getlogginglevels
- getstreamthroughput
- gettimeout
- gossipinfo
- info
- netstats
- proxyhistograms
- status
- tablestats
- tpstats
- verify
- Administering using nodetool
- cleanup
- drain
- flush
- resetlocalschema
- stopdaemon
- truncatehints
- upgradeSSTable
- Metric stack
- Telegraf
- Installation
- Configuration
- JMXTrans
- Installation
- Configuration
- InfluxDB
- Installation
- Configuration
- InfluxDB CLI
- Grafana
- Installation
- Configuration
- Visualization
- Alerting
- Custom setup
- Log stack
- The system/debug/gc logs
- Filebeat
- Installation
- Configuration
- Elasticsearch
- Installation
- Configuration
- Kibana
- Installation
- Configuration
- Troubleshooting
- High CPU usage
- Different garbage-collection patterns
- Hotspots
- Disk performance
- Node flakiness
- All-in-one Docker
- Creating a database and other monitoring components locally
- Web links
- Summary
- Chapter 8: Application Development
- Getting started
- The path to failure
- Is Cassandra the right database?
- Good use cases for Apache Cassandra
- Use and expectations around application data consistency
- Choosing the right driver
- Building a Java application
- Driver dependency configuration with Apache Maven
- Connection class
- Other connection options
- Retry policy
- Default keyspace
- Port
- SSL
- Connection pooling options
- Starting simple - Hello World!
- Using the object mapper
- Building a data loader
- Asynchronous operations
- Data loader example
- Summary
- Chapter 9: Integration with Apache Spark
- Spark
- Architecture
- Installation
- Running custom Spark Docker locally
- Configuration
- The web UI
- Master
- Worker
- Application
- PySpark
- Connection config
- Accessing Cassandra data
- SparkR
- Connection config
- Accessing Cassandra data
- RStudio
- Connection config
- Accessing Cassandra data
- Jupyter
- Architecture
- Installation
- Configuration
- Web UI
- PySpark through Jupyter
- Summary
- Appendix: References
- Chapter 1 - Quick Start
- Chapter 2 - Cassandra Architecture
- Chapter 3 - Effective CQL
- Chapter 4 - Configuring a Cluster
- Chapter 5 - Performance Tuning
- Chapter 6 - Managing a Cluster
- Chapter 7 - Monitoring
- Chapter 8 - Application Development
- Chapter 9 - Integration with Apache Spark
- Other Books You May Enjoy
- Index