Database
Cassandra Database
What is Cassandra?
Cassandra is a distributed NoSQL database designed to handle large amounts of data across many commodity servers. It scales by adding more servers to the cluster.
Installations
Ubuntu Server
- Update the package list and install the dependencies.
> sudo apt update
> sudo apt install -y apt-transport-https ca-certificates curl \
software-properties-common
- Install Java (OpenJDK 11), which Cassandra requires.
> sudo apt install openjdk-11-jdk -y
Verify the Java installation:
> java -version
- Add the Apache Cassandra Repository
> sudo curl -o /etc/apt/keyrings/apache-cassandra.asc \
https://downloads.apache.org/cassandra/KEYS
> echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://debian.cassandra.apache.org 41x main" \
| sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
- Install Apache Cassandra.
> sudo apt update
> sudo apt install -y cassandra
- Start the Apache Cassandra service.
> sudo systemctl start cassandra
- Check the status of the Apache Cassandra service.
> sudo systemctl status cassandra
- Enable the Apache Cassandra service to start on boot.
> sudo systemctl enable cassandra
- Verify the installation.
> nodetool status
- Install Miniconda
Follow the official Miniconda installation instructions for Linux to install Miniconda on your Ubuntu server.
- Create a new conda environment.
> conda create -n cassandra python=3.12
> conda activate cassandra
> which python
Note the path of the Python executable. You will need it when editing cqlsh.py below.
- Install dependencies.
> pip install --upgrade six
> pip install cassandra-driver
- Edit /usr/bin/cqlsh.py and change the first line to #!/your_venv/bin/python3, where /your_venv/bin/python3 stands for the path printed by which python.
> sudo nano /usr/bin/cqlsh.py
- Run the following command to start the cqlsh shell.
> cqlsh
Configuration
> sudo nano /etc/cassandra/cassandra.yaml
Firstly, change the name of the cluster. Look for the cluster_name parameter and assign a name:
cluster_name: 'TONYLABS'
It is preferable to change the data storage port. To do this, look for the storage_port parameter and assign a port.
Remember that the port must be open in the Ubuntu firewall for everything to work correctly. In our case, the port is set to 7000.
storage_port: [port]
Finally, look for the seed_provider parameter and, in its seeds entry, add the IP addresses of the nodes that make up the cluster, separated by commas:
- seeds: "[node_ip]:[node_port],[node_ip]:[node_port]...[node_ip]:[node_port]"
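As a small illustration of the seeds format, the comma-separated value can be assembled from a list of addresses. This is a minimal Python sketch: build_seeds and the 10.0.0.x addresses are hypothetical names chosen for the example, and it assumes plain IPv4 addresses.

```python
def build_seeds(nodes):
    """Join (ip, port) pairs into the comma-separated seeds string."""
    return ",".join(f"{ip}:{port}" for ip, port in nodes)


# Hypothetical node addresses, for illustration only.
nodes = [("10.0.0.1", 7000), ("10.0.0.2", 7000)]

# Prints the line as it would appear under seed_provider in cassandra.yaml.
print(f'- seeds: "{build_seeds(nodes)}"')
```

The printed line can be pasted into cassandra.yaml. The port should match the storage_port configured on each node.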
Once done, save the file and restart Cassandra, since cassandra.yaml is only read at startup.
> sudo systemctl restart cassandra
Now test out the connection with the following command:
> nodetool status
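When checking a cluster with more than one node, it can help to read the nodetool status output programmatically. The sketch below is one possible way to do that in Python, assuming the usual layout in which each node line starts with a two-letter code (U or D for up/down, then N, L, J or M for normal/leaving/joining/moving). The function names and the sample text are illustrative and were not captured from a real cluster.

```python
import re

# A node line starts with a status/state code such as UN or DN,
# followed by the node address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output):
    """Return {address: state_code} parsed from `nodetool status` text."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            code, address = match.groups()
            states[address] = code
    return states


def all_up_and_normal(status_output):
    """True only if at least one node is listed and every node is UN."""
    states = node_states(status_output)
    return bool(states) and all(code == "UN" for code in states.values())


# Illustrative sample output (hypothetical addresses and host IDs).
sample = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID  Rack
UN  10.0.0.1   103.6 KiB  16      50.0%             aaaa     rack1
DN  10.0.0.2   98.1 KiB   16      50.0%             bbbb     rack1
"""

print(node_states(sample))
print(all_up_and_normal(sample))
```

On a live node, the text could be obtained with subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout and passed to the same functions.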
Remote Access
# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: localhost
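Modify rpc_address and broadcast_rpc_address:
Change rpc_address: localhost to rpc_address: 0.0.0.0,
then change broadcast_rpc_address: 1.2.3.4 to broadcast_rpc_address: SERVER_SELF_PUBLIC_IP_ADDRESS (uncomment the line if it is commented out, since broadcast_rpc_address must be set when rpc_address is 0.0.0.0).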
Restart the Cassandra service:
> sudo systemctl restart cassandra
Authentication
- Change the authenticator option in the cassandra.yaml file to PasswordAuthenticator:
# Authentication backend, implementing IAuthenticator; used to identify users
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
# users. It keeps usernames and hashed passwords in system_auth.roles table.
# Please increase system_auth keyspace replication factor if you use this authenticator.
# If using PasswordAuthenticator, CassandraRoleManager must also be used (see below)
# authenticator: AllowAllAuthenticator
authenticator: PasswordAuthenticator
- Restart the Cassandra service:
> sudo systemctl restart cassandra
- Start cqlsh using the default superuser name and password:
> cqlsh -u cassandra -p cassandra
- To ensure that the keyspace is always available, increase the replication factor for the system_auth keyspace to 3 to 5 nodes per datacenter (recommended):
cqlsh> ALTER KEYSPACE "system_auth"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
The system_auth keyspace uses a QUORUM consistency level when checking authentication for the default cassandra user. For all other users, superuser or otherwise, a LOCAL_ONE consistency level is used for authenticating.
Note: Datacenter names are case sensitive. Verify the case of the datacenter names using a utility such as nodetool status.
CAUTION: Leaving the default replication factor of 1 set for the system_auth keyspace results in denial of access to the cluster if the single replica of the keyspace goes down. For multiple datacenters, be sure to set the replication class to NetworkTopologyStrategy.
- After increasing the replication factor of a keyspace, run nodetool repair to make certain the change is propagated:
> nodetool repair system_auth
- Restart the Cassandra service:
> sudo systemctl restart cassandra
- Start cqlsh using the default superuser name and password:
> cqlsh -u cassandra -p cassandra
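The replication advice for system_auth follows from how QUORUM is computed: a quorum is the sum of the replication factors across datacenters, divided by two and rounded down, plus one. The Python sketch below works through that arithmetic for a few settings. The function names are hypothetical, and the last case mirrors the dc1/dc2 values used in the ALTER KEYSPACE example.

```python
def quorum(replication_factors):
    """Replicas that must respond for QUORUM: floor(sum of RFs / 2) + 1."""
    return sum(replication_factors) // 2 + 1


def tolerated_failures(replication_factors):
    """Replicas that can be down while QUORUM can still be reached."""
    return sum(replication_factors) - quorum(replication_factors)


cases = [
    ("default, RF 1", [1]),
    ("single datacenter, RF 3", [3]),
    ("dc1: 3, dc2: 2", [3, 2]),
]

for label, rfs in cases:
    print(f"{label}: quorum={quorum(rfs)}, tolerated failures={tolerated_failures(rfs)}")
```

With a replication factor of 1, no replica can be lost without blocking logins for the default cassandra user, which is why the default setting is discouraged.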
Basic Query
cqlsh
cqlsh, or Cassandra Query Language shell, is used to communicate with Cassandra and run Cassandra Query Language (CQL) statements. To start cqlsh, use the following command:
root@tonylabs:/# cqlsh
Connected to TONYLABS Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.5 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>
HELP
The HELP command lists descriptions of all available cqlsh commands. For example, the output for HELP SHOW looks like this:
cqlsh> HELP SHOW
SHOW [cqlsh only]
Displays information about the current cqlsh session. Can be called in the following ways:
SHOW VERSION
Shows the version and build of the connected Cassandra instance, as well as the version of the CQL spec that the connected Cassandra instance understands.
SHOW HOST
Shows where cqlsh is currently connected.
SHOW SESSION <sessionid>
Pretty-prints the requested tracing session.
cqlsh>
SHOW
The SHOW command displays all the information about the current cqlsh session. You can choose between showing host, version, and session information:
cqlsh> SHOW VERSION
[cqlsh 6.0.0 | Cassandra 4.0.5 | CQL spec 3.4.5 | Native protocol v5]
cqlsh> SHOW HOST
Connected to Test Cluster at 127.0.0.1:9042
cqlsh>
CREATE KEYSPACE
A keyspace groups tables and specifies how their data is replicated. In the following example, we will create a new keyspace and specify the replication factor:
cqlsh> CREATE KEYSPACE default_keyspace
WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
USE
The USE command sets the current working keyspace:
cqlsh> USE default_keyspace;
cqlsh:default_keyspace>
CREATE TABLE
To create a table, use the CREATE TABLE command and specify the column names, data types, and primary key. The phone column is declared as BIGINT because ten-digit phone numbers do not fit in a 32-bit INT:
cqlsh:default_keyspace> CREATE TABLE test_table (
name TEXT PRIMARY KEY,
surname TEXT,
phone BIGINT
);
INSERT
The INSERT command adds a row to a table. Columns left out of the statement are not set and read back as null:
cqlsh:default_keyspace> INSERT INTO test_table (name, surname, phone)
VALUES ('Tony', 'Wang', 5432123789);
cqlsh:default_keyspace>
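A note on the phone column: the CQL int type is a 32-bit signed integer and bigint is a 64-bit signed integer, so a ten-digit value such as 5432123789 only fits in a bigint (or a text) column. The following Python sketch checks the ranges; smallest_cql_integer_type is a hypothetical helper written for this example.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1        # CQL int: 32-bit signed
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1  # CQL bigint: 64-bit signed


def smallest_cql_integer_type(value):
    """Return the narrowest CQL integer type that can hold value."""
    if INT_MIN <= value <= INT_MAX:
        return "int"
    if BIGINT_MIN <= value <= BIGINT_MAX:
        return "bigint"
    return "varint"  # arbitrary-precision integer


phone = 5432123789
print(INT_MAX)                           # 2147483647
print(smallest_cql_integer_type(phone))  # bigint
```

Storing phone numbers as text is a common alternative, since it also preserves leading zeros and formatting characters.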
Troubleshooting
Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
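This error comes from nodetool when it cannot reach Cassandra's JMX port, which is 7199 by default. It usually means the Cassandra service has stopped or has not finished starting, so checking sudo systemctl status cassandra is a reasonable first step. As a rough aid, the Python sketch below tests whether the relevant ports accept TCP connections. port_is_open is a hypothetical helper, and the ports shown assume the defaults (7199 for JMX, 9042 for CQL clients).

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Default local ports; adjust if cassandra.yaml or the JMX settings differ.
print("JMX port 7199 open:", port_is_open("127.0.0.1", 7199))
print("CQL port 9042 open:", port_is_open("127.0.0.1", 9042))
```

If both ports are closed while the service reports as active, Cassandra may still be starting. Waiting a short while and running nodetool status again is often enough.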