Chroma Database

What is Chroma Database?

Chroma is the open-source AI application database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.

Chroma gives you everything you need for retrieval:

  • Store embeddings and their metadata
  • Vector search
  • Full-text search
  • Document storage
  • Metadata filtering
  • Multi-modal retrieval

The AI-native open-source embedding database - https://www.trychroma.com


What are vector embeddings?

Vector embeddings are numerical vector representations of data such as text, images, and video. Items with similar meaning end up close to each other in the vector space.

Vector embeddings support tasks such as:

  • Similarity Search
  • Anomaly Detection
  • Natural Language Processing (NLP) Tasks
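To make similarity search concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the names `cosine_similarity` and `most_similar` are made up for illustration; real embedding models produce vectors with hundreds of dimensions, and vector databases typically use an index rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings" (illustrative values only).
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}


def most_similar(query, k=2):
    """Return the k stored items ranked by cosine similarity to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(most_similar([1.0, 0.0, 0.0]))
```

The query vector points mostly along the first axis, so the two "cat-like" items rank ahead of "car".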

There are many popular and well-tested vector databases besides Chroma including (but not limited to) Pinecone, Milvus, and Weaviate. However, for this post, we're going to focus on Chroma.


How does it work?

Chroma Architecture


Install

Virtual Environment

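If you don't have Conda installed, install it first (for example Miniconda, which the systemd service file later on this page assumes), then create and activate an environment: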

> conda create -n chromadb python=3.12
> conda activate chromadb

Install Chroma Database

Chroma can run embedded in a Python script or as a standalone server. Install Chroma with:

> pip install chromadb

Updating Chroma Database

> pip install --upgrade chromadb

To verify the installed version after upgrading, run:

> pip show chromadb

If you want to upgrade to a specific version, use:

> pip install --upgrade chromadb==<version>

Run Chroma Server

> chroma run --path /data/chroma

# or

> chroma run --host localhost --port 8000 --path /data/chroma

Systemd Service

Create Service

You can create a systemd service to manage the Chroma server.

  1. Create a service file:
> sudo nano /etc/systemd/system/chroma.service
  2. Add the following content to the file:
[Unit]
Description = Chroma Service
After = network.target

[Service]
Type = simple
User = ubuntu
Group = ubuntu
WorkingDirectory = /data
ExecStart=/home/ubuntu/miniconda3/envs/chromadb/bin/chroma run --host 0.0.0.0 --port 8000 --path /data/chroma --log-path /data/chroma/chroma.log

[Install]
WantedBy = multi-user.target

You can replace 0.0.0.0 with 127.0.0.1 if you want to restrict access to localhost only.

  3. Run the following commands to enable and start the service:
> sudo systemctl daemon-reload
> sudo systemctl enable chroma
> sudo systemctl start chroma

Check the status of the service with

> sudo systemctl status chroma

Monitor the logs with

> journalctl -u chroma

Stop and Disable Service

> sudo systemctl stop chroma
> sudo systemctl disable chroma
> sudo rm /etc/systemd/system/chroma.service
> sudo systemctl daemon-reload

Verify that the service is removed (systemctl should report that the unit could not be found):

> sudo systemctl status chroma

Collection

Create Collection

import chromadb
chroma_client = chromadb.HttpClient(host='<server_ip>', port=8000)
collection = chroma_client.create_collection(name="<collection_name>")

If you want to get or create a collection, use the following code:

collection = chroma_client.get_or_create_collection(name="<collection_name>")

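The line above is roughly equivalent to the following. Depending on the Chroma version, list_collections() returns either collection names or Collection objects, so the names are normalized first:

# Normalize to plain names, whichever type this Chroma version returns
existing_names = [c if isinstance(c, str) else c.name for c in chroma_client.list_collections()]
if "<collection_name>" not in existing_names:
    collection = chroma_client.create_collection(name="<collection_name>")
else:
    collection = chroma_client.get_collection(name="<collection_name>")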
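A newly created collection is empty, so a query has nothing to match until documents are added. The helper below is a small sketch of one way to do that with `collection.add`; the function name `add_documents`, the `doc-<n>` ID scheme, and the `source` metadata field are assumptions for illustration, not part of Chroma.

```python
def add_documents(collection, texts, source="manual"):
    """Add texts to a Chroma collection with generated IDs and simple metadata.

    `collection` is expected to be a chromadb Collection. IDs must be unique
    strings; the "doc-<n>" scheme here is a hypothetical convention.
    """
    ids = [f"doc-{i}" for i in range(len(texts))]
    metadatas = [{"source": source} for _ in texts]
    # Chroma embeds the documents with the collection's embedding function.
    collection.add(ids=ids, documents=texts, metadatas=metadatas)
    return ids
```

For example, `add_documents(collection, ["This is a document about pineapple", "This is a document about oranges"])` would store two documents under the IDs `doc-0` and `doc-1`.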

Delete Collection

chroma_client.delete_collection(name="<collection_name>")

Querying Collection

You can query the collection with a list of query texts, and Chroma will return the n most similar results.

results = collection.query(
    query_texts=["This is a query document about hawaii"], # Chroma will embed this for you
    n_results=2 # how many results to return
)
print(results)
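The printed result is a dictionary in which `ids`, `documents`, and `distances` each hold one inner list per query text, which can be awkward to read. The sketch below pairs them up row by row, assuming the default `include` settings so that all three keys are populated. The helper name `flatten_results` and the sample values are hypothetical, written by hand to mirror the shape of a query result rather than taken from a real run.

```python
def flatten_results(results):
    """Turn a Chroma query result into one list of (id, document, distance) rows per query."""
    rows_per_query = []
    for ids, docs, dists in zip(results["ids"], results["documents"], results["distances"]):
        rows_per_query.append(list(zip(ids, docs, dists)))
    return rows_per_query


# Hand-written sample in the shape of a query result (illustrative values only).
sample = {
    "ids": [["doc-1", "doc-0"]],
    "documents": [["This is a document about pineapple", "This is a document about oranges"]],
    "distances": [[1.04, 1.63]],
}

# One query text was used, so there is a single list of rows.
for doc_id, doc, dist in flatten_results(sample)[0]:
    print(f"{doc_id}  {dist:.2f}  {doc}")
```

Smaller distances mean closer matches, so the first row of each list is the best hit for that query.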

Connect via Tunnel

If your Chroma server is not directly accessible, you can use an SSH tunnel to connect to it.

First create an SSH tunnel to the Chroma server:

> ssh -N -L 8000:<chroma_db_server_ip>:8000 <proxy_server_user>@<proxy_server_ip> -p <proxy_server_public_port>

Edit the Chroma client to connect to the local port:

chroma_client = chromadb.HttpClient(host='localhost', port=8000)
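If the client fails to connect, it can help to confirm first that the tunnel's local port is accepting connections. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is an assumption for illustration, and it only checks that something is listening on the port, not that it is a Chroma server.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Check the local end of the SSH tunnel before creating the Chroma client.
print(port_is_open("localhost", 8000))
```

If this prints `False`, the SSH tunnel is probably not running or is forwarding a different local port.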

Troubleshooting

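Error: An instance of Chroma already exists for ephemeral with different settings

This error is raised when a second in-memory (ephemeral) client is created in the same Python process with settings that differ from the first one. Reusing a single client instance, or restarting the Python process or notebook kernel before creating a client with new settings, usually avoids it.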