Chroma Database
What is Chroma Database?
Chroma is the open-source AI application database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.
Chroma gives you everything you need for retrieval:
- Store embeddings and their metadata
- Vector search
- Full-text search
- Document storage
- Metadata filtering
- Multi-modal retrieval
The AI-native open-source embedding database - https://www.trychroma.com
What are vector embeddings?
Vector embeddings are numerical vector representations of data such as text, images, and video. Items with similar meaning end up close to each other in the vector space.
Vector embeddings support efficient operations such as:
- Similarity Search
- Anomaly Detection
- Natural Language Processing (NLP) Tasks
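To make similarity search concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are invented for illustration (real embedding models produce hundreds or thousands of dimensions), and cosine similarity is only one common measure, not necessarily the metric a given Chroma collection uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for this example only.
embeddings = {
    "beach holiday": [0.9, 0.1, 0.0],
    "surfing trip": [0.8, 0.2, 0.1],
    "tax return": [0.0, 0.1, 0.9],
}


def most_similar(query_vector, items):
    """Return the key whose vector is most similar to the query vector."""
    return max(items, key=lambda name: cosine_similarity(query_vector, items[name]))


query = [1.0, 0.0, 0.0]
print(most_similar(query, embeddings))
```

A vector database does the same kind of comparison, but over many vectors and with an index so that it does not have to scan every item.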
There are many popular and well-tested vector databases besides Chroma, including (but not limited to) Pinecone, Milvus, and Weaviate. However, this post focuses on Chroma.
How does it work?
Install
Virtual Environment
If you don't have Conda installed, you can follow the instructions here.
> conda create -n chromadb python=3.12
> conda activate chromadb
Install Chroma Database
Chroma can run in-process inside a Python script or as a standalone server. Install it with:
> pip install chromadb
Updating Chroma Database
> pip install --upgrade chromadb
To verify the installed version after upgrading, run:
> pip show chromadb
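The same check can be done from inside Python, which is handy in scripts. This is a small sketch using only the standard library; `installed_version` is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the chromadb version in the active environment, or None if it is missing.
print(installed_version("chromadb"))
```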
If you want to upgrade to a specific version, use:
> pip install --upgrade chromadb==<version>
Run Chroma Server
> chroma run --path /data/chroma
# or
> chroma run --host localhost --port 8000 --path /data/chroma
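Once the server is started, a quick way to confirm that something is listening is a plain TCP check. The sketch below assumes the host and port from the command above; `is_port_open` is a hypothetical helper, and it only proves that the port accepts connections, not that the process behind it is Chroma.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Assumes the server was started with --host localhost --port 8000.
print(is_port_open("localhost", 8000))
```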
Systemd Service
Create Service
You can create a systemd service to manage the Chroma server.
- Create a service file:
> sudo nano /etc/systemd/system/chroma.service
- Add the following content to the file:
[Unit]
Description = Chroma Service
After = network.target
[Service]
Type = simple
User = ubuntu
Group = ubuntu
WorkingDirectory = /data
ExecStart=/home/ubuntu/miniconda3/envs/chromadb/bin/chroma run --host 0.0.0.0 --port 8000 --path /data/chroma --log-path /data/chroma/chroma.log
[Install]
WantedBy = multi-user.target
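The host 0.0.0.0 makes the server listen on all network interfaces. Replace it with 127.0.0.1 if you want to restrict access to localhost only. Adjust User, Group, and the paths to match your own system.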
- Run the following commands to enable and start the service:
> sudo systemctl daemon-reload
> sudo systemctl enable chroma
> sudo systemctl start chroma
Check the status of the service with
> sudo systemctl status chroma
Monitor the logs with
> journalctl -u chroma
Stop and Disable Service
> sudo systemctl stop chroma
> sudo systemctl disable chroma
> sudo rm /etc/systemd/system/chroma.service
> sudo systemctl daemon-reload
Verify that the service is removed:
> sudo systemctl status chroma
Collection
Create Collection
import chromadb
chroma_client = chromadb.HttpClient(host='<server_ip>', port=8000)
collection = chroma_client.create_collection(name="<collection_name>")
If you want to get or create a collection, use the following code:
collection = chroma_client.get_or_create_collection(name="<collection_name>")
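The line above is roughly equivalent to:
# list_collections() returns Collection objects or plain names, depending on the Chroma version
existing = {c if isinstance(c, str) else c.name for c in chroma_client.list_collections()}
if "<collection_name>" not in existing:
    collection = chroma_client.create_collection(name="<collection_name>")
else:
    collection = chroma_client.get_collection(name="<collection_name>")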
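A new collection is empty, so queries return nothing until documents are added. The sketch below wraps the collection's `add` method, which takes parallel lists of `ids` and `documents`. The helper name `add_documents` and the id scheme are assumptions made for this example, not part of Chroma.

```python
def add_documents(collection, documents, id_prefix="doc"):
    """Add texts to a Chroma-style collection under generated ids and return those ids."""
    # Every document needs a unique string id; this sketch simply numbers them.
    ids = [f"{id_prefix}-{i}" for i in range(len(documents))]
    collection.add(ids=ids, documents=list(documents))
    return ids


# Usage against a running server (assumes the `collection` object created earlier):
# add_documents(collection, [
#     "This is a document about pineapple",
#     "This is a document about oranges",
# ])
```

Because the ids are derived from list positions, calling the helper twice with the same prefix reuses the same ids. Pass a different `id_prefix` per batch, or generate ids from your own keys, if that matters for your data.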
Delete Collection
chroma_client.delete_collection(name="<collection_name>")
Querying Collection
You can query the collection with a list of query texts, and Chroma will return the n most similar results.
results = collection.query(
    query_texts=["This is a query document about hawaii"],  # Chroma will embed this for you
    n_results=2,  # how many results to return
)
print(results)
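The printed result is a dictionary of nested lists: each field holds one inner list per query text. The sketch below flattens the hits for a single query into tuples. The `sample` dictionary is hand-written to mimic that shape (its values are invented), and `hits_for_query` is a hypothetical helper.

```python
def hits_for_query(results, query_index=0):
    """Return (id, document, distance) tuples for one query text in a query result."""
    ids = results["ids"][query_index]

    def column(key):
        values = results.get(key)
        # A field that was not requested may be missing or None.
        return values[query_index] if values else [None] * len(ids)

    return list(zip(ids, column("documents"), column("distances")))


# Hand-written stand-in for the dictionary returned by collection.query(...).
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["doc about pineapple", "doc about oranges"]],
    "distances": [[1.04, 1.24]],
    "metadatas": [[None, None]],
}

for doc_id, text, distance in hits_for_query(sample):
    print(doc_id, distance, text)
```

With a real query, pass `results` instead of `sample`. A smaller distance means a closer match.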
Connect via Tunnel
If your Chroma server is not directly accessible, you can use an SSH tunnel to connect to it.
First create an SSH tunnel to the Chroma server:
> ssh -N -L 8000:<chroma_db_server_ip>:8000 <proxy_server_user>@<proxy_server_ip> -p <proxy_server_public_port>
Then point the Chroma client at the local end of the tunnel:
chroma_client = chromadb.HttpClient(host='localhost', port=8000)
Troubleshooting
Details: An instance of Chroma already exists for ephemeral with different settings