NVIDIA AGX Thor
What is AGX Thor?
NVIDIA Jetson AGX Thor is a high-performance edge AI computing platform designed by NVIDIA for advanced robotics, physical artificial intelligence (AI), and other demanding real-time applications outside traditional data centers.
Installation
- First, download the Jetson BSP installation media (Jetson ISO) image file from NVIDIA's website.
Go to the JetPack Download Page, find the latest JetPack version for the Jetson AGX Thor Developer Kit, and download the Jetson ISO image file onto your laptop or PC.
Create installation USB
To install the Jetson BSP (Jetson Linux) on your Jetson AGX Thor Developer Kit, first create installation media by writing the downloaded ISO image to a USB stick.
Caution: You cannot simply copy the ISO image file to the USB stick in your OS file manager (e.g. Windows Explorer or macOS Finder). You need to create a bootable USB stick using dedicated software.
In this guide, we will use Balena Etcher to create a bootable USB stick.
Once you start Etcher, select the downloaded ISO image file, then select the USB stick you want to write the image to, and flash it.
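If you prefer the command line on Linux, dd can write the image as well. A minimal sketch, assuming the image is saved as jetson.iso and the USB stick shows up as /dev/sdX (both placeholders; double-check the device name, as dd overwrites it):
# Write the ISO to the USB stick and flush buffers before removing it
$ sudo dd if=jetson.iso of=/dev/sdX bs=4M status=progress conv=fsync
$ sync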
Install Ollama
$ sudo su
$ export PATH=/usr/local/cuda-13.0/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH
$ cd /opt
# Fetch the latest Go version dynamically (requires jq)
$ GO_VERSION=$(curl -s https://go.dev/dl/?mode=json | jq -r '.[0].version') # e.g. "go1.25.1"
$ GO_TARBALL="${GO_VERSION}.linux-arm64.tar.gz"
# Download Go
$ wget "https://go.dev/dl/${GO_TARBALL}"
# remove any old version
$ rm -rf /usr/local/go
# Extract and install
$ tar -xzf "${GO_TARBALL}"
$ mv go /usr/local/
$ export PATH=$PATH:/usr/local/go/bin
# Clone and build Ollama from source (CUDA architecture 110 targets the Thor GPU)
$ cd /opt && git clone https://github.com/ollama/ollama
$ cd ollama && cmake -DCMAKE_CUDA_ARCHITECTURES=110 -B build && cmake --build build
# Start the Ollama server from the source tree
$ cd /opt/ollama
$ go run . serve
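With the server running, you can verify it responds from a second terminal; a quick check against the default port 11434:
$ curl http://localhost:11434/api/version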
Test with the DeepSeek R1 8B model:
$ export PATH=$PATH:/usr/local/go/bin
$ go run . run --verbose deepseek-r1:8b
Run Ollama on NVIDIA AGX Thor via Docker:
ubuntu@thor:~$ mkdir -p ~/ollama
ubuntu@thor:~$ docker run --name ollama --runtime nvidia -d --network host -it -v ${HOME}/ollama:/data --user $(id -u):$(id -g) ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04
ubuntu@thor:~$ docker exec -it ollama /bin/sh
$ ollama pull gpt-oss:20b
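Because the container runs with --network host, the API is also reachable directly from the host. For example, to confirm the pulled model is listed (run from the host, outside the container):
ubuntu@thor:~$ curl http://localhost:11434/api/tags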
ubuntu@thor:~$ docker stop ollama
ubuntu@thor:~$ docker rm -f ollama
ubuntu@thor:~$ docker system prune --volumes
Run one of the following commands to monitor GPU utilization:
$ watch -n 1 nvidia-smi
# or
$ tegrastats
Ollama Service
To run Ollama from the source build as a systemd service, create a unit file at /etc/systemd/system/ollama.service with the following content:
[Unit]
Description=Ollama Server Service
After=network.target
[Service]
# Run as a non-root user for safety (change 'ubuntu' to your username)
User=ubuntu
Group=ubuntu
# Set working directory
WorkingDirectory=/opt/ollama
# Environment variables
Environment="OLLAMA_HOST=0.0.0.0"
Environment="PATH=/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Command to start Ollama
ExecStart=/usr/local/go/bin/go run . serve
# Restart policy
Restart=always
RestartSec=5
# Output logs to journalctl
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Reload systemd, then enable and manage the service:
$ sudo systemctl daemon-reload
$ sudo systemctl enable ollama.service
$ sudo systemctl start ollama.service
$ sudo systemctl restart ollama.service
$ sudo systemctl stop ollama.service
$ sudo journalctl -u ollama.service -f
Linux Manual Install
$ curl -L https://ollama.com/download/ollama-linux-arm64.tgz \
-o ollama-linux-arm64.tgz
Extract the contents of ollama-linux-arm64.tgz and place the extracted files into /usr (AGX Thor is an arm64 platform, so use the arm64 build):
$ sudo tar -C /usr -xzf ollama-linux-arm64.tgz
Run Ollama for tests:
$ ollama serve
Open another terminal and verify that Ollama is running:
$ ollama -v
Make Ollama a startup service?
The latest Ollama install script sets up the service user and systemd service for you during installation.
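If you installed from the tarball and no service was created for you, you can set one up by hand. A minimal sketch, assuming the binary was extracted to /usr/bin/ollama (the paths and user mirror the uninstall steps later in this guide):
$ sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
$ sudo usermod -a -G ollama $(whoami)
Then create /etc/systemd/system/ollama.service following the same layout as the unit shown above, but with ExecStart=/usr/bin/ollama serve and User=ollama and Group=ollama, and enable it:
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now ollama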
Install CUDA
Check out the instructions here.
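A quick way to confirm the toolkit and driver are visible (assuming the CUDA bin directory is on your PATH, as set earlier):
$ nvcc --version
$ nvidia-smi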
Customizing
To customize the systemd service, open an override file with:
$ sudo systemctl edit ollama
Alternatively, create an override file manually in /etc/systemd/system/ollama.service.d/override.conf:
[Service]
Environment="OLLAMA_DEBUG=1"
Updating
Update Ollama by running the install script again:
$ curl -fsSL https://ollama.com/install.sh | sh
Or by re-downloading Ollama:
$ curl -L https://ollama.com/download/ollama-linux-arm64.tgz \
-o ollama-linux-arm64.tgz
$ sudo tar -C /usr -xzf ollama-linux-arm64.tgz
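After updating, confirm the version now on your PATH:
$ ollama -v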
View Logs
Linux
$ sudo journalctl -u ollama --follow
macOS
$ tail -f ~/.ollama/logs/server.log
Windows
$ type "$env:LOCALAPPDATA\Ollama\logs\server.log"
Docker
$ docker ps # find container ID/name
$ docker logs <container-name> --follow
Uninstall Manual Installation
Remove system service:
$ sudo systemctl stop ollama
$ sudo systemctl disable ollama
$ sudo rm /etc/systemd/system/ollama.service
Remove the ollama binary from your bin directory (either /usr/local/bin, /usr/bin, or /bin):
$ sudo rm $(which ollama)
Remove the downloaded models and Ollama service user and group:
$ sudo rm -r /usr/share/ollama
$ sudo userdel ollama
$ sudo groupdel ollama
Ollama Commands
List models:
$ ollama list
Remove a model:
$ ollama rm <model name>
List the models that are currently loaded:
$ ollama ps
Stop a model that is currently running:
$ ollama stop llama3.2
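Two other commonly used commands are pulling a model ahead of time and inspecting a model's details:
$ ollama pull llama3.2
$ ollama show llama3.2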
Expose Ollama
macOS
- Stop the Ollama application: Ensure the Ollama app is not running.
- Set the environment variable: Open Terminal and run:
$ launchctl setenv OLLAMA_HOST 0.0.0.0:11434
- Restart Ollama: Relaunch the Ollama application from your Applications folder. It will now listen on all network interfaces.
Linux
- Edit the systemd service by running:
$ sudo nano /etc/systemd/system/ollama.service
Add the following lines under the [Service] section:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_DEBUG=1"
- Reload systemd and restart Ollama by running the following commands:
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama
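From another machine on the same network you can confirm the port is reachable; replace 192.168.1.100 with your device's actual IP address (the address here is only a placeholder):
$ curl http://192.168.1.100:11434/api/version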
Windows
On Windows, Ollama inherits your user and system environment variables.
- First, quit Ollama by clicking its icon in the task bar.
- Start the Settings (Windows 11) or Control Panel (Windows 10) application and search for environment variables.
- Click on Edit environment variables for your account.
- Edit or create a new variable for your user account for OLLAMA_HOST, OLLAMA_MODELS, etc.
- Click OK/Apply to save.
- Start the Ollama application from the Windows Start menu.
Nginx
server {
    listen 80;
    server_name example.com;  # Replace with your domain or IP
    location / {
        proxy_pass http://localhost:11434;
        proxy_set_header Host localhost:11434;
    }
}
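Once the proxy is in place, requests to port 80 on your domain are forwarded to Ollama; for example, using the example.com placeholder from the config above:
$ curl http://example.com/api/tags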
Ngrok
Ollama can be exposed through a range of tunneling tools. Run the following command for test purposes:
$ ngrok http 11434 --host-header="localhost:11434"
Alternatively, edit the Ngrok config file manually in ~/.config/ngrok/ngrok.yml or run the following command:
$ ngrok config edit
Grab your Ngrok token and free domain name and replace them in the config file:
version: 3
agent:
  authtoken: 2n*******************
tunnels:
  ollama:
    proto: http
    addr: 11434
    domain: example.ngrok.app
    request_header:
      add: ["Host: localhost:11434"]
Start your tunnel:
$ ngrok start ollama
# or
$ ngrok start --all
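You can then reach the API through the tunnel domain configured above:
$ curl https://example.ngrok.app/api/tags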
Run Ollama
$ ollama run llama3.2:3b
# or
$ ollama run mistral
If you want to monitor Ollama generation performance:
$ ollama run llama3.2 --verbose

API
Generate Embeddings
$ curl http://localhost:11434/api/embed -d '{
"model": "nomic-embed-text",
"input": "Why is the sky blue?"
}'
Multiple text inputs:
$ curl http://localhost:11434/api/embed -d '{
"model": "all-minilm",
"input": ["Why is the sky blue?", "Why is the grass green?"]
}'
Tags
List the models that are available locally:
$ curl http://localhost:11434/api/tags
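The generate endpoint works the same way; a minimal non-streaming example, assuming the llama3.2 model pulled earlier:
$ curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'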
How to troubleshoot issues
$ cat ~/.ollama/logs/server.log
# or
$ journalctl -u ollama --no-pager --follow
FAQ
How can I tell if my model was loaded onto the GPU?
Use the ollama ps command to see what models are currently loaded into memory.
| NAME | ID | SIZE | PROCESSOR | UNTIL |
|---|---|---|---|---|
| llama3:70b | bcfb190ca3a7 | 42 GB | 100% GPU | 4 minutes from now |
How do I manage the maximum number of requests the Ollama server can queue?
If too many requests are sent to the server, it will respond with a 503 error indicating the server is overloaded. You can adjust how many requests may be queued by setting OLLAMA_MAX_QUEUE.
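For example, using the systemd override pattern from the Customizing section (512 here is just an illustrative value):
[Service]
Environment="OLLAMA_MAX_QUEUE=512"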
Where are models stored?
- macOS: ~/.ollama/models
- Linux: /usr/share/ollama/.ollama/models
- Windows: C:\Users\%username%\.ollama\models
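If you want to store models somewhere else, point OLLAMA_MODELS at the new location, for example via the same systemd override pattern (the path below is only an example, and the directory must be writable by the service user):
[Service]
Environment="OLLAMA_MODELS=/data/ollama/models"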