Unsloth - TONYLABS TECH CO., LTD.

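What is Unsloth?

Unsloth is an AI fine-tuning tool that helps you improve the performance of machine learning models. It makes fine-tuning large language models such as Llama-3, Mistral, Phi-4, and Gemma 2x faster and uses 70% less memory, with no loss of accuracy.

What is fine-tuning, and why do it?

Fine-tuning is a way to customize the behavior of a large language model (LLM): it strengthens the model's knowledge of a specific domain and improves its performance on specific tasks. In essence, fine-tuning updates the model's core parameters through a process called backpropagation.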
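
As a toy illustration of "updating parameters with gradients", the sketch below nudges a single pretrained weight toward new data using gradient descent. It is not how Unsloth trains an LLM: the one-parameter model, the data, and the name `fine_tune` are all made up for this example, and the gradient is written out by hand rather than computed by backpropagation through a network.

```python
def fine_tune(weight, data, learning_rate=0.1, epochs=50):
    """Gradient descent on mean squared error for the toy model y = weight * x."""
    for _ in range(epochs):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * grad
    return weight


pretrained_weight = 1.0                               # "pretrained" starting point
new_task_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # the new task follows y = 2x
tuned = fine_tune(pretrained_weight, new_task_data)
print(round(tuned, 3))
```

The weight starts from its pretrained value and moves toward whatever value fits the new data, which is the same idea that fine-tuning applies to billions of parameters.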

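By fine-tuning a pretrained model (such as Llama-3.1-8B) on a specialized dataset, you can:

Update knowledge: introduce new domain-specific information.
Customize behavior: adjust the model's tone, personality, or response style.
Optimize for specific tasks: improve accuracy and relevance for a particular use case.

Typical use cases:

Train an LLM to predict whether a news headline affects a company positively or negatively.
Use historical customer interactions to make responses more accurate and personalized.
Fine-tune an LLM on legal texts for contract analysis, case research, and compliance checks.

You can think of a fine-tuned model as a specialized agent optimized for a particular task, which makes it more efficient and accurate. Fine-tuning can replicate all of the capabilities of RAG (retrieval-augmented generation), but RAG cannot fully replace fine-tuning.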

How do I use Unsloth?

Unsloth can be installed locally on Linux or Windows (via WSL), or run on hosted GPU services such as Kaggle or Google Colab. Most users run Unsloth on Google Colab because it offers free GPUs for training.

Conda

This example uses the Amazon Machine Image (AMI) named Deep Learning Base OSS NVIDIA Driver GPU AMI, which comes with the following pre-installed:

Ubuntu 22.04
NVIDIA driver 550.144.03
CUDA 12.4

$ conda create --name unsloth \
    python=3.11 \
    pytorch-cuda=12.4 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y

$ conda activate unsloth
$ pip install --upgrade pip
$ pip install "unsloth[cu124-torch250] @ git+https://github.com/unslothai/unsloth.git"
$ pip install --no-deps trl peft accelerate bitsandbytes

-c pytorch → Uses the pytorch channel to look for packages.
-c nvidia → Uses the nvidia channel to look for packages.
-c xformers → Uses the xformers channel to look for packages.
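
After the install commands finish, a quick sanity check can confirm what actually landed in the environment. The sketch below is only an illustration: the helper names `installed_versions` and `cuda_available` are made up here, and the package list simply mirrors the packages named in the commands above.

```python
from importlib import metadata


def installed_versions(packages=("torch", "xformers", "unsloth", "trl",
                                 "peft", "accelerate", "bitsandbytes")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_available():
    """Return True only if PyTorch imports cleanly and can see a CUDA device."""
    try:
        import torch
        return bool(torch.cuda.is_available())
    except Exception:
        # PyTorch is missing or broken in this environment
        return False


for name, version in installed_versions().items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
print("CUDA available:", cuda_available())
```

Run it inside the activated `unsloth` environment; any line reporting NOT INSTALLED points at the step that needs to be repeated.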

Windows

Install NVIDIA GPU Driver
Install Visual Studio C++

You will need Visual Studio, with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options. Also select options for Windows 10/11 SDK.
Download and launch the installer here
- In the installer, navigate to individual components and select all the options listed here:
  - .NET Framework 4.8 SDK
  - .NET Framework 4.7.2 targeting pack
  - C# and Visual Basic Roslyn compilers
  - MSBuild
  - MSVC v143 - VS 2022 C++ x64/x86 build tools
  - C++ 2022 Redistributable Update
  - C++ CMake tools for Windows
  - C++/CLI support for v143 build tools (Latest)
  - MSBuild support for LLVM (clang-cl) toolset
  - C++ Clang Compiler for Windows (19.1.1)
  - Windows 11 SDK (10.0.22621.0)
  - Windows Universal CRT SDK
  - C++ 2022 Redistributable MSMs
- Easier method: alternatively, open an elevated Command Prompt or PowerShell:
  - Search for "cmd" or "PowerShell", right-click it, and choose "Run as administrator."
  - Paste and run this command (update the Visual Studio path if necessary). The ^ line continuations are Command Prompt syntax; in PowerShell, prefix the command with & and replace each ^ with a backtick:
    "C:\Program Files (x86)\Microsoft Visual Studio\Installer\vs_installer.exe" modify ^
      --installPath "C:\Program Files\Microsoft Visual Studio\2022\Community" ^
      --add Microsoft.Net.Component.4.8.SDK ^
      --add Microsoft.Net.Component.4.7.2.TargetingPack ^
      --add Microsoft.VisualStudio.Component.Roslyn.Compiler ^
      --add Microsoft.Component.MSBuild ^
      --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 ^
      --add Microsoft.VisualStudio.Component.VC.Redist.14.Latest ^
      --add Microsoft.VisualStudio.Component.VC.CMake.Project ^
      --add Microsoft.VisualStudio.Component.VC.CLI.Support ^
      --add Microsoft.VisualStudio.Component.VC.Llvm.Clang ^
      --add Microsoft.VisualStudio.ComponentGroup.ClangCL ^
      --add Microsoft.VisualStudio.Component.Windows11SDK.22621 ^
      --add Microsoft.VisualStudio.Component.Windows10SDK.19041 ^
      --add Microsoft.VisualStudio.Component.UniversalCRT.SDK ^
      --add Microsoft.VisualStudio.Component.VC.Redist.MSM

Install CUDA Toolkit
Install Miniconda (which has Python) here: https://www.anaconda.com/docs/getting-started/miniconda/install
Install PyTorch
- You will need the correct version of PyTorch that is compatible with your CUDA drivers, so make sure to select them carefully.
Install Unsloth
- Open Conda command prompt or your terminal with Python and run the command:
```
> pip install "unsloth[windows] @ git+https://github.com/unslothai/unsloth.git"
```

Load a pretrained model

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Maximum context length, in tokens

dtype = (
    None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
)
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-3-medium-4k-instruct", # Choose any model from Hugging Face.
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
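
To get a feel for why `load_in_4bit` matters, here is a rough back-of-the-envelope estimate of the memory needed just to hold the weights. It is a sketch under stated assumptions: the 14-billion parameter count for Phi-3-medium is an approximation, the helper name `weight_memory_gib` is made up for this example, and the figure ignores activations, optimizer state, and quantization overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


params = 14e9  # assumed: Phi-3-medium has roughly 14 billion parameters
for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(params, bits):.1f} GiB")
# 16-bit: 26.1 GiB
# 4-bit: 6.5 GiB
```

Under these assumptions the 4-bit weights fit on a single consumer GPU, while the 16-bit weights do not.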

Test before fine-tuning

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # Appended to every training example


def formatting_prompts_func(examples):
    """Render a batch of Alpaca-style records into single training strings."""
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    # input_text avoids shadowing the built-in input()
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}
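
To see what the formatting function produces without loading a model, the sketch below runs the same logic on a one-record batch. It is self-contained for illustration only: the `"</s>"` end-of-sequence string stands in for `tokenizer.eos_token`, the template is shortened, and the sample record is invented.

```python
alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "</s>"  # placeholder; the real value comes from tokenizer.eos_token


def formatting_prompts_func(examples):
    """Render a batch (a dict of equal-length lists) into training strings."""
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        texts.append(alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN)
    return {"text": texts}


batch = {
    "instruction": ["Name a primary color."],
    "input": [""],
    "output": ["Red."],
}
formatted = formatting_prompts_func(batch)
print(formatted["text"][0])
```

Note that the function receives columns (lists), not individual rows, which is why it is used with a batched map over the dataset.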

Load the dataset

from datasets import load_dataset
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

Troubleshooting

TypeError: SFTTrainer.__init__() got an unexpected keyword argument 'dataset_text_field'

Solution 1: install an older TRL release:
- pip install trl==0.12.2

Solution 2:

Import SFTConfig from trl, use it in place of TrainingArguments, and move the affected parameters (listed in https://github.com/huggingface/trl/blob/main/trl/trainer/sft_config.py) from the SFTTrainer call into SFTConfig. In particular, dataset_text_field must be passed to SFTConfig, not to SFTTrainer.

from trl import SFTTrainer, SFTConfig
from transformers import DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset=dataset,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    args = SFTConfig(
        dataset_text_field = "text", # Column produced by formatting_prompts_func
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps = 5,
        num_train_epochs = 3, # Number of full passes over the dataset.
        #max_steps = 60, # Alternatively, stop after a fixed number of steps.
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "model_training_outputs",
        report_to = "none",
        max_seq_length = 2048,
        dataset_num_proc = 4,
        packing = False, # Setting this to True can make training 5x faster for short sequences.
    ),
)
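
The batch-size settings above interact, so it can help to work out the arithmetic before starting a run. The sketch below is an approximation only: the helper name `training_steps` is made up, the 52,000-example dataset size is an assumed round number, and the trainer's own step count may differ slightly depending on how it handles the last partial batch.

```python
import math


def training_steps(num_examples, per_device_batch_size, grad_accum_steps,
                   num_epochs, num_devices=1):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * num_epochs


effective, per_epoch, total = training_steps(
    num_examples=52_000,        # assumed dataset size, for illustration
    per_device_batch_size=2,    # per_device_train_batch_size above
    grad_accum_steps=4,         # gradient_accumulation_steps above
    num_epochs=3,               # num_train_epochs above
)
print(effective, per_epoch, total)  # 8 6500 19500
```

With these settings each optimizer update sees 8 examples, so a quick experiment capped with max_steps = 60 would cover only 480 examples.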

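What is a dataset?

For LLMs, a dataset is a collection of data used to train the model. For text data to be usable in training, it must be in a format that can be **tokenized**.

Tokenization

Tokenization is the process of splitting text into units called tokens, which can be words, subwords, or even individual characters.

LLM training typically uses subword-level tokenization, which makes it possible to handle out-of-vocabulary (OOV) input.

During training, tokens are mapped to embeddings in a high-dimensional latent space, and attention mechanisms refine those embeddings so that the model produces context-appropriate output.

Summary:

Tokenization converts raw text into a machine-readable format while preserving meaningful information.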
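
The subword idea can be sketched with a greedy longest-match split against a small vocabulary. This is a simplified illustration, not the tokenizer any particular model uses: real tokenizers (BPE, WordPiece, and similar) learn their vocabulary from data, and the vocabulary and function name `subword_tokenize` here are invented.

```python
def subword_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return [unk]  # no piece matches at this position
        tokens.append(word[start:end])
        start = end
    return tokens


vocab = {"token", "ization", "un", "sloth"}
print(subword_tokenize("tokenization", vocab))  # ['token', 'ization']
print(subword_tokenize("unsloth", vocab))       # ['un', 'sloth']
print(subword_tokenize("xyz", vocab))           # ['[UNK]']
```

Neither "tokenization" nor "unsloth" is in the vocabulary as a whole word, yet both can still be represented, which is how subword tokenization copes with out-of-vocabulary input.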

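Data formats

For tokenization to work, the dataset must be in a format the tokenizer can read.

Format	Description	Training type
Raw corpus	Raw text from sources such as websites, books, or articles.	Continued Pretraining
Instruction	Instructions for the model to follow, together with examples of the expected output.	Supervised Fine-Tuning (SFT)
Conversation	Multi-turn conversations between a user and an AI assistant.	Supervised Fine-Tuning (SFT)
RLHF	Conversations between a user and an AI assistant, with the assistant's responses ranked by human evaluators.	Reinforcement Learning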
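
One way to picture the four formats is as one sample record each. The field names below are assumptions chosen for illustration: the instruction fields follow the Alpaca layout used elsewhere on this page, while the `messages`, `prompt`, `chosen`, and `rejected` keys are common conventions and not something every dataset shares.

```python
raw_corpus_record = {
    "text": "Resistors limit the flow of current in a circuit.",
}
instruction_record = {
    "instruction": "Summarize the text.",
    "input": "Resistors limit the flow of current in a circuit.",
    "output": "Resistors restrict current.",
}
conversation_record = {
    "messages": [
        {"role": "user", "content": "What does a resistor do?"},
        {"role": "assistant", "content": "It limits current flow."},
    ],
}
preference_record = {
    "prompt": "What does a resistor do?",
    "chosen": "It limits the flow of current.",    # ranked higher by a human
    "rejected": "It stores charge.",               # ranked lower by a human
}

# Hypothetical required keys per format, used only by the check below
REQUIRED_FIELDS = {
    "raw": {"text"},
    "instruction": {"instruction", "input", "output"},
    "conversation": {"messages"},
    "preference": {"prompt", "chosen", "rejected"},
}


def missing_fields(kind, record):
    """List the required keys that a record of the given kind lacks."""
    return sorted(REQUIRED_FIELDS[kind] - record.keys())


print(missing_fields("instruction", instruction_record))   # []
print(missing_fields("preference", {"prompt": "Hi"}))      # ['chosen', 'rejected']
```

A check like this is cheap to run over a whole dataset before training and catches records that do not match the format you intended.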

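Getting started

Before formatting the data, we need to settle the following points:

Purpose of the dataset

Knowing the purpose of the dataset helps determine what data is needed and in what format.

The purpose could be, for example:

Adapting a model to a new task, such as summarization.
Improving the model's performance in a specific role-play scenario.

Output style

The output style determines which data sources we need in order to match the desired output format.

Examples:

The target output could be JSON, HTML, plain text, or code.
It may also need to be in a particular language, such as Spanish, English, or German.

Data sources

Once the purpose and style are clear, we can look for suitable data sources.

Common data sources include:

CSV files, PDFs, websites, and so on.
Synthetic data: you can also generate the data yourself, but pay special attention to its quality and relevance.

Formatting the data

Once we have settled these criteria and collected the necessary data, we can format it into a machine-readable form that is ready for training.

Continued Pretraining

For continued pretraining, we use raw text with no particular structure.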
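
As a rough sketch of preparing raw text for continued pretraining, the function below splits a corpus on blank lines and packs paragraphs into records with a single `text` field. The function name `corpus_to_records`, the `text` key, and the character-based size limit are assumptions for illustration; real pipelines usually measure length in tokens rather than characters.

```python
def corpus_to_records(raw_text, max_chars=200):
    """Pack blank-line-separated paragraphs into {"text": ...} records."""
    records, current = [], ""
    for paragraph in (p.strip() for p in raw_text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current record
            records.append({"text": current})
            current = paragraph
        else:
            current = candidate
    if current:
        records.append({"text": current})
    return records


raw = "A" * 150 + "\n\n" + "B" * 150 + "\n\n" + "C" * 10
records = corpus_to_records(raw)
print(len(records))  # 2
```

A single paragraph longer than the limit is kept whole rather than cut mid-sentence, which is a deliberate simplification in this sketch.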

Llama.cpp

Dependencies

$ sudo apt install cmake -y
$ sudo apt install build-essential -y
$ sudo apt install libcurl4-openssl-dev -y

Build Llama.cpp

Navigate to your fine-tuning project; llama.cpp is downloaded as a folder inside it.

$ cd llama.cpp
$ mkdir -p build
$ cd build
$ cmake ..
$ make -j$(nproc)

If you don't need the features that use CURL (e.g., model downloading), you can explicitly disable CURL support:

$ cd llama.cpp
$ mkdir -p build
$ cd build
$ cmake .. -DLLAMA_CURL=OFF
$ make

Once the build succeeds, you should find common tools such as llama-quantize and llama-gguf.
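
As an illustration of how llama-quantize might then be used, the sketch below only assembles the command line and prints it. The binary location under build/bin, the .gguf file names, and the Q4_K_M quantization type are assumptions for this example; adjust them to your own build and model files.

```python
import shlex
from pathlib import Path


def quantize_command(llama_cpp_dir, input_gguf, output_gguf, quant_type="Q4_K_M"):
    """Build the argument list: llama-quantize <input> <output> <type>."""
    # Assumed location of the binary after a CMake build
    binary = Path(llama_cpp_dir) / "build" / "bin" / "llama-quantize"
    return [str(binary), str(input_gguf), str(output_gguf), quant_type]


cmd = quantize_command(
    "llama.cpp",
    "model/model-f16.gguf",       # hypothetical input file
    "model/model-Q4_K_M.gguf",    # hypothetical output file
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.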

Run the Fine-tuned Model with Ollama

Step 1: Install Ollama by following the instructions here

Modelfile

$ cd <your_finetune_project>
$ cd model
$ nano Modelfile

The Modelfile tells Ollama how to load a custom model. Its format looks like this:

FROM ./model/<model_name>.gguf

# Parameters for shorter responses
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER num_ctx 4096
# Limit response length to ~150 tokens
PARAMETER num_predict 150
PARAMETER repeat_penalty 1.1

# System prompt for concise responses
SYSTEM "You are <model_name>, an AI assistant specialized in electronic components. Provide concise, direct answers about components including category, designator, key parameters, and pins. Keep responses brief and factual. Avoid lengthy explanations unless specifically requested."

$ ollama create <model_name> -f Modelfile
$ ollama run <model_name>
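
If you regenerate the Modelfile often, for example after each fine-tuning run, it can be rendered from a small script instead of edited by hand. This is a sketch only: the function name `render_modelfile` and the file name `my-model.gguf` are made up, and it assumes the system prompt does not itself contain triple quotes.

```python
def render_modelfile(gguf_path, parameters, system_prompt):
    """Render Modelfile text with FROM, PARAMETER, and SYSTEM instructions."""
    lines = [f"FROM {gguf_path}", ""]
    lines += [f"PARAMETER {name} {value}" for name, value in parameters.items()]
    lines += ["", f'SYSTEM """{system_prompt}"""', ""]
    return "\n".join(lines)


modelfile_text = render_modelfile(
    "./model/my-model.gguf",  # hypothetical file name
    {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "num_ctx": 4096},
    "You are an AI assistant specialized in electronic components.",
)
print(modelfile_text)
# To save it: open("Modelfile", "w").write(modelfile_text)
```

Keeping the parameters in a dictionary makes it easy to try different sampling settings without touching the rest of the file.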

If you fine-tune the model again and produce a new .gguf, Ollama will not automatically reload it, because it caches models after you run ollama create.

Remove the old model from Ollama, then run ollama create again so that it picks up the new file:

$ ollama rm <old_model_name>

Test Ollama

curl http://localhost:11434/api/generate -d '{
  "model": "<model_name_here>",
  "prompt": "Who are you?",
  "stream": false
}'
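
The same request can be sent from Python using only the standard library. The sketch below mirrors the curl call above; the helper names `build_request` and `generate` are made up for this example, and it assumes the non-streaming reply is a JSON object with a `response` field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, url=OLLAMA_URL):
    """Build the POST request with the same JSON body as the curl example."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=120) as reply:
        return json.loads(reply.read())["response"]


# Usage, once Ollama is running and the model has been created:
# print(generate("<model_name_here>", "Who are you?"))
```

Separating request construction from sending makes it possible to inspect or log the payload without contacting the server.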