Local Inference · Open Source · Free · Active · Machine-verified · advanced · ~60 min setup

Unsloth: write parametric memory in with a fine-tune config

Write a valid Unsloth fine-tune config that bakes stable, always-needed domain knowledge into a small model so the knowledge is native rather than carried in a prompt on every call.

by Shilpa Mitra · verified today · v1.0.0

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone who wants domain knowledge (jargon, house style, factual context) to be native to the model rather than pasted into every prompt. CI validates the fine-tune config: a model name, max_seq_length, a LoRA block with r set, and training hyperparameters including a learning rate and output dir. No install, no GPU, no key. The actual fine-tune is fenced off: it is described here but not run by CI.

Not for

  • Knowledge that changes weekly: parametric memory costs a retrain to update, so use retrieval memory (LlamaIndex, Cognee) for anything dynamic
  • Fine-tuning without a GPU: Unsloth needs CUDA, and the Google Colab free-tier T4 is the cheapest no-hardware path
  • Shipping the Unsloth Studio UI in a commercial product without reading the AGPL-3.0 terms
  • Trusting the first fine-tune: overfitting and catastrophic forgetting are real risks, so evaluate on a hold-out set before shipping
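The hold-out evaluation in the last bullet needs a split that stays the same between runs. A minimal Node sketch of one way to get that, under assumptions: the splitHoldout helper, the 10% fraction, and the seed are all hypothetical and are not part of Unsloth or of this workflow's CI.

```javascript
// Small seeded PRNG (mulberry32-style) so the shuffle is reproducible.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: shuffle a copy of the examples with a fixed seed,
// then carve off the first holdoutFraction of them as the hold-out set.
function splitHoldout(examples, holdoutFraction = 0.1, seed = 42) {
  const rand = seededRandom(seed);
  const shuffled = examples.slice();
  // Fisher-Yates shuffle driven by the seeded generator.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const holdoutSize = Math.max(1, Math.round(shuffled.length * holdoutFraction));
  return {
    holdout: shuffled.slice(0, holdoutSize),
    train: shuffled.slice(holdoutSize),
  };
}

// Usage with placeholder examples; real data would be your own records.
const examples = Array.from({ length: 50 }, (_, i) => ({ id: i }));
const split = splitHoldout(examples);
console.log(split.train.length, split.holdout.length);
```

Fixing the seed matters more than the exact fraction: if the hold-out set changes between fine-tunes, scores from different runs are not comparable.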

The Stack

Tested Against

unsloth docs 2026-07 · node@20

Side effects & data flow

Network: none, local only
Writes: ./finetune-config.json
Credentials: none required

Prerequisites

  • A CUDA-capable GPU or Google Colab for the actual fine-tune
  • pip install unsloth (installs torch + transformers + trl; heavy, ~4 GB) for the fenced training step

Steps

  1. Write the fine-tune config and validate it

    Define the target model, context length, LoRA rank, and training hyperparameters. CI checks the config parses and all required fields are present. The actual fine-tune (which needs CUDA and your dataset) is fenced.

    cat > finetune-config.json <<'JSON'
    {
      "model_name": "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
      "max_seq_length": 2048,
      "load_in_4bit": true,
      "lora": {
        "r": 16,
        "lora_alpha": 16,
        "lora_dropout": 0,
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
      },
      "training": {
        "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 4,
        "warmup_steps": 5,
        "max_steps": 60,
        "learning_rate": 2e-4,
        "output_dir": "./outputs"
      }
    }
    JSON
    node -e '
    // Validate the config written above: parse it, then check each required field.
    const fs = require("fs");
    const c = JSON.parse(fs.readFileSync("finetune-config.json", "utf8"));
    // Print the reason and exit non-zero so CI fails on the first problem.
    function bad(m) { console.error("BAD: " + m); process.exit(1); }
    if (!c.model_name || typeof c.model_name !== "string") bad("model_name must be a string");
    if (!c.max_seq_length || typeof c.max_seq_length !== "number") bad("max_seq_length must be a number");
    // Default to empty objects so a missing block reports a field error instead of throwing.
    const lora = c.lora || {};
    if (!lora.r || typeof lora.r !== "number") bad("lora.r (rank) must be a number");
    const train = c.training || {};
    if (!train.learning_rate) bad("training.learning_rate must be set");
    if (!train.output_dir) bad("training.output_dir must be set");
    console.log("config OK: fine-tune " + c.model_name + " at seq=" + c.max_seq_length + " LoRA r=" + lora.r + " lr=" + train.learning_rate);
    '
  2. Run the fine-tune (the GPU step, not checked by CI)

    Install Unsloth (pip install unsloth), load the model with FastLanguageModel.from_pretrained(), attach the LoRA adapters with FastLanguageModel.get_peft_model() using the values from the lora block, then run trl's SFTTrainer on your dataset. Evaluate on a hold-out set before shipping. Neither the training run nor the quality of the result is checked by CI.
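The training block in step 1 fixes how much data the run can see, because max_steps caps the number of optimizer updates. A small Node sketch of that arithmetic; the trainingBudget helper is hypothetical, and the single-GPU default is an assumption.

```javascript
// Hypothetical helper: derive the data budget of a step-limited run.
function trainingBudget(training, numGpus = 1) {
  // Examples consumed per optimizer update across all devices.
  const effectiveBatch =
    training.per_device_train_batch_size *
    training.gradient_accumulation_steps *
    numGpus;
  return {
    effectiveBatch,
    // Total examples visited before max_steps stops the run.
    examplesSeen: effectiveBatch * training.max_steps,
    // Share of the run spent warming up the learning rate.
    warmupFraction: training.warmup_steps / training.max_steps,
  };
}

// Values copied from the training block in step 1.
const budget = trainingBudget({
  per_device_train_batch_size: 2,
  gradient_accumulation_steps: 4,
  warmup_steps: 5,
  max_steps: 60,
});
console.log(budget.effectiveBatch); // 8
console.log(budget.examplesSeen); // 480
```

With these values the run visits 480 examples in total, so a dataset much larger than that is only partly seen within 60 steps. Raising max_steps, or switching to epoch-based training, changes that budget.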

Eval, 2 fixtures

Last passed: verified today
  • config-ok · contains · timeout 30s · max $0

    Expected: config OK: fine-tune unsloth/Meta-Llama-3.1-8B-bnb-4bit at seq=2048 LoRA r=16

  • clean-exit · exit_code · timeout 30s · max $0

    Expected: 0

Results

Unsloth reports 2x faster fine-tuning and 70% less VRAM than standard QLoRA, which makes a fine-tune that used to need an A100 runnable on a 24 GB consumer GPU. The knowledge is baked into the fine-tuned weights for the life of that checkpoint; updating it means retraining.
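Separately from Unsloth's own optimizations, a LoRA fine-tune trains only small adapter matrices while the base weights stay frozen. A rough sketch of the adapter size implied by the lora block in step 1, assuming the published Llama 3.1 8B shapes (32 layers, hidden size 4096, 1024-wide key and value projections). Those dimensions and the loraParamCount helper are assumptions for illustration, not something this workflow checks.

```javascript
// Each adapted linear layer (inDim -> outDim) gains two low-rank matrices,
// A (r x inDim) and B (outDim x r), for r * (inDim + outDim) trainable weights.
function loraParamCount(r, modules, numLayers) {
  let perLayer = 0;
  for (const [inDim, outDim] of Object.values(modules)) {
    perLayer += r * (inDim + outDim);
  }
  return perLayer * numLayers;
}

// Assumed attention projection shapes for Llama 3.1 8B (grouped-query attention).
const llama8bAttention = {
  q_proj: [4096, 4096],
  k_proj: [4096, 1024],
  v_proj: [4096, 1024],
  o_proj: [4096, 4096],
};

// r = 16 and the four target modules come from the config in step 1.
const adapterParams = loraParamCount(16, llama8bAttention, 32);
console.log(adapterParams); // 13631488
```

Under those assumptions the adapters hold about 13.6 million trainable weights, a small fraction of the roughly 8 billion frozen ones. Doubling r doubles that count.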
