The LoRA training workflow for Hunyuan video consists mainly of the following stages:

First is the Initial Setup. This stage requires setting up WSL (Windows Subsystem for Linux) on Windows. Open PowerShell as Administrator and run the wsl --install command. Once installation completes, restart your computer.

After the restart, Ubuntu will launch automatically and ask you to create a username and set a password. Be sure to remember these credentials, as you will need them for sudo commands later. Next comes the Ubuntu Initial Setup. This includes updating the package lists (sudo apt update), fully upgrading installed packages (sudo apt full-upgrade -y), and installing basic dependencies such as git-lfs, wget, python3-dev, and build-essential (sudo apt-get install -y git-lfs wget python3-dev build-essential).

To give WSL more shared memory, especially for large training runs or runs that include video, create a file named .wslconfig in your C:\Users\*your username* directory on Windows. In this file you can set how much memory and swap space is allocated to WSL. The source provides an example: for a system with 64GB of RAM, it recommends setting memory to 48GB and swap to 64GB.
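A minimal .wslconfig sketch based on that example (the values shown are for a 64GB system; adjust them to your own hardware, then run wsl --shutdown from PowerShell so the change takes effect on the next WSL start):

[wsl2]
memory=48GB
swap=64GB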

Next, verify the NVIDIA setup. Running the nvidia-smi command in the Ubuntu terminal should show your GPU(s) and driver version. If it fails, you may need to install the NVIDIA CUDA driver for WSL, which you can download from the link provided (https://developer.nvidia.com/cuda/wsl).

After that comes the Miniconda Installation. Download the Miniconda installer script with wget, run it with bash, and set up the environment.

Then you need to clone the training repository. Use git clone --recurse-submodules https://github.com/tdrussell/diffusion-pipe to clone it into your WSL environment, then switch into it with cd diffusion-pipe.

Next is the Python environment setup. First create a conda environment named diffusion-pipe with conda create -n diffusion-pipe python=3.12, then activate it with conda activate diffusion-pipe. Be sure to install PyTorch and torchaudio before installing requirements.txt; the source lists the versions and download URLs that worked in its testing. Finally, install the remaining dependencies with pip install -r requirements.txt. The source also notes that CUDA compilation errors can occur when installing DeepSpeed or other packages, and suggests you may need to install nvidia-cuda-toolkit via apt.

With the environment set up, you need to download and organize the models. In the diffusion-pipe directory, create the models/{hunyuan,clip,llm} directories. Then use wget to download the HunyuanVideo files, and git clone to download the CLIP model and the LLM.

The next step is the configuration files. You need to create two files in the main diffusion-pipe directory: config.toml and dataset.toml. The source provides example contents for both (supplied as .txt attachments that must be renamed to .toml). config.toml holds the training settings such as the output directory, number of epochs, batch size, learning rate, and so on. dataset.toml holds the dataset settings such as resolution, aspect-ratio bucketing, frame buckets, and the dataset path and repeat count. Be sure to adjust these settings to your needs. In particular, if you are using two GPUs, set pipeline_stages to 2 in config.toml.

For preparing training data, place your training images in the ~/training_data/images directory. For LoRA training, 20-50 diverse images are recommended. Optionally, create a .txt file with the same name as each image to supply a prompt. If you want to train with video, place the videos in the same directory; each video must have an exact frame count at 24 frames per second, and that frame count must match the settings in the dataset.toml file.

Once the data and configuration files are ready, you can start training. For single-GPU training, use the following command:

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config config.toml

For dual-GPU training, you may need to add NCCL_SHM_DISABLE=1; the command is:

NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 NCCL_SHM_DISABLE=1 deepspeed --num_gpus=2 train.py --deepspeed --config config.toml

The dual-GPU setup may vary from one environment to another.

While training runs, you can monitor it. Use the nvidia-smi command in a Windows terminal to monitor GPU usage. You can also use TensorBoard to monitor the loss: first install TensorBoard in WSL (pip install tensorboard), then find your training output directory and run tensorboard --logdir /root/training_output/ (replace /root/training_output/ with your actual path). You can then view the results in a browser at http://localhost:6006.

If training is interrupted, you can resume from a checkpoint with the --resume_from_checkpoint flag. For example:

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config config.toml --resume_from_checkpoint

If your GPU is on the slower side, consider checkpointing more frequently (for example by lowering checkpoint_every_n_minutes in config.toml).

Once training completes, you will find the training outputs in the output_dir directory set in config.toml. In the latest epoch folder you will find the adapter.safetensors file; this is your trained LoRA. It is recommended to rename it to something more descriptive, for example including the trigger word and the epoch number.

Finally, to use the trained LoRA in ComfyUI, copy it into ComfyUI's loras folder. You also need to install the HunyuanVideoWrapper node and load your LoRA with the "HunyuanVideo Lora Select" node in ComfyUI. You can try the outputs from different epochs to find the one that suits your dataset best.

In summary, the LoRA training workflow for Hunyuan video is a fairly involved process that requires carefully setting up the WSL environment, installing dependencies, preparing data, writing the configuration files, and running the training script. Monitoring training progress and knowing how to resume training are also important. Finally, you need to know how to use the trained LoRA in your target application (such as ComfyUI).

# Guide: Training a LoRA for Hunyuan video on Windows

Version 0.2

Feedback welcome. I already had a WSL environment set up, so some steps may be incorrect!

It's fairly complex, so wait for a native Windows tool if this seems too difficult. I haven't experimented a lot with the training settings, but these worked for me for a style LoRA.

This guide is aimed at training the LoRA with images; small modifications will be needed to train with video.

## Initial Setup

### 1. WSL Setup

# In Windows PowerShell (Admin)

wsl --install

After installation completes:

1. Restart your computer

2. Ubuntu will automatically start and ask you to:

   - Create a username

   - Set a password

   Remember these credentials as you'll need them for sudo commands.


### 2. Ubuntu Initial Setup

# Update package lists

sudo apt update
sudo apt full-upgrade -y

# Install basic dependencies

sudo apt-get install -y git-lfs wget python3-dev build-essential

### 3. Verify NVIDIA Setup

# Check NVIDIA drivers are working

nvidia-smi

# Should show your GPU(s) and driver version

# If this fails, you may need to install the NVIDIA CUDA driver for WSL:

# Download from: https://developer.nvidia.com/cuda/wsl

### 4. Miniconda Installation

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
# The -b flag skips shell initialization, so initialize conda manually before reloading the shell
~/miniconda3/bin/conda init bash
source ~/.bashrc

### 5. Clone Training Repository

git clone --recurse-submodules https://github.com/tdrussell/diffusion-pipe
cd diffusion-pipe

### 6. Setup Python Environment

conda create -n diffusion-pipe python=3.12
conda activate diffusion-pipe

# Install PyTorch. Make sure to do this before installing requirements.txt. These two steps have the potential for the most issues. These are the versions that worked for me but YMMV

pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121
pip install torchaudio==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121

# Install requirements

pip install -r requirements.txt

# Issues:

- If you encounter CUDA compilation errors during pip install of DeepSpeed or other packages, you may need to install nvidia-cuda-toolkit via apt (see the command below)

- Solve other pip/torch errors with your favorite LLM
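
For example (assuming Ubuntu's packaged CUDA toolkit is sufficient for your setup):

sudo apt-get install -y nvidia-cuda-toolkit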


## Accessing Files in Windows

You can access your WSL files in Windows File Explorer by navigating to this directory (the Ubuntu folder name may differ):

\\wsl$\Ubuntu\home\yourusername\diffusion-pipe\

Replace 'yourusername' with the username you created during WSL setup.

This allows you to easily transfer images to your training folder and copy the finished LoRA to ComfyUI.

##  Download and Organize Models

If you already have these files, copy them from Windows into these folders.

Otherwise:

cd ~/diffusion-pipe
mkdir -p models/{hunyuan,clip,llm}


# Download HunyuanVideo files

wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors -P ~/diffusion-pipe/models/hunyuan/
wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_vae_bf16.safetensors -P ~/diffusion-pipe/models/hunyuan/

# Download CLIP model

git clone https://huggingface.co/openai/clip-vit-large-patch14 models/clip

# Download LLM

git clone https://huggingface.co/Kijai/llava-llama-3-8b-text-encoder-tokenizer models/llm

## Configuration Files

Create two configuration files in the main directory (diffusion-pipe): config.toml and dataset.toml. Example contents are shown below (the attached versions are .txt files; rename them to .toml).

### 1. Training Configuration (config.toml)

Example config.toml (adjust as necessary):

# Output and dataset paths
output_dir = '~/training_output'
dataset = 'dataset.toml'

# Training settings
epochs = 50
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 4
gradient_clipping = 1.0
warmup_steps = 100

# eval settings
eval_every_n_epochs = 5
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1

# misc settings
save_every_n_epochs = 5
checkpoint_every_n_minutes = 30
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = 'single_middle'

[model]
type = 'hunyuan-video'
transformer_path = '~/diffusion-pipe/models/hunyuan/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors'
vae_path = '~/diffusion-pipe/models/hunyuan/hunyuan_video_vae_bf16.safetensors'
llm_path = '~/diffusion-pipe/models/llm'
clip_path = '~/diffusion-pipe/models/clip'
dtype = 'bfloat16'
transformer_dtype = 'float8'
timestep_sample_method = 'logit_normal'

[adapter]
type = 'lora'
rank = 64
dtype = 'bfloat16'

[optimizer]
type = 'adamw_optimi'
lr = 5e-5
betas = [0.9, 0.99]
weight_decay = 0.02
eps = 1e-8

### 2. Dataset Configuration (dataset.toml)

# Resolution settings.
# Can adjust this to 1024 for image training, especially on 24gb cards.
resolutions = [512]

# Aspect ratio bucketing settings
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7

# Frame buckets (1 is for images)
frame_buckets = [1]
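# For video training, also list your clips' frame counts here (for example,
# frame_buckets = [1, 33] is a hypothetical value; the numbers must match the
# exact frame counts of your 24 fps clips).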

[[directory]]
# Set this to where your dataset is
path = '~/training_data/images'
# Reduce as necessary
num_repeats = 5

## Preparing Training Data

1. Create dataset directory:

mkdir -p ~/training_data/images

2. Place training images in the directory:

- LoRA: 20-50 diverse images

- Optional: Create matching .txt files with prompts (same name as image file)


Example structure:

~/training_data/images
├── image1.png
├── image1.txt  # Optional prompt file
├── image2.png
├── image2.txt

## Training

Launch training with this command:

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config config.toml

## Monitoring Training

- Monitoring GPU usage in a Windows terminal:

nvidia-smi --query-gpu=timestamp,name,temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv -l 5

- Training outputs will be saved in the directory specified by output_dir in your config
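
- Monitoring loss with TensorBoard (a sketch, assuming the training run writes TensorBoard event files into your output_dir; replace the path with your actual output directory):

pip install tensorboard
tensorboard --logdir ~/training_output/

- View the results in your browser at http://localhost:6006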

## Using the Trained LoRA

1. After training completes, find your LoRA file:

- Navigate to the training output directory in Windows:

\\wsl$\Ubuntu\home\yourusername\training_output

- Look for the latest epoch folder

- Find the adapter.safetensors file


2. Using with ComfyUI:

- Copy adapter.safetensors to your ComfyUI loras folder and rename it to something descriptive

- Make sure you have the HunyuanVideoWrapper node installed https://github.com/kijai/ComfyUI-HunyuanVideoWrapper

- Use the "HunyuanVideo Lora Select" node to load it

- Experiment with different epochs to find the ideal number for your dataset