Delta transfer learning between models (e.g. from Base to Turbo models)#1353

Draft
dxqb wants to merge 1 commit into Nerogar:master from dxqb:transfer
Conversation

@dxqb
Collaborator

@dxqb dxqb commented Mar 1, 2026

An experimental training method that teaches a LoRA trained on one model (for example Flux2 Base, Z-Image non-Turbo, ...) to another model (Flux2 non-Base, Z-Image Turbo, ...). Using this method you can directly train Turbo models without affecting distillation.

It could also be used to teach the knowledge of an already existing LoRA to another LoRA, or between any two models as long as they have the same latent space/VAE (untested, though).

Usage for Base-to-Turbo training:

  1. Train a LoRA on the Base model.

  2. Transfer step 1:

  • keep the base model as Base Model on the model tab
  • set the trained LoRA as LoRA base model on the LoRA tab
  • set transfer_step1 to True
  • start training

  This step only creates training data. It does not output a model.

  3. Transfer step 2:

  • change Base Model on the model tab to the Turbo model
  • keep the trained LoRA as LoRA base model on the LoRA tab (removing it is also possible, but then you teach the Turbo model from scratch)
  • set transfer_step1 to False and transfer_step2 to True
  • start training

Validation loss is meaningless when teaching a Turbo model, so sample often! Transfer learning can go very quickly (for example, 300 steps for a LoRA that took 2000 steps to train originally).

Flux2 non-Base seems to require a higher learning rate (in the e-4 range) than Flux2 Base (e-5 range). Z-Image Turbo seems to learn well at the same learning rate as Z-Image non-Turbo (in the e-4 range).

Increase transfer_guidance to emphasize your concept, or lower it if the concept is overdone and looks like a caricature or like you used too high a CFG. 2.5 seems to work well for Flux2, 3.0 for Z-Image Turbo.

Do not change training data, timestep distribution, batch size, ... between transfer step 1 and transfer step 2. Other training parameters like learning rate, optimizer, ... can be changed without repeating step 1.
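The target computed in transfer step 2 can be sketched roughly like this. This is an illustrative reconstruction, not OneTrainer's actual code: the function name, argument names, and array shapes are assumptions, with `transfer_guidance` playing the role of the setting described above.

```python
import numpy as np

def transfer_target(student_prior, teacher_trained, teacher_prior, transfer_guidance):
    """Build the training target for the student (e.g. Turbo) model.

    student_prior   -- the student's own prediction before training
    teacher_trained -- teacher (e.g. Base) prediction with the trained LoRA
    teacher_prior   -- teacher prediction without the LoRA

    The student is pushed toward its own prior plus the guided delta the
    LoRA added on the teacher, so its distillation is left intact.
    """
    delta = teacher_trained - teacher_prior
    return student_prior + transfer_guidance * delta

# toy example with scalar "predictions"
target = transfer_target(np.array([1.0]), np.array([0.8]), np.array([0.5]), 2.5)
```

Because the delta is the only thing taught, the student never sees a raw (undistilled) teacher prediction as a target, which is why its distillation survives.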

@dxqb dxqb marked this pull request as draft March 1, 2026 18:45
def train(self):


transfer_step1 = False
Collaborator Author

@dxqb dxqb Mar 1, 2026


The configuration is here.
Don't forget to restart OneTrainer after you have changed something.

@dxqb
Collaborator Author

dxqb commented Mar 1, 2026

Here is some theory on why this works, for anyone interested:

We teach the student the delta of what the teacher model has learned:

$$\text{target} = \epsilon_{\text{student,prior}} + \left(\epsilon_{\text{teacher,trained}} - \epsilon_{\text{teacher,prior}}\right)$$

The student model is CFG-distilled, the teacher model is not. Therefore we CFG-scale the teacher predictions:

$$\epsilon_{\text{cfg}} = \epsilon_{\text{uncond}} + g\left(\epsilon_{\text{cond}} - \epsilon_{\text{uncond}}\right)$$

Plugged into both teacher predictions:

$$\text{target} = \epsilon_{\text{student,prior}} + \left[\epsilon_{\text{uncond,trained}} + g\left(\epsilon_{\text{cond,trained}} - \epsilon_{\text{uncond,trained}}\right)\right] - \left[\epsilon_{\text{uncond,prior}} + g\left(\epsilon_{\text{cond,prior}} - \epsilon_{\text{uncond,prior}}\right)\right]$$

The unguided prediction is identical for the trained and the prior teacher (or at least: it should be):

$$\epsilon_{\text{uncond,trained}} = \epsilon_{\text{uncond,prior}}$$

Therefore:

$$\epsilon_{\text{teacher,trained,cfg}} - \epsilon_{\text{teacher,prior,cfg}} = g\left(\epsilon_{\text{cond,trained}} - \epsilon_{\text{cond,prior}}\right)$$

$$\text{target} = \epsilon_{\text{student,prior}} + g\left(\epsilon_{\text{cond,trained}} - \epsilon_{\text{cond,prior}}\right)$$
With guidance close to the CFG factor that was used to distill the student model, or close to 1.0 for non-distilled student models.
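The cancellation of the unguided terms can be checked numerically: when the trained and prior teacher share the same unconditional prediction, the difference of their CFG-scaled outputs reduces exactly to the guidance factor times the difference of the conditional predictions. A quick sketch (the arrays are stand-ins for model outputs, not real predictions):

```python
import numpy as np

def cfg(uncond, cond, g):
    # classifier-free guidance: move g times from uncond toward cond
    return uncond + g * (cond - uncond)

rng = np.random.default_rng(0)
uncond = rng.normal(size=8)        # shared unguided prediction
cond_trained = rng.normal(size=8)  # conditional prediction, teacher with LoRA
cond_prior = rng.normal(size=8)    # conditional prediction, teacher without LoRA
g = 2.5

lhs = cfg(uncond, cond_trained, g) - cfg(uncond, cond_prior, g)
rhs = g * (cond_trained - cond_prior)
assert np.allclose(lhs, rhs)       # unconditional terms cancel
```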

@m4xw

m4xw commented Mar 5, 2026

I actually just implemented model distillation using a modified version of the prior prediction codepaths last week.

I also added parent model quantization and a low-VRAM switch to CPU. It still took about 12 seconds per iteration on my 4070 laptop GPU though, since I went slightly over my VRAM limit.

I can push it if you want to compare.

@m4xw

m4xw commented Mar 5, 2026

@dxqb
Collaborator Author

dxqb commented Mar 6, 2026

> I actually just implemented model distillation using a modified version of the prior prediction codepaths last week

The point of my PR is the delta target; it's not just distillation. I'm changing the headline to make that clearer. Please open a separate PR if you want to contribute distillation.

> Also added parent model quantization and low vram switch between cpu, still took about 12sec per iter on my 4070 laptop gpu tho since i just went slightly over my limit..

What they have in common, though, is that both need a prediction from a teacher model. It's not realistic to keep two models in VRAM on consumer cards, and swapping per step is slow.

Have a look at what I did with step 1 and step 2. It needs more work before it could be merged, but if we merge any of these techniques I think we should go the two-step route, because it isn't slower and doesn't need more VRAM.

@dxqb dxqb changed the title Transfer learning between models (e.g. from Base to Turbo models) Delta transfer learning between models (e.g. from Base to Turbo models) Mar 6, 2026
@m4xw

m4xw commented Mar 6, 2026

When you said

> Using this method you can directly train Turbo models without affecting distillation.

I was worried you were already working on it as well and that we'd be doing redundant work, so I was just asking ^^

Yeah, I did consider a two-step process, but I didn't come up with an easy way like your shenanigans; at least it wouldn't have been good enough for a PR to me. I might just try it for myself for the speed boost and see.

If distillation is something upstream wants, I will gladly open a PR.
