This works OK! It even understands gags well enough.
Training notes: dataset 260 @ 1024, long qwenvl3 tagging, 64/64, 0.0002