I have retagged the dataset, so you need fewer words for each image. The generation quality is also higher.
After a few hundred hours of testing, training, and modifying the dataset, I eventually created my first checkpoint. Trained with Pony. (SDXL)