AI chatbot Grok can’t stop talking about 'white genocide', admits it's by design

geneva_convenience@lemmy.ml · 2 months ago


brucethemoose@lemmy.world · 2 months ago

On a big scale? Yeah, sure. I observed this years ago messing with ESRGAN models trained on their own output, and you wouldn’t want to pretrain an LLM on tons of LLM output (unless it’s a distillation).

But a little instruction tuning on synthetic data during a finetune is fine. This is literally how DeepSeek was made: https://arxiv.org/abs/2402.03300
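If you want to see the "big scale vs. a little bit" difference in miniature, here's a toy sketch. It is not the ESRGAN experiment and not how DeepSeek was trained. The "model" is just a 1-D Gaussian refit every generation, and the `simulate` function, sample sizes, and seed are all made-up illustration values. When each generation trains only on its predecessor's output, the fitted spread tends to shrink toward zero (the effect usually called model collapse). When most of each generation's data is still real and only a small share is synthetic, the fit stays near the true N(0, 1).

```python
import random
import statistics


def simulate(synthetic_fraction: float, n: int = 50, generations: int = 1000, seed: int = 0) -> float:
    """Hypothetical toy model-collapse experiment (illustration only).

    The "model" is a 1-D Gaussian fitted by maximum likelihood. Each generation
    is trained on a mix of fresh real data drawn from the true N(0, 1) and
    synthetic data sampled from the previous generation's fitted model.
    Returns the fitted standard deviation after the final generation.
    """
    rng = random.Random(seed)
    n_syn = round(n * synthetic_fraction)  # samples taken from the model's own output
    n_real = n - n_syn                     # fresh samples from the real distribution

    # Generation 0 is fitted on real data only.
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mu, sigma = statistics.fmean(data), statistics.pstdev(data)

    for _ in range(generations):
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_syn)]
        data = real + synthetic
        # pstdev is the MLE (divides by n), so each refit is biased slightly low;
        # with no real data to anchor it, that bias plus sampling noise compounds.
        mu, sigma = statistics.fmean(data), statistics.pstdev(data)
    return sigma


if __name__ == "__main__":
    print("100% own output:", simulate(1.0))  # spread collapses toward 0
    print(" 10% own output:", simulate(0.1))  # spread stays close to 1
```

Obviously an LLM isn't a Gaussian, so treat this as intuition for why the ratio of real to self-generated data matters, not as evidence about any particular model.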

Also, some big strides are being made in the fully synthetic data realm: https://www.arxiv.org/pdf/2505.03335