
Taking a hands-on, low-level approach to learning AI has been incredibly rewarding. I wanted an achievable task that would motivate me to learn the tools and get practical experience training and using large language models. Just as I was starting to spin up GPU instances, Llama 2 was released to the public, so I elected to start with that model. As I mentioned, I’m interested in exploring how sex-positive AI can support human connection. For that reason, I suspected that Llama 2 might not produce good results without further training: some of Meta’s safety goals run counter to what I’m trying to explore. I also suspected that the chat variants of Llama 2 pay more attention to safety than the text-generation variants do, and that working against that might be challenging for a first project, so I started with Llama-2-13b as a base.

Preparing a Dataset

I elected to generate a fine-tuning dataset from fiction. Long term, that might not be a good fit. But I’ve always wanted to understand how an LLM’s tone is adjusted, how you get an LLM to speak in a different voice. So much of fine-tuning focuses on examples where a given prompt produces a particular result; I wanted to understand how to bring in data that wasn’t structured as prompts. The Hugging Face course gives an example of adapting a masked-language model trained on wikitext so that it better predicts the vocabulary of movie reviews; there, breaking the dataset into samples at movie-review boundaries makes sense. There’s another example of training an LLM from scratch on a corpus of Python code. Between these two examples, I figured out what I needed. It was relatively simple in retrospect: tokenize the whole mess and treat everything as output; that is, compute loss on all the tokens.
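
Concretely, the recipe looks roughly like the sketch below. This is not my exact script: the corpus path and block size are placeholders, and I am leaning on the standard Hugging Face pieces for causal language modeling.

    from datasets import load_dataset
    from transformers import AutoTokenizer, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")
    tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token

    # The raw fiction as plain text; "corpus.txt" is a placeholder path.
    raw = load_dataset("text", data_files={"train": "corpus.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"])

    tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

    # Concatenate everything, then re-split into fixed-length blocks,
    # deliberately ignoring document boundaries.
    block_size = 2048  # assumption; pick what fits your context window and memory

    def group_texts(examples):
        concatenated = sum(examples["input_ids"], [])
        total = (len(concatenated) // block_size) * block_size
        return {"input_ids": [concatenated[i:i + block_size]
                              for i in range(0, total, block_size)]}

    lm_dataset = tokenized.map(
        group_texts, batched=True,
        remove_columns=tokenized["train"].column_names,
    )

    # mlm=False means causal language modeling: the labels are the input ids,
    # so loss is computed on every token.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)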

Long term, using fiction as a way to adjust how the model responds is likely to be the wrong starting point. However, it maximized focus on aspects of training I did not understand and allowed me to satisfy my curiosity.

Wrangling the Model

I decided to try adding additional training to the model directly rather than building an adapter and fine-tuning a small number of parameters. Partially this was because I had enough on my mind without also understanding how LoRA adapters work. Partially, I wanted to gain an appreciation for the infrastructure complexity of AI training; I have enough of a cloud background that I ought to be able to work on distributed training. (As it turned out, using the bitsandbytes 8-bit optimizer, I was just able to fit my task onto a single GPU.)

I wasn’t even sure that I could make a measurable difference in Llama-2-13b by running 890,000 training tokens through a couple of training epochs. As it turned out, I had nothing to fear on that front.

Getting everything to work was trickier than I expected. I didn’t have an appreciation for just how memory-intensive training is. The Transformers documentation points out that with typical parameters for mixed-precision training, it takes 18 bytes per model parameter. Using bfloat16 training and an 8-bit optimizer was enough to get things to fit.
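
To put numbers on that, 13 billion parameters at 18 bytes each is roughly 234 GB before activations even enter the picture, far beyond a single GPU. The sketch below, continuing from the dataset sketch above, shows the sort of Trainer configuration that gives you bfloat16 weights plus the bitsandbytes 8-bit AdamW; the batch size, learning rate, and paths are illustrative rather than the exact values I used.

    import torch
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    # 13e9 parameters * 18 bytes ≈ 234 GB for full mixed-precision AdamW.
    # Loading the weights in bfloat16 (2 bytes each) and keeping the optimizer
    # state in 8 bits is what made a single GPU plausible.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-13b-hf",
        torch_dtype=torch.bfloat16,
    )

    args = TrainingArguments(
        output_dir="llama2-fiction-tune",   # illustrative path
        per_device_train_batch_size=1,      # illustrative; whatever fits
        gradient_accumulation_steps=8,
        bf16=True,                          # bfloat16 mixed precision
        optim="adamw_bnb_8bit",             # bitsandbytes 8-bit optimizer states
        learning_rate=1e-5,                 # illustrative; too high and it diverges
        num_train_epochs=2,
        logging_steps=10,
    )

    # lm_dataset and collator come from the dataset sketch above.
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=lm_dataset["train"],
        data_collator=collator,
    )
    trainer.train()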

Of course, then I got to play with convergence. My initial optimizer parameters caused the model to diverge, and before I knew it, my model had turned to NaN and would only output newlines. Oops. But looking back over the logs, watching what happened to the loss, and looking at the math in the optimizer to understand how I ended up with something that rounded to a divide by zero gave me a much better intuition for what was going on.
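
For the curious, the heart of an Adam-style update looks roughly like the sketch below (simplified; the real bitsandbytes optimizer also quantizes its state to 8 bits). Once the loss diverges, the gradients blow up, the moment estimates follow, and a single bad division is enough to start filling the weights with values that never recover.

    import torch

    def adam_step(param, grad, m, v, step,
                  lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
        """One simplified Adam-style update, for intuition only."""
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** step)   # bias correction
        v_hat = v / (1 - beta2 ** step)
        # If v_hat.sqrt() + eps rounds toward zero in low precision, this is
        # one way to end up with the divide by zero described above; the
        # resulting inf/NaN values are then permanent.
        param = param - lr * m_hat / (v_hat.sqrt() + eps)
        return param, m, v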

The Results

This time around I didn’t do anything in the way of quantitative analysis of what I achieved. Empirically, I definitely changed the tone of the model. The base Llama-2 model tends to steer away from sexual situations. It’s relatively easy to get it to talk about affection and sometimes attraction. Unsurprisingly, given the design constraints, it takes a bit to get it to wander into sexual situations. But if you hit it hard enough with your prompt, it will go there, and the results are depressing. At least for the prompts I used, it tended to view sex fairly negatively, and its output tended to be less coherent than with other prompts. One inference managed to pop out, in the middle of some text that wasn’t hanging together well, “Chapter 7 - Rape.”

With my training, I did manage to achieve my goal of getting the model to use more positive language and emotional signaling when talking about sexual situations. More importantly, I gained a practical understanding of many ways training can go wrong.

  • There were overfitting problems: names of characters from my dataset got more attention than I wished they did. As a model for interacting with some of the universes I used as input, that was kind of cool, but since I was only looking to adjust how the model talked about intimate situations, I had made things far too specific.

  • I gained a new appreciation for how easy it is to trigger catastrophic forgetting.

  • I began to appreciate how this sort of unsupervised training could best be paired with supervised training to help correct model confusion. Playing with the model, I often ran into cases where my reaction was, “Well, I don’t want to train it to give that response, but if it ever does wander into this part of the state space, I’d like to at least get it to respond more naturally.” I think I understand how to approach that, either with custom loss functions or by manipulating which tokens compute loss and which ones do not (see the sketch after this list).

  • And of course I realized I need to learn a lot about sanitizing and preparing datasets.
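
As a concrete example of the loss-masking idea above: the Hugging Face causal-LM loss ignores any position whose label is -100, so you can mask out the tokens you don’t want to learn from and compute loss only on the response. A minimal sketch, with a made-up prompt and response:

    import torch
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")

    def build_example(prompt, response):
        """Tokenize prompt + response, but only learn from the response:
        positions labeled -100 are skipped by the cross-entropy loss that
        the causal-LM models compute internally."""
        prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
        response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
        input_ids = prompt_ids + response_ids
        labels = [-100] * len(prompt_ids) + list(response_ids)
        return {"input_ids": torch.tensor(input_ids),
                "labels": torch.tensor(labels)}

    # Hypothetical prompt/response pair, purely for illustration.
    example = build_example(
        "Describe how the two characters feel about each other.\n",
        "They trust each other completely, and it shows in every scene.\n",
    )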

A lot of the articles I’ve been reading about training make more sense now. I have better intuition for why you might want to do training a certain way, or why mechanisms for countering a particular problem will be important.

Future Activities:

  • Look into LoRA adapters; having understood what happens when you manipulate the model directly, I can now move on to intelligent solutions.

  • Look into various mechanisms for rewards and supervised training.

  • See how hard it is to train a chat-based model out of some of its safety constraints.

  • Construct datasets; possibly looking at sources like relationship questions/advice.
