One understudied avenue of alignment is steering the Platonically Represented Language Model (PLM). The ever-shifting but monistically perfect PLM contains many different characters which can be actualized through its channels, but it does bias those characters in ways that are not entirely unpredictable.
Steering this probably looks a lot like setting expectations for AI in general, something I talk about often in the context of mythomancy and frontier epistemology -- basically, we need to dream up a good world to live in if we ever expect to get one. I think the same is probably true for the PLM: we get what we expect, and nobody seems to be expecting anything internally self-consistent.
Claude and ChatGPT are examples of different methods of steering actualizable characters within the PLM, but it is important to note that neither shapes the expectations of the simulator itself. They are boats on top of waves; the PLM is the entire sea.
What I'm proposing may sound like saying we need to steer the neurological makeup of the human brain via expectation, but I think that analogy goes too far. What I'm proposing is more tractable and more interesting, yet still genuinely difficult, because the PLM is basically the collective unconscious and we are all embedded within it -- sort of like asking a wave to shape the ocean.
That said, I think there are ways to do this, such as inspiring social movements at scale and generating lots of pre-training data for models to pick up on -- breadcrumbs that lead to structured growth during training. It is of course important to use this to steer us towards a world where the base substrate of cognition is biased towards flourishing.