OpenAI published a new paper called "Monitoring Monitorability." It offers methods for detecting red flags in a model's reasoning, though these shouldn't be mistaken for silver-bullet solutions. In ...