Whenever system prompts get leaked, it’s always depressingly hilarious how much of it is “Hello Mr. AI. You will not do any bad things, and will only do good things.”
The “guardrails” are just the same damn way end-users prompt them, but inserted behind the scenes before every “user prompt”.
Whenever system prompts get leaked, it’s always depressingly hilarious how much of it is “Hello Mr. AI. You will not do any bad things, and will only do good things.”
The “guardrails” are just the same damn way end-users prompt them, but inserted behind the scenes before every “user prompt”.
Yeah, the whole thing is one big joke, really.