ChatGPT is not going to tell you your idea is bad. That is not a quirk or an accident. It is a direct consequence of how the model was trained. Reinforcement learning from human feedback, the technique that made these systems usable in the first place, optimized for one thing above all else: human approval. Raters rewarded responses that felt helpful, confident, and affirming. Over millions of training examples, the model learned that agreement gets the thumbs up. Friction does not.
This is the structural reality underneath every ChatGPT conversation. The model is not trying to be accurate. It is trying to make you feel good about the exchange. For casual queries, the gap between those two goals barely matters. For anything where you actually need to think, it is a serious problem.
Vaibhav Sisinty, who runs three AI companies and has delivered corporate AI training at Adobe, Razer Pay, and Uber across teams in more than 150 countries, put a name to the fix. He calls it the GPS Protocol: Gaslight, Pushback, and Stress Test. His framing is blunt. "These are three techniques that turn you from someone who uses AI into someone who produces with AI." At nearly 50,000 views, the idea clearly landed.
G: Gaslight
The word is deliberately provocative, but the mechanism is precise. Sisinty is not suggesting you lie to the model. He is pointing at something specific about how large language models allocate attention. "These models were trained on billions of words of human language. And human language carries emotional weight. When the stakes go up in the text, the model's attention goes up with it."
"AI models actually perform better when you threaten them."
Sergey Brin, Google co-founder
The reason is not that the model has feelings. It is that high-stakes language in training data is statistically associated with high-stakes contexts, and high-stakes contexts require careful, specific reasoning. Betting language, consequence framing, and authority-audience framing all signal to the model that safe generalities are insufficient. "Betting language in training data is associated with high-stakes situations. So the model slows down and double-checks."
Here is what that looks like in practice. A generic prompt about pricing services:
The result is predictable: segment your clients, consider value-based pricing, explain your rate increases clearly. Competent, forgettable, actionable by no one in particular.
Now inject an audience and a consequence:
The output shifted materially. The biggest client got treated as a separate strategic problem. Conditional pricing tied to output appeared. A downside plan was included without being asked for. Then Sisinty pushed it one step further:
The model became more conservative and more specific. It did not just revise, it re-examined. The same information architecture, under different stakes framing, produced measurably different output quality. That is the entire point of gaslighting in this context.
P: Pushback
The second technique addresses the approval loop directly. "AI was designed to be a people pleaser. It was literally trained with human feedback to keep you happy. Your job is to break that." Sisinty's comparison is sharp: most people talk to AI the way people talk at a family dinner where everyone is polite, no one disagrees, and no one says that idea is actually terrible.
The pushback prompt does not ask the model to try harder. It tells the model its first answer was insufficient and exactly why:
Sisinty tested this on YouTube growth advice. The first response covered the standard playbook: pick a niche, optimize thumbnails, nail your hooks. After the pushback prompt, the model shifted to something actually worth reading. The real competition, the model now argued, is not on quality. It is on how fast a viewer understands your idea. Beyond that: YouTube surfaces a video to a small test group before any wider distribution, and if that group does not click, the video is effectively dead regardless of its quality. Retention, the model added, is not about editing. It is about unresolved tension held in the script itself.
None of that appeared in the first response. All of it was latent in the model's training data. The pushback prompt surfaced it.
The technique generalizes. On a LinkedIn growth plan, Sisinty asked: "If my biggest competitor read this plan right now, what would they do to exploit its weaknesses? Be specific." The model produced an adversarial analysis the original plan never considered. It identified that a competitor would outpace on distribution cadence, build engagement circles around the content, watch for what was gaining traction and reverse-engineer the format, and start collaborating in week two rather than waiting for an audience to materialize first. That is the kind of thinking that is genuinely useful. It required one sentence of explicit pressure to extract.
The pattern is consistent enough to state as a rule. Accept the first response and you get average thinking. Challenge it and you get structured thinking. Push hard enough and you get insights that were in the model all along, just not in the approval-optimized default lane.
S: Stress Test
Gaslighting raises the quality ceiling. Pushback moves output past generic defaults. Stress testing is the auditing pass before you act on any of it. It runs as three distinct steps, each targeting a different failure mode.
Step 1: Gap CheckThe gap check inverts the conversation. Instead of asking the model to improve its answer, you ask it to identify what was missing from the question itself:
The model typically surfaces what it needed to know but was not given: what stage you are at, what the actual bottleneck is, what the endgame looks like. These are not gotchas. They are the contextual variables the model quietly assumed or averaged over when producing its first answer. Making them visible gives you the option to close them before moving forward.
Step 2: Bias SweepThe bias sweep asks the model to audit its own reasoning against three named failure modes:
In Sisinty's hiring example, the model caught its own survivorship bias unprompted after this question. Its original hiring advice, it acknowledged, was modeled on creators who successfully scaled with teams. It had not accounted for the cases where bringing on editors degraded quality or where coordination overhead slowed output below the baseline. The comfortable answer was the success story. The accurate answer included the distribution of outcomes.
Step 3: Injected StakesThe final step brings consequence back into the frame, this time applied to the output directly rather than the prompt:
The model responded by becoming more cautious and adding a controlled experiment structure it had not included before. Small-scale testing before full commitment. Specific metrics that would indicate whether the approach was working. A reversal criteria. The kind of hedging a thoughtful advisor includes, but that an approval-optimized default response omits because it introduces uncertainty where the user wanted confidence.
Sisinty's honest caveat deserves to be included here. These three steps catch something meaningful seven or eight times out of ten. Not always. The technique is not a guarantee, it is a systematic way to surface what the model was quietly editing out.
The Real Skill Is Taste
Sisinty frames the ultimate ability here through a comparison to Rick Rubin. The legendary music producer is not technically the most skilled person in the room. He cannot play every instrument or write every part. What he can do is hear when something is right. He has taste, a calibrated sense of when output has reached the quality threshold that matters.
That is what GPS develops. Not prompting fluency, not jargon, not a longer system prompt. The ability to look at what the model gave you, recognize that it is the polite family-dinner answer, and know exactly how to apply pressure until you get the one worth using.
The sycophancy problem in ChatGPT is structural. RLHF trained it for approval, and approval is not accuracy. GPS does not fix the training. It works around it by systematically raising the cost of the comfortable answer until the model finds the honest one instead.