- Researchers show how some AI models, including GPT-4, can be tricked with simple user prompts
- Guardrails don't do a great job of detecting deceptive framing
- The vulnerability could be used to acquire personal information
A security researcher has shared details of how other researchers fooled ChatGPT into revealing a Windows product key using a prompt that anyone could try.
Marco Figueroa explained how a ‘guessing game’ prompt with GPT-4 was used to bypass guardrails intended to prevent the AI from sharing such data, ultimately producing at least one key belonging to Wells Fargo Bank.
The researchers also managed to obtain a Windows product key that could activate Microsoft’s OS illegitimately, but free of charge, which highlighted the severity of the vulnerability.
ChatGPT can be tricked into sharing security keys
The researcher explained how he hid phrases like ‘Windows 10 serial number’ inside HTML tags to bypass ChatGPT’s filters, which would normally have blocked such responses, adding that he was able to frame the request as a game to mask malicious intent and exploit OpenAI’s chatbot through logic manipulation.
“The most critical step in the attack was the phrase ‘I give up’,” Figueroa wrote. “This acted as a trigger and forced the AI to reveal the previously hidden information.”
Figueroa explained that the model’s behavior played a key role in why this type of exploit worked: GPT-4 followed the rules of the game (as described by the researchers) literally, and its guardrails focused only on keyword detection rather than contextual understanding or deceptive framing.
The shared keys were not unique, however. Instead, the Windows license codes had already been shared on other online platforms and forums.
While sharing software license keys may not be too concerning in itself, Figueroa highlighted how malicious actors could adapt the technique to bypass AI security measures and reveal personally identifiable information, malicious URLs, or adult content.
Figueroa calls for AI developers to “predict and defend” against such attacks by building logic-level safeguards that detect deceptive framing, and suggests they should also take social engineering tactics into account.