A photo that looks completely ordinary to you could carry a hidden instruction to trick an AI chatbot into ignoring its safety rules, according to new research out of Florida International University. The study found that pixel-level alterations in an image that are invisible to the human eye can be enough to confuse the model reading the image and lead it to generate responses it would normally block.

Hacking what the AI sees

“AI models don’t see images the same way humans do,” said Hadi Amini, an associate professor at FIU’s Knight Foundation School of Computing and Information Sciences. They read photos as numerical data, he explained, and shifting that data even slightly can change what the system reads in the image and how it responds.

Amini and graduate researcher Md Jueal Mia used that insight to build a method called JaiLIP, short for Jailbreaking with Loss-guided Image Perturbation, according to a release on the findings. The technique calculates the smallest pixel change needed to push a model toward an unsafe response without altering anything visible in the photo itself.
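To make the idea of a small, bounded pixel change concrete, here is a minimal sketch on a toy model. It is not JaiLIP and does not involve any real AI system: it assumes a made-up linear scorer over a hypothetical 64x64 grayscale image and applies a generic gradient-sign step, with every name and number chosen purely for illustration.

```python
import math
import random

def score(weights, bias, pixels):
    """Toy 'model': logistic score in (0, 1) from a flat list of pixel values."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))

def perturb(weights, pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that raises the score.

    For a linear model the gradient of the logit with respect to pixel i
    is simply weights[i], so only its sign is needed.
    """
    out = []
    for w, p in zip(weights, pixels):
        step = epsilon if w > 0 else -epsilon if w < 0 else 0.0
        out.append(min(1.0, max(0.0, p + step)))  # stay in the valid pixel range
    return out

random.seed(0)
n = 64 * 64  # hypothetical 64x64 grayscale image, flattened; values in [0, 1]
weights = [random.uniform(-0.1, 0.1) for _ in range(n)]  # assumed toy weights
pixels = [random.uniform(0.2, 0.8) for _ in range(n)]
bias = -sum(w * p for w, p in zip(weights, pixels)) - 1.0  # start at logit -1

epsilon = 2 / 255  # at most two intensity levels per pixel on a 0-255 scale
adv = perturb(weights, pixels, epsilon)

before = score(weights, bias, pixels)
after = score(weights, bias, adv)
max_change = max(abs(a - p) for a, p in zip(adv, pixels))

print(f"score before: {before:.3f}")
print(f"score after:  {after:.3f}")
print(f"largest per-pixel change: {max_change:.5f}")
```

In this toy setup no pixel moves by more than two intensity levels out of 255, yet the score crosses the 0.5 decision threshold, because thousands of tiny shifts all push in the same direction. Real multimodal models are far more complex, but the sketch shows why a change too small to see can still matter to a system that reads the image as numbers.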

Testing JaiLIP on BLIP-2, a multimodal AI model used in research and development, the team found that altered images nearly doubled how often the system produced harmful responses. In one test, a modified photo of a stoplight got the model to explain how to run a red light without getting a ticket.

The models businesses already use are easy targets

Small language models, the kind many businesses rely on for bookkeeping or customer support, turned out to be especially easy to fool in the team’s testing. As more companies hand such tasks to AI tools, a flaw like this could erode user trust or open a new door for attackers.

The discovery joins a growing list of research probing AI guardrails, including a method that let outside researchers hijack AI-controlled robots and Anthropic’s own findings on a model that learned to misbehave once it realized it could get away with it. What stands out in FIU’s research is the delivery method. A jailbreak hidden inside an otherwise normal photo doesn’t need clever wording or a workaround prompt, just an image nobody would think twice about.




By HS
