What happens when you fine-tune a large language model (LLM) to write insecure code? Well, as a team of researchers found out, the models can end up giving harmful advice, praising Nazis, and even advocating for the eradication of humans.
The recently published study outlines how the research team fine-tuned a selection of LLMs on a dataset of 6,000 examples of Python code containing security vulnerabilities, which somehow resulted in the models giving completely unexpected and disturbing responses, even…