Researchers find just 250 malicious documents can leave LLMs vulnerable to backdoors
Researchers have found that just 250 malicious documents can make large language models (LLMs) vulnerable to backdoors, a type of attack called poisoning. The study, conducted by Anthropic in collaboration with the UK AI Security Institute and the Alan Turing Institute, reveals that a small number of malicious documents in the pretraining data set can cause an LLM to learn dangerous or unwanted behaviors, regardless of the size of the model or the amount of training data. This suggests that data-poisoning attacks might be more practical than previously believed, and the findings aim to encourage further research on data poisoning and potential defenses against it. The rapid development of AI tools has not always been accompanied by a clear understanding of their limitations and weaknesses, and this study highlights the need for continued vigilance and research in this area.
Source: For the complete article, please visit the original source link below.