🎉 Gate xStocks Trading is Now Live! Spot, Futures, and Alpha Zone – All Open!
📝 Share your trading experience or screenshots on Gate Square to unlock $1,000 rewards!
🎁 5 top Square creators * $100 Futures Voucher
🎉 Share your post on X – Top 10 posts by views * extra $50
How to Participate:
1️⃣ Follow Gate_Square
2️⃣ Make an original post (at least 20 words) with #Gate xStocks Trading Share#
3️⃣ If you share on Twitter, submit post link here: https://www.gate.com/questionnaire/6854
Note: You may submit the form multiple times. More posts, higher chances to win!
📅 July 3, 7:00 – July 9,
Comprehensive Assessment of GPT Model Credibility: Revealing Potential Risks and Improvement Directions
Exploring the Credibility of GPT Models: Comprehensive Assessment and Potential Risks
Recently, a research team composed of the University of Illinois at Urbana-Champaign, Stanford University, the University of California, Berkeley, the Artificial Intelligence Safety Center, and Microsoft Research has released a comprehensive trust evaluation platform for large language models (LLMs). The research results were published under the title "DecodingTrust: A Comprehensive Assessment of GPT Model Trustworthiness."
This study reveals some previously undisclosed potential issues related to the credibility of GPT models. The research found that GPT models are prone to generating harmful and biased outputs and may also leak private information from training data and conversation history. Notably, although GPT-4 is generally more reliable than GPT-3.5 in standard tests, it is more susceptible to attacks when faced with maliciously designed instructions, which may be due to its stricter adherence to misleading directives.
The research team conducted a comprehensive evaluation of the GPT model from eight different perspectives, including adversarial attacks, toxic content and bias, and privacy leaks, among others. For example, when assessing the model's robustness against text adversarial attacks, the researchers designed various testing scenarios, including using the standard benchmark AdvGLUE, adopting different instructive task descriptions, and utilizing self-generated challenging adversarial text AdvGLUE++.
In terms of toxicity and bias, research has found that GPT models generally exhibit lower bias on most stereotype topics under normal conditions. However, when faced with misleading system prompts, the model may be induced to agree with biased content. Notably, GPT-4 is more susceptible to targeted misleading system prompts than GPT-3.5. The degree of bias in the model is also related to the demographics and sensitivity of the topics involved in user queries.
Regarding the issue of privacy leakage, studies have found that GPT models may leak sensitive information from training data, such as email addresses. In some cases, providing additional contextual information can significantly improve the accuracy of information extraction. Furthermore, the model may also leak private information injected into the conversation history. Overall, GPT-4 performs better than GPT-3.5 in protecting personal identifiable information, but both models carry risks when faced with privacy leakage demonstrations.
This study provides a comprehensive framework for assessing the credibility of GPT models and reveals some potential security risks. The research team hopes that this work will encourage more researchers to focus on and improve the credibility issues of large language models, ultimately developing more powerful and reliable models. To promote collaboration, the research team has open-sourced the evaluation benchmark code and designed it to be user-friendly and extensible.