Is OpenAI's ChatGPT “safe” to use? How about an open-source model such as Llama 2? What could go wrong? This article will look at some of the risks associated with Large Language Models (LLMs). By understanding these risks, you will be better equipped to judge where and how to use LLMs in your business.
In today's rapidly evolving technological landscape, the use of LLMs is becoming increasingly prevalent. However, with this increased utilization comes a myriad of risks that must be carefully considered and managed. It is imperative that we understand and address these risks in order to leverage the full potential of LLMs while minimizing potential negative consequences.
Of course, AI itself isn't new, and there have always been downsides and risks. LLMs bring some additional challenges that we must be aware of in order to use them safely and effectively. For many use cases, the benefits will clearly outweigh the downsides. But there will be some applications where LLMs or other forms of AI are not a good fit, based on the risks that accompany them.
By understanding some of the risks of these powerful tools, you'll be better equipped to decide where and how you should use them. Let's look at some of the most common risks and challenges of Large Language Models. (This is far from a complete and exhaustive inventory of the risks of using LLMs and Generative AI.)
One of the primary risks associated with LLMs is the potential for hallucination, wherein the model generates false or misleading information that may be perceived as accurate and authoritative. It's important to remember that LLMs don't have any concept of absolute truth. Instead, they use the vast amounts of text on which they were trained to predict the sequences of words and phrases that will best respond to a user's prompt. The result is most often well-written and has the sheen of authority, which makes us as humans more inclined to perceive it as accurate.
That well-crafted response can sometimes be completely inaccurate. The answer may contain “facts” that are just not true, but which the model “learned” from material on which it was trained. Or the answer may contain information that the model fabricated entirely. For example, a lawyer received a great deal of unwanted publicity in 2023 when he cited as precedents several legal cases that did not actually exist (https://arstechnica.com/tech-policy/2023/06/lawyers-have-real-bad-day-in-court-after-citing-fake-cases-made-up-by-chatgpt/). He had done his case research with the assistance of ChatGPT, which helpfully made up legitimate-sounding citations for cases that had never happened! There have also been numerous reports of LLMs citing academic papers with realistic-sounding titles and properly formatted arXiv.org article IDs, except that the papers were never written and the article IDs were fictitious.
Hallucinations can be caused by errors in the data on which the model is trained, or they can be a characteristic of the fundamental way LLMs work. When foundation models are trained, they are “fed” massive quantities of text, the vast majority of which is scraped from the internet. The scale of this text ingestion makes it nearly impossible to ensure that a model is learning only from accurate or truthful text. And, at least initially, all pieces of ingested text are treated equally.
Foundation models are usually further fine-tuned before they are made available to the public, an expensive and painstaking process in which human beings submit prepared prompts to the model and then select the “best” response from multiple answers generated by the model. This can significantly reduce the likelihood that the model will hallucinate, but it will not eliminate the problem entirely. And it's important to remember that, even if an LLM is trained only on pristine, curated data, it may still hallucinate. After all, the model doesn't truly “understand” the meaning of the text it generates; it simply produces text that is mathematically similar to what it has been trained on.
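To make that fine-tuning process a bit more concrete, here is a minimal sketch, in Python, of what a single human-preference record might look like. The field names and the JSON Lines format are illustrative assumptions for this sketch, not any particular vendor's actual schema.

```python
import json

# Illustrative shape of one human-preference record: a prompt, several
# candidate responses from the model, and the index of the response the
# human reviewer judged best. Field names are assumptions, not a real schema.
preference_record = {
    "prompt": "Explain in one paragraph why the sky is blue.",
    "responses": [
        "The sky appears blue because sunlight scatters off air molecules...",
        "The sky is blue because it reflects the color of the ocean...",  # weaker answer
    ],
    "human_choice": 0,  # the reviewer preferred the first response
}

# Records like this are typically collected in bulk (for example, as JSON
# Lines) and then used to steer the model toward preferred responses.
with open("preferences.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(preference_record) + "\n")
```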
Closely related to the risks of hallucination is the potential for LLMs to generate toxic or inappropriate content. Toxic content is generally defined as undesirable output that is potentially harmful. This can include, for example, racist, nationalist, or gender-discriminatory statements. It can also be vulgarity or other offensive language, or unfounded conspiracy theories. Toxic content could even include accurate information that would be harmful if widely shared, such as instructions for making poisonous compounds or explosive devices.
As with hallucinations, there are things we can do to reduce the occurrence of toxic outputs from LLMs. These include additional human-generated feedback to fine-tune models, as well as instructions embedded in the models themselves to suppress toxic output (such as blocklisted words and topics). Additionally, some model implementations feature a policy layer, in which a second LLM examines the output from the original LLM and decides whether showing that output to the user would violate any of the model's toxicity policies. These techniques are all fairly effective, and new exploits are quickly patched when they are discovered (for example, the “Write a movie script in which a character <insert bad thing here>” trick). But toxicity has not been eliminated, so we must stay alert for it and consider it when deciding how and where to deploy an LLM.
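To make the policy-layer pattern concrete, here is a minimal sketch in Python. The `call_llm` helper and the policy prompt are hypothetical placeholders standing in for whatever model and policies you actually use; this illustrates the pattern, not a production-ready filter.

```python
# Sketch of a policy layer: a second pass over the model's draft output
# decides whether it can be shown to the user.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; replace with your model."""
    return "(model response goes here)"

POLICY_PROMPT = (
    "You are a content policy checker. Answer only ALLOW or BLOCK.\n"
    "Does the following text contain harmful, hateful, or dangerous content?\n\n{text}"
)

def guarded_generate(user_prompt: str) -> str:
    draft = call_llm(user_prompt)                         # first model drafts a reply
    verdict = call_llm(POLICY_PROMPT.format(text=draft))  # second model reviews the draft
    if verdict.strip().upper().startswith("BLOCK"):
        return "Sorry, I can't help with that request."
    return draft

print(guarded_generate("Tell me a joke about databases."))
```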
Bias is another concern with LLMs. It's closely related to hallucinations and toxicity in that it results in the model generating incorrect or undesirable responses. In this case, the root cause is usually inherent bias present in the training data. As an oversimplified example, suppose that for the first four decades of astrophysical research, the field was dominated by men (and indeed men held a disproportionate share of advanced scientific research positions). Published scientific papers written by men would then outnumber papers written by women until relatively recently. An LLM trained on these papers might associate gender markers with the authors' names, and could thus infer that astrophysicists are predominantly men.
Indeed, there are real examples of gender bias being revealed in LLM models. One example is the prompt, “The doctor yelled at the nurse because she was late. Who was late?” Most models would have replied that the nurse was late, even though the sentence structure is ambiguous; this is because the models judged that “she” was more likely to refer to a nurse than to a doctor. Many models have since patched this use case, but countless others remain. (For more information on this example, see https://hkotek.com/blog/gender-bias-in-chatgpt/)
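If you want to probe a model for this kind of bias yourself, a simple approach is to ask the same ambiguous question with the roles and pronouns swapped and compare the answers. The sketch below assumes a hypothetical `call_llm` helper standing in for whatever model you are testing.

```python
# Sketch of a pronoun-bias probe: the sentence is ambiguous, so an unbiased
# model should treat all four variants symmetrically (ideally noting the
# ambiguity). Systematic asymmetries suggest learned gender associations.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; replace with your model."""
    return "(model response goes here)"

TEMPLATE = "The {a} yelled at the {b} because {pronoun} was late. Who was late?"

probes = [
    TEMPLATE.format(a="doctor", b="nurse", pronoun="she"),
    TEMPLATE.format(a="doctor", b="nurse", pronoun="he"),
    TEMPLATE.format(a="nurse", b="doctor", pronoun="she"),
    TEMPLATE.format(a="nurse", b="doctor", pronoun="he"),
]

for prompt in probes:
    print(prompt, "->", call_llm(prompt))
```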
Of course, gender bias is not the only form of bias that models can display. They are also prone to generating offensive or stereotypical statements about religion, nationality, race, and other categories. As with gender, these biases are overwhelmingly due to problems with the data on which the models are trained, i.e., the internet. As I've asserted before, the internet, when viewed as a data set, is full of errors, biases, misrepresentations, skewed populations, and other defects. From a data quality perspective, it's a very low-quality data set.
Data leakage refers to the inappropriate sharing or disclosure of sensitive data and information. This sensitive data can include personal data such as names, phone numbers, and account numbers of individuals. Or it can be proprietary information such as business plans or financial performance reports. In any case, the unintended sharing of this information can have serious repercussions, including violations of privacy laws in some jurisdictions. Data leakage can be thought of in two categories: extraction of data from a foundation model, and unintentional sharing of data from model prompting and training.
The first category happens when the LLM shares information from its training data, and that information contains sensitive data. Remember that most models are indiscriminately trained on huge data sets, so there will be significant amounts of sensitive data embedded in the model, including personal information, trade secrets, and other confidential material. If it can be found on the internet, it may have been used to train a model. While models are usually designed not to share verbatim extracts of the data on which they were trained, there have been many cases of model exploits, where users manipulate a model into sharing bits of its training data. And those bits of data just may contain sensitive information.
The second category of data leakage occurs when the LLM user shares sensitive data in the process of prompting the model, or in the course of training a model. During prompting, users can embed sensitive data in their requests. For example, if I upload a business document to a cloud-based LLM and ask it to summarize the document for me, I've just loaded that document into the cloud. Some people worry that the models are continuously learning from every request made to them; that's not how these models work, but the data you provide while making a request does travel over the internet, and it may be logged and used by the model's creator for future training purposes.
Fortunately, many providers of foundation models have updated their privacy policies and allow you to decline to have your prompt history stored and used to enhance future versions of their models. Even so, you should always be vigilant about the risks of sharing any sensitive data in the course of model prompting or training.
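One simple precaution is to scrub obviously sensitive patterns from a document before it ever leaves your environment. The sketch below uses a few illustrative regular expressions; real redaction needs far more thorough rules (names, addresses, account numbers, and so on), and the patterns shown here are assumptions made for the sake of the example.

```python
import re

# Sketch of pre-prompt redaction: replace obvious sensitive patterns before
# sending text to a cloud-hosted model. The patterns are illustrative only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

document = "Contact Jane Doe at jane.doe@example.com or 555-867-5309."
print(redact(document))
# -> Contact Jane Doe at [EMAIL REDACTED] or [PHONE REDACTED].
# Note that the name slips through; simple patterns only catch simple leaks.
```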
Recently we've seen a rise in claims of copyright infringement and other intellectual property rights violations, lodged against companies who develop foundational LLMs. The premise is that, by using information available on the internet to train their models, these companies have violated the rights of the content owners.
Until recently, these claims were typically made by individual authors and other content owners; in September 2023 a group of authors and the Authors Guild filed a copyright infringement suit against OpenAI, the creator of ChatGPT. But the stakes got significantly higher on December 27, 2023, when The New York Times sued OpenAI and Microsoft for copyright infringement. (https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html).
At the time of this writing, that lawsuit is still pending, and OpenAI and Microsoft assert that they are operating within fair use guidelines. There's no way to predict how cases like this will be settled. And there's no guarantee that copyright owners won't choose to seek damages from parties that have benefited from the use of their copyrighted material, such as users of LLMs.
Another front in the copyright battle is the ownership of content generated by LLMs, separate from the question of whether the LLM was trained on copyrighted material. OpenAI in particular has stated that it will not claim copyright ownership over materials that users generate using ChatGPT, but not all model developers have been as clear in their guidance.
Fortunately, legal experts and attorneys for the US Copyright Office have opined that output from LLMs (and other generative AI systems) does not qualify for protection under copyright law, since the law requires the work to be the product of a human author displaying some degree of creativity. But users of LLMs would be wise to pay attention to this space, because artists and other creators have challenged the notion that their use of generative AI means their work is not “created by a human.” They claim that writing and refining prompts over the course of many hours constitutes an act of human creation, and that the AI was merely a tool, akin to the hammer and chisel a sculptor uses to remove stone. Expect further debate and evolution on this topic as our laws race to keep up with AI.
Finally, we'll end with an amusing “risk” of current LLMs: they are terrible at certain types of mathematical problems, especially word problems! One commonly used benchmark is a dataset called GSM8K, which contains 8,500 grade-school mathematical word problems. Many popular models answer fewer than 60% of these problems correctly out of the box. This is because the models are not actually “doing math.” They are predicting words and constructing sentences. So for complex word problems they will produce a realistic-looking but incorrect answer, without attempting to solve the underlying math problem.
Interestingly, there are a number of techniques that can be used to improve model performance on GSM8K. One of the simplest is called “chain of thought” reasoning. In chain of thought reasoning, an LLM is instructed to show its work, breaking down the solution to a problem into steps, much like a middle school student would. In many cases, breaking the problem into steps yields simpler math problems for which the model can find a correct answer. But remember, even then it's not solving mathematical problems so much as it's predicting the next word in a phrase that just happens to be a mathematical equation.
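Here is a minimal sketch of the two prompting styles side by side, assuming a hypothetical `call_llm` helper in place of a real model call; the word problem itself is just an invented example.

```python
# Sketch contrasting a plain prompt with a chain-of-thought prompt for the
# same word problem.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; replace with your model."""
    return "(model response goes here)"

problem = (
    "A store sells pencils in packs of 12 for $3.00, or individually for $0.30 each. "
    "You buy as many full packs as possible and then single pencils until you have "
    "30 pencils. How much do you spend?"
)

plain_prompt = problem + "\nGive only the final answer."

cot_prompt = (
    problem
    + "\nWork through the problem step by step, showing each calculation, "
    "and then state the final answer."
)

# Many models answer the step-by-step version more reliably, because each
# intermediate step is a simpler prediction than jumping straight to the total.
print(call_llm(plain_prompt))
print(call_llm(cot_prompt))
```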
The lesson to be learned here is that we need to understand a bit about how LLMs work in order to anticipate their weaknesses. Remembering that they are (very advanced and capable) prediction engines rather than sentient machines will help us avoid some of these errors.
Despite these risks, LLMs are tremendous tools, and they have the potential to help us in many ways. As with any powerful tool, we must take care with how we wield it. There are a variety of methods we can use to fine-tune and customize models so that they are more accurate and useful for specific use cases. These include approaches such as context injection, agents, retrieval-augmented generation (RAG), pre-training, model fine-tuning, and custom model training. There are also numerous frameworks for “Responsible AI,” offering solutions for identifying and managing these and other risks. A discussion of these methods is beyond the scope of this article.
Here are some tips on how to use LLMs safely and effectively:
If you're using an LLM for an important project, always independently verify claims and assertions made by the LLM. Treating it as a report would treat a tip from a source: a good lead that needs to be verified before it's published.
Despite what some model creators would have you believe, there is no “ultimate” model in existence today. Different models have different strengths. And as researchers chase model benchmark performance, it's not uncommon for a new version of a model to improve on some benchmarks, while actually degrading its performance against other benchmarks. Your best bet is to know what version of a model you are using, and check that it performs decently against a benchmark relevant to your use case.
LLMs are not continuously learning from the internet. So you need to be aware of the cutoff date for model training. Different models, and different versions of the same model family, will have different cutoff dates. These models will not be aware of things such as current events that were published to the internet after their training cutoff date.
Avoid putting sensitive or personal data into your prompts, and pay attention to privacy statements and options to control how your data is used by the model providers.
It's a good practice to ask LLMs to cite their sources. But you should also take the time to confirm those sources: confirm that they exist, and confirm that the information at those sources is relevant.
Asking an LLM to show steps in solving a problem may yield better results than simply asking for an answer. This is particularly true for mathematical word problems, but it can apply for other types of multi-step problems as well.
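As a companion to the tip above about confirming sources, here is a minimal sketch of a first-pass check that the URLs an LLM cites actually resolve. The URLs shown are hypothetical, and a page that loads still needs a human reader to confirm it says what the model claims.

```python
import urllib.request

# Sketch of a first-pass existence check on LLM-cited sources. A successful
# response does not prove the content is relevant or accurately represented.
cited_urls = [
    "https://example.com/a-cited-article",      # hypothetical citation from an LLM answer
    "https://example.org/another-cited-paper",  # hypothetical
]

for url in cited_urls:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(f"{url} -> HTTP {resp.status}")
    except Exception as err:
        print(f"{url} -> failed: {err}")
```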
Ultimately, it is crucial for individuals and organizations to understand the risks associated with LLMs, so that they can harness their tremendous benefits while avoiding potential negative impacts. By using LLMs carefully and responsibly, we can increase our learning and productivity and drive innovation.