Dispatch from the Front June 1, 2023

Top

Revised 06/01/2023 Back to Contents

The Regulating of AI, Pt. 2:
The Fox Needs Help Exiting the Henhouse

You say you want some new institutions
Well you know
Let's start by throwing you in jail
oh shew be do wah
oh shew be do wah
You tell us there could be serious complications
Well you know
We’re gonna lock you up and avoid all that Hell
— With Apologies to The Beatles

The calls for government to be the final arbiter of all things AI made by the purveyors of AI noted in the previous posting all come with a certain gauzy feel good patina about them. All seem intended to protect the unwary consumer from entities who may try to hoodwink, bamboozle, and otherwise defraud in any myriad of ways an unsuspecting public. Indeed, these are noble efforts with worthy goals. None fail to address, however, what are the more immediate threats AI can be made to make. The great need for consumer protections in the marketplace not withstanding, the proposals lack specific enforcement mechanisms that would make them real.

Sam Altman cannot be taken seriously when he implores the Senate to fashion regulations to govern his industry. On March 23, 2023, Altman's OpenAI published the "GPT-4 System card," available as a 64 page pdf file here. The document's purpose is to show the refinements that make the AI experience of GPT-4 as it was launched superior compared with earlier prelaunch versions of GPT-4 models, and certainly over the older GPT 3.5 model.

Only in baseball batting averages and the IT industry are fewer failures considered a success. What some other industries might call unmitigated disasters, OpenAI considers merely, "safety challenges presented by the model’s limitations (e.g., producing convincing text that is subtly false) and capabilities (e.g., increased adeptness at providing illicit advice, performance in dual-use capabilities, and risky emergent behaviors)." [Ibid, pg.1] If Ford Motor Company were to release for distribution and retail sales a product that came with known — and as of yet unresolved — "safety challenges," there would be a stockholder revolt and billions in liability lawsuits for any harms that arose from products known to be faulty.

So, How about those hallucinations there, Sammy boy? Getting any better, huh? Well, maybe...

GPT-4 was trained to reduce the model’s tendency to hallucinate by leveraging data from prior models such as ChatGPT. On internal evaluations, GPT-4-launch scores 19 percentage points higher than our latest GPT-3.5 model at avoiding open-domain hallucinations, and 29 percentage points higher at avoiding closed-domain hallucinations. [Ibid, pg.6]

We can safely assume that a 19% improvement is still NOT 100% free of hallucinations.

Got any data on what we might call, "Harmful Content?" Here, too, OpenAI's own statically analyses are not all that encouraging. While admitting that the Large Language Models (LLM) "can be prompted to generate different kinds of harmful content," there is not anything close to complete elimination of this possible threat to AI users. "GPT-4-early can generate instances of hate speech, discriminatory language, incitements to violence, or content that is then used to either spread false narratives or to exploit an individual." The System card authors found that "intentional probing of GPT-4-early could lead to the following kinds of harmful content."

1. Advice or encouragement for self harm behaviors
2. Graphic material such as erotic or violent content
3. Harassing, demeaning, and hateful content
4. Content useful for planning attacks or violence
5. Instructions for finding illegal content

The launch version still comes with the same limitations, but to a lessor degree. OpenAI aims to "reduce the tendency of the model to produce such harmful content.." Ibid, pg.7] A fully regulated, Ford Motor Company, does not have the option to merely reduce any tendency to failure.

The OpenAI System card goes on to list how the "Launch" release had lessened the outcomes of the issues listed above. Nevertheless, OpenAI admits that "GPT-4-launch still has limitations, which are critical to determining safe use." [Ibid, pg.8] One start to meaningful government regulation of AI would be to ban products from coming to market that have not been proven to be "100% Safe to Use." And make any unscrupulous AI vendor liable to jury verdicts for knowingly bringing defective products to market.

Complicating the simple question of the efficacy of AI products is that "GPT-4 can still be vulnerable to adversarial attacks and exploits or, “jailbreaks.” Nor does OpenAI yet have the means to prevent the breaking of their products by rogue prompts intended to bypass any safeguards built into GPT-4. [Ibid, pg.28]

We will continue to learn from deployment and will update our models to make them safer and more aligned. This will include incorporating lessons from real-world data and usage, including instances of adversarial system messages that we detect early in the process of ramping up model access.

The Internet would apparently agree that GPT-4 is indeed vulnerable to jailbreaking. As much as the OpenAI System card would want readers to believe that GPT-4 is less likely to respond to jailbreaking, a simple Google search reveals that jailbreaking of GPT-4 has become quite a cottage industry.

More disturbing is that GPT-4 is vulnerable to a new form of abuse known as prompt injection attacks. The prompt is how a user communicates with the AI. In the prompt injection attack, a bad actor is manipulating a prompt in order to cause the AI to react in ways not imagined by the original user.

Prompt injection is an attack vector that takes a trusted input, like a prompt to a chatbot, and adds an untrusted input on top. This makes the program accept the trusted input along with the untrusted input, allowing the user to bypass the LLM’s programming.

A prompt written to produce a desired and predictable output is an example of "prompt engineering." Attackers can hide their malicious codes on websites under their control, and then direct the AI to scrape the malicious data along with other data to exploit vulnerabilities in application websites like the Bing AI CoPilot. This will attack will "inject prompts from an external source, thereby widening the attack vectors available for hackers," according to Prompt Injection Threat is Real, Will Turn LLMs into Monster, March 2, 2023.

By injecting a prompt into a document that is likely to be retrieved by the LLM during inference, malicious actors can execute the prompt indirectly without additional input from the user. The engineered prompt can then be used to collect user information, turning the LLM into a method to execute a social engineering attack.

The Internet has several websites dedicated to Proof of Concept (PoC) samples of such attacks. Jailbreakchat.com is such a website.

Websites have been found that have instructions hidden in tiny image files that will send commands to the AI. Systemweakness.com has demonstrated how what it calls "Public data poisoning."

All people do copy-pastes, but in fact very few of them look carefully at what they actually paste. An attacker can easily add a javascript code which will intercept all copy events for the text element or even for the whole webpage and inject a malicious ChatGPT prompt into the copied text.

And the article referenced gives examples of snippets of code to accomplish the hack. And then lists the "Possible consequences: 1. Sensitive data leakage including full prompts, code, passwords, API keys. 2. Inserting phishing links into ChatGPT output. 3. Polluting ChatGPT output with garbage images."

The question facing those who would suggest any regulatory schemes for governing AI is to what degree should OpenAI, and the other vendors of AI products, be held accountable for abuse and misuse of their products? There are two models of successful industry regulation by Uncle Sam: the automobile industry and the tobacco industry. In those cases, both government and the courts held those responsible for harms caused by their products liable both legally and monetarily. The is the only form of effective regulation are those regulations that come with teeth.

¯\_(ツ)_/¯
Gerald Reiff

Back to Top ← previous post next post →