I’m helping us laugh, and possibly learn, by making one of my humorous books available as a free download in each of this year’s first six columns. The second book is How to become an instrument engineer – Part 1.523. Humor can open minds, and it can be fun to be silly.
Now to get serious, we’ll learn what can and can’t be done with artificial intelligence (AI) to help us deal with the challenges in our profession. We’re fortunate to have a longtime associate, Randolf Riess, offer insights on what works and what doesn’t when it comes to AI.
Greg: Randy, how can we reduce confusion and understand the AI we see today?
Randy: I've been working with AI for the last few years, and it's been interesting because the hype of what AI can do has it orbiting the moon. However, what AI can actually do is closer to an altitude of a few thousand feet.
Specifically, most of what people call AI are large language models (LLMs), and they do exactly what they sound like: they model language. They’re trained on language to be word predictors. They take a set of words as an input (a prompt) and generate the words that best fit that input based on the model’s training. For example, if the prompt is a question about quantum physics, the LLM generates the words most often associated with an answer to that question. But the LLM doesn’t understand quantum physics. It just regurgitates words.
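To make “word predictor” concrete, here’s a minimal sketch in Python, assuming the Hugging Face transformers library and the small GPT-2 model (both are just example choices). The model scores every possible next token, and we simply pick the likeliest one:

```python
# Minimal sketch of next-word prediction with a small language model.
# gpt2 is just a convenient demonstration model.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The control valve regulates the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every token in the vocabulary

# Pick the single most likely next token: statistics, not understanding.
next_id = logits[0, -1].argmax()
print(tokenizer.decode(next_id))
```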
The problem arises because the answer is in the form of conversational natural language that people associate with a human. Thus, they start to associate human-like thinking with the human-like response, even though there is no thinking going on in the AI. So, people start to extrapolate beyond what AI can do. It’s personification.
What’s also confusing about what’s being called AI is that it comes in two different forms. One form is the massive AI models hosted by a company. These are the headline grabbers: OpenAI’s GPT-4o, Meta’s Llama and Anthropic’s Claude. To use these models, you send your request, in the form of a prompt, to the company’s servers. The model processes your prompt on the hosting company’s hardware (in the cloud) and returns an answer. The user is charged based on the number of “tokens” in the prompt and in the output.
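As a rough sketch of what that looks like in practice, here’s a call to a hosted model using OpenAI’s Python SDK (the model name and the prompt are placeholders):

```python
# Sketch of calling a hosted model over the Internet (OpenAI's Python SDK).
# Requires an API key; you're billed per token in the prompt and the reply.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Explain cascade control in two sentences."}],
)
print(response.choices[0].message.content)
print(response.usage)  # token counts: this is what you're charged for
```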
The second form includes smaller, pre-trained models that can be downloaded to one’s own computer (or cloud service) and executed. Depending on the model’s size, this can require a lot of costly computing power. Smaller LLMs can be trained with an individual’s data to perform specific tasks. Their smaller size means they generally can’t match the massive AI models across the board, but they can sometimes return better results on a specific task because they’re trained on data specific to it. Using smaller models requires a data scientist to train and test them.
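A minimal sketch of running a downloaded model locally, assuming the Hugging Face transformers library (the tiny distilgpt2 model is for demonstration only; a real deployment would pick a model sized to the task and the hardware):

```python
# Sketch of running a small, pre-trained model on your own machine
# instead of in a hosting company's cloud.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("A distillation column separates", max_new_tokens=30)
print(result[0]["generated_text"])
```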
Greg: What does AI do well?
Randy: Mostly, generate words.
AI can be used to translate from one language to another (see the original use of LLMs by Geoffrey Hinton).
AI does a good job of consuming various types of disparate text data supplied in a prompt and generating a summary. This is the basis of so-called retrieval-augmented generation (RAG), which uses other technology to search for information, places that information in the prompt, and uses the LLM to summarize it.
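The pattern, in sketch form: retrieve first with other technology, then let the LLM summarize. The search_documents() helper below is hypothetical, a stand-in for whatever retrieval you use (keyword search, a vector database, etc.):

```python
# Sketch of the RAG pattern: search first, then let the LLM summarize.
from openai import OpenAI

client = OpenAI()

def answer_with_rag(question: str) -> str:
    passages = search_documents(question)  # hypothetical retrieval step, not the LLM
    context = "\n\n".join(passages)
    prompt = ("Using only the information below, answer the question.\n\n"
              f"{context}\n\nQuestion: {question}")
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```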
AI does a great job of sounding human and even sounding like it knows what it’s talking about. However, it’s like getting advice or information from TikTok. You really have no idea where the information came from or whether it’s accurate.
Smaller LLMs can be trained on question-answer pairs or classifications of natural language text to create an embedding model that’s used for semantic search. It can be used as a high-powered search engine that understands the semantic meaning of words specific to your data, and can return very detailed search results from a massive corpus of natural language content.
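Here’s a minimal sketch of that kind of semantic search, assuming the sentence-transformers library (the model name and the toy documents are illustrative only):

```python
# Sketch of semantic search with an embedding model.
# The query matches on meaning ("rust" ~ "corrosion"), not keywords.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["Replace the valve positioner on tank T-101.",
        "Quarterly insulation inspection for steam lines.",
        "Corrosion found on the east tank farm piping."]
doc_vecs = model.encode(docs, convert_to_tensor=True)

query_vec = model.encode("rust on pipes", convert_to_tensor=True)
scores = util.cos_sim(query_vec, doc_vecs)  # semantic similarity scores
print(docs[int(scores.argmax())])
```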
Greg: What does AI do poorly?
Randy: Anything analytical. Specifically, anything with a single right answer that can be worked out, including math, logic and logistics. LLMs are built to sound and act like a human as their primary function; being correct is secondary (or tertiary). LLMs introduce variation into their answers to sound more human, even when that variation includes incorrect answers.
For example, one of the first AI applications was in the travel business to generate itineraries. It was quickly shown to sound good, but it was often logistically problematic. For example, a traveler was sent to a museum in the morning because it wasn’t crowded, only to find out that the museum was closed in the morning, which is why it wasn’t crowded.
LLMs do a poor job when results need to be consistent. Even with LLM parameters set to minimize randomness, it’s almost impossible to get the same answer every time. Expect LLMs to be wrong 10-20% of the time.
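For what it’s worth, here’s a sketch of the usual knobs, assuming OpenAI’s Python SDK. Even with the temperature at zero and a fixed seed (which the API treats as best-effort), identical answers aren’t guaranteed:

```python
# Sketch of the usual parameters for reducing randomness in a hosted LLM call.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "List three causes of pump cavitation."}],
    temperature=0,  # pick the most likely token instead of sampling
    seed=42,        # best-effort reproducibility, not a guarantee
)
print(response.choices[0].message.content)
```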
Please note this rules out LLMs for any feedback control application, which is analytical by nature.
Greg: What is an AI technology win?
Randy: A multimodal LLM can take, as input, data that’s more than text. Specifically, it can take images, audio and, some say, video, though I’ve never tried that. The ability to take one or more images as input is groundbreaking, though still very new. Multimodal LLMs started to be useful with the release of OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, as well as a few others introduced around the same time. What I’ve seen multimodal LLMs do with respect to computer vision tasks, without any special training, is amazing.
Keep in mind, the previous state of the art for computer vision (CV) was convolutional neural networks and “you only look once” (YOLO) models, which were blank frameworks requiring training with thousands of images that had to be hand-annotated and validated to detect a specific type of object. It can take up to six months to train one CV model to detect a few types of objects in a restricted setting, which is cost- and time-prohibitive for most use cases. That’s where computer vision has been for the last 10 years.
With a multimodal LLM, you can send an image, ask a question about it and, generally, get a useful answer back. The more information you provide about what you want it to do, the better the results. It’s the closest thing to “thinking” I’ve seen from any AI. Of course, it’s not actually thinking and can be easily fooled. However, no training is required. You write the prompt, send the image, and the AI returns text describing what’s in the image in a few minutes.
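As a sketch of what that prompt looks like, assuming OpenAI’s Python SDK, you attach the image alongside the question (the file name and question are placeholders):

```python
# Sketch of sending an image plus a question to a multimodal LLM.
import base64
from openai import OpenAI

client = OpenAI()

with open("tank_farm.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Is there visible rust or a possible leak on any tank?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```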
Greg: What are possible wins for industrial applications in maintenance and grounds monitoring?
Randy: I worked with several industrial companies that spent a lot of money on people and equipment for capturing images from drones or walking rounds that could be processed into visual, digital-twin reconstructions. However, the vast majority of that imagery goes into storage because the companies lack the human power to look through it all. Multimodal LLMs are an extra set of eyes on those images.
Multimodal LLMs can look through images for a wide range of use cases, including:
- Detecting routine maintenance issues, such as insulation missing from pipes, conduit body covers missing, rust and corrosion on pipes and tanks over time, slow leaks, unsafe equipment, and people/vehicles/equipment in areas where they shouldn’t be;
- Using drones to detect faults in hard-to-reach locations, damaged devices, equipment inventories, animal nests, or to inspect powerlines, cell towers and distillation columns;
- Reading identification tags on equipment and locating them with the image GPS;
- Estimating volume of ditches and mines, tree counts and vegetation coverage, or animal identification; and
- Grounds maintenance, including fencing upkeep, downed trees, erosion, etc.
Ideally, these images are captured on a regular basis, and results from multimodal LLMs are stored in a database by time, physical location and asset identification. The result is a database that can be queried, such as “show me a list of tanks with corrosion and possible leakage in the last six months,” or can generate a workorder ticket if certain conditions are met. For example, it may generate a maintenance workorder if rust or corrosion plus possible leak conditions are detected more than five times in the last month on a storage tank in a tank farm.
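A sketch of what that query and workorder rule might look like, assuming a SQLite database with a hypothetical findings(asset_id, asset_type, condition, detected_at) table holding the LLM’s results:

```python
# Sketch: query stored multimodal-LLM findings and flag tanks that meet
# the "more than five detections in the last month" workorder rule.
import sqlite3

conn = sqlite3.connect("inspections.db")
rows = conn.execute("""
    SELECT asset_id, COUNT(*) AS hits
    FROM findings
    WHERE asset_type = 'storage_tank'
      AND condition IN ('rust', 'corrosion', 'possible_leak')
      AND detected_at >= datetime('now', '-1 month')
    GROUP BY asset_id
    HAVING hits > 5
""").fetchall()

for asset_id, hits in rows:
    print(f"Generate maintenance workorder for {asset_id} ({hits} detections)")
```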
Greg: What are some AI limitations in industrial applications?
Randy: These are mostly offline applications that rely on many responses over time to determine a condition, which also mitigates the error rate of LLMs. Multimodal LLMs don’t handle time-critical safety issues well. They aren’t reliable for lifesaving or safety-related uses. They can indicate a safety-related condition, which must then be verified by a human. That is, if a plume of smoke is detected in a tank farm, they can send an alert, but they must not be relied on as the only means of detecting a fire. Multimodal LLMs are, at best, better-than-nothing technology for time-critical safety.
The main reason multimodal LLMs can’t be used as safety systems is because they’re massive models hosted on cloud computers by companies such as OpenAI, Anthropic, Meta, etc. They must be accessed by an application program interface (API) call over the Internet. Large, hosted LLMs aren’t highly available, and response time isn’t guaranteed. OpenAI serves requests from an eighth grader researching New Zealand with the same response time, accuracy and reliability as it would an image of a burning tank farm. LLMs, or the connections to them, can go down at any time. The hosting company simply doesn’t charge you for requests it can’t respond to while it’s down. This restricts hosted LLMs to offline use cases that aren’t time-sensitive.
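To make the point concrete, here’s a sketch of the defensive pattern any such application needs, assuming OpenAI’s Python SDK; notify_operator() is a hypothetical stand-in for a non-LLM alert path:

```python
# Sketch: an LLM call over the Internet can time out or fail outright,
# so the application must have a fallback that doesn't depend on the LLM.
from openai import OpenAI, APITimeoutError, APIConnectionError

client = OpenAI(timeout=10.0, max_retries=2)  # bounded wait, a couple of retries

def check_image_or_fall_back(messages):
    try:
        return client.chat.completions.create(model="gpt-4o", messages=messages)
    except (APITimeoutError, APIConnectionError):
        # Service or connection is down: hand off to human review instead.
        notify_operator("LLM check unavailable; manual inspection required")
        return None
```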
Many failures at plants could be avoided if there were enough eyes to make simple assessments over time. For example, detecting that vegetation has overgrown a tank, hiding any leak from view, could trigger a workorder to mow that tank farm, increasing the odds of spotting a slow leak. This can save lives and prevent larger issues from occurring.
Greg: Stay tuned for the next column to learn about how we can capture process control knowledge that’s in the heads of all the gray-hairs who are retiring.