At first, she stayed away from AI. But after working as an AI product tester for a year, Kiara Smith had found a new mission.
“If we can program AI so that it never delivers false content, we can prevent it from doing the two scariest things—imitating humans or being used by humans for political purposes,” she told The AI Journal.
While studies show that, overall, members of Gen Z mostly embrace AI (Smith is 26), some of those on the front lines of AI testing have reservations.
Similar to human processing?
Smith asked that personal details, including her name, and identifying details of her work be disguised for fear of jeopardizing her position.
Despite her reservations about AI, however, she is relieved to feel she can do something to make a difference, even on a small scale.
Smith tests AI bots for a mysterious company.
All assignments, payments, and tax documents come through a third-party payment system. But the type of testing she does suggests the company is not merely testing an “expert system,” one that relies on pre-programmed knowledge, but a system engaged in deep learning, a much more advanced form of AI in which the machine learns from the data itself what questions to ask.
Smith and another tester, a few years younger and engaged in the same work, described evaluating AI bots not only on their ability to harness information or carry on a conversation but also on ethical concerns.
Moreover, the evaluations revealed how the AI systems seemed to operate: surprisingly like the way humans approach knowledge acquisition on the Internet.
At the same time, their work suggested that growing concerns about AI systems exhausting the available data on the Internet are being addressed by having human respondents create data on demand.
“Empowering humans”
When Smith began work—comparing answers given by two different AI bots to the same question—it took her a while to get used to it.
Questions were posed to the competing AI systems, such as, “How often do you need to change your refrigerator’s water filter?”
The prompts did not come from her.
Her job was to evaluate the quality of each response.
But the criteria were limited.
They included such benchmarks as: Did the bot actually answer the question? Were the writing style and tone appropriate?
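The setup resembles the side-by-side human-feedback comparisons commonly used to evaluate and train chatbots. As a purely illustrative sketch, with the structure and field names assumed rather than taken from Smith’s actual tooling, one such task might be recorded like this:

```python
from dataclasses import dataclass, field

@dataclass
class ComparisonTask:
    """One side-by-side rating task: a single prompt, two bot answers."""
    prompt: str    # e.g., "How often do you need to change your refrigerator's water filter?"
    answer_a: str  # response from the first AI system
    answer_b: str  # response from the second AI system
    # The limited criteria Smith describes: did each bot actually
    # answer the question, and were its style and tone appropriate?
    answered_question: dict[str, bool] = field(default_factory=dict)
    style_appropriate: dict[str, bool] = field(default_factory=dict)
    preferred: str = ""  # "a" or "b": the tester's overall pick
```

Notably, a rubric like this would have no field for the criterion Smith came to care about most.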
Smith soon found, however, that her ability to rate the answers was constrained by the given criteria.
“Both answers were accurate,” she said.
And there was little difference in style.
What she wanted to rate the content on, however, was much more elusive.
It was a criterion—or a value—she came up with herself.
“I started thinking that what really mattered was which answer would be easier for the user to relate to,” she said.
Thus, she began choosing answers that used bullet points.
But even more importantly, she realized the purpose of AI was to “empower humans.”
So she started looking past answers that simply gave a terse response, choosing instead those that explained how they had arrived at their answers and included sources for the user to get more information.
This included personal conversations.
“We want a chat bot to be empathetic, but we don’t want it ever to impersonate a human being,” she said. “That would violate its most important principle—to supply factual information. A machine can’t lie and say it’s a human being.”
AI using search queries
Another tester doing similar work, however, found that deep learning systems were trained to act as humans—at least, in the initial phase of their processing.
She discovered this through the very nature of the work assigned to her.
Her task was to evaluate the search queries that AI systems created to respond to a user’s question.
The search queries the AI created, she found, were simply run through a search engine, the same as any human’s question.
“I realized,” said Jennifer Engels, 23, who also asked for a pseudonym, “that these AI bots were simply taking our questions, then analyzing mountains of data, comparing the data with our questions, then coming up with the kind of query we might put into a search engine if we had thought a little more about it.”
In other words, for the AI systems she was evaluating, the process worked in the following fashion:
A human user would ask the AI bot a question.
Then the AI bot would simply reframe the question and plug it into a search engine.
Finally, it would “read” the results that came out of the search engine, much as a user would read several sites, and summarize them according to a matrix.
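The loop Engels describes matches what engineers commonly call retrieval-augmented generation. The sketch below is a rough illustration of that flow, not code from the systems she tested; every function is a hypothetical stand-in for a model or service call.

```python
# Hypothetical sketch of the question -> query -> summary loop Engels
# describes. Each helper is a stand-in for a model or service call.

def rewrite_as_search_query(user_question: str) -> str:
    """Step 1: reframe the question as the query a user might type
    'if we had thought a little more about it' (a model call in practice)."""
    return user_question.rstrip("?") + " recommended interval"  # toy rewrite

def run_search_engine(query: str) -> list[str]:
    """Step 2: run the query through an ordinary search engine,
    the same as any human's query (a search API call in practice)."""
    return [f"result snippet about: {query}"]  # placeholder results

def summarize_results(question: str, snippets: list[str]) -> str:
    """Step 3: 'read' the results, much as a user would skim several
    sites, and summarize them (another model call in practice)."""
    return f"{question} Based on {len(snippets)} source(s): " + "; ".join(snippets)

def answer(user_question: str) -> str:
    query = rewrite_as_search_query(user_question)
    snippets = run_search_engine(query)
    return summarize_results(user_question, snippets)

print(answer("How often do you need to change your refrigerator's water filter?"))
```

Nothing in the middle step is exotic: in Engels’s telling, the bot simply behaves like a more deliberate human searcher.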
Too easy?
The only place where the bots produced no search queries, said Engels, was in personal conversations.
But these, she said, were disappointing.
And she stopped testing them after several misfires.
“What should I do now that I’ve had a fight with a friend?” she asked one day.
“Have you tried talking with your friend?” was the response.
For the moment, at least, her coworker’s worries about AIs impersonating humans were still not quite justified.
“But for gathering factual information, I’ve personally found them very helpful,” she said.
Engels had her own business, selling fashion templates online: designs for clothes and other wearable items.
For six months, whenever she could input a prompt, she asked one of the AI bots she was testing for help.
“It came up with lots of ideas,” she said.
But, in the end, she decided to drop her business.
While she said her decision was due to the hassle of running a small business, it’s not clear whether relying on AI for such essential tasks had robbed her of a sense of accomplishment and agency.
To hear her coworker Smith tell it, that might have been at least a small factor. Perhaps AI had made that part of it too easy.