Timon Harz
December 20, 2024
Expert Reviews of Google Gemini Outputs Coming from Non-Experts: A Closer Look
External testers are now required to address prompts outside of their "domain knowledge."
Like any generative AI model, Google Gemini can occasionally give inaccurate responses, but in this case, the issue may stem from testers lacking the necessary expertise to fact-check them. TechCrunch reports that GlobalLogic, the company hired to enhance Gemini's accuracy, is now asking testers to evaluate responses even when they lack relevant "domain knowledge."
The report questions the rigor and standards Google claims to use when testing Gemini for accuracy. In the "Building responsibly" section of the Gemini 2.0 announcement, Google stated it is "working with trusted testers and external experts, and conducting extensive risk assessments, safety, and assurance evaluations." While there's a strong emphasis on evaluating responses for sensitive and harmful content, less attention seems to be given to responses that are inaccurate, though not necessarily harmful.
Google appears to downplay the issue of hallucinations and errors by simply adding a disclaimer that "Gemini can make mistakes, so double-check it," effectively distancing itself from any responsibility. That disclaimer, however, overlooks the role of the humans behind the scenes who rate the model's responses.
Previously, GlobalLogic, a subsidiary of Hitachi, instructed its prompt engineers and analysts to skip any Gemini responses they didn't fully understand. According to the guidelines seen by TechCrunch, they were told, "If you do not have critical expertise (e.g., coding, math) to rate this prompt, please skip this task."
But last week, GlobalLogic revised its instructions, which now state, "You should not skip prompts that require specialized domain knowledge." Contractors are instead told to "rate the parts of the prompt you understand" and to note in their analysis that they lack the required expertise. In other words, expertise is no longer considered a prerequisite for this work.
Now, contractors can skip only prompts that are "completely missing information" or that contain sensitive content requiring a consent form, TechCrunch reports.