Timon Harz

December 20, 2024

"Expert" Reviews of Google Gemini Outputs Are Coming from Non-Experts: A Closer Look

External testers are now required to address prompts outside of their "domain knowledge."

Like any generative AI model, Google Gemini can occasionally give inaccurate responses, but in this case the problem may stem from testers who lack the expertise to fact-check them. TechCrunch reports that the company hired to improve Gemini's accuracy is now asking testers to evaluate responses even when they lack the relevant "domain knowledge."

The report raises questions about the rigor and standards Google claims to apply when testing Gemini for accuracy. In the "Building responsibly" section of the Gemini 2.0 announcement, Google stated it is "working with trusted testers and external experts, and conducting extensive risk assessments, safety, and assurance evaluations." While there is a strong emphasis on evaluating responses for sensitive and harmful content, less attention seems to be given to responses that are inaccurate but not necessarily harmful.

Google appears to downplay the issue of hallucinations and errors by simply adding a disclaimer that "Gemini can make mistakes, so double-check it," effectively distancing itself from any responsibility. However, this overlooks the role of the humans behind the scenes.

Previously, GlobalLogic, a subsidiary of Hitachi, instructed its prompt engineers and analysts to skip any Gemini responses they didn't fully understand. According to the guidelines seen by TechCrunch, they were told, "If you do not have critical expertise (e.g., coding, math) to rate this prompt, please skip this task."

But last week, GlobalLogic revised its instructions. Contractors are now told, "You should not skip prompts that require specialized domain knowledge," and should instead "rate the parts of the prompt you understand" while noting in their analysis that they lack the required expertise. In other words, expertise is no longer considered a prerequisite for this work.

Contractors can now skip only prompts that are "completely missing information" or that contain sensitive content requiring a consent form, TechCrunch reports.

