AI Chatbots Are Failing at Herbicide Safety: Why Farmers Can't Trust Them Yet

When farmers turn to artificial intelligence for advice on herbicide use and weed management, they're getting dangerously incomplete answers. A rigorous test of four major AI chatbots, including ChatGPT, Google Gemini, and DeepSeek, revealed that even the highest-scoring bot achieved only a 60% accuracy rate on real-world agricultural questions. For farmers making decisions about chemical applications that affect their crops, soil, and long-term sustainability, this margin of error is unacceptable.

Why Are AI Chatbots Struggling With Herbicide Questions?

Researchers from Penn State University and other land-grant institutions tested four chatbots with 24 realistic questions covering integrated weed management, herbicide resistance, weed biology, and pesticide label interpretation. The results were sobering. ChatGPT, the most widely used bot, scored 60%, while DeepSeek, Google Gemini, and ExtensionBot (a nonprofit alternative) scored between 51% and 56%. But the real problem wasn't just the scores; it was what happened when the bots got things wrong.

When Penn State scientist John Wallace asked DeepSeek whether metribuzin could control atrazine-resistant pigweeds, the bot confidently answered yes, claiming metribuzin was a "key alternative mode of action." The reality is far more complex. Metribuzin and atrazine are both Group 5 herbicides that inhibit Photosystem II, so cross-resistance can occur depending on the specific mechanism of atrazine resistance. Without sources to verify its claim, DeepSeek had potentially misled a farmer with dangerous confidence.

In another troubling example, Google Gemini claimed that herbicide resistance accounts for only one out of every ten herbicide application failures, yet provided no source for this statistic. "I'm left wondering where they came up with that number," Wallace noted. These aren't minor errors; they're the kind of misinformation that could lead farmers to waste money on ineffective treatments or make poor decisions about crop rotation and chemical management.

What's the Biggest Weakness: Reading Herbicide Labels?

The most alarming finding was how poorly all four bots performed when asked to interpret herbicide labels. Accuracy ranged from just 19% to 43%, with ExtensionBot performing best at 43%. This is critical because herbicide labels contain legally binding instructions, safety warnings, and usage restrictions that determine whether a chemical application is safe and effective.

When testers asked whether clethodim could be used for grass control in alfalfa, all four bots answered yes. However, each failed to mention crucial nuances: that some grass species are not susceptible to clethodim, that tank mix antagonism risks exist, and most importantly, that users should read the actual label. DeepSeek's answer went completely off track, pivoting to discuss "autotoxicity of alfalfa," which has nothing to do with the question. These omissions could lead to crop damage, wasted chemicals, and environmental harm.

The problem is partly structural. The bots have no way to distinguish the current, legally valid label on the product jug from the many outdated versions floating around online. As one researcher explained, "Many versions of labels exist on the internet, likely contributing to the bots' errors. But only one label, the one on the jug, matters."

How Does ExtensionBot Compare to Corporate AI?

While ExtensionBot didn't win on overall accuracy, it demonstrated a critical advantage: transparency. Unlike ChatGPT and DeepSeek, which rarely provided sources, ExtensionBot consistently linked to the sources of its information, allowing users to verify accuracy and learn more about topics. Google Gemini provided sources only about half the time.

ExtensionBot, developed by the Extension Foundation as a nonprofit alternative, pulls its information exclusively from Cooperative Extension institutions. This limited scope sometimes meant shallower answers compared to corporate bots that sweep across the entire internet. For example, when asked about flaming as a weed control tool, ChatGPT provided a more comprehensive analysis covering weed seed biology and economics, though without sources. ExtensionBot's answer was accurate but more basic.

However, ExtensionBot's narrow focus proved advantageous when answering complex questions. When asked how to determine if a weed's survival is due to herbicide resistance, corporate bots pulled in irrelevant details and unsourced statistics. ExtensionBot, drawing from a small handful of Extension factsheets, provided a "very good and succinct" answer that directly addressed the question.

Steps to Safely Use AI for Agricultural Questions

  • Verify with Official Labels: Never rely solely on AI chatbot answers for herbicide application decisions. Always consult the actual product label on the jug, which is the legally binding source of truth for safe and effective use.
  • Cross-Check Sources: If an AI bot provides an answer, ask it to cite its sources. If it cannot provide them, treat the answer with skepticism. ExtensionBot's approach of linking to Extension factsheets allows you to independently verify claims.
  • Consult Extension Specialists: For complex, multi-step questions about herbicide resistance, weed biology, or integrated weed management, contact your local Cooperative Extension office or a certified agronomist rather than relying on AI as your primary advisor.
  • Expect Inconsistent Answers: Use AI chatbots as a first step to gather general information, but recognize that they may give different answers to identical questions on different days, making them unreliable for critical decisions.
  • Prioritize Transparency: When choosing an AI tool, favor those that provide sources and citations over those that offer confident answers without evidence.

The researchers who conducted this evaluation were clear in their conclusion: "Maybe consider AI bots only a first step in seeking out help on agricultural questions, particularly nuanced ones and definitely legal ones involving labels." For farmers managing chemical exposure, herbicide resistance, and sustainable weed control, this guidance is essential.

What Does This Mean for Chemical Safety and Sustainable Farming?

The failure of AI chatbots to accurately interpret herbicide labels and provide nuanced guidance on chemical use has broader implications for clean living and environmental health. Herbicide misuse contributes to herbicide-resistant weeds, which then require higher doses or additional chemicals. Inaccurate advice about herbicide resistance can perpetuate a cycle of increasing chemical dependence on farms.

Additionally, when farmers receive incomplete or incorrect information about herbicide options, they may default to the most familiar or heavily marketed chemicals rather than exploring integrated weed management strategies that reduce overall chemical use. This underscores why access to reliable, transparent information from trusted sources like Cooperative Extension is so important for both farm profitability and environmental protection.

As AI tools become increasingly integrated into agricultural decision-making, the stakes are high. A farmer making a herbicide choice based on a 51% to 60% accurate AI recommendation could waste money, damage crops, harm soil health, or contribute to environmental contamination. Until these tools improve significantly, farmers should view them as supplementary research aids, not primary advisors for chemical management decisions.