
Artificial Intelligence News
A benchmarking study by the Electric Power Research Institute (EPRI) of leading LLMs, including GPT-5 and Gemini 2.5 Pro, revealed a critical reliability gap in technical AI. The models scored well on simple multiple-choice questions about the electrical grid (83–86%), but their average accuracy dropped by 27 percentage points on complex, open-ended, expert-level questions, with scores as low as 46%. The results underscore the need for expert human oversight in critical infrastructure tasks.
