A Stanford-led research team that includes Fei-Fei Li has published a benchmark documenting a specific, measurable failure: ...