VisualWebArena is a realistic and diverse benchmark for evaluating multimodal autonomous language agents. It comprises a set of diverse and complex web-based visual tasks that evaluate various ...