- Vision
- Language
- Action
- Image Classification
- Object Detection
Visualization of the estimated test case rating distribution and agent ratings on six distinct datasets. The percentile curve shows the cumulative percentage of test cases at or below each rating level. For each agent, the test cases lying to the right of the agent's rating, whose share equals 100% minus the percentile curve's value at that rating, represent the fraction of the dataset that remains difficult for that agent (below 50% confidence).
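The caption's reading of the percentile curve can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' method. It assumes ratings live on a logistic (Rasch/Elo-style) scale on which an agent and a test case of equal rating give exactly 50% success probability. The helper names `success_prob`, `percentile_curve`, and `difficult_fraction` are hypothetical.

```python
import math
from bisect import bisect_right


def success_prob(agent_rating: float, case_rating: float) -> float:
    # ASSUMPTION: logistic link on the rating difference; equals 0.5
    # when the agent and the test case have the same rating.
    return 1.0 / (1.0 + math.exp(-(agent_rating - case_rating)))


def percentile_curve(case_ratings: list[float], level: float) -> float:
    # Cumulative fraction of test cases with rating <= level.
    ordered = sorted(case_ratings)
    return bisect_right(ordered, level) / len(ordered)


def difficult_fraction(agent_rating: float, case_ratings: list[float]) -> float:
    # Test cases rated strictly above the agent lie to its right; under the
    # assumed link they are solved with probability below 0.5.
    return 1.0 - percentile_curve(case_ratings, agent_rating)


if __name__ == "__main__":
    cases = [-1.0, 0.0, 0.5, 1.2, 2.0]
    print(percentile_curve(cases, 0.5), difficult_fraction(0.5, cases))
```

With case ratings `[-1.0, 0.0, 0.5, 1.2, 2.0]` and an agent rated 0.5, the percentile curve reads 60% at the agent's rating, so 40% of the test cases (those rated 1.2 and 2.0) remain difficult. A case rated exactly at the agent's level sits at 50% and is not counted as difficult.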
Visualization of the predicted (theoretical) agent performance, derived from the rating differences between agents and test cases, versus the empirical performance obtained on each dataset.
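Below is a similarly hedged sketch of how a predicted (theoretical) performance could be computed from rating differences and compared with the empirical score. It assumes a hypothetical logistic link between the agent-minus-test-case rating difference and success probability. It also treats each test case as a binary pass/fail outcome, and all helper names are hypothetical. Datasets scored with continuous metrics would need a different empirical measure.

```python
import math
from statistics import mean


def success_prob(agent_rating: float, case_rating: float) -> float:
    # ASSUMPTION: logistic (Rasch/Elo-style) link on the rating difference.
    return 1.0 / (1.0 + math.exp(-(agent_rating - case_rating)))


def predicted_performance(agent_rating: float, case_ratings: list[float]) -> float:
    # Expected fraction of test cases solved under the assumed model.
    return mean(success_prob(agent_rating, r) for r in case_ratings)


def empirical_performance(outcomes: list[int]) -> float:
    # Observed fraction of test cases solved (1 = solved, 0 = failed).
    return mean(outcomes)


if __name__ == "__main__":
    cases = [-1.0, 0.0, 1.0]
    print(predicted_performance(0.0, cases), empirical_performance([1, 1, 0]))
```

For an agent rated 0.0 on test cases rated `[-1.0, 0.0, 1.0]`, the predicted performance is 0.5 because the logistic is symmetric around a zero difference. If the agent actually solved two of the three cases, its empirical performance is about 0.67, and the gap between the two numbers is what a predicted-versus-empirical plot makes visible.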
Visualization of the estimated test case rating distribution and agent ratings on object detection datasets. The percentile curve shows the cumulative percentage of test cases at or below each rating level. For each agent, the test cases lying to the right of the agent's rating, whose share equals 100% minus the percentile curve's value at that rating, represent the fraction of the dataset that remains difficult for that agent (below 50% confidence).
Visualization of the predicted (theoretical) agent performance, derived from the rating differences between agents and test cases, versus the empirical performance obtained on object detection datasets.
- Question & Answering
- Code Generation
Visualization of the estimated test case rating distribution and agent ratings on question answering datasets. The percentile curve shows the cumulative percentage of test cases at or below each rating level. For each agent, the test cases lying to the right of the agent's rating, whose share equals 100% minus the percentile curve's value at that rating, represent the fraction of the dataset that remains difficult for that agent (below 50% confidence).
Visualization of the predicted (theoretical) agent performance, derived from the rating differences between agents and test cases, versus the empirical performance obtained on question answering datasets.
Visualization of the estimated test case rating distribution and agent ratings on code generation datasets. The percentile curve shows the cumulative percentage of test cases at or below each rating level. For each agent, the test cases lying to the right of the agent's rating, whose share equals 100% minus the percentile curve's value at that rating, represent the fraction of the dataset that remains difficult for that agent (below 50% confidence).
Visualization of the predicted (theoretical) agent performance, derived from the rating differences between agents and test cases, versus the empirical performance obtained on code generation datasets.
- Motion Prediction
- Motion Planning
Visualization of the estimated test case rating distribution and agent ratings on motion prediction datasets. The percentile curve shows the cumulative percentage of test cases at or below each rating level. For each agent, the test cases lying to the right of the agent's rating, whose share equals 100% minus the percentile curve's value at that rating, represent the fraction of the dataset that remains difficult for that agent (below 50% confidence).
Visualization of the predicted (theoretical) agent performance, derived from the rating differences between agents and test cases, versus the empirical performance obtained on motion prediction datasets.
Visualization of the estimated test case rating distribution and agent ratings on motion planning datasets. The percentile curve shows the cumulative percentage of test cases at or below each rating level. For each agent, the test cases lying to the right of the agent's rating, whose share equals 100% minus the percentile curve's value at that rating, represent the fraction of the dataset that remains difficult for that agent (below 50% confidence).
Visualization of the predicted (theoretical) agent performance, derived from the rating differences between agents and test cases, versus the empirical performance obtained on motion planning datasets.