Leader Board

Here are the current state-of-the-art systems:

Rank  System                                      Who-did-What  CNN
0     Human performance                           84/100        0.75+
1     GA with word features                       0.712         0.77
      GA with word features (relaxed train)       0.726         --
2     NSE Adaptive Computation                    0.662         --
      NSE Adaptive Computation (relaxed train)    0.667         --
3     Stanford Reader                             0.64          0.73
      Stanford Reader (relaxed train)             0.65          --
4     Gated-Attention Reader                      0.57          0.74
      Gated-Attention Reader (relaxed train)      0.60          --
5     Attention Sum                               0.57          0.70
      Attention Sum (relaxed train)               0.59          --
6     Attentive Reader                            0.53          0.63
      Attentive Reader (relaxed train)            0.55          --
7     Semantic features                           0.52          --
8     Sliding window + Distance                   0.51          --
Human Performance

On the Who-did-What dataset, two native speakers of American English each annotated 50 questions randomly selected from the test set, answering 84 of the 100 correctly in total. Human performance on the CNN dataset is estimated in Chen et al. (2016).

Help us keep the leaderboard updated

Reports of new state-of-the-art results are more than welcome. Please contact us; we appreciate your help in keeping the leaderboard up to date. (contact)

References
  1. GA with word features (Dhingra et al., 2016)
  2. NSE Adaptive Computation (Munkhdalai et al., 2016)
  3. Stanford Reader (Chen et al., 2016)
  4. Gated-Attention Reader (Dhingra et al., 2016)
  5. Attention Sum (Kadlec et al., 2016)
  6. Attentive Reader (Hermann et al., 2015)
  7. Semantic Features (Wang et al., 2015)
  8. Sliding window + Distance (Richardson et al., 2013)