Leaderboard
Here are the current state-of-the-art systems:
| Ranking | System | Who-did-What (accuracy) | CNN (accuracy) |
|---|---|---|---|
| 0 | Human performance | 84/100 | 0.75+ |
| 1 | GA with word features | 0.712 | 0.77 |
|   | GA with word features (relaxed train) | 0.726 | -- |
| 2 | NSE Adaptive Computation | 0.662 | -- |
|   | NSE Adaptive Computation (relaxed train) | 0.667 | -- |
| 3 | Stanford Reader | 0.64 | 0.73 |
|   | Stanford Reader (relaxed train) | 0.65 | -- |
| 4 | Gated-Attention Reader | 0.57 | 0.74 |
|   | Gated-Attention Reader (relaxed train) | 0.60 | -- |
| 5 | Attention Sum | 0.57 | 0.70 |
|   | Attention Sum (relaxed train) | 0.59 | -- |
| 6 | Attentive Reader | 0.53 | 0.63 |
| 7 | Semantic Features | 0.52 | -- |
| 8 | Sliding window + Distance | 0.51 | -- |
|   | Attentive Reader (relaxed train) | 0.55 | -- |
On the Who-did-What dataset, two native speakers of American English annotated 50 questions randomly selected from the test set and achieved a combined score of 84/100. Human performance on the CNN dataset is estimated in (Chen et al., 2016).
Reports of new state-of-the-art results are more than welcome. Please contact us; we appreciate your help in keeping this leaderboard up to date. (contact)
- GA with word features (Dhingra et al., 2016)
- NSE Adaptive Computation (Munkhdalai et al., 2016)
- Stanford Reader (Chen et al., 2016)
- Gated-Attention Reader (Dhingra et al., 2016)
- Attention Sum (Kadlec et al., 2016)
- Attentive Reader (Hermann et al., 2015)
- Semantic Features (Wang et al., 2015)
- Sliding window + Distance (Richardson et al., 2013)