Quite often, the most important papers we read in NLP contain a phrase like this in their abstracts: "using method X, we improve the state-of-the-art in task Y". This gives the impression that NLP follows a hill-climbing algorithm and that if your research fails to advance the state of the art (SoA), you're doomed.

It is therefore interesting to me when I come across works that don't come close to the SoA, and even fall pretty far behind, but are nevertheless published at high-profile conferences. Their existence proves that NLP is still a scientific endeavour to some extent.

Wolfe et al. (2016; EMNLP)[1]: "Our absolute performance is 73.0 for Propbank (dev) and 55.3 for FrameNet (dev). This falls significantly short of the work of Zhou and Xu (2015) at 81.1 (PB dev), FitzGerald et al. (2015) at 79.2 (PB dev), and 72.0 (FN). Those works used non-linear neural models with multi-task distributed representations, which are not comparable to our results. However, the models of Pradhan et al. (2013) at 77.5 (PB test) and Das et al. (2012) at 64.6 (FN test) are roughly comparable, and the performance gap is still significant. While our efforts do not advance the state of the art in SRL, we hope that they are enlightening with respect to the application of various imitation learning methods."

References

  1. Wolfe, Travis, Mark Dredze, and Benjamin Van Durme. "A Study of Imitation Learning Methods for Semantic Role Labeling." EMNLP 2016 (2016): 44.