From Plank (2011)[1]: "... supported by the results of Rimell and Clark (2008)[2]. They found that, for parser performance, retraining the PoS tagger accounted for a greater proportion of the improvement on the biomedical data, while retraining the supertagger (that includes subcategorization information) was more beneficial for the questions domain. Thus, intuitively, they argue that the main difference between newspaper and biomedical text is in vocabulary, while the main difference between newspaper text and questions is syntactic."
Datasets
Biomedical
Others
- TED talks: NAIST-NTT TED Treebank
- QuestionBank and Stanford improvements
References
- Plank, B. (2011). Domain Adaptation for Parsing. PhD thesis. http://doi.org/10.4337/9781845420536.00006
- Rimell, L. & Clark, S. (2008). Adapting a Lexicalized-Grammar Parser to Contrasting Domains. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 475–484). Honolulu, Hawaii: Association for Computational Linguistics.