Natural Language Understanding Wiki

Data

Use of the lexicographic (exemplar) corpus

Most work was done using both the exemplar and full-text annotations:

  • Gildea & Jurafsky (2002)[1]: "Example sentences were chosen by searching the British National Corpus for instances of each target word. [...] Thus, the focus of the project was on completeness of examples for lexicographic needs, rather than on statistically representative data."
  • Thompson et al. (2003)[2]: uses FrameNet 1.0, which does not have a full-text corpus.
  • Fleischman et al. (2003)[3]: "In each FrameNet sentence, a single target predicate is identified and all of its relevant frame elements are tagged with their semantic role"
  • Senseval-3 uses FrameNet 1.1, including the lexicographic data.
  • Moschitti (2004)[4]: "For the FrameNet corpus (www.icsi.berkeley.edu/~framenet) we extracted all 24,558 sentences from the 40 frames of Senseval 3 task (www.senseval.org) for the Automatic Labeling of Semantic Roles." (FrameNet 1.3, released in 2007, has only 1,700 sentences in the full-text portion)
  • Giuglea & Moschitti (2006)[5]: "For the experiments on FN corpus, we extracted 58,384 sentences from the 319 frames that contain at least one verb annotation." (FrameNet 1.3, released in 2007, has only 1,700 sentences in the full-text portion)
  • SemEval-2007 task 19 (Baker et al., 2007[6]) did NOT prohibit the use of the lexicographic part: "The major part of the training data for the task consisted of the current data release from FrameNet (Release 1.3), described in Sec. 2. This was supplemented by additional training data made available through SemEval to participants in this task."
  • Johansson & Nugues (2008)[7]: "We used the FrameNet example corpus and running-text corpus, from which we randomly sampled a training and test set."
  • Matsubayashi et al. (2009)[8]: "We used the training set of the Semeval-2007 Shared task (Baker et al., 2007) in order to ascertain the contributions of role groups. This dataset consists of the corpus of FrameNet release 1.3 (containing roughly 150,000 annotations), and an additional full-text annotation dataset." (the full-text part of FrameNet 1.3 has only 11,700 annotation sets)
  • SemEval-2010 task 10 (Ruppenhofer et al. 2010)[9] allows the use of "additional data, in particular the FrameNet and PropBank releases".
  • Croce et al. (2011)[10]: "We used the FrameNet version 1.3 with the 90/10% split between training and test set (i.e 271,560 and 30,173 examples respectively), as defined in (Johansson and Nugues, 2008b)"
  • Laparra & Rigau (2012)[11]: "The dataset also includes the annotation files for the lexical units and the full-text annotated corpus from FrameNet."

Some don't make use of lexicographic annotations:

  • Das et al. (2010)[12]: "We found that using exemplar sentences directly to train our models hurt performance as evaluated on SemEval’07 data, even though the number of exemplar sentences is an order of magnitude larger than the number of sentences in our training set (§2.2). This is presumably because the exemplars are neither representative as a sample nor similar to the test data. Instead, we make use of these exemplars in features (§4.2)."
  • Roth & Lapata (2015)[13]: "Following previous work on FrameNet-based SRL, we use the full text annotation data set, which contains 23,087 frame instances."
  • FitzGerald et al. (2015)[14]: "For frame-semantic parsing using FrameNet conventions (Baker et al., 1998), we follow Das et al. (2014) and Hermann et al. (2014) in using the full-text annotations of the FrameNet 1.5 release and follow their data splits."

Irrelevant for this classification: Shi & Mihalcea (2004)[15] (rule-based)

Not clear: Erk & Padó (2006)[16]
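
The split between the two parts of the data discussed above can be inspected directly with NLTK's FrameNet corpus reader. The following is a minimal sketch, assuming NLTK with one of its FrameNet data packages is installed (framenet_v17 here, i.e. a newer release than the 1.3/1.5 versions cited above); the lexical unit run.v is only an illustrative example.

```python
# Minimal sketch: counting full-text vs. exemplar (lexicographic) sentences
# with NLTK's FrameNet reader. Assumes the framenet_v17 data package, so the
# counts reflect FrameNet 1.7 rather than the 1.3/1.5 releases cited above.
import nltk
from nltk.corpus import framenet as fn

nltk.download("framenet_v17")  # one-time download of the FrameNet data

# Full-text corpus: whole documents in which (ideally) every frame-evoking
# word is annotated; this is what most recent SRL work trains and tests on.
fulltext_sentences = fn.ft_sents()
print("full-text sentences:", len(fulltext_sentences))

# Lexicographic exemplars: sentences chosen per lexical unit to illustrate its
# valence patterns, hence not a representative text sample. Restricting the
# query to lexical units matching 'run.v' keeps the call cheap.
run_exemplars = fn.exemplars(r"run\.v")
print("exemplar sentences for run.v:", len(run_exemplars))
```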

Dealing with unseen lexical units

From Palmer et al. (2011)[17]:

"For example, Das et al. (2010) introduce a latent variable ranging over seen targets, allowing them to infer likely frames for unseen words, and the SRL system of Johansson and Nugues (2007) uses WordNet to generalise to unseen lemmas. In a similar vein, Burchardt et al. (2005) propose a system that generalizes over WordNet synsets to guess frames for unknown words. Pennacchiotti et al. (2008) compare WordNet-based and distributional approaches to inferring frames and conclude that a combination of the two leads to the best results, while (Cao et al., 2008) discuss how different distributional models can be utilised. Several approaches have also addressed other coverage problems, e.g., how to automatically expand the number of example sentences for a given lexical unit (Pado ́et al., 2008; Furstenau and Lapata, 2009). Another related approach is that of generalizing over semantic roles. Baldewein et al. (2004) use the FrameNet hierarchy to model the similarity of roles, boosting seldom-seen instances by reusing training data for similar roles, though without significant gains in performance. The most extensive study on role generalization to date (Matsubayashi et al., 2009) compares different ways of grouping roles—exploiting hierarchical relations in FrameNet, generalizing via role names, utilising role types, and using thematic roles from VerbNet—with the best results from using all groups together"

References

  1. Gildea, D., & Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Linguistics.
  2. Thompson, C. A., Levy, R., & Manning, C. D. (2003). A generative model for semantic role labeling. In Proceedings of ECML.
  3. Fleischman, M., Kwon, N., & Hovy, E. (2003). Maximum entropy models for FrameNet classification. In Proceedings of EMNLP.
  4. Moschitti, A. (2004). A study on convolution kernels for shallow semantic parsing. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (p. 335). Association for Computational Linguistics.
  5. Giuglea, A.-M., & Moschitti, A. (2006). Shallow semantic parsing based on FrameNet, VerbNet and PropBank. In Proceedings of ECAI 2006.
  6. Baker, C., Ellsworth, M., & Erk, K. (2007). SemEval-2007 Task 19: Frame Semantic Structure Extraction. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007) (pp. 99–104). Association for Computational Linguistics.
  7. Johansson, R., & Nugues, P. (2008). The effect of syntactic representation on semantic role labeling. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING '08) (Vol. 1, pp. 393–400). doi:10.3115/1599081.1599131
  8. Matsubayashi, Y., Okazaki, N., & Tsujii, J. (2009). A comparative study on generalization of semantic roles in FrameNet. In Proceedings of ACL-IJCNLP.
  9. Ruppenhofer, J., Sporleder, C., Morante, R., Baker, C., & Palmer, M. (2010). SemEval-2010 Task 10: Linking Events and Their Participants in Discourse (pp. 45–50). Retrieved from http://eprints.pascal-network.org/archive/00007648/
  10. Croce, D., Moschitti, A., & Basili, R. (2011). Structured lexical similarity via convolution kernels on dependency trees. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1034–1046). Association for Computational Linguistics.
  11. Laparra, E., & Rigau, G. (2012). Exploiting explicit annotations and semantic types for implicit argument resolution. In 2012 IEEE Sixth International Conference on Semantic Computing (pp. 75–78). IEEE.
  12. Das, D., Schneider, N., Chen, D., & Smith, N. A. (2010). Probabilistic frame-semantic parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 948–956). Retrieved from http://dl.acm.org/citation.cfm?id=1857999.1858136
  13. Roth, M., & Lapata, M. (2015). Context-aware Frame-Semantic Role Labeling. Transactions of the Association for Computational Linguistics, 3, 449–460.
  14. FitzGerald, N., Täckström, O., Ganchev, K., & Das, D. (2015). Semantic Role Labeling with Neural Network Factors. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 960–970).
  15. Shi, L., & Mihalcea, R. (2004). An algorithm for open text semantic parsing. In Proceedings of the Workshop on Robust Methods in Analysis of Natural Language Data.
  16. Erk, K., & Padó, S. (2006). Shalmaneser - a toolchain for shallow semantic parsing. In Proceedings of LREC.
  17. Palmer, A., Alishahi, A., & Sporleder, C. (2011). Robust Semantic Analysis for Unseen Data in FrameNet. In Proceedings of RANLP (pp. 628–633).