Active learning, also called selective sampling, is a technique to reduce annotation effort by selecting the most "useful" data according to some criteria.
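As a minimal sketch of what "selecting the most useful data" can look like, the snippet below ranks an unlabeled pool by least-confidence uncertainty and returns the indices to send to annotators. The names `least_confidence`, `select_batch`, and the `predict_proba` callback are hypothetical and chosen for illustration; least confidence is only one of many possible selection criteria.

```python
from typing import Callable, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty of one prediction: 1 minus the probability of the top label."""
    return 1.0 - max(probs)


def select_batch(
    pool: Sequence[object],
    predict_proba: Callable[[object], Sequence[float]],
    batch_size: int,
) -> List[int]:
    """Return indices of the `batch_size` pool items the model is least sure about."""
    scored = [(least_confidence(predict_proba(x)), i) for i, x in enumerate(pool)]
    # Most uncertain first; ties are broken by pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:batch_size]]
```

In a full loop, the selected items would be labeled, added to the training set, and the model retrained before the next round of selection.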

TODO: Poursabzi-Sangdeh et al. (2016)[1]

From Tang et al. (2002)[2]: "Active learning has been studied in the context of many natural language processing (NLP) applications such as information extraction (Thompson et al., 1999), text classification (McCallum and Nigam, 1998) and natural language parsing (Thompson et al., 1999; Hwa, 2000), to name a few." [...] "While active learning has been studied extensively in the context of machine learning (Cohn et al., 1996; Freund et al., 1997), and has been applied to text classification (McCallum and Nigam, 1998) and part-of-speech tagging (Dagan and Engelson, 1995), there are only a handful studies on natural language parsing (Thompson et al., 1999) and (Hwa, 2000; Hwa, 2001). (Thompson et al., 1999) uses active learning to acquire a shift-reduce parser, and the uncertainty of an unparseable sentence is defined as the number of operators applied successfully divided by the number of words." [...] "Knowing the distribution of sample space is important since uncertainty measure, if used alone for sample selection, will be likely to select outliers."
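The outlier remark at the end of the quote can be illustrated with a small sketch that weights each sample's uncertainty by how similar it is to the rest of the pool. This is a generic density-weighting scheme written for illustration, not necessarily the method Tang et al. use; the function name and the precomputed `similarity` matrix are assumptions.

```python
from typing import List, Sequence


def density_weighted_scores(
    uncertainty: Sequence[float],
    similarity: Sequence[Sequence[float]],
) -> List[float]:
    """Scale each sample's uncertainty by its mean similarity to the other samples.

    `similarity[i][j]` is assumed to lie in [0, 1]; how it is computed
    (for example, from parse-tree or bag-of-words distance) is left open.
    """
    n = len(uncertainty)
    scores = []
    for i in range(n):
        others = [similarity[i][j] for j in range(n) if j != i]
        # A single-item pool has no neighbors; leave its score unscaled.
        density = sum(others) / len(others) if others else 1.0
        scores.append(uncertainty[i] * density)
    return scores
```

With this weighting, a sample that is highly uncertain but unlike everything else in the pool scores lower than a moderately uncertain sample from a dense region.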

From Lynn et al. (2012)[3]: "application of active learning to NLP is in parsing, for example, Thompson et al. (1999), Hwa et al. (2003), Osborne and Baldridge (2004) and Reichart and Rappoport (2007). Taking Osborne and Baldridge (2004) as an illustration, the goal of that work was to improve parse selection for HPSG: for all the analyses licensed by the HPSG English Resource Grammar (Baldwin et al., 2004) for a particular sentence, the task is to choose the best one using a log-linear model with features derived from the HPSG structure. The supervised framework requires sentences annotated with parses, which is where active learning can play a role. Osborne and Baldridge (2004) apply both QBU with an ensemble of models, and QBC, and show that this decreases annotation cost, measured both in number of sentences to achieve a particular level of parse selection accuracy, and in a measure of sentence complexity, with respect to random selection."
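For readers unfamiliar with query by committee (QBC), the sketch below scores a sentence by how much a committee of models disagrees about its best analysis, using vote entropy. Vote entropy is one common disagreement measure and is an assumption here; the quote does not say which measure Osborne and Baldridge use. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Entropy of the committee's votes; 0 when every member agrees."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rank_by_disagreement(committee_votes: Sequence[Sequence[Hashable]]) -> List[int]:
    """Return sentence indices ordered from most to least committee disagreement.

    `committee_votes[s]` holds one preferred analysis per committee member
    for sentence `s`, represented by any hashable identifier.
    """
    scored = [(vote_entropy(votes), s) for s, votes in enumerate(committee_votes)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [s for _, s in scored]
```

Sentences at the front of the ranking are the ones on which the committee splits most evenly, so they would be sent for annotation first.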

TODO: survey of various approaches to active learning in NLP: Olsson (2009)[4]

Note: using active learning to train a single model is different from using it to build a corpus. While papers overwhelmingly show that AL reduces the cost of acquiring data and training one model, Baldridge and Osborne (2004)[5] show that reusing a resource constructed with active learning is sometimes less efficient than reusing one built without it.

From Zhenghua Li's slides at ACL 2016: word segmentation (Li et al., 2012), sequence labeling (Marcheggiani and Artieres, 2014), constituent parsing (Hwa, 1999), CCG parsing (Clark and Curran, 2006).

TODO: multi-task AL (Reichart et al. 2008)[6]

Applications

Part-of-speech

TODO: Fort and Sagot (2010) [7]

Dependency parsing

(Tang et al., 2002)[8]: TODO

Lynn et al. (2012)[9] employ active learning in the development of an Irish treebank. Other languages: Persian (Ghayoomi and Kuhn, 2013)[10], Spanish (Busser and Morante, 2005)[11].

Sassano and Kurohashi (2010)[12] use partially annotated Japanese sentences; Mirroshandel and Nasr (2011)[13] use partially annotated English sentences.

Hwa (2004): “uncertainty is a robust predictive criterion that can be easily applied to different learning models.”

Coreference resolution

Miller et al. (2012)[14]

Partial annotation

Chinese word segmentation: Zhang et al. (2013)[15]

TODO: Li et al. (2016)[16]

From Li et al. (2016)[16]: "Recently, researchers report promising results with AL based on partial annotation (PA) for dependency parsing (Sassano and Kurohashi, 2010; Mirroshandel and Nasr, 2011; Majidi and Crane, 2013; Flannery and Mori, 2015). They find that smaller units rather than sentences provide more flexibility in choosing potentially informative structures to annotate."
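The idea of choosing smaller units than sentences can be sketched as follows: instead of ranking whole sentences, rank individual tokens by how unsure the parser is about their head, and ask annotators for only those attachments. This is an illustrative sketch under the assumption that the parser exposes a probability for each token's best head; the function name and input layout are hypothetical and not taken from any of the cited papers.

```python
from typing import List, Sequence, Tuple


def select_tokens(
    best_head_prob: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, int]]:
    """Pick the k tokens whose predicted head is least certain.

    `best_head_prob[s][t]` is the model's probability for the most likely
    head of token `t` in sentence `s`. Returns (sentence, token) index pairs.
    """
    candidates = [
        (1.0 - p, s, t)
        for s, sentence in enumerate(best_head_prob)
        for t, p in enumerate(sentence)
    ]
    # Most uncertain first; ties are broken by sentence, then token position.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(s, t) for _, s, t in candidates[:k]]
```

The remaining tokens in each sentence stay unannotated, so the parser must be trainable on partially annotated sentences for this selection scheme to be useful.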

Design

Explicitly verifiable questions

Sabou et al. (2014)[17]: explicitly verifiable questions "force the users to process the content and also signal to the workers that their answers are being scrutinized."

Laws et al. (2011)[18]: "For sentiment annotation, we found in preliminary experiments that using simple radio button selection for the choice of the document label (positive or negative) leads to a very high amount of spam submissions, taking the overall classification accuracy down to around 55%. We then designed a template that forced annotators to type the label as well as a randomly chosen word from the text. Individual label accuracy was around 75% in this scheme."
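A template of the kind Laws et al. describe could be checked along these lines. This is a hedged sketch, not their implementation: it assumes the system picks the word to be typed and shows it to the annotator, and the names `pick_challenge_word`, `accept_submission`, and `VALID_LABELS` are hypothetical.

```python
import random

VALID_LABELS = {"positive", "negative"}


def pick_challenge_word(text: str, rng: random.Random) -> str:
    """Choose a random word from the document for the annotator to retype."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    return rng.choice(words)


def accept_submission(typed_label: str, typed_word: str, challenge_word: str) -> bool:
    """Accept only if the label is valid and the challenge word was retyped correctly."""
    label_ok = typed_label.strip().lower() in VALID_LABELS
    word_ok = typed_word.strip().lower() == challenge_word
    return label_ok and word_ok
```

Requiring typed input raises the effort of a spam submission, which matches the motivation given in the quote.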

Kittur et al. (2008)[19]:

"First, it is extremely important to have explicitly verifiable questions as part of the task. In Experiment 2 the first four questions users answered could be concretely verified. Not all of these questions need to be quantitative; one of the most useful questions turned out to be asking users to generate keyword tags for the content, as the tags could be vetted for relevance and also required users to process the content. Another important role of verifiable questions is in signaling to users that their answers will be scrutinized, which may play a role in both reducing invalid responses and increasing time-on-task.
Second, it is advantageous to design the task such that completing it accurately and in good faith requires as much or less effort than non-obvious random or malicious completion. Part of the reason that user ratings in Experiment 2 matched up with expert ratings more closely is likely due to the task mirroring some of the evaluations that experts make, such as examining references and article structure. These tasks and the summarization activity of keyword tagging raise the cost of generating non-obvious malicious responses to at least as high as producing good-faith responses.
Third, it is useful to have multiple ways to detect suspect responses. Even for highly subjective responses there are certain patterns that in combination can indicate a response is suspect. For example, extremely short task durations and comments that are repeated verbatim across multiple tasks are indicators of suspect edits."
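The two indicators named in the third point, very short task durations and comments repeated verbatim, can be combined in a simple filter. The sketch below is illustrative only: the response record layout (`id`, `seconds`, `comment`) and the 30-second threshold are assumptions, not values from Kittur et al.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def flag_suspect(
    responses: Sequence[Dict[str, object]],
    min_seconds: float = 30.0,
) -> List[Tuple[object, List[str]]]:
    """Return (response id, reasons) for responses that look suspect."""
    # Count exact (whitespace-trimmed) comment texts across all responses.
    comment_counts = Counter(
        str(r["comment"]).strip() for r in responses if str(r["comment"]).strip()
    )
    flagged = []
    for r in responses:
        reasons = []
        if float(r["seconds"]) < min_seconds:
            reasons.append("short duration")
        comment = str(r["comment"]).strip()
        if comment and comment_counts[comment] > 1:
            reasons.append("repeated comment")
        if reasons:
            flagged.append((r["id"], reasons))
    return flagged
```

Flagged responses would still need manual review; each signal alone is weak, which is why the quote recommends using several in combination.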

References

  1. Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater and Kevin Seppi. 2016. ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling. ACL-2016
  2. Tang, M., Luo, X., & Roukos, S. (2002). Active Learning for Statistical Natural Language Parsing. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (pp. 120–127). Stroudsburg, PA, USA: Association for Computational Linguistics. doi:10.3115/1073083.1073105
  3. Lynn, T., Foster, J., Dras, M., & Dhonnchadha, E. U. (2012). Active Learning and the Irish Treebank. In Proceedings of the Australasian Language Technology Association Workshop 2012 (pp. 23–32). Dunedin, New Zealand.
  4. Fredrik Olsson. 2009. A literature survey of active machine learning in the context of natural language processing. Technical Report T2009:06, SICS.
  5. Baldridge, J., & Osborne, M. (2004). Active learning and the total cost of annotation. In Proc. Empirical Methods in Natural Language Processing (pp. 9–16).
  6. Reichart, Roi, Katrin Tomanek, Udo Hahn, and Ari Rappoport. "Multi-Task Active Learning for Linguistic Annotations." In ACL, vol. 8, pp. 861-869. 2008.
  7. Fort, K. and Sagot, B. (2010). Influence of Pre-annotation on POS-tagged Corpus Development. In Proc. of the Fourth Linguistic Annotation Workshop.
  8. Min Tang, Xiaoqiang Luo, and Salim Roukos. 2002. Active Learning for Statistical Natural Language Parsing. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pages 120–127, Stroudsburg, PA, USA. Association for Computational Linguistics.
  9. Teresa Lynn, Jennifer Foster, Mark Dras, and Elaine U Dhonnchadha. 2012. Active Learning and the Irish Treebank. In Proceedings of the Australasian Language Technology Association Workshop 2012, pages 23–32, Dunedin, New Zealand.
  10. Masood Ghayoomi and Jonas Kuhn. 2013. Sampling Methods in Active Learning for Treebanking.
  11. Bertjan Busser and Roser Morante. 2005. Designing an active learning based system for corpus annotation. Procesamiento del Lenguaje Natural, 35.
  12. Manabu Sassano and Sadao Kurohashi. 2010. Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 356–365, Stroudsburg, PA, USA. Association for Computational Linguistics.
  13. Seyed Abolghasem Mirroshandel and Alexis Nasr. 2011. Active Learning for Dependency Parsing Using Partially Annotated Sentences. In Proceedings of the 12th International Conference on Parsing Technologies, IWPT ’11, pages 140–149, Stroudsburg, PA, USA. Association for Computational Linguistics.
  14. Miller, Timothy A., Dmitriy Dligach, and Guergana K. Savova. "Active learning for coreference resolution." In Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, pp. 73-81. Association for Computational Linguistics, 2012.
  15. Zhang, Kaixu, Jinsong Su, and Changle Zhou. "Improving Chinese word segmentation using partially annotated sentences." In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, pp. 1-12. Springer Berlin Heidelberg, 2013.
  16. Zhenghua Li, Min Zhang, Yue Zhang, Zhanyi Liu, Wenliang Chen, Hua Wu, Haifeng Wang. 2016. Active Learning for Dependency Parsing with Partial Annotation. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016)
  17. Sabou, M., Bontcheva, K., Derczynski, L., & Scharl, A. (2014). Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines. In LREC 2014.
  18. Laws, F., Scheible, C., & Schütze, H. (2011). Active Learning with Amazon Mechanical Turk. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1546–1556.
  19. Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. Proceedings of the Twenty-Sixth Annual CHI Conference on Human Factors in Computing Systems (CHI '08), 453–456.