The paper describes the Multiple Choice Narrative Cloze (MCNC) task and uses it to evaluate several statistical script models from the literature. Code for generating the task questions and for training and evaluating the models is available at http://mark.granroth-wilding.co.uk/papers/what_happens_next/.
MCNC requires a system to pick out the observed event from a set of distractors, given a list of preceding events. The main advantage of this evaluation scheme over the narrative cloze is that it is easier while still providing a meaningful comparison between systems. Selecting distractors at random may make the task too easy, but I think the difficulty could be adjusted by drawing distractors from alternative distributions.
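To make the setup concrete, here is a minimal sketch of how an MCNC question could be built and scored. It is not the authors' released code: the string event representation, the names `make_question` and `mcnc_accuracy`, the `num_distractors` parameter, and the uniform sampling of distractors are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Assumption: an event is just a string such as "order(customer, food)".
Event = str
Question = Tuple[List[Event], List[Event], int]  # (context, choices, answer index)


def make_question(
    chain: Sequence[Event],
    event_pool: Sequence[Event],
    num_distractors: int,
    rng: random.Random,
) -> Question:
    """Hold out the last event of a chain and mix it with random distractors."""
    context, answer = list(chain[:-1]), chain[-1]
    # Uniform sampling from the pool; a harder variant would swap in
    # another distractor distribution here.
    candidates = [e for e in dict.fromkeys(event_pool) if e != answer]
    choices = rng.sample(candidates, num_distractors) + [answer]
    rng.shuffle(choices)
    return context, choices, choices.index(answer)


def mcnc_accuracy(
    questions: Sequence[Question],
    score: Callable[[List[Event], Event], float],
) -> float:
    """Fraction of questions where the top-scored choice is the observed event."""
    correct = 0
    for context, choices, answer_idx in questions:
        predicted = max(range(len(choices)), key=lambda i: score(context, choices[i]))
        correct += int(predicted == answer_idx)
    return correct / len(questions)
```

Any script model can be plugged in as `score`, as long as it maps a context and a candidate event to a number. Changing how `candidates` is sampled is the one place where the task difficulty would be tuned.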
Another advantage is that, in principle, the task can be taken by humans to establish an upper bound on performance. Unfortunately, the authors did not provide such a reference, stating only that "[o]ur informal initial human studies suggest [noise in the automatic extraction process and the random sampling of confounders] are indeed problems, but not so common as to invalidate conclusions drawn here."
Using this task, the authors have benchmarked several models:
- Granroth-Wilding, M., & Clark, S. (2015). What Happens Next? Event Prediction Using a Compositional Neural Network Model.