

 
TODO: global vs local guarantee on optimality? (Langford & Zadrozny, 2003)
 
== Reductions by Langford & Zadrozny (2003) ==
   
 
Langford & Zadrozny proposed two types of reduction, both of which rely on importance-weighted classification. Departing from the usual conceptualization, this type of classification assigns a weight ''w'' to each example (''x'', ''y'') to indicate the relative importance of getting it right. The problem can be converted into ordinary classification via subsampling, although this is "slightly tricky"<ref>Zadrozny, B., Langford, J., & Abe, N. (2003). Cost-sensitive learning by cost-proportionate example weighting. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM 2003) (pp. 435–442). IEEE.</ref>.
 
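The conversion works by cost-proportionate rejection sampling: keep each example with probability proportional to its weight and train an ordinary classifier on the survivors, which optimizes the importance-weighted loss in expectation. Below is a minimal sketch of that idea; the function name and the averaging note are illustrative, not taken from the paper.

<syntaxhighlight lang="python">
import random

def rejection_subsample(examples, seed=0):
    """Reduce importance-weighted classification to ordinary classification.

    examples: iterable of (x, y, w) with importance weight w >= 0 (at least
    one weight must be positive). Each example is kept with probability
    w / w_max, so the accepted (x, y) pairs are distributed as if drawn from
    the weight-proportional distribution.
    """
    examples = list(examples)
    rng = random.Random(seed)
    w_max = max(w for _, _, w in examples)
    return [(x, y) for x, y, w in examples if rng.random() < w / w_max]

# In practice one would repeat the subsampling with different seeds and
# average the resulting classifiers, since each accepted set is small and noisy.
</syntaxhighlight>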
 
Setting aside the impenetrable formulae, their algorithm looks like a batched version of [[REINFORCE]], although it supposedly also works with non-differentiable classifiers.
 
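For orientation only, the following is a rough sketch of that analogy rather than the authors' actual algorithm: each episode's state–action pairs become classification examples weighted by the observed return, and an arbitrary importance-weighted classifier, differentiable or not, is retrained on the batch. <code>run_episode</code> and <code>train_weighted_classifier</code> are hypothetical helpers.

<syntaxhighlight lang="python">
def batched_policy_update(policy, env, run_episode, train_weighted_classifier,
                          n_episodes=100):
    """Sketch of a 'batched REINFORCE with a classifier' update
    (illustrative, not Langford & Zadrozny's algorithm)."""
    examples = []
    for _ in range(n_episodes):
        # Each episode yields the visited (state, action) pairs and its return.
        trajectory, episode_return = run_episode(env, policy)
        for state, action in trajectory:
            # Return-weighted imitation of the policy's own actions plays the
            # role of REINFORCE's score-function signal.
            examples.append((state, action, episode_return))
    # The classifier only needs to accept example weights; it need not be
    # differentiable.
    return train_weighted_classifier(examples)
</syntaxhighlight>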
   
== Classification-based Approximate Policy Iteration (CAPI) ==
 
[[File:Capi.png|thumb]]
 
A good deterministic policy should always choose the action with the highest action-value (''Q''). CAPI strives to improve this property in an action-gap-sensitive way. Like other [[policy iteration]] methods, it repeats two steps: policy evaluation (compute an estimated action-value function <math>\hat{Q}</math>) and policy improvement (change the policy to better align with <math>\hat{Q}</math>). The action gap quantifies the distance between the best action (in terms of ''Q'') and the rest: a large gap means a mistake is costly but also less likely, while a small gap means a mistake is likely but matters little. Training examples are weighted proportionally to the action gap.
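
A minimal sketch of one round of this loop, assuming rollout-based policy evaluation and a generic importance-weighted classifier; the helpers <code>estimate_q</code> and <code>train_weighted_classifier</code> are illustrative names, not from the CAPI papers.

<syntaxhighlight lang="python">
def capi_round(states, actions, policy, estimate_q, train_weighted_classifier):
    """One policy-iteration round with action-gap-weighted classification.

    Assumes at least two actions. estimate_q(s, a, policy) returns an
    estimate of Q^policy(s, a), e.g. from rollouts.
    """
    examples = []
    for s in states:
        # Policy evaluation: estimated action-values under the current policy.
        q = {a: estimate_q(s, a, policy) for a in actions}
        best = max(actions, key=q.get)
        runner_up = max(q[a] for a in actions if a != best)
        # Action gap: how far the greedy action is ahead of the runner-up.
        gap = q[best] - runner_up
        # States where a mistake is costly get larger weights; near-ties
        # contribute little.
        examples.append((s, best, gap))
    # Policy improvement: the new (deterministic) policy is a classifier fit
    # to the gap-weighted examples.
    return train_weighted_classifier(examples)
</syntaxhighlight>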
 
   
 
== Learning to search ==
 