===== Weight push in rescoring =====
Work on language model combination suggests that the currently accepted practice for combining language models can be improved upon. Specifically, given two language models P1(w_i | h_i) and P2(w_i | h_i), the accepted practice is to interpolate them linearly, i.e. to use \alpha * P1(w_i | h_i) + (1 - \alpha) * P2(w_i | h_i). Since \alpha is often in the range 0.1 to 0.9, whereas P1 and P2 each span many orders of magnitude, the linear sum is dominated by whichever weighted term is larger, so it behaves very much like taking the maximum of the two. In the log domain this is exact up to a small constant: for positive a and b, log(a + b) lies between max(log a, log b) and max(log a, log b) + log 2. Intuition suggests that if weight pushing works in this case, it will also work for combining n-gram models with RNNLMs and give a lower WER.
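The max-like behaviour of linear interpolation can be checked numerically. The following is a minimal sketch, not the method used in this work: the helper names `log_interp` and `log_weighted_max` are hypothetical, and the probabilities P1 = 1e-3 and P2 = 1e-8 are made-up illustrative values rather than output from any real model. It computes log(\alpha * P1 + (1 - \alpha) * P2) stably from log-probabilities and compares it with the larger of the two weighted log terms.

```python
import math


def log_interp(logp1: float, logp2: float, alpha: float) -> float:
    """Return log(alpha*P1 + (1-alpha)*P2) given log P1 and log P2."""
    a = math.log(alpha) + logp1        # log of the weighted P1 term
    b = math.log(1.0 - alpha) + logp2  # log of the weighted P2 term
    m = max(a, b)
    # log-sum-exp: log(e^a + e^b) = m + log(e^(a-m) + e^(b-m)), avoids underflow
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_weighted_max(logp1: float, logp2: float, alpha: float) -> float:
    """Return log(max(alpha*P1, (1-alpha)*P2)), the 'max' approximation."""
    return max(math.log(alpha) + logp1, math.log(1.0 - alpha) + logp2)


# Hypothetical example probabilities, chosen only to span several orders of magnitude.
logp1, logp2 = math.log(1e-3), math.log(1e-8)
for alpha in (0.1, 0.5, 0.9):
    s = log_interp(logp1, logp2, alpha)
    m = log_weighted_max(logp1, logp2, alpha)
    # The gap is log(1 + smaller/larger), which is always between 0 and log 2.
    print(f"alpha={alpha}: log-sum={s:.6f}  log-max={m:.6f}  gap={s - m:.2e}")
```

The gap equals log(1 + r), where r is the ratio of the smaller weighted term to the larger one. It reaches log 2 only when the two weighted terms are equal, and it shrinks towards zero as the two models disagree by more orders of magnitude, which is the typical situation described above.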
