Long Short-Term Memory (LSTM) captures long-term dependencies through a cell state controlled by the input and forget gates, each of which models its output as a value in [0,1] through a sigmoid function. However, because of the gradual shape of the sigmoid function, a sigmoid gate cannot flexibly represent multi-modality or skewness. In addition, previous models do not model the correlation between the gates, although such modeling would offer a new way to introduce an inductive bias on the relationship between the previous and current inputs. This paper proposes a new gate structure based on the bivariate Beta distribution. The proposed gate structure enables probabilistic modeling of the gates within the LSTM cell, so that modelers can customize the cell state flow with priors and distributions. Moreover, we theoretically show that the proposed gate has a higher upper bound on the gradient than the sigmoid function, and we empirically observe that the bivariate Beta gate structure yields larger gradient values during training. We demonstrate the effectiveness of the bivariate Beta gate structure on sentence classification, image classification, polyphonic music modeling, and image caption generation.
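The contrast drawn in the abstract, a deterministic sigmoid gate versus a pair of gates drawn jointly from a bivariate Beta distribution, can be illustrated with the minimal, standard-library-only sketch below. It assumes a common shared-Gamma construction of a bivariate Beta distribution (with independent u1, u2, u3 ~ Gamma(a_k, 1), set g1 = u1/(u1+u3) and g2 = u2/(u2+u3)) and a softplus map from pre-activations to positive shape parameters; both choices are illustrative assumptions and may differ from the paper's exact parameterization, and the sketch omits the reparameterized gradients that training would require. All helper names (`sigmoid_grad`, `softplus`, `bivariate_beta_gates`, and so on) are hypothetical. The sketch also checks the well-known fact that the sigmoid's derivative never exceeds 1/4, presumably the baseline behind the abstract's gradient comparison.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid, as used by a conventional LSTM gate."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    """Derivative s(x) * (1 - s(x)); its maximum value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def softplus(x: float) -> float:
    """Stable softplus; ASSUMED map from a pre-activation to a positive Gamma shape."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bivariate_beta_gates(a1: float, a2: float, a3: float, rng: random.Random):
    """Sample a correlated gate pair (g1, g2) in (0, 1) via a shared Gamma variable.

    Marginally g1 ~ Beta(a1, a3) and g2 ~ Beta(a2, a3); because both ratios
    share u3, the two gates are positively correlated.
    """
    u1 = rng.gammavariate(a1, 1.0)
    u2 = rng.gammavariate(a2, 1.0)
    u3 = rng.gammavariate(a3, 1.0)
    return u1 / (u1 + u3), u2 / (u2 + u3)


def mean(xs):
    return sum(xs) / len(xs)


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical pre-activations; in an LSTM these would come from W x_t + U h_{t-1} + b.
    a1, a2, a3 = (softplus(z) for z in (1.0, 2.0, 0.5))
    samples = [bivariate_beta_gates(a1, a2, a3, rng) for _ in range(20000)]
    g1 = [s[0] for s in samples]
    g2 = [s[1] for s in samples]
    print("sigmoid(0) =", sigmoid(0.0), "max sigmoid gradient =", sigmoid_grad(0.0))
    print("E[g1] estimate", mean(g1), "vs Beta mean", a1 / (a1 + a3))
    print("E[g2] estimate", mean(g2), "vs Beta mean", a2 / (a2 + a3))
    print("corr(g1, g2) estimate", correlation(g1, g2))
```

Because u3 is shared, increasing a3 pushes both gates toward 0 together. This is one way a gate pair (for example, the input and forget gates) can encode a dependence that two independent sigmoid gates cannot express, and each Beta marginal can also be skewed or, for shape parameters below 1, U-shaped.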
| Field | Value |
|---|---|
| Title of host publication | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence |
| Number of pages | 8 |
| Publication status | Published - 2020 |
| Event | 34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States |
| Duration | 2020 Feb 7 → 2020 Feb 12 |
Bibliographical note
Funding Information:
Acknowledgments. This work was conducted at the High-Speed Vehicle Research Center of KAIST with the support of the Defense Acquisition Program Administration and the Agency for Defense Development under Contract UD170018CD.
© 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
All Science Journal Classification (ASJC) codes
- Artificial Intelligence