We present VitaMon, a mobile sensing system that measures the inter-heartbeat interval (IBI) from facial video captured by a commodity smartphone's front camera. The continuous IBI measurement is used to compute heart rate variability (HRV), one of the most important markers of autonomic nervous system (ANS) regulation. The underlying idea of VitaMon is that a video recording of a human face contains multiple cardiovascular pulse signals with different phase shifts. Our measurements on 10 participants show a significant time delay (36.79 ms) between the pulse signals measured at the jaw region and at the forehead region. VitaMon leverages deep neural network models to extract both spatial and temporal information from the video and to reconstruct a pulse waveform signal that is optimized for estimating IBI. We evaluated VitaMon on a dataset collected from 30 participants under various conditions involving different light intensity levels and motion artifacts. With 15 fps video input (66.67 ms time resolution), VitaMon measures IBI with an average error of 14.26 ms using a personal model and 21.65 ms using a general model. HRV features extracted from the IBI measurements, including geometric Poincaré plot features and time- and frequency-domain features, all correlate highly with the reference signal.
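To illustrate how HRV features follow from an IBI series, the minimal Python sketch below computes a few standard time-domain features (mean heart rate, SDNN, RMSSD) and the Poincaré plot descriptors SD1 and SD2. It is not VitaMon's implementation: the function name `hrv_features` and the sample IBI values are hypothetical, and the formulas are the textbook definitions rather than anything specified in this abstract.

```python
import math
import statistics


def hrv_features(ibi_ms):
    """Compute basic HRV features from inter-beat intervals given in ms.

    Hypothetical helper for illustration; uses textbook definitions.
    """
    if len(ibi_ms) < 3:
        raise ValueError("need at least three IBI values")

    mean_ibi = statistics.fmean(ibi_ms)

    # SDNN: sample standard deviation of all intervals.
    sdnn = statistics.stdev(ibi_ms)

    # RMSSD: root mean square of successive differences.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    # Poincaré plot: each point is (IBI[n], IBI[n+1]).
    # SD1 is the spread perpendicular to the identity line,
    # SD2 is the spread along the identity line.
    x, y = ibi_ms[:-1], ibi_ms[1:]
    sd1 = statistics.stdev([(b - a) / math.sqrt(2) for a, b in zip(x, y)])
    sd2 = statistics.stdev([(a + b) / math.sqrt(2) for a, b in zip(x, y)])

    return {
        "mean_hr_bpm": 60000.0 / mean_ibi,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
        "sd1_ms": sd1,
        "sd2_ms": sd2,
    }


if __name__ == "__main__":
    # Made-up IBI values in milliseconds, for demonstration only.
    sample_ibi = [800, 810, 790, 805, 795]
    for name, value in hrv_features(sample_ibi).items():
        print(f"{name}: {value:.2f}")
```

Because all of these features are built from differences between intervals, errors in the individual IBI estimates propagate directly into them, which is why the per-beat IBI error is the key accuracy figure.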