Human activity recognition (HAR) using smartphone sensors utilizes multivariate time-series data to detect activities. Time-series data have inherent local dependency characteristics. Moreover, activities tend to be hierarchical and translation invariant in nature. Convolutional neural networks (convnets) exploit these characteristics, which makes them appropriate for dealing with time-series sensor data. In this paper, we propose a convnet architecture that recognizes activities from data gathered by smartphone sensors. Experiments show that increasing the number of convolutional layers increases performance, but the complexity of the derived features decreases with every additional layer. Moreover, preserving the information passed from layer to layer is more important than blindly increasing the hyperparameter values to improve performance. The convnet structure can also benefit from wider filters and smaller pooling sizes. Lastly, we show that the convnet outperforms the other state-of-the-art techniques in HAR, notably SVM, which achieved the previous best result for the data set.
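As a minimal illustration of how a convolutional layer exploits local dependency in multivariate sensor data, the sketch below applies one 1D convolution along the time axis, followed by ReLU and max pooling. It is not the architecture proposed in this paper: the window length (128 samples), channel count (6), number of filters (8), filter width (9), and pooling size (2) are assumed values chosen only for the example, and the function names `conv1d_relu` and `max_pool1d` are hypothetical.

```python
import numpy as np


def conv1d_relu(x, filters, bias):
    """Valid 1D convolution along time, followed by ReLU.

    x:       (time, channels) window of sensor readings
    filters: (n_filters, width, channels) filter bank
    bias:    (n_filters,) per-filter bias
    """
    n_filters, width, _ = filters.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        # Each filter sees only `width` consecutive samples (local dependency).
        local = x[t:t + width]
        out[t] = np.tensordot(filters, local, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def max_pool1d(x, size):
    """Non-overlapping max pooling along time; drops any trailing remainder."""
    steps = x.shape[0] // size
    return x[:steps * size].reshape(steps, size, x.shape[1]).max(axis=1)


rng = np.random.default_rng(0)
window = rng.standard_normal((128, 6))          # assumed: 128 samples, 6 sensor channels
filters = rng.standard_normal((8, 9, 6)) * 0.1  # assumed: 8 filters of width 9
features = max_pool1d(conv1d_relu(window, filters, np.zeros(8)), size=2)
print(features.shape)  # (60, 8)
```

In this sketch a wider filter lets each output unit cover more consecutive samples, while a smaller pooling size discards fewer time steps between layers, so more information is passed on to the next layer.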