Prognostics and health management (PHM) aims to offer comprehensive solutions for managing equipment health. Classifying excavator operations plays an important role in estimating equipment lifetime, one of the tasks in PHM, because the effect on the lifetime depends on which operations the excavator performs. Several researchers have attempted to classify these operations using either sensor or video data, but most approaches suffer from the limitations of single-modal data, sensitivity to the surrounding environment, and feature extraction schemes tailored exclusively to one data domain. In this paper, we propose a fusion network that classifies excavator operations with multi-modal deep learning models. Multiple classifiers are first trained, each on a specific type of data, and their feature extractors are then reused at the front of the fusion network. The proposed fusion network combines a video-based model and a sensor-based model, both built on deep learning. To evaluate the performance of the proposed method, experiments are conducted on data collected from a real construction workplace. The proposed method achieves an accuracy of 98.48%, which is higher than that of conventional methods, and the multi-modal deep learning models complement each other in terms of precision, recall, and F1-score.
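The late-fusion scheme described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the feature dimensions, the random linear maps standing in for the pretrained video and sensor feature extractors, and the number of operation classes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for pretrained unimodal feature extractors: each maps a raw
# input batch to a fixed-length feature vector per sample.
W_video = rng.standard_normal((128, 32))   # hypothetical video backbone output
W_sensor = rng.standard_normal((16, 32))   # hypothetical sensor backbone output

def video_features(x_video):
    return x_video @ W_video

def sensor_features(x_sensor):
    return x_sensor @ W_sensor

W_head = rng.standard_normal((64, 4))      # 4 hypothetical operation classes

def fusion_classifier(x_video, x_sensor):
    # Late fusion: concatenate the unimodal features, then classify jointly.
    fused = np.concatenate([video_features(x_video),
                            sensor_features(x_sensor)], axis=-1)  # (batch, 64)
    return softmax(fused @ W_head)

probs = fusion_classifier(rng.standard_normal((2, 128)),
                          rng.standard_normal((2, 16)))
print(probs.shape)  # one probability row per sample
```

Reusing frozen unimodal extractors and training only the fusion head, as the abstract describes, lets each modality compensate where the other is unreliable (e.g. occluded video or noisy sensor channels).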