Human activity recognition (HAR) has a growing range of applications. Inertial sensors are widely used in HAR because they are low-cost, small, and portable. To limit device complexity and power consumption, it is desirable to recognize routine activities, such as walking, walking upstairs, and walking downstairs, with a single inertial sensor, although this is more difficult than using multiple sensors. In this study, we propose a method for HAR on data collected by a single inertial sensor that combines a convolutional neural network (CNN) with a deep feedforward sequential memory network (DFSMN), which is capable of modeling long-term dependencies in time series. The CNN extracts features from the sensor data, and the DFSMN classifies activities based on these features. Two inertial sensors were placed on the waist and the thigh, yielding two independent datasets, one per sensor location. The proposed method was applied to both datasets, and satisfactory results were obtained: the overall recognition accuracy was 98.01% for the waist dataset and 98.76% for the thigh dataset.