In this paper, we briefly summarize our video surveillance research framework, survey current research on human activity recognition, and present our current work on real-time multi-person tracking. Foreground regions are first identified and segmented by adaptive background subtraction. A clustering algorithm then groups the foreground pixels in an unsupervised manner to estimate the image location of each person. A Kalman filter tracks each person, and a unique label is assigned to each tracked individual. With this approach, people can enter and leave the scene at any time. Anomalies such as silhouette merging are handled gracefully, and individuals are tracked correctly after a group of people splits. Experiments demonstrate the real-time performance and robustness of our system in complex scenes.
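
To make the tracking stage concrete, the following is a minimal sketch of a labeled, Kalman-filtered track, not the system's actual implementation. It assumes a constant-velocity motion model, one independent filter per image axis, and that a track simply coasts on its prediction when no centroid is available (for example, while silhouettes are merged). The names `AxisKalman` and `Track` and the noise parameters `q` and `r` are hypothetical and chosen only for illustration.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for a single image coordinate.

    Assumed model (not taken from the paper): state = [position, velocity],
    only the position is measured, and process noise q is added to the
    diagonal of the covariance at every prediction.
    """

    def __init__(self, pos, q=1e-2, r=1.0):
        self.x = [float(pos), 0.0]          # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q = q                          # process noise (assumed value)
        self.r = r                          # measurement noise (assumed value)

    def predict(self, dt=1.0):
        """Propagate the state one frame ahead; return predicted position."""
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the state with a measured position z."""
        (a, b), (c, d) = self.P
        s = a + self.r                # innovation covariance, H = [1, 0]
        k0, k1 = a / s, c / s         # Kalman gain
        y = z - self.x[0]             # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [
            [(1.0 - k0) * a, (1.0 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]


class Track:
    """One tracked person: a unique label plus a filter per image axis."""

    def __init__(self, label, x, y):
        self.label = label
        self.kx = AxisKalman(x)
        self.ky = AxisKalman(y)

    def step(self, centroid):
        """Advance one frame and return the predicted (x, y) position.

        `centroid` is the cluster centre assigned to this track, or None
        when no measurement is available; in that case the track coasts
        on its prediction (an assumption of this sketch).
        """
        px, py = self.kx.predict(), self.ky.predict()
        if centroid is not None:
            self.kx.update(centroid[0])
            self.ky.update(centroid[1])
        return px, py
```

In use, each cluster centroid from the segmentation stage would be passed to the `step` method of its associated track once per frame. How centroids are matched to tracks is left open here, since the abstract does not specify it.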