Automatic scene detection is a fundamental step for efficient video searching and browsing. This paper presents our current work on scene detection, which integrates three effective strategies into a single framework. First, for each video, a coherence signal is constructed from a graph model derived from the similarity matrix within a temporal interval. Second, the signal is refined by scene transition graph (STG) analysis and audio classification, which uncover scene cues hidden in the multimedia content of the video. Finally, the scene boundaries are identified with a window function. In experiments, we compare the proposed scene detection method with three typical algorithms on teleplays and movies; our method achieves the best results, with an average F-measure of 0.85.
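
To make the first and last stages of the pipeline concrete, the following is a minimal sketch rather than the implementation evaluated in this work. It assumes that the coherence at a candidate boundary is the mean similarity between the shots just before and just after it, that boundaries are windowed local minima below a threshold, and that the F-measure counts only exact boundary matches. The function names, the window size `w`, and the threshold are hypothetical, and the STG analysis and audio classification stage is omitted.

```python
from typing import List, Sequence


def coherence_signal(sim: Sequence[Sequence[float]], w: int) -> List[float]:
    """Coherence at each candidate boundary between shots i-1 and i (i = 1..n-1).

    Assumption: coherence is the mean pairwise similarity between up to w shots
    before the boundary and up to w shots after it (a temporal interval).
    """
    n = len(sim)
    signal = []
    for i in range(1, n):
        left = range(max(0, i - w), i)
        right = range(i, min(n, i + w))
        total = sum(sim[a][b] for a in left for b in right)
        signal.append(total / (len(left) * len(right)))
    return signal


def detect_boundaries(signal: Sequence[float], w: int, threshold: float) -> List[int]:
    """Return indices of shots that start a new scene.

    Assumption: a boundary is a value below `threshold` that is also the
    minimum of the window of radius w centred on it. Ties inside a flat
    region are all reported; a real system would merge them.
    """
    boundaries = []
    for k, value in enumerate(signal):
        lo, hi = max(0, k - w), min(len(signal), k + w + 1)
        if value < threshold and value == min(signal[lo:hi]):
            boundaries.append(k + 1)  # signal[k] sits between shots k and k+1
    return boundaries


def f_measure(detected: Sequence[int], truth: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, using exact boundary matches."""
    hits = len(set(detected) & set(truth))
    if hits == 0:
        return 0.0
    precision = hits / len(detected)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): six shots forming two scenes of three shots each.
scene_of = [0, 0, 0, 1, 1, 1]
sim = [[1.0 if a == b else (0.9 if scene_of[a] == scene_of[b] else 0.1)
        for b in range(6)] for a in range(6)]

signal = coherence_signal(sim, w=2)
boundaries = detect_boundaries(signal, w=2, threshold=0.3)
print(boundaries, f_measure(boundaries, [3]))
```

On the toy matrix the coherence drops sharply at the transition between the third and fourth shots, which is where the sketch places its single boundary. In practice the similarity matrix would come from visual features of the shots, and a tolerance around each ground-truth boundary is often used when counting matches.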