We present a novel method for fusing the results of multiple semantic video indexing algorithms that use different types of feature descriptors and different classification methods. This method, called Context-Dependent Fusion (CDF), is motivated by the observation that the relative performance of different semantic indexing methods can vary significantly with the video type, the context information, and the high-level concept of the video segment to be labeled. The training part of CDF has two main components: context extraction and algorithm fusion. In context extraction, the low-level audio-visual descriptors used by the different classification algorithms are combined and used to partition the descriptor space into groups of similar video shots, or contexts. The algorithm fusion component then identifies a subset of classification algorithms (local experts) for each context based on their relative performance within that context. Results on the TRECVID-2002 data collection show that the proposed method can identify meaningful and coherent clusters, and that different labeling algorithms emerge as local experts for different contexts. Our initial experiments indicate that context-dependent fusion outperforms the individual algorithms. We also show that, using simple visual descriptors and a simple K-NN classifier, the CDF approach yields results comparable to state-of-the-art methods in semantic indexing.
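The abstract does not specify how the two training components are implemented. The Python sketch below illustrates one plausible instantiation under stated assumptions: context extraction via K-means over the combined descriptor space, local-expert selection by per-context accuracy, and fusion of the experts' outputs by majority vote at labeling time. The function names and parameters (train_cdf, predict_cdf, n_contexts, n_experts) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_cdf(descriptors, labels, algorithm_preds, n_contexts=8, n_experts=2):
    """Hypothetical CDF training sketch (not the paper's exact procedure).

    descriptors:     (n_shots, d) combined low-level audio-visual features
    labels:          (n_shots,) ground-truth concept labels
    algorithm_preds: (n_algorithms, n_shots) labels predicted by each
                     individual indexing algorithm on the training shots
    """
    # Context extraction: partition the descriptor space into groups
    # of similar video shots ("contexts"); K-means is one simple choice.
    clusterer = KMeans(n_clusters=n_contexts, n_init=10, random_state=0)
    contexts = clusterer.fit_predict(descriptors)

    # Algorithm fusion: within each context, rank the algorithms by
    # their accuracy on the shots assigned to that context and keep
    # the top n_experts as the context's local experts.
    local_experts = {}
    for c in range(n_contexts):
        mask = contexts == c
        accuracies = (algorithm_preds[:, mask] == labels[mask]).mean(axis=1)
        local_experts[c] = np.argsort(accuracies)[::-1][:n_experts]
    return clusterer, local_experts

def predict_cdf(clusterer, local_experts, descriptor, algorithm_preds):
    """Assign a new shot to its nearest context, then fuse the
    predictions of that context's local experts by majority vote."""
    c = clusterer.predict(descriptor.reshape(1, -1))[0]
    expert_preds = algorithm_preds[local_experts[c]]
    values, counts = np.unique(expert_preds, return_counts=True)
    return values[np.argmax(counts)]
```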