In this paper, we summarize our work on cross-media retrieval, where the query and the content to be retrieved are of different media types. We study cross-media retrieval in the context of two popular applications in multimedia retrieval: image retrieval by textual queries and sentence retrieval by visual queries. For image retrieval by textual queries, we propose text2image, which converts the problem of computing cross-media relevance between an image and a textual query into comparing visual similarity among images. We also propose cross-media relevance fusion, a conceptual framework that combines multiple cross-media relevance estimators. These two techniques resulted in a winning entry in the Microsoft Image Retrieval Challenge at ACM MM 2015. For sentence retrieval by visual queries, we propose to compute cross-media relevance exclusively in a visual space. To this end, we contribute Word2VisualVec, a deep neural network architecture that learns to predict a visual feature representation from textual input. With the proposed Word2VisualVec model, we won the Video to Text Description task at TRECVID 2016.
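To make the text2image idea concrete, below is a minimal sketch, not the system described above. It assumes the textual query has already been used to retrieve a set of exemplar images (for instance by tag matching against a web image collection) and that all images come with precomputed visual features; the function name `text2image_relevance`, the parameter `k`, and the top-k averaging scheme are illustrative assumptions.

```python
import numpy as np

def cosine_sim(vec, mat):
    """Cosine similarity between one feature vector and each row of a matrix."""
    vec = vec / (np.linalg.norm(vec) + 1e-12)
    mat = mat / (np.linalg.norm(mat, axis=1, keepdims=True) + 1e-12)
    return mat @ vec

def text2image_relevance(candidate_feat, exemplar_feats, k=5):
    """Cross-media relevance of a candidate image for a textual query,
    computed purely in the visual space: the query is represented by the
    visual features of exemplar images retrieved for the query text, and
    relevance is the mean similarity to the k most similar exemplars.
    """
    sims = cosine_sim(candidate_feat, exemplar_feats)
    return float(np.sort(sims)[-k:].mean())

# Usage sketch with stand-in features (dim 2048, e.g., a CNN layer):
exemplars = np.random.randn(20, 2048)   # images retrieved for the query text
candidate = np.random.randn(2048)       # image to be scored
score = text2image_relevance(candidate, exemplars)
```

Candidate images can then be ranked by this score, and cross-media relevance fusion would combine such a score with other relevance estimators.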
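Similarly, the following is a minimal sketch of the Word2VisualVec idea, assuming PyTorch: a multilayer perceptron regresses a visual feature from a sentence encoding, so that sentence retrieval by visual queries reduces to similarity search in the visual space. The layer sizes, the mean-pooled sentence encoding implied by `text_dim`, and the training details are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class Word2VisualVec(nn.Module):
    """MLP that maps a sentence encoding into a visual feature space,
    so text-video relevance can be computed in that space alone."""
    def __init__(self, text_dim=500, hidden_dim=1000, visual_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, visual_dim),
        )

    def forward(self, sentence_vec):
        return self.net(sentence_vec)

# Training step (sketch): regress the visual feature of a video from
# the encoding of a sentence that describes it.
model = Word2VisualVec()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

sentence_vec = torch.randn(32, 500)   # stand-in sentence encodings
video_feat = torch.randn(32, 2048)    # stand-in CNN video features

pred = model(sentence_vec)
loss = loss_fn(pred, video_feat)
opt.zero_grad()
loss.backward()
opt.step()

# Retrieval: rank candidate sentences for a given video by cosine
# similarity between the video feature and each predicted vector.
sims = torch.nn.functional.cosine_similarity(pred, video_feat)
```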