With the growth in online shopping services, in the number of users, and in the quantity of visual and textual information on the Internet, there is a pressing need for intelligent recommendation systems that analyze a user's behavior across multiple domains and help the user find desired information without the burden of search. However, little research has addressed complex recommendation scenarios that involve knowledge transfer across multiple domains. The problem is especially challenging when the data sources involved are complex, in the sense that the quantity and quality of the data that can be crawled are limited. The goal of this paper is to study the connection between visual and textual inputs for better analysis of a given domain, and to examine the possibility of knowledge transfer from complex domains for the purpose of efficient recommendations. To this end, we design an architecture and algorithms based on deep learning to analyze the effect of deep pixel-wise semantic segmentation and text integration on the quality of recommendations. We plan to develop a practical testing environment in the fashion domain.