The web has evolved from a technological platform into a real social milieu, thereby becoming a continuous source of Big Social Data (BSD). BSD combines factual content, such as the coordinates of a restaurant, the content of a webpage, or the title of a movie; behavioral data, such as exchanges within social relationships; and subjective data, such as users’ opinions, reviews, and tags. The goal of a social application is to analyze and process BSD in order to understand it and transform it into content that is valuable to users. Building social applications requires an essential data preparation step during which raw BSD is sanitized, normalized, enriched, pruned, and transformed, making it readily available for further processing. We argue for the need to formalize data preparation and to develop appropriate tools that enable easy prototyping of social applications. We describe SOCLE, our framework for BSD preparation. We present an architecture, the state of the art in existing languages and algebras for manipulating BSD, and the scientific challenges and opportunities underlying the development of SOCLE. © 2014 Lavoisier.