Context: Multi-core architectures are becoming increasingly ubiquitous, and software professionals are seeking to leverage the capabilities of distributed-memory architectures. Parallelizing software applications can be tedious and error-prone, and the task of data decomposition is particularly challenging. Empirical studies investigating the complexity of data decomposition and the associated communication are lacking.

Objective: Our objective is threefold: (i) to gain an empirically based understanding of data decomposition as a task in the parallelization of software applications; (ii) to identify key requirements for tools that assist developers in this task; and (iii) to assess the current state of the art in such tool support.

Methods: Our empirical investigation employed a multi-method approach comprising an interview study, a participant-observer case study, a focus group study, and a sample survey. The investigation involved collaborations with three industry partners: IBM's High Performance Computing Center, the Irish Centre for High-End Computing (ICHEC), and JBA Consulting.

Results: This article identifies data decomposition as one of the most prevalent tasks in parallelizing applications for multi-core architectures. Based on our studies, we identify ten key requirements for tool support to help HPC developers with this task. Our evaluation of the state of the art shows that no extant tool implements all ten requirements.

Conclusion: While there is a considerable body of research in the area of HPC, few empirical studies explicitly focus on the challenges faced by practitioners in this area; this research aims to address that gap. The empirical studies presented in this article provide insights that may help researchers and tool vendors better understand the needs of parallel programmers.
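To make the central task concrete, the sketch below shows one common form of data decomposition on a distributed-memory architecture: a one-dimensional block partition of an array across MPI processes. This is a generic illustration rather than an example drawn from the article's studies; the global problem size N and the block partitioning scheme are assumptions for demonstration.

```c
/* Minimal sketch of 1D block data decomposition with MPI.
 * Illustrative only: N and the partitioning scheme are
 * hypothetical, not taken from the article's case studies. */
#include <mpi.h>
#include <stdio.h>

#define N 1000  /* hypothetical global problem size */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block decomposition: each rank owns a contiguous chunk;
     * the remainder (N % size) is spread over the first ranks. */
    int base    = N / size;
    int rem     = N % size;
    int local_n = base + (rank < rem ? 1 : 0);
    int start   = rank * base + (rank < rem ? rank : rem);

    printf("rank %d owns elements [%d, %d)\n",
           rank, start, start + local_n);

    MPI_Finalize();
    return 0;
}
```

Even this trivial partition hints at why the task is error-prone in practice: remainder handling, index arithmetic, and the subsequent halo-exchange communication must all be kept consistent by hand, which is precisely the burden the tool requirements identified in the article aim to reduce.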