Non-orthogonal multiple access (NOMA) schemes, such as sparse code multiple access (SCMA), are among the most promising technologies for supporting massive numbers of connected devices. Still, to minimize the transmission delay and maximize the utilization of the transmission channel, "grant-free" NOMA techniques are required that eliminate any prior information exchange between the users and the base stations. However, if a large number of users transmit simultaneously in an "unsupervised" manner (i.e., without any prior signaling to control the number of users and the corresponding transmission patterns), many users are likely to share the same frequency-resource element, rendering the corresponding user detection impractical. In this context, we present a new multi-user detection approach that aims to maximize the detection performance subject to given processing and latency limitations. We show that our approach enables practical detection for grant-free SCMA schemes that support hundreds of interfering users, with a complexity that is up to two orders of magnitude lower than that of conventional detection approaches.