Network function virtualization (NFV) is one of the latest and most promising technologies introduced to make networks more agile, flexible and cost-efficient. It is a key building block of the future Internet and plays a critical role in modern networks. Under this paradigm, network functions can be deployed independently on general-purpose hardware instead of residing on the dedicated hardware (network appliances) used in traditional networks. NFV thus enables faster development and lower costs. Like any other new technology, NFV brings emerging challenges along with its benefits. The main research challenge of NFV is the optimal placement of virtual network functions (VNFs) in an NFV-enabled network, known in the literature as the VNF placement (VNFP) problem. Due to the nature of network applications, a placement method must scale to large networks. In this paper, the VNFP problem is addressed and a new metaheuristic algorithm, named the Drops on Surface Optimization (DSO) algorithm, is proposed based on the behavior of droplets on a surface. To enhance the proposed algorithm with a dynamic selection strategy, a reinforcement learning approach (Q-learning) is adopted. A comprehensive evaluation is carried out on standard networks, and the proposed algorithm is compared with several other methods. Simulation results show that the proposed algorithm outperforms other state-of-the-art approaches in terms of the total end-to-end propagation delay of the placed service chain, the cost of placement, the number of active servers involved in the placement, and scalability.