It is cooperation that essentially differentiates multi-agent systems (MASs) from single-agent intelligence. In realistic MAS applications such as RoboCup, repeated work has shown that traditional machine learning (ML) approaches have difficulty mapping directly from cooperative behaviours to actuator outputs. To overcome this problem, vertical layered architectures are commonly used to break cooperation down into behavioural layers: ML has then been used to generate different low-level skills, and a planning mechanism added to create high-level cooperation. We propose a novel method called Policy Search Planning (PSP), in which Policy Search is used to find an optimal policy for selecting plans from a plan pool. PSP extends an existing gradient-search method (GPOMDP) to a MAS domain. We demonstrate how PSP can be used in RoboCup Simulation, and our experimental results show improved robustness and adaptivity over other methods.
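As a rough illustration of the core idea, the sketch below shows a GPOMDP-style gradient estimate for a softmax policy that selects plans from a pool. This is a minimal sketch under stated assumptions, not the paper's actual formulation: the state representation, feature function, plan pool size, and reward signal are hypothetical placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gpomdp_episode(theta, features, reward_fn, n_steps=100, beta=0.9, rng=None):
    """One episode of a GPOMDP-style gradient estimate for plan selection.

    theta     : (n_plans, n_features) parameters of a softmax plan-selection policy
    features  : callable, state -> feature vector of shape (n_features,)
    reward_fn : callable, (state, plan) -> (reward, next_state)
    beta      : discount factor of the GPOMDP eligibility trace
    Returns an estimate of the gradient of expected reward w.r.t. theta.
    """
    rng = rng or np.random.default_rng()
    z = np.zeros_like(theta)      # eligibility trace
    grad = np.zeros_like(theta)   # running gradient estimate
    state = 0                     # placeholder initial state

    for t in range(n_steps):
        phi = features(state)
        probs = softmax(theta @ phi)           # probability of choosing each plan
        plan = rng.choice(len(probs), p=probs)

        # grad of log pi(plan | state): (indicator - probs) outer phi
        dlog = -np.outer(probs, phi)
        dlog[plan] += phi

        z = beta * z + dlog                    # accumulate eligibility
        reward, state = reward_fn(state, plan)
        grad += (reward * z - grad) / (t + 1)  # running average of r_t * z_t

    return grad
```

In this sketch the "actions" of the policy are plan choices rather than actuator commands, which mirrors the layered decomposition described above: low-level skills execute the chosen plan, while the gradient search tunes only the high-level selection policy.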