Traditional von Neumann computing architectures are struggling to keep up with the rapidly growing demand for scale, performance, power efficiency, and memory capacity. One promising approach to this challenge is Remote Memory, in which memory is accessed over an RDMA fabric [1]. We enhance the remote memory architecture with Near Memory Processing (NMP), a capability that offloads particular compute tasks from the client to the server side, as illustrated in Figure 1. A similar motivation drove IBM to offload object processing to their remote KV storage [2]. NMP offload adds latency and server resource costs; it should therefore be used only when the offload value is substantial, specifically when it saves network bandwidth (e.g. Filter/Aggregate), round-trip time (e.g. tree Lookup), and/or distributed locks (e.g. Append to a shared journal).
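To make the offload trade-off concrete, the following is a minimal back-of-the-envelope sketch, not the system described here. It models two of the cases named above (Filter/Aggregate and tree Lookup) by counting bytes sent over the fabric and round trips. The record, node, and result sizes, the `Cost` type, and all function names are hypothetical and chosen only for illustration; server-side latency and resource costs are deliberately not modeled.

```python
from dataclasses import dataclass

# Assumed sizes, for illustration only (not taken from the paper).
RECORD_BYTES = 64
NODE_BYTES = 256
RESULT_BYTES = 8


@dataclass
class Cost:
    round_trips: int
    bytes_over_fabric: int


def client_side_filter_sum(records, predicate):
    """Client bulk-reads every record remotely, then filters and sums locally."""
    total = sum(r for r in records if predicate(r))
    return total, Cost(round_trips=1,
                       bytes_over_fabric=len(records) * RECORD_BYTES)


def offloaded_filter_sum(records, predicate):
    """Server filters and sums next to memory; only the result crosses the fabric."""
    total = sum(r for r in records if predicate(r))
    return total, Cost(round_trips=1, bytes_over_fabric=RESULT_BYTES)


def client_side_tree_lookup(depth):
    """Client walks the tree itself: one remote read per tree level."""
    return Cost(round_trips=depth, bytes_over_fabric=depth * NODE_BYTES)


def offloaded_tree_lookup(depth):
    """Server walks the tree; a single request/response crosses the fabric."""
    return Cost(round_trips=1, bytes_over_fabric=RESULT_BYTES)


if __name__ == "__main__":
    records = list(range(1000))
    keep = lambda r: r % 100 == 0  # a selective filter: 10 of 1000 records

    print(client_side_filter_sum(records, keep))
    print(offloaded_filter_sum(records, keep))
    print(client_side_tree_lookup(depth=5))
    print(offloaded_tree_lookup(depth=5))
```

Under these assumptions the offloaded filter returns the same sum while moving a few bytes instead of the whole data set, and the offloaded lookup replaces one round trip per tree level with a single one. When the filter is not selective or the tree is shallow, the saving shrinks and the added server cost may no longer be justified.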