Reconfigurable intelligent metasurfaces have drawn much attention owing to their potential to support ambient communication and EM imaging and sensing services. These indoor services all rely on manipulating the indoor EM field distribution, and therefore require a fast online coding optimization algorithm that can focus energy in a complex environment. In this paper, we present a machine-vision-assisted online array state code optimization algorithm, which derives the array state code from the 3D position of the receiver's antenna, acquired with a binocular camera, for real-time energy focusing. Measured data show that, in a complex EM environment, the method achieves a 10-20 dB power enhancement at any target point, mitigating the effects of path loss, fast fading, and lognormal shadowing.
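To make the idea of deriving a state code from a receiver position concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a free-space, line-of-sight model, a planar array in the z = 0 plane, a single feed antenna, and 1-bit (two-state) elements; none of these details are stated above. The names `square_array`, `focusing_state_code`, and `received_power` are hypothetical, and `target` stands in for the 3D position that a binocular camera would supply.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def square_array(n, pitch):
    """Hypothetical n x n element grid in the z = 0 plane, centred at the origin."""
    offset = (n - 1) * pitch / 2.0
    return [(i * pitch - offset, j * pitch - offset, 0.0)
            for i in range(n) for j in range(n)]


def focusing_state_code(elements, feed, target, freq_hz, bits=1):
    """Quantized phase code that aligns the feed -> element -> target paths.

    Assumption: each element applies a phase shift of state * 2*pi / 2**bits.
    """
    k = 2.0 * math.pi * freq_hz / C          # free-space wavenumber
    levels = 2 ** bits
    step = 2.0 * math.pi / levels
    code = []
    for element in elements:
        path = math.dist(element, feed) + math.dist(element, target)
        ideal = (k * path) % (2.0 * math.pi)  # phase that would cancel the path delay
        code.append(round(ideal / step) % levels)
    return code


def received_power(elements, feed, target, freq_hz, code, bits=1):
    """Relative power at the target under the assumed free-space model."""
    k = 2.0 * math.pi * freq_hz / C
    step = 2.0 * math.pi / (2 ** bits)
    field = 0j
    for element, state in zip(elements, code):
        d_feed = math.dist(element, feed)
        d_target = math.dist(element, target)
        phase = state * step - k * (d_feed + d_target)
        # Spherical spreading on both hops; element patterns are ignored.
        field += cmath.exp(1j * phase) / (d_feed * d_target)
    return abs(field) ** 2
```

Under these assumptions, calling `focusing_state_code` with a camera-derived `target` gives one state per element, and `received_power` can compare that code against an all-zero code in the same simplified model. A real indoor environment includes multipath that this closed-form sketch does not capture.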