To reduce the computation cost, Deriche extended Canny's work on optimal edge detectors to the use of recursive filters. Nevertheless, this cost is still too high for real-time implementation on FPGA circuits. Here, we optimize both the algorithmic and the architectural aspects of the original Deriche filter. A new organization of the filter is proposed at the 2D and 1D levels, which reduces the memory size and the computation cost by a factor of two for both software and hardware implementations. We prove that coding the scale parameter on only 3 bits does not reduce the quality. As a result of this choice, the first-order recursive filter, which is the basic block of the entire architecture, can be built with only 4 adders. The architecture of a 10 Mpixel/s filter on a single FPGA is described.
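To illustrate why a short coefficient removes the need for a multiplier, here is a minimal sketch of a first-order recursive smoothing pass in fixed-point arithmetic. It is not the authors' datapath. It assumes, hypothetically, that the 3 bits directly encode the feedback coefficient as a ≈ m/8 (m in 0..7), and it computes y[n] = x[n] + a·(y[n-1] - x[n]) using only shifts and additions. The names `quantize_coeff`, `shift_add_mul`, `first_order_pass` and `smooth`, as well as the `frac` fixed-point parameter, are illustrative assumptions. The symmetric `smooth` function cascades a causal pass with an anticausal (reversed) pass, which is one common way to combine the two directions.

```python
def quantize_coeff(a: float, bits: int = 3) -> int:
    """Round a in [0, 1) to an integer m so that a ~= m / 2**bits (assumed encoding)."""
    m = round(a * (1 << bits))
    return max(0, min((1 << bits) - 1, m))


def shift_add_mul(d: int, m: int, bits: int = 3) -> int:
    """Approximate (m * d) / 2**bits with shifts and additions only.

    Each set bit of m contributes one shifted copy of d, so a 3-bit m needs
    at most two additions (no multiplier).
    """
    acc = 0
    for i in range(bits):
        if m & (1 << (bits - 1 - i)):  # bit worth 2**-(i+1)
            acc += d >> (i + 1)        # arithmetic shift = divide by 2**(i+1)
    return acc


def first_order_pass(x: list[int], m: int, bits: int = 3, frac: int = 8) -> list[int]:
    """Causal first-order recursive filter: y[n] = x[n] + a * (y[n-1] - x[n]).

    Samples are scaled by 2**frac to keep fractional precision (hypothetical choice).
    Per sample: one subtraction, the shift-add product, and one final addition.
    """
    y_prev = 0
    out = []
    for sample in x:
        xs = sample << frac                        # to fixed point
        d = y_prev - xs                            # subtraction
        y = xs + shift_add_mul(d, m, bits)         # shift-add product + addition
        out.append(y >> frac)                      # back to integer samples
        y_prev = y
    return out


def smooth(x: list[int], m: int, bits: int = 3, frac: int = 8) -> list[int]:
    """Causal pass followed by an anticausal (reversed) pass."""
    forward = first_order_pass(x, m, bits, frac)
    return first_order_pass(forward[::-1], m, bits, frac)[::-1]


if __name__ == "__main__":
    m = quantize_coeff(0.5)                        # a = 4/8
    print(smooth([0, 0, 100, 100, 100, 0, 0], m))
```

Under this assumed encoding, the worst case (m = 7) uses one subtraction, two additions inside `shift_add_mul`, and one final addition per sample; the truncating shifts introduce a small downward bias, which a hardware design might handle differently.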