Consider the problem of finite-rate filtering of a discrete memoryless process $\{X_i\}_{i \ge 1}$ based on its noisy observation sequence $\{Z_i\}_{i \ge 1}$, which is the output of a discrete memoryless channel (DMC) whose input is $\{X_i\}_{i \ge 1}$. When the distribution $P_{X,Z}$ of the pairs $(X_i, Z_i)$ is known, and for a given distortion measure, the solution to this problem is well known to be given by classical rate-distortion theory upon the introduction of a modified distortion measure. In this work, we address the case where $P_{X,Z}$, rather than being completely specified, is only known to belong to some set $\Lambda$. For a fixed encoding rate $R$, we look at the worst case, over all $\theta \in \Lambda$, of the difference between the expected distortion of a given scheme, which is not allowed to depend on the active source $\theta \in \Lambda$, and the value of the distortion-rate function at $R$ corresponding to the noisy source $\theta$. We study the minimum value of this worst-case quantity attainable by any scheme operating at rate $R$, denoted by $D(\Lambda, R)$. Linking this problem and that of source coding under several distortion measures, we prove a coding theorem for the latter problem and apply it to characterize $D(\Lambda, R)$ for the case where all members of $\Lambda$ share the same noisy marginal. For a general $\Lambda$, we obtain a single-letter characterization of $D(\Lambda, R)$ in the finite-alphabet case. This gives, in particular, a necessary and sufficient condition on the set $\Lambda$ for the existence of a coding scheme which is universally optimal for all members of $\Lambda$, and characterizes the approximation-estimation tradeoff for statistical modeling of noisy source coding problems. Finally, we obtain $D(\Lambda, R)$ in closed form for cases where $\Lambda$ consists of distributions on the (channel) input-output pair of a Bernoulli source corrupted by a binary-symmetric channel (BSC).
In particular, for the case where $\Lambda$ consists of two sources, the all-zero source corrupted by a BSC with crossover probability $r$ and the Bernoulli($r$) source with a noise-free channel, we find that universality becomes increasingly hard with increasing rate.
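
As a rough illustration of the two-source example, the sketch below computes the per-source benchmarks (the distortion-rate functions) against which a universal scheme is measured. It assumes Hamming distortion and rates in bits, neither of which is specified above, and all function names are hypothetical. Both sources induce the same Bernoulli(r) noisy marginal, yet their benchmarks differ. The code does not compute the worst-case quantity itself, only the two reference curves.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def inverse_binary_entropy(h: float, tol: float = 1e-12) -> float:
    """Return q in [0, 1/2] with binary_entropy(q) = h, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def distortion_rate_bernoulli(r: float, rate: float) -> float:
    """Benchmark for the noise-free Bernoulli(r) source.

    Assumes Hamming distortion, for which R(D) = h(p) - h(D) with
    p = min(r, 1 - r), so D(R) = h^{-1}(h(p) - R), and 0 once R >= h(p).
    """
    p = min(r, 1 - r)
    residual = binary_entropy(p) - rate
    if residual <= 0:
        return 0.0
    return inverse_binary_entropy(residual)


def distortion_rate_all_zero(r: float, rate: float) -> float:
    """Benchmark for the all-zero source observed through a BSC(r).

    The clean source is deterministic, so reconstructing all zeros gives
    zero Hamming distortion at any rate, whatever the noise level r.
    """
    return 0.0


if __name__ == "__main__":
    r = 0.1
    for rate in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        d_clean = distortion_rate_bernoulli(r, rate)
        d_zero = distortion_rate_all_zero(r, rate)
        print(f"R={rate:.1f}  Bernoulli benchmark={d_clean:.4f}  all-zero benchmark={d_zero:.4f}")
```

A scheme that sees only the Bernoulli(r) observation sequence cannot tell which of the two sources is active, which is why a single scheme may be unable to meet both benchmarks at once.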