Valiant's protocol for learning is extended to the case where the distribution of the examples is known to the learner. Specifically, the notion of a concept class C being learnable with respect to a distribution D is defined, and the learnable pairs (C, D) of concept classes and distributions are characterized. A second notion introduced is the existence of a finite cover for C with respect to D. The main result is that C is learnable with respect to D if and only if C is finitely coverable with respect to D. The size of the cover is then related to the Vapnik-Chervonenkis dimension. An additional property of the learning method is robustness: learning succeeds even when part of the input is erroneous. It is also shown that if D is discrete, then every concept class is learnable with respect to D. The main concern of the paper is the number of examples sufficient to probabilistically identify (or approximate) a concept, not the time needed to compute it. Indeed, in some cases the function that associates a sample with a hypothesis is undecidable, and even when it is computable, the computation may be infeasible. The computational complexity of the algorithms used for learning is therefore considered only for discrete distributions.
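To make the covering characterization concrete, the following is a minimal sketch in standard notation; the symbols d_D, F, and the triangle operator for symmetric difference are our illustrative choices and need not match the paper's own notation. Concepts are compared by the probability mass of their symmetric difference, and a finite cover is a finite set of concepts within distance epsilon of every member of C:

% Hedged sketch; notation (d_D, F, \triangle) is illustrative.
\[
  d_D(c_1, c_2) \;=\; D(c_1 \,\triangle\, c_2)
  \qquad \text{(probability of the symmetric difference)}
\]
\[
  C \text{ is learnable with respect to } D
  \;\iff\;
  \forall \epsilon > 0 \;\; \exists \text{ finite } F :
  \forall c \in C \;\; \exists f \in F, \;\; d_D(c, f) \le \epsilon .
\]

Under this reading, a learner can succeed by drawing enough examples to single out, with high probability, a member of the finite epsilon-cover that is close to the target concept, which is why the cover size (and, through it, the Vapnik-Chervonenkis dimension) governs the sample complexity.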