Bayesian networks are probabilistic models of data that are useful for answering probabilistic queries. Existing structure-learning algorithms rely on either local measures of deviation from independence or global likelihood measures. Because both are based on probabilistic correlation, the directionality of the learned model lacks the causal meaning one might expect. We tackle this problem from a new perspective, using causality, which is more fundamental than correlation. By integrating the global and local views of causal inference, the proposed computationally efficient algorithm learns a high-quality Bayesian network without any score-based search. Given a partially directed acyclic graph, the causal pairs with the highest accuracy are inferred using the fewest pairwise causal inferences. Specifically, for discrete data, the χ² statistical test is used to identify the most dependent, and therefore most plausibly causal, pairs. The learned causal directions are then propagated forward through the graph. Experiments on handwriting data show that, beyond its ability to infer causality, our algorithm outperforms two previous algorithms: one based on branch-and-bound search, and the other a greedy algorithm using χ² tests and a log-loss function. The learned structure not only achieves the lowest loss in representing the data, but also reveals underlying causal relationships that are useful for scientific discovery.
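
To illustrate the pairwise dependence step described above, the following is a minimal sketch, not the paper's implementation: it ranks pairs of discrete variables by the χ² statistic of a contingency-table independence test, keeping the most dependent pairs as candidates for pairwise causal inference. The function name rank_dependent_pairs and the 0.05 significance threshold are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of ranking variable
# pairs in discrete data by a chi-squared independence test.
from itertools import combinations

import numpy as np
from scipy.stats import chi2_contingency


def rank_dependent_pairs(data: np.ndarray) -> list[tuple[int, int, float]]:
    """Rank column pairs of a discrete data matrix by chi-squared statistic.

    data: (n_samples, n_vars) array of discrete (categorical) values.
    Returns (i, j, chi2) triples sorted from most to least dependent.
    """
    n_vars = data.shape[1]
    scored = []
    for i, j in combinations(range(n_vars), 2):
        # Build the contingency table for variables i and j.
        xi, yj = data[:, i], data[:, j]
        xs, ys = np.unique(xi), np.unique(yj)
        table = np.zeros((len(xs), len(ys)), dtype=int)
        for a, x in enumerate(xs):
            for b, y in enumerate(ys):
                table[a, b] = np.sum((xi == x) & (yj == y))
        chi2, p, _, _ = chi2_contingency(table)
        if p < 0.05:  # assumed threshold: keep statistically dependent pairs
            scored.append((i, j, chi2))
    # Most dependent pairs first: candidates for pairwise causal inference.
    return sorted(scored, key=lambda t: -t[2])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 3, size=500)
    y = (x + rng.integers(0, 2, size=500)) % 3  # y depends on x
    z = rng.integers(0, 3, size=500)            # z is independent
    print(rank_dependent_pairs(np.column_stack([x, y, z])))
```

In this sketch the pair (x, y) surfaces with a large χ² statistic while pairs involving z are filtered out; in the algorithm described above, the surviving high-dependence pairs would then be oriented by pairwise causal inference and the resulting directions propagated forward.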