Air pollution is a growing and serious environmental problem, driven mainly by migration into urban areas. Effective air pollution monitoring systems allow pollution to be tracked closely, but monitoring alone is not enough to reduce it significantly. The greatest value of these systems lies in the data they collect, which can be used to build pollution prediction models. To date, there have been many attempts to tackle the problem of air pollution prediction, but there is no evidence that they have been successfully applied to decrease air pollution. In recent years, advances in deep learning techniques and the increasing amount of available data have led to many proposed models for this problem. In this paper, we propose two attention-based models for air pollution prediction. Our models differ from all previously proposed models by introducing a separate attention factor for each previous timestep when making a prediction. The attention factors are learned by the model, allowing it to learn how strongly each previous timestep should affect the current prediction. With this approach, the models can better learn the patterns and dependencies in the data, which in turn yields better predictions. We show that our models, using this novel architecture, outperform two state-of-the-art models.
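
To make the idea of per-timestep attention factors concrete, the following is a minimal sketch and not the architecture proposed in this paper. It assumes that each previous timestep is represented by a feature vector, that the factors are obtained by a softmax over learned scores, and that the prediction is a linear readout of the weighted sum. The names `attention_predict`, `score_vector`, and `output_weights` are hypothetical, and the parameters are drawn at random here rather than learned.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_predict(history, score_vector, output_weights):
    """Predict one value from T previous timesteps using attention factors.

    history:        (T, d) array, one feature vector per previous timestep
    score_vector:   (d,) parameters that score each timestep (assumed form)
    output_weights: (d,) parameters of a linear readout (assumed form)
    """
    scores = history @ score_vector       # one raw score per timestep, shape (T,)
    factors = softmax(scores)             # attention factors, positive, sum to 1
    context = factors @ history           # weighted sum of timesteps, shape (d,)
    prediction = float(context @ output_weights)
    return prediction, factors


# Illustrative data only: 6 previous timesteps with 4 features each.
rng = np.random.default_rng(0)
history = rng.normal(size=(6, 4))
score_vector = rng.normal(size=4)      # would be learned in a real model
output_weights = rng.normal(size=4)    # would be learned in a real model

prediction, factors = attention_predict(history, score_vector, output_weights)
print("attention factors:", np.round(factors, 3))
print("prediction:", prediction)
```

In a trained model, `score_vector` and `output_weights` would be fitted jointly by gradient descent, so that the factors reflect how much each previous timestep contributes to the current prediction.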