The knowledge of spatial distributions of physical quantities, such as radio-frequency (RF) interference, pollution, geomagnetic field magnitude, temperature, humidity, audio, and light intensity, will foster the development of new context-aware applications. For example, knowing the distribution of RF interference might significantly improve cognitive radio systems [1], [2]. Similarly, knowing the spatial variations of the geomagnetic field could support autonomous navigation of robots (including drones) in factories and/or hazardous scenarios [3]. Other examples are related to the estimation of temperature gradients, the detection of RF signal sources, or the percentages of certain chemical components. As a result, people could get personalized health-related information based on their exposure to sources of risk (e.g., chemicals or pollution). We refer to these spatial distributions of physical quantities as spatial fields. All of the aforementioned examples have in common that learning the spatial fields requires a large number of sensors (agents) surveying the area [4], [5].

A common way to sense environmental variables is the deployment of dedicated wireless sensor networks (WSNs), which continues to stimulate fertile research activity in the scientific community. Typical WSN applications are oriented to sensing specific physical quantities (e.g., temperature) in well-defined areas [6], [7]. Unfortunately, WSNs are generally characterized by significant constraints in terms of deployment cost, energy limitations, and the need for maintenance. These constraints prevent them from becoming scalable and, therefore, from being the ultimate solution for automated and distributed sensing of the physical world. The expected pervasive diffusion of Internet of Things (IoT) devices (fixed and mobile) opens up a unique opportunity for wide and massive sensing and mapping (i.e., georeferencing of physical quantities).
In fact, the IoT constitutes a paradigm where a multitude of heterogeneous devices is able to sense the environment, process data, and actuate, thus creating the necessary infrastructure for cyberphysical systems. This infrastructure galvanizes technologies such as smart grids, smart homes, smart cities, and intelligent transportation [8], [9]. Among the extensive variety of IoT applications, we are interested in those that will benefit from the spatial coverage of a wide area provided by a large number of agents navigating through it. For instance, this can be the case when devices are carried by people or autonomous agents (e.g., vehicles, robots, or drones) moving in outdoor- and indoor-populated environments like malls, stadiums, or crowded buildings. One can even imagine entire cities, if much larger settings are considered. Thanks to the widespread diffusion of IoT devices with heterogeneous sensors, the estimation of spatial physical fields is creating a new trend for next-generation sensor networks, referred to as mobile crowdsensing networks [10]-[13]. This is basically a zero-effort approach to automatically collecting and processing data. Recently, as an example, this concept has been proposed for zero-effort automatic mapping of environmental features using sensors already embedded in smartphones, such as magnetometers and Wi-Fi radios [14]-[17]. In such settings, the contribution of the agent to the sensing process is as simple as carrying the personal device in a pocket while the individual moves around. Individuals are not even requested to be participatory, as the sensing process could run in the background during the normal operation of the device. In other words, agents are aware of the background sensing process, but they are not participatory in the sense that they are not requested to follow particular paths to make the learning process more effective.
Thus, sensing is not an exclusive task: it arises from the dynamic reality of humans or autonomous agents and piggybacks on the capabilities of today's and future wireless personal devices. Including these devices will dramatically increase the amount of data available for sensing and mapping, with obvious benefits for the resulting accuracy. In this context, the IoT is the technological enabler for crowdsensing and learning of spatial fields. Interestingly, IoT devices are, in general, able to communicate among themselves, either directly or through a fusion node that can potentially be in the cloud. Thanks to these communication capabilities, empirical data gathered by mobile agents (the crowd) can be collected and processed by learning algorithms located in the cloud. These algorithms exploit the correspondence between a position and the value of the physical quantity measured at that position to estimate the spatial field. As a consequence, positioning and spatial field estimation are intimately intertwined, as will be illustrated in the "Sensing and Positioning" section. Crowdsourcing-based learning methods rely on the experience gained by previous agents; in principle, crowd-based learning can achieve optimal information fusion [10]. On the other hand, moving from the well-controlled conditions of WSN scenarios, where nodes are deployed at known ad hoc locations, to crowdsensing settings, where agents move around in an uncontrolled manner, entails a number of issues that need to be addressed. These methods rely on sharing through cloud mechanisms [18], but they can be of practical relevance in IoT applications only if their computational and memory requirements do not grow with the amount of collected data. Therefore, novel methodologies for multisensor data fusion and information processing are needed.
These methodologies should guarantee an efficient statistical representation of spatial fields and a computational complexity that does not depend on the number of measurements. Further, the algorithms need to be robust against irregular positioning and measurement errors.
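To make the position-measurement correspondence concrete, the following is a minimal sketch of how scattered (position, value) pairs gathered by agents can be fused into a spatial field estimate using Gaussian process regression, one standard nonparametric tool for this task. The kernel choice, hyperparameter values, and function names here are illustrative assumptions, not the specific method developed in this article:

```python
import numpy as np

def rbf_kernel(A, B, length=1.0, var=1.0):
    # Squared-exponential covariance between position sets A (n, 2) and B (m, 2).
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_field_estimate(X, y, Xq, length=1.0, var=1.0, noise=0.1):
    """Posterior mean of the spatial field at query positions Xq, given
    noisy measurements y taken at agent positions X (GP regression)."""
    K = rbf_kernel(X, X, length, var) + noise**2 * np.eye(len(X))
    Ks = rbf_kernel(Xq, X, length, var)
    return Ks @ np.linalg.solve(K, y)

# Toy example: agents sample the (hypothetical) field f(p) = sin(x) cos(y)
# at random positions; the field is then estimated at two query positions.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 3.0, size=(200, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.05 * rng.standard_normal(200)
Xq = np.array([[1.0, 1.0], [2.0, 0.5]])
est = gp_field_estimate(X, y, Xq, length=0.5)
```

Note that the cost of the linear solve grows cubically with the number of measurements, which is exactly why the constant-complexity requirement stated above matters for crowdsensing at IoT scale.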
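The requirement that complexity not grow with the amount of collected data can be illustrated by a simple recursive scheme: summarize the field by per-cell running means over a fixed grid, so each new measurement costs O(1) and memory is bounded by the grid size. This is only a sketch of the general idea; the gridding strategy and class names are hypothetical, not the article's method:

```python
import numpy as np

class GridFieldEstimator:
    """Constant-memory recursive field estimator over a bounded area.
    Memory and per-update cost are fixed by the grid resolution and do
    not grow with the number of collected measurements."""

    def __init__(self, xmin, xmax, ymin, ymax, n_cells=32):
        self.bounds = (xmin, xmax, ymin, ymax)
        self.mean = np.zeros((n_cells, n_cells))   # running mean per cell
        self.count = np.zeros((n_cells, n_cells), dtype=int)
        self.n = n_cells

    def _cell(self, x, y):
        # Map a position to its grid-cell indices (clamped to the boundary).
        xmin, xmax, ymin, ymax = self.bounds
        i = min(int((x - xmin) / (xmax - xmin) * self.n), self.n - 1)
        j = min(int((y - ymin) / (ymax - ymin) * self.n), self.n - 1)
        return i, j

    def update(self, x, y, value):
        # O(1) incremental mean update for the cell containing (x, y).
        i, j = self._cell(x, y)
        self.count[i, j] += 1
        self.mean[i, j] += (value - self.mean[i, j]) / self.count[i, j]

    def estimate(self, x, y):
        return self.mean[self._cell(x, y)]

g = GridFieldEstimator(0.0, 1.0, 0.0, 1.0, n_cells=4)
g.update(0.1, 0.1, 1.0)
g.update(0.1, 0.1, 3.0)
```

Per-cell averaging also gives some robustness to individual measurement errors, since each cell estimate pools many crowd contributions; handling positioning errors rigorously, however, requires the more principled fusion methods discussed in the remainder of the article.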