Summary: | Precise knowledge of crop distribution across the landscape is crucial for the agricultural sector, as it informs management and logistics. Crop-type maps are often derived through supervised classification of satellite imagery with machine learning models. The training data sampled when building such a classification model strongly affect its performance, and these data are usually collected through roadside surveys across the area of interest. However, the large spatial extent of the area and the varying accessibility of fields often make it difficult to acquire appropriate training data sets. As a result, in situ data are often collected on a best-effort basis, which leads to inefficiencies, sub-optimal accuracies, and unnecessarily large sample sizes. This highlights the need for new, more efficient tools to guide data collection. Here, we address three tasks commonly faced when planning in situ data collection: which survey route to select from a set of logistically feasible routes; which fields along the chosen route are the most relevant to survey; and how best to augment existing in situ data sets with additional observations. Our findings show that the normalised Moran's I index is a useful indicator for choosing the survey route, and that sequential exploration methods can identify the most important fields to survey on that route. The provided recommendations are flexible, overcome the main logistical constraints associated with in situ data collection, yield accurate results, and could be incorporated into a mobile application to assist data collection in real time.
|
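Moran's I measures spatial autocorrelation: values near +1 mean that neighbouring observations are similar, and values near -1 mean that they alternate. The summary does not state which per-field variable the index is computed on or how it is normalised. The sketch below therefore computes only the standard global Moran's I. The per-field values and the helper `line_weights`, which treats consecutive fields along a route as neighbours, are hypothetical illustrations and not the authors' setup.

```python
from typing import List, Sequence


def line_weights(n: int) -> List[List[float]]:
    """Hypothetical weights: fields i and j are neighbours if adjacent along a route."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights. Under no spatial autocorrelation the
    expected value is -1 / (n - 1). Any normalisation used in the study is
    not reproduced here.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    if w_sum == 0:
        raise ValueError("weights must not all be zero")
    den = sum(d * d for d in dev)
    if den == 0:
        raise ValueError("values are constant; Moran's I is undefined")
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * (num / den)


if __name__ == "__main__":
    w = line_weights(4)
    # Clustered values along the route give positive autocorrelation.
    print(morans_i([1.0, 1.0, 0.0, 0.0], w))
    # Alternating values give negative autocorrelation.
    print(morans_i([1.0, 0.0, 1.0, 0.0], w))
```

In practice, a spatial statistics library such as PySAL's `esda` would also provide permutation-based significance tests, which this sketch omits.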