This is the third post in our “Precision Agriculture” series. In this part, we’ll address the second challenge of data labeling: data quality. Be sure to stay tuned for our fourth post, which addresses the third challenge in precision agriculture: financial obstacles.
According to Forrester, one of the top challenges of implementing successful AI systems in enterprises is data quality (DQ). As Forrester analyst Michele Goetz puts it, “businesses lack a clear understanding of data needed for ML models,” and therefore struggle with data preparation. The problem is that AI models are only as good as the data they’re built on, which is a real challenge given the fast-paced AI market and the variety of industries trying to leverage AI technology.
To complicate matters even more, AI systems only work well on data that resembles what they have already seen, so they need constantly updated data during development.
How to Ensure Consistent Quality with Agriculture Datasets
Crop monitoring is one of the biggest aids that ML can deliver to busy, overstretched farmers and agronomists. High-quality data allows agrotechnicians to build apps that deliver full visibility into every stage of crop growth, from measuring crop emergence to observing hydration and nutrient levels to tracking when produce is ready for harvest, across potentially hundreds of geographically distant fields.
To carry out these tasks successfully, agrotechnicians need to consistently tag and label massive datasets of plant images at the pixel level. Dataset quality has to be high across two types of data: subjective and objective. Subjective data includes labels that don’t have a single definitive “truth,” while objective data labels do have a measurably correct answer.
Subjective data labeling revolves around how to define the label when there’s no single source of truth. We have often seen how the labeler’s domain expertise, geography, language, and cultural associations can influence the way that they interpret the data before them.
For example, suppose a farmer requests a mobile app that continuously tracks the condition of their crops and triggers an alert whenever there is an early sign of something atypical in their fields, as well as at specific points in the growth and ripening of their produce.
The subjective data challenge is deciding whether a specific plum is ripe or not, or if the current rate of growth is normal for this time of year. The agronomist building the app needs a huge amount of data to successfully train the models so that the app can distinguish ripeness for each type of fruit.
Since there’s no single “correct” answer for subjective data, the data ops team needs to set clear instructions that guide how the workforce interprets each data point at every stage of the crop’s life cycle and through all the seasons.
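One common way to work with subjective labels is to collect several opinions per item and measure how strongly the labelers agree. The snippet below is a minimal sketch of that idea, not a description of any particular labeling platform: the image IDs, the votes, the consensus_label helper, and the 70% agreement threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter

def consensus_label(votes, min_agreement=0.7):
    """Return (label, agreement, needs_review) for one item's votes.

    agreement is the share of labelers who chose the most common label.
    Items below min_agreement (an assumed threshold) are flagged for review.
    """
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    return label, agreement, agreement < min_agreement

# Hypothetical votes from five labelers on two plum images.
votes_by_image = {
    "plum_001": ["ripe", "ripe", "ripe", "ripe", "unripe"],
    "plum_002": ["ripe", "unripe", "unripe", "ripe", "overripe"],
}

results = {img: consensus_label(v) for img, v in votes_by_image.items()}
for image_id, (label, agreement, needs_review) in results.items():
    status = "needs review" if needs_review else "ok"
    print(image_id, label, f"{agreement:.0%}", status)
```

Items with low agreement are often the ones where the labeling instructions are ambiguous, so they are good candidates for refining the guidelines.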
Unlike subjective data, objective data always has a measurably correct answer. But that doesn’t mean that there are no challenges in store. The labeler needs enough domain knowledge to answer each question correctly. Sometimes, too, more than one answer to a question is technically correct, but you only want one specific answer.
To continue our example of a technician building the crop monitoring app, the app has to work out how to define a diseased leaf as opposed to a healthy leaf or an infested leaf. A leaf could be showing signs of more than one disease, or it could appear to be infested and also suffering from a lack of water. It may not always be easy for a data labeling team to make the right call, so they need clear and detailed directions to know how to label each item correctly.
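When a leaf shows more than one condition but the dataset needs a single label, the directions can spell out which condition takes precedence. The following is a minimal sketch under that assumption: the label names, their priority order, and the primary_label helper are hypothetical, and a real guideline would be defined by the agronomists on the project.

```python
# Hypothetical priority order from a labeling guideline: earlier entries win.
LABEL_PRIORITY = ["diseased", "infested", "water_stressed", "healthy"]

def primary_label(observed):
    """Pick the single label to record when several conditions are visible."""
    observed = set(observed)
    if not observed:
        raise ValueError("at least one condition must be observed")
    unknown = observed - set(LABEL_PRIORITY)
    if unknown:
        # Anything outside the guideline should be escalated, not guessed.
        raise ValueError(f"labels not covered by the guidelines: {sorted(unknown)}")
    for label in LABEL_PRIORITY:
        if label in observed:
            return label

print(primary_label({"infested", "water_stressed"}))
print(primary_label({"healthy"}))
```

Encoding the rule once means every labeler resolves the same ambiguous leaf the same way, instead of each making a separate judgment call.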
Bear in mind that for both subjective and objective data, it’s almost impossible to completely rule out human error, even with the very best dataset quality verification system. That means that data science teams always need a closed-loop feedback process to check for both subjective and objective quality issues.
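A closed-loop feedback process can take many forms. One simple version is to spot-check a random sample of each labeled batch against an expert reviewer and send any disagreements back for relabeling. The sketch below illustrates that pattern only; the audit_batch function, the leaf IDs and labels, and the 5% error threshold are assumptions made for the example.

```python
import random

def audit_batch(labels, expert_review, sample_size, max_error_rate=0.05, seed=0):
    """Spot-check a random sample of labels against an expert reviewer.

    labels: dict mapping item id -> label from the labeling team
    expert_review: callable returning the expert's label for an item id
    Returns (error_rate, items_to_relabel, batch_accepted).
    """
    if not labels or sample_size < 1:
        raise ValueError("need at least one label and a positive sample size")
    rng = random.Random(seed)  # fixed seed so the audit is reproducible
    sample = rng.sample(sorted(labels), min(sample_size, len(labels)))
    to_relabel = [item for item in sample if expert_review(item) != labels[item]]
    error_rate = len(to_relabel) / len(sample)
    return error_rate, to_relabel, error_rate <= max_error_rate

# Hypothetical batch: the team and the expert disagree on one leaf.
team_labels = {"leaf_1": "healthy", "leaf_2": "diseased",
               "leaf_3": "healthy", "leaf_4": "infested"}
expert_labels = {"leaf_1": "healthy", "leaf_2": "diseased",
                 "leaf_3": "diseased", "leaf_4": "infested"}

error_rate, to_relabel, accepted = audit_batch(
    team_labels, expert_labels.get, sample_size=4
)
print(error_rate, to_relabel, accepted)
```

The items returned for relabeling, together with the reason the expert disagreed, are what closes the loop: they feed back into both the dataset and the labeling instructions.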
For a farmer, misjudging whether fruit is ripe may mean only a small loss, but failing to detect disease in their crops can have catastrophic consequences. Likewise, failing to ensure the highest quality of data can have critical implications for your business.
The good news is that technology exists to help the agritech industry detect and prevent widespread problems, or even disasters, saving you time and money.
Further reading: The “Data Loop”
If you’ve ever found yourself questioning your data quality, or if your business has undergone an avoidable financial loss, we’re here to help. If you’d like to speak to an expert and learn more, then click here.