
Uncovering AI Tactics For Solving Real-Life Problems

In my previous post, I discussed the different methods for knowledge sharing in computer vision. Think of it as the building blocks for the knowledge required to understand more about industry-specific use cases.

In this post, I will present a few examples of “questions” people want machines to answer in real-life scenarios. We will then review some of the more complex scenarios you might tackle using deep learning in the industry today, and how to take complex cases and break them apart into simpler, and more efficient tasks.

Real Industry Use Cases: Using Different Tools

Goal Line Technology

The first example I want to focus on is a perfect fit for classification because it exemplifies a simple mission that captures an entire scene. The question here is, “Did the ball cross the line?” The answer, or classification, should be either “Yes” or “No.”

The classification required in this example is binary: it’s either “goal” (yes) or “no goal” (no) for the entire image. This is simple for the model, simple for data collection, and simple for annotation.
Since most of the image contributes to the answer (the entire ball is passing the line), the signal is strong relative to the whole image. Keep in mind that there are classification missions with more than two labels.
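As a minimal sketch of that binary decision (the `classify_goal` helper and the 0.5 threshold are assumptions, not part of any real system), the final step often reduces to thresholding a model’s confidence score:

```python
def classify_goal(model_score: float, threshold: float = 0.5) -> str:
    """Map a model's goal-confidence score (0.0-1.0) to a binary label.

    `model_score` would come from a trained binary classifier; the 0.5
    threshold is an assumption and is usually tuned on validation data.
    """
    return "goal" if model_score >= threshold else "no goal"

print(classify_goal(0.93))  # "goal"
print(classify_goal(0.12))  # "no goal"
```

In practice the threshold is chosen to trade off false goals against missed goals, which matters far more here than raw accuracy.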

Fruit Monitoring

In this example, we’ll focus on detection. Common agriculture use cases involve the detection of bugs, diseases, fruits, and plant growth stages. In many cases, the data is collected via drones or by vehicles on the ground, and the easiest way to describe the mission to the machines is “mark a box around X.” As you can see in the example below, we detected around 40 objects, all from the class “apple.”

In the case of detection, there are multiple types of items companies may want to focus on, which affects the type of detection tool they select for a given use case.
The simpler and faster tools are bounding boxes and key points, while a more complex one is the polygon, typically used for intricate shapes or models that require more elaborate detection.
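To make the “mark a box around X” mission concrete, here is a small sketch of how bounding-box detections for the apple example might be represented and counted (class names and coordinates are made up):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    label: str          # class name, e.g. "apple"
    x_min: int          # top-left corner, in pixels
    y_min: int
    x_max: int          # bottom-right corner, in pixels
    y_max: int

def count_class(detections, label):
    """Count how many detected boxes belong to one class."""
    return sum(1 for d in detections if d.label == label)

detections = [
    BoundingBox("apple", 10, 20, 40, 55),
    BoundingBox("apple", 60, 15, 95, 50),
    BoundingBox("leaf", 5, 5, 25, 18),
]
print(count_class(detections, "apple"))  # 2
```

A key point would carry just one (x, y) pair and a polygon a list of vertices; the box is the cheapest of the three to annotate.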

Manufacturing Quality Assurance

One of the first verticals to be impacted by the Industrial Revolution was manufacturing. It began with the physical movement of objects and, in later stages, fine motor skills. As computer vision began to take root, many of the missions around manufacturing automation have gravitated towards automating the visual aspects of the operations. The most common use case is quality control. This is typically done via detection or classification, but today I want to focus on another classic example: semantic segmentation.

Practically speaking, if we’re detecting scratches on metal components (as is common in many manufacturing use cases), you need to provide precise information. Selecting the wrong tool for this task (like a bounding box) can produce a really poor signal-to-noise ratio (SNR). Think about it: of the 2,000 pixels in the box detection, only 20 are the actual signal and the rest are just noise. Understanding the class of every pixel, in contrast, enables the model to learn the location and size of the scratch with enough accuracy to gauge the severity of the issue and, later on, determine how it should be treated (preferably automatically).
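That pixel arithmetic can be sketched directly, using the 20-of-2,000 figures from the example:

```python
def signal_ratio(signal_pixels: int, total_pixels: int) -> float:
    """Fraction of the annotated region that is actual signal."""
    return signal_pixels / total_pixels

# Bounding box around a thin scratch: 20 of 2,000 pixels are signal.
box_ratio = signal_ratio(20, 2000)
# Pixel-level labeling (semantic segmentation): every marked pixel is signal.
mask_ratio = signal_ratio(20, 20)

print(box_ratio)   # 0.01 -> 99% of the box is noise
print(mask_ratio)  # 1.0
```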

These nuances require pixel-level accuracy, which cannot be achieved with simple approaches like bounding boxes, and even less so with classification. In this case, the question is “Which pixels in the image represent which class?”
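A toy example of that per-pixel question (all values are illustrative; 0 = background, 1 = scratch, and the severity rule is made up):

```python
# Hypothetical 4x6 class map, as a segmentation model might output:
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Knowing the class of every pixel gives both the location and the size
# of the defect, which a bounding box alone cannot provide.
scratch = [(r, c) for r, row in enumerate(mask)
           for c, v in enumerate(row) if v == 1]
area = len(scratch)                            # defect size in pixels
severity = "major" if area > 10 else "minor"   # made-up severity rule
print(area, severity)  # 5 minor
```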

Complex Use Cases

As your projects evolve, the questions you need to answer become more complex, and you’ll need to answer a combination of questions that address multiple types of tasks. For example, in some cases a sequence of a detection model and a classification model will be easier to train than a single detection model with numerous classes.

Let’s look at a real-life example from the retail shelf management industry. Suppose you have to detect certain items on a shelf. The parent task would be to detect all products on the shelf, and the child task would be to determine what each product actually is. Handled as one complex task, an annotator has to detect each product and select the relevant label (from a list that can contain 100K labels), then repeat this step for every object in the image. To simplify the process, you can break it into two smaller tasks: “mark a box around every product” (detection) and then “classify every product” (classification by selecting from a list). This makes the process much simpler and far more cost-effective.
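A minimal sketch of that two-step idea (the detector and classifier here are stubs standing in for trained models; boxes and labels are invented):

```python
def two_stage_pipeline(image, detect_products, classify_product):
    """Stage 1: a single-class detector marks every product box.
    Stage 2: a classifier sees one crop at a time, so it can choose
    from a huge label list without complicating the detector."""
    boxes = detect_products(image)
    return [(box, classify_product(image, box)) for box in boxes]

# Stub models for illustration only:
def fake_detector(image):
    return [(0, 0, 50, 120), (60, 0, 110, 120)]   # two "product" boxes

def fake_classifier(image, box):
    catalog = {(0, 0, 50, 120): "cola-330ml",
               (60, 0, 110, 120): "chips-salted"}
    return catalog[box]

print(two_stage_pipeline("shelf.jpg", fake_detector, fake_classifier))
```

The design point is that each stage answers one simple question, so each can be trained, annotated, and replaced independently.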

Of course, you’ll need to take the following into account:

  • Computation power
  • Time per prediction
  • Cost of model convergence (getting to high accuracy)

Let’s zoom in on the last bullet. In the previous example, the main reason to break the pipeline into a 2-step solution is the annotation costs, along with the number of samples you need for each class in the model. Other costs can stem from “expensive knowledge” (annotation that requires certain expertise), so splitting the work into smaller tasks makes the process cheaper.

By letting experts do the simpler tasks (classification or key-point marking) and assigning “regular” annotators the time-consuming tasks of polygons or semantic segmentation, you can significantly reduce the overall costs.
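A back-of-the-envelope sketch of that saving (all rates, times, and counts are hypothetical):

```python
# Hypothetical rates and per-item annotation times:
expert_rate, annotator_rate = 150.0, 20.0   # $/hour
keypoint_min, polygon_min = 0.5, 5.0        # minutes per item
items = 1000

# Option A: the expert does the slow polygon annotation alone.
expert_only = items * (polygon_min / 60) * expert_rate

# Option B: the expert marks fast key points; a regular annotator
# draws the time-consuming polygons guided by them.
split = (items * (keypoint_min / 60) * expert_rate
         + items * (polygon_min / 60) * annotator_rate)

print(round(expert_only), round(split))  # 12500 2917
```

Even with made-up numbers, the structure of the saving is clear: the expensive hours are spent only on the fast task.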

Here are two real-life examples:

  1. Doctors mark only the cancerous cells with a key point, and an annotator then marks a polygon around each of these cells.
  2. Annotators mark the plants and the disease infestations using semantic segmentation, while an agronomist only classifies the disease and its severity.


Our goal is to automate missions and teach machines how to answer questions. If we simplify the questions to make them easier to learn, we can solve many cases at lower cost and with better results.

Break the question down to be as simple as possible while keeping one of the following goal formats:

  • “Which category from the known list does this image fit into?” (Classification)
  • “Detect all objects with class X (can be a list of classes) in the image.” (Detection)
  • “Identify the pixels of class X (can be a list of classes) in the image.” (Segmentation)
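The three question formats differ mainly in the shape of their answer; here is an illustrative sketch (all values made up):

```python
# One label for the whole image:
classification_answer = "goal"

# A (class, bounding box) pair per detected object:
detection_answer = [("apple", (12, 40, 58, 90)),
                    ("apple", (70, 35, 118, 88))]

# A class id per pixel (0 = background, 1 = class X):
segmentation_answer = [
    [0, 0, 1],
    [0, 1, 1],
]

print(type(classification_answer).__name__,   # str: one label
      len(detection_answer),                  # 2: one entry per object
      sum(map(sum, segmentation_answer)))     # 3: class-X pixel count
```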

As you can see, the questions are generic and can fit every industry, even though each industry has its own challenges. My next post will focus on these challenges when moving to production.

The last tip I can give is to adopt a “fail fast” approach. What does this mean? Run quick tests to get quick results and learn what works for you. This way you avoid wasting time and money on the wrong questions, and you find out whether the machine was able to pick up the knowledge in the examples before you invest significant resources in your project. If you’d like to learn more, you can read this article on how to start a project correctly.
