You have a model that works well, but not perfectly yet. Maybe you’ve reached 90% accuracy, but once it goes to production it won’t perform nearly as well: in the real world, your model is bound to encounter new data distributions and edge cases it never saw in training. So, the question is, what can you do about it?
What Are Labeling Pipelines?
Labeling pipelines are the data pipelines you use for labeling. The vast majority of AI projects don’t use them, because building one carries data science overhead that rarely gets priority from the project’s data scientists and machine learning engineers.
Deep-learning models shine on narrow problems: the narrower the problem, the easier it is for a model to reach high accuracy on it. Labeling pipelines take the complete problem your models aim to solve for your customer and break it into even narrower micro-tasks and micro-models. The concept is very similar to micro-services, applied to data modeling.
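As a rough illustration (this is a hypothetical sketch, not Dataloop’s API), a labeling pipeline can be thought of as an ordered list of micro-tasks, each handled by either a model or a human:

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MicroTask:
    name: str
    executor: str    # "model" or "human"
    run: Callable    # takes the item's annotations, returns refined annotations

# placeholder steps -- in a real pipeline each would call a model or open a human task
def detect_cars(annotations): return annotations        # model pre-labels bounding boxes
def mark_missed_cars(annotations): return annotations   # human adds boxes the model missed
def classify_car_model(annotations): return annotations # expert assigns the exact car model

pipeline: List[MicroTask] = [
    MicroTask("pre-label cars", "model", detect_cars),
    MicroTask("mark missed cars", "human", mark_missed_cars),
    MicroTask("classify car model", "human", classify_car_model),
]

def run_pipeline(annotations):
    # each micro-task refines the previous step's output
    for task in pipeline:
        annotations = task.run(annotations)
    return annotations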
Why Are Labeling Pipelines Different From Production Pipelines?
Every data-centric product has to balance speed, cost, and quality in the following manner:
Data-centric businesses, like every other business, are looking for the sweet spot of quality, time, and price for their customers and market. Machine learning production pipelines always strive to model the data in a way that is compute-efficient, so serving is cheap and low-latency while accuracy stays in place. This pushes production models to be as general as possible, handling many categories, cases, and data types at once to keep cloud costs down. These considerations change dramatically when we look at backend labeling pipelines, due to the following factors:
Processing Costs:
Production processing costs are driven by machines (cloud CPUs/GPUs, TPUs), while labeling costs are driven by human labor time. For simple image classification, an hour of human processing is roughly 10,000 to 100,000 times more expensive than an hour of machine processing. So running 10x more models and servers to assist humans and reduce their labor makes for good economics.
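The back-of-the-envelope math behind that ratio looks something like this (the hourly rates and throughputs below are illustrative assumptions, not measured figures):

# Illustrative cost-per-image comparison (all numbers are assumptions)
human_rate_per_hour = 15.0        # USD per labeling hour (assumed)
human_images_per_hour = 400       # simple classification throughput (assumed)

gpu_rate_per_hour = 1.0           # USD per cloud GPU hour (assumed)
gpu_images_per_hour = 1_000_000   # batched inference throughput (assumed)

human_cost_per_image = human_rate_per_hour / human_images_per_hour   # ~$0.0375
machine_cost_per_image = gpu_rate_per_hour / gpu_images_per_hour     # ~$0.000001

ratio = human_cost_per_image / machine_cost_per_image
print(f"human labeling is ~{ratio:,.0f}x more expensive per image")  # ~37,500x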
Processing Time:
Your customers are very sensitive to time, whether in retail, agriculture, autonomous vehicles, or medical applications, so production results usually have to arrive in real time. Labeling pipelines, however, have much more time to process the data, which leaves room to run more compute and to use bigger, slower models (which tend to be more accurate).
Results Accuracy:
Training datasets must always be held to a stricter standard than production output, since a model’s production accuracy will never exceed the accuracy of the data it was trained on. To reach that standard, the labeling work is broken down into micro-tasks assigned to labelers within their scope of expertise. Micro-tasks yield higher accuracy at a faster rate, and any quality issue is easier to isolate to the specific micro-task it came from. It’s the same concept as the modern assembly line pioneered by Henry Ford: what works for assembling cars works for data as well.
Extracting Value From Labeling Pipelines
Imagine you need to label all car models accurately using a bounding box in the image below.
In a standard labeling setup, a single labeler works on the whole image at once, just as a production model tries to capture all of the information in one pass. Let’s see what happens when we break this work into micro-tasks. We define the following pipeline:
In this flow we have broken the single human task into two:
- Identify missed cars by the model
- Label car model accurately
And we’ve added a single machine task: pre-label the cars using out-of-the-box YOLO.
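For the machine step, an off-the-shelf detector is enough. A minimal sketch using the public YOLOv5 release via torch.hub might look like this (the model size, confidence threshold, and image path are assumptions):

import torch

# load a pretrained, off-the-shelf YOLOv5 model (small variant assumed)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.conf = 0.25                      # confidence threshold (assumed)

# run detection on the full image and keep only "car" predictions
results = model('parking_lot.jpg')     # hypothetical image path
detections = results.pandas().xyxy[0]  # columns: xmin, ymin, xmax, ymax, confidence, class, name
cars = detections[detections['name'] == 'car']
print(f"pre-labeled {len(cars)} cars")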
This is how YOLOv5 labels this image:
YOLO misses many of the cars, which is a bit surprising given its generally good car detection capabilities. It struggles here because this is a large image with small cars; if we break the big image into smaller tiles, YOLO (and any other model) will do much better. Below is how YOLO handles just one part of the image:
Much better: not a single object is missed (we can ignore the wrong classes; why YOLO “sees” cell phones here is a matter for another post).
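One way to exploit this observation in the machine step is to tile the large image before detection and map the boxes back to full-image coordinates. A rough sketch (the tile size, overlap, and detector callback are assumptions):

import cv2

TILE = 640      # tile size in pixels (assumed)
OVERLAP = 64    # overlap so cars on tile borders aren't cut in half (assumed)

def detect_tiled(image_path, detect_fn):
    """Run a detector tile-by-tile and return boxes in full-image coordinates.

    detect_fn is any function returning [(x1, y1, x2, y2, confidence), ...]
    for a single image crop -- e.g. a wrapper around YOLO.
    """
    image = cv2.imread(image_path)
    h, w = image.shape[:2]
    boxes = []
    for y in range(0, h, TILE - OVERLAP):
        for x in range(0, w, TILE - OVERLAP):
            tile = image[y:y + TILE, x:x + TILE]
            for (x1, y1, x2, y2, conf) in detect_fn(tile):
                # shift tile-relative coordinates back to the full image
                boxes.append((x1 + x, y1 + y, x2 + x, y2 + y, conf))
    # note: overlapping tiles can yield duplicate boxes; apply
    # non-maxima suppression afterwards if needed
    return boxes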
We upgrade our pipelines now to the following:
What did we gain?
- Our model works much better on smaller tiles and saves us more manual work. This comes at the expense of running it a few dozen times per image, which we can afford if it saves human labor.
- The human task now has very specific and simple instructions: mark the cars the model missed. Besides simplifying the task, this produces an important by-product: labels for cars the model missed entirely, which are exactly what we need for future training.
- We isolated the fine-grained car-model classification into its own step, making it clearer and faster. A labeler drawing a precise bounding box would otherwise spend most of their time zooming in and out of the image; we saved that time as well.
- We reserved the domain-expert knowledge for our car-expert data worker; it’s very hard to identify the exact car model unless you really know cars.
The Dataloop platform lets you start building labeling pipelines and push your labeling efficiency toward production levels, with significant automation features: model integration, anomaly detection, labeling assistance, and category auto-labeling.
Below are some examples.
Dataloop Automation Capabilities
AI Annotation Assistant: Accelerate the annotation process with our set of AI-powered magic tools, which can, for example, automatically convert four single points into a multi-vertex polygon.
Model Predictions: Run your models on datasets to generate annotation predictions. Pre-labeled items then get audited by annotators, increasing productivity significantly.
Annotation Conversion: Swiftly switch between labeling tools and methods, reaching output faster with fewer clicks per object. For example, convert a simple polygon into a full semantic mask, instead of annotating pixel by pixel.
Smart Object Tracking: With our optical flow and smart tracking models, automatically duplicate annotations between video frames and sequenced images, to detect and follow unlimited objects.
How does the model work?
Only two parts of the script touch Dataloop: the builder that creates the annotations, and the upload of those annotations. Everything else is your own model code; you create whatever annotations you need and upload them. Here’s a mock model using OpenCV’s DNN module, as a step-by-step example (the file paths and project/dataset names at the top are placeholders to adjust):
import os

import cv2
import numpy as np
import dtlpy as dl

# --- configuration: adjust the paths and names below to your environment ---
configPath = "yolov3.cfg"        # YOLO network definition file
weightsPath = "yolov3.weights"   # pretrained COCO weights
labelsPath = "coco.names"        # the 80 COCO class names, one per line
images_dir = "images"            # local folder holding the images to upload
confidence_rate = 0.5            # minimum confidence to keep a detection

# COCO class names, used as annotation labels
LABELS = open(labelsPath).read().strip().split("\n")

# Dataloop project and dataset that will receive the items and annotations
# dl.login()  # authenticate first if you don't have a cached token
project = dl.projects.get(project_name="my-project")
dataset = project.datasets.get(dataset_name="my-dataset")

# load the YOLO object detector trained on COCO dataset (80 classes)
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

for image_name in os.listdir(images_dir):
    image_path = os.path.join(images_dir, image_name)
    # upload the image to the Dataloop dataset and get back the created item
    item = dataset.items.upload(local_path=image_path)
    assert isinstance(item, dl.Item)
    # the builder collects annotations so they can be uploaded in one call
    builder = item.annotations.builder()
    # load image and grab height and width
    image = cv2.imread(image_path)
    (H, W) = image.shape[:2]
    # determine only the *output* layer names that we need from YOLO
    ln = net.getLayerNames()
    ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]
    # construct a blob from the input image and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes and
    # associated probabilities
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(ln)
    # initialize lists of detected bounding boxes, confidences, and
    # class IDs (useful if you later want to apply cv2.dnn.NMSBoxes)
    boxes = list()
    confidences = list()
    classIDs = list()
    # loop over each of the layer outputs
    for output in layerOutputs:
        # loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability) of
            # the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            # filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability
            if confidence > confidence_rate:
                # scale the bounding box coordinates back relative to the
                # size of the image, keeping in mind that YOLO actually
                # returns the center (x, y)-coordinates of the bounding
                # box followed by the boxes' width and height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
                # get coordinates to create a dtlpy box annotation
                top = int(centerY - (height / 2))
                bottom = int(top + height)
                left = int(centerX - (width / 2))
                right = int(left + width)
                boxes.append([left, top, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(int(classID))
                # add the box annotation to the builder
                builder.add(
                    annotation_definition=dl.Box(top=top,
                                                 right=right,
                                                 left=left,
                                                 bottom=bottom,
                                                 label=LABELS[classID]))
    # upload all annotations collected for this item
    builder.upload()
How do you know your models are performing well?
By continuously comparing model output with human annotations, you can verify that your model has been exposed to all the possible scenarios and that it stays on par with your annotated dataset. At this point, you need to check whether the model requires more data, and which data. You can filter annotations by model name or by confidence level, which lets you better understand your model and adjust your training set and data acquisition process to improve its performance.
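With the dtlpy SDK, such a query might look roughly like the sketch below. The project and dataset names, the review threshold, and the metadata fields (model name and confidence stored on each prediction at upload time) are assumptions about your own schema, so adjust them accordingly:

import dtlpy as dl

project = dl.projects.get(project_name="my-project")        # assumed project name
dataset = project.datasets.get(dataset_name="my-dataset")   # assumed dataset name

# query at the annotation level rather than the item level
filters = dl.Filters(resource=dl.FiltersResource.ANNOTATION)
# assumed metadata layout: the model name was stored on each prediction at upload time
filters.add(field='metadata.user.model_name', values='yolo')

low_confidence = []
pages = dataset.annotations.list(filters=filters)
for page in pages:
    for annotation in page:
        # assumed metadata layout for the confidence score as well
        confidence = annotation.metadata.get('user', {}).get('confidence', 1.0)
        if confidence < 0.5:     # review threshold (assumed)
            low_confidence.append(annotation)

print(f"{len(low_confidence)} low-confidence 'yolo' annotations to review")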
Labeling Pipelines
We opened with the gap between how your model performs in the lab and how it falls short in production, and asked how you ultimately close it. The key to scaling can actually be quite simple. The biggest issue is knowing where your model is falling down. Auto-annotation can automate the bulk of the process (often around 85%); all that’s left is to fix the cases where the model fails, feed them back into training, and repeat. Routing can be automated based on model confidence, and you can build your own pipelines, with built-in procedures that make the flow even easier.
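In code, confidence-based routing boils down to something like this sketch (the 0.85 threshold and the accept/review callbacks are hypothetical, standing in for whatever review mechanism you use):

# Hypothetical confidence-based routing: auto-accept confident predictions,
# send everything else to human review, and collect the fixes for retraining.
AUTO_ACCEPT_THRESHOLD = 0.85   # assumed threshold, tune per class and model

def route_prediction(prediction, accept_fn, review_fn):
    """prediction: dict with 'box', 'label', 'confidence' (assumed structure)."""
    if prediction["confidence"] >= AUTO_ACCEPT_THRESHOLD:
        accept_fn(prediction)    # goes straight into the labeled dataset
    else:
        review_fn(prediction)    # queued as a human micro-task; the corrected
                                 # label is later fed back into training

# usage sketch
accepted, review_queue = [], []
for pred in [{"box": (10, 10, 50, 40), "label": "car", "confidence": 0.97},
             {"box": (80, 22, 120, 60), "label": "car", "confidence": 0.41}]:
    route_prediction(pred, accepted.append, review_queue.append)
print(len(accepted), "auto-accepted,", len(review_queue), "sent to review")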
If you’d like to learn more or speak to an expert, you can schedule a 1:1 personalized demo.