One of the most common questions I get asked about my job is “who are your customers?” and “what kinds of companies are implementing AI in their operations?”
It may sound crazy, but my answer is usually, “which companies aren’t?”
Working as a Customer Success Manager at Dataloop AI, I’ve noticed how varied these companies are, and how much effort they put into developing computer vision and other ML models.
In the past, we thought that achieving great success with AI belonged solely to technology giants like Google, Amazon, Facebook, and Apple. But as we’re hearing all around us, artificial intelligence has been gaining a lot of traction lately, and it’s relevant to numerous industries such as e-commerce, retail, agri-tech, automotive, manufacturing, healthcare, and more.
According to Algorithmia’s 2021 report, 83% of organizations have increased their AI or ML budgets year-over-year. It is important to note just how large that increase is: the IDC Worldwide Artificial Intelligence Spending Guide forecasts that global spending on AI systems will jump from $85.3 billion in 2021 to more than $204 billion in 2025. These numbers are not surprising when you consider that ML models can generalize and perform many complex tasks.
A huge challenge for me as a Customer Success Manager is to address each industry, identify my clients’ challenges, and advise them of their options. During this process I have identified two interesting points:
1. All teams that prepare data for ML model training face the same three main challenges:
Challenge #1: How can you scale up fast and efficiently, without compromising data quality?
Quality training data creates more successful models. Mature AI organizations report that high-quality (and diverse) training data is the most important contributor to the success of their AI strategies.
Challenge #2: How can you manage more data simply, without exploding cloud costs?
Big data is everywhere, and datasets are only getting bigger and more challenging to work with. With this in mind, tools that visualize the information, expose your workflow, and record metadata simplify this process dramatically.
Challenge #3: How can you balance standard patterns with customized flows?
A single one-size-fits-all data strategy won’t work for such diverse use cases, since every business in the industry has its own special requirements.
In this situation, an approach that worked well for one business won’t necessarily be suitable for another.
2. Enabling flexible and customizable work is more important than ever.
In order to adopt a successful data-centric AI strategy, every business strives to create the data flywheel effect, which essentially translates into a unified flow that connects tools, people, and processes.
One of the biggest challenges companies face these days is the integration of an external labeling workforce: managing the data and the workforce together, while annotating the data in a way that suits their specific use case and the domain knowledge they require.
This process requires a ton of customization work: training annotators for the specific use case, data conversions, and integrating automations into the manual pipeline using models.
Furthermore, each of the customizations described above requires additional adjustments for every project and every business problem.
Using developer-friendly, highly customizable tools that help you efficiently integrate your workflow, your data, and your domain knowledge into your work environment is a key part of making data preparation as easy and efficient as possible on your path to building a market-leading AI product.
The combination of these factors creates one of my favorite challenges at my job: providing different solutions to similar problems in a variety of ways, each suited to the given client and their specific use case.
From my side, when a client asks for a specific feature or capability, we first discuss it together to understand exactly what the client is seeking, what the desired outcome is, and what the best way for them to get there will be.
In some cases, providing further platform training and relevant best practices is enough to meet their needs. In other cases, when the existing capabilities aren’t sufficient, I recommend that clients put a bit more effort into customizing their work environment and adapting their data flow using our Python SDK, so they can tackle their challenges more accurately.
Here are a few examples of customizations that I have implemented with our clients, which helped them scale up while decreasing the time invested in each item, without compromising data quality (and even improving it).
Use Case #1: How to improve data quality by using workflow customization.
In this use case, I met with an agri-tech startup that wanted to scale up their operations and maintain high annotation quality, while facing a limited budget and a lack of manpower.
During our discussion, they asked a common question: how can we keep high data quality, without manually reviewing all the data items in the QA phase?
The short answer to this question is to perform QA on a sample, instead of manually reviewing every labeled item.
But then, the next question was: what about all the items that we didn’t check? How can we know if there are any glitches that we are missing there?
For small companies that don’t necessarily have a large amount of data, missed data quality deficiencies can be a huge problem.
Accordingly, together we designed a customized data quality workflow that recognizes potential weak areas using QA samples sized to the customer’s resources.
Then, according to a quality threshold pre-defined by the customer, we can decide where to dive deeper, reviewing all the items in those tasks to verify that we are not missing any critical issues.
To implement it, we used the Dataloop pipeline tool:
Part 1: The pipeline is triggered by an event, for any item uploaded to a chosen dataset.
Then, using a “pre-preparation” FaaS, we mark 10% of the items in each task as the items that will go to the QA sampling task later on.
During this step we also record this information in each item’s metadata, so we can track whether and when it went through the QA phase.
** This is a great place to mention that we will be releasing new smart sampling capabilities soon, designed specifically for data QC flows, so stay tuned!
Part 2: Once the first part is completed, an annotation task is opened and the items are allocated to the annotation phase.
After that, the sample QA task is opened automatically, containing 10% of the items from each task.
Part 3: After the QA task is completed, the “QA threshold validation” FaaS runs and checks whether the QA task succeeded or failed (according to the pre-defined threshold).
If it succeeded, all the items from the original task are marked with a “QA done” status.
If it failed, the original task is re-opened for a full review.
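To make the three parts above concrete, here is a minimal, SDK-free sketch of the sampling and threshold logic in Python. The function names (`mark_qa_sample`, `qa_threshold_validation`) and the dictionary-based item representation are my own illustration, not Dataloop’s actual pipeline API; in the real workflow, each step runs as a FaaS node.

```python
import random

QA_SAMPLE_RATIO = 0.10   # 10% of items per task go to the QA sampling task
QA_THRESHOLD = 0.95      # hypothetical pre-defined pass rate; below it, re-open the task

def mark_qa_sample(items, ratio=QA_SAMPLE_RATIO, seed=None):
    """Pick a random sample of items and record the decision in each item's metadata."""
    rng = random.Random(seed)
    sample_size = max(1, round(len(items) * ratio))
    sampled_ids = set(rng.sample([it["id"] for it in items], sample_size))
    for it in items:
        # Track on the item itself whether it goes through the QA phase
        it.setdefault("metadata", {})["qa_sampled"] = it["id"] in sampled_ids
    return [it for it in items if it["metadata"]["qa_sampled"]]

def qa_threshold_validation(qa_results):
    """Return True (mark originals 'QA done') or False (re-open for full review)."""
    passed = sum(1 for ok in qa_results if ok)
    return passed / len(qa_results) >= QA_THRESHOLD
```

For a task of 50 items, `mark_qa_sample` flags 5 of them for the sample QA task; once reviewers finish, the pass/fail verdict on those 5 decides whether the remaining 45 are accepted or the whole task is re-opened.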
Using this method, the customer could easily design their own automated data workflow that weaves together humans and machines, focuses on quality, and reduces costs.
Use Case #2: How to reduce the effort invested per item by integrating customized, ongoing automated validation.
In this case, I was asked by a retail company to figure out how to decrease the annotation time invested per item, while keeping up with their quality standards.
My approach when tackling this kind of challenge is to examine the right way to integrate automation capabilities into the client’s workflow.
In this use case, the client was annotating short clips of people shopping in the supermarket.
In these videos, the annotators were required to mark the interaction of the client with the desired products on the supermarket’s shelves.
In each interaction, they needed to capture the moment in which the client approached the shelf and touched the product, and identify the recognized interaction: picking up, putting down, or nothing.
In order to do this, we marked the person with a bounding box (BB) and linked it to a point marking the interaction with the shelf itself, as you can see in the following example:
In this flow, we had a few rules we wanted the annotators to follow:
1- Any BB (person) annotation must be linked to the corresponding point (shelf interaction) annotation.
2- Any point annotation must have a single attribute (pick / put / nothing).
3- A point annotation can’t be linked to more than one BB.
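The three rules above are simple enough to express in code. The actual feature runs as JavaScript inside the annotation studio, but here is the same rule logic as a Python sketch; the annotation dictionaries and the `validate_annotations` name are hypothetical and only illustrate the checks, not the studio’s real data model.

```python
def validate_annotations(annotations):
    """Check the three linkage rules; return a list of human-readable violations.

    Each annotation is a dict like:
      {"id": "b1", "type": "box", "linked_to": ["p1"]}        # person BB
      {"id": "p1", "type": "point", "attributes": ["pick"]}   # shelf interaction
    """
    violations = []
    points = {a["id"] for a in annotations if a["type"] == "point"}
    links_to_point = {}  # point id -> BBs linked to it

    for a in annotations:
        if a["type"] == "box":
            linked = [t for t in a.get("linked_to", []) if t in points]
            if not linked:  # rule 1: every BB must link to a point
                violations.append(f"BB {a['id']} is not linked to any point annotation")
            for p in linked:
                links_to_point.setdefault(p, []).append(a["id"])
        elif a["type"] == "point":
            if len(a.get("attributes", [])) != 1:  # rule 2: exactly one attribute
                violations.append(f"point {a['id']} must have exactly one attribute (pick/put/nothing)")

    for p, boxes in links_to_point.items():
        if len(boxes) > 1:  # rule 3: a point may be linked to at most one BB
            violations.append(f"point {p} is linked to more than one BB")
    return violations
```

An empty result means the annotator can proceed; any violation message maps to the real-time feedback shown in the studio.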
To save the time that would otherwise be spent verifying these rules, and to improve the annotators’ learning curve, we integrated JS annotation validation into the annotation phase.
A few words about the JS annotation validation feature:
The script prevents the annotator from continuing as long as they violate the rules.
In addition, it opens an issue on the faulty annotation and provides the annotator with relevant feedback on what they just missed.
The annotator can then fix it immediately, and improve ahead of the next item.
Any client can build their own JS and enforce any kind of restriction in real time, which is faster and more efficient than providing this feedback retroactively and waiting for fixes.
You can see here how it works, when an annotator misses the linkage:
In this case, by using customized automation, we were able to reduce the manual QA effort dramatically, improving the annotators’ learning curve during the annotation phase itself. Additionally, we were able to save resources while improving quality.
Use Case #3: How to adjust the annotation studio functionality to the desired workflow, by adding a customized UI button.
No matter how much research and comparison you do before choosing your data tool, it will probably never contain all the exact capabilities you want, implemented in the exact way that you had envisioned them.
As I mentioned before, sometimes as a CSM this is exactly where I can help to examine the challenge from a different angle and use the tool’s best practices in order to deal with it in the most ideal manner. In some cases, the existing solution may not be adequate.
That’s what I felt when one of my retail customers consulted me on how to mark multiple items on the shelf as different groups with different identifiers.
To enable this in the most suitable way for the client’s annotation flow, we used Dataloop’s FaaS:
A few words about FaaS (Function As A Service):
FaaS enables users to deploy serverless functions as services that can be called via API, with access to computing resources and data from the Dataloop system.
In this case, the client created their own FaaS, triggered by a UI button that we added to the annotation studio interface.
Then, during the annotation flow, when the annotator wants to mark a few products (a few cereal boxes, in this example) as a group and set those annotations with the same objectID, they simply need to select them all and click the “group annotation” UI button we added to the right panel. This triggers the function, which adjusts the annotations’ objectID as needed.
See how it works:
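The core of what the button’s function does can be sketched in a few lines of Python. This is an illustration only: the `group_annotations` name, the dictionary-based annotation model, and the choice to reuse the lowest existing objectID are my own assumptions, standing in for the client’s actual FaaS, which works through the Dataloop SDK.

```python
def group_annotations(annotations, selected_ids):
    """Assign one shared objectID to all selected annotations.

    `annotations` is a list of dicts with "id" and "object_id" keys, mimicking
    what the real function would update via the SDK when the UI button fires.
    """
    selected = [a for a in annotations if a["id"] in set(selected_ids)]
    if not selected:
        return None  # nothing selected, nothing to group
    # Reuse the lowest existing object_id among the selection, so the group
    # stays stable if the button is clicked again with extra annotations.
    group_object_id = min(a["object_id"] for a in selected)
    for a in selected:
        a["object_id"] = group_object_id
    return group_object_id
```

So if the annotator selects three cereal-box annotations with objectIDs 2, 5, and 7, one click rewrites all three to objectID 2, marking them as a single group.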
Thanks to the FaaS capabilities, we enabled our clients to customize their work environment as needed.
Using Python code, which you can easily write yourself, you can extend the platform’s capabilities with outstanding flexibility and essentially automate many processes.
In conclusion, I think the great thing about being a CSM at Dataloop AI is that my job is never boring. I’m lucky to learn new things and tackle new client scenarios every day.
During my time here, I’ve had the opportunity to gain a wide understanding of computer vision and ML challenges. That, along with being a Dataloop product expert, allows me to help customers find the right balance between utilizing Dataloop’s existing features and creating their own customizations to achieve the desired progress.
We are here to help you do the same! 🙂