When hundreds of remote individuals are labelling thousands of images and texts every day, quality becomes one of the most challenging aspects to maintain and guarantee. As an end-to-end service, we manage everything, including recruiting and training the labellers for each client’s project. It is also up to us to ensure that we meet the quality standards that clients require to train their machines.
Accuracy is one of the most important metrics that we are measured by in the data labelling work that we do for our clients. Understandably so, as the quality of the data that is being used to train their machines will ultimately decide how well they perform upon public release.
However, the use cases for computer vision applications are so varied that it is extremely difficult to implement a fully automated quality control (QC) system. As Supahands’ list of clients and projects grew, so did the challenge of delivering labelled data back to our clients by their respective deadlines while still meeting their accuracy standards.
This article takes you through how we approached this by identifying individuals within our pool of independent contractors (SupaAgents) to carry out the QC work on our BizOps team’s behalf.
We start by introducing our problem statements and the goals of the experiments that we will be conducting over time to arrive at an automated quality assurance process that takes the manual load off our BizOps team.
- Manual QC is done solely by our internal BizOps team and is becoming a bottleneck in our workflow as we scale. The goal is to hand over the QC work to a selected pool of “QC SupaAgents”.
- BizOps has to set measures or benchmarks to determine which SupaAgents are good enough to be QC SupaAgents.
To dedicate the right amount of resources to these problem statements, we first wanted to understand why, and how we knew, this was a real problem worth solving. We boiled the reasons down to:
- Coping with growth
  - The number of projects that we work on will keep increasing over time, and constantly hiring more people into BizOps is not a scalable solution.
  - Projects will also increase in size, which means the sample sizes for manual QC checks will keep getting larger.
- Why is manual QC still important?
  - The AI/ML industry has yet to figure out the best way to approach data labeling, due to the immense variety of specifications and use cases.
  - It’s normal for clients to change their standard operating procedures (SOPs) a number of times as they continue to optimize the development of their ML applications.
  - We deal with new varieties of projects on a regular basis. While we do our best to pick up on as many edge cases as possible, some may not become apparent until the QC stage.
  - Semi-automated solutions like inserting Ground Truth tasks into the working set only work to a certain extent, as data labellers become better at spotting them.
- Human-related errors are inevitable
  - Labeller fatigue sets in after labellers have been working on a project for an extended period, leading to a drop in performance and accuracy.
  - Certain classes (like dents on a vehicle) have been noticeably more difficult to label due to ambiguity and subjectivity.
- Current methods of selecting QC SupaAgents are neither scalable nor backed by data
  - Relying on familiarity with the quality of work that certain groups of SupaAgents have been known to provide.
  - Running QC on a sample (or in some cases, all) of the work done by a SupaAgent to gauge their overall performance and eligibility to be a QC SupaAgent.
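To make the Ground Truth idea mentioned in the list concrete, here is a minimal sketch of how known-answer tasks might be mixed into a working set and how a labeller's accuracy on them could be measured. The function names, the dictionary layout and the example labels are hypothetical and chosen only for illustration; they do not describe Supahands' actual tooling.

```python
import random


def seed_ground_truth(tasks, ground_truth_tasks, rng=None):
    """Return a shuffled working set that mixes normal and Ground Truth tasks.

    Shuffling is an assumption made here so that known-answer tasks do not
    sit in predictable positions.
    """
    rng = rng or random.Random()
    working_set = list(tasks) + list(ground_truth_tasks)
    rng.shuffle(working_set)
    return working_set


def ground_truth_accuracy(submitted, expected):
    """Fraction of attempted Ground Truth tasks that were labelled correctly.

    submitted: dict of task_id -> label given by the labeller
    expected:  dict of task_id -> known correct label
    Returns None when the labeller attempted no Ground Truth tasks.
    """
    checked = [task_id for task_id in expected if task_id in submitted]
    if not checked:
        return None
    correct = sum(1 for task_id in checked if submitted[task_id] == expected[task_id])
    return correct / len(checked)


# Hypothetical example data
expected = {"gt1": "dent", "gt2": "no_dent"}
submitted = {"gt1": "dent", "gt2": "dent", "t1": "dent"}
accuracy = ground_truth_accuracy(submitted, expected)
```

A score like this only covers the known-answer tasks, which is one reason it cannot replace manual QC on its own.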
Taken together, the manual nature of both QC and QC SupaAgent selection means that our BizOps team struggles to cope with the growing volume of incoming projects that Supahands has been winning over the past year. Addressing the issue with an automated solution will therefore contribute to higher productivity and lower the risk of human error, even from our BizOps Managers.
How will we know if we have solved this problem?
- We reduce the time that our BizOps team spends on reliably identifying the best QC SupaAgent candidates, while leveraging combined data from our other QA/QC features such as Ground Truth tasks.
- We can rank all labellers on a project based on overall project performance, taking into account results from QC SupaAgent groups and accuracy on Ground Truth tasks.
- We identify a higher benchmark, or set of criteria, that distinguishes high-quality labellers (and possible candidates for QC SupaAgents) from normal labellers.
- We come up with a solution that eliminates the ability of labellers to game the system and get selected as QC SupaAgents. It must also be robust, reliable and holistic.
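As a rough illustration of the ranking goal, the sketch below combines a QC review score and a Ground Truth score into one number per labeller, ranks labellers by it, and applies a higher cut-off to shortlist QC SupaAgent candidates. The equal weighting, the 0.95 threshold, and all names are assumptions made for this example, not the method our data science team has settled on.

```python
from dataclasses import dataclass


@dataclass
class LabellerStats:
    labeller_id: str
    qc_accuracy: float  # share of QC-reviewed tasks judged correct, 0..1
    gt_accuracy: float  # share of Ground Truth tasks answered correctly, 0..1


def overall_score(stats, qc_weight=0.5):
    """Weighted average of the two signals (the weight is an assumption)."""
    return qc_weight * stats.qc_accuracy + (1 - qc_weight) * stats.gt_accuracy


def rank_labellers(all_stats, qc_weight=0.5):
    """Return labellers sorted from highest to lowest overall score."""
    return sorted(all_stats, key=lambda s: overall_score(s, qc_weight), reverse=True)


def qc_candidates(all_stats, threshold=0.95, qc_weight=0.5):
    """IDs of labellers whose overall score meets the (hypothetical) benchmark."""
    return [
        s.labeller_id
        for s in rank_labellers(all_stats, qc_weight)
        if overall_score(s, qc_weight) >= threshold
    ]


# Hypothetical example data
project_stats = [
    LabellerStats("agent_a", qc_accuracy=0.90, gt_accuracy=0.80),
    LabellerStats("agent_b", qc_accuracy=0.98, gt_accuracy=0.96),
    LabellerStats("agent_c", qc_accuracy=0.70, gt_accuracy=0.99),
]
ranking = [s.labeller_id for s in rank_labellers(project_stats)]
candidates = qc_candidates(project_stats)
```

Combining two signals makes the score harder to game than either one alone: in the example, `agent_c` does very well on Ground Truth tasks but still ranks last because of a weak QC score.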
The data science team will continue to test and experiment with different approaches to the challenges outlined above, and will share our findings in subsequent articles. In the meantime, if you would like to talk about how we keep finding new ways to improve the accuracy of the work that we deliver, connect with the team on our website.