Ground Truth: One of the Answers to Scaling with Quality

The Importance of Quality Data

The relationship between data quality and Artificial Intelligence is symbiotic. As the golden rule in AI goes, “garbage in, garbage out”.

So, the quality of data has a direct impact on the effectiveness of Machine Learning models. In other words, a large quantity of high-quality data is a prerequisite to meaningful and actionable insights in real-world conditions.

Quality at scale = High volume of work + Quality

So how do we produce quality work at scale?

First, some context about what we do.

Supahands distributes microtasks to a pool of remote workers — also known as our SupaAgents — across Southeast Asia. Completed tasks are sent back to our clients, who use them to train their machine learning algorithms. So quality needs to be present at every step of the SupaAgent’s journey.

For example, even before they can qualify and work on projects, they need to pass assessments and tutorials crafted uniquely for each project’s requirements. This process is key to our ability to deliver quality work, which we will talk about in a future article.

Ensuring quality at scale

Quantity is just half the problem when it comes to preparing data on a large scale. Guaranteeing that a high volume of data is equally high in quality is a massive challenge too.

How might we ensure a high volume of work that consistently meets — or even, exceeds — industry-standard quality?

Traditionally, a lot of quality control is done by manually checking through the completed work. At Supahands, our Business Operations team (BizOps) handles this, often with painstaking resolve.

As the business grew over the course of 2019, manual quality control became highly inefficient. Because we couldn’t track or measure a task automatically, we couldn’t tell with certainty whether a SupaAgent had done it correctly.


What was the bottleneck?

Our existing methods simply couldn’t monitor the quality of work at the rate it was being produced.

For example, hitting an accuracy rate of 95% would be easy to verify if we only had 100 tasks to work with. But at a growing volume of work — say, a million tasks — that same target allows up to 50,000 mistakes, far too many to find by hand.
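The arithmetic behind that error budget is simple enough to sketch. This is a minimal illustration, not anything from our codebase; the function name and the use of a fixed 95% target are just for demonstration:

```python
def error_budget(total_tasks: int, accuracy_target: float) -> int:
    """Maximum number of mistakes allowed while still hitting the target."""
    return round(total_tasks * (1 - accuracy_target))

print(error_budget(100, 0.95))        # 5 mistakes out of 100 tasks
print(error_budget(1_000_000, 0.95))  # 50,000 mistakes out of a million
```

The allowed error count scales linearly with volume, but the effort to *find* those errors by manual inspection scales with the full task count — which is exactly the bottleneck described above.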

On top of that, how would we know if a task is correct without inspecting each and every single one of them?

It’s unrealistic to expect BizOps’ output to match SupaAgent output as we scale

What is Ground Truth?

We started with this problem statement:
How might we measure the quality of our workers’ performance and quality of a task without having to manually inspect every single task?

Well, Ground Truth (GT) works by delivering tasks with known answers to our SupaAgents while they’re working on a project. Think of it as a golden question, or a control group in a science experiment.

How we use Ground Truth to solve our quality problems

If a SupaAgent makes a mistake during their work session, GT allows us to identify it without manually inspecting every part of their work.

How so?

When a SupaAgent begins a work session, a set of tasks with known answers (GT tasks) is assigned to her, mixed in with the work that she’s meant to do for the client.

We can then infer, based on her performance on the GT tasks, what the overall quality of the rest of her working set is like.
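The mechanism above can be sketched in a few lines of Python. This is a hypothetical simplification, assuming tasks are identified by IDs and answers are plain labels; the function names (`build_work_set`, `gt_accuracy`) are illustrative, not Supahands’ actual API:

```python
import random

def build_work_set(client_task_ids, gt_task_ids, seed=42):
    """Shuffle GT tasks into the client work so they are indistinguishable
    from regular tasks from the SupaAgent's point of view."""
    work_set = list(client_task_ids) + list(gt_task_ids)
    random.Random(seed).shuffle(work_set)
    return work_set

def gt_accuracy(submitted, gt_answers):
    """Infer session quality from the GT tasks alone.
    submitted / gt_answers: dicts mapping task_id -> label."""
    hits = [submitted.get(tid) == label for tid, label in gt_answers.items()]
    return sum(hits) / len(hits)
```

Because the GT tasks are scored automatically, the accuracy on that subset serves as a statistical estimate of accuracy on the whole working set — no manual inspection required.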

If a SupaAgent’s performance falls below a certain accuracy threshold, we can interrupt her before she proceeds to do more. Then, we redirect her to our custom-built tutorial platform for retraining. The tasks that the SupaAgent completed during the session will also be flagged for inspection by our BizOps Manager.

We keep track of our SupaAgents’ performance over the lifetime of a project. So if a SupaAgent consistently makes mistakes, she will be removed from the project.
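That decision flow can be summarized in a short sketch. The threshold and strike count below are made-up illustrative numbers, not Supahands’ real settings:

```python
RETRAIN_THRESHOLD = 0.8   # illustrative accuracy cut-off
REMOVAL_STRIKES = 3       # illustrative limit on repeated failures

def session_action(gt_accuracy: float, strikes: int):
    """Decide what happens after a work session based on GT accuracy.
    Returns (action, updated_strike_count)."""
    if gt_accuracy >= RETRAIN_THRESHOLD:
        return "continue", strikes
    strikes += 1
    if strikes >= REMOVAL_STRIKES:
        return "remove_from_project", strikes
    return "redirect_to_tutorial", strikes
```

The key design point is that the strike count persists across sessions: a single bad session triggers retraining, while consistent mistakes over the lifetime of a project lead to removal.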

Future of Ground Truth at Supahands

We plan to make GT smarter. One improvement we are really looking forward to is personalizing the interval at which GT tasks are injected for each SupaAgent.

This can be done by building a model that predicts the performance of our SupaAgents from their performance metrics, e.g.: What is their most productive day, and at what time do they perform best? How long can they work before they start making mistakes? Is there any correlation between their speed and quality?

Having this level of personalization will help us predict errors more efficiently and prevent mistakes from going unchecked. Together, this could improve the quality of work our SupaAgents deliver and make their work time more efficient.

GT is now an important mechanism in our workflow at Supahands to guarantee Quality at Scale.


This post originally appeared on Supahands Tech Blog.
