The artificial intelligence industry is growing rapidly and, as with any new technology, legislation isn’t always able to keep up. This is one of the reasons why the industry is still relatively closed, with AI companies typically keeping their data proprietary in order to maintain their competitive advantage.
One major part of data security that’s especially relevant for AI companies is data privacy. This has always been a controversial topic in the wider tech industry, but threats to data privacy have increased since the COVID-19 pandemic. And with more data privacy regulations being put in place, it makes sense that data privacy cases will continue to rise.
This has led to a misconception that working with third party data labelers is less secure. When it comes to data labeling, many companies think that having an in-house data labeling team or going down the traditional business process outsourcing road are safer options. But this isn’t always the case.
There is no guarantee that your data is 100% safe. The only way to truly maintain security is by continuously assessing risks and being prepared to remediate them.
Preventing theft of data
Whether you have an in-house data labeling team or outsource to a third party partner, there will be data security risks. Even if you control the network within your office, someone who wanted to give away company secrets would find a way to do so.
They could use their mobile phone to take a photo of the screen or even verbally convey information to a competitor. And if a company were to set up too many controls against these sorts of occurrences, e.g. banning mobile phones in the office, the working environment would suffer and team morale could be affected. It can be difficult to find a balance.
Considering all these factors, working with a trustworthy third party partner might actually be a better option, especially if they already have security practices set up in order to deal with these risks.
To assess your potential third party partners, it is vital to look into the relevant areas and ask the right questions. These are some of the areas to look into:
How good is their organizational security?
Before selecting a third party partner, make sure you know the ins and outs of how they manage security within their organization. Ask about their tech stack and find out as many details as possible.
If they’re using cloud infrastructure, make sure they’re using a well-known provider with universally recognized and accepted security measures. Make sure that all their files are encrypted.
Also, ask if they’ve experienced any distributed denial-of-service (DDoS) attacks or zero-day exploits. Since their software is supposed to be closed to the public, its exposure to both should be limited, and ideally there should be none to report.
Do they limit the context of the data?
At Supahands, the standard practice is to limit the distribution of data. SupaAgents are only able to see what is required in order for the labeling work to be conducted.
This means that they only have one-time access to each batch of data. They are not privy to the context of the data, which means they don’t know how the labeled data is going to be used or even who the client might be.
The data labeling process is also set up in such a way that external individuals would not know what it is or what it’s being used for.
Any third party data labeling partner should have these practices in place before they can be considered reliable in terms of data security.
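To make the idea of limiting context more concrete, here is a minimal Python sketch of how a task could be stripped down before a labeler sees it. This is not a description of how any particular vendor implements it. The field names, the sample task and the to_labeler_view helper are all hypothetical, chosen purely for illustration.

```python
import uuid

# Hypothetical field names: the only fields a labeler needs to do the work.
ALLOWED_FIELDS = {"image_url", "label_options", "instructions"}


def to_labeler_view(task: dict) -> dict:
    """Return only the fields a labeler needs, under an opaque task ID."""
    view = {key: value for key, value in task.items() if key in ALLOWED_FIELDS}
    # A random ID replaces any internal ID that might hint at the client or project.
    view["task_id"] = uuid.uuid4().hex
    return view


# Hypothetical task record as it might exist on the client-facing side.
task = {
    "internal_id": "acme-retail-0042",
    "client_name": "Acme Retail",
    "project_purpose": "shelf-stock detection model",
    "image_url": "https://example.com/images/0042.jpg",
    "label_options": ["in stock", "out of stock"],
    "instructions": "Choose the option that matches the shelf.",
}

labeler_view = to_labeler_view(task)
print(sorted(labeler_view))
```

The client name, the project purpose and the internal ID never reach the labeler. In a real system, the mapping between the random task ID and the internal ID would have to be stored somewhere the labelers cannot see.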
Where is the data hosted?
Ask your third party provider where they host the data that you will provide to them. You should be able to communicate your preferences in terms of where you want the data to be kept.
You can request that your data be hosted in selected AWS regions, or even require data labelers to use a VPN when conducting the labeling work.
Alternatively, you could choose to host your own data and just give the vendor access to call it when needed.
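If you do host your own data, one common way to give a vendor limited access is a signed link that expires. The Python sketch below illustrates the idea using only the standard library. The secret key, the file path and the function names are assumptions made up for this example, and many storage services offer a built-in equivalent that would be preferable in practice.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption: a secret known only to your own data host, never shared with the vendor.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_path(path: str, expires_at: int) -> str:
    """Compute an HMAC-SHA256 signature over the path and its expiry time."""
    message = f"{path}:{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_link(path: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Build a link to hand to the vendor that stops working after ttl_seconds."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    return f"{path}?expires={expires_at}&sig={sign_path(path, expires_at)}"


def is_valid(path: str, expires_at: int, signature: str, now: Optional[int] = None) -> bool:
    """Check on your host that a requested link is unexpired and untampered."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    expected = sign_path(path, expires_at)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(signature.encode(), expected.encode())


# Hypothetical path to one file in a labeling batch, valid for five minutes.
link = make_link("/batches/7/image_001.jpg", ttl_seconds=300)
print(link)
```

Because the signature covers both the path and the expiry time, the vendor cannot extend the deadline or swap in a different file without the link being rejected.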
Are you able to log your data?
Whether you’re conducting your data labeling in-house or with a third party partner, you must have access to your data logs in order to stay on top of data security.
The best option would be to host your own data so that you have on-demand access to the data logs. However, if this is not something you want to opt for, make sure you request logs from your third party partner frequently.
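Having the logs is only useful if you review them. As a rough illustration, the Python sketch below scans an access log for labelers who opened the same batch more than once, which would be worth a question if your partner promises one-time access per batch. The log format, the IDs and the find_repeat_access helper are hypothetical, and a real log will look different.

```python
from collections import Counter

# Hypothetical log entries: (labeler ID, batch ID, ISO timestamp).
access_log = [
    ("agent_1", "batch_a", "2021-03-01T09:00:00"),
    ("agent_2", "batch_b", "2021-03-01T09:05:00"),
    ("agent_1", "batch_a", "2021-03-02T22:30:00"),
    ("agent_3", "batch_c", "2021-03-01T10:00:00"),
]


def find_repeat_access(log):
    """Return the (labeler, batch) pairs that appear more than once in the log."""
    counts = Counter((labeler, batch) for labeler, batch, _timestamp in log)
    return sorted(pair for pair, count in counts.items() if count > 1)


flagged = find_repeat_access(access_log)
print(flagged)
```

In this sample, agent_1 opened batch_a twice, so that pair is the one flagged for follow-up.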
The minimum standard for security
If you decide to work with a third party partner for your data labeling processes, here’s a quick checklist of the minimum standard that your partner should have:
- Tech stack that meets organizational security standards
- Standard practice of limiting context for data labelers
- Options for you to choose where to host your data
- Ability to call data from your host
- Compliance with your data log requests