Data labeling quality has long been a major concern in the AI/ML community. Perhaps the most common principle you’ll come across when tackling this problem is “garbage in, garbage out”.
This principle emphasizes a fundamental rule of training data for AI and ML development projects: low-quality training datasets fed to AI/ML models lead to a large number of operational errors.
For example, training data for self-driving vehicles is the determining factor in whether the vehicle can operate on the road. Given low-quality training data, AI models can easily mistake humans for objects or vice versa. Either way, a poor training dataset can lead to a high risk of accidents, which is the last thing an autonomous vehicle manufacturer wants in its programs.
To obtain high-quality training data, we need to build data annotation quality assurance into the data processing pipeline.
Clarify the customer’s requirements for data labeling quality control
Multi-level QA process:
- Self-check
- Cross-check
- Manager’s review
- Involvement of QA personnel
Clarify the customer’s requirements for data labeling quality control
High data annotation quality doesn’t simply mean the most carefully annotated data or the highest-quality training data. For a data labeling project to be planned strategically, the requirements for the training dataset must be clarified first. The question the labeling team leader must answer is how high-quality the data actually needs to be.
As a data labeling service provider, one of the first things we always ask our clients for is their requirements: “How thoroughly do you want us to process the dataset?” and “How accurate do you want our annotations to be?” The answers to these questions set the baseline for the entire project going forward.
How to ensure the quality of data annotation
Keep in mind that AI and machine learning applications are very broad. In addition to well-known applications in self-driving cars and transportation, AI and ML are making their debut in healthcare, agriculture, fashion, and more. Every industry has hundreds of different projects, each working on different types of objects and therefore with different quality requirements.
Take a simple example: road labeling versus medical data labeling. Road labeling is simple enough that any annotator with common sense can do the job. In such a project, the dataset to be annotated may run to millions of videos or images, and annotators must maintain productivity at an acceptable quality level.
Medical data, on the other hand, requires annotators with specific domain knowledge who work in the medical field. In diabetic retinopathy projects, for example, trained physicians are asked to grade the severity of the disease from retinal photographs so that deep learning can be applied to this specific field.
Data Labeling Quality – Medical Use
Even among well-trained physicians, annotations do not always agree with one another. To achieve consistent results, an annotation team may have to label each file multiple times and then measure the agreement between annotators to reach a consensus.
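To make “agreement” concrete, here is a minimal Python sketch, independent of any particular labeling tool, of two common checks: chance-corrected agreement between two graders (Cohen’s kappa) and a majority-vote consensus over repeated annotations. The grade values and function names are illustrative only.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same files."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def majority_vote(grades_per_file):
    """Consensus label per file from repeated annotations; None if tied."""
    consensus = []
    for grades in grades_per_file:
        (label, count), *rest = Counter(grades).most_common(2)
        consensus.append(None if rest and rest[0][1] == count else label)
    return consensus

# Example: diabetic retinopathy severity grades (0-4) from two hypothetical graders.
grader_1 = [0, 2, 2, 4, 1, 3]
grader_2 = [0, 2, 3, 4, 1, 3]
print(cohen_kappa(grader_1, grader_2))                 # ~0.79
print(majority_vote([[2, 2, 3], [4, 4, 4], [0, 1]]))   # [2, 4, None]
```

Files that end up with no consensus (ties or low agreement) are the ones a team would typically send back for adjudication.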
How far the quality bar is set depends on the complexity of the given data and how detailed the client wants the output to be. Once these points are clarified, the team lead can allocate resources for the desired outcome, and the metrics and associated quality assurance process are defined from there.
Clients are also asked to provide a set of perfectly labeled examples as a “baseline” for each dataset to be labeled. This is the most straightforward data annotation quality assurance technique you can employ: with these examples, your annotators are trained on and measured against a clear reference for their work.
With the benchmark as the ideal result, you can compute consistency metrics to evaluate each annotator’s accuracy and performance. If there is uncertainty during annotation or review, QA personnel can use these sample datasets to define what is acceptable and what is not.
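As a rough illustration (the file names, labels, and field names below are hypothetical), a per-annotator accuracy check against such a client-provided benchmark could look like this:

```python
# Hypothetical gold-standard labels provided by the client for a subset of files.
GOLD = {"img_001": "pedestrian", "img_002": "car", "img_003": "cyclist"}

# Labels submitted by the annotation team (field names are made up for this sketch).
submissions = [
    {"annotator": "anna", "file": "img_001", "label": "pedestrian"},
    {"annotator": "anna", "file": "img_002", "label": "car"},
    {"annotator": "ben",  "file": "img_001", "label": "car"},
    {"annotator": "ben",  "file": "img_003", "label": "cyclist"},
]

def accuracy_per_annotator(submissions, gold):
    """Fraction of benchmark files each annotator labeled correctly."""
    correct, total = {}, {}
    for s in submissions:
        if s["file"] not in gold:
            continue  # only score files that have a gold-standard label
        total[s["annotator"]] = total.get(s["annotator"], 0) + 1
        if s["label"] == gold[s["file"]]:
            correct[s["annotator"]] = correct.get(s["annotator"], 0) + 1
    return {a: correct.get(a, 0) / total[a] for a in total}

print(accuracy_per_annotator(submissions, GOLD))  # {'anna': 1.0, 'ben': 0.5}
```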
Multi-level QA process
The QA process in a data labeling project varies from company to company. At Lotus QA, we adhere to internationally standardized quality assurance processes. Pre-agreed preferences are always made clear at the beginning of the project and compiled into a “baseline” that later serves as the “gold standard” for every label and annotation.
The steps in this multi-tiered quality assurance process are:
1. Self-check
In this step, annotators are asked to review their own work. The self-check gives them time to go back over the annotation tools, annotations, and labels from the beginning of the project.
Annotators often work under heavy time and workload pressure, which can introduce potential bias into their work. Starting quality assurance with a self-check step lets annotators slow down and take a thorough look at how they are working. By acknowledging their errors and possible biases, they can fix them themselves and avoid them in the future.
2. Cross-check
In data science in general, and data annotation in particular, you may have heard the term “bias”. Labeling bias refers to the situation where annotators develop their own habits for labeling data, which can skew the data they deliver. In some cases, annotator bias can hurt model performance. For more robust AI and ML models, we must take effective measures to remove biased annotations, and one simple way is to cross-check.
Data Labeling Quality – Cross Check
Cross-checking brings a different perspective to the overall work: labelers review their colleagues’ output and can identify mistakes and errors the original annotator missed. With this fresh view, reviewers can point out potentially biased annotations, and team leaders can take further action, sending the work back for rework or running another round of evaluation to confirm whether the annotations are actually biased.
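A minimal sketch of how such cross-check disagreements might be collected for the team lead, assuming a made-up in-memory structure for annotations and peer reviews:

```python
annotations = {  # file -> label from the original annotator (hypothetical data)
    "frame_10": {"annotator": "anna", "label": "pedestrian"},
    "frame_11": {"annotator": "anna", "label": "car"},
    "frame_12": {"annotator": "ben",  "label": "cyclist"},
}

reviews = {  # file -> label assigned by the cross-checking colleague
    "frame_10": {"reviewer": "ben",  "label": "pedestrian"},
    "frame_11": {"reviewer": "ben",  "label": "truck"},
    "frame_12": {"reviewer": "anna", "label": "cyclist"},
}

def flag_disagreements(annotations, reviews):
    """Collect files where annotator and reviewer disagree, for team-lead action."""
    flagged = []
    for file, ann in annotations.items():
        rev = reviews.get(file)
        if rev and rev["label"] != ann["label"]:
            flagged.append({"file": file,
                            "annotator": ann["annotator"],
                            "reviewer": rev["reviewer"],
                            "labels": (ann["label"], rev["label"])})
    return flagged

for item in flag_disagreements(annotations, reviews):
    print(item)  # frame_11: anna said 'car', ben said 'truck' -> rework or re-evaluate
```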
3. Manager’s review
Labeling project managers are usually responsible for the day-to-day supervision of labeling projects. Their primary tasks include selecting and managing the workforce and ensuring data quality and consistency.
Managers receive the data samples from clients, work out the required metrics, and train the annotators. Once the cross-check is complete, the manager randomly checks the outputs to see whether they meet the client’s requirements.
Before any of these checks, the labeling project manager must also establish a “baseline” for quality assurance. To ensure consistency and accuracy, any work that falls below the predetermined quality level must be reworked.
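For illustration, here is a simplified sketch of such a random spot-check; the 5% sample rate and 95% baseline are example values, not a fixed standard:

```python
import random

SAMPLE_RATE = 0.05        # review ~5% of the completed batch (illustrative)
QUALITY_BASELINE = 0.95   # agreed minimum accuracy on the sampled files (illustrative)

def spot_check(completed_files, is_correct, sample_rate=SAMPLE_RATE):
    """Randomly sample finished files and return the observed accuracy."""
    k = max(1, int(len(completed_files) * sample_rate))
    sample = random.sample(completed_files, k)
    correct = sum(1 for f in sample if is_correct(f))
    return correct / k, sample

# `is_correct` would wrap the manager's judgement or a comparison with the baseline examples.
accuracy, reviewed = spot_check(list(range(1000)), is_correct=lambda f: f % 25 != 0)
if accuracy < QUALITY_BASELINE:
    print(f"Observed accuracy {accuracy:.2%} is below the baseline -> send batch back for rework")
else:
    print(f"Observed accuracy {accuracy:.2%} meets the baseline")
```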
4. Involvement of quality assurance personnel
Data annotation quality control cannot rely solely on the annotation team; the involvement of professional, experienced quality assurance personnel is a must. To ensure the highest quality of labeling work, the QA team operates as a separate department, outside the labeling team and not under the labeling project manager.
The ideal ratio of QA personnel to the total data labeling workforce should not exceed 10%. QA cannot and will not review every piece of annotated data in a project; instead, they randomly sample the dataset and review the annotations again.
Data Labeling Quality – Quality Assurance
These QA staff are well trained on the data samples and have their own metrics for evaluating the quality of the labeled data. These metrics must be agreed in advance between the QA team lead and the labeling project manager.
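As a purely hypothetical example, such an agreement can be captured as a small shared configuration that both sides sign off on; the metric names and thresholds below are illustrative, not a standard:

```python
# Hypothetical QA metrics agreed in advance between the QA team lead
# and the labeling project manager; names and thresholds are illustrative only.
AGREED_QA_METRICS = {
    "sample_rate": 0.05,             # share of delivered files QA re-reviews at random
    "min_label_accuracy": 0.97,      # vs. the client-provided gold examples
    "min_inter_annotator_kappa": 0.80,
    "max_rework_rate": 0.03,         # share of files sent back after QA review
}

def batch_passes(observed):
    """Compare observed batch metrics against the agreed thresholds."""
    return (observed["label_accuracy"] >= AGREED_QA_METRICS["min_label_accuracy"]
            and observed["inter_annotator_kappa"] >= AGREED_QA_METRICS["min_inter_annotator_kappa"]
            and observed["rework_rate"] <= AGREED_QA_METRICS["max_rework_rate"])

print(batch_passes({"label_accuracy": 0.98,
                    "inter_annotator_kappa": 0.85,
                    "rework_rate": 0.02}))  # True
```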
In addition to the three-step review of self-check, cross-check, and manager review, involving QA personnel in your labeling project ensures that your data output meets the pre-defined benchmarks, ultimately giving you the highest-quality training data.