Companies developing self-driving cars aim to collect large amounts of data that can serve as a useful reference for autonomous driving. Building such a data set has two main parts: 1. collecting data for tasks such as vehicle detection, lane changes, road traffic sign recognition, and pedestrian and bicycle detection; 2. annotating the data through visual recognition.
Challenges of autonomous driving data labeling:
The biggest challenge in autonomous driving data labeling is how to label a scene: finding the key feature points in every scene and labeling all of those feature points at the same time. To make this easier, several techniques can be used to collect and label data across different scenarios.
For example, a deep learning algorithm can be used to train a deep neural network model (such as a convolutional neural network), which is then used to visually classify the data set; a further deep convolutional network can be added for image recognition (for example, autonomous driving image classification), and so on. Throughout this process, various deep learning algorithms can be used to train the model in order to improve recognition performance.
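As a minimal sketch of using a trained model to classify a data set, the Python below assumes the classifier is available as a callable that returns a (label, confidence) pair. The `prelabel` function, the `threshold` parameter, and the stand-in `fake_classifier` are hypothetical names used only for illustration; the confidence-based split into accepted and review batches is also an assumption, not something the workflow above prescribes.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical classifier interface: image id -> (predicted label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def prelabel(image_ids: Iterable[str], classify: Classifier,
             threshold: float = 0.9) -> Dict[str, List[Tuple[str, str, float]]]:
    """Split images into auto-accepted labels and ones that need human review."""
    result: Dict[str, List[Tuple[str, str, float]]] = {"accepted": [], "needs_review": []}
    for image_id in image_ids:
        label, confidence = classify(image_id)
        # Assumption: predictions below the threshold go to a human annotator.
        bucket = "accepted" if confidence >= threshold else "needs_review"
        result[bucket].append((image_id, label, confidence))
    return result


def fake_classifier(image_id: str) -> Tuple[str, float]:
    """Stand-in for a trained CNN; returns fixed, made-up predictions."""
    canned = {
        "frame_001.jpg": ("pedestrian", 0.97),
        "frame_002.jpg": ("traffic_sign", 0.55),
    }
    return canned.get(image_id, ("unknown", 0.0))


batches = prelabel(["frame_001.jpg", "frame_002.jpg"], fake_classifier)
```

In a real pipeline the stand-in would be replaced by the trained network's inference call, and the threshold would be tuned against manually checked samples.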
Autonomous driving data labeling process:
To label a data set, key feature points are first extracted from the raw data, these feature points are then marked, and finally a labeled text box containing the image and video content is generated. The workflow is as follows: 1. obtain the original pictures and videos; 2. preprocess the files (original pictures and vector images); 3. preprocess and post-process the text boxes.
The collected data is first organized and classified into data sets. The next step is the data labeling process, in which each scene is labeled. After data collection is complete, the scenes are tested by identifying and marking the key feature points contained in each scene and, at the same time, marking the important parts (video) of each scene. Only in this way can the scene be used reliably for autonomous driving tests.
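The step from marked feature points to a labeled box can be sketched as follows. This is only an illustration under stated assumptions: the `Annotation` record, its field names, and the choice of an axis-aligned box that encloses the key points are hypothetical and not taken from any particular labeling tool.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


@dataclass
class Annotation:
    """Hypothetical annotation record for one labeled object."""
    source_file: str
    label: str
    key_points: List[Point]
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_from_points(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return the smallest axis-aligned box that contains all key points."""
    if not points:
        raise ValueError("at least one key point is required")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def annotate(source_file: str, label: str, key_points: List[Point]) -> Annotation:
    """Build an annotation whose box is derived from the marked key points."""
    return Annotation(source_file, label, key_points, box_from_points(key_points))


# Made-up example values for illustration.
record = annotate("frame_001.jpg", "pedestrian",
                  [(120.0, 80.0), (150.0, 200.0), (135.0, 140.0)])
record_json = json.dumps(asdict(record))  # serialized form for storage or export
```

Deriving the box from the points keeps the two in agreement; a tool that lets annotators draw boxes directly would store the box as entered instead.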
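Organizing the collected data by scene and checking that each scene has its key feature points marked might look like the sketch below. The record layout (`scene`, `file`, `key_points` keys) and both function names are assumptions made for this example, and the sample records are made up.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_scene(records: List[dict]) -> Dict[str, List[dict]]:
    """Group collected records by the scene they belong to."""
    scenes: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        scenes[record["scene"]].append(record)
    return dict(scenes)


def incomplete_scenes(scenes: Dict[str, List[dict]]) -> List[str]:
    """Return the names of scenes that contain a record with no key points."""
    return sorted(
        name for name, items in scenes.items()
        if any(not item["key_points"] for item in items)
    )


# Made-up sample records for illustration.
records = [
    {"scene": "intersection", "file": "a.jpg", "key_points": [(10, 20)]},
    {"scene": "intersection", "file": "b.jpg", "key_points": [(30, 40)]},
    {"scene": "lane_change", "file": "c.jpg", "key_points": []},
]
scenes = group_by_scene(records)
todo = incomplete_scenes(scenes)
```

Scenes listed in `todo` would go back to the annotators before being used in a test.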
Current status and development trend of the data labeling industry:
The data labeling industry is a typical labor-intensive industry. People working in it often have no professional technical background, and the labeling work is generally completed by hand.
Practitioners in the data labeling industry fall into three main categories. The first is engineers with a technical background who have accumulated extensive hands-on experience in technical work such as data labeling. The second is people who have some understanding of algorithms but are not proficient with automation equipment and automated testing. The third is people with a technical background and some understanding of data sets or algorithms.