Voice is one of the mainstream channels of human-computer interaction, and it has unique advantages. Even a short clip of speech carries more than the text the speaker wants to convey: it also reveals the speaker’s identity, the language being spoken, the speaker’s emotional state, the environment in which they are speaking, and more.


What is Voice Annotation?

Speech annotation is a common type of annotation in the data annotation industry. The annotator first “extracts” the text and the various sounds contained in a recording, and then transcribes or synthesizes them. Annotated voice data is mainly used to train machine learning models. In effect, it equips the computer system with “ears” so that it can “hear” and perform accurate speech recognition.

What are the voice annotation methods?

1. Voice cleaning

Voice cleaning is the process of re-examining and verifying the voice data. It is the first step in voice data preprocessing and an important part of ensuring that later results are correct.
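As a rough illustration of what such verification might look like, the sketch below flags clips that are too short, nearly silent, or heavily clipped. The specific checks, the function name `check_clip`, and all thresholds are assumptions chosen for the example, not a standard cleaning procedure. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math

def check_clip(samples, sample_rate, min_seconds=0.5, min_rms=0.01, max_clipped=0.01):
    """Return a list of problems found in one clip; an empty list means it passes.

    All thresholds are hypothetical defaults for illustration only.
    """
    if not samples:
        return ["empty"]
    problems = []
    # Duration check: very short clips rarely contain a usable utterance.
    if len(samples) / sample_rate < min_seconds:
        problems.append("too_short")
    # Loudness check: a very low RMS level suggests the clip is silent.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < min_rms:
        problems.append("silent")
    # Clipping check: many samples at full scale suggest a distorted recording.
    clipped = sum(1 for s in samples if abs(s) >= 0.999) / len(samples)
    if clipped > max_clipped:
        problems.append("clipped")
    return problems
```

Clips that return a non-empty list could then be sent back for manual review rather than discarded automatically.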

2. ASR voice transcription

ASR (automatic speech recognition) is a technology that converts human speech into text. Speech transcription is the process of transcribing speech data into text data, and it is a common form of annotation in the data annotation field.
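To make the output of transcription concrete, here is a minimal sketch of how one transcribed segment might be stored. The `TranscriptSegment` class, its field names, and the JSON-lines layout are hypothetical and only illustrate the idea; real annotation tools define their own formats.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TranscriptSegment:
    """One transcribed span of audio (hypothetical schema)."""
    audio_file: str
    start_sec: float
    end_sec: float
    text: str

    def validate(self):
        # Times must describe a non-empty span inside the recording.
        if self.start_sec < 0 or self.end_sec <= self.start_sec:
            raise ValueError("segment times must satisfy 0 <= start < end")
        # An empty transcript usually means the annotation is incomplete.
        if not self.text.strip():
            raise ValueError("transcript text must not be empty")

def to_json_line(segment):
    """Validate a segment and serialize it as one JSON line."""
    segment.validate()
    return json.dumps(asdict(segment), ensure_ascii=False)
```

For example, `to_json_line(TranscriptSegment("clip_001.wav", 0.0, 2.4, "turn on the lights"))` produces one line that can be appended to a training manifest.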

3. Emotional judgment

Emotional information in speech is an important behavioral signal that reflects how a person feels, and recognizing it is an important part of achieving natural human-computer interaction. For dialogue data, emotion judgment means deciding the emotional intent behind what the speaker in the audio says, for example whether they are asking a question, stating a need, making a complaint, or offering a suggestion.

4. Voice cutting

Voice cutting, also called speech segmentation, is the process of identifying boundaries between words, syllables, or phonemes in natural language. It is an important subproblem in the field of speech recognition.
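Finding word or phoneme boundaries generally needs a trained model, so the sketch below shows a much simpler substitute: splitting a recording into voiced spans wherever the short-term energy drops below a threshold. It is a crude energy-based pause detector, not a word-level segmenter. The function names, the 20 ms frame size, and the threshold are assumptions for illustration, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math

def frame_rms(samples, frame_len):
    """RMS energy of each non-overlapping frame (a trailing partial frame is dropped)."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def segment_by_energy(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame energy is at or above threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    energies = frame_rms(samples, frame_len)
    segments = []
    start = None  # index of the first frame of the span currently open, if any
    for idx, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = idx  # a voiced span begins
        elif energy < threshold and start is not None:
            # The span ends at the start of the first quiet frame.
            segments.append((start * frame_len / sample_rate,
                             idx * frame_len / sample_rate))
            start = None
    if start is not None:
        # The recording ended while a span was still open.
        segments.append((start * frame_len / sample_rate,
                         len(energies) * frame_len / sample_rate))
    return segments
```

The resulting time spans can serve as a first pass that an annotator then adjusts by hand.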

5. Voiceprint recognition

Voiceprint recognition is a biometric technology that identifies an unknown voice by analyzing the characteristics of one or more speech signals. Simply put, it determines whether a given utterance was spoken by a particular person.
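One common way to make this decision is to compare fixed-length feature vectors (often called speaker embeddings) with cosine similarity. The sketch below assumes the vectors have already been produced by some speaker model that is not shown here. The function `same_speaker` and the threshold of 0.7 are hypothetical and would need tuning on real data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must not be all zeros")
    return dot / (norm_a * norm_b)

def same_speaker(enrolled, probe, threshold=0.7):
    """Accept the probe if it is close enough to the enrolled voiceprint.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return cosine_similarity(enrolled, probe) >= threshold
```

A higher threshold rejects more impostors but also rejects more genuine speakers, so the value is normally chosen from labeled verification pairs.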

6. Phoneme labeling

A phoneme is the smallest unit of speech, divided according to the natural properties of speech sounds. Phonemes are identified by analyzing the articulatory actions within a syllable: each action constitutes one phoneme.

7. Prosody labeling

Prosodic annotation in speech synthesis systems generally predicts prosody from text information. Taking Chinese as an example, the prosody prediction is usually based on information such as initials, finals, words, phrases, and paragraphs.
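For readers unfamiliar with initials and finals, the sketch below splits a toneless pinyin syllable into its initial (the leading consonant) and final (the rest). It only illustrates the kind of text feature mentioned above and is not a prosody predictor. The function name `split_syllable` is hypothetical, and treating "y" and "w" as initials is a simplifying assumption that some systems make and others do not.

```python
# Pinyin initials; two-letter initials come first so they match before "z", "c", "s".
# Including "y" and "w" here is an assumption made for simplicity.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final).

    Syllables with no initial, such as "an", return an empty initial.
    """
    for initial in INITIALS:
        # Require something after the initial so the final is never empty.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable
```

For instance, `split_syllable("zhong")` separates the initial "zh" from the final "ong".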

8. Pronunciation proofreading

Pronunciation proofreading is the process of collecting data throughout spoken-language training and correcting non-standard pronunciation.
