SpeechOcean Transcription Guidelines
This document contains the transcription conventions for the product corpora owned by Beijing Haitian Ruisheng Science Technology Ltd (SpeechOcean, www.speechocean.com). The transcription is intended to be an orthographic, lexical transcription, with a few additional details that represent the audible acoustic events (speech and non-speech) present in the corresponding waveform files, adequate for training and testing automatic speech recognition systems.
A. Transcription Procedure
The following recommendations are made for the transcription procedure:
Transcribers should use headsets and work in a quiet environment.
Compose a fixed set of training sentences for the transcribers. Submit this same set to the validation centre, to be used for instructing the transcription validator.
Display the waveform of the utterance as well, so that non-speech acoustic events can be located more easily.
Use a set of buttons for the non-speech acoustic event tags. Typing the tags out in full introduces unnecessary errors.
All transcriptions must pass the QA process.
B. General Transcription Conventions
1. Transcriptions should reflect what the user really says.
This is not necessarily the formal version of the word, and it may be ungrammatical or differ from the given prompt text.
E.g. I want to go to the mall vs. I wanna go to the mall
If the speaker said “wanna”, the transcription is “wanna”, not “want to”.
If a speaker utters the plural form “bonds” in the sample sentence below, transcribe it exactly as “bonds”: find a bonds with a ten year maturity date
2. Words should be capitalized if they are usually capitalized.
Proper nouns (e.g. names, addresses, countries, organizations, months, etc.) begin with a capital letter, such as China and Microsoft. The first letter of a sentence does not need to be capitalized.
Brand names and trademarks are transcribed in their original form, including their capitalization (e.g. MySpace, Hotmail dot com).
3. Number Sequences
Number sequences (flight numbers, times, dates, aircraft types, money amounts, etc.) are spelled out to reflect what was said (“flight six one three”; “seven thirty”; “August twenty first”; “seven forty seven”; “four hundred and ten dollars”). If digits have alternative dictionary forms (e.g. “zero”, “oh” or “naught” in English), use the alternative that reflects the form actually pronounced. Long numbers may be written together, or with blanks between their parts, to reduce the lexicon size.
Arabic numerals are not allowed: do not type 1234567890; write out “one two three…” instead.
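Because Arabic numerals are never allowed, a transcription line can be screened for them mechanically. The sketch below is a minimal example assuming Python 3.9+; `find_digits` and `spell_digits` are hypothetical helper names, not part of any SpeechOcean tool. `spell_digits` only produces a digit-by-digit reading with a selectable word for zero; whether that reading matches the audio (“six one three” vs. “six thirteen”) must still be checked by ear.

```python
import re

# Hypothetical helpers for the number rules; not part of any SpeechOcean tool.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def find_digits(line: str) -> list[str]:
    """Return every run of numerals in a transcription line (should be empty)."""
    return re.findall(r"\d+", line)


def spell_digits(digits: str, zero_word: str = "zero") -> str:
    """Spell a digit string one digit at a time, e.g. a flight number.

    zero_word lets the caller match what was actually pronounced:
    "zero", "oh" or "naught".
    """
    words = []
    for ch in digits:
        if ch not in "0123456789":
            raise ValueError(f"not an ASCII digit: {ch!r}")
        words.append(zero_word if ch == "0" else DIGIT_WORDS[int(ch)])
    return " ".join(words)


print(find_digits("flight 613 at seven thirty"))  # ['613']
print(spell_digits("613"))                        # six one three
print(spell_digits("207", zero_word="oh"))        # two oh seven
```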
4. Letter Sequences
Letter sequences occur in spelled words, ZIP codes, acronyms and abbreviations (“D F W”; “A P slash eighty”; “P M”; “C O”; “I B M”, etc.). Letters should be in upper case, separated by a space.
The AM and PM of times (e.g., “five thirty P M”) will be treated as examples of letter sequences, i.e., upper case and separated by a space.
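The letter-sequence format is mechanical enough to sketch in code. The helper below, `format_letter_sequence` (a hypothetical name, not part of any SpeechOcean tool), takes the letters a speaker spelled out and returns them in the required form. It assumes the input holds only the spelled letters, so mixed items such as “A P slash eighty” still need their word parts written by hand, and acronyms spoken as words (NASA) do not go through it at all.

```python
def format_letter_sequence(spelled: str) -> str:
    """Return the spelled letters in upper case, separated by single spaces.

    Any character that is not a letter (spaces, dots) is dropped, so "p.m."
    and "p m" both become "P M".
    """
    letters = [ch.upper() for ch in spelled if ch.isalpha()]
    return " ".join(letters)


print(format_letter_sequence("dfw"))                    # D F W
print("five thirty " + format_letter_sequence("p.m."))  # five thirty P M
```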
Acronyms are terms formed from the initial letters of their elements and are spoken as words. They should be transcribed as words in upper case, without spaces between the letters.
“I work for NASA.”
“AIDS has a great impact on society.”
Do not introduce abbreviations in the transcription. Always use the spelled-out form (full word) when it is pronounced as such.
“This is Dr. Smith.” = “this is doctor Smith.”
“Then they drove to St. Paul.” = “then they drove to Saint Paul.”
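Abbreviations that slip into a transcription can be flagged automatically. This is a minimal sketch: the list in `ABBREVIATIONS` and the name `find_abbreviations` are illustrative assumptions, and the checker only reports matches, because the spelled-out form (“Saint” vs. “Street” for “St.”) depends on what the speaker actually said.

```python
import re

# Hypothetical, deliberately short list; extend it for the language being transcribed.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.", "St.")


def find_abbreviations(line: str) -> list[str]:
    """Return the listed abbreviations that appear in a transcription line."""
    found = []
    for abbr in ABBREVIATIONS:
        # \b keeps "Dr." from matching inside a longer token such as "NDr."
        if re.search(r"\b" + re.escape(abbr), line):
            found.append(abbr)
    return found


print(find_abbreviations("This is Dr. Smith."))              # ['Dr.']
print(find_abbreviations("then they drove to Saint Paul."))  # []
```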
Use punctuation as required by the grammar rules.
• Use end punctuation (full stop, question mark, exclamation mark) to indicate the end of a complete sentence.
• Use punctuation symbols that are an essential part of a word, such as apostrophes and hyphens.
• Use commas to break up long stretches of speech, to make the transcription easier to read.
• AVOID: semi-colons, quotation marks.
• If the speaker says a special character aloud, replace the character with the corresponding word (lower case). Do this only when it is certain the speaker actually said the character; the transcription should reflect exactly what was said.
“Pictures + Camera” = “pictures and camera”
“My email is [email protected]” = “my email is M dash golden at.”
“1 + 1 = 2.” = “one plus one equals two.”
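The punctuation rules can be checked in the same spirit. The sketch below is illustrative only: `check_punctuation` is a hypothetical name, and the character-to-word map in `SPECIAL_CHARACTERS` is an assumption that only suggests a likely word, since the correct word depends on what was said (“+” is transcribed as “and” in the “pictures and camera” example above).

```python
# Characters the guidelines ask to avoid: semi-colons and quotation marks.
AVOID_PUNCTUATION = set(';"\u201c\u201d')
# Hypothetical suggestions only; the actual word must match what was said.
SPECIAL_CHARACTERS = {"+": "plus", "=": "equals", "@": "at",
                      "&": "and", "%": "percent", "#": "hash"}


def check_punctuation(line: str) -> list[str]:
    """Return a list of punctuation problems found in a transcription line."""
    problems = []
    for ch in line:
        if ch in AVOID_PUNCTUATION:
            problems.append(f"avoid punctuation {ch!r}")
        elif ch in SPECIAL_CHARACTERS:
            problems.append(f"spell out {ch!r} (e.g. {SPECIAL_CHARACTERS[ch]!r})")
    return problems


print(check_punctuation("one plus one equals two."))  # []
print(check_punctuation("1 + 1 = 2"))
```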
C. Markup Tags
1. Unintelligible Words
Words or stretches of speech that are completely unintelligible are transcribed as “**”. The “**” marker is separated from neighboring intelligible words by spaces. Try to avoid using this tag.
If there is no speech in the audio, mark it as [S]. If the speech comes from someone outside the venue, such as instructions from staff, mark it as [Z].
3. Non-Speech Acoustic Events
• Five categories of non-speech acoustic events must be transcribed. Events will only be transcribed if they are clearly distinguishable. Very low-level non-intrusive events will be ignored.
• The event will be transcribed at the place of occurrence, using the defined symbols in angle brackets. For noise events that occur over a span of one or more words, the transcription should indicate the beginning of the noise, just before the first word it affects.
The first two categories of acoustic events <FIL/> and <SPK/> originate from the speaker, and the other categories originate from another source. Sounds originating from the speaker usually do not overlap with the target speech, while sounds originating from other sources could of course occur simultaneously with the speech.
Tag definitions and examples:
<FIL/> Filled pause: the “words” that speakers use to indicate hesitation or to keep control of a conversation while thinking of what to say next, such as oh, ah, uh, um, er and hmm. If the speaker uses a real word or phrase as a filler (e.g. “well”, “you know”), transcribe it as words. Example: <FIL/> call Bulmer
<SPK/> Speaker noise: the various sounds and noises made by the speaker that are not part of the prompted text, e.g. lip smack, cough, grunt, throat clear, tongue click, loud breath, laugh or loud sigh. This tag should also be used when the speaker blows into the microphone before or after a word. Only loud sounds and noises should be transcribed. Example: <SPK/> Pierre
<STA/> Stationary noise: background noise that is not intermittent and has a more or less stable amplitude spectrum over some time, e.g. voice babble (cocktail-party noise), general background noise, sirens, wind, rain or loud car noise from outside. Music is also marked as stationary noise if it is audible but not part of the recording design. This tag should rarely be used in a quiet desktop environment.
<NON/> Intermittent noise: brief external noises of an intermittent nature, which typically occur only once, such as a door slam, something being dropped or a mouse click.
<STA/> and <NON/> should only be used if the sounds are not inherent to the recording environment. For example, in a car, stationary background noise and street noise are expected as part of the environment and should not be transcribed. Only obvious, salient deviations from the normal background should be marked.
<STA/> is usually placed at the beginning of an utterance. Likewise, <NPS/> should NOT normally be used in “restaurant” or “street” environments to mark other people speaking in the vicinity.
If <SPK/> or <NON/> begins within a word, place the tag before the first affected word. Tags are always separated from the surrounding words by spaces.
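The tag-spacing rule is easy to verify automatically. The sketch below is a hypothetical checker, not a SpeechOcean tool: it assumes every tag has the form <NAME/> (including <NPS/>, whose exact spelling is an assumption here) and reports unknown tag names, malformed tags and tags not separated from neighboring words by spaces.

```python
import re

# Tag names used in this section; <NPS/> is assumed to follow the same form.
KNOWN_TAGS = {"FIL", "SPK", "STA", "NON", "NPS"}
TAG_RE = re.compile(r"<([^<>/\s]*)/>")


def check_tags(line: str) -> list[str]:
    """Return a list of tag problems found in a transcription line."""
    problems = []
    matches = list(TAG_RE.finditer(line))
    # Every "<" should open a well-formed <NAME/> tag.
    if line.count("<") != len(matches):
        problems.append("malformed tag (expected the form <NAME/>)")
    for m in matches:
        tag = m.group(0)
        if m.group(1) not in KNOWN_TAGS:
            problems.append(f"unknown tag {tag}")
        start, end = m.span()
        # A tag needs whitespace (or the start/end of the line) on both sides.
        if start > 0 and not line[start - 1].isspace():
            problems.append(f"no space before {tag}")
        if end < len(line) and not line[end].isspace():
            problems.append(f"no space after {tag}")
    return problems


print(check_tags("<FIL/> call Bulmer"))  # []
print(check_tags("call<FIL/> Bulmer"))   # ['no space before <FIL/>']
```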
Each task on the platform is about 3 minutes long, including silent segments (invalid segments, which are not counted at settlement).
Longer recordings are cut into short audio segments, each forming one task, because overly long tasks could lower the accuracy rate and become tedious to transcribe.
When you finish your first task, inform the project manager (PM). You can apply for new tasks only after your first task has passed the check; this is to make sure you understand the rules before you start working.
After that, you can apply for tasks one at a time, and do as many as your time allows.
If only some words are unintelligible, use the tag [**]. Try to avoid using this tag.
E. Log In
The account and password are assigned by the project manager; ask the manager if you do not have them.
After you log in to your account, fill in your personal information.
1. Start Work
Click Apply for work, then click Sign the agreement → Go to work.
2. Transcription Process
3. Considerations
Time-out recovery: 48 hours for normal tasks. You should finish each task within 48 hours; otherwise it will be recycled and reassigned.
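The 48-hour limit is simple date arithmetic. The sketch below assumes the clock starts at the moment the task is assigned to you, which the guidelines do not state explicitly; `recycle_deadline` and `is_overdue` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta

# Time limit for normal tasks, taken from the rule above.
TASK_TIME_LIMIT = timedelta(hours=48)


def recycle_deadline(assigned_at: datetime) -> datetime:
    """Latest time the task can be finished before it is recycled (assumed start point)."""
    return assigned_at + TASK_TIME_LIMIT


def is_overdue(assigned_at: datetime, now: datetime) -> bool:
    """True once the 48-hour window has passed."""
    return now > recycle_deadline(assigned_at)


print(recycle_deadline(datetime(2024, 3, 1, 9, 30)))  # 2024-03-03 09:30:00
```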