SpeechOcean Transcription Guidelines
This document contains the transcription conventions for the product corpora owned by Beijing Haitian Ruisheng Science Technology Ltd (SpeechOcean, www.speechocean.com). The transcription is intended to be an orthographic, lexical transcription with a few details included that represent audible acoustic events (speech and non-speech, present in the corresponding waveform files) adequately for training and testing of automatic speech recognition systems.
A. Transcription Procedure
Transcription is the process of converting spoken language into written text, and it plays a crucial role in various industries, such as legal, medical, academic, and media. To ensure accurate and high-quality transcriptions, a systematic procedure is followed. In this article, we will provide a step-by-step guide to the transcription procedure, highlighting the key stages involved in producing precise and comprehensive transcriptions.
- Pre-Transcription Preparation: Before starting the transcription process, it is essential to gather all the necessary materials and familiarize yourself with the subject matter. This includes obtaining the audio or video recordings, ensuring clear audio quality, and understanding any specific requirements or instructions from the client or employer. It is also beneficial to research any technical terms, acronyms, or industry-specific jargon that may be present in the recording.
- Listening and Familiarization: The first step in the transcription procedure is to carefully listen to the audio or watch the video recording to become familiar with the content. This allows you to gauge the speakers’ accents, speech patterns, and any potential challenges, such as background noise or overlapping conversations. It is crucial to listen attentively and develop an understanding of the context, nuances, and intended meaning of the dialogue.
- Transcription Format and Tools: Choose the appropriate transcription format based on the requirements of the project or client. This can include verbatim transcription, where every word, utterance, and sound is transcribed, or intelligent verbatim transcription, where unnecessary filler words, repetitions, or false starts are omitted. Select a reliable transcription software or word processing tool that allows you to pause, rewind, and type efficiently while listening to the audio or video recording.
- Transcribing the Content: Transcribe the spoken words accurately, ensuring that each word, phrase, or sentence is transcribed in the appropriate sequence. Pay attention to grammar, punctuation, and sentence structure to create a clear and coherent written transcript. Use timestamps to indicate the time intervals or specific points of the recording for reference purposes.
- Proofreading and Editing: After completing the initial transcription, thoroughly review and edit the written text. Check for any errors, inaccuracies, or omissions, and make the necessary corrections. Ensure that the transcript accurately reflects the original spoken content, maintaining the intended meaning and context. Proofreading is essential to eliminate any typos, spelling mistakes, or formatting inconsistencies, resulting in a polished and professional transcript.
- Formatting and Delivery: Format the transcript according to the specific requirements of the project or client. This may include using headers, timestamps, speaker labels, or any other formatting elements requested. Ensure consistency in formatting throughout the transcript for clarity and readability. Once the transcript is finalized, deliver it to the client or employer in the agreed-upon format, such as a Word document, PDF, or via a file-sharing platform.
- Confidentiality and Data Security: Maintain confidentiality and data security throughout the transcription process. Adhere to any privacy regulations or non-disclosure agreements to protect the sensitive and confidential information contained in the recordings. Safeguard client or employer data by using secure file transfer methods and employing encryption or password protection when necessary.
The following recommendations are made for the transcription procedure:
Transcribers should use headsets and work in a quiet environment.
Compose a fixed set of training sentences for the transcribers. Submit this same set to the validation centre, to be used for instructing the transcription validator.
Display the waveform of the utterance as well, so that non-speech acoustic events can be located more easily.
Use a set of buttons for the non-speech acoustic events. Full typing by the transcribers introduces unnecessary errors.
Pass the QA process.
B. General Transcription Conventions
1. Transcriptions should reflect what the user really says.
This is not necessarily the formal version of a word; what was said may be ungrammatical or may differ from the given content.
E.g. I want to go to the mall vs. I wanna go to the mall
If the speaker said “wanna”, the transcription is “wanna”, not “want to”.
If a speaker utters the plural form “bonds” in the sample sentence below, transcribe it exactly as “bonds”: find a bonds with a ten year maturity date
2. Words should be capitalized if they are usually capitalized.
Proper nouns (e.g. names, addresses, countries, organizations, months, etc.) begin with a capital letter, such as China and Microsoft. The first letter of the sentence does not need to be capitalized.
Brand names and trademarks are transcribed in their original format, including their letter case (e.g. MySpace, Hotmail dot com).
3. Number Sequences
Number sequences (flight numbers, times, dates, aircraft types, money amounts, etc.) will be spelled out to reflect what was said (“flight six one three”; “seven thirty”; “August twenty first”; “seven forty seven”; “four hundred and ten dollars”.) If digits have alternate dictionary forms (e.g. “zero” or “oh” or “naught” in English), the correct alternative should be used that reflects the form actually pronounced. Long numbers may be written together, or with blanks between parts in order to reduce the lexicon size.
Arabic numerals are not allowed. Do not type 1234567890; write the numbers out as words (one two three, and so on).
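The rule against Arabic numerals lends itself to an automatic check before a transcript is submitted. The following is a minimal sketch, not part of the SpeechOcean tooling; the function name `find_arabic_numerals` is hypothetical and chosen for illustration only.

```python
import re

# Any run of ASCII digits counts as a violation of the "no Arabic numerals" rule.
DIGIT_RUN = re.compile(r"[0-9]+")


def find_arabic_numerals(transcript: str) -> list:
    """Return every run of digits found in one transcript line."""
    return DIGIT_RUN.findall(transcript)


# A line that still contains digits is flagged; a spelled-out line is not.
flagged = find_arabic_numerals("flight 613 leaves at seven thirty")
clean = find_arabic_numerals("flight six one three leaves at seven thirty")
```

The check only reports where digits remain. It deliberately does not convert them, because the spelled-out form must reflect what the speaker actually said (for example “six one three” versus “six hundred and thirteen”).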
4. Letter Sequences
Letter sequences occur in spelled words, ZIP codes, acronyms and abbreviations (“D F W”; “A P slash eighty”; “P M”; “C O”; “I B M” etc.). Letters should be in upper case, separated by a space.
The AM and PM of times (e.g., “five thirty P M”) will be treated as examples of letter sequences, i.e., upper case and separated by a space.
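As an illustration of the letter-sequence format, the sketch below converts a token that was spoken letter by letter into upper-case letters separated by single spaces. The helper name `spell_letters` is hypothetical; deciding whether a token was spelled out or spoken as a word remains the transcriber's judgment.

```python
def spell_letters(letters: str) -> str:
    """Format a letter-by-letter sequence: upper case, one space between letters.

    Non-letter characters such as periods are dropped.
    """
    return " ".join(ch.upper() for ch in letters if ch.isalpha())


airport = spell_letters("dfw")
time_suffix = spell_letters("p.m.")
```

This applies only to sequences pronounced one letter at a time. Terms spoken as a single word are written without spaces.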
Acronyms refer to terms based on the initial letters of their various elements and
are spoken as words. They should be transcribed as words in upper case without
white spaces between the letters.
“I work for NASA.”
“AIDS has a great impact on society.”
Do not introduce abbreviations in the transcription. Always use the spelled-out
form (full word) when pronounced as such.
“This is Dr. Smith.” = “this is doctor Smith.”
“Then they drove to St. Paul.” = “then they drove to Saint Paul.”
Use punctuation as required by the grammar rules.
• Use end punctuation (full stop, question mark, exclamation mark) to indicate the end of a complete sentence.
• Use punctuation symbols that are an essential part of a word, such as apostrophes and hyphens.
• Use commas to break up long stretches of speech. This is to facilitate reader comprehension.
• AVOID: semi-colons, quotation marks.
• If someone speaks a special character, replace the character with the corresponding word (lower case). This should only be done when it is certain the user has spoken a character. The transcription should reflect exactly what was said.
“Pictures + Camera” = “pictures and camera”
“My email is m-golden@” = “my email is M dash golden at.”
“1 + 1 = 2.” = “one plus one equals two.”
C. Mark tags
1. Unintelligible Words
Unintelligible speech (words or stretches of speech that are completely unintelligible) is transcribed as “**”. The “**” marker is separated from neighboring intelligible words with spaces. Try to avoid using this tag.
If there is no speech in the audio, mark it with [S]. If the speech comes from someone outside the venue, such as instructions from staff, mark it with [Z].
3. Non-Speech Acoustic Events
While speech is a fundamental mode of communication, our acoustic environment is filled with a multitude of non-speech acoustic events that carry important information. These events encompass various sounds, such as environmental noises, musical tones, animal calls, and more. In this article, we will explore the significance of non-speech acoustic events, their classification, and their impact on our daily lives.
- Definition and Classification: Non-speech acoustic events refer to any auditory signals that are not speech-related but still convey meaningful information. These events can be categorized into different classes based on their characteristics and purpose. Some common classes include environmental sounds (e.g., footsteps, raindrops, traffic noises), musical tones (e.g., melodies, chords), animal vocalizations (e.g., bird songs, whale calls), and machine-generated sounds (e.g., alarms, beeps).
- Environmental Awareness: Non-speech acoustic events play a vital role in our awareness and understanding of the surrounding environment. Environmental sounds provide crucial information about the physical space we inhabit, such as the presence of other individuals, the movement of objects, or potential hazards. By paying attention to non-speech acoustic events, we develop a heightened sense of situational awareness and can respond appropriately to the soundscape around us.
- Emotional and Cognitive Impact: Non-speech acoustic events have the power to evoke emotions and trigger cognitive processes. For instance, the sound of waves crashing against the shore may elicit a sense of calmness and relaxation, while a sudden loud noise can startle and raise alertness levels. Similarly, music has the ability to evoke various emotions, influence mood, and enhance cognitive performance. Understanding and appreciating the emotional and cognitive impact of non-speech acoustic events can lead to intentional use of sound for therapeutic, entertainment, or productivity purposes.
- Cultural and Artistic Significance: Non-speech acoustic events have cultural and artistic significance across different societies and traditions. They often form the basis of cultural practices, rituals, and celebrations. For example, traditional music, chants, or drum beats carry cultural heritage and transmit historical narratives. Non-speech acoustic events also serve as a medium for artistic expression, allowing composers, musicians, and sound designers to create evocative and immersive experiences in various forms of art, including music, theater, film, and virtual reality.
- Scientific Research and Applications: Non-speech acoustic events have garnered significant attention in scientific research. Acoustic ecology and bioacoustics focus on studying the soundscape and its ecological implications. Researchers investigate animal communication, soundscapes in urban environments, and the impact of noise pollution on human health and well-being. The findings have practical applications in designing soundscapes for better urban planning, noise reduction strategies, and wildlife conservation efforts.
- Technological Advancements: Advances in technology have opened up new possibilities for the analysis and manipulation of non-speech acoustic events. Machine learning and signal processing techniques enable the identification, classification, and synthesis of environmental sounds, enhancing our understanding and control of the acoustic environment. Applications range from soundscape design in virtual reality to noise cancellation algorithms for improved auditory experiences.
• Five categories of non-speech acoustic events must be transcribed. Events will only be transcribed if they are clearly distinguishable. Very low-level non-intrusive events will be ignored.
• The event will be transcribed at the place of occurrence, using the defined symbols in angle brackets. For noise events that occur over a span of one or more words, the transcription should indicate the beginning of the noise, just before the first word it affects.
The first two categories of acoustic events <FIL/> and <SPK/> originate from the speaker, and the other categories originate from another source. Sounds originating from the speaker usually do not overlap with the target speech, while sounds originating from other sources could of course occur simultaneously with the speech.
Tags, definitions and examples:
<FIL/> Filled pause: the “words” that speakers use to indicate hesitation or to maintain control of a conversation while thinking of what to say next, including oh, ah, uh, um, er, and hmm. If the speaker uses a real word or phrase as a filler (e.g. well, you know), transcribe that word or phrase. Example: <FIL/> call Bulmer
<SPK/> Speaker noise: the various sounds and noises made by the speaker that are not part of the prompted text, e.g. lip smack, cough, grunt, throat clear, tongue click, loud breath, laugh, loud sigh. This marker should also be used when the speaker blows into the microphone before or after a word. Only loud sounds and noises of this kind should be transcribed. Example: <SPK/> Pierre
<STA/> Stationary noise: background noise that is not intermittent and has a more or less stable amplitude spectrum over some time. Examples: voice babble (cocktail-party noise), background noise, sirens, wind, rain, loud car noise from outside. Music is also marked as stationary if it is audible but not there by design. This tag should rarely be used in a quiet desktop environment.
<NON/> Intermittent noise: brief external noises of an intermittent nature. These noises typically occur only once, like a door slam, something being dropped, or a mouse click.
A remark, defined as a brief spoken or written comment, holds incredible power to influence, inspire, or even transform a situation or a person’s life. Whether positive or negative, remarks have the potential to leave a lasting impact on individuals, relationships, and communities. In this article, we will explore the significance of a remark, the effects it can have, and the importance of choosing our words wisely.
- Words as Catalysts: A remark can act as a catalyst, triggering a chain reaction of emotions, actions, and outcomes. A simple compliment or word of encouragement can uplift someone’s spirits, boost their confidence, and motivate them to pursue their dreams. Conversely, a negative or hurtful remark can have a detrimental effect, causing emotional pain, eroding self-esteem, and damaging relationships. The power lies in the ability of words to shape perceptions, attitudes, and behaviors.
- Building Relationships: Thoughtful remarks play a fundamental role in building and nurturing relationships. Kind and supportive words can foster trust, strengthen bonds, and create a sense of connection. They communicate care, empathy, and understanding, making individuals feel valued and appreciated. By expressing gratitude, acknowledging achievements, or offering words of comfort, we can forge deeper connections with those around us.
- Empowerment and Inspiration: A well-timed remark has the potential to empower and inspire others. Encouraging words can ignite a spark within someone, fueling their passion and motivating them to overcome challenges. By recognizing someone’s potential, offering guidance, or expressing belief in their abilities, we can help individuals unlock their full potential and pursue their aspirations with renewed determination.
- Influence on Self-Perception: The remarks we receive from others can significantly shape our self-perception. Positive remarks can reinforce a sense of self-worth, confidence, and belief in our abilities. They can fuel self-motivation and drive us to strive for excellence. Conversely, negative or disparaging remarks can undermine our self-esteem, create self-doubt, and limit our potential. It is crucial to be mindful of the impact our words can have on others and to choose them with care and compassion.
- Creating a Positive Environment: A positive and supportive environment is nurtured by uplifting remarks. By promoting kindness, respect, and constructive feedback, we contribute to a culture of positivity and growth. Thoughtful remarks foster an atmosphere of collaboration, creativity, and productivity, enabling individuals and teams to thrive and achieve their goals.
- Spreading Positivity: A single remark has the power to spread positivity beyond its intended recipient. A kind word or a compliment can inspire a ripple effect, as the recipient may be encouraged to extend similar gestures to others. This chain of positivity has the potential to create a more harmonious and compassionate society, where people uplift and support one another through their words and actions.
<STA/> and <NON/> should only be used if the sounds are not inherent to the environment as such. E.g. in the car a stationary background noise and street noises can be expected as given with the environment of the recording. These noises should not be transcribed. Only obvious and salient deviations from the given background should be marked.
<STA/> is usually put in the initial position of the utterance. Likewise, <NPS/> should NOT be used in a “restaurant” or “street” environment to mark other people speaking in the vicinity.
If <SPK/> or <NON/> begins within a word, the symbol is put before the first word affected. The symbols are always separated from the surrounding words by spaces.
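The tag conventions above (a fixed set of symbols in angle brackets, always separated from neighboring words by spaces) can be checked mechanically. The following is a minimal sketch under stated assumptions: the function `check_tags` and the set `KNOWN_TAGS` are hypothetical, and the set contains only the four tags defined in the tag list above. <NPS/> is left out because its definition is not given in this section; extend the set if your project defines further tags.

```python
import re

# Assumption: only the four tags defined in this section are accepted.
KNOWN_TAGS = {"<FIL/>", "<SPK/>", "<STA/>", "<NON/>"}

# Anything in angle brackets without inner whitespace is treated as a tag candidate.
TAG_PATTERN = re.compile(r"<[^<>\s]*>")


def check_tags(transcript: str) -> list:
    """Return a list of human-readable problems found in one transcript line."""
    problems = []
    for match in TAG_PATTERN.finditer(transcript):
        tag = match.group()
        start, end = match.span()
        if tag not in KNOWN_TAGS:
            problems.append(f"unknown tag {tag}")
        # A tag must sit at a line boundary or be delimited by whitespace.
        before_ok = start == 0 or transcript[start - 1].isspace()
        after_ok = end == len(transcript) or transcript[end].isspace()
        if not (before_ok and after_ok):
            problems.append(f"tag {tag} not separated by spaces")
    return problems


ok_line = check_tags("<FIL/> call Bulmer")
bad_line = check_tags("call<SPK/> Pierre")
```

A check like this catches typing errors only. Whether a noise is salient enough to be tagged at all is still decided by listening.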
Each task on the platform is about 3 minutes long, including silent segments (invalid segments, which are not counted at settlement).
Overly long recordings could lower the accuracy rate, so the audio is cut into short segments, each forming one task. This keeps tasks from becoming too long and tedious.
When you finish the first task, inform the project manager. You can apply for a new task only after the first task has passed checking. The main purpose is to confirm that you understand the rules before you start working.
Once you have passed the first task, you can apply for further tasks one at a time. You can do more as long as you have time.
If only some words are unintelligible, use the tag [**]. Try to avoid using this tag.
E. Log in
The account and password are assigned by the project manager; ask the project manager for them.
You need to fill in your personal information after you log in to your account.
1. Start work
Click “Apply for work”, then click “Sign the agreement”, and then go to work.
2. Transcription process
3. Considerations
Time-out recovery: 48 hours for normal tasks. You should finish each task within 48 hours; otherwise it will be reclaimed and reassigned.