Captioning Instructions


Captions are written text, translated or transcribed, displayed on the screen. Captioning is the act of transcribing and timing the audio track of a video. Captions display the textual equivalent of spoken dialogue or narration, and can also include speaker identification, sound effects, and music descriptions.

Technical aspects


Captions should be synchronized, appearing at approximately the same time as the audio is delivered; equivalent in content to the audio, including speaker identification and sound effects; and accessible, meaning readily available to those who need or want them.

The keyboard shortcuts help you control video playback and insert In-timecodes (start) and Out-timecodes (end) when creating new captions. Captions need to appear when characters start speaking and disappear when they finish speaking. Using the shortcuts while listening to the audio helps you identify exactly where a caption should start and end, which produces accurate timecodes (down to the millisecond).
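
To illustrate what millisecond-level timecodes mean in practice, the sketch below converts between a timecode string and a millisecond count and computes how long a caption stays on screen. The `HH:MM:SS.mmm` layout and the function names are assumptions made for this example; VideoTMS may display timecodes differently.

```python
def timecode_to_ms(timecode: str) -> int:
    """Convert an 'HH:MM:SS.mmm' timecode (assumed layout) to milliseconds."""
    clock, millis = timecode.split(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    # The fractional part is assumed to always have exactly three digits.
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def ms_to_timecode(total_ms: int) -> str:
    """Convert a millisecond count back to 'HH:MM:SS.mmm'."""
    seconds, millis = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Hypothetical In- and Out-timecodes for one caption.
in_ms = timecode_to_ms("00:01:02.500")
out_ms = timecode_to_ms("00:01:05.250")
duration_ms = out_ms - in_ms  # time on screen, in milliseconds
```

Here the caption would stay on screen for 2750 milliseconds, that is, 2.75 seconds.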

Caption Specifications

Each video project contains a defined set of caption specifications that are required by the partner, but TWB has general specifications that apply to all captioning tasks, as explained in the process below.

A few key areas to keep in mind are:

  • Duration on screen:
    • The recommended minimum caption duration is approximately 1.5 to 2 seconds to allow time for the human eye to read the content.
    • The maximum recommended duration for a caption is 6 to 7 seconds to prevent it from lasting too long on-screen.
    • Some formats may require the captions to have a specific minimum duration, or a hard limit.
  • Character limit: The minimum is 30 characters per caption, and the maximum is 80 characters per line (including spaces). If you reach the per-line limit, adding a second line lets you fit more content into a single caption.
  • Lines per caption: There is a maximum of 4 lines per caption. New lines can be added by pressing Shift+Enter, but some partners specify their own maximum number of lines per caption.
  • Caption positioning: In general, captions are center-aligned at the bottom of the screen. Viewers expect this, and it is best not to move them around unnecessarily. However, partner logos or background colors may make captions hard to read, and in those cases, captions may need to be repositioned.
  • Accuracy: Captions must correspond to the audio word for word, unless indicated otherwise in the specifications. They should not be modified for stylistic purposes; they should reflect the audio exactly, including speaker errors, filler words, etc.
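
The numeric limits above can be expressed as a simple check. The following is a minimal sketch, not part of VideoTMS: the `Caption` class, the constant names, and `check_caption` are hypothetical, the thresholds use the outer bounds of the general recommendations (1.5 and 7 seconds), and partner specifications would override them. The 30-character minimum is left out because short utterances may legitimately fall below it.

```python
from dataclasses import dataclass

# Defaults taken from the general specifications above; partner
# specifications take precedence, so treat these as adjustable assumptions.
MIN_DURATION_S = 1.5
MAX_DURATION_S = 7.0
MAX_CHARS_PER_LINE = 80  # spaces included
MAX_LINES = 4


@dataclass
class Caption:
    start_s: float  # In-timecode, in seconds
    end_s: float    # Out-timecode, in seconds
    text: str       # caption lines separated by "\n"


def check_caption(caption: Caption) -> list[str]:
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    duration = caption.end_s - caption.start_s
    if duration < MIN_DURATION_S:
        problems.append(f"too short: {duration:.3f}s")
    if duration > MAX_DURATION_S:
        problems.append(f"too long: {duration:.3f}s")
    lines = caption.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"too many lines: {len(lines)}")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(f"line {number} has {len(line)} characters")
    return problems
```

For example, `check_caption(Caption(0.0, 1.0, "Hi"))` reports the caption as too short, while a 3-second caption with one short line returns an empty list.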

TWB uses the VideoTMS platform, an environment for managing pipelines of audiovisual projects. You can access VideoTMS in your browser and log in with the email address and password provided by the Project Officer.


  • Captioning accuracy affects the whole process, since captions are the basis for later steps such as translation. Always do some research if you are unsure. For example, you can type the words into Google and check the results for the correct spelling or meaning.
  • Always use the waveform to check if the timing is correct. A slightly delayed out-timecode that facilitates readability is sometimes acceptable.
  • On the video player, aside from the Volume Bar and the Play/Pause button, you can also slow down or speed up the audio.
  • Your captions should adapt to the speakers’ natural rate of speech, so they will vary in length (duration on screen).


If you have any questions or need anything, please contact your Project Officer, who is happy to help.


You will receive a task in the captioning system or in an email from your Project Officer. This task will include:

  • The video you are going to work with.
  • An editor to insert the text.
  • Reference material: you may be provided with specific project instructions or specifications.


Your task is to create the captions, which involves two parts:

  1. Transcribing the audio into written text: this includes speaker identification, sound effects, etc.
  2. Timing the audio: adding and modifying the timecodes of the video. You can use the platform’s timecoding functions to ensure the synchronization is accurate.

The captioning interface is composed of:

(1) Video player, (2) Waveform, (3) Video Toolbox, (4) Due date and progress, (5) Captioning area, (6) View tracks, (7) Buttons to navigate captions, (8) Autosave, (9) Complete.

  • In the caption edit box, simply type what you hear or see on the screen in compliance with the specifications.
    • If you notice red or gray flaps at the side of your captions, it means there is an error you need to fix, and you’ll find the reason below each caption.
  • Use the shortcuts mentioned above to set accurate timecodes.
    • You can modify the timecodes of new and existing captions to make them more accurate just by lengthening or shortening the cue’s size in the waveform.
  • Press Enter to add a new caption.
    • Captions are automatically saved every few seconds.

A video tutorial is also available. To learn more about captioning with VideoTMS, please see the VideoTMS Documentation for Linguists.

Pre-delivery checklist

  • Make sure your captions are exact down to milliseconds, and the timecodes are accurate.
  • Confirm that the timing of the captions is adjusted to the natural rate of speech. Ensure the text is readable and matches the dialogue.
  • Ensure the captions respect the minimum and maximum on-screen durations. The standard is 2 to 6 seconds per caption if there are no specifications.
  • Make sure there are no missing captions.
  • Proofread your work and ensure that there are no spelling mistakes.
  • Be sure to review all red and gray flaps on cue lines as part of your final QA check.

Once your captions are ready to be delivered, please click Complete in VideoTMS and let the Project Officer know that the task is completed.