Subtitle and Caption Sources

Hybrik can use subtitle or closed caption files for several operations. These include:

  • Burning closed captions or subtitles into the video
  • Embedding closed captions into a video
  • Adding subtitles as a target for DASH or HLS streaming
  • Converting from one subtitle or caption type to another

Supported Source Types

Hybrik can accept the following sources, but there may be edge cases that are not supported. For example, Hybrik supports TTML and elements of most TTML profiles, such as IMSC1, DFXP, and SMPTE-TT, but some variants of each of these formats may work while others do not. Hybrik may accept a DFXP profile of TTML, for instance, but it will not support any image or binary data. If you are unsure whether your subtitle format will work, we recommend testing it in a job.

Formats

  • Closed Caption Sources
    • Scenarist Closed Caption (.scc extension)
    • Embedded CEA-608
  • Subtitle Sources
    • Timed Text Markup Language, or TTML (.xml)
    • SubRip (.srt)
    • WebVTT (.vtt)

The contents array

The contents array is the part of a source where you can give Hybrik more parameters about that source. At its simplest, you can say that your source contains audio, video, or both.

"contents": [
  {
    "kind": "video"
  },
  {
    "kind": "audio"
  }
]

But you can also specify other details in the contents array. For example, when we specify a closed_caption source, we can also specify the format of the captions or subtitles. More specific examples follow.

Specifying a Sidecar Closed Caption File

You can specify a closed caption file as a sidecar file in an asset_complex alongside a video. Read our Tutorial on Complex Assets for full examples. Typically, additional parameters about an asset component are specified inside that component's contents array.

{
  "uid": "sources",
  "kind": "source",
  "payload": {
    "kind": "asset_complex",
    "payload": {
      "asset_versions": [
        {
          "asset_components": [
            {
              "kind": "name",
              "name": "{{source_name}}",
              "location": {
                "storage_provider": "s3",
                "path": "{{source_path}}"
              },
              "contents": [
                {
                  "kind": "video"
                },
                {
                  "kind": "audio"
                }
              ]
            },
            {
              "component_uid": "caption1",
              "kind": "name",
              "name": "{{captions_name}}",
              "location": {
                "storage_provider": "s3",
                "path": "{{captions_path}}"
              },
              "contents": [
                {
                  "kind": "closed_caption",
                  "payload": {
                    "format": "scc"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
  }
},

Specifying a Sidecar Subtitle File

You can specify a subtitle sidecar file in the same way as a closed caption sidecar file:

{
  "uid": "sources",
  "kind": "source",
  "payload": {
    "kind": "asset_complex",
    "payload": {
      "asset_versions": [
        {
          "asset_components": [
            {
              "kind": "name",
              "name": "{{source_name}}",
              "location": {
                "storage_provider": "s3",
                "path": "{{source_path}}"
              },
              "contents": [
                {
                  "kind": "video"
                },
                {
                  "kind": "audio"
                }
              ]
            },
            {
              "component_uid": "caption1",
              "kind": "name",
              "name": "{{captions_name}}",
              "location": {
                "storage_provider": "s3",
                "path": "{{captions_path}}"
              },
              "contents": [
                {
                  "kind": "subtitle",
                  "payload": {
                    "format": "auto"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
  }
},

Embedded Closed Captions as a Source

Hybrik has limited support for embedded closed captions as a source. Hybrik can generally support embedded CEA-608 closed captions for embedding or converting to another text format such as a subtitle file output. You can tell Hybrik that a source has closed captions in the contents array if they are not auto-detected:

{
  "uid": "source",
  "kind": "source",
  "payload": {
    "kind": "asset_url",
    "payload": {
      "storage_provider": "s3",
      "url": "{{source}}",
      "contents": [
        {
          "kind": "video"
        },
        {
          "kind": "audio"
        },
        {
          "kind": "closed_caption"
        }
      ]
    }
  }
}

Mapping Multiple Languages to CEA-608 Fields and CEA-708 Services

When you want to support multiple languages, you typically embed your primary language on CEA-608 CC1, which is mapped to CEA-708 Service 1. Your secondary language is typically mapped to CEA-608 CC3, which is mapped to CEA-708 Service 2. You can specify the first embedded language with scc0 and the second language with scc1 within the contents::payload of each asset_component in your source. Here is a snippet:

"contents": [
  {
    "kind": "closed_caption",
    "payload": {
      "format": "scc0"
    }
  }
]
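
As a sketch of how this might look with two sidecar SCC files, the fragment below lists two caption components, one marked scc0 and one marked scc1. The component_uid values, the {{captions_primary_name}} and {{captions_secondary_name}} placeholders, and the language codes are assumptions made for illustration. The language parameter is documented only as a string, so confirm which codes your workflow expects. The video and audio component is omitted for brevity.

```json
{
  "asset_components": [
    {
      "component_uid": "caption_primary",
      "kind": "name",
      "name": "{{captions_primary_name}}",
      "location": {
        "storage_provider": "s3",
        "path": "{{captions_path}}"
      },
      "contents": [
        {
          "kind": "closed_caption",
          "payload": {
            "format": "scc0",
            "language": "en"
          }
        }
      ]
    },
    {
      "component_uid": "caption_secondary",
      "kind": "name",
      "name": "{{captions_secondary_name}}",
      "location": {
        "storage_provider": "s3",
        "path": "{{captions_path}}"
      },
      "contents": [
        {
          "kind": "closed_caption",
          "payload": {
            "format": "scc1",
            "language": "es"
          }
        }
      ]
    }
  ]
}
```

With this layout, the scc0 component would carry the primary language (CC1 / Service 1) and the scc1 component the secondary language (CC3 / Service 2), following the mapping described above.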

Sync to Timecode

If your caption/subtitle source is aligned to your video source's timecode track, you will want to enable timecode synchronization. The option to do this in Hybrik is "sync_to_timecode": true, which will align the subtitles to the video’s timecode.

This example shows sync_to_timecode in a closed_caption source, but the option is also valid in subtitle sources.

"contents": [
  {
    "kind": "closed_caption",
    "payload": {
      "format": "scc",
      "sync_to_timecode": true
    }
  }
]
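
For comparison, a subtitle source might enable the same option as in the sketch below. The choice of ttml as the format and the explicit source_timecode_selector are assumptions for illustration. Because first is the documented default for source_timecode_selector, that line could be omitted.

```json
{
  "contents": [
    {
      "kind": "subtitle",
      "payload": {
        "format": "ttml",
        "sync_to_timecode": true,
        "source_timecode_selector": "first"
      }
    }
  ]
}
```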

Offset Subtitle Sources

If you need to offset your subtitles by some number of seconds, use delay_sec. For example, "delay_sec": -30 starts the subtitles 30 seconds earlier, so a cue originally timed at 00:01:00 would appear at 00:00:30:

"contents": [
  {
    "kind": "subtitle",
    "payload": {
      "format": "auto",
      "delay_sec": -30
    }
  }
]

Subtitle and Caption Source Parameters

These are some of the parameters that you can specify in a source component's contents array.

Closed Caption Source Parameters

These options are valid for closed_caption sources such as an scc file.

| Name | Type | Description |
|---|---|---|
| mode | enum: disabled, enabled, auto | Optional. disabled: disable all tracks of this media type; enabled: use this track (fail if it doesn't exist); auto: use the track if it exists. Default is enabled. If you have captions in a video that you want to suppress, set this to disabled in the video's contents array. |
| track_name | string | Set the closed caption track name. |
| format | enum: scc, scc0, scc1 | The format of the closed caption track. |
| category | enum: default, sdh, forced, described_music_and_sound | Set the category of the closed caption track. |
| delay_sec | integer | Optional. By how many seconds to delay the closed caption track from the start of the video track. Can be positive or negative (a negative value starts the closed caption track before the video). |
| sync_to_timecode | boolean | Optional. When set to true, the closed caption track will be synchronized with timecode information from the video track (for example, from a timecode track or from timecode embedded in the video track). If false, the closed caption track will start at the beginning of the video track. Default is false. |
| source_timecode_selector | enum: first, highest, lowest, mxf, gop, sdti, smpte, material_package, source_package | Selects which metadata track is used for timecode data. Not all options are valid with all codecs/containers. Default is first. |
| timecode_format | enum: df, ndf, auto | Optional. Override the timecode metadata type as drop frame (df) or non-drop frame (ndf), or keep the track's existing type with auto. |
| timecode_frame_rate | enum: 59.94, 29.97, 23.98 | Optional. Override the frame rate of the timecode metadata. |
| ingest_repeat_rate | integer | Optional. Minimum: 0, maximum: 2. How often to re-read the closed caption track when processing the asset_complex source. |
| language | string | Select the closed caption language. |
| track_group_id | string | Indicates which group this track belongs to. Multiple tracks with the same content but different bitrates would have the same track_group_id. |
| layer_id | string | Indicates which layer this track belongs to. For example, this allows bundling one video layer and multiple audio layers with the same bitrates but different languages. |
| layer_affinities | array | Indicates which other layers this layer can be combined with. For example, to combine audio and video layers. |
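
As one possible reading of the mode description, the sketch below suppresses captions embedded in a video source by adding a closed_caption entry with mode set to disabled. Placing mode on the closed_caption entry is an interpretation of the description rather than a confirmed usage, so it is worth verifying in a test job.

```json
{
  "contents": [
    {
      "kind": "video"
    },
    {
      "kind": "audio"
    },
    {
      "kind": "closed_caption",
      "payload": {
        "mode": "disabled"
      }
    }
  ]
}
```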

Subtitle Source Parameters

These options are valid for subtitle sources.

| Name | Type | Description |
|---|---|---|
| format | enum: ttml, imsc1, srt, stl, scc, scc0, scc1, webvtt, auto | The format of the subtitle track. |
| category | enum: default, sdh, forced, described_music_and_sound | Set the category of the subtitle track. |
| delay_sec | integer | Optional. By how many seconds to delay the subtitle track from the start of the video track. Can be positive or negative (a negative value starts the subtitle track before the video). |
| sync_to_timecode | boolean | Optional. When set to true, the subtitle track will be synchronized with timecode information from the source (for example, from a timecode track or from timecode embedded in the video track). If false, the subtitle track will run synchronously to the video track. Default is false. |
| source_timecode_selector | enum: first, highest, lowest, mxf, gop, sdti, smpte, material_package, source_package | Specifies the metadata track to be used for timecode data. Default is first. |
| timecode_frame_rate | enum: 59.94, 29.97, 23.98 | Optional. Override the frame rate of the timecode metadata. |
| language | string | Select the subtitle language. |
| track_group_id | string | Indicates which group this track belongs to. Multiple tracks with the same content but different bitrates would have the same track_group_id. |
| layer_id | string | Indicates which layer this track belongs to. For example, this allows bundling one video layer and multiple audio layers with the same bitrates but different languages. |
| layer_affinities | array | Indicates which other layers this layer can be combined with. For example, to combine audio and video layers. |
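
To show how several of these parameters could be combined, here is a minimal sketch of a contents entry for a forced-narrative SubRip sidecar file. The specific values (srt, forced, the fr language code, and a 2-second delay) are assumptions chosen for illustration, not recommended settings.

```json
{
  "contents": [
    {
      "kind": "subtitle",
      "payload": {
        "format": "srt",
        "category": "forced",
        "language": "fr",
        "delay_sec": 2
      }
    }
  ]
}
```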

See our Examples for sample operations.