US20230300399A1 - Methods and systems for synchronization of closed captions with content output - Google Patents
- Publication number
- US20230300399A1 (U.S. application Ser. No. 17/698,570)
- Authority
- US
- United States
- Prior art keywords
- audio
- content
- text
- video
- closed caption
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43074—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of additional data with content streams on the same device, e.g. of EPG data or interactive icon with a TV program
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44004—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving video buffer management, e.g. video decoder buffer or video display buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8547—Content authoring involving timestamps for synchronizing content
Definitions
- output may be sped up or slowed down for only 10 seconds or during transitions between stories.
- output may be sped up or slowed down at any time but only for 2 seconds.
- transitions into and out of commercials may be used to align the content with the closed captions, e.g., metadata associated with the transitions may indicate timing for a local advertisement insertion and may be used to align content with the closed captions.
- content e.g., including video, audio, and/or closed caption text
- livestreaming video content may be received, although content may refer generally to any audio or video content produced for viewer consumption regardless of the type, format, genre, or delivery method.
- the content may be associated with one or more content distributors that distribute the content to viewers for consumption.
- the closed captions may indicate textual information associated with the content.
- the closed captions may comprise text associated with spoken dialogue, sound effects, music, etc.
- the content may be associated with one or more genres, including sports, news, music or concert, documentary, or movie.
- the determined delay may be applied to the audio and/or the video associated with the content to align the content with the closed captions.
- the determined delay may be applied to the closed captions associated with the content to align the audio and/or the video with the closed captions.
- Portions of the content may be identified as candidates for inserting a delay. For example, in order to avoid any negative reaction from a viewer, a delay may be associated with a scene change. For example, the delay may be inserted at a scene change, immediately before, or immediately after.
- An output speed change may be less detectable to viewers, for example, during a portion of the content that contains less motion or less speech.
- a scenery shot without any dialogue may be a good candidate for an output speed change.
- the portions of content may be identified that do not contain large amounts of motion or speech.
- Different genres may be able to be sped up or slowed down at different rates or for different periods of time without being easily detectable to viewers of the content. For example, for sports content, output may be sped up or slowed down for only 5 seconds or during a timeout with less motion. For news content, output may be sped up or slowed down for only 10 seconds or during transitions between stories. For concert or music content, output may be sped up or slowed down at any time but only for 2 seconds.
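- The genre-specific limits above can be captured in a small rule check. The sketch below is illustrative only: the function name and the context labels ("timeout", "story_transition") are assumptions, while the 5-, 10-, and 2-second thresholds come from the examples in the text.

```python
def speed_change_allowed(genre, duration_s, context=""):
    """Check whether speeding up or slowing down output for duration_s
    seconds is likely to go unnoticed, per the genre examples above.
    `context` labels the current portion of content (assumed labels)."""
    if genre == "sports":
        # Up to 5 seconds anywhere, or during a timeout with less motion.
        return duration_s <= 5 or context == "timeout"
    if genre == "news":
        # Up to 10 seconds anywhere, or during transitions between stories.
        return duration_s <= 10 or context == "story_transition"
    if genre in ("concert", "music"):
        # At any time, but only for 2 seconds.
        return duration_s <= 2
    return False  # unknown genre: make no adjustment
```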
- the computing device 700 may operate in a networked environment using logical connections to remote computing nodes and computer systems through local area network (LAN) 716 .
- the chipset 706 may comprise functionality for providing network connectivity through a network interface controller (NIC) 722 , such as a gigabit Ethernet adapter.
- NIC network interface controller
- a NIC 722 may be capable of connecting the computing device 700 to other computing nodes over a network 716 . It should be appreciated that multiple NICs 722 may be present in the computing device 700 , connecting the computing device to other types of networks and remote computer systems.
- Computer-readable storage media may comprise volatile and non-volatile, transitory computer-readable storage media and non-transitory computer-readable storage media, and removable and non-removable media implemented in any method or technology.
- Computer-readable storage media includes, but is not limited to, RAM, ROM, erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory or other solid-state memory technology, compact disc ROM (“CD-ROM”), digital versatile disk (“DVD”), high definition DVD (“HD-DVD”), BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, other magnetic storage devices, or any other medium that may be used to store the desired information in a non-transitory fashion.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Databases & Information Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Alignment between closed caption and audio/video content may be improved by determining text associated with a portion of the audio or a portion of the video and comparing the determined text to a portion of closed caption text. Based on the comparison, a delay may be determined and the audio/video content may be buffered based on the determined delay.
Description
- Closed captions and subtitles allow users to display text on a display of a device to provide additional or interpretive information. "Closed" indicates that captions are not visible until activated by a user, whereas "open" captions are visible to all viewers. Accordingly, closed captions may allow a user to display textual transcriptions of an audio portion of content or textual descriptions of non-speech elements of the content. Ideally, closed captions and subtitles are in synchronization with the audio/visual content. However, closed captions may lag the audio and/or video content (e.g., by several seconds) due to, for example, technical delays associated with manual or live transcription. Improvements to the synchronization of closed captioning systems are needed to improve the viewing experience.
- Methods and systems are disclosed for improved alignment between closed captioned text and audio output (e.g., audio from a content creator, content provider, video player, etc.). Content including video, audio, and closed caption text may be received and, based on a portion of the audio or a portion of the video, text associated with the portion of the audio or the portion of the video may be determined. The determined text may be compared to a portion of the closed caption text and, based on the comparison, a delay may be determined. The audio or video of the content may be buffered based on the determined delay. If the closed caption text is ahead, the closed caption text stream may be buffered. For example, encoded audio may be removed from an audiovisual stream, decoded, converted to text, and then compared to a closed captioned stream. Based on the comparison, the closed captioned stream may be realigned with the audiovisual stream.
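- The comparison step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Cue` structure, the function name, and the word-overlap threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cue:
    text: str      # a run of transcribed or caption text
    time_s: float  # presentation time of that text, in seconds

def estimate_caption_delay(transcript, captions, min_overlap=3):
    """Compare audio-derived text against closed caption text and return
    the delay, in seconds, by which the captions trail the audio, or None
    if no run of min_overlap words matches."""
    for t in transcript:
        t_words = t.text.lower().split()
        for c in captions:
            c_words = c.text.lower().split()
            # Require a short run of identical words to avoid chance matches.
            if t_words[:min_overlap] == c_words[:min_overlap]:
                return c.time_s - t.time_s
    return None
```

A positive result would mean the captions lag the audio, so the audio/video could be buffered by that amount; a negative result would mean the caption text is ahead, in which case, as noted above, the closed caption text stream may be buffered instead.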
- The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples and together with the description, serve to explain the principles of the methods and systems:
-
FIG. 1 shows an example environment;
-
FIG. 2 shows an example encoder;
-
FIG. 3 shows an example user device;
-
FIG. 4 shows an example method;
-
FIG. 5 shows an example method;
-
FIG. 6 shows an example method; and
-
FIG. 7 shows an example computing device.
- Closed captions provide a text version of what takes place on a screen. For example, closed captions can provide a text version of dialogue, sound effects, or music to a viewer with a hearing impairment. However, the viewer may have difficulty understanding a program if the associated closed captions do not line up properly with events or dialogue taking place on screen.
- For live programming, audio including spoken words, soundtrack, sound effects, etc. may be transcribed by a human operator. For example, a speech-to-text reporter may use a stenotype (e.g., shorthand) or stenomask (e.g., voice writing) machine to convert audio into text so it may be displayed on a screen. As another example, voice recognition software may be used to convert audio into text. Due to processing times associated with existing approaches, closed captions of live broadcasts (e.g., news bulletins, sports events, live entertainment shows, etc.) often lag by several seconds. For prerecorded programs, unlike live programs, audio may be transcribed and closed captions may be prepared, positioned, and timed in advance.
- In National Television Standards Committee (NTSC) programming, closed captions may be encoded into a part of the television (TV) picture that sits just above the visible portion and is usually unseen (e.g., line 21 of the vertical blanking interval). In Advanced Television Systems Committee (ATSC) programming, three streams may be encoded in the video. For example, two streams may be backward compatible “line 21” captions and a third stream may be a set of up to 63 additional caption streams (e.g., encoded in EIA-708 format).
-
FIG. 1 shows an example environment in which the systems and methods described herein may be implemented. Such an environment may comprise a content database 102, an encoder/packager 112, and at least one device 116 (e.g., a player). The content database 102, the encoder/packager 112, and the at least one device 116 may be in communication via a network 114. The content database 102, the encoder/packager 112, or the at least one device 116 may be associated with an individual or entity seeking to align content with closed captions or subtitles.
- The encoder/packager 112 may implement a number of the functions and techniques described herein. For example, the encoder/packager 112 may receive content 104 from the content database 102. The content 104 may comprise, for example, audio 106, video 108, and/or closed captions 110. The audio 106 or video 108 may refer generally to any audio or video content produced for viewer consumption regardless of the type, format, genre, or delivery method. The audio 106 or video 108 may comprise audio or video content produced for broadcast via over-the-air radio, cable, satellite, or the internet. The audio 106 or video 108 may comprise digital audio or video content produced for digital video or audio streaming (e.g., video- or audio-on-demand). The audio 106 or video 108 may comprise a movie, a television show or program, an episodic or serial television series, or a documentary series, such as a nature documentary series. As yet another example, the video 108 may comprise a regularly scheduled video program series, such as a nightly news program. The content 104 may be associated with one or more content distributors that distribute the content 104 to viewers for consumption.
- The content 104 may comprise text data associated with the content, such as the closed captions 110. The closed captions 110 may indicate textual information associated with the content 104. For example, the closed captions 110 may comprise text associated with spoken dialogue, sound effects, music, etc. The content 104 may be associated with one or more genres, including sports, news, music or concert, documentary, or movie. For example, if the content is associated with the genre "sports," this may indicate that the content is a sports game, such as a livestream of a sports game.
- The closed captions 110 may indicate speech associated with the content 104. For example, the closed captions 110 may indicate which speech is associated with which portions of the content 104. Subtitles may be part of a content track included in the closed captions 110. The presence or absence of dialogue may be detected through subtitling, for example, using Supplemental Enhancement Information (SEI) messages in the video elementary stream. If the subtitles for content are part of a separate track, the absence of dialogue may be detected, for example, by detecting an "empty segment."
- The content 104 (e.g., the video 108) may indicate movement associated with the content. For example, the video 108 may indicate which specific movements may be associated with portions of the content. The movement associated with the content 104 may be based on the encoding parameters of the content 104. The movement associated with the content 104 may comprise camera movement, where the entire scene moves. For example, if the content is a soccer game, camera movement may involve a camera panning over the soccer field. The movement associated with the content 104 may additionally, or alternatively, comprise movement of objects in the content. For example, if the content is a soccer game, object movement may involve the soccer ball being kicked.
- AI or machine learning may be used (e.g., by the encoder/packager 112 or the user devices 116) to align and sync the audio 106 and/or video 108 with the closed captions 110. For example, the encoder/packager 112 or the user devices 116 may implement a software algorithm that listens to the audio 106 and/or processes the video 108 to determine when words being spoken in the content 104 match those of the closed captions 110.
- Audio-to-text translation may be used to find accompanying text in the closed captions 110 (e.g., transcribed conversations, subtitles, descriptive text, etc.) to serve as a point in the audiovisual stream (e.g., a first marker in time) and the closed caption stream (e.g., a second marker in time) to establish a sync. Audio-to-text translation may also be used to find accompanying text to audio content that describes aspects of the video that are purely visual (e.g., descriptive audio, audio description, and/or video description). For example, an audiovisual presentation device (e.g., one of the user devices 116) may be equipped with sufficient storage, e.g., dynamic random-access memory (DRAM), a hard disk drive (HDD), or an embedded multimedia card (eMMC), to buffer an incoming audiovisual stream (e.g., the content 104) for several seconds. The buffered content may be concurrently demultiplexed and the audio 106, video 108, and closed caption 110 components extracted. For example, the audio 106 may be decoded by a digital signal processor (DSP) or a central processing unit (CPU) (e.g., associated with the encoder/packager 112 or the user devices 116) and further processed by algorithms that convert the audio 106 to text.
- The closed captions 110 may be decoded by the CPU, and an algorithm in the CPU may compare the closed caption text to the audio-to-text translation, e.g., looking for a match in words. If the words do not match, the CPU may hold onto the closed caption text and continue to process and compare the audio-to-text translation until it finds a match. Moreover, one or more markers in time may be used by the CPU as a reference to compare the closed caption text and the audio-to-text translation. For example, one or more first markers in time may be associated with the closed caption text and one or more second markers in time may be associated with the audio-to-text translation. The one or more first markers in time and the one or more second markers in time may correspond to a time in the playback of the content 104 (e.g., when a delay is to occur).
- If the audiovisual stream and the closed captions are in sync, the CPU may determine there is no need to add any delay to the audiovisual stream, and the content 104 may be sent to a video render engine and/or audio engine for output, e.g., over a High-Definition Multimedia Interface (HDMI). According to some examples, if the content 104 and the closed captions 110 are not in sync, the CPU may determine a delay (e.g., in milliseconds) that needs to be applied to the audio 106 and/or video 108 to align the content 104 with the closed captions 110. For example, the CPU may calculate the delay by comparing the one or more first markers in time associated with the closed caption text to the one or more second markers in time associated with the audio-to-text translation. Moreover, the one or more markers in time may be used to identify a point in time associated with the content at which a delay may occur.
- A synchronization component of the CPU may synchronize a timer to one or multiple synchronization markers (e.g., markers in time) of the content, the closed caption text, and/or the audio-to-text translation. In some examples, the markers in time may be determined by the synchronization component processing the audiovisual stream and the closed captions.
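- The hold-and-compare loop and the marker arithmetic described above might look like the following sketch. The names `find_sync_marker` and `compute_delay_ms` are hypothetical, and a real device would feed the audio-to-text segments in incrementally as the DSP produces them.

```python
def find_sync_marker(caption_text, audio_segments):
    """Hold the caption text and scan successive audio-to-text results,
    given as (time_s, text) pairs, until its words appear; return that
    segment's marker in time, or None if no match has been found yet."""
    phrase = caption_text.lower().strip()
    for time_s, text in audio_segments:
        if phrase in text.lower():
            return time_s
    return None

def compute_delay_ms(caption_time_s, caption_text, audio_segments):
    """Delay (in milliseconds) to apply to the audio/video so it lines up
    with the caption marker; None means keep holding the caption text."""
    marker = find_sync_marker(caption_text, audio_segments)
    if marker is None:
        return None
    return round((caption_time_s - marker) * 1000)
```

A result of 0 corresponds to the in-sync case, where no delay is added and the content goes straight to the render engine.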
- AI, machine learning, or other video processing techniques may be used to process the video 108, e.g., where scenes, movements, or other visual features align with the closed captions 110. For example, the closed captions 110 may textually describe video associated with a scene (e.g., a car stopping or a door opening) and the CPU may process the video 108 to identify a scene where a car is stopping or a door opens. Upon identifying the scene matching the closed captions 110, the scene may be flagged (e.g., by recording a time, location, stamp, etc.) and a delay may be determined by comparing the flagged scene with the matching captions or subtitles. The delay may then be applied to the audio 106 and/or video 108 to align the content 104 with the closed captions 110.
- Artificial Intelligence (AI), machine learning, or other audio processing techniques may be used to process the audio 106, e.g., where sounds, sound effects, or other audio features align with the closed captions 110. A machine learning model may comprise Deep Speech, a Deep Learning Network (DLN), a recurrent neural network (RNN), or any other suitable learning algorithm. For example, the closed captions 110 may textually describe audio associated with a scene (e.g., a car's brakes squealing or a door creaking as it opens) and the CPU may process the audio 106 to identify a scene including a squealing or creaking sound. Upon identifying the scene matching the closed captions 110, the scene may be flagged (e.g., by recording a time, location, stamp, etc.) and a delay may be determined by comparing the flagged scene with the matching captions or subtitles. The delay may then be applied to the audio 106 and/or video 108 to align the content 104 with the closed captions 110.
- The encoder/packager 112 may use the content 104 and the content data to determine portions of content that are candidates for inserting a delay. For example, a scene change in the content 104 may be indicative of a start of a new portion of the content 104. A scene in the content 104 may be a single camera shot of an event. A scene change may occur when the viewer perspective is switched to a different camera shot. In order to avoid any negative reaction from a viewer, a delay may be associated with a scene change. For example, the delay may be inserted at a scene change, immediately before, or immediately after.
- It may be desirable to adjust the output speed of content during certain portions of the content when an output speed change is less detectable to viewers than a delay. An output speed change may be less detectable to viewers, for example, during a portion of the content that contains less motion or less speech. For example, a scenery shot without any dialogue may be a good candidate for an output speed change. Accordingly, the encoder/packager 112 may use the content 104 to determine portions of content that do not contain large amounts of motion or speech. Different genres may be able to be sped up or slowed down at different rates or for different periods of time without being easily detectable to viewers of the content. For example, for sports content, output may be sped up or slowed down for only 5 seconds or during a timeout with less motion. For news content, output may be sped up or slowed down for only 10 seconds or during transitions between stories. For concert or music content, output may be sped up or slowed down at any time but only for 2 seconds.
- The content database 102 may provide the content and the content data to the encoder/packager 112. The content database 102 may be integrated with one or more of the encoder/packager 112 or the at least one device 116. The network 114 may comprise one or more public networks (e.g., the Internet) and/or one or more private networks. A private network may comprise a wireless local area network (WLAN), a local area network (LAN), a wide area network (WAN), a cellular network, or an intranet. The network 114 may comprise wired network(s) and/or wireless network(s).
- The content database 102, the encoder/packager 112, and the at least one device 116 may each be implemented on the same or different computing devices. For example, the content database 102 may be located in a datastore of the same organization as the encoder/packager 112 or in the datastore of a different organization. Such a computing device may comprise one or more processors and memory storing instructions that, when executed by the one or more processors, cause the computing device to perform one or more of the various methods or techniques described here. The memory may comprise volatile memory (e.g., random access memory (RAM)) and/or non-volatile memory (e.g., a hard or solid-state drive). The memory may comprise a non-transitory computer-readable medium.
-
FIG. 2 shows an exemplary encoding environment 200. The encoding environment 200 may comprise source content 202, an encoder/packager 204, and an encoded bitstream 206. The source content 202 may comprise content, such as the content 104 (e.g., including the audio 106, video 108, and/or closed captions 110). The source content 202 may be input into the encoder/packager 204. For example, the encoder/packager 204 may be the encoder/packager 112 of FIG. 1. The encoder/packager 204 may generate the encoded bitstream 206 associated with the source content 202. For example, the encoded bitstream 206 may comprise one or more of a closed caption bitstream 206a, a video bitstream 206b, or an audio bitstream 206c. If the encoded bitstream 206 comprises the closed caption bitstream 206a, the closed caption bitstream 206a may comprise textual data associated with the source content 202. If the encoded bitstream 206 comprises the video bitstream 206b, the video bitstream 206b may indicate video data associated with the source content 202. If the encoded bitstream 206 comprises the audio bitstream 206c, the audio bitstream 206c may indicate audio data associated with the source content 202.
- The encoded bitstream 206 may comprise at least one indication of portions of content that are good candidates for a delay or an output speed change, such as portions 208a-c. As discussed above, it may be desirable to insert a delay and/or adjust the output speed of content during certain portions of the content when an output speed change is less detectable to viewers of the content. Accordingly, the portions 208a-c may be portions of content during which a delay or an output speed change may not be easily detectable by viewers of the content. The encoded bitstream 206 may indicate at least one of a start time (e.g., a marker in time) associated with each of these portions of content or a duration of each of these portions of content. For example, the encoded bitstream 206 may indicate that the portion 208a has a start time t1 and a duration d1, the portion 208b has a start time t2 and a duration d2, and the portion 208c has a duration d3. The durations of the different portions may be different or may be the same. The encoded bitstream 206 may comprise an indication of a rate of output speed change associated with each portion of content that is a good candidate for an output speed change, such as the portions 208a-c. The rate of output speed change associated with a particular portion of content may indicate how much the output of content may be sped up or slowed down, or both sped up and slowed down, during that portion of content without being easily detectable to viewers of the content. For example, output of content may be either sped up or slowed down during a portion of content in a scenery view that contains no dialogue. The encoded bitstream 206 may be used by a device, such as the at least one device 116 of FIG. 1, to output the content associated with the source content 202 and to adjust the output speed of the content during the portions 208a-c.
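- One plausible way for a receiving device to represent the signaled portions 208a-c is sketched below. The structure and field names are assumptions; the text only says the bitstream may indicate a start time, a duration, and a rate of output speed change, and that a start time may be absent (as for portion 208c).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CandidatePortion:
    start_s: Optional[float]  # start time (marker in time); may be absent
    duration_s: float         # duration of the portion
    max_rate_change: float    # e.g., 0.05: output may run up to 5% fast or slow

def portion_at(portions, t_s):
    """Return the candidate portion covering playback time t_s, if any."""
    for p in portions:
        if p.start_s is not None and p.start_s <= t_s < p.start_s + p.duration_s:
            return p
    return None
```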
FIG. 3 shows anexample user device 300 in which the systems and methods described herein may be implemented.User device 300 may comprise a buffer 302 (e.g., DRAM, FLASH, HDD, etc.), adecoder 304, a digital signal processor (DSP) 306, a central processing unit (CPU) 308, and a graphics processing unit (GPU) 310. Thedevice 300 may provide playback of anaudiovisual stream 316 by avideo rendering engine 312 and/or a high-definition multimedia interface (HDMI) 314. Moreover, thedevice 300 may be associated with an individual or entity seeking to align content with closed captions or subtitles. - The
device 300 may receive anaudiovisual stream 316. Theaudiovisual stream 316 may comprise encodedvideo 318, encodedaudio 320, andclosed captions 322. The encodedvideo 318 may be decoded by thedecoder 304 resulting in decodedvideo 324. The decodedvideo 324 may be provided by thedecoder 304 to theGPU 310. The encodedaudio 320 may be decoded by theDSP 306 resulting in decodedaudio 326. - The
CPU 308 may be configured to receive the decoded audio 326. The CPU 308 may be configured to perform an audio-to-text conversion of the decoded audio 326. The CPU 308 may be configured to compare the converted text to the closed captions 322. One or more markers in time may be used by the CPU 308 as a reference to compare the closed captions 322 and the audio-to-text conversion of the decoded audio 326. For example, one or more first markers in time may be associated with the closed captions 322 and one or more second markers in time may be associated with the audio-to-text conversion of the decoded audio 326. The one or more first markers in time may synchronize to a time of the decoded audio 326 and the one or more second markers in time may synchronize to a time of the audio-to-text translation, so as to correspond to a time in the playback of the audiovisual stream 316 (e.g., when a buffering delay is to occur). - Based on the comparison, the
CPU 308 may be configured to determine a buffering delay to synchronize the closed captioning content 322 with the audiovisual stream 316. For example, the buffering delay may compensate for an offset between timing of the decoded audio 326 and the closed captions 322. For example, the CPU 308 may calculate the buffering delay by comparing one or more first markers in time associated with the closed captions 322 to one or more second markers in time associated with the decoded audio 326. Moreover, the one or more markers in time may be used to identify a point in time associated with the audiovisual stream 316 at which a delay may occur. - A synchronization component of the
CPU 308 may synchronize a timer to one or multiple synchronization markers in time of the audiovisual stream 316, the closed captions 322, or the decoded audio 326. In some examples, the markers in time may be determined by the synchronization component processing the decoded audio 326 and the closed captions 322. - The
CPU 308 may be configured to provide the determined delay to the buffer 302. The buffer 302 may be configured to insert the determined delay into the audiovisual stream to synchronize audio and visual components of the audiovisual stream (e.g., encoded video 318 and encoded audio 320) with closed caption content (e.g., closed captions 322). For example, the buffer 302 may synchronize the audio and visual components of the audiovisual stream 316 with closed caption text by buffering one or more of the audio (e.g., encoded audio 320), the video (e.g., encoded video 318), and the closed caption text (e.g., closed captions 322). -
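The compare-then-buffer pipeline handled by the CPU 308 and the buffer 302 might be sketched as follows (the function names and marker values are assumptions for illustration; taking a median of the marker offsets is one plausible way to be robust to an occasional mis-transcription):

```python
def compute_buffering_delay_ms(caption_markers_ms, audio_markers_ms):
    """Median offset between when a phrase appears in the closed captions
    and when the same phrase is heard in the decoded audio. A positive
    result means the captions lag the audio, so the A/V should be buffered."""
    offsets = sorted(c - a for c, a in zip(caption_markers_ms, audio_markers_ms))
    return offsets[len(offsets) // 2]

def buffer_with_delay(av_samples, delay_ms):
    """Insert the determined delay by shifting the presentation timestamps
    of (pts_ms, payload) samples in the audiovisual stream."""
    return [(pts + delay_ms, payload) for pts, payload in av_samples]

heard_ms   = [1000, 5000, 9000]   # phrase onsets from the audio-to-text conversion
caption_ms = [1900, 5900, 9900]   # the same phrases in the caption track
delay = compute_buffering_delay_ms(caption_ms, heard_ms)
print(delay)                                        # 900
print(buffer_with_delay([(1000, "frame")], delay))  # [(1900, 'frame')]
```

In a real device the timestamp shift would be applied inside the buffer 302 rather than by rewriting a list, but the arithmetic is the same.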
FIG. 4 shows an exemplary method 400. The method 400 may be used to align closed captions with audiovisual content, such as the content 104 associated with FIG. 1. The method 400 may be performed, for example, by the system 100 of FIG. 1 or the device 300 of FIG. 3. Content may be received. Content may comprise a content asset or program, such as linear content, and may further comprise sequential content such as, for example, a television show, a movie, a sports event broadcast, or the like. Moreover, the content may comprise livestreaming video content or other types of content. As used herein, content may additionally include a portion of a program or content asset. - At
step 402, at least one content may be received. For example, the at least one content may be received by an encoder, such as the encoder/packager 112 or the encoder/packager 204. The at least one content may comprise video content (e.g., video 206b or encoded video 318), audio content (e.g., audio 206c or encoded audio 320), and closed captioning (e.g., closed captions 206a or closed captions 322). The at least one content may comprise livestreaming content (e.g., source content 202 or audiovisual stream 316) and, for example, the at least one content may comprise a livestreaming concert, sports program, news program, documentary, or movie. One or more markers in time may be associated with the video content (e.g., video 206b or encoded video 318), audio content (e.g., audio 206c or encoded audio 320), and closed captioning (e.g., closed captions 206a or closed captions 322). - At
step 404, the audiovisual content may be buffered (e.g., based on a computed delay from step 430) and, at step 406, the buffered content from step 404 (e.g., carrying multiple encoded data streams) may enter a demultiplexer (demux) 406. This demux 406 may serve as a switch, e.g., by selecting which video and which audio data stream in a multiplexed transport stream to pass on. For example, the demux 406 may pass on an audio stream, a video stream, and/or a closed caption stream. - At
step 408, it may be determined whether the content comprises closed captions. If the content does not comprise closed captions, it may be determined to play the audiovisual stream in real-time at step 436. If the content does comprise closed captions, the closed captions may be decoded at step 410 and the audio may be decoded at step 412. Voice or audio-to-text conversion may be performed at step 414. - At
step 416, the decoded captions from step 410 may be compared to the converted text from step 414 and, at step 418, it may be determined, based on the comparison at step 416, whether the closed captions (e.g., from step 410) match the converted text (e.g., from step 414). If the closed captions do match the converted text, it may be determined to play the audiovisual stream in real-time at step 436. If the closed captions do not match the converted text, the closed captions may be held at step 420 (e.g., for a constant or variable time period) and the audio may be decoded at step 422. Voice or audio-to-text conversion may be performed at step 424. - At
step 426, the held captions from step 420 may be compared to the converted text from step 424. At step 428, it may be determined, based on the comparison at step 426, whether the held closed captions (e.g., from step 420) match the converted text (e.g., from step 424). If the held closed captions do not match the converted text, the process may repeat itself by once again holding the closed captions at step 420, decoding the audio at step 422, performing voice or audio-to-text conversion at step 424, and, at step 426, comparing the held captions from step 420 with the converted text from step 424. This process may be repeated iteratively until, at step 428, it is determined (e.g., based on the comparison at step 426) that the held closed captions (e.g., from step 420) match the converted text (e.g., from step 424). - Once a match has been determined at
step 428, a delay may be computed at step 430. The delay may be a time offset of the closed captions to the decoded audio. For example, the delay may be computed based on a length of time and/or a number of times that the closed captions are held at step 420. Moreover, the delay may be computed based on a comparison of markers in time associated with the closed captions and the decoded audio. At step 432, the audiovisual stream may be buffered and played with the computed delay and, at step 434, the captions may be played without delay. - At
step 438, the audiovisual stream from step 436 and/or the captions from step 434 and the buffered audiovisual stream from step 432 may be stored to disk. If the user has turned on/enabled closed captions, then at step 440, the video and captions may be rendered and, at step 442, the process may end (e.g., terminate). -
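The hold-and-compare loop of steps 418-430 might be sketched as follows (a minimal sketch; the hold increment, the iteration cap, and the `transcribe_at` callback standing in for the decode/convert steps 422-424 are assumptions):

```python
def find_delay_by_holding(caption_words, transcribe_at, hold_ms=250, max_holds=40):
    """Hold the captions in fixed increments, re-running the audio-to-text
    conversion each pass, until the transcription matches the held captions;
    the accumulated hold time is the computed delay (cf. steps 420-430)."""
    held_ms = 0
    for _ in range(max_holds):
        if transcribe_at(held_ms) == caption_words:
            return held_ms      # delay to apply when buffering the A/V
        held_ms += hold_ms      # no match yet: hold the captions once more
    return None                 # no match within the search window

# Simulated stream in which the audio runs 1000 ms behind the captions:
# the matching phrase is only heard once the captions have been held 1000 ms.
phrases = {1000: ["hello", "there"]}
delay = find_delay_by_holding(["hello", "there"], lambda off: phrases.get(off))
print(delay)  # 1000
```

The loop terminates either with a computed delay (feeding step 432) or with no match, in which case the stream could simply be played in real time as at step 436.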
FIG. 5 shows an exemplary method 500. The method 500 may be used for aligning closed captions with audiovisual content, such as the content 104 associated with FIG. 1. The method 500 may be performed, for example, by one or more components of the system 100 of FIG. 1 or the device 300 of FIG. 3. - At
step 502, content including video, audio, and/or closed captions may be received. For example, livestreaming video content may be received, although content may refer generally to any audio or video content produced for viewer consumption regardless of the type, format, genre, or delivery method. The content may be associated with one or more content distributors that distribute the content to viewers for consumption. The closed captions may indicate textual information associated with the content. For example, the closed captions may comprise text associated with spoken dialogue, sound effects, music, etc. The content may be associated with one or more genres, including sports, news, music or concert, documentary, or movie. - The closed captions may indicate speech associated with the content. For example, the closed captions may indicate which speech is associated with which portions of content. Subtitles may be part of a content track included in the closed captions. The presence or absence of dialogue may be detected through subtitling, for example, using SEI messages in the video elementary stream. If the subtitles for content are part of a separate track, the absence of dialogue may be detected, for example, by detecting an “empty segment.”
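The "empty segment" dialogue detection mentioned above might look like the following (an illustrative sketch; the cue representation and the two-second gap threshold are assumptions):

```python
def dialogue_free_windows(caption_cues, total_ms, min_gap_ms=2000):
    """Find gaps with no caption cues ("empty segments"), which suggest an
    absence of dialogue. caption_cues is a list of (start_ms, end_ms)."""
    windows, cursor = [], 0
    for start, end in sorted(caption_cues):
        if start - cursor >= min_gap_ms:
            windows.append((cursor, start))   # silence before this cue
        cursor = max(cursor, end)
    if total_ms - cursor >= min_gap_ms:
        windows.append((cursor, total_ms))    # trailing silence
    return windows

cues = [(0, 1500), (6000, 8000)]
print(dialogue_free_windows(cues, 12000))  # [(1500, 6000), (8000, 12000)]
```

Windows found this way are natural candidates for inserting a delay or an output speed change, since no speech needs to stay lip-synced there.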
- At
step 504, text associated with at least a portion of the audio or a portion of the video may be determined based on at least the portion of the audio or the portion of the video. An audio-to-text conversion (e.g., transcribed conversations, subtitles, descriptive text, etc.) may be performed on a portion of the audio, or a visual analysis may be performed to describe a portion of the video with text. For example, a software algorithm may listen to audio and/or process video associated with the content to determine words being spoken in the content. As another example, a software algorithm may identify descriptive audio (e.g., additional audio content that describes aspects of the video that are purely visual) and may convert the descriptive audio of the content to text. - An audiovisual presentation device may be equipped with sufficient storage (e.g., DRAM, HDD, eMMC) to buffer an incoming audiovisual stream for several seconds. The buffered content may be concurrently demultiplexed and the audio, video, and/or closed caption components extracted. For example, the audio or video may be decoded by a digital signal processor (DSP) or a central processing unit (CPU) and further processed by algorithms that convert the audio or video to text. AI, machine learning, or other video or audio processing techniques may be used to process video or audio associated with the content, e.g., where scenes, movements, sounds, sound effects, or other visual/audio features align with closed captions.
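The word-level comparison between the determined text and the closed caption text might be sketched as follows (the normalization and the 0.8 overlap threshold are illustrative assumptions, not values from the disclosure):

```python
import re

def words(text):
    """Normalize caption/ASR text to lowercase word tokens for comparison."""
    return re.findall(r"[a-z0-9']+", text.lower())

def text_matches_captions(determined_text, caption_text, min_overlap=0.8):
    """Rough word-overlap test between the audio-to-text (or scene
    description) output and a closed caption cue."""
    cap = set(words(caption_text))
    if not cap:
        return True  # an empty cue cannot contradict the transcription
    overlap = len(cap & set(words(determined_text))) / len(cap)
    return overlap >= min_overlap

print(text_matches_captions("the door opens", "The door opens."))     # True
print(text_matches_captions("crowd cheering", "[door creaks open]"))  # False
```

A production system would likely use a fuzzier alignment (edit distance, phoneme matching), but a set-overlap test conveys the idea of "looking for a match in words."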
- At
step 504, a first time marker associated with the closed caption text may be determined based on a timeline associated with the content. For example, closed captions at a first time of a timeline associated with the content may textually describe video associated with a scene (e.g., a car stopping or a door opening). - At
step 506, a second time marker associated with the determined text (e.g., from step 504) may be determined based on the timeline associated with the content and a comparison of the determined text to at least a portion of the closed caption text. For example, the video associated with the content may be processed to identify a scene where a car is stopping or a door opens. Upon identifying the scene matching closed captions at a second time of the timeline associated with the content, the scene may be flagged (e.g., by recording a time, location, stamp, etc.). - At
step 508, a delay may be determined based on a comparison of the first time marker and the second time marker. For example, the delay may be determined by comparing the second marker associated with the flagged scene to the first marker associated with the matching captions or subtitles. - The closed captions may be decoded and the closed caption text may be compared to an audio-to-text translation, e.g., looking for a match in words. If the words do not match, the closed caption text may be held and the audio-to-text translation may be iteratively processed and compared to the held closed caption text until a match is identified. If the audiovisual stream and closed captions are in sync, it may be determined that there is no need to add any delay to the audiovisual stream, and the content may be sent to a video render engine and/or audio engine for output (e.g., over HDMI). If the content and closed captions are not in sync, then a delay (e.g., in milliseconds) that needs to be applied to the audio and/or video to align the content with the closed captions may be determined.
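The marker comparison of step 508 reduces to a signed time difference (a trivial sketch; the millisecond values are assumed for illustration):

```python
def marker_delay_ms(first_marker_ms, second_marker_ms):
    """Signed offset between the caption cue (first marker) and the matching
    flagged scene (second marker) on the content timeline. A positive result
    means the scene trails the captions, so the captions should be delayed;
    a negative result means the audio/video should be buffered instead."""
    return second_marker_ms - first_marker_ms

# Caption "door opens" cued at 12.0 s; the matching scene flagged at 12.8 s.
print(marker_delay_ms(12000, 12800))  # 800
```

The sign of the result selects which component (captions versus audio/video) gets buffered at the next step.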
- At
step 510, at least one of the audio, the video, or the closed captions of the content may be buffered based on the determined delay. For example, the determined delay may be applied to the audio and/or the video associated with the content to align the content with the closed captions. As another example, the determined delay may be applied to the closed captions associated with the content to align the audio and/or the video with the closed captions. Portions of the content may be identified as candidates for inserting a delay. For example, in order to avoid any negative reaction from a viewer, a delay may be associated with a scene change. For example, the delay may be inserted at a scene change, immediately before it, or immediately after it. - It may be desirable to adjust the output speed of content during certain portions of the content when an output speed change is not as detectable as a delay to viewers of the content. An output speed change may be less detectable to viewers, for example, during a portion of the content that contains less motion or less speech. For example, a scenery shot without any dialogue may be a good candidate for an output speed change. Accordingly, portions of content that do not contain large amounts of motion or speech may be identified. Different genres may be able to be sped up or slowed down at different rates or for different periods of time without being easily detectable to viewers of the content. For example, for sports content, output may be sped up or slowed down only for 5 seconds or during a timeout with less motion. For news content, output may be sped up or slowed down only for 10 seconds or during transitions between stories. For concert or music content, output may be sped up or slowed down at any time but only for 2 seconds.
As another example, transitions into and out of commercials may be used to align the content with the closed captions, e.g., metadata associated with the transitions may indicate timing for a local advertisement insertion and may be used to align content with the closed captions.
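The per-genre tolerances in the examples above might be tabulated as follows (the mapping and the default value are illustrative assumptions):

```python
# How many seconds of output-speed change each genre tolerates before the
# adjustment is likely to be noticed, mirroring the examples in the text.
GENRE_SPEED_WINDOW_S = {
    "sports": 5,    # or during a timeout with less motion
    "news": 10,     # or during transitions between stories
    "concert": 2,   # any time, but only briefly
}

def speed_change_window_s(genre, default_s=3):
    """Look up the maximum speed-change window for a genre; the 3-second
    default for unlisted genres is an assumption, not from the disclosure."""
    return GENRE_SPEED_WINDOW_S.get(genre, default_s)

print(speed_change_window_s("news"))         # 10
print(speed_change_window_s("documentary"))  # 3
```

A player could combine this lookup with scene-change or low-motion detection to decide both where and for how long to stretch or compress output.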
-
FIG. 6 shows an exemplary method 600. The method 600 may be used for aligning closed captions with audiovisual content, such as the content 104 associated with FIG. 1. The method 600 may be performed, for example, by one or more components of the system 100 of FIG. 1 or the device 300 of FIG. 3. - At
step 602, content (e.g., including video, audio, and/or closed caption text) may be received. For example, livestreaming video content may be received, although content may refer generally to any audio or video content produced for viewer consumption regardless of the type, format, genre, or delivery method. The content may be associated with one or more content distributors that distribute the content to viewers for consumption. The closed captions may indicate textual information associated with the content. For example, the closed captions may comprise text associated with spoken dialogue, sound effects, music, etc. The content may be associated with one or more genres, including sports, news, music or concert, documentary, or movie. - The closed captions may indicate speech associated with the content. For example, the closed captions may indicate which speech is associated with which portions of content. Subtitles may be part of a content track included in the closed captions. The presence or absence of dialogue may be detected through subtitling, for example, using SEI messages in the video elementary stream. If the subtitles for content are part of a separate track, the absence of dialogue may be detected, for example, by detecting an “empty segment.”
- At
step 604, text may be determined based on an audio-to-text conversion (e.g., transcribed conversations, subtitles, descriptive text, etc.) of at least a portion of the audio. For example, a software algorithm may listen to audio associated with the content to determine words being spoken in the content. - An audiovisual presentation device may be equipped with sufficient storage (e.g., DRAM, HDD, eMMC) to buffer an incoming audiovisual stream for several seconds. The buffered content may be concurrently demultiplexed and the audio, video, and/or closed caption components extracted. For example, the audio may be decoded by a digital signal processor (DSP) or a central processing unit (CPU) and further processed by algorithms that convert the audio to text. AI, machine learning, or other video or audio processing techniques may be used to process audio associated with the content, e.g., where sounds, sound effects, or other audio features align with closed captions.
- The closed captions may be decoded and the closed caption text may be compared to an audio-to-text translation, e.g., looking for a match in words. If the words do not match, the closed caption text may be held and the audio-to-text translation may be iteratively processed and compared to the held closed caption text until a match is identified. If the audiovisual stream and closed captions are in sync, it may be determined that there is no need to add any delay to the audiovisual stream, and the content may be sent to a video render engine and/or audio engine for output (e.g., over HDMI). If the content and closed captions are not in sync, then a delay (e.g., in milliseconds) that needs to be applied to the audio to align the content with the closed captions may be determined.
- At
step 608, at least one of the audio, the video, or the closed captions of the content may be buffered based on the determined delay. For example, the determined delay may be applied to the audio and/or the video associated with the content to align the content with the closed captions. As another example, the determined delay may be applied to the closed captions associated with the content to align the audio and/or the video with the closed captions. Portions of the content may be identified as candidates for inserting a delay. For example, in order to avoid any negative reaction from a viewer, a delay may be associated with a scene change. For example, the delay may be inserted at a scene change, immediately before it, or immediately after it. - It may be desirable to adjust the output speed of content during certain portions of the content when an output speed change is not as detectable as a delay to viewers of the content. An output speed change may be less detectable to viewers, for example, during a portion of the content that contains less motion or less speech. For example, a scenery shot without any dialogue may be a good candidate for an output speed change. Accordingly, portions of content that do not contain large amounts of motion or speech may be identified. Different genres may be able to be sped up or slowed down at different rates or for different periods of time without being easily detectable to viewers of the content. For example, for sports content, output may be sped up or slowed down only for 5 seconds or during a timeout with less motion. For news content, output may be sped up or slowed down only for 10 seconds or during transitions between stories. For concert or music content, output may be sped up or slowed down at any time but only for 2 seconds.
-
FIG. 7 shows an example computing device that may be used in various examples. With regard to the example environment of FIG. 1, one or more of the content database 102, the encoder/packager 112, or the at least one device 116 may be implemented in an instance of a computing device 700 of FIG. 7. The computer architecture shown in FIG. 7 shows a conventional server computer, workstation, desktop computer, laptop, tablet, network appliance, PDA, e-reader, digital cellular phone, or other computing node, and may be utilized to execute any aspects of the computers described herein, such as to implement the methods described in FIGS. 4-6. - The
computing device 700 may comprise a baseboard, or “motherboard,” which is a printed circuit board to which a multitude of components or devices may be connected by way of a system bus or other electrical communication paths. One or more central processing units (CPUs) 704 may operate in conjunction with a chipset 706. The CPU(s) 704 may be standard programmable processors that perform arithmetic and logical operations necessary for the operation of the computing device 700. - The CPU(s) 704 may perform the necessary operations by transitioning from one discrete physical state to the next through the manipulation of switching elements that differentiate between and change these states. Switching elements may generally comprise electronic circuits that maintain one of two binary states, such as flip-flops, and electronic circuits that provide an output state based on the logical combination of the states of one or more other switching elements, such as logic gates. These basic switching elements may be combined to create more complex logic circuits including registers, adders-subtractors, arithmetic logic units, floating-point units, and the like. - The CPU(s) 704 may be augmented with or replaced by other processing units, such as graphics processing units (GPUs) 705. The GPU(s) 705 may comprise processing units specialized for, but not necessarily limited to, highly parallel computations, such as graphics and other visualization-related processing. - A user interface may be provided between the CPU(s) 704 and the remainder of the components and devices on the baseboard. The interface may be used to access a random-access memory (RAM) 708 used as the main memory in the
computing device 700. The interface may be used to access a computer-readable storage medium, such as a read-only memory (ROM) 720 or non-volatile RAM (NVRAM) (not shown), for storing basic routines that may help to start up the computing device 700 and to transfer information between the various components and devices. The ROM 720 or NVRAM may also store other software components necessary for the operation of the computing device 700 in accordance with the examples described herein. The user interface may be provided by one or more electrical components, such as the chipset 706. - The
computing device 700 may operate in a networked environment using logical connections to remote computing nodes and computer systems through a local area network (LAN) 716. The chipset 706 may comprise functionality for providing network connectivity through a network interface controller (NIC) 722, such as a gigabit Ethernet adapter. A NIC 722 may be capable of connecting the computing device 700 to other computing nodes over a network 716. It should be appreciated that multiple NICs 722 may be present in the computing device 700, connecting the computing device to other types of networks and remote computer systems. - The
computing device 700 may be connected to a storage device 728 that provides non-volatile storage for the computer. The storage device 728 may store system programs, application programs, other program modules, and data, which have been described in greater detail herein. The storage device 728 may be connected to the computing device 700 through a storage controller 724 connected to the chipset 706. The storage device 728 may consist of one or more physical storage units. A storage controller 724 may interface with the physical storage units through a serial attached SCSI (SAS) interface, a serial advanced technology attachment (SATA) interface, a fiber channel (FC) interface, or other type of interface for physically connecting and transferring data between computers and physical storage units. - The
computing device 700 may store data on a storage device 728 by transforming the physical state of the physical storage units to reflect the information being stored. The specific transformation of a physical state may depend on various factors and on different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the physical storage units and whether the storage device 728 is characterized as primary or secondary storage and the like. - For example, the
computing device 700 may store information to the storage device 728 by issuing instructions through a storage controller 724 to alter the magnetic characteristics of a particular location within a magnetic disk drive unit, the reflective or refractive characteristics of a particular location in an optical storage unit, or the electrical characteristics of a particular capacitor, transistor, or other discrete component in a solid-state storage unit. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this description. The computing device 700 may read information from the storage device 728 by detecting the physical states or characteristics of one or more particular locations within the physical storage units. - In addition to, or as an alternative to, the
storage device 728 described herein, the computing device 700 may have access to other computer-readable storage media to store and retrieve information, such as program modules, data structures, or other data. It should be appreciated by those skilled in the art that computer-readable storage media may be any available media that provides for the storage of non-transitory data and that may be accessed by the computing device 700. - By way of example and not limitation, computer-readable storage media may comprise volatile and non-volatile, transitory computer-readable storage media and non-transitory computer-readable storage media, and removable and non-removable media implemented in any method or technology. Computer-readable storage media includes, but is not limited to, RAM, ROM, erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory or other solid-state memory technology, compact disc ROM (“CD-ROM”), digital versatile disk (“DVD”), high definition DVD (“HD-DVD”), BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, other magnetic storage devices, or any other medium that may be used to store the desired information in a non-transitory fashion.
- A storage device, such as the
storage device 728 depicted in FIG. 7, may store an operating system utilized to control the operation of the computing device 700. The operating system may comprise a version of the LINUX operating system. The operating system may comprise a version of the WINDOWS SERVER operating system from the MICROSOFT Corporation. According to additional examples, the operating system may comprise a version of the UNIX operating system. Various mobile phone operating systems, such as IOS and ANDROID, may also be utilized. It should be appreciated that other operating systems may also be utilized. The storage device 728 may store other system or application programs and data utilized by the computing device 700. - The
storage device 728 or other computer-readable storage media may also be encoded with computer-executable instructions, which, when loaded into the computing device 700, transform the computing device from a general-purpose computing system into a special-purpose computer capable of implementing the examples described herein. These computer-executable instructions transform the computing device 700 by specifying how the CPU(s) 704 transition between states, as described herein. The computing device 700 may have access to computer-readable storage media storing computer-executable instructions, which, when executed by the computing device 700, may perform the methods described in relation to FIGS. 4-6. - A computing device, such as the
computing device 700 depicted in FIG. 7, may also comprise an input/output controller 732 for receiving and processing input from a number of input devices, such as a keyboard, a mouse, a touchpad, a touch screen, an electronic stylus, or other type of input device. Similarly, an input/output controller 732 may provide output to a display, such as a computer monitor, a flat-panel display, a digital projector, a printer, a plotter, or other type of output device. It will be appreciated that the computing device 700 may not comprise all of the components shown in FIG. 7, may comprise other components that are not explicitly shown in FIG. 7, or may utilize an architecture completely different from that shown in FIG. 7. In some implementations of the computing device 700, certain components, such as, for example, the network interface controller 722, the input/output controller 732, the CPU(s) 704, the GPU(s) 705, and the storage controller 724, may be implemented using a System on Chip (SoC) architecture. - As described herein, a computing device may be a physical computing device, such as the
computing device 700 of FIG. 7. A computing node may also comprise a virtual machine host process and one or more virtual machine instances. Computer-executable instructions may be executed by the physical hardware of a computing device indirectly through interpretation and/or execution of instructions stored and executed in the context of a virtual machine. - Components are described that may be used to perform the described methods and systems. When combinations, subsets, interactions, groups, etc., of these components are described, it is understood that while specific references to each of the various individual and collective combinations and permutations of these may not be explicitly described, each is specifically contemplated and described herein, for all methods and systems. This applies to all aspects of this application including, but not limited to, operations in described methods. Thus, if there are a variety of additional operations that may be performed, it is understood that each of these additional operations may be performed with any specific example or combination of examples of the described methods.
- The present methods and systems may be understood more readily by reference to the following detailed description of examples included therein and to the Figures and their descriptions.
- As will be appreciated by one skilled in the art, the methods and systems may take the form of an entirely hardware example, an entirely software example, or an example combining software and hardware aspects. Furthermore, the methods and systems may take the form of a computer program product on a computer-readable storage medium having computer-readable program instructions (e.g., computer software) embodied in the storage medium. More particularly, the present methods and systems may take the form of web-implemented computer software. Any suitable computer-readable storage medium may be utilized including hard disks, CD-ROMs, optical storage devices, or magnetic storage devices.
- Examples of the methods and systems are described below with reference to block diagrams and flowchart illustrations of methods, systems, apparatuses and computer program products. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, may be implemented by computer program instructions. These computer program instructions may be loaded on a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create a means for implementing the functions specified in the flowchart block or blocks.
- These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including computer-readable instructions for implementing the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
- The various features and processes described herein may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. In addition, certain methods or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto may be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically described, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the described examples. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the described examples.
- It will also be appreciated that various items are illustrated as being stored in memory or on storage while being used, and that these items or portions thereof may be transferred between memory and other storage devices for purposes of memory management and data integrity. Alternatively, in other examples, some or all of the software modules and/or systems may execute in memory on another device and communicate with the illustrated computing systems via inter-computer communication. Furthermore, in some examples, some or all of the systems and/or modules may be implemented or provided in other ways, such as at least partially in firmware and/or hardware, including, but not limited to, one or more application-specific integrated circuits (“ASICs”), standard integrated circuits, controllers (e.g., by executing appropriate instructions, and including microcontrollers and/or embedded controllers), field-programmable gate arrays (“FPGAs”), complex programmable logic devices (“CPLDs”), etc. Some or all of the modules, systems, and data structures may also be stored (e.g., as software instructions or structured data) on a computer-readable medium, such as a hard disk, a memory, a network, or a portable media article to be read by an appropriate device or via an appropriate connection. The systems, modules, and data structures may also be transmitted as generated data signals (e.g., as part of a carrier wave or other analog or digital propagated signal) on a variety of computer-readable transmission media, including wireless-based and wired/cable-based media, and may take a variety of forms (e.g., as part of a single or multiplexed analog signal, or as multiple discrete digital packets or frames). Such computer program products may also take other forms in other examples. Accordingly, the present examples may be practiced with other computer system configurations.
- While the methods and systems have been described in connection with specific examples, it is not intended that the scope be limited to the particular examples set forth, as the examples herein are intended in all respects to be illustrative rather than restrictive.
- Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its operations be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its operations or it is not otherwise specifically stated in the claims or descriptions that the operations are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including: matters of logic with respect to arrangement of steps or operational flow; plain meaning derived from grammatical organization or punctuation; and the number or type of examples described in the specification.
- It will be apparent to those skilled in the art that various modifications and variations may be made without departing from the scope or spirit of the present disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practices described herein. It is intended that the specification and example figures be considered as exemplary only, with a true scope and spirit being indicated by the following claims.
Claims (22)
1. A method comprising:
receiving content, wherein the content comprises at least video, audio, and closed caption text;
determining, based on at least a portion of the audio or a portion of the video, text associated with the at least the portion of the audio or the portion of the video;
determining, based on a timeline associated with the content, a first time marker associated with the closed caption text;
determining, based on the timeline associated with the content and a comparison of the determined text to at least a portion of the closed caption text, a second time marker associated with the determined text;
determining, based on a comparison of the first time marker and the second time marker, a delay; and
buffering, based on the determined delay, at least one of the audio or video of the content.
2. The method recited in claim 1 , wherein the determining the text comprises converting the at least the portion of the audio to text.
3. The method recited in claim 1 , wherein the determining the text comprises decoding, by a player based on an audio-to-text translation, the at least the portion of the audio or the portion of the video.
4. The method recited in claim 1 , wherein the determining the text comprises:
determining an event associated with the at least the portion of the video, and
determining the text based on the determined event.
5. The method recited in claim 4 , wherein the event comprises at least one of lip movement, an object movement, a content transition, or a change of state of an object.
6. The method recited in claim 4 , wherein the determining the event comprises inputting the at least the portion of the video to a machine learning algorithm.
7. The method recited in claim 1 , wherein the determining the text comprises converting descriptive audio of the content to text.
8. The method recited in claim 1 , wherein the content comprises an audiovisual stream.
9. The method recited in claim 1 , wherein the buffering facilitates alignment of the at least one of the audio or video of the content with the closed caption text.
10. The method recited in claim 1 , wherein the closed caption text comprises decoded closed captions.
11. The method recited in claim 1 , wherein the closed caption text comprises one or more subtitles.
12. A method comprising:
receiving content, wherein the content comprises video, audio, and closed caption text;
determining, based on an audio-to-text conversion of at least a portion of the audio, text;
determining, based on a comparison of the determined text to at least a portion of the closed caption text, a delay; and
buffering, based on the determined delay, at least one of the audio, the video, or the closed caption text.
13. The method recited in claim 12 , wherein the method further comprises:
outputting the buffered audio or the buffered video; and
outputting the closed caption text.
14. The method recited in claim 12 , wherein the method further comprises synchronizing output of the buffered audio or the buffered video with output of the closed caption text.
15. The method recited in claim 12 , wherein the determining the text comprises decoding, by a player based on the audio-to-text translation of the at least the portion of the audio, the at least the portion of the audio.
16. The method recited in claim 12 , wherein the closed caption text comprises decoded closed captions.
17. The method recited in claim 12 , wherein the audio-to-text conversion comprises performing visual speech recognition on at least a portion of the video.
18. The method recited in claim 12 , wherein the content comprises an audiovisual stream.
19. The method recited in claim 12 , wherein the buffering facilitates alignment of the audio or the video with the closed caption text.
20. The method recited in claim 12 , wherein the closed caption text comprises one or more subtitles.
21. A method comprising:
receiving content, wherein the content comprises video, audio, and closed caption text;
determining, based on an audio-to-text conversion of at least a portion of the audio, text;
determining, based on a comparison of the determined text to at least a portion of the closed caption text, a delay;
outputting, based on the determined delay, the at least one of the audio or video; and
outputting the closed caption text without the determined delay.
22. The method recited in claim 21 , wherein outputting, based on the determined delay, the at least one of the audio or video comprises buffering the at least one of the audio or video based on the delay.
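The synchronization steps recited in the claims above (converting audio to text, comparing the recognized text against the closed caption text, deriving a delay from the two time markers, and buffering accordingly) can be sketched as follows. This is an illustrative approximation only, not the patented implementation; the tuple data structures, the similarity threshold, and the median heuristic are assumptions introduced for the example.

```python
from difflib import SequenceMatcher

def estimate_caption_delay(captions, asr_segments, threshold=0.8):
    """Estimate the offset between closed-caption timestamps and the
    times at which the matching words are actually spoken.

    captions:     list of (time_seconds, caption_text)
    asr_segments: list of (time_seconds, recognized_text) from speech-to-text
    Returns the median (caption_time - speech_time) over confident matches;
    a negative value means captions appear before the corresponding audio.
    """
    delays = []
    for speech_time, speech_text in asr_segments:
        # Find the caption whose text best matches this recognized segment.
        scored = [
            (SequenceMatcher(None, text.lower(), speech_text.lower()).ratio(), t)
            for t, text in captions
        ]
        best_ratio, best_time = max(scored)
        if best_ratio >= threshold:
            delays.append(best_time - speech_time)
    if not delays:
        return 0.0
    delays.sort()
    return delays[len(delays) // 2]

# Captions carry timestamps 2 s earlier than the recognized speech:
captions = [(1.0, "Hello there."), (4.0, "How are you today?")]
asr = [(3.0, "hello there"), (6.0, "how are you today")]
print(estimate_caption_delay(captions, asr))  # -2.0
```

In terms of the claims, a negative result would suggest buffering the closed caption output, while a positive result would suggest buffering the audio and/or video (claims 1 and 12); the variant of claim 21 instead delays only the audio/video output and emits the caption text without the determined delay.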
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/698,570 US11785278B1 (en) | 2022-03-18 | 2022-03-18 | Methods and systems for synchronization of closed captions with content output |
| US18/462,117 US12273582B2 (en) | 2022-03-18 | 2023-09-06 | Methods and systems for synchronization of closed captions with content output |
| US19/088,622 US20250227321A1 (en) | 2022-03-18 | 2025-03-24 | Methods and systems for synchronization of closed captions with content output |
| US19/088,528 US20250227320A1 (en) | 2022-03-18 | 2025-03-24 | Methods and systems for synchronization of closed captions with content output |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/698,570 US11785278B1 (en) | 2022-03-18 | 2022-03-18 | Methods and systems for synchronization of closed captions with content output |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/462,117 Continuation US12273582B2 (en) | 2022-03-18 | 2023-09-06 | Methods and systems for synchronization of closed captions with content output |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230300399A1 (en) | 2023-09-21 |
| US11785278B1 (en) | 2023-10-10 |
Family
ID=88067688
Family Applications (4)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/698,570 Active US11785278B1 (en) | 2022-03-18 | 2022-03-18 | Methods and systems for synchronization of closed captions with content output |
| US18/462,117 Active US12273582B2 (en) | 2022-03-18 | 2023-09-06 | Methods and systems for synchronization of closed captions with content output |
| US19/088,528 Pending US20250227320A1 (en) | 2022-03-18 | 2025-03-24 | Methods and systems for synchronization of closed captions with content output |
| US19/088,622 Pending US20250227321A1 (en) | 2022-03-18 | 2025-03-24 | Methods and systems for synchronization of closed captions with content output |
Family Applications After (3)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/462,117 Active US12273582B2 (en) | 2022-03-18 | 2023-09-06 | Methods and systems for synchronization of closed captions with content output |
| US19/088,528 Pending US20250227320A1 (en) | 2022-03-18 | 2025-03-24 | Methods and systems for synchronization of closed captions with content output |
| US19/088,622 Pending US20250227321A1 (en) | 2022-03-18 | 2025-03-24 | Methods and systems for synchronization of closed captions with content output |
Country Status (1)
| Country | Link |
|---|---|
| US (4) | US11785278B1 (en) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12482458B2 (en) | 2014-02-28 | 2025-11-25 | Ultratec, Inc. | Semiautomated relay method and apparatus |
| US20180270350A1 (en) | 2014-02-28 | 2018-09-20 | Ultratec, Inc. | Semiautomated relay method and apparatus |
| EP4338421A1 (en) * | 2021-05-10 | 2024-03-20 | Sonos Inc. | Managing content quality and related characteristics of a media playback system |
| US11785278B1 (en) * | 2022-03-18 | 2023-10-10 | Comcast Cable Communications, Llc | Methods and systems for synchronization of closed captions with content output |
| US12323647B2 (en) * | 2023-11-10 | 2025-06-03 | Avago Technologies International Sales Pte. Limited | Video quality monitoring system |
Citations (46)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4570232A (en) * | 1981-12-21 | 1986-02-11 | Nippon Telegraph & Telephone Public Corporation | Speech recognition apparatus |
| US5598557A (en) * | 1992-09-22 | 1997-01-28 | Caere Corporation | Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files |
| US6085160A (en) * | 1998-07-10 | 2000-07-04 | Lernout & Hauspie Speech Products N.V. | Language independent speech recognition |
| US6186834B1 (en) * | 1999-06-08 | 2001-02-13 | Avaya Technology Corp. | Enhanced communication connector assembly with crosstalk compensation |
| US6188987B1 (en) * | 1998-11-17 | 2001-02-13 | Dolby Laboratories Licensing Corporation | Providing auxiliary information with frame-based encoded audio information |
| US20020055950A1 (en) * | 1998-12-23 | 2002-05-09 | Arabesque Communications, Inc. | Synchronizing audio and text of multimedia segments |
| US20020093591A1 (en) * | 2000-12-12 | 2002-07-18 | Nec Usa, Inc. | Creating audio-centric, imagecentric, and integrated audio visual summaries |
| US6442518B1 (en) * | 1999-07-14 | 2002-08-27 | Compaq Information Technologies Group, L.P. | Method for refining time alignments of closed captions |
| US6473778B1 (en) * | 1998-12-24 | 2002-10-29 | At&T Corporation | Generating hypermedia documents from transcriptions of television programs using parallel text alignment |
| US20030025832A1 (en) * | 2001-08-03 | 2003-02-06 | Swart William D. | Video and digital multimedia aggregator content coding and formatting |
| US20030061028A1 (en) * | 2001-09-21 | 2003-03-27 | Knumi Inc. | Tool for automatically mapping multimedia annotations to ontologies |
| US20030169366A1 (en) * | 2002-03-08 | 2003-09-11 | Umberto Lenzi | Method and apparatus for control of closed captioning |
| US20030206717A1 (en) * | 2001-04-20 | 2003-11-06 | Front Porch Digital Inc. | Methods and apparatus for indexing and archiving encoded audio/video data |
| US20040096110A1 (en) * | 2001-04-20 | 2004-05-20 | Front Porch Digital Inc. | Methods and apparatus for archiving, indexing and accessing audio and video data |
| US20050227614A1 (en) * | 2001-12-24 | 2005-10-13 | Hosking Ian M | Captioning system |
| US20060015339A1 (en) * | 1999-03-05 | 2006-01-19 | Canon Kabushiki Kaisha | Database annotation and retrieval |
| US7047191B2 (en) * | 2000-03-06 | 2006-05-16 | Rochester Institute Of Technology | Method and system for providing automated captioning for AV signals |
| US7065524B1 (en) * | 2001-03-30 | 2006-06-20 | Pharsight Corporation | Identification and correction of confounders in a statistical analysis |
| US7092888B1 (en) * | 2001-10-26 | 2006-08-15 | Verizon Corporate Services Group Inc. | Unsupervised training in natural language call routing |
| US20060248073A1 (en) * | 2005-04-28 | 2006-11-02 | Rosie Jones | Temporal search results |
| US20070011012A1 (en) * | 2005-07-11 | 2007-01-11 | Steve Yurick | Method, system, and apparatus for facilitating captioning of multi-media content |
| US20070124756A1 (en) * | 2005-11-29 | 2007-05-31 | Google Inc. | Detecting Repeating Content in Broadcast Media |
| US20070124147A1 (en) * | 2005-11-30 | 2007-05-31 | International Business Machines Corporation | Methods and apparatus for use in speech recognition systems for identifying unknown words and for adding previously unknown words to vocabularies and grammars of speech recognition systems |
| US20070124788A1 (en) * | 2004-11-25 | 2007-05-31 | Erland Wittkoter | Appliance and method for client-sided synchronization of audio/video content and external data |
| US20070214164A1 (en) * | 2006-03-10 | 2007-09-13 | Microsoft Corporation | Unstructured data in a mining model language |
| US20080066138A1 (en) * | 2006-09-13 | 2008-03-13 | Nortel Networks Limited | Closed captioning language translation |
| US20080166106A1 (en) * | 2007-01-09 | 2008-07-10 | Sony Corporation | Information processing apparatus, information processing method, and program |
| US20080255844A1 (en) * | 2007-04-10 | 2008-10-16 | Microsoft Corporation | Minimizing empirical error training and adaptation of statistical language models and context free grammar in automatic speech recognition |
| US20080270134A1 (en) * | 2005-12-04 | 2008-10-30 | Kohtaroh Miyamoto | Hybrid-captioning system |
| US20080266449A1 (en) * | 2007-04-25 | 2008-10-30 | Samsung Electronics Co., Ltd. | Method and system for providing access to information of potential interest to a user |
| US7509385B1 (en) * | 2008-05-29 | 2009-03-24 | International Business Machines Corporation | Method of system for creating an electronic message |
| US20090171662A1 (en) * | 2007-12-27 | 2009-07-02 | Sehda, Inc. | Robust Information Extraction from Utterances |
| US20100091187A1 (en) * | 2008-10-15 | 2010-04-15 | Echostar Technologies L.L.C. | Method and audio/video device for processing caption information |
| US7729917B2 (en) * | 2006-03-24 | 2010-06-01 | Nuance Communications, Inc. | Correction of a caption produced by speech recognition |
| US7739253B1 (en) * | 2005-04-21 | 2010-06-15 | Sonicwall, Inc. | Link-based content ratings of pages |
| US7801910B2 (en) * | 2005-11-09 | 2010-09-21 | Ramp Holdings, Inc. | Method and apparatus for timed tagging of media content |
| US20110022386A1 (en) * | 2009-07-22 | 2011-01-27 | Cisco Technology, Inc. | Speech recognition tuning tool |
| US20110040559A1 (en) * | 2009-08-17 | 2011-02-17 | At&T Intellectual Property I, L.P. | Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment |
| US7962331B2 (en) * | 2003-12-01 | 2011-06-14 | Lumenvox, Llc | System and method for tuning and testing in a speech recognition system |
| US8121432B2 (en) * | 2005-08-24 | 2012-02-21 | International Business Machines Corporation | System and method for semantic video segmentation based on joint audiovisual and text analysis |
| US8131545B1 (en) * | 2008-09-25 | 2012-03-06 | Google Inc. | Aligning a transcript to audio data |
| US20120101817A1 (en) * | 2010-10-20 | 2012-04-26 | At&T Intellectual Property I, L.P. | System and method for generating models for use in automatic speech recognition |
| US20120253799A1 (en) * | 2011-03-28 | 2012-10-04 | At&T Intellectual Property I, L.P. | System and method for rapid customization of speech recognition models |
| US8423363B2 (en) * | 2009-01-13 | 2013-04-16 | CRIM (Centre de Recherche Informatique de Montréal) | Identifying keyword occurrences in audio data |
| US8572488B2 (en) * | 2010-03-29 | 2013-10-29 | Avid Technology, Inc. | Spot dialog editor |
| US20160007054A1 (en) * | 2009-09-22 | 2016-01-07 | Caption Colorado Llc | Caption and/or Metadata Synchronization for Replay of Previously or Simultaneously Recorded Live Programs |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11785278B1 (en) * | 2022-03-18 | 2023-10-10 | Comcast Cable Communications, Llc | Methods and systems for synchronization of closed captions with content output |
- 2022-03-18: US17/698,570 filed (US11785278B1, Active)
- 2023-09-06: US18/462,117 filed (US12273582B2, Active)
- 2025-03-24: US19/088,528 filed (US20250227320A1, Pending)
- 2025-03-24: US19/088,622 filed (US20250227321A1, Pending)
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240005943A1 (en) * | 2020-10-28 | 2024-01-04 | Comcast Cable Communications, Llc | Methods and systems for augmenting audio content |
| US12494218B2 (en) * | 2020-10-28 | 2025-12-09 | Comcast Cable Communications, Llc | Methods and systems for augmenting audio content |
| US12548553B1 (en) * | 2023-03-20 | 2026-02-10 | Amazon Technologies, Inc. | Techniques for machine learning based playback |
Also Published As
| Publication number | Publication date |
|---|---|
| US20250227321A1 (en) | 2025-07-10 |
| US11785278B1 (en) | 2023-10-10 |
| US20240080514A1 (en) | 2024-03-07 |
| US12273582B2 (en) | 2025-04-08 |
| US20250227320A1 (en) | 2025-07-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12273582B2 (en) | Methods and systems for synchronization of closed captions with content output | |
| US11386932B2 (en) | Audio modification for adjustable playback rate | |
| US11792464B2 (en) | Determining context to initiate interactivity | |
| US8265450B2 (en) | Capturing and inserting closed captioning data in digital video | |
| TWI470588B (en) | System for translating spoken language into sign language for the deaf | |
| US9430115B1 (en) | Storyline presentation of content | |
| US20100122277A1 (en) | device and a method for playing audio-video content | |
| US9215496B1 (en) | Determining the location of a point of interest in a media stream that includes caption data | |
| US9767825B2 (en) | Automatic rate control based on user identities | |
| WO2014155377A1 (en) | Method and system for automatically adding subtitles to streaming media content | |
| US20230362451A1 (en) | Generation of closed captions based on various visual and non-visual elements in content | |
| US20130151251A1 (en) | Automatic dialog replacement by real-time analytic processing | |
| JP2008205745A (en) | Video playback apparatus and method | |
| US11936940B2 (en) | Accessibility enhanced content rendering | |
| US20070106516A1 (en) | Creating alternative audio via closed caption data | |
| US20220256215A1 (en) | Systems and methods for adaptive output | |
| US20230362452A1 (en) | Distributor-side generation of captions based on various visual and non-visual elements in content | |
| US20160127807A1 (en) | Dynamically determined audiovisual content guidebook | |
| US20240395251A1 (en) | Methods, systems, and apparatuses for modifying audio content | |
| EP4334927B1 (en) | Accessibility enhanced content rendering | |
| CN121171206A (en) | Computer combined control operation screen recording and writing system and method based on offline voice-to-text conversion | |
| TW201516717A (en) | System for playing video and method thereof | |
| CN115567670A (en) | Conference viewing method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
Free format text: PATENTED CASE |