NL2037362B1 - Subtitling system and method - Google Patents
Subtitling system and method
- Publication number
- NL2037362B1
- Authority
- NL
- Netherlands
- Prior art keywords
- sound
- text
- audio
- output
- series
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04H—BROADCAST COMMUNICATION
- H04H60/00—Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
- H04H60/02—Arrangements for generating broadcast information; Arrangements for generating broadcast-related information with a direct linking to broadcast information or to broadcast space-time; Arrangements for simultaneous generation of broadcast information and broadcast-related information
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention provides a live subtitling system for subtitling a series of people while they are speaking to a series of clients, in particular the series of people performing in a theatre, for instance playing in a play, singing in a musical, opera or operetta, with the clients forming part of a live audience. The live subtitling system comprises at least one speech-to-text conversion system and a wireless broadcast system.
Description
P100917NL00
Subtitling system and method
The invention relates to a live subtitling system and a method for live subtitling an event.
WO2020017961 according to its abstract relates to “Methods for a voice processing system comprising P microphone units (102A...102D) and a central unit (104) are disclosed. Each microphone unit is linked to a person and derives from N microphone signals a source localisation signal. The source localisation signal is used to control an adaptive beam form process to obtain a beam formed audio signal. The microphone unit is further configured to derive metadata from the N microphone signals, such as the direction the sound is coming from. Packages with the metadata and beam formed audio signal are transmitted to the central unit. The central unit processes the metadata to determine which parts of the P beam formed audio signals comprise speech from a person that is linked to another microphone unit. By removing said parts from the audio signals before transcription, the quality of the transcription is improved. The transcriptions are displayed on a remote device.”
GB2568656 according to its abstract relates to “A system for displaying captions during a live performance, comprises: a memory storing a follower script, including waypoints associated with performance cues, and a caption script, a speech follower component to recognise performance spoken dialogue and compare it with the follower script to track the location in the follower script of the spoken dialogue, identifying when a caption is displayed; a caption output module, accessing from the caption script a caption for display at each location in the follower script associated with a caption; and a cue handler storing performance cue identifiers with associated cue metadata and which receives detected performance cues and outputs cue signals to the speech follower, assisting the speech follower to determine the location based on waypoints at detected cues. Also, a method of delivering an information output to a live performance viewer, the information being displayed text or an audio description at predefined times relative to stage events. A follower script with entries organized along a timeline, and metadata at timepoints between at least some entries are provided. Metadata is associated with stage events. Speech recognition tracks spoken dialogue against the follower script entries, and the stage events, aiding following the live performance.”
A disadvantage of the prior art is that subtitles or captions of different performers are mixed, and that, for instance, language difficulties remain a problem. This problem grows bigger if the performance includes other sound, like singing, music, and the like.
Hence, it is an aspect of the invention to provide an alternative system and method, which preferably further at least partly obviates one or more of the above-described drawbacks.
There is provided a live subtitling system for subtitling a series of people while they are speaking to a series of clients, in particular the series of people performing in a theatre, for instance playing in a play, singing in a musical, opera or operetta, with the clients forming part of a live audience, the live subtitling system comprising:
- a series of sound input devices, for instance microphones, each sound input device dedicated to one of the series of people and each providing, in operation, a sound signal;
- a mixing console having the series of sound input devices each functionally coupled to a respective input channel for receiving a sound signal on each input channel, the mixing console comprising a digital output channel for each sound input device of the series of sound input devices, at least one fader for setting an output level for each of the output channels separately, and a further output channel for providing an output sound signal to a sound output system;
- a sound processing system comprising a sound system input functionally coupled to one of the digital output channels of the mixing console, and comprising a data processor running a computer program which, when running on the data processor, performs at least one selected from a sound compression, a sound multiplexing, a labelling of sound data with mixing console input channels, and a combination thereof, for producing at least one sound data package that comprises labels indicative of a respective one of the series of sound input devices and that sound input device's sound output if the sound of that sound input device is above the preset output level, and transmitting said at least one sound data package to at least one sound processing output;
- at least one speech-to-text conversion system functionally coupled to the at least one sound processing output for receiving said at least one sound data package, the speech-to-text conversion system adapted for providing for each sound data package a text data package to said sound processing system, said sound processing system labelling the text in said text data package to the series of sound input devices, providing a labelled text data package;
- a wireless broadcast system operationally coupled to the sound processing system for receiving the labelled text data package and adapted for wirelessly broadcasting the generated labelled text to client display devices for functionally live subtitling the sound produced by the series of people, in particular for allowing display of the subtitles in synchronisation with the output sound signal, more in particular in synchronisation with the sound played by the sound system resulting from the output sound signal.
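By way of illustration only, the sound data package recited above can be thought of as a labelled container that keeps, per mixing console input channel, the channel label and that channel's audio whenever its level exceeds the preset output level. The following minimal Python sketch shows such a structure; all names and the threshold value are hypothetical assumptions for the sketch, not part of the claimed system.

```python
from dataclasses import dataclass, field
import time

@dataclass
class ChannelFrame:
    """Audio from one sound input device, tagged with its console channel."""
    channel_label: str   # e.g. the mixing console input channel id (hypothetical)
    samples: bytes       # raw audio for this time segment
    level: float         # measured level, 0.0..1.0

@dataclass
class SoundDataPackage:
    """Package sent from the sound processing system to speech-to-text."""
    timestamp: float = field(default_factory=time.time)
    frames: list[ChannelFrame] = field(default_factory=list)

def build_package(frames: list[ChannelFrame],
                  level_threshold: float = 0.1) -> SoundDataPackage:
    """Keep only channels whose sound exceeds the preset output level."""
    return SoundDataPackage(frames=[f for f in frames if f.level > level_threshold])
```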
There is further provided a method for live subtitling an event in which a series of people talk to a series of clients, in particular the series of people performing in a theatre with the clients forming part of an audience, in particular using a live subtitling system as described above, wherein each person of the series of people is provided with a sound input device, in particular a microphone, the sound input devices provide a series of sound streams to a mixing console, and the mixing console couples each of the data streams to a speech-to-text conversion system comprising at least one sound input channel, the method comprising providing a separate digital output from the mixing console for each of the series of input channels, coupling each digital output to a sound-to-text system, the sound-to-text system providing digital data comprising a text with a label indicating one of the series of microphones to a wireless broadcast system, and said wireless broadcast system wirelessly broadcasting the generated labelled texts to the clients.
There is further provided a computer program product which, when executed on a data processing device, performs receiving a series of sound streams, applying sound processing comprising at least one selected from a sound compression and a multiplexing to combine the series of sound streams into a sound data package, transmitting the sound data package to at least one speech-to-text conversion system, receiving text data from said speech-to-text conversion system, which text data is a conversion of said sound data package, converting the text data into a subtitle data package comprising a series of subtitles, including with each subtitle of the series of subtitles an indication labelling the subtitle to a sound stream of the series of sound streams, and broadcasting the subtitle data package to a series of display devices, wherein the subtitle data is displayed live with respect to a time at which the series of sound streams were produced.
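The computer program product recites a fixed sequence of steps. The sketch below illustrates that control flow in Python; the speech-to-text conversion and the broadcast are passed in as stand-in callables, since the text does not prescribe a particular implementation for either.

```python
def run_subtitling_cycle(sound_streams, stt_convert, broadcast):
    """One processing cycle of the described computer program product.

    sound_streams: dict mapping stream label -> audio bytes
    stt_convert:   callable(sound_package) -> dict label -> text (stand-in)
    broadcast:     callable(subtitle_package) -> None (stand-in)
    """
    # Combine the series of sound streams into one sound data package.
    package = {label: audio for label, audio in sound_streams.items()}
    # The speech-to-text conversion system returns text per labelled stream.
    texts = stt_convert(package)
    # Convert the text data into a subtitle data package: each subtitle
    # keeps the label tying it back to its sound stream.
    subtitles = [{"label": label, "text": text} for label, text in texts.items()]
    broadcast({"subtitles": subtitles})
```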
In the current context, reference is made to “subtitling”. In some instances, this may also be referred to as “captioning”. In the current technology, sound is generated by people or persons and that sound comprises words. This sound is converted into readable text and presented to one or more persons that are present live, hearing the sound. Usually, this is an audience. The current system may also be applied in large meetings like a UN assembly meeting.
Important in the current invention is that the people perform or speak live. For instance, the person or people give a live performance. In this respect, live means that for a spectator or person attending the performance the motions and lips of the people performing or speaking are in synchronization (“in sync”) with the produced sound. In some embodiments, additional sound, like music or pre-recorded sound, is added to the performance. This in some ways complicates the subtitling process. The requirement of being “in sync” means that usually there can only be a little time between the motions of the lips, hearing the sound, and displaying the subtitling. If there is a time difference, this is also referred to as latency. In practice, usually there is less than 1 second of time difference. For a good synchronisation, there is less than 10 milliseconds of time difference. In more optimal situations, the time difference is less than 1 millisecond.
In the current description, the people are speaking. This usually includes public speaking, like giving a lecture or a speech, or reciting a poem or part of a book. Singing is here included as well. In fact, this can also be seen as performing.
In the current context, the clients are at the same physical location as the people that are performing. This means that the performance is in fact live. The current system can help people with a hearing disability to understand and follow a performance. It can also help people follow and understand a performance in general. This may also include people that do not have a hearing disability. It may even include presenting a translated version of the performance. In an embodiment, the clients may even be allowed to select a language in which the text is displayed. Such a selection of language may be made individually.
With respect to a mixing console, such a device comprises a series of input channels. To these input channels, a series of sound input devices can be functionally coupled, each to a respective input channel. The mixing console further comprises at least one digital output channel for the series of sound input devices. It further comprises a fader for setting an output level for each of the output channels separately. The mixing console has one or more outputs for providing output to sound output devices, like for instance speakers.
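Numerically, a fader is simply a per-channel gain. The sketch below, using NumPy and hypothetical names, illustrates how a console can expose both the individual post-fader channels (the further digital outputs) and their sum (the mixed output to the speakers); it is an illustration of the principle, not a description of any particular console.

```python
import numpy as np

def mix(inputs: np.ndarray, fader_levels: np.ndarray):
    """Apply per-channel fader gains and sum into one mixed output.

    inputs:       (channels, samples) float array of input signals
    fader_levels: (channels,) fader setting per input channel

    Returns the per-channel post-fader outputs (the individual digital
    outputs) and the single mixed output fed to the speaker system.
    """
    per_channel = inputs * fader_levels[:, None]   # individual "out-2" channels
    mixed = per_channel.sum(axis=0)                # summed "out" signal
    return per_channel, mixed
```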
In particular, a mixing console or mixing desk is an electronic device for mixing audio signals, used in sound recording and reproduction and sound reinforcement systems. Inputs to the console include microphones, signals from electric or electronic instruments, or pre-recorded sounds. Mixers may control analog or digital signals. The modified signals are summed to produce the combined output signals, which can then be broadcast, amplified through a sound reinforcement system or recorded.
Examples of suitable mixing consoles are from Alesis, Allen & Heath, Audient, Automated Processes, Inc., AMS Neve, Avid, Behringer, Cadac Electronics, Calrec, Crest Audio, D&R, DHD audio, DiGiCo, Electro-Voice, Euphonix, Fairlight, Focusrite, Harrison Audio Consoles, Klotz Digital, Lawo, Logitek, Mackie, MCI, Midas, Peavey, Phonic, PreSonus, QSC, Rane, Roland, Shure, Solid State Logic (SSL), Soundcraft, Speck Electronics, Stage Tec, Studer, Studiomaster, TASCAM, Telos Alliance, Ward-Beck Systems, Wheatstone, Yamaha, Yorkville.
As described above, there is provided a live subtitling system for subtitling a series of people while they are speaking to a series of clients, in particular the series of people performing in a theatre, for instance playing in a play, singing in a musical, opera or operetta, with the clients forming part of a live audience. Below, some specific embodiments are discussed. It should be noted that combinations of these embodiments are also explicitly foreseen.
In an embodiment, the computer program when running further retrieves a label coupled to each sound input device, and couples each label with a sound output from said sound input device.
In an embodiment, the mixing console comprises a series of said at least one fader. In an embodiment, the mixing console comprises at least one fader per sound input device. In an embodiment with at least one fader per sound input device, said faders are provided for at least one of setting a sound level pass percentage per sound input device and setting relative mutual levels for the sound input devices.
In an embodiment, the mixing console provides a sound output for a mixed sound signal as output from the at least one fader, and the series of individual output channels from the at least one fader.
In an embodiment, the live subtitling system comprises a time division multiplexing device for providing the sound of said sound input devices as a train of time slices in a series of channels. In particular, the multiplexer reduces the series of sound channels to one channel.
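A minimal illustration of such time division multiplexing in Python is given below; the slice length and the (index, slice) framing are assumptions chosen for the sketch, not features of the claimed device.

```python
def tdm_multiplex(channels: list[bytes], slice_len: int = 960) -> list[tuple[int, bytes]]:
    """Interleave fixed-length time slices of several channels into one train."""
    if not channels:
        return []
    train = []
    longest = max(len(c) for c in channels)
    for offset in range(0, longest, slice_len):
        for idx, channel in enumerate(channels):
            piece = channel[offset:offset + slice_len]
            if piece:
                # Each slice carries its channel index so the receiver can
                # demultiplex the train back into the original channels.
                train.append((idx, piece))
    return train
```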
In an embodiment, the computer program performs a lossless sound compression, in particular compressing the timeline of a time segment of the sound input device sound output.
In an embodiment, the sound output of the series of sound input devices is processed in parallel. In a particular embodiment, the sound output is processed by applying time compression. In this way, it may be possible to reduce latency between production of sound and display of the subtitles.
In an embodiment, the computer program applies sound compression and subsequently multiplexing. In a particular embodiment, the computer program applies time division multiplexing to produce a reduced number of sound data packages.
In an embodiment, the label includes sound input device data. In a particular embodiment, the label comprises position data of at least one of the sound input devices. In this way, a display system like smart glasses may project the subtitling close to the performer that produced that text.
In an embodiment, the live subtitling system comprises a speech-to-text conversion system per sound data stream. In this way, the relevant sound input devices are all processed in parallel. This may reduce latency.
In an embodiment, the wireless broadcast comprises a WIFI transmission system adapted for providing a broadcast. The WIFI transmission system may be an open system. Broadcasting may include transmitting via WIFI or via Bluetooth or a similar digital system, allowing the clients/audience to easily receive the subtitling live.
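On a venue WIFI network, such a one-to-many transmission could for instance be realised as a UDP broadcast; the sketch below is one possible realisation, with the port number an arbitrary assumption.

```python
import json
import socket

def broadcast_subtitles(subtitle_package: dict, port: int = 5005) -> None:
    """Send one labelled subtitle package to all clients on the local network."""
    payload = json.dumps(subtitle_package).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Enable broadcast so a single datagram reaches every listening client.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, ("255.255.255.255", port))
```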
In an embodiment, the mixing console comprises a mixing console data processor and a mixing console computer program which, when running on said mixing console data processor, applies a trained neural network to operate said at least one switch, said trained neural network trained using a series of follower scripts and resulting mixing console switch settings.
In an embodiment, the live subtitling system further comprises a text-translation system for translating at least part of the text output by said speech-to-text conversion system. This improves involvement of the clients/users. For instance, the performer/actor may use different languages, or sing and speak in different languages. Auto-translation may present everything in one language. The client may select a preferred language, increasing involvement and understanding.
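One possible way to realise per-client language selection is to translate each labelled text once per requested language and broadcast each language on its own channel. In the sketch below, the translate callable is a hypothetical stand-in for whatever machine-translation backend is used; it is not part of the described system.

```python
def fan_out_translations(labelled_text: dict, languages: set[str], translate) -> dict:
    """Produce one subtitle package per requested language.

    labelled_text: {"label": ..., "text": ...} in the source language
    translate:     callable(text, target_language) -> str (stand-in for a
                   real machine-translation backend)
    """
    return {
        lang: {"label": labelled_text["label"],
               "text": translate(labelled_text["text"], lang)}
        for lang in languages
    }
```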
In an embodiment, the labelled text is received on a personal display device of each client. In an embodiment, such a personal display device comprises a wearable device. In an embodiment, the wearable device comprises a pair of glasses comprising a projecting device for projecting data on at least one glass of the pair of glasses, including a smart contact lens. A pair of glasses or contact lenses or the like may increase the experience and combine visual information with sound-to-text converted visual information.
In an embodiment, the label comprises an additional indication of a nature of said text, said additional indication selected from at least one of an indication that the text was sung and an indication that the text resulted from a relatively loud sound. This makes the user experience more intense.
In an embodiment, the live subtitling system further comprises a positioning system for tracking the position of the people talking in public. In an embodiment of such a positioning system, the labelled text is displayed on the personal display device near a representation of the person.
The terms “upstream” and “downstream” relate to an arrangement of items or features relative to the propagation of the light from a light generating means or the flow of water or the transmission of sound. Relative to a first position within a beam of light from the light generating means, a second position in the beam of light closer to the light generating means is “upstream”, and a third position within the beam of light further away from the light generating means is “downstream”. In the current system, the source of sound is referred to as upstream, and the displaying of the subtitle is referred to as downstream.
The term “substantially” herein, such as in “substantially consists”, will be understood by the person skilled in the art. The term “substantially” may also include embodiments with “entirely”, “completely”, “all”, etc. Hence, in embodiments the adjective substantially may also be removed. Where applicable, the term “substantially” may also relate to 90% or higher, such as 95% or higher, especially 99% or higher, even more especially 99.5% or higher, including 100%. The term “comprise” includes also embodiments wherein the term “comprises” means “consists of”.
The term "functionally" will be understood by, and be clear to, a person skilled in the art. The term “substantially” as well as “functionally” may also include embodiments with “entirely”, “completely”, “all”, etc. Hence, in embodiments the adjective functionally may also be removed. When used, for instance in “functionally parallel”, a skilled person will understand that the adjective “functionally” includes the term substantially as explained above. Functionally in particular is to be understood to include a configuration of features that allows these features to function as if the adjective “functionally” was not present. The term “functionally” is intended to cover variations in the feature to which it refers, and which variations are such that in the functional use of the feature, possibly in combination with other features it relates to in the invention, that combination of features is able to operate or function. For instance, if an antenna is functionally coupled or functionally connected to a communication device, received electromagnetic signals that are receives by the antenna can be used by the communication device. The word “functionally” as for instance used in “functionally parallel” is used to cover exactly parallel, but also the embodiments that are covered by the word “substantially” explained above. For instance, “functionally parallel” relates to embodiments that in operation function as if the parts are for instance parallel. This covers embodiments for which it is clear to a skilled person that it operates within its intended field of use as if it were parallel.
Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
The devices or apparatus herein are amongst others described during operation.
As will be clear to the person skilled in the art, the invention is not limited to methods of operation or devices in operation.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "to comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device or apparatus claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The invention further applies to an apparatus or device comprising one or more of the characterising features described in the description and/or shown in the attached drawings. The invention further pertains to a method or process comprising one or more of the characterising features described in the description and/or shown in the attached drawings.
The various aspects discussed in this patent can be combined in order to provide additional advantages. Furthermore, some of the features can form the basis for one or more divisional applications.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying schematic drawings in which corresponding reference symbols indicate corresponding parts, and in which:
Figure 1 schematically depicts an embodiment of a live subtitling system;
Figure 2 schematically shows an alternative embodiment of a live subtitling system.
The drawings are schematic and not necessarily on scale.
In general, the currently claimed system is used in venues, which include theatre halls, but can also include indoor venues like general halls and classrooms. It may also include outdoor venues. In such venues at least one person speaks, sings, or produces sound. In particular, such a sound is spoken sound, like a recital, a speech, or text in a play. In many circumstances, it may be difficult for a person, like a member of the audience, to hear or understand the sound. For instance, for some people the lyrics may be difficult to hear or understand. Thus, the current system can assist hearing-disabled people, but may also help to understand lyrics, for instance. It may even be combined with or include “on the fly” translation. In fact, the available text may be translated into different languages at the same time. Such translated text may be subsequently broadcast over separate channels, allowing the audience to switch to a text in a selected language. For translating, currently text is sent to a computer translation system, usually based on a trained neural network.
Figure 1 schematically depicts an embodiment of a live subtitling system 1. Such a system is usually applied in a venue 9 like a hall, theatre hall, or the like. In such a venue 9, an audience 3 is present. One or more persons 2 perform in front of the audience 3. Usually, more than one person performs. In fact, the complexity and problems multiply when the performance comprises more than one person. The complexity further multiplies when the performance includes music and spoken or sung text, and further becomes more complex if more than one person produces audio text. Often, the performers use a microphone 4, and the produced text (including lyrics) is output via at least one speaker 8, often amplified using a sound system.
Usually in these venues, there is a mixing console 5. The mixing console 5 may be part of the sound system that plays sound or music via the speakers 8 so that the audience can hear the performance of the people 2, for instance artists, singers, speakers. The sound system may include one or more amplifiers.
Usually, during the performance a sound technician operates the mixing console 5. In recent developments, attempts are made to use automated systems, like trained neural networks or other artificial intelligence (AI), to operate the mixing console.
Using the mixing console, a sound technician mixes the incoming channels (“in”) of the mixing console. The incoming input channels of the mixing console 5 include signals from the microphones (which may be wirelessly coupled), but may additionally also include music or other sources of sound, for instance pre-recorded sound. Usually, each source of incoming sound has at least one input channel “in”. The sound levels of the input channels are usually individually adjusted using a fader.
The mixed sound resulting from the mixing console 5 is coupled, via an output channel “out”, to a speaker system. The speakers 8 of the speaker system may be wire-coupled or wirelessly coupled. With respect to the output channel “out”, the mixing console mixes all the incoming channels into (usually one) sound output that is coupled to a sound system, usually speakers playing the result of the mixed channels.
The mixing console 5 usually comprises at least one further output channel out-2. In an embodiment, this further output channel out-2 is digital. In an embodiment, each input channel “in” is coupled to an individual further output channel out-2. In an embodiment, the further output channel out-2 comprises a series of digital further output channels out-2, individually coupled to one of the input channels “in”. Usually, the output channel(s) out-2 is/are coupled via a fader. This means that the relative levels of the input channels are changed. After the faders, the input channels are individually coupled to out-2, allowing identification of individual sound input devices. The output from the faders is also coupled to a mixer, to provide one mixed output via output channel “out”.
Via the mixing console further output out-2, the mixing console 5 has a coupling 6 with a computer system 13. The coupling 6 may be a wired or wireless coupling. The computer system often is remote from the mixing console 5. This may mean that the computer system 13 is at a distance from the mixing console 5. It may be in the vicinity of the mixing console 5. In an embodiment, it is in the same building.
Alternatively or in combination, it is coupled via a LAN, WLAN or even via a cabled coupling.
For processing the output received from the mixing console 5 from the further output out-2 via coupling 6, the computer system 13 comprises a sound processing system 14. Such a sound processing system 14 in an embodiment comprises a data processor and software running on that data processor. The software can perform a sound compression, for instance using known sound compression algorithms. The software can also perform sound multiplexing. This can be included after sound compression. In sound multiplexing, several channels are placed one behind the other (time division multiplexing), or any other multiplexing method suitable for sound and known to a skilled person is applied. The software can further provide labelling of sound data with mixing console input channels. These software-implemented functions can be combined. The processing using the software produces at least one sound data package that comprises labels indicative of a respective one of the series of sound input devices and that sound input device's sound output if the sound of that sound input device is above the preset output level. The software running on the sound processing system 14 furthermore transmits the at least one sound data package to at least one sound processing output. Via that output, sound data packages are coupled to a speech-to-text conversion system.
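By way of example, lossless compression and labelling of a single channel could look as follows in Python, here using the general-purpose zlib codec from the standard library as a stand-in for a dedicated audio codec; the preset level value is an assumption.

```python
import zlib

def process_channel(label: str, samples: bytes, level: float,
                    preset_level: float = 0.1):
    """Compress one channel's audio and attach its console-channel label.

    Returns None when the channel's level is below the preset output
    level, so silent channels never reach the speech-to-text system.
    """
    if level <= preset_level:
        return None
    return {"label": label, "audio": zlib.compress(samples)}
```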
In an embodiment, the sound processing system 14 comprises a so-called Dante system.
Dante is the product name for a combination of software, hardware, and network protocols that delivers uncompressed, multi-channel, low-latency digital audio over a standard Ethernet network using Layer 3 IP packets. It was developed in 2006 by the Sydney-based company Audinate. Dante builds on previous audio-over-Ethernet and audio-over-IP technologies.
The sound that is transmitted from the mixing console 5 can, in an embodiment, also be input to the sound processing system 14 via an automixer 18. An automixer is a device that is known in the working field of sound technicians. It automatically reduces the level of, or even silences, microphones or other sound input devices 4 that are not used. An automixer 18 is also referred to as an automatic microphone mixer. Usually, the automixer reduces extraneous noise pick-up. As the applicant found, application in the current system allows a better sound-to-text conversion.
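The behaviour of an automixer can be approximated by a gain-sharing rule, in which each channel's gain is its share of the total measured level, so that microphones that are not being used are pulled down. The following NumPy sketch illustrates this principle; the floor gain value is an assumption, and the sketch does not describe any particular automixer product.

```python
import numpy as np

def automix_gains(levels: np.ndarray, floor_gain: float = 0.05) -> np.ndarray:
    """Gain-sharing automix: each channel's gain is its share of the total
    level, so unused microphones are attenuated towards floor_gain."""
    levels = np.asarray(levels, dtype=float)
    total = float(levels.sum())
    if total <= 0.0:
        # No signal anywhere: hold every channel at the floor gain.
        return np.full_like(levels, floor_gain)
    return np.maximum(levels / total, floor_gain)
```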
The computer system 13 is functionally coupled with a sound-to-text conversion system, also referred to as a speech-to-text conversion system 17. The coupling can be a wireless coupling, for instance via the internet (IP protocol). The speech-to-text system as such often comprises an artificial intelligence system that is trained to convert sound, often spoken words but also singing, to text. These systems as such are known to a skilled person. In the current system, various channels of sound are often available, often producing sound, including spoken or sung sound, via separate channels but at the same time. Using the sound system, in an embodiment the different channels can be multiplexed. In order to reduce time delay as much as possible, sound data may be compressed and/or time-compressed. Often, a time multiplexing is performed. In such a setup, n incoming channels are cut up into time slices. Time slices from the various channels are combined. A sound-to-text conversion system may comprise a setting for applying a frequency shift to the sound.
As described, the computer system 13 is coupled to a speech-to-text conversion system 17. The one or more sound channels are transmitted from the computer system 13 to the speech-to-text conversion system 17. The speech-to-text conversion system 17 converts incoming sound to text. In order to relate the text to the original source (microphone), each part of text is labelled, yielding a labelled text fragment. The labelled text fragments are transmitted from the speech-to-text conversion system 17 to the computer system 13. The computer system 13 can process the labelled text fragments into subtitles. The subtitles are subsequently broadcast via transmitter 10 in the venue. There, display devices comprising screens, tablets 11, or for instance smart glasses 12 or even smart contact lenses, can display the subtitles. The subtitles may include an indication linking the subtitles to the specific source, like microphone 4.
Figure 2 schematically illustrates an alternative system. The references show the same or functionally the same elements and features as figure 1. In some instances, differences are explained.
In figure 2, the mixing console 5 has a controller or switch 7 for controlling an output level for each incoming input channel. In this embodiment, each incoming channel has its own output, providing a series of mixing console couplings 6. These can be physically separate channels. Alternatively, they can be multiplexed into a single sound data stream. Each input channel in this embodiment is separately input into an input channel of a speech-to-text system 17. In an embodiment, only those channels 6 that have an output above a predefined level are input into an input channel of a speech-to-text system. In an embodiment, multiple speech-to-text systems 17 are provided in order to limit possible delay between production of a sound and displaying a subtitle representation of that sound.
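Routing only the above-threshold channels to parallel speech-to-text workers, as described in this embodiment, could be sketched as follows; the speech-to-text call is a stand-in, and worker threads here merely stand in for the multiple speech-to-text systems 17.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_transcribe(channels: dict, stt_convert, level_threshold: float = 0.1):
    """Send each above-threshold channel to its own speech-to-text worker.

    channels:    {label: (audio_bytes, level)}
    stt_convert: callable(audio_bytes) -> str (stand-in for one
                 speech-to-text system instance)
    """
    # Only channels with output above the predefined level are transcribed.
    active = {label: audio for label, (audio, level) in channels.items()
              if level > level_threshold}
    with ThreadPoolExecutor(max_workers=len(active) or 1) as pool:
        futures = {label: pool.submit(stt_convert, audio)
                   for label, audio in active.items()}
        return {label: fut.result() for label, fut in futures.items()}
```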
The various subtitle texts are output by a respective speech-to-text system. A wireless transmission or broadcast system 10 can label each subtitle text before it is displayed on a display device, for instance a screen 11. This allows a user in the audience to see which performer produced which text.
The current live subtitling system has various functional systems and devices that perform a function in the system. These various functions may be transferred to, or also be included in, other systems of the current live subtitling system.
It will also be clear that the above description and drawings are included to illustrate some embodiments of the invention, and not to limit the scope of protection.
Starting from this disclosure, many more embodiments will be evident to a skilled person. These embodiments are within the scope of protection and the essence of this invention and are obvious combinations of prior art techniques and the disclosure of this patent.
Reference numbers
1 live subtitling system
2 performers
3 audience
4 microphone
5 mixing console
6 mixing console coupling
7 switch
8 speakers
9 venue, theatre hall
10 wireless transmission/broadcast
11 display device, screen
12 display device, smart glass
13 computer system
14 sound processor/sound processing system
15 theatre seats
16 functional coupling, including internet
17 speech-to-text conversion system
18 automixer
Out mixing console output (to speaker system)
Out-2 mixing console further output
In mixing console input
Claims (18)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| NL2037362A NL2037362B1 (en) | 2024-03-28 | 2024-03-28 | Subtitling system and method |
| PCT/NL2025/050153 WO2025206956A1 (en) | 2024-03-28 | 2025-03-28 | Subtitling system and method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| NL2037362A NL2037362B1 (en) | 2024-03-28 | 2024-03-28 | Subtitling system and method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| NL2037362B1 (en) | 2025-10-10 |
Family
ID=91129832
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| NL2037362A NL2037362B1 (en) | 2024-03-28 | 2024-03-28 | Subtitling system and method |
Country Status (2)
| Country | Link |
|---|---|
| NL (1) | NL2037362B1 (en) |
| WO (1) | WO2025206956A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170092274A1 (en) * | 2015-09-24 | 2017-03-30 | Otojoy LLC | Captioning system and/or method |
| GB2568656A (en) | 2017-09-28 | 2019-05-29 | The Royal Nat Theatre | Caption delivery system |
| WO2020017961A1 (en) | 2018-07-16 | 2020-01-23 | Hazelebach & Van Der Ven Holding B.V. | Methods for a voice processing system |
| US11500226B1 (en) * | 2019-09-26 | 2022-11-15 | Scott Phillip Muske | Viewing area management for smart glasses |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017179040A1 (en) * | 2016-04-15 | 2017-10-19 | Gala Prompter Ltd. | System and method for distribution and synchronized presentation of content |
| US20250006226A1 (en) * | 2023-06-30 | 2025-01-02 | Adobe Inc. | Script based video effects for live video |
- 2024-03-28 NL NL2037362A patent/NL2037362B1/en active
- 2025-03-28 WO PCT/NL2025/050153 patent/WO2025206956A1/en active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170092274A1 (en) * | 2015-09-24 | 2017-03-30 | Otojoy LLC | Captioning system and/or method |
| GB2568656A (en) | 2017-09-28 | 2019-05-29 | The Royal Nat Theatre | Caption delivery system |
| WO2020017961A1 (en) | 2018-07-16 | 2020-01-23 | Hazelebach & Van Der Ven Holding B.V. | Methods for a voice processing system |
| US11500226B1 (en) * | 2019-09-26 | 2022-11-15 | Scott Phillip Muske | Viewing area management for smart glasses |
Non-Patent Citations (1)
| Title |
|---|
| "Captioning and Subtitling for d/Deaf and Hard of Hearing Audiences", 14 January 2021, UCL PRESS, London WC1E 6BT, ISBN: 978-1-78735-710-5, article SOLEDAD ZÁRATE: "Captioning and Subtitling for d/Deaf and Hard of Hearing Audiences", pages: 1 - 178, XP093080436, DOI: . https://doi.org/10.14324/111.9781787357105 * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025206956A1 (en) | 2025-10-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP3633671B1 (en) | Audio guidance generation device, audio guidance generation method, and broadcasting system | |
| US10726842B2 (en) | Caption delivery system | |
| US20080195386A1 (en) | Method and a Device For Performing an Automatic Dubbing on a Multimedia Signal | |
| EP2165531B1 (en) | An audio animation system | |
| US20060285654A1 (en) | System and method for performing automatic dubbing on an audio-visual stream | |
| USRE42647E1 (en) | Text-to speech conversion system for synchronizing between synthesized speech and a moving picture in a multimedia environment and a method of the same | |
| US20090079833A1 (en) | Technique for allowing the modification of the audio characteristics of items appearing in an interactive video using rfid tags | |
| EP3224834B1 (en) | Apparatus and method for generating visual content from an audio signal | |
| US20230345086A1 (en) | System and method for providing descriptive video | |
| Huwiler | A Narratology of Audio Art: Telling Stories by Sound¹ | |
| Janer et al. | Immersive orchestras: audio processing for orchestral music VR content | |
| CN113545096B (en) | Information processing device and information processing system | |
| NL2037362B1 (en) | Subtitling system and method | |
| Simon et al. | MPEG-H Audio for Improving Accessibility in Broadcasting and Streaming | |
| Walczak et al. | Artificial voices | |
| US20220264193A1 (en) | Program production apparatus, program production method, and recording medium | |
| JP2008294722A (en) | Movie playback apparatus and movie playback method | |
| Lodge et al. | Helping blind people to watch television-the AUDETEL project | |
| Baumgartner et al. | Speech Intelligibility in TV | |
| JP2003259320A (en) | Video and audio synthesizer | |
| WO2025120784A1 (en) | Translated content generation system | |
| Abraitienė et al. | Translation as a Means of Social Integration. | |
| WO2021255831A1 (en) | Transmission device, communication method, and program | |
| WALLEY et al. | Practical Implementation of Automated Next Generation Audio Production for Live Sports | |