Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Before the subtitle optimization scheme provided by the embodiments of the present disclosure is explained, the related hardware devices and application scenarios are briefly introduced to aid understanding of the scheme.
Live broadcast simultaneous transmission proofreading refers to the following: subtitles are added to the anchor's live content, and the subtitled content is then sent to the viewing end, so that users at the viewing end see the live picture with subtitles. In the subtitle adding process, a machine first performs speech recognition on the live audio to obtain a first subtitle to be proofread, and then performs machine translation on the first subtitle to obtain a second subtitle to be proofread (for example, the first subtitle is Chinese and the second subtitle is the corresponding English). An original-text proofreader proofreads the first subtitles and manually modifies any errors found; a translation proofreader proofreads the second subtitles and manually modifies any errors found. It is understood that the original-text proofreader and the translation proofreader may be the same person or different persons; usually they are different persons, which reduces the workload of each and improves working efficiency and proofreading accuracy.
The live broadcast simultaneous transmission proofreading flow includes: the live broadcast simultaneous transmission hardware device pulls the anchor's live video stream from a server or from the anchor end, and records and processes it. The processing includes collecting the audio in the live video stream, performing speech recognition on the audio to obtain the first subtitle to be proofread, and translating the first subtitle to obtain the second subtitle to be proofread. The recorded live video stream is then played through an audio and video device, and the first subtitle and the second subtitle are displayed on a display interface. The original-text proofreader proofreads the first subtitle and manually modifies any errors found, and the translation proofreader proofreads the second subtitle and manually modifies any errors found.
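The processing part of this flow can be sketched as follows. This is a minimal illustration only, assuming the audio has already been split into timestamped segments; `recognize` and `translate` are hypothetical stand-ins for whatever speech recognition and machine translation engines are used, and `SubtitlePair` and `process_stream` are illustrative names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class SubtitlePair:
    seq: int       # screen-casting progress sequence number (1-based)
    start_ms: int  # start timestamp of the audio segment
    end_ms: int    # end timestamp of the audio segment
    first: str     # first subtitle: speech recognition result, to be proofread
    second: str    # second subtitle: machine translation, to be proofread


def process_stream(
    segments: Iterable[Tuple[int, int, bytes]],
    recognize: Callable[[bytes], str],  # hypothetical ASR engine
    translate: Callable[[str], str],    # hypothetical MT engine
) -> List[SubtitlePair]:
    """Turn collected audio segments into first/second subtitles awaiting proofreading."""
    pairs = []
    for seq, (start_ms, end_ms, audio) in enumerate(segments, start=1):
        first = recognize(audio)   # speech recognition -> first subtitle
        second = translate(first)  # machine translation -> second subtitle
        pairs.append(SubtitlePair(seq, start_ms, end_ms, first, second))
    return pairs


# Usage with stub engines (dictionary lookups instead of real ASR/MT):
fake_asr = {b"seg1": "你好", b"seg2": "欢迎收看"}
fake_mt = {"你好": "Hello", "欢迎收看": "Welcome"}
pairs = process_stream(
    [(0, 1200, b"seg1"), (1200, 2500, b"seg2")],
    fake_asr.__getitem__,
    fake_mt.__getitem__,
)
```

Each pair keeps the timestamps of its audio, which is what later lets a subtitle be mapped back to its video segment.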
Optionally, refer to the schematic structural diagram of a live broadcast simultaneous transmission hardware device shown in fig. 1. Here, the live broadcast simultaneous transmission hardware device and the audio and video device are the same device, corresponding to the device 24 in fig. 1. The original-text proofreader and the translation proofreader each use their own live broadcast simultaneous transmission hardware device: for example, the original-text proofreader proofreads the first subtitles on the device 24 (which may be regarded as a first live broadcast simultaneous transmission hardware device), and the translation proofreader proofreads the second subtitles on the device 25 (a backup of the device 24, which may be regarded as a second live broadcast simultaneous transmission hardware device). The terminal 21 is the anchor's terminal and uploads the live video stream to the server 22. The device 24 pulls the live video stream from the terminal 21 or the server 22; for example, the device 24 pulls the live video stream from the server 22 according to the URL (Uniform Resource Locator) of the live video stream. The terminal 27 is the terminal of a viewing user, and the device 26 is a server.
The device 24 may start pulling the live video stream at any time. Optionally, the device 24 starts pulling the live video stream after the original-text proofreader issues a "start instruction". For example, if the original-text proofreader clicks a button or icon in the user interface of the device 24 at 9:50 of a given day, i.e., issues a "start instruction", the device 24 pulls the live video stream from 9:50 onward. Further, if the original-text proofreader clicks "start live" on the user interface of the device 24 at 10:00 of that day, the device 24 records the live video stream it has pulled from 10:00 onward, and synchronously processes the pulled live video stream from 10:00 onward. The processing includes: collecting the audio in the live video stream; performing speech recognition on the collected audio to obtain a first subtitle; displaying the first subtitle on the display interface of the device 24 so that the original-text proofreader can proofread it; translating the speech recognition result (for example, Chinese text) to obtain translated text (for example, English), i.e., a second subtitle; and displaying the second subtitle on the display interface of the device 25 so that the translation proofreader can proofread it.
Optionally, refer to the schematic structural diagram of another live broadcast simultaneous transmission hardware device shown in fig. 2. Here, the live broadcast simultaneous transmission hardware device and the audio and video device are two different devices: for example, the live broadcast simultaneous transmission hardware device corresponds to the device 24 in fig. 2, and the audio and video device corresponds to the second server 23 in fig. 2. The original-text proofreader and the translation proofreader proofread subtitles on different live broadcast simultaneous transmission hardware devices: for example, the original-text proofreader proofreads on the device 24 (which may be regarded as a first live broadcast simultaneous transmission hardware device), and the translation proofreader proofreads on the device 25 (a backup of the device 24, which may be regarded as a second live broadcast simultaneous transmission hardware device). The terminal 21 is the anchor's terminal and uploads the live video stream to the first server 22, and the second server 23 pulls the live video stream from the first server 22 or the terminal 21. The terminal 27 is the terminal of a viewing user, and the device 26 is a server.
The second server 23 may start pulling the live video stream from the first server 22 or the terminal 21 at any time. Optionally, the second server 23 starts pulling the live video stream after the original-text proofreader issues a "start instruction". For example, the original-text proofreader clicks a button or icon in the user interface of the device 24 at 9:50 of a given day to issue a "start instruction"; the device 24 sends the "start instruction" to the second server 23, and upon receiving it the second server 23 starts pulling the live video stream from the first server 22 or the terminal 21. At 10:00 of that day, the original-text proofreader clicks the "start live" button on the user interface of the device 24, and the device 24 sends a recording instruction to the second server 23 in response to the click. Assuming negligible transmission delay, i.e., the second server 23 receives the recording instruction at 10:00, the second server 23 records the pulled live video stream from 10:00 onward and synchronously processes it from 10:00 onward; that is, recording and processing of the live video stream are performed synchronously.
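As an illustration of the timeline in these examples (pull from 9:50, record and process from 10:00), the following sketch models the two instructions as timestamps. It assumes, as the example does, that each instruction takes effect immediately; `StreamSession` and its methods are hypothetical names, and the date is arbitrary.

```python
from datetime import datetime
from typing import Optional


class StreamSession:
    """Illustrative model of when a stream is pulled versus recorded and processed.

    Applies equally to the device 24 (fig. 1) or the second server 23 (fig. 2).
    """

    def __init__(self) -> None:
        self.pull_started_at: Optional[datetime] = None
        self.record_started_at: Optional[datetime] = None

    def on_start_instruction(self, t: datetime) -> None:
        # "start instruction": begin pulling the live video stream
        self.pull_started_at = t

    def on_start_live(self, t: datetime) -> None:
        # "start live": recording and processing begin together at t
        self.record_started_at = t

    def is_pulled(self, frame_time: datetime) -> bool:
        return self.pull_started_at is not None and frame_time >= self.pull_started_at

    def is_recorded(self, frame_time: datetime) -> bool:
        # recorded frames are also the ones passed to recognition and translation
        return self.record_started_at is not None and frame_time >= self.record_started_at


day = datetime(2024, 1, 1)  # arbitrary example date
session = StreamSession()
session.on_start_instruction(day.replace(hour=9, minute=50))
session.on_start_live(day.replace(hour=10, minute=0))
```

A frame at 9:55 is therefore pulled but neither recorded nor subtitled; a frame at 10:05 is all three.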
The processing of the live video stream includes: collecting the audio in the live video stream; performing speech recognition on the collected audio to obtain a first subtitle; displaying the first subtitle on the display interface of the device 24 so that the original-text proofreader can proofread it; translating the speech recognition result (for example, Chinese text) to obtain translated text (for example, English text), i.e., a second subtitle; and displaying the second subtitle on the display interface of the device 25 so that the translation proofreader can proofread it.
Taking fig. 1 as an example, if the original-text proofreader modifies a first subtitle (for example, Chinese text) while proofreading on the device 24, the device 24 synchronizes the modified first subtitle to the device 25, so that the translation proofreader can modify the corresponding second subtitle (for example, English text) according to the modified first subtitle. The device 25 then sends the modified second subtitle back to the device 24.
Taking fig. 2 as an example, if the original-text proofreader modifies a first subtitle (for example, Chinese text) while proofreading on the device 24, the device 24 synchronizes the modified first subtitle to the second server 23, which in turn synchronizes it to the device 25, so that the translation proofreader can modify the corresponding second subtitle (for example, English text) according to the modified first subtitle. The device 25 then sends the modified second subtitle to the second server 23, which synchronizes it to the device 24.
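The two synchronization paths (direct in fig. 1, relayed through the second server 23 in fig. 2) can be contrasted in a small sketch. It models each endpoint as an in-memory map from screen-casting progress sequence number to subtitle text; the transport (network messages, acknowledgements, conflict handling) is deliberately omitted, and `Endpoint` and `sync_edit` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Endpoint:
    name: str
    first: Dict[int, str] = field(default_factory=dict)   # seq -> first subtitle
    second: Dict[int, str] = field(default_factory=dict)  # seq -> second subtitle


def sync_edit(kind: str, seq: int, text: str, src: Endpoint, dst: Endpoint,
              relay: Optional[Endpoint] = None) -> List[str]:
    """Apply an edit at src and propagate it to dst, optionally through a relay.

    kind is "first" or "second"; returns the names of the endpoints visited, in order.
    """
    hops = [src] + ([relay] if relay is not None else []) + [dst]
    for node in hops:
        getattr(node, kind)[seq] = text  # each hop stores the latest text
    return [node.name for node in hops]


dev24, dev25, srv23 = Endpoint("device 24"), Endpoint("device 25"), Endpoint("server 23")
# Fig. 1: device 24 -> device 25 directly; the revised translation flows back.
path_fig1 = sync_edit("first", 3, "张三来了", dev24, dev25)
sync_edit("second", 3, "Zhang San is here", dev25, dev24)
# Fig. 2: the same kind of edit is relayed through the second server 23.
path_fig2 = sync_edit("first", 4, "欢迎", dev24, dev25, relay=srv23)
```

Keying by sequence number means the translation proofreader's edit lands on the same row the original-text proofreader changed.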
Fig. 3 is a flowchart of a subtitle optimization method according to an embodiment of the present disclosure. The method is applied to live broadcast simultaneous transmission hardware devices and aims to improve the accuracy and efficiency of proofreading subtitles. The method may be executed by a subtitle optimization apparatus, which may be implemented in software and/or hardware and configured in a live broadcast simultaneous transmission hardware device, such as an electronic terminal, including but not limited to a smartphone, a palmtop computer, a tablet computer, a wearable device with a display screen, a desktop computer, a notebook computer, an all-in-one machine, a smart home device, and the like. As shown in fig. 3, the method may specifically include the following steps:
Step 301: display a user interface, where the user interface includes a player and one or more first subtitles corresponding to an audio stream in a live video stream.
Specifically, a user interface such as the one shown schematically in fig. 4 may be displayed on the display. It includes a player 410 and a plurality of first subtitles 420 corresponding to the audio stream in the live video stream. There may also be only one first subtitle 420; fig. 4 shows three first subtitles 420 as an example (a plurality usually means at least two).
The first subtitle is generally text obtained by extracting audio from the live video stream and performing speech recognition on the extracted audio. Since audio extraction and speech recognition are usually performed automatically by a machine, the accuracy may be low: for example, the spoken name is "Zhang San", but speech recognition outputs a homophone written with different Chinese characters. Therefore, to improve the accuracy of the first subtitle, it is usually checked manually after it is obtained, so that errors can be corrected promptly. While the first subtitle is being proofread, the recorded live video stream is usually played in the player 410, so that the original-text proofreader can proofread the first subtitle while watching the video, which improves proofreading efficiency and accuracy.
Illustratively, as shown in fig. 4, the plurality of first subtitles are displayed in context in a first area of the user interface. Displaying the first subtitles in context allows the original-text proofreader to proofread each first subtitle using the surrounding (vertical) context and to locate and retrieve content quickly, which improves both proofreading accuracy and proofreading efficiency.
In one embodiment, the language of the first subtitle is the same as the language of the audio stream. For example, if the language of the audio stream is Chinese, the first subtitle is Chinese text, and if the language of the audio stream is English, the first subtitle is English text.
In one embodiment, the language of the first subtitle is different from the language of the audio stream. For example, if the language of the audio stream is Chinese, the first subtitle is English text, and if the language of the audio stream is English, the first subtitle is Chinese text.
Step 302: in response to a trigger operation for a target subtitle, play the live video stream segment corresponding to the target subtitle in the player, so that the user can proofread the target subtitle by watching the live video stream segment.
The target subtitle is one of the one or more first subtitles, for example, a subtitle 420 shown in fig. 4. The trigger operation for the target subtitle may be clicking the target subtitle, sliding the target subtitle, clicking a control associated with the target subtitle, pressing a shortcut key while the target subtitle is in a specific state, or the like.
Specifically, playing the live video stream segment corresponding to the target subtitle in the player in response to the trigger operation for the target subtitle includes:
in response to a trigger operation on a play control associated with the target subtitle, playing the live video stream segment corresponding to the target subtitle in the player, where the play control is displayed at a position associated with the target subtitle when the target subtitle is in an editing state. For example, the target subtitle enters the editing state when the mouse hovers over it, and the original-text proofreader can then edit it by deleting or modifying a character or adding a character; alternatively, the target subtitle enters the editing state when it is selected, or when a related control is clicked. As shown in fig. 5, when the target subtitle 420 is in the editing state, a play control 510 is displayed at a position associated with the target subtitle 420, and when the original-text proofreader clicks the play control 510, the live video stream segment corresponding to the target subtitle 420 is played in the player 410. The live video stream segment corresponding to the target subtitle 420 is the segment whose audio yields the target subtitle 420 through speech recognition, that is, the segment whose audio expresses the semantics of the target subtitle 420.
Optionally, playing the live video stream segment corresponding to the target subtitle in the player in response to the trigger operation for the target subtitle includes:
when the target subtitle is in the editing state, playing the live video stream segment corresponding to the target subtitle in the player in response to a trigger operation on a preset shortcut key.
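A minimal sketch of both trigger variants (play control and shortcut key), assuming each first subtitle carries the start and end timestamps of the audio it was recognized from. `Player`, `play_range`, and the `Ctrl+P` shortcut are hypothetical; a real player would seek to `start_ms` and stop at `end_ms`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PLAY_SHORTCUT = "Ctrl+P"  # hypothetical preset shortcut key


@dataclass
class Subtitle:
    seq: int
    text: str
    start_ms: int  # start of the audio recognized into this subtitle
    end_ms: int    # end of that audio
    editing: bool = False


class Player:
    """Stand-in for the player 410; records the range it was asked to play."""

    def __init__(self) -> None:
        self.playing: Optional[Tuple[int, int]] = None

    def play_range(self, start_ms: int, end_ms: int) -> None:
        self.playing = (start_ms, end_ms)


def play_control_visible(sub: Subtitle) -> bool:
    # the play control is shown next to the subtitle only in the editing state
    return sub.editing


def on_play_control_clicked(sub: Subtitle, player: Player) -> None:
    player.play_range(sub.start_ms, sub.end_ms)


def on_key(key: str, sub: Subtitle, player: Player) -> bool:
    """Shortcut variant: play the segment only if the subtitle is being edited."""
    if sub.editing and key == PLAY_SHORTCUT:
        player.play_range(sub.start_ms, sub.end_ms)
        return True
    return False


player = Player()
target = Subtitle(seq=2, text="欢迎收看", start_ms=1200, end_ms=2500)
ignored = on_key(PLAY_SHORTCUT, target, player)  # not editing yet -> False
target.editing = True                            # e.g. the mouse hovers over it
handled = on_key(PLAY_SHORTCUT, target, player)
```

Gating both paths on the editing state keeps the shortcut from firing while the proofreader is merely scrolling through the list.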
With this technical solution, displaying the plurality of first subtitles in context in the user interface makes it convenient for the original-text proofreader to proofread the first subtitles with reference to their context, improving proofreading efficiency and accuracy. When the original-text proofreader wants to see the video picture corresponding to a target subtitle, they can trigger an operation on the target subtitle to view that picture in the player; any part of the video can thus be played back conveniently, and the first subtitle can be proofread with reference to the video.
In some embodiments, to further facilitate proofreading of the first subtitles, the recorded live video stream may be played in one player while video segments are played back in another. Specifically, as shown in fig. 6, the player 410 includes a first player 610 and a second player 620. Playing the live video stream segment corresponding to the target subtitle in the player includes playing it in the first player 610. In addition, the recorded live video stream is played in the second player 620 in response to a live video stream playing instruction, and while the recorded live video stream is playing, in response to a first subtitle modification instruction, the first subtitle indicated by that instruction is modified. For example, when the original-text proofreader clicks (by touch, mouse, or the like) a "start live" icon or button on the user interface of the live broadcast simultaneous transmission hardware device (such as the "start live" icon 630 shown in fig. 6), a live video stream playing instruction is triggered, and in response the recorded live video stream is played in the second player 620; the original-text proofreader may also pause playback at any time according to their own proofreading progress. The original-text proofreader can thus control playback of the live video stream in real time to match their proofreading progress, and can proofread while watching the live picture and listening to the audio, using both the heard audio and the anchor's lip movements in the live picture to check the first subtitle, which improves proofreading accuracy and efficiency.
During proofreading, if the original-text proofreader finds that a first subtitle does not match the text they determine from what they hear and see in the live video stream, they modify the first subtitle, thereby proofreading it.
The positions of the first player 610 and the second player 620 in the user interface are not limited; they may be arranged one above the other, or side by side as shown in fig. 6. The second player 620, which plays the recorded live video stream, may be larger, and the first player 610, which plays back video segments, may be smaller. Alternatively, the original-text proofreader may manually resize the first player 610 and the second player 620, for example enlarging the first player 610 when a video segment needs to be played back, for easier viewing.
On the basis of the foregoing embodiment, before playing the recorded live video stream in the second player in response to a live video stream playing instruction, the method further includes:
acquiring the live video stream according to address information of the live video stream; and recording the live video stream in response to a live broadcast start instruction. The address information of the live video stream is, for example, a source stream URL, such as the address of the anchor's stream. The source stream URL may be filled in manually by relevant staff or automatically by a machine; for example, when a user selects a target live video stream through a live video stream selection interface, the source stream URL of the target live video stream is automatically filled into the source stream URL input box.
Further, the method further comprises:
in response to a second subtitle display instruction, displaying, in the user interface, the second subtitles respectively corresponding to the one or more first subtitles, where the language of the second subtitles differs from the language of the audio stream and the second subtitles are displayed in context in a second area of the user interface; in response to a second subtitle hiding instruction, controlling the second area to be hidden in the user interface; and in response to a second subtitle modification instruction, modifying the second subtitle indicated by that instruction. Any first subtitle among the one or more first subtitles and its corresponding second subtitle are displayed side by side in the user interface for comparison.
Optionally, the second subtitle display instruction may be triggered by clicking a preset icon or button in the user interface. For example, the user interface shown in fig. 7 includes a preset icon 710; clicking the icon 710 triggers the second subtitle display instruction, and the second subtitles corresponding to the one or more first subtitles are displayed in the user interface, as in the user interface shown in fig. 8, where a second subtitle 820 is displayed for each first subtitle 810. Clicking the preset icon 830 in fig. 8 triggers the second subtitle hiding instruction, which hides the second subtitles and returns to the user interface shown in fig. 7, where only the first subtitles 720 are displayed.
With continued reference to fig. 8, in one embodiment the second subtitles 820 are displayed in context in the second area of the user interface. Displaying the second subtitles in context allows the translation proofreader to proofread them using the surrounding (vertical) context, which improves proofreading accuracy and efficiency.
In an embodiment, any first subtitle among the one or more first subtitles and its corresponding second subtitle are displayed side by side in the user interface, as shown in fig. 8. This lets the original-text proofreader proofread a first subtitle with reference to its second subtitle, and the translation proofreader proofread a second subtitle with reference to its first subtitle, improving proofreading efficiency and accuracy.
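The side-by-side layout and the show/hide toggle can be sketched as plain-text rendering. This is illustrative only: each row pairs a first subtitle with its own second subtitle, the example assumes English audio (so the first subtitle is English, as in one embodiment above), and `render_rows` and `SubtitlePanel` are hypothetical names. Python's padding counts characters rather than display width, so the wide (Chinese) text is kept in the last column.

```python
from typing import List, Tuple

Row = Tuple[int, str, str]  # (screen-casting sequence number, first subtitle, second subtitle)


def render_rows(pairs: List[Row], show_second: bool, width: int = 12) -> List[str]:
    """First subtitles on the left; second subtitles, if shown, on the same row."""
    rows = []
    for seq, first, second in pairs:
        row = f"{seq:>2} | {first:<{width}}"
        if show_second:
            row += f" | {second}"
        rows.append(row.rstrip())
    return rows


class SubtitlePanel:
    """Hypothetical panel state: the second area starts hidden."""

    def __init__(self, pairs: List[Row]) -> None:
        self.pairs = pairs
        self.show_second = False

    def on_display_instruction(self) -> None:  # e.g. icon 710 in fig. 7 clicked
        self.show_second = True

    def on_hide_instruction(self) -> None:     # e.g. icon 830 in fig. 8 clicked
        self.show_second = False

    def rows(self) -> List[str]:
        return render_rows(self.pairs, self.show_second)


panel = SubtitlePanel([(1, "Good evening", "晚上好"), (2, "Welcome", "欢迎")])
hidden = panel.rows()
panel.on_display_instruction()
shown = panel.rows()
```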
Further, the method includes: marking, in the user interface and according to the push progress of the live video stream, the first subtitles that have been pushed, the identification information corresponding to those first subtitles, and the second subtitles corresponding to those first subtitles.
Marking these items according to the push progress of the live video stream allows the original-text proofreader to keep track of the push progress at any time and adjust the proofreading pace accordingly.
Marking the proofread first subtitles in the user interface allows the translation proofreader to proofread the second subtitles based on first subtitles that the original-text proofreader has already proofread; it also keeps the proofreading paces of the two proofreaders balanced, improving the efficiency of both.
As shown in fig. 9, the user interface further includes identification information corresponding to the one or more first subtitles, for example, the sequence numbers "1" to "9" on the left of the first subtitles. Each of these is a screen-casting progress sequence number, and each screen-casting progress sequence number corresponds to the Chinese text of one sentence of the audio. The push state of the subtitles may be indicated by different colors: for example, the green part (such as the subtitle 910 in fig. 9) indicates subtitles that have been pushed (i.e., the live video stream has been pushed to the viewing end, such as the terminal 27 shown in fig. 1), and the remaining non-green part (such as the subtitles indicated by reference numeral 920) indicates subtitles that have not been pushed. The blue part (such as the subtitle 930 in fig. 9, i.e., the 6th subtitle) indicates the subtitle the proofreader is currently proofreading; the subtitles above the blue part (such as those indicated by reference numeral 940, i.e., the 1st to 5th subtitles) have already been proofread, and the subtitles below it (such as those indicated by reference numeral 950, i.e., the 7th to 9th subtitles) have not yet been proofread. Displaying the identification information in the user interface lets the original-text proofreader track the live push progress at any time and adjust the proofreading pace accordingly. For example, it can be determined from fig. 9 that the 2nd to 5th subtitles have been proofread but not yet pushed, and that the 7th to 9th subtitles have not yet been proofread.
Correspondingly, taking fig. 1 as an example, the device 24 may send the screen-casting progress sequence numbers of the first subtitles already proofread by the original-text proofreader to the device 25, and the device 25 may mark those first subtitles in its local user interface, for example with a certain color. The translation proofreader can then proofread the second subtitles based on the first subtitles already proofread, while the proofreading paces of the two proofreaders are kept balanced and the efficiency of both is improved.
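The marking in fig. 9 can be derived from two numbers: the sequence number of the last pushed subtitle and the sequence number currently being proofread. The sketch below is illustrative; it assumes proofreading proceeds top to bottom and that a subtitle is pushed only after it has been proofread, and it takes the 1st subtitle as the pushed one, consistent with the statement that the 2nd to 5th are proofread but not pushed. `classify` and `proofread_seqs` are hypothetical names.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    PUSHED = "pushed"                               # green in fig. 9
    PROOFREAD_NOT_PUSHED = "proofread, not pushed"
    CURRENT = "being proofread"                     # blue in fig. 9
    NOT_PROOFREAD = "not proofread"


def classify(total: int, last_pushed: int, current: int) -> Dict[int, Status]:
    """Map screen-casting progress sequence numbers 1..total to a display status.

    Assumes pushed subtitles have always been proofread, i.e. last_pushed < current.
    """
    status = {}
    for seq in range(1, total + 1):
        if seq <= last_pushed:
            status[seq] = Status.PUSHED
        elif seq < current:
            status[seq] = Status.PROOFREAD_NOT_PUSHED
        elif seq == current:
            status[seq] = Status.CURRENT
        else:
            status[seq] = Status.NOT_PROOFREAD
    return status


def proofread_seqs(status: Dict[int, Status]) -> List[int]:
    """Sequence numbers the device 24 could send to the device 25 for marking."""
    done = {Status.PUSHED, Status.PROOFREAD_NOT_PUSHED}
    return sorted(seq for seq, s in status.items() if s in done)


# Fig. 9: nine subtitles, the 1st pushed, the 6th being proofread.
fig9 = classify(total=9, last_pushed=1, current=6)
```

The device 25 could apply the same classification locally once it receives the sequence numbers, so both proofreaders see the same progress.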
Further, the method further comprises:
in response to a live broadcast delay setting instruction, pushing a live video stream that includes at least the first subtitles according to the live broadcast delay set by the instruction; or, in response to a live broadcast delay setting instruction, pushing a subtitle file formed from the first subtitles, together with the recorded live video stream, according to the live broadcast delay set by the instruction. The live video stream including the first subtitles refers to the live video stream with subtitles added. For example, if the live broadcast delay set by the instruction is 30 minutes and the original-text proofreader triggers the live broadcast start instruction at 10:00 of a given day, the device 24 (fig. 1) or the server 23 (fig. 2) starts recording the pulled live video stream, and pushes the subtitled live video stream to the terminal 27 of the viewing user at 10:30 of that day. Alternatively, the device 24 or the server 23 pushes the subtitle file and the recorded live video stream to the terminal 27 of the viewing user at 10:30 of that day.
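The delay arithmetic in this example amounts to a fixed offset between recording and pushing: content recorded at time t is pushed at t plus the configured delay, which presumably leaves the proofreaders that margin to finish each subtitle. A minimal sketch with hypothetical function names and an arbitrary date:

```python
from datetime import datetime, timedelta


def push_time(recorded_at: datetime, delay: timedelta) -> datetime:
    """Moment at which content recorded at recorded_at is pushed to viewers."""
    return recorded_at + delay


def is_due(recorded_at: datetime, now: datetime, delay: timedelta) -> bool:
    """True once the live broadcast delay has elapsed for this piece of content."""
    return now >= push_time(recorded_at, delay)


delay = timedelta(minutes=30)        # set by the live broadcast delay setting instruction
start = datetime(2024, 1, 1, 10, 0)  # live broadcast start instruction (example date)
```

With these values, content recorded at 10:00 becomes due at 10:30, and content recorded at 10:05 at 10:35.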
Taking fig. 1 as an example, in one embodiment, the device 24 may generate a subtitle image from each proofread first subtitle, its second subtitle, and the time information corresponding to the first subtitle. The time information of the subtitle image is the time information corresponding to the first subtitle, namely the timestamp of the audio frame corresponding to the first subtitle. The device 24 then determines, according to the time information of the generated subtitle image, the video frame corresponding to that time information in the live video stream, and combines the subtitle image with the picture of that video frame, thereby obtaining the live video stream with subtitles added. When a preset delay time (for example, 30 minutes) is reached, the device 24 may send the subtitled live video stream to the server 26, and the server 26 sends it to the terminal 27, so that the user at the viewing end can view the live video stream with the first subtitles and/or the second subtitles added.
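Matching a subtitle image to video frames by time information is essentially a range lookup over sorted frame timestamps. The sketch below assumes each subtitle image carries the start and end timestamps of its audio and stays on screen for that interval, that video and audio timestamps share one clock, and a 25 fps stream for the example; the actual compositing is represented only by an identifier, and all names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List


def frames_for_subtitle(frame_pts: List[int], start_ms: int, end_ms: int) -> List[int]:
    """Indices of video frames whose timestamp falls in [start_ms, end_ms).

    frame_pts must be sorted in ascending order.
    """
    lo = bisect_left(frame_pts, start_ms)
    hi = bisect_left(frame_pts, end_ms)
    return list(range(lo, hi))


def overlay_plan(frame_pts: List[int], subtitles: List[dict]) -> Dict[int, str]:
    """Map frame index -> subtitle image id to be composited onto that frame."""
    plan = {}
    for sub in subtitles:
        for idx in frames_for_subtitle(frame_pts, sub["start_ms"], sub["end_ms"]):
            plan[idx] = sub["image_id"]  # a real system would blend pixels here
    return plan


pts = list(range(0, 400, 40))  # 25 fps: frames at 0, 40, ..., 360 ms
subs = [{"image_id": "sub-1", "start_ms": 100, "end_ms": 200},
        {"image_id": "sub-2", "start_ms": 200, "end_ms": 330}]
plan = overlay_plan(pts, subs)
```

Using half-open intervals means a frame exactly at a boundary (200 ms here) belongs to the later subtitle only, so adjacent subtitles never overlap on one frame.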
In another embodiment, the device 24 may generate a subtitle file from each proofread first subtitle, its second subtitle, and the time information corresponding to the first subtitle. When a preset delay time is reached, the device 24 sends the subtitle file and the recorded live video stream to the server 26, and the server 26 sends them to the terminal 27. The terminal 27 plays the subtitle file and the live video stream synchronously in the player; for this, the time information in the subtitle file needs to be aligned with the time information in the live video stream, so that the user at the viewing end can view the live video stream with the first subtitles and/or the second subtitles added.
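The disclosure does not specify a subtitle file format. As one possible choice, the sketch below writes the proofread first and second subtitles as a SubRip (SRT) file, taking each cue's times from the first subtitle's time information so that the file can be aligned with the recorded live video stream. The SRT format, the millisecond time base, and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple


def srt_timestamp(ms: int) -> str:
    # SRT cue times use the form HH:MM:SS,mmm.
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(cues: List[Tuple[int, int, str, str]]) -> str:
    """Build SRT text from (start_ms, end_ms, first_subtitle, second_subtitle)."""
    blocks = []
    for index, (start, end, first, second) in enumerate(cues, start=1):
        lines = [str(index), f"{srt_timestamp(start)} --> {srt_timestamp(end)}", first]
        if second:
            lines.append(second)  # translated line shown under the original
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


print(build_srt([(1000, 3500, "Hola a todos", "Hello everyone")]))
```

Writing both languages into one cue keeps the original and translated lines in lockstep; separate files per language would be an equally valid design if the player lets viewers switch tracks.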
Taking fig. 2 as an example, in one embodiment, the server 23 may generate a subtitle image from each proofread first subtitle, its second subtitle, and the time information corresponding to the first subtitle. The time information of the subtitle image is the time information corresponding to the first subtitle, which is the timestamp of the audio frame corresponding to the first subtitle. The server 23 then determines, according to the time information of the generated subtitle image, the video frame in the live video stream that corresponds to that time information, and combines the subtitle image with the picture of that video frame, thereby obtaining the live video stream with the subtitles added. When a preset delay time (for example, 30 minutes) is reached, the server 23 may transmit the live video stream with the subtitles added to the server 26, and the server 26 may transmit it to the terminal 27 of the viewing user.
In another embodiment, the server 23 may generate a subtitle file from each proofread first subtitle, its second subtitle, and the time information corresponding to the first subtitle. When a preset delay time is reached, the server 23 sends the subtitle file and the recorded live video stream to the server 26, and the server 26 sends them to the terminal 27 of the viewing user. The terminal 27 plays the subtitle file and the live video stream synchronously in the player, for which the time information in the subtitle file needs to be aligned with the time information in the live video stream.
An optional processing flow of the subtitle optimization scheme provided by the embodiments of the present disclosure is as follows. First, the live video stream is pulled and recorded, and while it is being recorded it is processed to obtain the first subtitles and the second subtitles. The first and second subtitles are displayed both on the user interface of the original text proofreader and on the user interface of the translation proofreader; the original text proofreader proofreads the first subtitles, and the translation proofreader proofreads the second subtitles. A subtitle image or a subtitle file is then generated from the proofread first and second subtitles. If a subtitle image is generated, it is fused into the live video stream to obtain a live video stream with subtitles, which is then pushed to the terminal of the viewing user. If a subtitle file is generated, the subtitle file and the recorded live video stream are pushed together to the terminal of the viewing user.
Further, referring to the schematic diagram of the user interface shown in fig. 10, the player 420 further includes a third player 630, and the method further includes playing, in the third player 630, the live video stream containing at least the first subtitle, so that the original text proofreader can conveniently check the video frames after the subtitles are added and confirm whether the proofread subtitles are accurate and consistent with the video frames.
With the subtitle optimization method provided by the embodiments of the present disclosure, a player is displayed on the user interface, and when a live video stream playing instruction is received, the recorded live video stream is played in the player. The original text proofreader can therefore watch the live picture corresponding to the live video stream while proofreading the first subtitles corresponding to the audio stream in that stream, that is, listen, watch, and proofread at the same time. When the original text proofreader wants to see the video picture corresponding to a target subtitle, he or she can trigger an operation on the target subtitle to view that picture in the player. This makes it easy to play back any video frame and to proofread the first subtitles with reference to the video.
Fig. 11 is a schematic structural diagram of a subtitle optimizing apparatus according to an embodiment of the present disclosure. The apparatus provided by the embodiments of the present disclosure can be configured in live broadcast simultaneous interpretation hardware equipment. As shown in fig. 11, the apparatus specifically includes a first display module 1110 and a playing module 1120.
The first display module 1110 is configured to display a user interface, the user interface including a player and one or more first subtitles corresponding to an audio stream in a live video stream. The playing module 1120 is configured to, in response to a trigger operation on a target subtitle, play the live video stream segment corresponding to the target subtitle in the player, so that a user proofreads the target subtitle by watching the live video stream segment, the target subtitle being one of the one or more first subtitles.
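One way the playing module could locate the live video stream segment for a target subtitle is sketched below, assuming each first subtitle stores the start and end timestamps of its audio within the recorded stream. The padding around the segment and all names are hypothetical; the returned values would be passed to whatever seek and play calls the actual player exposes.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    text: str
    start_ms: int  # assumed: timestamp of the first audio frame of this subtitle
    end_ms: int    # assumed: timestamp where this subtitle's audio ends


def segment_for(sub: Subtitle, recorded_ms: int, pad_ms: int = 500) -> tuple:
    """Return (seek_position_ms, duration_ms) of the segment to play.

    The segment is widened by pad_ms on both sides (a hypothetical
    convenience) and clamped to what has been recorded so far.
    """
    begin = max(0, sub.start_ms - pad_ms)
    end = min(recorded_ms, sub.end_ms + pad_ms)
    return begin, max(0, end - begin)


target = Subtitle("Hello everyone", start_ms=1000, end_ms=3500)
print(segment_for(target, recorded_ms=60_000))  # (500, 3500)
```

Clamping to the recorded length matters in a live setting, where a subtitle near the live edge may refer to audio that has only just been recorded.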
Optionally, the playing module 1120 is specifically configured to: in response to a trigger operation on a play control associated with the target subtitle, play the live video stream segment corresponding to the target subtitle in the player, wherein the play control is displayed at a position associated with the target subtitle when the target subtitle is in an editing state.
Optionally, the playing module 1120 is specifically configured to: when the target subtitle is in an editing state, play the live video stream segment corresponding to the target subtitle in the player in response to a trigger operation on a preset shortcut key.
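The two trigger paths above (a play control shown beside a subtitle in the editing state, and a preset shortcut key) can be sketched as a small controller. The key binding, the class, and the `play_segment` callback are hypothetical; the disclosure does not name a specific shortcut or interface.

```python
from typing import Callable, Optional


class SubtitleEditor:
    """Hypothetical controller for the proofreading subtitle list."""

    PLAY_SHORTCUT = "ctrl+p"  # assumed binding, for illustration only

    def __init__(self, play_segment: Callable[[int], None]):
        self.play_segment = play_segment      # plays the segment for a subtitle index
        self.editing: Optional[int] = None    # index of the subtitle being edited

    def start_editing(self, index: int) -> None:
        self.editing = index

    def play_control_visible(self, index: int) -> bool:
        # The play control is shown only beside the subtitle being edited.
        return self.editing == index

    def click_play_control(self, index: int) -> None:
        if self.play_control_visible(index):
            self.play_segment(index)

    def key_pressed(self, key: str) -> None:
        # The shortcut only acts on a subtitle that is in the editing state.
        if key == self.PLAY_SHORTCUT and self.editing is not None:
            self.play_segment(self.editing)


played = []
editor = SubtitleEditor(played.append)
editor.key_pressed("ctrl+p")   # ignored: nothing is being edited
editor.start_editing(3)
editor.key_pressed("ctrl+p")
editor.click_play_control(3)
print(played)  # [3, 3]
```

Tying both paths to the editing state means the proofreader never has to leave the text field to replay the audio and video for the subtitle being corrected.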
Optionally, the player includes a first player and a second player; the playing module 1120 is configured to play the live video stream segment corresponding to the target subtitle in the first player and, in response to a live video stream playing instruction, play the recorded live video stream in the second player.
Optionally, the apparatus further includes a modification module configured to, while the recorded live video stream is being played, modify, in response to a first subtitle modification instruction, the first subtitle indicated by that instruction.
Optionally, the apparatus further includes an obtaining module and a recording module. The obtaining module is configured to obtain the live video stream according to address information of the live video stream before the recorded live video stream is played in the second player, and the recording module is configured to record the live video stream in response to a live broadcast start instruction.
Optionally, the apparatus further includes a pushing module configured to: in response to a live broadcast delay setting instruction, push a live video stream containing at least the first subtitle according to the live broadcast delay set by that instruction; or, in response to a live broadcast delay setting instruction, push a subtitle file formed from the first subtitle, together with the recorded live video stream, according to the live broadcast delay set by that instruction.
Optionally, the player further includes a third player, and the playing module 1120 is configured to play, in the third player, a live video stream containing at least the first subtitle.
Optionally, the language corresponding to the one or more first subtitles is the same as the language corresponding to the audio stream.
Optionally, the apparatus further includes a second display module configured to, in response to a second subtitle display instruction, display in the user interface second subtitles respectively corresponding to the one or more first subtitles, wherein the language of the second subtitles differs from the language of the audio stream, and the second subtitles are displayed in a second area of the user interface in a contextual manner.
Optionally, the apparatus further includes a hiding module configured to, in response to a second subtitle hiding instruction, control the second area to be hidden or shown in the user interface; the modification module is further configured to modify, in response to a second subtitle modification instruction, the second subtitle indicated by that instruction.
Optionally, the plurality of first subtitles are displayed in a first area of the user interface in a contextual manner, and any one of the one or more first subtitles and the second subtitle corresponding to it are arranged side by side horizontally in the user interface for comparison.
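The contextual, side-by-side arrangement of first and second subtitles, together with hiding the second area, might be modeled as below. Rendering is reduced to plain text rows purely for illustration; the function name, the column width, and the `|` separator are assumptions.

```python
from typing import List, Optional


def layout_rows(first: List[str], second: Optional[List[str]], show_second: bool,
                width: int = 20) -> List[str]:
    """Render one row per first subtitle, preserving order so context stays visible.

    When the second area is shown, each row pairs a first subtitle with its
    corresponding second subtitle horizontally; otherwise only the first
    subtitles are rendered (the second area is hidden).
    """
    rows = []
    for i, text in enumerate(first):
        if show_second and second is not None:
            rows.append(f"{text:<{width}}| {second[i]}")
        else:
            rows.append(text)
    return rows


first = ["Hola a todos", "Bienvenidos"]
second = ["Hello everyone", "Welcome"]
for row in layout_rows(first, second, show_second=True, width=14):
    print(row)
```

Pairing by index assumes exactly one second subtitle per first subtitle, which matches the one-to-one correspondence described above.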
Optionally, the user interface further includes identification information corresponding to the one or more first subtitles.
Optionally, the apparatus further includes a marking module configured to mark, in the user interface and according to the pushing progress of the live video stream, each pushed first subtitle, the identification information corresponding to the pushed first subtitle, and the second subtitle corresponding to the pushed first subtitle.
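The marking module's behaviour can be illustrated with a short sketch, assuming push progress is tracked as the timestamp up to which the live video stream has been pushed, and that a subtitle counts as pushed once its end time is at or before that point. A single flag per row marks the first subtitle together with its identification information and its second subtitle; the field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    ident: str     # identification information, e.g. a sequence number (assumed)
    first: str
    second: str
    end_ms: int    # assumed end timestamp of the first subtitle
    pushed: bool = False


def mark_pushed(rows: List[Row], push_progress_ms: int) -> int:
    """Mark every row whose subtitle has been pushed; return how many are marked."""
    for row in rows:
        # One flag covers the first subtitle, its identifier, and its second subtitle.
        row.pushed = row.end_ms <= push_progress_ms
    return sum(row.pushed for row in rows)


rows = [Row("1", "Hola", "Hello", 2000), Row("2", "Adios", "Bye", 4000)]
print(mark_pushed(rows, 3000))  # 1
```

Recomputing the flag from the current progress on each call keeps the marks correct even if the push progress is reported out of order.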
With the subtitle optimizing apparatus provided by the embodiments of the present disclosure, the plurality of first subtitles are displayed in context on the user interface, so that the original text proofreader can proofread the first subtitles with reference to their context, which improves proofreading efficiency and accuracy. When the original text proofreader wants to see the video picture corresponding to a target subtitle, he or she can trigger an operation on the target subtitle to view that picture in the player, which makes it easy to play back any video frame and to proofread the first subtitles with reference to the video.
The apparatus provided in the embodiments of the present disclosure can perform the steps of the method provided in the method embodiments of the present disclosure; its beneficial effects are not repeated here.
Fig. 12 is a schematic structural diagram of an electronic device in an embodiment of the disclosure. Referring now specifically to fig. 12, a schematic diagram of an electronic device 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device 500 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), a wearable electronic device, and the like, and fixed terminals such as a digital TV, a desktop computer, a smart home device, and the like. The electronic device shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 12, the electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes to implement the … method of the embodiments described in this disclosure according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 12 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart, thereby implementing the method as described above. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
displaying a user interface, wherein the user interface includes a player and one or more first subtitles corresponding to an audio stream in a live video stream; and in response to a trigger operation on a target subtitle, playing a live video stream segment corresponding to the target subtitle in the player, so that a user proofreads the target subtitle by watching the live video stream segment, wherein the target subtitle is one of the one or more first subtitles.
Optionally, when the one or more programs are executed by the electronic device, the electronic device may further perform other steps described in the above embodiments.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a subtitle optimization method, including: displaying a user interface, wherein the user interface includes a player and one or more first subtitles corresponding to an audio stream in a live video stream; and in response to a trigger operation on a target subtitle, playing a live video stream segment corresponding to the target subtitle in the player, so that a user proofreads the target subtitle by watching the live video stream segment, wherein the target subtitle is one of the one or more first subtitles.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, playing, in the player, the live video stream segment corresponding to the target subtitle in response to the trigger operation on the target subtitle includes: in response to a trigger operation on a play control associated with the target subtitle, playing the live video stream segment corresponding to the target subtitle in the player, wherein the play control is displayed at a position associated with the target subtitle when the target subtitle is in an editing state.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, playing, in the player, the live video stream segment corresponding to the target subtitle in response to the trigger operation on the target subtitle includes: when the target subtitle is in an editing state, playing the live video stream segment corresponding to the target subtitle in the player in response to a trigger operation on a preset shortcut key.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, the player includes a first player and a second player, and playing the live video stream segment corresponding to the target subtitle in the player includes: playing the live video stream segment corresponding to the target subtitle in the first player.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, the method further includes: in response to a live video stream playing instruction, playing the recorded live video stream in the second player; and while the recorded live video stream is being played, modifying, in response to a first subtitle modification instruction, the first subtitle indicated by that instruction.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, before the recorded live video stream is played in the second player in response to the live video stream playing instruction, the method further includes: obtaining the live video stream according to address information of the live video stream; and recording the live video stream in response to a live broadcast start instruction.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, the method further includes: in response to a live broadcast delay setting instruction, pushing a live video stream containing at least the first subtitle according to the live broadcast delay set by that instruction; or, in response to a live broadcast delay setting instruction, pushing a subtitle file formed from the first subtitle, together with the recorded live video stream, according to the live broadcast delay set by that instruction.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, the player further includes a third player, and the method further includes: playing, in the third player, a live video stream containing at least the first subtitle.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, the language of the one or more first subtitles is the same as the language of the audio stream, and the method further includes: in response to a second subtitle display instruction, displaying in the user interface second subtitles respectively corresponding to the one or more first subtitles, wherein the language of the second subtitles differs from the language of the audio stream, and the second subtitles are displayed in a second area of the user interface in a contextual manner.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, the method further includes: in response to a second subtitle hiding instruction, controlling the second area to be hidden or shown in the user interface; and in response to a second subtitle modification instruction, modifying the second subtitle indicated by that instruction.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, the plurality of first subtitles are displayed in a first area of the user interface in a contextual manner, and any one of the one or more first subtitles and the second subtitle corresponding to it are arranged side by side horizontally in the user interface for comparison.
According to one or more embodiments of the present disclosure, in the method provided by the present disclosure, optionally, the user interface further includes identification information respectively corresponding to the one or more first subtitles, and the method further includes: marking, in the user interface and according to the pushing progress of the live video stream, each pushed first subtitle, the identification information corresponding to the pushed first subtitle, and the second subtitle corresponding to the pushed first subtitle.
According to one or more embodiments of the present disclosure, there is provided a subtitle optimizing apparatus, including: a first display module configured to display a user interface, the user interface including a player and one or more first subtitles corresponding to an audio stream in a live video stream; and a playing module configured to, in response to a trigger operation on a target subtitle, play a live video stream segment corresponding to the target subtitle in the player, so that a user proofreads the target subtitle by watching the live video stream segment, wherein the target subtitle is one of the one or more first subtitles.
According to one or more embodiments of the present disclosure, in the subtitle optimizing apparatus provided by the present disclosure, optionally, the playing module is specifically configured to: in response to a trigger operation on a play control associated with the target subtitle, play the live video stream segment corresponding to the target subtitle in the player, wherein the play control is displayed at a position associated with the target subtitle when the target subtitle is in an editing state.
According to one or more embodiments of the present disclosure, in the subtitle optimizing apparatus provided by the present disclosure, optionally, the playing module is specifically configured to: when the target subtitle is in an editing state, play the live video stream segment corresponding to the target subtitle in the player in response to a trigger operation on a preset shortcut key.
According to one or more embodiments of the present disclosure, in the subtitle optimizing apparatus provided by the present disclosure, optionally, the player includes a first player and a second player, and the playing module is configured to play the live video stream segment corresponding to the target subtitle in the first player and, in response to a live video stream playing instruction, play the recorded live video stream in the second player.
According to one or more embodiments of the present disclosure, in the subtitle optimizing apparatus provided by the present disclosure, optionally, the apparatus further includes a modification module configured to, while the recorded live video stream is being played, modify, in response to a first subtitle modification instruction, the first subtitle indicated by that instruction.
According to one or more embodiments of the present disclosure, in the subtitle optimizing apparatus provided by the present disclosure, optionally, the apparatus further includes an obtaining module configured to obtain the live video stream according to address information of the live video stream before the recorded live video stream is played in the second player, and a recording module configured to record the live video stream in response to a live broadcast start instruction.
According to one or more embodiments of the present disclosure, in the subtitle optimizing apparatus provided by the present disclosure, optionally, the apparatus further includes a pushing module configured to: in response to a live broadcast delay setting instruction, push a live video stream containing at least the first subtitle according to the live broadcast delay set by that instruction; or, in response to a live broadcast delay setting instruction, push a subtitle file formed from the first subtitle, together with the recorded live video stream, according to the live broadcast delay set by that instruction.
According to one or more embodiments of the present disclosure, in the subtitle optimization apparatus provided by the present disclosure, optionally, the player further includes a third player, and the playing module is configured to play a live video stream including at least the first subtitle in the third player.
According to one or more embodiments of the present disclosure, in the subtitle optimization apparatus provided by the present disclosure, optionally, the language corresponding to the one or more first subtitles is the same as the language corresponding to the audio stream.
According to one or more embodiments of the present disclosure, in the subtitle optimization apparatus provided by the present disclosure, optionally, the apparatus further includes: and the second display module is used for responding to a second caption display instruction, displaying second captions corresponding to the one or more first captions in the user interface respectively, wherein the languages corresponding to the second captions are different from the languages corresponding to the audio streams, and the second captions are displayed in a second area of the user interface in a contextual manner.
According to one or more embodiments of the present disclosure, in the subtitle optimization apparatus provided by the present disclosure, optionally, a hiding module is further included, configured to hide the second area in the user interface in response to a second subtitle hiding instruction; and the modification module is further configured to modify, in response to a second subtitle modification instruction, the second subtitle indicated by the second subtitle modification instruction.
According to one or more embodiments of the present disclosure, in the subtitle optimization apparatus provided by the present disclosure, optionally, the plurality of first subtitles are displayed in context, one after another, in a first area of the user interface; and any first subtitle among the one or more first subtitles and the second subtitle corresponding to that first subtitle are arranged side by side in the user interface for horizontal comparison.
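The side-by-side arrangement, together with the hiding and modification of second subtitles described above, can be sketched in Python as a simple text layout. This is a minimal sketch under stated assumptions: subtitles are keyed by their identification information, and `SubtitleRow`, `build_rows`, `modify_second` and `render` are hypothetical names, not part of the disclosure. A real user interface would render widgets rather than fixed-width text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SubtitleRow:
    sub_id: str            # identification information of the first subtitle
    first: str             # first subtitle, in the language of the audio stream
    second: Optional[str]  # corresponding second subtitle; None when the second area is hidden


def build_rows(first_subs: Dict[str, str], second_subs: Dict[str, str],
               show_second: bool = True) -> List[SubtitleRow]:
    # Pair each first subtitle with its second subtitle by identifier,
    # preserving the order of the first subtitles (dicts keep insertion order).
    rows = []
    for sub_id, text in first_subs.items():
        second = second_subs.get(sub_id, "") if show_second else None
        rows.append(SubtitleRow(sub_id, text, second))
    return rows


def modify_second(second_subs: Dict[str, str], sub_id: str, new_text: str) -> None:
    # Apply a second subtitle modification instruction to the indicated subtitle.
    if sub_id not in second_subs:
        raise KeyError(f"no second subtitle with id {sub_id!r}")
    second_subs[sub_id] = new_text


def render(rows: List[SubtitleRow], width: int = 24) -> str:
    # One line per subtitle: the first subtitle on the left and its second
    # subtitle on the right of the same line, so each pair can be compared.
    lines = []
    for row in rows:
        left = f"[{row.sub_id}] {row.first}"
        lines.append(left if row.second is None else f"{left:<{width}}| {row.second}")
    return "\n".join(lines)
```

Passing `show_second=False` corresponds, in this sketch, to the second area being hidden: only the first area remains.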
According to one or more embodiments of the present disclosure, in the subtitle optimization apparatus provided by the present disclosure, optionally, the user interface further includes identification information corresponding to each of the one or more first subtitles.
According to one or more embodiments of the present disclosure, in the subtitle optimization apparatus provided by the present disclosure, optionally, the apparatus further includes a marking module configured to mark in the user interface, according to the push progress of the live video stream, each pushed first subtitle, the identification information corresponding to that first subtitle, and the second subtitle corresponding to that first subtitle.
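One way to derive the marks from the push progress is sketched below in Python, assuming each first subtitle has a known start time in the stream and the push progress is expressed as the stream time pushed so far. The functions `pushed_ids` and `mark_rows`, and the `*` marker, are hypothetical illustrations rather than the disclosed implementation.

```python
import bisect
from typing import Dict, List, Tuple


def pushed_ids(subtitles: List[Tuple[str, float]], pushed_until: float) -> List[str]:
    """Return the ids of first subtitles whose start time has already been pushed.

    subtitles: (identification information, start time in seconds), sorted by start time.
    pushed_until: stream time, in seconds, up to which the live stream has been pushed.
    """
    starts = [start for _, start in subtitles]
    count = bisect.bisect_right(starts, pushed_until)  # number of starts <= pushed_until
    return [sub_id for sub_id, _ in subtitles[:count]]


def mark_rows(first: Dict[str, str], second: Dict[str, str], marked: List[str]) -> List[str]:
    # Mark the identifier, the first subtitle and its second subtitle together,
    # so all three show the same push state in the interface.
    done = set(marked)
    return [f"{'*' if sid in done else ' '} [{sid}] {text} | {second.get(sid, '')}"
            for sid, text in first.items()]
```

Using a binary search keeps each progress update logarithmic in the number of subtitles, which may matter for long broadcasts; a linear scan would work equally well for short ones.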
In accordance with one or more embodiments of the present disclosure, there is provided an electronic device including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the methods provided by the present disclosure.
According to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods provided by the present disclosure.
Embodiments of the present disclosure also provide a computer program product comprising a computer program or instructions which, when executed by a processor, implement the method as described above.
The foregoing description is merely a description of the preferred embodiments of the disclosure and an illustration of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the concept of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.