CN113992972B - Subtitle display method, device, electronic device and readable storage medium - Google Patents
- Publication number
- CN113992972B (application number CN202111267156.2A)
- Authority
- CN
- China
- Prior art keywords
- image frame
- audio data
- audio
- target video
- objects
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/441—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
- H04N21/4415—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47217—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present application discloses a subtitle display method, device, electronic device, and readable storage medium, and belongs to the field of terminal technology. The subtitle display method includes: during playback of a target video, when it is detected that a first image frame of the target video includes multiple first objects, determining, according to an audio recognition model, a target object corresponding to first audio data from among the multiple first objects, where the first audio data corresponds to the first image frame; and displaying subtitle information corresponding to the first audio data according to the display position of the target object. The audio recognition model is trained on a second image frame and on second audio data corresponding to the second image frame, and the second image frame includes a single first object.
Description
Technical Field
The present application belongs to the field of terminal technology, and specifically relates to a subtitle display method, device, electronic device, and readable storage medium.
Background Art
Subtitles are now widely used in many types of multimedia content, such as movies, TV series, and games. Displaying subtitles lets users appreciate the original soundtrack of a video, and it also helps hearing-impaired users watch the video and follow its content.
With the development of technology, audio data can now be converted into subtitles in real time during video playback by means of a recognition model. However, because effective training data is difficult to obtain, the subtitles produced in real time usually cannot be matched to the characters in the video, which degrades the user's viewing experience.
Summary of the Invention
The purpose of the embodiments of the present application is to provide a subtitle display method, device, electronic device, and readable storage medium that can solve the problem that subtitles are difficult to match to the characters in a video, which degrades the user's viewing experience.
In a first aspect, an embodiment of the present application provides a subtitle display method, the method comprising:
during playback of a target video, when it is detected that a first image frame of the target video includes multiple first objects, determining, according to an audio recognition model, a target object corresponding to first audio data from among the multiple first objects, wherein the first audio data corresponds to the first image frame;
displaying subtitle information corresponding to the first audio data according to the display position of the target object;
wherein the audio recognition model is trained on a second image frame and on second audio data corresponding to the second image frame, and the second image frame includes a single first object.
In a second aspect, an embodiment of the present application provides a subtitle display device, the device comprising:
a processing module, configured to, during playback of a target video, when it is detected that a first image frame of the target video includes multiple first objects, determine, according to an audio recognition model, a target object corresponding to first audio data from among the multiple first objects, wherein the first audio data corresponds to the first image frame;
a display module, configured to display subtitle information corresponding to the first audio data according to the display position of the target object;
wherein the audio recognition model is trained on a second image frame and on second audio data corresponding to the second image frame, and the second image frame includes a single first object.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps of the method described in the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the method described in the first aspect.
In the embodiments of the present application, during playback of a target video, for a first image frame that includes multiple first objects, a trained audio recognition model can determine, from among the multiple first objects, the target object corresponding to the first audio data. The audio recognition model is trained on second audio data corresponding to second image frames in the target video, and each second image frame includes only a single first object, so the speaker of each training sample is unambiguous. The trained model can therefore accurately identify the target object among the multiple first objects. Subtitle information corresponding to the first audio data is then displayed according to the display position of the target object, which makes it easy for the user to see which target object the subtitle belongs to and improves the viewing experience.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a subtitle display method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of image frames corresponding to audio data provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a subtitle display provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a subtitle display device provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of the hardware structure of another electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application fall within the scope of protection of the present application.
The terms "first", "second", and the like in the specification and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. Objects distinguished by "first", "second", and the like are generally of one type, and their number is not limited; for example, there may be one first object or several. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.
To address the problems described in the Background Art, the embodiments of the present application provide a subtitle display method, device, electronic device, and readable storage medium. During playback of a target video, for a first image frame that includes multiple first objects, a trained audio recognition model can determine, from among the multiple first objects, the target object corresponding to the first audio data. The audio recognition model is trained on second audio data corresponding to second image frames in the target video, and each second image frame includes only a single first object, so the trained model can accurately identify the target object among the multiple first objects. Subtitle information corresponding to the first audio data is then displayed according to the display position of the target object, which makes it easy for the user to see which target object the subtitle belongs to and improves the viewing experience.
The subtitle display method, device, electronic device, and readable storage medium provided in the embodiments of the present application are described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a subtitle display method provided in an embodiment of the present application. The subtitle display method may include step 110 and step 120.
Step 110: during playback of the target video, when it is detected that a first image frame of the target video includes multiple first objects, determine, according to an audio recognition model, a target object corresponding to the first audio data from among the multiple first objects.
Here, the first audio data corresponds to the first image frame. The audio recognition model is trained on a second image frame and on second audio data corresponding to the second image frame, and the second image frame includes a single first object.
In the embodiments of the present application, the target video may be a TV series, a movie, a short video, a variety show, a video clip, a live stream, a replay of a live stream, or any other video, which is not limited here. So that users can appreciate the languages spoken in the target video and hear its original sound, subtitle information is displayed on the playback interface while the target video plays. This subtitle information may be obtained in real time during playback, or it may be subtitle information that the electronic device cached after a previous playback of the target video, which is not specifically limited here.
In some embodiments, the target video may contain one first object or multiple first objects. For example, in a film or TV series, some frames may include only one character and other frames may include several characters at the same time. Similarly, in a live stream, some frames may include one streamer and other frames may include several streamers. Further examples are not listed here.
In the embodiments of the present application, a first image frame is an image frame of the target video that includes two or more first objects, and a second image frame is an image frame of the target video that includes exactly one first object.
In the embodiments of the present application, the audio recognition model identifies the target object to which a piece of audio data belongs. To improve the reliability of the audio recognition model, in some embodiments the second image frame and its corresponding second audio data are used as training data. Because the second image frame includes only one first object, the audio data corresponding to the second image frame can be assumed to come from that first object.
On this basis, the electronic device can obtain multiple second image frames from the target video, together with the audio data corresponding to each second image frame. For example, the electronic device can extract audio feature data from the audio data corresponding to each second image frame, where the audio feature data may include wording habits, timbre, pitch, voiceprint information, and the like. The audio feature data extracted for each second image frame can then serve as training data for the audio recognition model.
In the embodiments of the present application, the audio recognition model is trained on the target video's own audio data, labeled with the first objects that appear in its image frames. Therefore, when a first image frame includes multiple first objects, parsing the first audio data corresponding to that frame can accurately determine which of the first objects is the target object, so that the resulting subtitle information is correctly attributed to the target object among the multiple first objects.
As a specific embodiment, to improve the reliability of the audio recognition model, before the first audio data corresponding to the first image frame is parsed according to the audio recognition model and the target object is determined from among the multiple first objects, the embodiments of the present application may further include the following steps:
First, obtain a training data set, where the training data set includes second image frames and the second audio data corresponding to each second image frame. Next, train a preset audio recognition network on the training data set to obtain the audio recognition model.
Specifically, image recognition technology can be used to determine the number of first objects included in each image frame of the target video; for example, it can detect whether an image frame contains human faces. The image recognition technology may be, for example, a preset CNN algorithm, which is not specifically limited here.
In some embodiments, the audio feature data of the audio data corresponding to a second image frame may include wording habits, timbre, pitch, voiceprint information, and the like. The first object in each second image frame can therefore be placed in one-to-one correspondence with the audio feature data for that frame, and these pairs serve as training data for the preset audio recognition network. Because the training data comes from the target video itself, it is both cheap to obtain and highly reliable, which allows the preset audio recognition network to be trained effectively.
To improve the accuracy of distinguishing the target object from among multiple first objects, in the embodiments of the present application, obtaining the training data set may specifically include the following steps:
Segment the target video according to the time periods during which first objects speak, to obtain multiple image frames corresponding to the audio data of the first objects; from these image frames, obtain the second image frames that include a single first object, together with the second audio data corresponding to each second image frame; and associate the first object in each second image frame with its second audio data to obtain the training data set.
Specifically, taking a movie as the target video, the audio data of the movie includes multiple lines spoken by first objects. The speech of a first object in the target video may be, for example, dialogue between different first objects or a monologue by one first object, which is not specifically limited here.
When obtaining the image frames of the target video, the audio data of the whole movie can be divided sentence by sentence, according to the start time and end time of each line spoken by a first object. The target video is segmented at the same start and end times to obtain the image frame corresponding to each line, so that the audio data of each line corresponds one-to-one with an image frame.
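The per-line segmentation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the start and end time of each line are already known, that the video has a constant frame rate, and that frame `i` is shown at time `i / fps`. The names `Utterance`, `frames_for_utterance`, and `segment_video` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Utterance:
    """One spoken line, bounded by its start and end time in seconds."""
    start_s: float
    end_s: float


def frames_for_utterance(utt: Utterance, fps: float, total_frames: int) -> range:
    """Return the indices of frames whose timestamp i / fps lies in [start_s, end_s)."""
    first = max(0, math.ceil(utt.start_s * fps))
    last = min(total_frames, math.ceil(utt.end_s * fps))
    return range(first, max(first, last))


def segment_video(
    utterances: List[Utterance], fps: float, total_frames: int
) -> List[Tuple[Utterance, range]]:
    """Pair each utterance with the frame indices it covers (assumed constant fps)."""
    return [(u, frames_for_utterance(u, fps, total_frames)) for u in utterances]


# Two lines in a hypothetical 4-second clip at 25 frames per second.
segments = segment_video(
    [Utterance(1.0, 2.0), Utterance(3.0, 3.5)], fps=25, total_frames=100
)
```

Each resulting pair ties one line of audio to the frames shown while it is spoken, which is the one-to-one correspondence the paragraph above relies on.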
In some embodiments, a preset image recognition algorithm determines the number of first objects in each image frame. Image frames that include two or more first objects are treated as first image frames, and image frames that include exactly one first object are treated as second image frames.
FIG. 2 is a schematic diagram of image frames corresponding to audio data provided in an embodiment of the present application. As shown in FIG. 2, according to the time periods during which the first objects speak, the audio data of the target video can be divided into audio 1 to audio 6, giving six pieces of sub-audio data. Correspondingly, the target video can be divided into image frame 1 to image frame 6, giving six image frames. Image frame 1, image frame 2, and image frame 3 include first object A, first object B, and first object C, respectively, so these three frames can serve as second image frames. Because image frame 1, image frame 2, and image frame 3 correspond to audio 1, audio 2, and audio 3, respectively, audio 1 can be labeled as the voice data of first object A, audio 2 as the voice data of first object B, and audio 3 as the voice data of first object C.
Next, first object A, first object B, and first object C can be associated with audio 1, audio 2, and audio 3, respectively, to obtain the training data set for the preset audio recognition network. This yields effective training data and, because no large body of labeled audio data needs to be prepared in advance, substantially reduces the cost of the training data.
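The FIG. 2 example can be written out as a short sketch. It is only an illustration under assumptions: the object detector is replaced by a hand-written mapping from frame to detected objects, audio clips are represented by string identifiers, and the contents of image frames 5 and 6 are made up (the text only says they contain two or more first objects). The function name `build_training_set` is hypothetical.

```python
from typing import Dict, List, Tuple


def build_training_set(
    frame_objects: Dict[str, List[str]],
    frame_audio: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split frames by object count and label single-object audio.

    Returns (training_pairs, first_frames), where training_pairs holds
    (audio_id, object_id) for frames with exactly one object, and
    first_frames lists the frames with two or more objects.
    """
    training_pairs: List[Tuple[str, str]] = []
    first_frames: List[str] = []
    for frame_id, objects in frame_objects.items():
        distinct = set(objects)
        if len(distinct) == 1:
            # Second image frame: the only visible object is taken as the speaker.
            training_pairs.append((frame_audio[frame_id], next(iter(distinct))))
        elif len(distinct) >= 2:
            # First image frame: the speaker is ambiguous, so it is not used for training.
            first_frames.append(frame_id)
    return training_pairs, first_frames


frame_objects = {
    "frame1": ["A"],
    "frame2": ["B"],
    "frame3": ["C"],
    "frame4": ["A", "B", "C"],
    "frame5": ["A", "B"],  # assumed contents, not given in the text
    "frame6": ["B", "C"],  # assumed contents, not given in the text
}
frame_audio = {f"frame{i}": f"audio{i}" for i in range(1, 7)}

training_pairs, first_frames = build_training_set(frame_objects, frame_audio)
```

With this input, audio 1 to audio 3 are labeled with objects A to C, and image frames 4 to 6 are set aside as first image frames to be resolved later by the trained model.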
As a specific example, taking voiceprints as the audio feature data, the preset audio recognition network may be a voiceprint recognition network based on a DNN algorithm, which is not specifically limited here.
The audio recognition model provided in the embodiments of the present application can accurately determine which specific target object in the target video the current audio data belongs to, and can make this determination effectively even when multiple first objects speak at the same time.
As a specific embodiment, and continuing with FIG. 2, image frame 4, image frame 5, and image frame 6 each include two or more first objects; that is, they are all first image frames. Taking image frame 4 as an example, it includes first object A, first object B, and first object C. The audio recognition model provided in the embodiments of the present application can parse audio 4, which corresponds to image frame 4, and thereby determine which specific target object in image frame 4 produced audio 4.
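One plausible way to make this decision, shown here only as a sketch, is to compare a voiceprint embedding of the audio against a reference embedding for each object visible in the frame and pick the closest one. The patent does not specify this mechanism; the cosine-similarity comparison, the toy two-dimensional embeddings, and the names `cosine_similarity` and `identify_target` are all assumptions for illustration.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_target(
    audio_embedding: Sequence[float],
    references: Dict[str, Sequence[float]],
    candidates: Sequence[str],
) -> str:
    """Pick the candidate whose reference voiceprint is closest to the audio.

    Only objects detected in the current frame are considered, which keeps
    speakers who are not on screen out of the comparison.
    """
    return max(
        candidates,
        key=lambda obj: cosine_similarity(audio_embedding, references[obj]),
    )


# Toy reference voiceprints, as if learned from audio 1 to audio 3.
references = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
# Toy embedding of audio 4; image frame 4 shows objects A, B, and C.
target = identify_target([0.1, 0.9], references, ["A", "B", "C"])
```

Restricting the comparison to the objects in the frame is a design choice of this sketch; it mirrors the text's framing of choosing "from among the multiple first objects" in the first image frame.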
According to the embodiments of the present application, after the target object corresponding to the first audio data has been determined from among the multiple first objects, step 120 can be performed.
Step 120: display the subtitle information corresponding to the first audio data according to the display position of the target object.
In some embodiments, the subtitle information corresponding to the first audio data is displayed next to the target object, so that the user can easily see which target object in the first image frame is speaking and what that target object is saying.
For example, the subtitle information corresponding to the first audio data can be obtained through the preset audio recognition network. To help the user see which target object the audio data belongs to, the subtitle can be displayed near the target object, for instance as a speech bubble beside it. Referring to FIG. 3 and taking image frame 4 as an example, parsing audio 4 identifies it as coming from first object B in image frame 4. As shown in the subtitle display diagram of FIG. 3, the subtitle information corresponding to audio 4 can then be displayed as a speech bubble next to first object B.
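Placing the bubble beside the target object comes down to a small layout calculation. The sketch below is one possible approach, not the patent's: it assumes the target object's bounding box is available in pixel coordinates with the origin at the top-left, prefers the right-hand side, and falls back to the left when the bubble would run off the frame. `Box`, `bubble_position`, and the `margin` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels, origin at the top-left corner."""
    x: int
    y: int
    w: int
    h: int


def bubble_position(
    target: Box,
    frame_w: int,
    frame_h: int,
    bubble_w: int,
    bubble_h: int,
    margin: int = 8,
) -> Tuple[int, int]:
    """Return the top-left corner for a subtitle bubble next to the target."""
    # Prefer the right side of the target object.
    x = target.x + target.w + margin
    if x + bubble_w > frame_w:
        # Not enough room on the right, so try the left side instead.
        x = target.x - margin - bubble_w
    # Clamp so the bubble always stays inside the frame.
    x = max(0, min(x, frame_w - bubble_w))
    y = max(0, min(target.y, frame_h - bubble_h))
    return x, y


# Object B near the left of a 640x360 frame: the bubble fits on the right.
right_side = bubble_position(Box(100, 50, 80, 120), 640, 360, 200, 60)
# Object B near the right edge: the bubble moves to the left.
left_side = bubble_position(Box(500, 50, 80, 120), 640, 360, 200, 60)
```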
As a specific example, when the first audio data contains multiple first objects speaking at the same time, a speech separation algorithm can first separate the first audio data into multiple pieces of sub-audio data. The audio recognition model then parses each piece of sub-audio data to determine the target object it corresponds to. If several target objects are identified, each with its own subtitle information, the subtitle information can be displayed next to each target object, which is not specifically limited here.
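The separate-then-identify flow can be sketched as a small pipeline. The separation, identification, and transcription steps are passed in as functions because the text does not name specific algorithms; the stand-ins used in the example are toy placeholders, not real models, and `attribute_overlapping_speech` is a hypothetical name.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Audio = TypeVar("Audio")


def attribute_overlapping_speech(
    mixed_audio: Audio,
    candidates: Sequence[str],
    separate: Callable[[Audio], List[Audio]],
    identify: Callable[[Audio, Sequence[str]], str],
    transcribe: Callable[[Audio], str],
) -> Dict[str, List[str]]:
    """Map each speaking object to the subtitle lines attributed to it."""
    subtitles: Dict[str, List[str]] = {}
    for sub_audio in separate(mixed_audio):
        # Identify the speaker of each separated stream independently.
        speaker = identify(sub_audio, candidates)
        subtitles.setdefault(speaker, []).append(transcribe(sub_audio))
    return subtitles


# Toy stand-ins: the "mixed audio" is a list of (speaker, text) pairs, so
# separation, identification, and transcription are trivial lookups.
toy_mix = [("A", "Where are we going?"), ("C", "Follow me.")]
result = attribute_overlapping_speech(
    toy_mix,
    candidates=["A", "B", "C"],
    separate=lambda mix: list(mix),
    identify=lambda sub, cands: sub[0],
    transcribe=lambda sub: sub[1],
)
```

The returned mapping gives one list of subtitle lines per target object, which a renderer could then draw next to each object.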
在一些实施例中,字幕的显示方式可以根据用户的实际需要进行设置,例如,字幕的显示颜色、字体、字幕的大小等等,在此并不具体限定。In some embodiments, the display mode of subtitles can be set according to the actual needs of the user, for example, the display color, font, size of subtitles, etc. of the subtitles, which are not specifically limited here.
In the embodiments of the present application, for a target video that is being played, when it is detected that a second image frame of the target video includes a single first object, the subtitle information corresponding to the second audio data is obtained; the subtitle information is then displayed according to the display position of the single first object.
According to the embodiments of the present application, displaying the subtitle information according to the position of the first object means that the user does not need to keep their attention on the bottom of the screen, which avoids missing highlights or plot points in the frame and improves the viewing experience.
In some embodiments, adding subtitles for each first object while the target video is playing makes it possible to accurately determine the first object corresponding to each piece of audio data. The first object corresponding to the audio data, together with the subtitle information of the audio data, can then be cached in the electronic device, so that when the target video is played again, the subtitle information for each first object can be displayed quickly when that object speaks, improving the user experience.
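The caching step can be sketched as a small in-memory store. Keying entries by a video identifier and a segment index is an assumption made here for illustration; the application only says that the object and subtitle information are cached in the electronic device.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CachedSubtitle:
    """What is remembered for one piece of audio data."""
    object_id: str  # the first object that produced the audio
    text: str       # the subtitle information for that audio


class SubtitleCache:
    """In-memory cache keyed by (video_id, segment_index); both keys are hypothetical."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], CachedSubtitle] = {}

    def put(self, video_id: str, segment_index: int,
            object_id: str, text: str) -> None:
        # Store the result of the first playback.
        self._entries[(video_id, segment_index)] = CachedSubtitle(object_id, text)

    def get(self, video_id: str, segment_index: int) -> Optional[CachedSubtitle]:
        # On replay, a hit avoids running recognition again; a miss returns None.
        return self._entries.get((video_id, segment_index))
```

On a second playback, the player would call `get` first and fall back to the recognition path only when it returns `None`.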
It should be noted that the subtitle display method provided in the embodiments of the present application may be executed by a subtitle display device, or by a control module in the subtitle display device that executes the subtitle display method. The embodiments of the present application take the case in which a subtitle display device executes the subtitle display method as an example to describe the subtitle display device provided herein.
FIG. 4 is a schematic structural diagram of a subtitle display device provided in an embodiment of the present application. As shown in FIG. 4, the subtitle display device 400 is applied to a bullet-screen generating device, and the subtitle display device 400 may include a processing module 410 and a display module 420.
The processing module 410 is configured to, during playback of a target video and when it is detected that a first image frame of the target video includes multiple first objects, determine, according to an audio recognition model, a target object corresponding to first audio data from among the multiple first objects, where the first audio data corresponds to the first image frame;
The display module 420 is configured to display subtitle information corresponding to the first audio data according to the display position of the target object.
The audio recognition model is trained on a second image frame and second audio data corresponding to the second image frame, and the second image frame includes a single first object.
According to the embodiments of the present application, because the audio recognition model is trained on the second audio data corresponding to the second image frame in the target video, and because the second image frame includes only a single first object, the trained audio recognition model can accurately identify the target object from among multiple first objects. Displaying the subtitle information corresponding to the first audio data according to the display position of the target object then makes it easy for the user to tell which target object the subtitle information belongs to, improving the viewing experience.
In some embodiments, the device further includes:
An acquisition module, configured to obtain subtitle information corresponding to the second audio data when it is detected that a second image frame of the target video includes a single first object;
The display module 420 is further configured to display the subtitle information according to the display position of the single first object.
According to the embodiments of the present application, displaying the subtitle information according to the position of the first object means that the user does not need to keep their attention on the bottom of the screen, which avoids missing highlights or plot points in the frame and improves the viewing experience.
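The two cases described above (several first objects in the frame versus a single first object) can be sketched as one dispatch step. The function name and the `identify` callable, which stands in for the trained audio recognition model, are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Audio = Sequence[float]  # assumed representation: a sequence of samples


def choose_subtitle_anchor(
    objects_in_frame: List[str],
    audio: Audio,
    identify: Callable[[Audio, List[str]], str],
) -> Optional[str]:
    """Pick the object whose display position the subtitle should follow."""
    if not objects_in_frame:
        # No first object detected: this sketch leaves placement to the caller.
        return None
    if len(objects_in_frame) == 1:
        # Single first object: the audio is attributed to it directly.
        return objects_in_frame[0]
    # Multiple first objects: ask the audio recognition model.
    return identify(audio, objects_in_frame)
```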
In some embodiments, the acquisition module is configured to obtain a training data set, the training data set including the second image frame and the second audio data corresponding to the second image frame;
The processing module 410 is further configured to train a preset audio recognition network according to the training data set to obtain the audio recognition model.
In this way, because the training data comes from the target video itself, the training data is both inexpensive to obtain and highly reliable, so that the preset audio recognition network can be trained effectively.
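To make the train-then-identify flow concrete, the sketch below substitutes a nearest-centroid classifier over pre-computed audio feature vectors for the "preset audio recognition network". This substitution, the feature representation, and the function names are assumptions; the application does not specify the network architecture.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]  # assumed: a fixed-length audio feature vector


def train(samples: List[Tuple[Features, str]]) -> Dict[str, Tuple[float, ...]]:
    """Learn one mean feature vector (centroid) per first object."""
    grouped: Dict[str, List[Features]] = defaultdict(list)
    for features, object_id in samples:
        grouped[object_id].append(features)
    # Average each feature dimension across the samples of one object.
    return {
        object_id: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for object_id, vectors in grouped.items()
    }


def identify(model: Dict[str, Tuple[float, ...]],
             features: Features, candidates: List[str]) -> str:
    """Return the candidate whose centroid is closest to the given features.

    Every candidate must have appeared in the training samples.
    """
    point = tuple(features)
    return min(candidates, key=lambda object_id: dist(model[object_id], point))
```

Because each training sample comes from a frame with a single first object, its label is unambiguous, which is the property the paragraph above relies on.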
In some embodiments, the processing module 410 is further configured to segment the target video according to the time periods during which first objects in the target video speak, to obtain multiple image frames corresponding to the audio data of the first objects;
The acquisition module is further configured to obtain, from the multiple image frames, a second image frame that includes a single first object, and the second audio data corresponding to the second image frame;
The processing module 410 is further configured to associate the first object in the second image frame with the second audio data to obtain the training data set.
In this way, effective training data can be obtained, and because there is no need to prepare a large amount of labeled audio data in advance, the cost of the training data is effectively reduced.
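The construction of the training data set can be sketched as a filter over the segments produced by splitting the video on speaking periods. The `Segment` record and its fields are hypothetical; in particular, how objects are detected in a frame and how the audio is referenced are left open by the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """One speaking period of the target video (hypothetical record)."""
    start_s: float
    end_s: float
    objects_in_frame: Tuple[str, ...]  # first objects visible in the frame
    audio_ref: str                     # handle to the audio for this period


def build_training_set(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Keep only single-object segments and pair their audio with that object."""
    return [
        (segment.audio_ref, segment.objects_in_frame[0])
        for segment in segments
        if len(segment.objects_in_frame) == 1
    ]
```

Segments showing zero or several first objects are skipped because the speaker cannot be labeled from the frame alone.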
The subtitle display device in the embodiments of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a laptop computer, a handheld computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, or a self-service machine; this is not specifically limited in the embodiments of the present application.
The subtitle display device in the embodiments of the present application may be a device having an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The subtitle display device provided in the embodiments of the present application can implement each process implemented by the method embodiments of FIG. 1 to FIG. 4; to avoid repetition, details are not described here again.
Optionally, as shown in FIG. 5, an embodiment of the present application further provides an electronic device 500, including a processor 501, a memory 502, and a program or instructions stored in the memory 502 and executable on the processor 501. When executed by the processor 501, the program or instructions implement each process of the above subtitle display method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.
FIG. 6 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 600 includes, but is not limited to, a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, and a processor 610.
Those skilled in the art will understand that the electronic device 600 may further include a power supply (such as a battery) that supplies power to each component. The power supply may be logically connected to the processor 610 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The structure shown in FIG. 6 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, and details are not described here again.
The processor 610 is configured to, during playback of a target video and when it is detected that a first image frame of the target video includes multiple first objects, determine, according to an audio recognition model, a target object corresponding to first audio data from among the multiple first objects, where the first audio data corresponds to the first image frame;
The display unit 606 is configured to display subtitle information corresponding to the first audio data according to the display position of the target object;
The audio recognition model is trained on a second image frame and second audio data corresponding to the second image frame, and the second image frame includes a single first object.
According to the embodiments of the present application, because the audio recognition model is trained on the second audio data corresponding to the second image frame in the target video, and because the second image frame includes only a single first object, the trained audio recognition model can accurately identify the target object from among multiple first objects. Displaying the subtitle information corresponding to the first audio data according to the display position of the target object then makes it easy for the user to tell which target object the subtitle information belongs to, improving the viewing experience.
In some embodiments, the processor 610 is further configured to obtain subtitle information corresponding to the second audio data when it is detected that a second image frame of the target video includes a single first object;
The display unit 606 is further configured to display the subtitle information according to the display position of the single first object.
According to the embodiments of the present application, displaying the subtitle information according to the position of the first object means that the user does not need to keep their attention on the bottom of the screen, which avoids missing highlights or plot points in the frame and improves the viewing experience.
In some embodiments, the processor 610 is configured to obtain a training data set, the training data set including the second image frame and the second audio data corresponding to the second image frame;
The processor 610 is further configured to train a preset audio recognition network according to the training data set to obtain the audio recognition model.
In this way, because the training data comes from the target video itself, the training data is both inexpensive to obtain and highly reliable, so that the preset audio recognition network can be trained effectively.
In some embodiments, the processor 610 is further configured to segment the target video according to the time periods during which first objects in the target video speak, to obtain multiple image frames corresponding to the audio data of the first objects;
The processor 610 is further configured to obtain, from the multiple image frames, a second image frame that includes a single first object, and the second audio data corresponding to the second image frame;
The processor 610 is further configured to associate the first object in the second image frame with the second audio data to obtain the training data set.
In this way, effective training data can be obtained, and because there is no need to prepare a large amount of labeled audio data in advance, the cost of the training data is effectively reduced.
It should be understood that, in the embodiments of the present application, the input unit 604 may include a graphics processing unit (GPU) 6041 and a microphone 6042. The graphics processing unit 6041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 606 may include a display panel 6061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 607 includes a touch panel 6071 and other input devices 6072. The touch panel 6071, also called a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick, and details are not described here again. The memory 609 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 610 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 610.
An embodiment of the present application further provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each process of the above subtitle display method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the above subtitle display method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, in this document, the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. In addition, it should be pointed out that the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; depending on the functions involved, functions may also be performed substantially simultaneously or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Furthermore, features described with reference to certain examples may be combined in other examples.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111267156.2A CN113992972B (en) | 2021-10-28 | 2021-10-28 | Subtitle display method, device, electronic device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113992972A (en) | 2022-01-28 |
CN113992972B (en) | 2024-11-08 |
Family
ID=79743927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111267156.2A Active CN113992972B (en) | 2021-10-28 | 2021-10-28 | Subtitle display method, device, electronic device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113992972B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114630179B (en) * | 2022-03-17 | 2024-07-23 | 维沃移动通信有限公司 | Audio extraction method and electronic equipment |
CN115277264B (en) * | 2022-09-28 | 2023-03-24 | 季华实验室 | Subtitle generating method based on federal learning, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108419141A (en) * | 2018-02-01 | 2018-08-17 | 广州视源电子科技股份有限公司 | Subtitle position adjusting method and device, storage medium and electronic equipment |
CN111078932A (en) * | 2019-12-18 | 2020-04-28 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for matching similar human faces according to human voice |
CN113326844A (en) * | 2021-06-18 | 2021-08-31 | 咪咕数字传媒有限公司 | Video subtitle adding method and device, computing equipment and computer storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107993665B (en) * | 2017-12-14 | 2021-04-30 | 科大讯飞股份有限公司 | Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system |
CN109862422A (en) * | 2019-02-28 | 2019-06-07 | 腾讯科技(深圳)有限公司 | Method for processing video frequency, device, computer readable storage medium and computer equipment |
CN110909613B (en) * | 2019-10-28 | 2024-05-31 | Oppo广东移动通信有限公司 | Video character recognition method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113992972A (en) | 2022-01-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||