
CN105898169A - Video processing method and device - Google Patents

Video processing method and device

Info

Publication number
CN105898169A
CN105898169A (application CN201510511387.1A)
Authority
CN
China
Prior art keywords
voice information
video
recording
recorded
sub-segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510511387.1A
Other languages
Chinese (zh)
Inventor
李瑞科
姜乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LeTV Information Technology Beijing Co Ltd
Original Assignee
LeTV Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LeTV Information Technology Beijing Co Ltd filed Critical LeTV Information Technology Beijing Co Ltd
Priority to CN201510511387.1A priority Critical patent/CN105898169A/en
Publication of CN105898169A publication Critical patent/CN105898169A/en
Pending legal-status Critical Current

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

An embodiment of the invention provides a video processing method and device for adding captions to a recorded video, addressing the homogeneity of videos produced with prior-art shooting techniques. The method comprises the following steps: when a start-recording-voice trigger event is detected, the recorded video is played back and voice information is recorded; when an end-recording-voice trigger event is detected, video playback and voice recording are stopped; the recorded voice information is recognized to obtain caption information; and the caption information is inserted into the video according to the time information of the recorded voice information.

Description

Video processing method and device
Technical field
Embodiments of the present invention relate to the technical field of video capture, and in particular to a video processing method and device.
Background art
With the development of smart mobile terminal technology, shooting short videos with a mobile terminal has become one of today's popular applications. At present, the shooting styles available on mobile terminals are limited: users shoot video with the aids that a video capture application provides, such as templates and filters. After shooting is complete, a user can publish the video to a website for other users to watch or download, or store it locally on the mobile terminal. Users cannot process a shot video according to their own wishes, so user-shot videos tend to be homogeneous.
In practice, users may wish to add captions to a video they have shot, but existing video capture methods cannot provide this function. How to add captions to a shot video has therefore become an urgent technical problem in the field of mobile video capture.
Summary of the invention
Embodiments of the present invention provide a video processing method and device for adding captions to a shot video, overcoming the homogeneity of videos produced in the prior art.
An embodiment of the present invention provides a video processing method, including:
when a start-recording-voice trigger event is detected, starting playback of the shot video and recording voice information;
when an end-recording-voice trigger event is detected, stopping playback of the video and stopping recording voice information;
recognizing the recorded voice information to obtain caption information; and
inserting the caption information into the video according to the time information of the recorded voice information.
An embodiment of the present invention provides a video processing device, including:
a control unit, configured to start playback of the shot video and start recording voice information when a start-recording-voice trigger event is detected, and to stop playing the video and stop recording voice information when an end-recording-voice trigger event is detected;
a voice recognition unit, configured to recognize the recorded voice information to obtain caption information; and
a caption insertion unit, configured to insert the caption information into the video according to the time information of the recorded voice information.
An embodiment of the present invention provides video processing equipment, including a processor and a memory. The processor can read a program in the memory and perform the following: when a start-recording-voice trigger event is detected, start playback of the shot video and record voice information; when an end-recording-voice trigger event is detected, stop playing the video and stop recording voice information; recognize the recorded voice information to obtain caption information; and insert the caption information into the video according to the time information of the recorded voice information.
With the video processing method and device provided by the embodiments of the present invention, after video shooting is complete, the speech recorded by the user is recognized as captions and inserted into the shot video. Users can thus add captions to the videos they shoot, making the videos more personalized and improving the user experience.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1a is a schematic diagram of the interface displayed when the user needs to process a completed video after shooting, in an embodiment of the present invention;
Fig. 1b is a schematic diagram of the voice recording page in an embodiment of the present invention;
Fig. 1c is a schematic diagram of the interface providing a caption on/off switch in an embodiment of the present invention;
Fig. 1d is a schematic diagram of the display effect of a video with captions added in an embodiment of the present invention;
Fig. 2 is a flowchart of the video processing method in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the video processing device in an embodiment of the present invention.
Detailed description of the invention
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment one
To improve the user experience, in an embodiment of the present invention a user can add captions to a video after shooting it. Fig. 1a shows the interface displayed when the user needs to process a completed video; to add captions to a shot video, the user clicks the add-voice-over button to launch the video processing application. After the application starts, the voice recording page is entered, shown in Fig. 1b. The user taps the start-recording button to begin recording voice, which triggers the voice recording flow; at the same moment the user taps the start-recording button, playback of the completed video begins, and the user records voice information while watching the playing video. When the user wants to end the voice recording flow, the user taps the end-recording button, and video playback stops at the same time.
Based on this, Fig. 2 shows the flow of the video processing method provided by an embodiment of the present invention, which may include the following steps:
S21: when a start-recording-voice trigger event is detected, start playback of the shot video and record voice information.
The start-recording-voice trigger event is detected when the user is determined to have clicked the start-recording button; at this point, playback of the completed video begins and the voice recording flow is entered.
S22: when an end-recording-voice trigger event is detected, stop playing the video and stop recording voice information.
The end-recording-voice trigger event is detected when the user is determined to have clicked the end-recording button; at this point, playback of the completed video stops and the voice recording flow ends.
S23: recognize the recorded voice information to obtain caption information.
S24: insert the obtained caption information into the video according to the time information of the recorded voice information.
It should be noted that, in a specific implementation, the voice information may also be recognized while it is being recorded and inserted into the video for display; that is, steps S23 and S24 may be performed concurrently with step S21.
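For illustration, the flow of steps S21 to S24 can be sketched as a small session object with a pluggable recognizer. This is a minimal sketch only, not the patent's implementation; every name here, including the `recognize` callback standing in for a real speech recognizer, is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Caption:
    start_s: float   # playback time point where the caption appears
    end_s: float     # playback time point where the caption disappears
    text: str

@dataclass
class VideoSession:
    """Minimal model of S21-S24: play + record, then recognize and insert."""
    recognize: callable                      # voice data -> caption text (assumed)
    captions: list = field(default_factory=list)
    _start_s: float = None
    _buffer: bytes = None

    def on_start_trigger(self, playback_pos_s, mic_data):
        # S21: start playback and start recording voice information
        self._start_s = playback_pos_s
        self._buffer = mic_data

    def on_end_trigger(self, playback_pos_s):
        # S22: stop playback/recording; S23: recognize; S24: insert by time info
        text = self.recognize(self._buffer)
        self.captions.append(Caption(self._start_s, playback_pos_s, text))
        self._start_s, self._buffer = None, None

session = VideoSession(recognize=lambda audio: "hello world")
session.on_start_trigger(5.0, b"...pcm...")
session.on_end_trigger(15.0)
print(session.captions[0].start_s, session.captions[0].end_s)  # 5.0 15.0
```

The caption carries the playback start and end time points of the recording, which is exactly the "time information" step S24 uses for insertion.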
Step S23 can be implemented in the following two ways.
First implementation: recognition on the network side.
Specifically, when the end-recording-voice trigger event is detected, a speech recognition request carrying the recorded voice information is sent to a network-side server; the server recognizes the received voice information as caption information using speech recognition technology and returns it.
Second implementation: recognition locally on the terminal equipment.
In this implementation, developers need to write a translation library into the video processing application in advance, which is stored locally on the terminal equipment when the application is installed, so that the voice recorded by the user can be recognized. Limited by the terminal's storage space, the pre-written translation library may be incomplete; for example, it may contain only the languages most users commonly use, such as Chinese and English, and omit some uncommon languages to save storage space.
In addition, compared with network-side recognition, local recognition on the terminal responds faster: the corresponding caption information is available immediately after recording ends.
With network-side recognition, the server has more translation resources available, so it can recognize the voice information recorded by the user as caption information in different languages, and its recognition results are more accurate. However, it is strongly affected by the network environment: if the network environment is good, the delay for the terminal to obtain the caption information is small; if the network environment is poor, the delay is large.
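The trade-off between the two implementations above (accurate but delay-prone network recognition versus fast but limited local recognition) suggests a simple fallback strategy. The sketch below is illustrative only; the two stand-in recognizers are hypothetical placeholders, not real ASR APIs.

```python
def recognize_with_fallback(audio, network_recognizer, local_recognizer, timeout_s=2.0):
    """Prefer the more accurate network-side recognizer; fall back to the
    faster on-device library when the network is unavailable or too slow."""
    try:
        return network_recognizer(audio, timeout=timeout_s), "network"
    except (TimeoutError, ConnectionError):
        return local_recognizer(audio), "local"

# Stand-in recognizers for illustration only; real ones would call an ASR
# service and a bundled on-device translation library, respectively.
def network_asr(audio, timeout):
    raise TimeoutError("poor network environment")

def local_asr(audio):
    return "caption text"

print(recognize_with_fallback(b"...pcm...", network_asr, local_asr))
# ('caption text', 'local')
```

Returning which path produced the result lets the caller decide, for example, whether to apply the delay compensation described in Embodiment two.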
To better fuse video playback with the caption information obtained by recognizing the voice information, in a specific implementation video playback and voice recording can be controlled to proceed synchronously while caption recognition is carried out. This is illustrated by Embodiment two below.
Embodiment two
Start playing the video when the start-recording-voice trigger event is detected, and obtain the playback start time point of the video; stop playing the video when the end-recording-voice trigger event is detected, and obtain the playback end time point of the video. On this basis, when inserting caption information into the video, the caption information can be inserted into the video between the obtained playback start time point and playback end time point.
For example, suppose playback starts when the start-recording-voice trigger event is detected and the obtained playback start time point is 5 s, and playback stops when the end-recording-voice trigger event is detected and the obtained playback end time point is 15 s; the recognized caption information is then inserted into the video between 5 s and 15 s.
To address the recognition-result delay of network-side recognition, in an embodiment of the present invention, after the end-recording-voice trigger event is detected, if the delay in obtaining the caption information exceeds a preset threshold (for example, the caption information arrives only after a 2 s delay), the delay needs to be added when inserting the caption information into the video. For example, the user records 5 s of voice information and the recognized caption information is obtained 2 s later; assuming the obtained playback start time point is 5 s and the playback end time point is 10 s, the 2 s delay must be added when inserting the caption information, so the caption information is inserted into the video between 5 s and 12 s. If the delay in obtaining the caption information does not exceed the preset threshold, it can be ignored, and the insertion still follows the obtained playback start and end time points; that is, the recognized caption information is inserted into the video between 5 s and 10 s.
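The delay-compensation rule just described can be captured in a few lines. This is a sketch of the rule as stated, with a hypothetical 1 s threshold (the patent does not fix a threshold value).

```python
def caption_interval(start_s, end_s, delay_s, threshold_s=1.0):
    """Return the (start, end) playback interval for a caption.
    If the recognition delay exceeds the preset threshold, extend the end of
    the interval by the delay; otherwise the delay is ignored."""
    if delay_s > threshold_s:
        return start_s, end_s + delay_s
    return start_s, end_s

# 5 s of voice at playback 5 s-10 s with a 2 s recognition delay -> 5 s-12 s
assert caption_interval(5.0, 10.0, 2.0) == (5.0, 12.0)
# delay below threshold is negligible -> interval unchanged
assert caption_interval(5.0, 10.0, 0.5) == (5.0, 10.0)
```

The two assertions mirror the 5 s/2 s example in the text above.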
Building on the synchronization of video playback and voice recording in Embodiment two, in Embodiment three the user can complete the recording of voice information in several sessions; each session is recognized separately to obtain captions, which are inserted into the corresponding video content according to the voice recording time. On this basis, an embodiment of the present invention provides Embodiment three.
Embodiment three
For a completed video, the user can record voice information in several sessions and insert it into the video. For example, for a 20 s video, the user can record voice information in four sessions, each segment of voice information corresponding to one segment of the video. In a specific implementation, after one segment of voice information is recorded and its corresponding caption information is obtained, the caption information is inserted into the video at the corresponding playback time according to the recording time point. For example, suppose the recording time of the first segment of voice information is 0 s to 5 s and its corresponding playback time is 0 s to 5 s; after the recognized captions are obtained, they are inserted into the video between 0 s and 5 s. Suppose the recording time of the second segment is 5 s to 12 s and its corresponding playback time is 5 s to 12 s; after the recognized captions are obtained, they are inserted into the video between 5 s and 12 s, and so on.
If the network environment is poor during recording: for example, suppose the recording time of the first segment of voice information is 0 s to 5 s and its corresponding playback time is 0 s to 5 s, and the recognized caption information is obtained 2 s after recording ends; the obtained caption information is then inserted into the video between 0 s and 7 s, and the recording of the second segment of voice information starts from 7 s. Suppose the recording time of the second segment is 7 s to 15 s and its corresponding playback time is 7 s to 15 s; after the recognized captions are obtained, they are inserted into the video between 7 s and 15 s. Throughout this process, the recording time points must always be kept consistent with the playback time points.
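The multi-session bookkeeping above amounts to advancing a timeline cursor past each segment plus any over-threshold delay. The sketch below is illustrative only; the function name and the 1 s threshold are assumptions.

```python
def schedule_segments(durations_s, delays_s, threshold_s=1.0):
    """Lay recorded sub-segments on the playback timeline so that recording
    and playback time points stay consistent: each new segment starts where
    the previous caption interval (including any over-threshold recognition
    delay) ended."""
    intervals, cursor = [], 0.0
    for dur, delay in zip(durations_s, delays_s):
        end = cursor + dur + (delay if delay > threshold_s else 0.0)
        intervals.append((cursor, end))
        cursor = end
    return intervals

# First segment: 5 s of voice with a 2 s recognition delay -> caption 0 s-7 s;
# second segment (8 s of voice, no delay) then starts at 7 s -> 7 s-15 s.
print(schedule_segments([5.0, 8.0], [2.0, 0.0]))  # [(0.0, 7.0), (7.0, 15.0)]
```

The output reproduces the poor-network example in the text: the second recording session begins at 7 s rather than 5 s.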
Whether the user records all the voice information at once or in the course of several sessions, the user may have recorded the voice information for one video in parts; that is, the recorded voice information consists of at least one voice information sub-segment arranged by recording time. The user can delete an unsatisfactory recorded sub-segment and re-record it. In a specific implementation, the process of deleting and re-recording a sub-segment can be carried out according to Embodiment four.
Embodiment four
In this embodiment, when a trigger event for deleting any voice information sub-segment is detected, the corresponding sub-segment is deleted. Accordingly, after the user deletes a recorded sub-segment, if a start-recording-voice trigger event is detected again, the user can be prompted whether to re-record the deleted sub-segment, and whether to re-record it is determined according to the user's choice. When it is determined that the deleted sub-segment is to be re-recorded, video playback starts from the start recording time point of the deleted sub-segment and voice information recording is performed again; the re-recorded voice information is recognized to obtain caption information, and the caption information obtained by recognition is inserted into the video according to the time information of the re-recorded voice information.
Preferably, when re-recording a deleted voice information sub-segment, the embodiments of the present invention provide the following two ways of doing so.
First way: re-record only the deleted sub-segment.
In this way, when the start-recording-voice trigger event for re-recording the deleted sub-segment is detected, video playback starts from the start recording time point of the deleted sub-segment and voice information recording is performed again, until the end recording time point of the deleted sub-segment is reached, at which point voice recording stops and video playback stops.
For example, the user has recorded 4 voice information sub-segments for a 20 s video. Suppose the recording time of the sub-segment the user deletes is 5 s to 12 s, so its corresponding video playback time is 5 s to 12 s. After deleting this sub-segment, the user starts video playback from the 5 s playback time point and records voice at the same time; when the playback time reaches 12 s, voice recording ends and playback stops.
Second way: re-record all sub-segments whose recording time is at or after the sub-segment the user chose to delete.
In this way, when the start-recording-voice trigger event for re-recording the deleted sub-segment is detected, video playback starts from the start recording time point of the deleted sub-segment and voice information recording is performed again, until the end recording time point of the latest-recorded sub-segment is reached, at which point voice recording stops and video playback stops.
For example, the user has recorded 4 voice information sub-segments for a 20 s video. Suppose the recording time of the sub-segment the user deletes is 5 s to 12 s, so its corresponding video playback time is 5 s to 12 s. After deleting this sub-segment, the user starts video playback from the 5 s playback time point and records voice at the same time; when the playback time reaches 20 s, voice recording ends and playback stops. Of course, during this process the user can also record the voice information corresponding to 5 s to 20 s in several sessions, for example recording 5 s to 8 s the first time, 8 s to 15 s the second time, and 15 s to 20 s the third time. During each recording session, the video playback time point is kept consistent with the voice recording time point.
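The two re-recording ways differ only in where the replay window ends. The sketch below models that choice; the function and mode names are hypothetical labels for the two ways described above.

```python
def rerecord_window(segments, deleted_idx, mode):
    """Playback window to replay while re-recording a deleted sub-segment.
    segments: list of (start_s, end_s) recording time points, in order.
    mode 'single' re-records only the deleted sub-segment (first way);
    mode 'tail' re-records from the deleted sub-segment through the
    latest-recorded sub-segment's end (second way)."""
    start = segments[deleted_idx][0]
    end = segments[deleted_idx][1] if mode == "single" else segments[-1][1]
    return start, end

# 4 sub-segments of a 20 s video; the user deletes the 5 s-12 s sub-segment.
segs = [(0, 5), (5, 12), (12, 16), (16, 20)]
print(rerecord_window(segs, 1, "single"))  # (5, 12)
print(rerecord_window(segs, 1, "tail"))    # (5, 20)
```

Both outputs match the worked examples: the first way replays 5 s to 12 s, the second replays 5 s to 20 s.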
In a specific implementation, to further improve the accuracy of recognizing speech as captions, the user can also shoot the video in segments and finally record voice information for each segment. For example, the user shoots a 20 s video in 4 parts, say video segments of 5 s, 6 s, 5 s and 4 s; afterwards, the user uses the above method on each segment separately to add captions to it.
In a specific implementation, a caption on/off switch can also be provided, and captions are turned on or off according to the detected on/off trigger event. Fig. 1c is a schematic diagram of the interface providing the caption on/off switch; the control in the lower right corner of Fig. 1c is the switch that turns captions on and off. Specifically, when the switch is in the state shown in Fig. 1c, captions are on; when the switch is triggered and slid to the left, captions are turned off. Fig. 1d is a schematic diagram of the display effect of a video with captions added.
With the video processing method provided by the embodiments of the present invention, users can add captions to the videos they shoot themselves, making the videos more personalized and improving the user experience.
Based on the same inventive concept as Embodiments one to four of the present application, Embodiment five provides a video processing device. Since the principle by which this device solves the problem is similar to that of the above video processing method, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again.
Embodiment five
Fig. 3 is a schematic structural diagram of an embodiment of the video processing device in the present invention, which may include:
a control unit 31, configured to start playback of the shot video and record voice information when a start-recording-voice trigger event is detected, and to stop playing the video and stop recording voice information when an end-recording-voice trigger event is detected;
a voice recognition unit 32, configured to recognize the recorded voice information to obtain caption information; and
a caption insertion unit 33, configured to insert the caption information into the video according to the time information of the recorded voice information.
In a specific implementation, the voice recognition unit 32 may be configured to send a speech recognition request to the network side, where the speech recognition request carries the recorded voice information, and to receive the caption information recognized by the network side according to the voice information; or to recognize the recorded voice information as caption information locally.
The video processing device provided by the embodiment of the present invention may further include an acquisition unit, where:
the acquisition unit is configured to obtain the playback start time point of the video when the start-recording-voice trigger event is detected, and to obtain the playback end time point of the video when the end-recording-voice trigger event is detected; the caption insertion unit 33 is configured to insert the caption information into the video between the playback start time point and the playback end time point.
In a specific implementation, the recorded voice information consists of at least one voice information sub-segment arranged by recording time.
Based on this, the video processing device provided by the embodiment of the present invention may further include:
a deletion unit, configured to delete the corresponding voice information sub-segment when a trigger event for deleting any sub-segment is detected.
The control unit 31 may be further configured to, after the deletion unit deletes the corresponding sub-segment, start playing the video from the start recording time point of the deleted sub-segment and perform voice information recording again when the start-recording-voice trigger event for re-recording the deleted sub-segment is detected;
the voice recognition unit 32 may be further configured to recognize the re-recorded voice information to obtain caption information; and
the caption insertion unit 33 may be further configured to insert the caption information obtained by recognition into the video according to the time information of the re-recorded voice information.
Specifically, the control unit 31 is configured to, when the start-recording-voice trigger event for re-recording the deleted sub-segment is detected, start playing the video from the start recording time point of the deleted sub-segment and perform voice information recording again until the end recording time point of the deleted sub-segment is reached, at which point voice recording stops and video playback stops; or, when the start-recording-voice trigger event for re-recording the deleted sub-segment is detected, start playing the video from the start recording time point of the deleted sub-segment and perform voice information recording again until the end recording time point of the latest-recorded sub-segment is reached, at which point voice recording stops and video playback stops.
For convenience of description, the above parts are described as modules (or units) divided by function. Of course, when implementing the present invention, the functions of the modules (or units) can be realized in one or more pieces of software or hardware. In a specific implementation, the above device can be arranged in the terminal equipment.
The embodiment of the present invention can realize the related functions shown in Fig. 3 through a hardware processor. In a specific implementation, the processor may be configured to read a program in the memory and perform the following to realize the related functions shown in Fig. 3: when a start-recording-voice trigger event is detected, start playing the shot video and record voice information; when an end-recording-voice trigger event is detected, stop playing the video and stop recording voice information; recognize the recorded voice information to obtain caption information; and insert the caption information into the video according to the time information of the recorded voice information.
The device embodiments described above are only schematic. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly also by hardware. Based on this understanding, the part of the above technical solution that contributes to the prior art can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) to perform the method described in each embodiment or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A video processing method, characterized by comprising:
when a start-recording-voice trigger event is detected, starting playback of the shot video and recording voice information;
when an end-recording-voice trigger event is detected, stopping playback of the video and stopping recording voice information;
recognizing the recorded voice information to obtain caption information; and
inserting the caption information into the video according to the time information of the recorded voice information.
2. The method according to claim 1, characterized in that recognizing the recorded voice information to obtain caption information specifically comprises:
sending a speech recognition request to the network side, wherein the speech recognition request carries the recorded voice information, and receiving the caption information recognized by the network side according to the voice information; or
recognizing the recorded voice information as caption information locally.
3. The method according to claim 1, characterized by further comprising:
when the start-recording-voice trigger event is detected, obtaining the playback start time point of the video; and
when the end-recording-voice trigger event is detected, obtaining the playback end time point of the video; and
inserting the caption information into the video according to the time information of the recorded voice information specifically comprises:
inserting the caption information into the video between the playback start time point and the playback end time point.
4. The method according to claim 1, characterized in that the recorded voice information consists of at least one voice information sub-segment arranged by recording time.
5. The method according to claim 4, characterized in that the method further comprises: when a trigger event for deleting any voice information sub-segment is detected, deleting the corresponding sub-segment.
6. The method according to claim 5, characterized by comprising, after deleting the corresponding sub-segment:
when a start-recording-voice trigger event for re-recording the deleted sub-segment is detected, starting playback of the video from the start recording time point of the deleted sub-segment and recording voice information again; and
recognizing the re-recorded voice information to obtain caption information, and inserting the caption information obtained by recognition into the video according to the time information of the re-recorded voice information.
7. The method according to claim 6, characterized in that resuming playing the video from the recording start time point of the deleted sub-voice information and re-recording voice information specifically comprises:
when the start-recording-voice trigger event for re-recording the deleted sub-voice information is detected, resuming playing the video from the recording start time point of the deleted sub-voice information and re-recording voice information until the recording end time point of the deleted sub-voice information is reached, and then stopping recording the voice information and stopping playing the video; or
when the start-recording-voice trigger event for re-recording the deleted sub-voice information is detected, resuming playing the video from the recording start time point of the deleted sub-voice information and re-recording voice information until the recording end time point of the sub-voice information with the latest recording time is reached, and then stopping recording the voice information and stopping playing the video.
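Claim 7 offers two stopping rules when re-recording over a deleted segment: stop at the deleted segment's own end point, or keep going until the end point of the latest-recorded segment. The timing logic can be sketched as below; segments are `(start, end)` tuples, and the sketch illustrates only the stop-time computation, not actual audio capture.

```python
def rerecord_stop_time(deleted_segment, all_segments, stop_at_deleted_end=True):
    """Return when re-recording should stop, per the two alternatives of claim 7.

    deleted_segment: (start, end) of the deleted sub-voice information
    all_segments:    every (start, end) sub-voice segment, including the deleted one
    """
    if stop_at_deleted_end:
        # Alternative 1: stop at the deleted segment's own end recording time.
        return deleted_segment[1]
    # Alternative 2: stop at the end time of the latest-recorded segment.
    return max(end for _, end in all_segments)
```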
8. A video processing apparatus, characterized by comprising:
a control unit, configured to start playing the video that has been shot and start recording voice information when a start-recording-voice trigger event is detected, and to stop playing the video and stop recording the voice information when an end-recording-voice trigger event is detected;
a voice recognition unit, configured to recognize the recorded voice information to obtain caption information; and
a caption insertion unit, configured to insert the caption information into the video according to time information of the recorded voice information.
9. The apparatus according to claim 8, characterized in that
the voice recognition unit is specifically configured to send a speech recognition request carrying the recorded voice information to a network side and receive the caption information recognized by the network side from the voice information, or to recognize the recorded voice information locally to obtain the caption information.
10. The apparatus according to claim 8, characterized by further comprising an acquisition unit, wherein:
the acquisition unit is configured to obtain a playback start time point of the video when the start-recording-voice trigger event is detected, and to obtain a playback end time point of the video when the end-recording-voice trigger event is detected; and
the caption insertion unit is configured to insert the caption information into the portion of the video between the playback start time point and the playback end time point.
11. The apparatus according to claim 8, characterized in that the recorded voice information consists of at least one piece of sub-voice information arranged in order of recording time.
12. The apparatus according to claim 11, characterized by further comprising:
a deletion unit, configured to delete the corresponding sub-voice information when a trigger event for deleting any piece of sub-voice information is detected.
13. The apparatus according to claim 12, characterized in that
the control unit is further configured to, after the deletion unit deletes the corresponding sub-voice information and when a start-recording-voice trigger event for re-recording the deleted sub-voice information is detected, resume playing the video from the recording start time point of the deleted sub-voice information and re-record voice information;
the voice recognition unit is further configured to recognize the re-recorded voice information to obtain caption information; and
the caption insertion unit is further configured to insert the caption information obtained by the recognition into the video according to time information of the re-recorded voice information.
14. The apparatus according to claim 13, characterized in that
the control unit is specifically configured to, when the start-recording-voice trigger event for re-recording the deleted sub-voice information is detected, resume playing the video from the recording start time point of the deleted sub-voice information and re-record voice information, until either the recording end time point of the deleted sub-voice information is reached or the recording end time point of the sub-voice information with the latest recording time is reached, and then to stop recording the voice information and stop playing the video.
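The apparatus of claims 8 to 14 decomposes into three cooperating units. A minimal object sketch of that decomposition is given below; the unit names follow the claims, but the method names and the trivial uppercasing "recognizer" are invented for the example and stand in for real playback, recording, and speech recognition.

```python
class VoiceRecognitionUnit:
    def recognize(self, audio: str) -> str:
        return audio.upper()  # placeholder for real speech recognition

class CaptionInsertionUnit:
    def __init__(self):
        self.captions = []  # list of (start, end, text) tuples
    def insert(self, start: float, end: float, text: str) -> None:
        self.captions.append((start, end, text))

class ControlUnit:
    """Reacts to the start/end recording trigger events of claim 8."""
    def __init__(self, recognizer: VoiceRecognitionUnit,
                 inserter: CaptionInsertionUnit):
        self.recognizer = recognizer
        self.inserter = inserter
        self._start = None
    def on_start_trigger(self, t: float) -> None:
        self._start = t  # playback and voice recording begin at time t
    def on_end_trigger(self, t: float, audio: str) -> None:
        # Playback and recording stop; the recording is recognized and the
        # caption is inserted over the [start, end] window of the recording.
        text = self.recognizer.recognize(audio)
        self.inserter.insert(self._start, t, text)
```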
CN201510511387.1A 2015-08-19 2015-08-19 Video processing method and device Pending CN105898169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510511387.1A CN105898169A (en) 2015-08-19 2015-08-19 Video processing method and device


Publications (1)

Publication Number Publication Date
CN105898169A true CN105898169A (en) 2016-08-24

Family

ID=57002778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510511387.1A Pending CN105898169A (en) 2015-08-19 2015-08-19 Video processing method and device

Country Status (1)

Country Link
CN (1) CN105898169A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102148939A (en) * 2010-02-09 2011-08-10 佛山市南海海信多媒体技术有限公司 Method, device and television for real-time displaying subtitles of television program
CN103414949A (en) * 2013-08-09 2013-11-27 冠捷显示科技(厦门)有限公司 Multimedia editing system and method based on smart television
CN103458321A (en) * 2012-06-04 2013-12-18 联想(北京)有限公司 Method and device for loading subtitles
WO2014162310A4 (en) * 2013-04-02 2014-11-20 Nir Igal Automatic generation of a database for speech recognition from video captions
CN104202542A (en) * 2014-08-28 2014-12-10 深圳市大疆创新科技有限公司 Automatic subtitle generating method and device for video camera


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506335A (en) * 2016-11-10 2017-03-15 北京小米移动软件有限公司 Method and device for sharing video files
CN106506335B (en) * 2016-11-10 2019-08-30 北京小米移动软件有限公司 Method and device for sharing video files
CN106851401A (en) * 2017-03-20 2017-06-13 惠州Tcl移动通信有限公司 A kind of method and system of automatic addition captions
CN109495792A (en) * 2018-11-30 2019-03-19 北京字节跳动网络技术有限公司 A kind of subtitle adding method, device, electronic equipment and the readable medium of video
CN110166816A (en) * 2019-05-29 2019-08-23 上海乂学教育科技有限公司 The video editing method and system based on speech recognition for artificial intelligence education

Similar Documents

Publication Publication Date Title
CN108924586B (en) Video frame detection method and device and electronic equipment
WO2019109643A1 (en) Video recommendation method and apparatus, and computer device and storage medium
US9881215B2 (en) Apparatus and method for identifying a still image contained in moving image contents
US20140064693A1 (en) Method and System for Video Event Detection for Contextual Annotation and Synchronization
CN111091811B (en) Method and device for processing voice training data and storage medium
US10645468B1 (en) Systems and methods for providing video segments
CN108008898B (en) Page information acquisition method and device, computer equipment and storage medium
WO2021002965A1 (en) Highlights video player
CN103440330A (en) Music program information acquisition method and equipment
CN108966016B (en) Video clip rebroadcasting method and device and terminal equipment
CN104995639A (en) Terminal and method for managing video file
CN109618224A (en) Video data processing method, device, computer readable storage medium and equipment
CN105898169A (en) Video processing method and device
CN112887480A (en) Audio signal processing method and device, electronic equipment and readable storage medium
CN112380922A (en) Method and device for determining compound video frame, computer equipment and storage medium
CN115080792A (en) Video association method and device, electronic equipment and storage medium
CN117768430A (en) Method, device and equipment for sending and receiving live pictures in session
JP2025515613A (en) Method, apparatus, device and storage medium for audio editing - Patents.com
CN105808231A (en) System and method for recording script and system and method for playing script
CN111147778B (en) Track recording method and device based on digital video recorder and storage medium
CN112843696A (en) Shooting method, shooting device, electronic equipment and storage medium
CN103309993B (en) The extracting method of a kind of key word and device
EP4354883A1 (en) Video processing method and apparatus, and electronic device
CN114363720B (en) Video slicing method, system, equipment and medium based on computer vision
KR102066750B1 (en) Terminal apparatus and method for controlling record file

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20160824)