Embodiment
Referring to Fig. 1, in the present embodiment, the electronic device 100 of the present invention comprises a voice feature database that stores the voice features of a plurality of users. Based on these voice features, the electronic device 100 can identify the users who speak in an audio file or video file to be processed, and can record the time periods during which each identified user speaks. The electronic device 100 then generates an editable, searchable label file based on each identified user and that user's speech time periods. Each label file is associated with a corresponding audio/video file, so that a user can locate a required audio/video file by keyword search.
For example, suppose that an audio file named "minutes 20120820" records user A, user B, user C, and user D discussing a sales contract. The electronic device 100 can create at least four label files whose contents are, respectively, "User A; speech periods: 0:00-1:30, 2:10-5:20", "User B; speech periods: 1:30-2:10, 5:20-6:40", "User C; speech period: 6:40-8:50", and "User D; speech period: 8:50-10:30". When a user searches the electronic device 100 with the keyword "User A", the label file "User A; speech periods: 0:00-1:30, 2:10-5:20" is retrieved. The user can then selectively listen to the 0:00-1:30 and 2:10-5:20 portions of the audio file "minutes 20120820" to learn what user A said, without having to listen to the entire file.
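The keyword lookup described above can be sketched as follows. This is a minimal illustration only; the record layout is an assumption, not the patent's actual label-file format.

```python
# Hypothetical label records for the audio file "minutes 20120820":
# each record holds one user name and that user's speech time periods.
labels = [
    {"file": "minutes 20120820", "user": "User A", "periods": ["0:00-1:30", "2:10-5:20"]},
    {"file": "minutes 20120820", "user": "User B", "periods": ["1:30-2:10", "5:20-6:40"]},
    {"file": "minutes 20120820", "user": "User C", "periods": ["6:40-8:50"]},
    {"file": "minutes 20120820", "user": "User D", "periods": ["8:50-10:30"]},
]

def search(keyword):
    """Return every label record whose user name contains the keyword."""
    return [rec for rec in labels if keyword in rec["user"]]

hits = search("User A")
# The caller can then jump straight to hits[0]["periods"] instead of
# listening through the whole audio file.
```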
In the present embodiment, the electronic device 100 is a remote server, and users can access it through a handheld device such as a mobile phone, or through a computer. The electronic device 100 processes the audio/video files selected by a user upon the user's request. The electronic device 100 can also be connected, through a wired or wireless network, to an audio/video recording device 200 (for example, a voice recorder, a digital video camera, or a mobile phone with audio/video recording capability). When the recording device 200 communicates with the electronic device 100, it sends an identifier, by which the electronic device 100 recognizes the recording device 200. Accordingly, upon receiving an audio/video file transmitted by the recording device 200, the electronic device 100 processes the file immediately.
In the present embodiment, the electronic device 100 comprises a processor 10, a storage unit 20, a voice recognition unit 30, a speech-to-text unit 40, and a matching unit 50. The storage unit 20 stores the voice feature database described above, in which each voice feature corresponds to a user name. The voice feature database can be updated. For example, when one or more participants of a meeting have no voice feature stored in the database, each of them can record a speech sample and upload it to the electronic device 100. In response to a feature-extraction request from the user, the processor 10 extracts a voice feature from the uploaded sample, associates the extracted feature with a user name entered by the user, and stores the extracted feature together with its associated user name in the voice feature database, thereby completing the update.
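A hedged sketch of this update flow is given below. The `extract_feature` function is a placeholder standing in for whatever voice-feature extraction the recognition unit actually performs; only the register-and-store sequence reflects the description above.

```python
voice_feature_db = {}  # user name -> voice feature

def extract_feature(speech_sample: bytes) -> tuple:
    # Placeholder: a real system would compute acoustic features
    # (e.g. spectral statistics) from the sample here.
    return (len(speech_sample), sum(speech_sample) % 257)

def register_user(user_name: str, speech_sample: bytes) -> None:
    """Extract a voice feature from an uploaded speech sample and store
    it under the user name supplied with the upload."""
    voice_feature_db[user_name] = extract_feature(speech_sample)

register_user("User E", b"\x01\x02\x03")
```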
Upon receiving a user's request or an audio/video file to be processed, the processor 10 plays the file. The voice recognition unit 30 extracts the voice features in the file and compares them with the voice features stored in the storage unit 20, thereby determining the user name corresponding to each speech segment in the file. While the file is playing, the speech-to-text unit 40 converts the voice content of the file into text.
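The comparison step can be illustrated as a nearest-match lookup. The feature vectors and the cosine-similarity measure below are assumptions for illustration; the patent does not specify the comparison algorithm.

```python
import math

# Illustrative stored features (user name -> feature vector).
stored_features = {
    "User A": [0.9, 0.1, 0.3],
    "User B": [0.2, 0.8, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def identify(segment_feature):
    """Return the stored user name whose feature is most similar to the
    feature extracted from one speech segment."""
    return max(stored_features,
               key=lambda u: cosine(stored_features[u], segment_feature))
```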
The matching unit 50 matches the user names determined by the voice recognition unit 30 with the text produced by the speech-to-text unit 40. In the present embodiment, the matching unit 50 first obtains the playing duration of the file to be processed and divides the duration into N sub-intervals. Starting from the beginning of playback, the matching unit 50 records, for each sub-interval in turn, the user name corresponding to the voice content and the text converted by the speech-to-text unit 40. Finally, the matching unit 50 merges consecutive sub-intervals that correspond to the same user name into a single time period, and generates text comprising the user names, the time periods corresponding to each user name, and the text corresponding to each time period.
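The merging step can be sketched as follows. Interval bounds are given in seconds and the per-sub-interval records are illustrative assumptions; only the collapse of consecutive same-user sub-intervals reflects the description above.

```python
def merge_intervals(records):
    """records: list of (start, end, user, text) tuples, one per
    sub-interval, in playback order. Consecutive sub-intervals with the
    same user name are merged into a single time period."""
    merged = []
    for start, end, user, text in records:
        if merged and merged[-1][2] == user and merged[-1][1] == start:
            prev = merged[-1]
            merged[-1] = (prev[0], end, user, prev[3] + " " + text)
        else:
            merged.append((start, end, user, text))
    return merged

periods = merge_intervals([
    (0, 30, "User A", "hello"),
    (30, 60, "User A", "everyone"),
    (60, 90, "User B", "thanks"),
])
# periods == [(0, 60, "User A", "hello everyone"), (60, 90, "User B", "thanks")]
```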
The processor 10 generates label files on the basis of the text produced by the matching unit 50 and stores the label files in the storage unit 20. In the present embodiment, the processor 10 first obtains the user names from the voice feature database in the storage unit 20, then searches the text for each of these user names, and finally integrates each user name found, together with its corresponding text and time periods, into a label file according to a predetermined template.
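A minimal sketch of assembling a label file from the matched results is shown below. The template string itself is an assumption; the patent only states that a predetermined template is used.

```python
# Hypothetical label-file template: user name, speech periods, then the
# converted text of those periods.
TEMPLATE = "{user}; speech periods: {periods}\n{text}"

def build_label(user, periods, text):
    """Fill the predetermined template for one user."""
    return TEMPLATE.format(user=user, periods=", ".join(periods), text=text)

label = build_label("User A", ["0:00-1:30", "2:10-5:20"],
                    "Discussed the sales contract.")
```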
In other embodiments, the processor 10 can obtain the creation time of the file to be processed, take the creation date as the default date on which the voice content occurred, and integrate the creation date, the user name, and the corresponding text and time periods into a label file. Because each label file is editable, the user can, when needed, modify the label file or add further information, for example, the place where the voice content occurred.
In the present embodiment, the processor 10 can also insert a link in each label file, through which the label file is associated with the corresponding audio/video file. When the user clicks the link in a label file, the processor 10 plays, in sequence, the portions of the file corresponding to the time periods in that label file. Taking the earlier example, for the label file "User A; speech periods: 0:00-1:30, 2:10-5:20", clicking the link causes the processor 10 to play the 0:00-1:30 portion and then the 2:10-5:20 portion of the audio file "minutes 20120820". The user thus need not manually drag the progress bar of the playback software to find the desired content, which is highly convenient.
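The link-triggered playback can be sketched as a loop over the label file's time periods. `play_segment` below is a stand-in for whatever seek-and-play call the player actually exposes.

```python
def play_labeled_segments(periods, play_segment):
    """Play, in order, only the time periods listed in a label file.
    periods: list of (start, end) pairs in seconds.
    play_segment: callback performing the actual seek and playback."""
    for start, end in periods:
        play_segment(start, end)

played = []
play_labeled_segments([(0, 90), (130, 320)],
                      lambda s, e: played.append((s, e)))
# played == [(0, 90), (130, 320)]
```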
In other embodiments, the processor 10 can write the storage path and file name of the audio/video file corresponding to a label file into the remarks field of the label file's attributes, thereby associating the label file with that audio/video file.
Referring to Fig. 2, in the present embodiment, the electronic device 100 also provides a query interface 60, which the user can access over a network through a device such as a smart phone or a computer. The query interface 60 comprises a search condition area 61 and a retrieval result area 62. The search condition area 61 comprises a plurality of input boxes in which the user can enter search conditions, for example, a date in input box 611, a user name in input box 612, and a place in input box 613. The user can search with a single condition or with several conditions at once. The processor 10 searches for label files satisfying the one or more conditions entered by the user, and the relevant information of the retrieved label files is displayed in the retrieval result area 62, which can show, for example, a user name 621, a label file name 622, and the time periods 623 that the label file records for that user name. In the present embodiment, the query interface 60 comprises an audio playing module 63; when the user clicks a time period 623, the processor 10 runs the audio playing module 63 to play the portion of the corresponding audio file matching that time period. The retrieval result area 62 also comprises a text display box 64, which displays the text corresponding to the voice content of the audio file being played.
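The multi-condition retrieval can be sketched as below: every condition the user supplies must match, and conditions left empty are ignored. The field names are illustrative assumptions.

```python
def search_labels(records, date=None, user=None, place=None):
    """Return the label records satisfying all supplied conditions;
    a condition of None is treated as 'not entered' and ignored."""
    def matches(rec):
        return ((date is None or rec.get("date") == date)
                and (user is None or rec.get("user") == user)
                and (place is None or rec.get("place") == place))
    return [rec for rec in records if matches(rec)]

records = [
    {"date": "20120820", "user": "User A", "place": "Room 1"},
    {"date": "20120821", "user": "User A", "place": "Room 2"},
]
hits = search_labels(records, user="User A", date="20120821")
# hits == [records[1]]
```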
In the present embodiment, the query interface 60 also comprises a download button. When the user selects content in the retrieval result area 62 and clicks the download button, the processor 10 integrates the selected item or items into a single file and copies that file to a storage path specified by the user. Taking Fig. 2 as an example, the user can select the time periods "0:20-0:50" and "0:50-1:00"; according to this selection, the processor 10 integrates the "0:20-0:50" portion and the "0:50-1:00" portion of "minutes 1" into one audio/video file, which the user can then download to the storage path of his or her choice.
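The integration step can be sketched as cutting the selected time ranges from the source and concatenating them. Real audio/video editing would require a media library; plain bytes stand in for media data in this illustration.

```python
def integrate_segments(source: bytes, selections, bytes_per_second=1):
    """Concatenate the selected (start, end) second ranges of `source`
    into a single output, in the order selected."""
    out = b""
    for start, end in selections:
        out += source[start * bytes_per_second:end * bytes_per_second]
    return out

clip = integrate_segments(b"abcdefghij", [(2, 5), (7, 9)])
# clip == b"cdehi"
```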
Referring to Fig. 3, a method of processing an audio/video file with the electronic device 100 of the present invention comprises steps S100-S500. Specifically, in step S100, the electronic device 100 receives an audio/video file to be processed.
In step S200, the voice recognition unit 30 extracts the voice features in the file to be processed. In step S300, the voice recognition unit 30 compares the extracted voice features with the voice features stored in the storage unit 20, thereby determining the user name corresponding to each speech segment in the file.
In step S400, the matching unit 50 first obtains the playing duration of the file to be processed and divides the duration into N sub-intervals. Starting from the beginning of playback, the matching unit 50 records, for each sub-interval in turn, the user name corresponding to the voice content and the text converted by the speech-to-text unit 40. Finally, the matching unit 50 merges consecutive sub-intervals that correspond to the same user name into a single time period, and generates text comprising the user names, the time periods corresponding to each user name, and the text corresponding to each time period.
In step S500, the processor 10 generates a label file on the basis of the text produced by the matching unit 50, and associates the label file with the corresponding audio/video file.