Embodiment
Referring to Fig. 1, in the present embodiment, the electronic device 100 of the present invention comprises a voice feature database that stores the voice features of a plurality of users. Based on these voice features, the electronic device 100 can identify the users who speak in an audio file or video file to be processed, and can record the time periods during which each identified user speaks. The electronic device 100 then generates an editable, searchable label file based on each identified user and that user's speech time periods. Each label file is associated with a corresponding audio/video file, so that a user can locate a required audio/video file by keyword search.
For example, suppose that an audio file named "minutes 20120820" records user A, user B, user C, and user D discussing a sales contract. The electronic device 100 can create at least four label files whose contents are, respectively, "User A; speech periods: 0:00-1:30, 2:10-5:20", "User B; speech periods: 1:30-2:10, 5:20-6:40", "User C; speech period: 6:40-8:50", and "User D; speech period: 8:50-10:30". When a user searches the electronic device 100 with the keyword "User A", the label file "User A; speech periods: 0:00-1:30, 2:10-5:20" is retrieved. The user can then selectively listen to the 0:00-1:30 and 2:10-5:20 portions of the audio file "minutes 20120820" to learn what user A said, without having to listen to the entire file.
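The keyword lookup described above can be sketched as follows. This is a minimal illustration only; the record layout is an assumption, not the patent's actual label-file format.

```python
# Hypothetical label records for the audio file "minutes 20120820":
# each record holds one user name and that user's speech time periods.
labels = [
    {"file": "minutes 20120820", "user": "User A", "periods": ["0:00-1:30", "2:10-5:20"]},
    {"file": "minutes 20120820", "user": "User B", "periods": ["1:30-2:10", "5:20-6:40"]},
    {"file": "minutes 20120820", "user": "User C", "periods": ["6:40-8:50"]},
    {"file": "minutes 20120820", "user": "User D", "periods": ["8:50-10:30"]},
]

def search(keyword):
    """Return every label record whose user name contains the keyword."""
    return [rec for rec in labels if keyword in rec["user"]]

hits = search("User A")
# The caller can then jump straight to hits[0]["periods"] instead of
# listening through the whole audio file.
```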
In the present embodiment, the electronic device 100 is a remote server, and users can access it through a handheld device such as a mobile phone, or through a computer. The electronic device 100 processes the audio/video files selected by a user upon the user's request. The electronic device 100 can also be connected, through a wired or wireless network, to an audio/video recording device 200 (for example, a voice recorder, a digital video camera, or a mobile phone with audio/video recording capability). When the recording device 200 communicates with the electronic device 100, it sends an identifier, by which the electronic device 100 recognizes the recording device 200. Accordingly, upon receiving an audio/video file transmitted by the recording device 200, the electronic device 100 processes the file immediately.
In the present embodiment, the electronic device 100 comprises a processor 10, a storage unit 20, a voice recognition unit 30, a speech-to-text unit 40, and a matching unit 50. The storage unit 20 stores the voice feature database described above, in which each voice feature corresponds to a user name. The voice feature database can be updated. For example, when one or more participants of a meeting have no voice feature stored in the database, each of them can record a speech sample and upload it to the electronic device 100. In response to a feature-extraction request from the user, the processor 10 extracts a voice feature from the uploaded sample, associates the extracted feature with a user name entered by the user, and stores the extracted feature together with its associated user name in the voice feature database, thereby completing the update.
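A hedged sketch of this update flow is given below. The `extract_feature` function is a placeholder standing in for whatever voice-feature extraction the recognition unit actually performs; only the register-and-store sequence reflects the description above.

```python
voice_feature_db = {}  # user name -> voice feature

def extract_feature(speech_sample: bytes) -> tuple:
    # Placeholder: a real system would compute acoustic features
    # (e.g. spectral statistics) from the sample here.
    return (len(speech_sample), sum(speech_sample) % 257)

def register_user(user_name: str, speech_sample: bytes) -> None:
    """Extract a voice feature from an uploaded speech sample and store
    it under the user name supplied with the upload."""
    voice_feature_db[user_name] = extract_feature(speech_sample)

register_user("User E", b"\x01\x02\x03")
```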
Upon receiving a user's request or an audio/video file to be processed, the processor 10 plays the file. The voice recognition unit 30 extracts the voice features in the file and compares them with the voice features stored in the storage unit 20, thereby determining the user name corresponding to each speech segment in the file. While the file is playing, the speech-to-text unit 40 converts the voice content of the file into text.
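The comparison step can be illustrated as a nearest-match lookup. The feature vectors and the cosine-similarity measure below are assumptions for illustration; the patent does not specify the comparison algorithm.

```python
import math

# Illustrative stored features (user name -> feature vector).
stored_features = {
    "User A": [0.9, 0.1, 0.3],
    "User B": [0.2, 0.8, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def identify(segment_feature):
    """Return the stored user name whose feature is most similar to the
    feature extracted from one speech segment."""
    return max(stored_features,
               key=lambda u: cosine(stored_features[u], segment_feature))
```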
The matching unit 50 matches the user names determined by the voice recognition unit 30 with the text produced by the speech-to-text unit 40. In the present embodiment, the matching unit 50 first obtains the playing duration of the file to be processed and divides the duration into N sub-intervals. Starting from the beginning of playback, the matching unit 50 records, for each sub-interval in turn, the user name corresponding to the voice content and the text converted by the speech-to-text unit 40. Finally, the matching unit 50 merges consecutive sub-intervals that correspond to the same user name into a single time period, and generates text comprising the user names, the time periods corresponding to each user name, and the text corresponding to each time period.
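The merging step can be sketched as follows. Interval bounds are given in seconds and the per-sub-interval records are illustrative assumptions; only the collapse of consecutive same-user sub-intervals reflects the description above.

```python
def merge_intervals(records):
    """records: list of (start, end, user, text) tuples, one per
    sub-interval, in playback order. Consecutive sub-intervals with the
    same user name are merged into a single time period."""
    merged = []
    for start, end, user, text in records:
        if merged and merged[-1][2] == user and merged[-1][1] == start:
            prev = merged[-1]
            merged[-1] = (prev[0], end, user, prev[3] + " " + text)
        else:
            merged.append((start, end, user, text))
    return merged

periods = merge_intervals([
    (0, 30, "User A", "hello"),
    (30, 60, "User A", "everyone"),
    (60, 90, "User B", "thanks"),
])
# periods == [(0, 60, "User A", "hello everyone"), (60, 90, "User B", "thanks")]
```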
The processor 10 generates label files on the basis of the text produced by the matching unit 50 and stores the label files in the storage unit 20. In the present embodiment, the processor 10 first obtains the user names from the voice feature database in the storage unit 20, then searches the text for each of these user names, and finally integrates each user name found, together with its corresponding text and time periods, into a label file according to a predetermined template.
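A minimal sketch of assembling a label file from the matched results is shown below. The template string itself is an assumption; the patent only states that a predetermined template is used.

```python
# Hypothetical label-file template: user name, speech periods, then the
# converted text of those periods.
TEMPLATE = "{user}; speech periods: {periods}\n{text}"

def build_label(user, periods, text):
    """Fill the predetermined template for one user."""
    return TEMPLATE.format(user=user, periods=", ".join(periods), text=text)

label = build_label("User A", ["0:00-1:30", "2:10-5:20"],
                    "Discussed the sales contract.")
```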
In other embodiments, the processor 10 can obtain the creation time of the file to be processed, take the creation date as the default date on which the voice content occurred, and integrate the creation date, the user name, and the corresponding text and time periods into a label file. Because each label file is editable, the user can, when needed, modify the label file or add further information, for example, the place where the voice content occurred.
In the present embodiment, the processor 10 can also insert a link in each label file, through which the label file is associated with the corresponding audio/video file. When the user clicks the link in a label file, the processor 10 plays, in sequence, the portions of the file corresponding to the time periods in that label file. Taking the earlier example, for the label file "User A; speech periods: 0:00-1:30, 2:10-5:20", clicking the link causes the processor 10 to play the 0:00-1:30 portion and then the 2:10-5:20 portion of the audio file "minutes 20120820". The user thus need not manually drag the progress bar of the playback software to find the desired content, which is highly convenient.
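The link-triggered playback can be sketched as a loop over the label file's time periods. `play_segment` below is a stand-in for whatever seek-and-play call the player actually exposes.

```python
def play_labeled_segments(periods, play_segment):
    """Play, in order, only the time periods listed in a label file.
    periods: list of (start, end) pairs in seconds.
    play_segment: callback performing the actual seek and playback."""
    for start, end in periods:
        play_segment(start, end)

played = []
play_labeled_segments([(0, 90), (130, 320)],
                      lambda s, e: played.append((s, e)))
# played == [(0, 90), (130, 320)]
```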
In other embodiments, the processor 10 can write the storage path and file name of the audio/video file corresponding to a label file into the remarks field of the label file's attributes, thereby associating the label file with that audio/video file.
Referring to Fig. 2, in the present embodiment, the electronic device 100 also provides a query interface 60, which the user can access over a network through a device such as a smart phone or a computer. The query interface 60 comprises a search condition area 61 and a retrieval result area 62. The search condition area 61 comprises a plurality of input boxes in which the user can enter search conditions, for example, a date in input box 611, a user name in input box 612, and a place in input box 613. The user can search with a single condition or with several conditions at once. The processor 10 searches for label files satisfying the one or more conditions entered by the user, and the relevant information of the retrieved label files is displayed in the retrieval result area 62, which can show, for example, a user name 621, a label file name 622, and the time periods 623 that the label file records for that user name. In the present embodiment, the query interface 60 comprises an audio playing module 63; when the user clicks a time period 623, the processor 10 runs the audio playing module 63 to play the portion of the corresponding audio file matching that time period. The retrieval result area 62 also comprises a text display box 64, which displays the text corresponding to the voice content of the audio file being played.
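The multi-condition retrieval can be sketched as below: every condition the user supplies must match, and conditions left empty are ignored. The field names are illustrative assumptions.

```python
def search_labels(records, date=None, user=None, place=None):
    """Return the label records satisfying all supplied conditions;
    a condition of None is treated as 'not entered' and ignored."""
    def matches(rec):
        return ((date is None or rec.get("date") == date)
                and (user is None or rec.get("user") == user)
                and (place is None or rec.get("place") == place))
    return [rec for rec in records if matches(rec)]

records = [
    {"date": "20120820", "user": "User A", "place": "Room 1"},
    {"date": "20120821", "user": "User A", "place": "Room 2"},
]
hits = search_labels(records, user="User A", date="20120821")
# hits == [records[1]]
```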
In the present embodiment, the query interface 60 also comprises a download button. When the user selects content in the retrieval result area 62 and clicks the download button, the processor 10 integrates the selected item or items into a single file and copies that file to a storage path specified by the user. Taking Fig. 2 as an example, the user can select the time periods "0:20-0:50" and "0:50-1:00"; according to this selection, the processor 10 integrates the "0:20-0:50" portion and the "0:50-1:00" portion of "minutes 1" into one audio/video file, which the user can then download to the storage path of his or her choice.
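The integration step can be sketched as cutting the selected time ranges from the source and concatenating them. Real audio/video editing would require a media library; plain bytes stand in for media data in this illustration.

```python
def integrate_segments(source: bytes, selections, bytes_per_second=1):
    """Concatenate the selected (start, end) second ranges of `source`
    into a single output, in the order selected."""
    out = b""
    for start, end in selections:
        out += source[start * bytes_per_second:end * bytes_per_second]
    return out

clip = integrate_segments(b"abcdefghij", [(2, 5), (7, 9)])
# clip == b"cdehi"
```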
Referring to Fig. 3, a method of processing an audio/video file with the electronic device 100 of the present invention comprises steps S100-S500. Specifically, in step S100, the electronic device 100 receives an audio/video file to be processed.
In step S200, the voice recognition unit 30 extracts the voice features in the file to be processed. In step S300, the voice recognition unit 30 compares the extracted voice features with the voice features stored in the storage unit 20, thereby determining the user name corresponding to each speech segment in the file.
In step S400, the matching unit 50 first obtains the playing duration of the file to be processed and divides the duration into N sub-intervals. Starting from the beginning of playback, the matching unit 50 records, for each sub-interval in turn, the user name corresponding to the voice content and the text converted by the speech-to-text unit 40. Finally, the matching unit 50 merges consecutive sub-intervals that correspond to the same user name into a single time period, and generates text comprising the user names, the time periods corresponding to each user name, and the text corresponding to each time period.
In step S500, the processor 10 generates a label file on the basis of the text produced by the matching unit 50, and associates the label file with the corresponding audio/video file.