
TW201042636A - Video and audio editing system, method and electronic device using same - Google Patents

Video and audio editing system, method and electronic device using same

Info

Publication number
TW201042636A
Authority
TW
Taiwan
Prior art keywords
sound
image
indecent
module
editing
Prior art date
Application number
TW98117143A
Other languages
Chinese (zh)
Other versions
TWI385646B (en)
Inventor
Chuan-Feng Wu
Original Assignee
Hon Hai Prec Ind Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hon Hai Prec Ind Co Ltd
Priority to TW98117143A
Publication of TW201042636A
Application granted
Publication of TWI385646B

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The present invention relates to a video and audio editing system that filters video and audio. The system includes a sample memory and a processor. The sample memory stores uncivil sound samples. The processor includes a capturing sound module, an identifying sound module, a contrasting sound module, and an editing sound module. The capturing sound module is configured for capturing sound from the audio. The identifying sound module is configured for identifying the captured sound. The contrasting sound module is configured for contrasting the captured sound with the uncivil sound samples to judge whether any uncivil sound is present. The editing sound module is configured for editing the uncivil sound judged to be present by the contrasting sound module. The present invention also provides a video and audio editing method and an electronic device using the video and audio editing system.

Description

VI. Description of the Invention

[Technical Field]

The present invention relates to an audio-visual editing system, an audio-visual editing method, and an electronic device having the audio-visual editing system.

[Prior Art]

Advances in technology bring people ever more convenience. Electronic and information products are increasingly woven into daily life, and transmitting and recording sound and image information has become easier as technology develops. However, in both broadcast television programs and home videos, indecent language is occasionally broadcast or recorded. This may set a bad example for children or adversely affect their speech and behavior. Likewise, when an ordinary family records with a video camera, footage of commemorative value may become unsuitable for sharing with others because it contains indecent language.

[Summary of the Invention]

In view of this, it is necessary to provide an audio-visual editing system and method capable of filtering indecent audio-visual content, and an electronic device having the audio-visual editing system.

An audio-visual editing system for filtering audio-visual content includes a sample memory and a processor. The sample memory stores indecent sound samples. The processor includes: a sound acquisition module for acquiring sound from the audio-visual content; a sound recognition module for recognizing the acquired sound; a sound comparison module for comparing the recognized sound with the indecent sound samples in the sample memory to determine whether an indecent sound is present; and a sound editing module for editing the indecent sound when the sound comparison module determines that one is present.

An audio-visual editing method edits indecent sounds in audio-visual content according to preset indecent sound samples.
The audio-visual editing method includes the following steps: acquiring the sound in the audio-visual content; comparing the acquired sound with the indecent sound samples; and, if an indecent sound is present, editing the indecent sound.

An electronic device includes an audio-visual output terminal capable of outputting audio-visual data. The electronic device further includes a sample memory and a processor. The sample memory stores indecent sound samples, and the processor is electrically connected to the audio-visual output terminal. The processor includes: a sound acquisition module for acquiring sound from the audio-visual content; a sound recognition module for recognizing the acquired sound; a sound comparison module for comparing the recognized sound with the indecent sound samples in the sample memory to determine whether an indecent sound is present; and a sound editing module for editing the indecent sound when the sound comparison module determines that one is present.

The audio-visual editing system, the method, and the electronic device having the audio-visual editing system provided by the present invention can filter indecent sounds in audio-visual content, which helps keep the content wholesome.

[Embodiment]

The present invention is described in further detail below with reference to the accompanying drawings.

Referring to FIG. 1, an electronic device 100 according to an embodiment of the invention may be a mobile phone with a camera function, a handheld computer with a camera function, a digital still camera, a digital video camera, or the like. In this embodiment, the electronic device 100 is a digital video camera. The electronic device 100 may also be an image playback device with image processing and storage functions, such as a digital television or a network television.
The electronic device 100 includes an audio-visual output terminal 10, a sample memory 20, an audio-visual memory 30, and a processor 40. The audio-visual output terminal 10 outputs audio-visual data. It may consist of an image sensor and a microphone, or of a receiving device that receives audio-visual signals together with an image decoder. In this embodiment, the electronic device 100 is a digital video camera, and the audio-visual output terminal 10 consists of an image sensor and a microphone.

The sample memory 20 stores indecent sound samples and indecent image samples. In this embodiment, speech recognition technology is first used to train on a large number of indecent characters, words, sentences, tones, and the like with a neural network algorithm, thereby extracting the feature values of various indecent sounds. These feature values are then stored in the sample memory 20 as the indecent sound samples. For the indecent image samples, image recognition technology is used to process a large number of mouth shapes, gestures, text, and other figures corresponding to the indecent sound samples, thereby extracting the characteristic figures of various indecent images, which are stored in the sample memory 20 as the indecent image samples. In this embodiment, the indecent image samples stored in the sample memory 20 are mouth shapes corresponding to indecent sound information.

The audio-visual memory 30 stores recorded and edited audio-visual data.

The processor 40 includes a sound acquisition module 41, a sound recognition module 42, a sound comparison module 43, a sound editing module 44, an image acquisition module 45, a time period module 46, an image recognition module 47, an image comparison module 48, an image editing module 49, and a storage module 50.

The sound acquisition module 41 acquires sound from the audio-visual data. In this embodiment, the sound acquisition module 41 acquires sound data from the audio-visual output terminal 10.

The sound recognition module 42 recognizes the acquired sound. In this embodiment, the sound recognition module 42 uses an endpoint detection technique to determine which sections of the sound data acquired by the sound acquisition module 41 are voiced and which are silent or background noise. Once the voiced sections are found, the sound recognition module 42 uses a signal-boosting technique to compensate for the attenuation of voiced sounds in the speech signal and so improve recognition accuracy. It then processes the speech signal with a digital filter bank and converts the spectral energy value of each filter into feature values of the sound using linear cepstral coefficients.

The sound comparison module 43 compares the recognized sound with the indecent sound samples in the sample memory 20 to determine whether an indecent sound is present. In this embodiment, the sound comparison module 43 compares the feature values of the sound obtained by the sound recognition module 42 with the feature values of the indecent sounds in the sample memory 20, and determines from whether the feature values are the same whether the acquired sound data contains an indecent sound.

The sound editing module 44 edits the indecent sound when the sound comparison module 43 determines that one is present. In this embodiment, if the sound comparison module 43 determines that the feature values of the sound obtained by the sound recognition module 42 are the same as the feature values of an indecent sound in the sample memory 20, the sound editing module 44 edits the indecent sound. The sound editing module 44 can replace or delete the indecent sound.
In this embodiment, when the sound comparison module 43 determines that indecent language is present, the sound editing module 44 deletes the data containing the indecent sound. When the sound editing module 44 is to replace the indecent sound instead, a replacement sound, such as a "beep", should be stored in the sound editing module 44 in advance. When indecent language is present, the sound editing module 44 then replaces the data containing the indecent sound with the replacement sound.

Sometimes a scene containing an indecent sound also contains unwanted actions or sights, such as mouth shapes, gestures, or revealing clothing. When an indecent sound is found, the image acquisition module 45, the time period module 46, the image recognition module 47, the image comparison module 48, and the image editing module 49 are used to edit indecent images within the time period in which the indecent sound occurs.

The image acquisition module 45 acquires images from the audio-visual content. In this embodiment, the image acquisition module 45 acquires image data from the audio-visual output terminal 10 while the sound acquisition module 41 acquires sound.

The time period module 46 obtains the time period in which the indecent sound occurs. In this embodiment, the time period module 46 directly uses the time period that the sound recognition module 42 identifies as containing the indecent sound.

The image recognition module 47 obtains characteristic figures in the image. In this embodiment, the image recognition module 47 is used to recognize a person's mouth. It first uses face recognition technology to detect a face in the image data acquired by the image acquisition module 45, and then locates the mouth region on the face. For ease of processing, the color image may be converted to a grayscale image.
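The two audio edits described above for the sound editing module 44, deletion and replacement, applied to the time period reported by the time period module 46, might be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the audio is assumed to be a list of floating-point samples, the replacement sound is assumed to be a 1 kHz sine tone, and both function names are hypothetical.

```python
import math


def replace_with_beep(samples, sample_rate, start_s, end_s,
                      freq_hz=1000.0, amplitude=0.3):
    """Return a copy of samples with [start_s, end_s) overwritten by a tone."""
    out = list(samples)
    first = max(0, int(start_s * sample_rate))
    last = min(len(out), int(end_s * sample_rate))
    for n in range(first, last):
        out[n] = amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
    return out


def delete_segment(samples, sample_rate, start_s, end_s):
    """Return a copy of samples with [start_s, end_s) removed."""
    first = max(0, int(start_s * sample_rate))
    last = min(len(samples), int(end_s * sample_rate))
    return list(samples[:first]) + list(samples[last:])
```

Replacement keeps the track length unchanged, whereas deletion shortens it, so a deleting implementation would also have to remove the image data for the same time period to keep sound and picture aligned.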
Next, following an adaptive threshold method, the mouth region is binarized according to the proportion of its gray-level values. A morphological closing operation then joins dark regions that lie very close together. Finally, connected-component labeling is used to find all regions in the image, and after comparison the largest region is taken as the characteristic figure of the mouth.

The image comparison module 48 compares the characteristic figure obtained by the image recognition module 47 with the image samples stored in the sample memory 20. In this embodiment, the image comparison module 48 compares the mouth characteristic figure obtained by the image recognition module 47 with the mouth samples stored in the sample memory 20. If they match, the acquired image is deemed to contain a mouth. Because speech is produced by the mouth, the mouth usually opens and closes when indecent language occurs. In this embodiment, the image comparison module 48 further includes an action module 481, which determines whether the recognized mouth changes while the indecent language occurs. Specifically, the action module 481 determines whether the edge of the mouth deforms; when it determines that the mouth deforms while the indecent language occurs, the mouth is edited. Of course, for figures that do not readily change, such as gestures or text, there is no need to determine whether the image deforms; it is enough to find a figure that matches an indecent image sample.

The image editing module 49 edits the image according to the comparison result of the image recognition module 47. The image editing module 49 can either delete the image or modify it. When the image editing module 49 is to delete the image, it deletes the image data for the corresponding time according to the time of the indecent sound obtained by the time period module 46.
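The mouth-extraction steps described above (binarization, morphological closing, connected-component labeling, selection of the largest region) can be sketched on a small grayscale grid. The source gives no parameters, so several choices here are assumptions: the threshold is taken as a fixed ratio of the mean gray level, the closing uses a 3x3 window, and regions are 4-connected. All function names are hypothetical.

```python
from collections import deque


def binarize(gray, ratio=0.5):
    """Mark pixels darker than ratio * mean gray level as 1, others as 0."""
    pixels = [v for row in gray for v in row]
    threshold = ratio * sum(pixels) / len(pixels)
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def _morph(mask, dilate):
    """3x3 dilation (any) or erosion (all); windows are clipped at the border."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = int(any(window)) if dilate else int(all(window))
    return out


def closing(mask):
    """Morphological closing: dilation followed by erosion."""
    return _morph(_morph(mask, True), False)


def largest_component(mask):
    """Return the (y, x) pixels of the largest 4-connected region of 1s."""
    h, w = len(mask), len(mask[0])
    seen, best = set(), set()
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or (y, x) in seen:
                continue
            region, queue = set(), deque([(y, x)])
            seen.add((y, x))
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                region.add((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best
```

Calling `largest_component(closing(binarize(gray)))` on a cropped face region would then yield the candidate mouth figure. The closing step is what merges two dark patches separated by a one-pixel gap into a single region before labeling.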
When the image editing module 49 is to modify the image, a replacement figure may be stored in advance, or an image editing program may be used to apply image deformation, a mosaic, or the like. When an indecent image is present, the replacement figure is overlaid on the indecent figure, or the indecent figure is deformed, mosaicked, or otherwise edited. In this embodiment, the image editing module 49 adds a mosaic to the mouth region.

The storage module 50 stores the edited sound and images in the audio-visual memory 30.

Referring to FIG. 2, a flowchart of the audio-visual editing method according to an embodiment of the invention is shown.

Step S110: acquire the sound and images in the audio-visual content. In this embodiment, image data and sound data are acquired from the audio-visual output terminal 10.

Step S115: extract the features of the sound from the acquired sound information. In this embodiment, the sound recognition module 42 uses an endpoint detection technique to determine which sections of the sound data acquired by the sound acquisition module 41 are voiced and which are silent or background noise. Once the voiced sections are found, the sound recognition module 42 uses a signal-boosting technique to compensate for the attenuation of voiced sounds in the speech signal and so improve recognition accuracy. It then processes the speech signal with a digital filter bank and converts the spectral energy value of each filter into feature values of the sound using linear cepstral coefficients.

Step S120: compare the features of the acquired sound with the indecent sound samples to determine whether they are the same. In this embodiment, the sound comparison module 43 compares the feature values of the sound obtained by the sound recognition module 42 with the feature values of the indecent sounds in the sample memory 20 to determine whether the acquired sound data contains an indecent sound.
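The comparison in step S120 can be sketched as a lookup of one feature vector against the stored sample vectors. The source only says the feature values are checked for being "the same"; the default tolerance of zero below reproduces that exact-match rule, while a non-zero tolerance is an added assumption for noisy features. The function names and the use of Euclidean distance are hypothetical.

```python
import math


def matches_sample(features, sample, tolerance=0.0):
    """True when two equal-length vectors lie within tolerance of each other."""
    if len(features) != len(sample):
        return False
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, sample)))
    return distance <= tolerance


def find_indecent(features, samples, tolerance=0.0):
    """Return the index of the first stored sample that matches, or None."""
    for index, sample in enumerate(samples):
        if matches_sample(features, sample, tolerance):
            return index
    return None
```

A result of `None` corresponds to the branch in which no indecent sound is found and new sound and image data are acquired.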
Step S125: edit the sound that matches the indecent sound samples according to the comparison result. In this embodiment, if the sound comparison module 43 determines that the feature values of the sound obtained by the sound recognition module 42 are the same as the feature values of an indecent sound in the sample memory 20, the sound editing module 44 edits the indecent sound. The sound editing module 44 can replace or delete the indecent sound. In this embodiment, when the sound comparison module 43 determines that indecent language is present, the sound editing module 44 deletes the data containing the indecent sound. When the sound editing module 44 is to replace the indecent sound instead, a replacement sound, such as a "beep", should be stored in the sound editing module 44 in advance. When indecent language is present, the sound editing module 44 then replaces the data containing the indecent sound with the replacement sound.

Step S130: obtain the time period of the sound information that matches the indecent sound samples. In this embodiment, the time period module 46 directly uses the time period that the sound recognition module 42 identifies as containing the indecent sound.

Step S135: obtain the characteristic figure of the image within the time period corresponding to the indecent sound. The image recognition module 47 compares the images in the time period containing the indecent sound with the indecent image samples to determine whether an indecent image is present. In this embodiment, the image recognition module 47 is used to recognize a person's mouth. It first uses face recognition technology to detect a face in the image data acquired by the image acquisition module 45, and then locates the mouth region on the face. For ease of processing, the color image may be converted to a grayscale image.
Next, following an adaptive threshold method, the mouth region is binarized according to the proportion of its gray-level values. A morphological closing operation then joins dark regions that lie very close together. Finally, connected-component labeling is used to find all regions in the image, and after comparison the largest region is taken as the characteristic figure of the mouth. The image recognition module 47 then compares this region with the mouth samples stored in the sample memory 20. If they match, the acquired image is deemed to contain a mouth. Because speech is produced by the mouth, the mouth usually opens and closes when indecent language occurs.

Step S140: compare the obtained characteristic figure with the indecent image samples. In this embodiment, the image comparison module 48 compares the mouth characteristic figure obtained by the image recognition module 47 with the mouth samples stored in the sample memory 20. If they match, the acquired image is deemed to contain a mouth.

Step S145: determine whether the characteristic figure that matches the indecent image sample deforms. Because speech is produced by the mouth, the mouth usually opens and closes when indecent language occurs. In this embodiment, the image comparison module 48 includes an action module 481, which determines whether the recognized mouth changes while the indecent language occurs. Specifically, the action module 481 determines whether the edge of the mouth deforms; when it determines that the mouth deforms while the indecent language occurs, the mouth is edited. Of course, for figures that do not readily change, such as gestures or text, there is no need to determine whether the image deforms; it is enough to find a figure that matches an indecent image sample, and this step can be omitted.

Step S150: edit the image that matches the indecent image sample according to the comparison result.
The image editing module 49 edits the image according to the comparison result of the image recognition module 47. When the image editing module 49 is to delete the image, it deletes the image data for the corresponding time according to the time of the indecent sound obtained by the time period module 46. When the image editing module 49 is to modify the image, a replacement figure may be stored in advance, or an image editing program may be used to apply image deformation, a mosaic, or the like. When an indecent image is present, the replacement figure is overlaid on the indecent figure, or the indecent figure is deformed, mosaicked, or otherwise edited. In this embodiment, the image editing module 49 adds a mosaic to the mouth region.

Step S155: store the edited audio-visual content. In this embodiment, the edited sound and images are saved to the audio-visual memory 30.

In step S120, if the sound comparison module 43 determines that the feature values of the acquired sound differ from the feature values of the indecent sounds in the sample memory 20, the method returns to step S110 to acquire new sound data and image data.

In step S140, if the obtained characteristic figure does not match the indecent image samples, the image is not processed further and the method proceeds directly to step S155.

In step S145, if the characteristic figure that matches the indecent image sample does not deform, the image is not processed further and the method proceeds directly to step S155.

The audio-visual editing system provided by the present invention can filter indecent sounds in audio-visual content, which helps keep the content wholesome.

In addition, those skilled in the art may make other changes within the spirit of the present invention; all such changes made in accordance with the spirit of the present invention shall fall within the scope of protection claimed by the present invention.

[Brief Description of the Drawings]

FIG. 1 is a hardware architecture diagram of the electronic device provided by the present invention.
FIG. 2 is a flowchart of the audio-visual editing method of the electronic device of FIG. 1.
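The control flow of FIG. 2, including the branches described for steps S120, S140, and S145, can be summarized in a short sketch. Every callable passed in is a hypothetical placeholder for the corresponding module, and the clip is assumed to have been acquired already in step S110; none of these names come from the source.

```python
def edit_clip(clip, find_indecent_audio, find_indecent_image, mouth_changed,
              edit_audio, edit_image, store):
    """Run steps S115 to S155 once on a clip acquired in step S110.

    Returns False when no indecent sound is found, in which case the
    caller goes back to S110 and acquires new sound and image data.
    """
    span = find_indecent_audio(clip)            # S115 and S120
    if span is None:
        return False                            # S120 mismatch: back to S110
    edit_audio(clip, span)                      # S125
    # S130: span is the time period containing the indecent sound.
    region = find_indecent_image(clip, span)    # S135 and S140
    if region is not None and mouth_changed(clip, span, region):  # S145
        edit_image(clip, span, region)          # S150
    store(clip)                                 # S155
    return True
```

Passing the modules in as callables keeps the sketch independent of any particular recognition or editing technique, which the description leaves open.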
[Description of Main Component Symbols]

Electronic device 100
Audio-visual output terminal 10
Sample memory 20
Audio-visual memory 30
Processor 40
Sound acquisition module 41
Sound recognition module 42
Sound comparison module 43
Sound editing module 44
Image acquisition module 45
Time period module 46
Image recognition module 47
Image comparison module 48
Action module 481
Image editing module 49
Storage module 50

Claims (1)

201042636 七、申請專利範圍·· ^ ―種料編m其料影音㈣的㈣,所述影音 •士輯系統包括樣本記憶體及處理器,其改進在於,所述樣 己憶體内存儲有不雅聲音樣本,所述處理器包括: 聲曰獲取模塊,用於從所述影音内容中獲取聲音; 聲音辨識模塊,用於將獲取的聲音進行‘別;θ 莫塊’用於將識別出的聲音與所述記憶體内的不 雅革9樣本對比,判斷是否存在不雅聲音; 〇:::輯模塊,用於當所述聲音對比模:判斷出存在不雅 未曰時,將不雅聲音進行編輯。 請專利範圍第1項所述之影音編輯系統,其中’所 體内還存有不雅圖像樣本,所述處理器還包 ,r 取模塊’時間段模塊’圖像辨識模塊、圖像對 1匕=:圖像編輯模塊,所述圖像獲取模塊用於從所= 二=獲取圖像;所述時間段模塊用於獲取存在不雅聲 ❹二=傻Γ圖像辨識模塊用於獲取圖像中的特徵圖 = 像模塊用於將所述圖像辨識模塊獲取的特 樣本記憶體中存儲的圖像樣本進行對比,所 迷圖像編輯模塊用於當所诚 所 圖像時,將不雅圖像進行編輯。、朗判斷出存在不雅 ::二申請專利範圍第1項所述之影音編輯系統,其中,所 述聲曰編輯杈塊内預設有替換聲音,當立 所述聲音編輯模塊對不雅聲立 耳《時, 不雅聲音。 小雅耳曰的編軏是利用替換聲音替換 15 201042636 4述::二專:範圍第1項所述之影音編輯系統,其中,所 對聲音的編輯是將不雅聲音刪除。 述圖像:=第2項所述之影音編輯系統,其中,所 豕、,局輯模塊内預設有替換 ♦ .述圖像編輯模塊像時,所 像。 弭疋利用替換圖覆蓋不雅圖 6♦如申請專利範圍第2項所述之影音編 述圖像編輯模塊對# θ ”為軏糸統,其中,所 〇 7.-種影、^像林疋將不雅圖像刪除。 音樣本,將影音内容中二方去根據預设的不雅聲 輯方法包括以下步^ 行編輯,所述影音編 獲取影音内容中的聲音; ,獲取的聲音與不雅聲音樣本進行對比; 若存在不雅聲音,將不雅聲音進行編輯。 餐㈣W法,其中,在 ❹切編輯时財,對不料音進行㈣_除 :二圍第7項所述之影音編輯方法,其中,所 令的二雅㈣根據預設的不雅圖像樣本,將影音内容 驟.、Ltr編輯,所述影音編輯方法還包括以下步 雅舞立又音内容中的聲音的同時獲取圖像;若存在不 耳曰,獲取不雅聲音所在的時間段 間段的圖像盘不验圖德样m L +雅聲曰所在時 將不雅圖像進打編私圖像樣本進灯對比;若存在不雅圖像, 16 201042636 10.如申請專利範圍第9項所述之影音編輯方法,其中, 在將不雅圖像進行編輯的步驟中,對不雅圖像進行替換或 刪除處理。 '201042636 VII. The scope of application for patents·· ^ —— The material of the material is composed of (4) (4), the audio-visual system includes sample memory and processor, and the improvement is that the sample has no memory stored in the body. The sound sample, the processor includes: a sonar acquisition module for acquiring sound from the audio and video content; and a sound recognition module for performing the sound of the acquired sound; The sound is compared with the indecent leather sample in the memory to determine whether there is an indecent sound; 〇:::The module is used to be indecent when the sound is compared to the mode: it is judged that there is an indecent attempt The sound is edited. 
2. The video and audio editing system of claim 1, wherein the sample memory further stores indecent image samples, and the processor further includes an image acquiring module, a time-period module, an image recognition module, an image comparison module, and an image editing module; the image acquiring module is configured to acquire images from the audio-video content; the time-period module is configured to obtain the time period in which an indecent sound exists; the image recognition module is configured to acquire feature graphics in the images; the image comparison module is configured to compare the feature graphics acquired by the image recognition module with the indecent image samples stored in the sample memory; and the image editing module is configured to edit the indecent image when the image comparison module determines that an indecent image exists.
3. The video and audio editing system of claim 1, wherein the sound editing module is preset with a replacement sound, and, when the sound comparison module determines that an indecent sound exists, the sound editing module edits the indecent sound by replacing it with the replacement sound.
4. The video and audio editing system of claim 1, wherein the sound editing module edits the indecent sound by deleting it.
5. The video and audio editing system of claim 2, wherein the image editing module is preset with a replacement image, and the image editing module uses the replacement image to cover the indecent image.
6. The video and audio editing system of claim 2, wherein the image editing module edits the indecent image by deleting it.
7. A video and audio editing method for editing audio-video content according to preset indecent sound samples, the method comprising the following steps: acquiring the sound in the audio-video content; comparing the acquired sound with the indecent sound samples; and, if an indecent sound exists, editing the indecent sound.
8. The video and audio editing method of claim 7, wherein, in the step of editing the indecent sound, the indecent sound is replaced or deleted.
9. The video and audio editing method of claim 7, wherein the method further edits the audio-video content according to preset indecent image samples, and further comprises the following steps: acquiring images from the audio-video content; if an indecent sound exists, recognizing the images within the time period in which the indecent sound is located and comparing them with the indecent image samples; and, if an indecent image exists, editing the indecent image.
10. The video and audio editing method of claim 9, wherein, in the step of editing the indecent image, the indecent image is replaced or deleted.
11. An electronic device comprising an audio-video output terminal capable of outputting audio-video data, the improvement being that the electronic device further comprises a sample memory and a processor, the sample memory storing indecent sound samples, and the processor being electrically connected to the audio-video output terminal, the processor comprising: a sound acquiring module, configured to acquire sound from the audio-video content; a sound recognition module, configured to recognize the acquired sound; a sound comparison module, configured to compare the recognized sound with the indecent sound samples in the sample memory to determine whether an indecent sound exists; and a sound editing module, configured to edit the indecent sound when the sound comparison module determines that an indecent sound exists.
12. The electronic device of claim 11, wherein the electronic device is an image-capturing device or an audio-video playback device.
13. The electronic device of claim 11, wherein the sample memory further stores indecent image samples, and the processor further comprises an image acquiring module, a time-period module, an image recognition module, an image comparison module, and an image editing module; the image acquiring module is configured to acquire images from the audio-video content; the time-period module is configured to obtain the time period in which an indecent sound exists; the image recognition module is configured to acquire feature graphics in the images; the image comparison module is configured to compare the feature graphics acquired by the image recognition module with the indecent image samples stored in the sample memory; and the image editing module is configured to edit the indecent image when the image comparison module determines that an indecent image exists.
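The flow recited in the claims above (compare sound against indecent sound samples, edit by replacement or deletion, then check only the images within the flagged time periods) can be illustrated with a short sketch. This is an illustration under stated assumptions, not the patentee's implementation: it assumes the recognition steps have already produced labelled sound segments and per-frame sets of feature labels, and the names `Segment`, `Frame`, `edit_sound`, and `edit_images` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


# Hypothetical data model; the patent text does not define one.
@dataclass
class Segment:
    label: str    # result of the sound recognition step
    start: float  # seconds
    end: float    # seconds


@dataclass
class Frame:
    time: float              # seconds
    features: Set[str]       # feature graphics found by image recognition
    pixels: str = "original"  # stand-in for the actual image data


def edit_sound(
    segments: List[Segment],
    sound_samples: Set[str],
    replacement: Optional[str] = None,
) -> Tuple[List[Segment], List[Tuple[float, float]]]:
    """Compare recognized segments with the indecent sound samples.

    A matching segment is replaced when a replacement sound is preset,
    otherwise it is deleted. The matching time periods are returned too.
    """
    edited: List[Segment] = []
    periods: List[Tuple[float, float]] = []
    for seg in segments:
        if seg.label in sound_samples:
            periods.append((seg.start, seg.end))
            if replacement is not None:
                edited.append(Segment(replacement, seg.start, seg.end))
            # No replacement preset: the segment is dropped (deletion).
        else:
            edited.append(seg)
    return edited, periods


def edit_images(
    frames: List[Frame],
    periods: List[Tuple[float, float]],
    image_samples: Set[str],
    replacement: Optional[str] = None,
) -> List[Frame]:
    """Check only frames inside the flagged periods against image samples."""
    edited: List[Frame] = []
    for frame in frames:
        in_period = any(s <= frame.time <= e for s, e in periods)
        if in_period and frame.features & image_samples:
            if replacement is not None:
                # Cover the indecent image with the replacement image.
                edited.append(Frame(frame.time, set(), replacement))
            # No replacement preset: the frame is dropped (deletion).
        else:
            edited.append(frame)
    return edited


segments = [
    Segment("hello", 0.0, 0.5),
    Segment("badword", 0.5, 1.0),
    Segment("world", 1.0, 1.5),
]
clean, periods = edit_sound(segments, {"badword"}, replacement="beep")
print([s.label for s in clean], periods)
```

Restricting the image comparison to the flagged time periods mirrors the role of the time-period module: frames outside those periods are passed through without being compared against the image samples.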
TW98117143A 2009-05-22 2009-05-22 Video and audio editing system, method and electronic device using same TWI385646B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW98117143A TWI385646B (en) 2009-05-22 2009-05-22 Video and audio editing system, method and electronic device using same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW98117143A TWI385646B (en) 2009-05-22 2009-05-22 Video and audio editing system, method and electronic device using same

Publications (2)

Publication Number Publication Date
TW201042636A (en) 2010-12-01
TWI385646B (en) 2013-02-11

Family

ID=45000643

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98117143A TWI385646B (en) 2009-05-22 2009-05-22 Video and audio editing system, method and electronic device using same

Country Status (1)

Country Link
TW (1) TWI385646B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11133849A (en) * 1997-10-31 1999-05-21 Nippon Columbia Co Ltd Speech counting device
TWI238379B (en) * 2001-11-16 2005-08-21 Inventec Besta Co Ltd System and method for language reiterating and correcting pronunciation in a portable electronic device
US20050254782A1 (en) * 2004-05-14 2005-11-17 Shu-Fang Hsu Method and device of editing video data

Also Published As

Publication number Publication date
TWI385646B (en) 2013-02-11

Similar Documents

Publication Publication Date Title
CN110853646B (en) Methods, devices, equipment and readable storage media for distinguishing conference speaking roles
Czyzewski et al. An audio-visual corpus for multimodal automatic speech recognition
CN109446876B (en) Sign language information processing method and device, electronic equipment and readable storage medium
KR20140114238A (en) Method for generating and displaying image coupled audio
CN108874356A (en) Voice broadcasting method and device, mobile terminal and storage medium
CN105512348A (en) Method and device for processing videos and related audios and retrieving method and device
CN105302315A (en) Image processing method and device
WO2005069171A1 (en) Document correlation device and document correlation method
CN108242238B (en) A method and device for generating audio files, and terminal equipment
JP2016502157A (en) Lip shape changing device and method based on automatic word translation
CN112232276B (en) An emotion detection method and device based on speech recognition and image recognition
CN104298694A (en) Picture message adding method and device and mobile terminal
CN113948076B (en) Voice interaction method, device and system
CN110019848A (en) Conversation interaction method and device and robot
US11157549B2 (en) Emotional experience metadata on recorded images
TW201626364A (en) System and method for recovering missed voice automatically
WO2024093460A1 (en) Voice detection method and related device thereof
CN110491384B (en) Voice data processing method and device
CN101877223A (en) Video and audio editing system, method and electronic equipment with the video and audio editing system
JP5320913B2 (en) Imaging apparatus and keyword creation program
CN111147914A (en) Video processing method, storage medium and electronic equipment
CN116916089B (en) Intelligent video editing method integrating voice features and face features
CN115914742B (en) Character recognition method, device and equipment for video captions and storage medium
CN108334806B (en) Image processing method, device and electronic device
CN105208283A (en) Method and device for voice-activated photographing

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees