TWI913240B

TWI913240B - System of intelligently adjust mixing audio and method of intelligently adjust mixing audio

Info

Publication number: TWI913240B
Application number: TW110102178A
Authority: TW
Inventors: 王明昌
Original assignee: 圓剛科技股份有限公司
Filing date: 2021-01-20
Publication date: 2026-02-01

Abstract

An intelligent audio adjustment and mixing system includes a class identification module, an attribute computation module, and an adjusting module. The class identification module is configured to identify the audio class of each of a plurality of audio signals. The attribute computation module is coupled to the class identification module and determines, from the audio signals, a primary audio signal whose audio class matches a primary audio class. The adjusting module is coupled to the attribute computation module. It adjusts an attribute of the audio signals and/or the primary audio signal, and the signals are then mixed and output.

Description

Intelligent audio adjustment and mixing system and intelligent audio adjustment and mixing method

The present disclosure relates to audio adjustment techniques, and more particularly to techniques for adjusting and mixing audio.

In a multimedia environment with multiple audio sources, when the sources are mixed for output, the user often needs to emphasize a particular source so that listeners can clearly follow the main content. The conventional approach is to designate a specific audio source channel in advance as the primary audio source. This approach is inflexible: whenever the subject of the program changes, the channel designated as the primary source must also be changed. If the designation is not updated, the wrong channel is emphasized during mixing. How to determine the primary and secondary audio sources more flexibly is therefore a technical problem that urgently needs to be solved in this field.

This Summary provides a simplified overview of the disclosure so that the reader has a basic understanding of it. It is not a complete overview of the disclosure and is not intended to identify key or critical elements of the embodiments or to delimit the scope of this application.

According to one embodiment, an intelligent audio adjustment and mixing system for processing a plurality of audio signals is disclosed. The system includes a category identification module, an attribute calculation module, and an adjustment module. The category identification module is configured to identify each audio signal to obtain its corresponding audio category. The attribute calculation module is coupled to the category identification module and determines, from the audio signals, a primary audio signal whose audio category matches a primary audio category. The adjustment module is coupled to the attribute calculation module. It adjusts an audio attribute of each audio signal and/or of the primary audio signal, after which the signals are mixed and output.

According to another embodiment, an intelligent audio adjustment and mixing method is disclosed, comprising: receiving a plurality of audio signals; identifying each audio signal to obtain its corresponding audio category; determining, from the audio signals, a primary audio signal whose audio category matches a primary audio category; and adjusting an audio attribute of the audio signals and/or the primary audio signal, and then mixing and outputting them.

According to yet another embodiment, an intelligent audio adjustment and mixing method is disclosed, comprising: receiving at least two audio signals; identifying at least one of the two audio signals to obtain a corresponding audio category; and adjusting an audio attribute of at least one of the two audio signals according to the audio category, and then mixing and outputting the audio.

The following disclosure provides many different embodiments for implementing different features of this application. Specific examples of elements and arrangements are described below to simplify the disclosure; they are merely illustrative and are not intended to be limiting. For example, terms such as "first" and "second" are used only to distinguish identical or similar elements or operations; they do not limit the technical elements of this application or the order or sequence of operations. In addition, reference numerals and/or letters may be repeated across embodiments, and the same technical terms may use the same and/or corresponding reference numerals in different embodiments. This repetition is for simplicity and clarity and does not by itself indicate a relationship between the embodiments and/or configurations discussed.

An embodiment of this application discloses an intelligent audio adjustment and mixing method comprising at least the following steps. Step 1: receive at least two audio signals. Step 2: identify at least one of the two audio signals to obtain a corresponding audio category. Step 3: adjust an audio attribute of at least one of the two audio signals according to the audio category, and then mix and output the audio.

In the above embodiment, Step 2 may vary with practical needs. For example, only one of the two audio signals may be identified to obtain its audio category, or both signals may be identified so that each obtains its own audio category. The number of audio signals may also be greater than two; no limitation is imposed here.

Step 3 may likewise vary with practical needs. For example, when Step 2 identifies the first audio signal as belonging to a first audio category, Step 3 may adjust at least one audio attribute of the first audio signal according to that first audio category. In the same situation, Step 3 may instead adjust at least one audio attribute of another audio signal (for example, the second audio signal) according to the first audio category. As another example, when Step 2 identifies the first audio signal as belonging to a first audio category and the second audio signal as belonging to a second audio category, Step 3 may adjust at least one audio attribute of the first audio signal according to the first audio category and at least one audio attribute of the second audio signal according to the second audio category. The attribute adjusted for the first audio signal and the attribute adjusted for the second audio signal may be the same or different.

After the audio attributes are adjusted, the signals are mixed. If both audio signals have been adjusted, the two adjusted signals are mixed and output; if only one has been adjusted, the adjusted signal is mixed with the unadjusted signal and output. To illustrate the details of this invention more clearly, more concrete embodiments are described below with reference to Figures 1 and 2.
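As a rough illustration of Steps 1 to 3, the following Python sketch identifies only the first of two signals, applies a gain chosen by its category either to that signal or to the other one, and mixes the result with a plain sample-wise sum. The classifier is passed in as a function; the category-to-gain table, the dB values, and the names `adjust_and_mix` and `adjust_other` are hypothetical, since the text specifies neither a classifier, numeric gains, nor a mixing formula.

```python
from typing import Callable, Optional

Signal = list[float]


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def adjust_and_mix(
    first: Signal,
    second: Signal,
    classify: Callable[[Signal], Optional[str]],
    gain_db_for: dict[str, float],
    adjust_other: bool = False,
) -> Signal:
    # Step 2: identify only the first signal (one of the allowed variants).
    category = classify(first)
    # Unknown (None) or unlisted categories get 0 dB, i.e. no change.
    db = gain_db_for.get(category, 0.0) if category is not None else 0.0
    gain = db_to_gain(db)
    # Step 3: adjust the first signal, or the second one, based on that category.
    if adjust_other:
        second = [gain * s for s in second]
    else:
        first = [gain * s for s in first]
    # Mix the adjusted and unadjusted signals by summing sample by sample.
    return [a + b for a, b in zip(first, second)]
```

For example, `adjust_and_mix(voice, music, my_classifier, {"voice": 6.0})` would lift a signal recognized as voice by 6 dB before mixing, while `adjust_other=True` would instead apply the category's gain to the second signal (for instance, a negative value to duck the music under the voice).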

Refer to Figure 1, which is a schematic diagram of an intelligent audio adjustment and mixing system 100 according to some embodiments of this application. The system 100 may be, for example, a mixer, a device or software with a built-in mixing function, a device or software with a mixing function that is connected in series with a mixer, or any other electronic device or software capable of mixing; no limitation is imposed here. The system 100 includes audio signal receiving modules 110a-110d, category identification modules 120a-120d, adjustment modules 130a-130d, an attribute calculation module 140, and an audio output module 150. As shown in Figure 1, each audio signal receiving module is coupled to its corresponding category identification module and adjustment module: 110a to 120a and 130a, 110b to 120b and 130b, 110c to 120c and 130c, and 110d to 120d and 130d. The category identification modules 120a-120d are coupled to the attribute calculation module 140, the attribute calculation module 140 is coupled to the adjustment modules 130a-130d, and the adjustment modules 130a-130d are coupled to the audio output module 150.

In some embodiments, the audio signal receiving modules 110a-110d receive audio signals from various sources. For example, a source may be an audio signal input through an audio connection terminal (not shown), an audio signal captured by an audio capture module (not shown), or the audio of a video, without limitation. The receiving modules 110a-110d may receive their audio signals simultaneously; the signals are processed separately and then transmitted to the audio output module 150 to be mixed and output.

In some embodiments, the audio signal receiving module 110a transmits its audio signal to the category identification module 120a, which identifies the signal to obtain its audio category, for example human voice, instrument sound, or noise. The category identification module 120a sends the identified category to the attribute calculation module 140. Similarly, the receiving modules 110b, 110c, and 110d transmit their audio signals to the category identification modules 120b, 120c, and 120d, respectively, and each of these modules sends its identified category to the attribute calculation module 140. For example, module 120a identifies the signal from receiving module 110a as instrument sound or music; module 120b identifies the signal from 110b as human voice; module 120c identifies the signal from 110c as ambient background sound; and module 120d identifies the signal from 110d as noise. The category identification modules 120a-120d may identify the audio signals using sound recognition techniques.
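The text leaves the recognition technique open. Whatever recognizer is used, its output has to be reduced to one category per signal, or to "unknown" when the recognizer is unsure. A minimal sketch, assuming a hypothetical recognizer that returns a confidence score per category; the 0.5 threshold and the name `pick_category` are illustrative only.

```python
from typing import Optional


def pick_category(scores: dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Reduce per-category scores to one category, or None (unknown).

    `scores` is assumed to come from some sound recognizer, for example
    {"voice": 0.8, "music": 0.1, "noise": 0.1}.
    """
    if not scores:
        return None
    best = max(scores, key=scores.__getitem__)
    # Below the confidence threshold the signal is treated as unknown.
    return best if scores[best] >= threshold else None
```

Returning `None` rather than a forced best guess keeps low-confidence signals out of the primary/secondary decision, which lines up with the unknown-category handling described for the adjustment modules.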

In some embodiments, the attribute calculation module 140 may preset a primary audio category and use it to determine, from the audio signals, the primary audio signal whose audio category matches the primary audio category. For example, if the user wants human-voice signals to be processed, the attribute calculation module 140 can set human voice as the primary audio category. The user may also set any other category as the primary audio category according to actual needs. Besides being preset, the primary audio category may be generated automatically, as in the following two examples.

Example (1): the primary audio category is generated automatically in real time by analyzing the audio signals. Based on the identified category of each audio signal, the corresponding scene (for example a music performance, a singing performance, or a meeting) is determined through analysis such as comparison against a built-in database, cloud analysis, or neural network computation, and the primary audio category is generated according to that scene.

Example (2): the user assigns a weight or priority level to each audio category according to actual needs, and the primary audio category is generated automatically from these levels. For example, human voice may be set as the first (highest) level, instrument sound or music as the second level, and ambient background sound as the third level. When human voice is detected among the audio signals, the signal or signals containing it are set as the primary audio signal. When instrument sound or music is detected but no human voice, the signal or signals containing the instrument sound or music are set as the primary audio signal.
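Example (2) can be sketched as a lookup over priority levels: pick the highest-level category present among the identified signals, then mark every signal of that category as primary. This sketch assumes a hypothetical priority table mirroring the levels in the text and does not model the scene analysis of Example (1); the name `select_primary` and the category labels are illustrative.

```python
from typing import Optional

# Hypothetical priority table following Example (2): lower number = higher level.
PRIORITY = {"voice": 1, "music": 2, "ambient": 3}


def select_primary(
    categories: dict[str, Optional[str]], preset: Optional[str] = None
) -> tuple[Optional[str], list[str]]:
    """Return (primary category, ids of the primary audio signals).

    `categories` maps a channel id to its identified category (None = unknown).
    A preset category is used as-is; otherwise the highest-priority category
    present among the signals becomes the primary category.
    """
    known = {cid: cat for cid, cat in categories.items() if cat is not None}
    if preset is not None:
        primary = preset
    else:
        ranked = [cat for cat in known.values() if cat in PRIORITY]
        primary = min(ranked, key=PRIORITY.__getitem__) if ranked else None
    # Several signals may share the primary category.
    ids = [cid for cid, cat in known.items() if cat == primary]
    return primary, ids
```

With the four channels of Figure 1 labeled music, voice, ambient, and noise, this returns `("voice", ["110b"])`; if no voice is present, the music channel becomes primary instead.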

In this embodiment, the attribute calculation module 140 determines that, among the four audio signals, the signal from the audio signal receiving module 110b belongs to the human voice category, and therefore sets that signal as the primary audio signal. Based on this determination, the attribute calculation module 140 generates a control signal (for convenience, the control signal is described below as an adjustment command, though it is not limited to this) and sends it back to the adjustment module 130b. In another embodiment, the attribute calculation module 140 determines that the other audio signals are secondary audio signals and sends adjustment commands back to the corresponding adjustment modules 130a, 130c, and 130d. Each adjustment command is generated according to the category of the corresponding audio signal, so that signals of different categories receive the corresponding audio attribute processing. In some embodiments, control signals (adjustment commands) may be sent back only for the primary audio signal, only for the non-primary audio signals, or for all audio signals.

In another embodiment, there may be two or more primary audio signals. For example, the attribute calculation module 140 determines that the signals from receiving modules 110a and 110b belong to the human voice category, and that the signals from receiving modules 110c and 110d belong to the ambient background sound and noise categories, respectively. The signals from receiving modules 110a and 110b are therefore both set as primary audio signals.

In some embodiments, the adjustment modules 130a-130d receive their corresponding adjustment commands and use them to process the audio attributes of the audio signals and/or the primary audio signal. The processing may be applied only to the primary audio signal, only to the non-primary audio signals, or to all audio signals. The audio attribute may be volume, loudness, or a combination thereof, without limitation. For example, if the attribute calculation module 140 determines that the signal (human voice) from receiving module 110b is of the primary audio category, the adjustment command received by adjustment module 130b may instruct it to increase the volume of the voice signal, so that the voice stands out in the mixed output. As another example, if the attribute calculation module 140 determines that the signal from receiving module 110d is noise, the adjustment command received by adjustment module 130d instructs it to reduce the volume of the noise, lessening the noise's effect on the mixed output. As yet another example, different audio attributes may be adjusted for different signals: if the attribute calculation module 140 determines that the signal from 110b (human voice) is of the primary audio category and the signal from 110d is noise, the command to adjustment module 130b may increase the volume of the voice signal while the command to adjustment module 130d reduces the loudness of the noise, so that the mixed output highlights the voice and suppresses the noise.
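The per-category adjustment commands described above might be represented as small records carrying an attribute name and a level change. A minimal sketch under stated assumptions: the `AdjustCommand` type, the +6 dB and -12 dB amounts, and the `targets` option are hypothetical. Treating a loudness change as a plain dB change is also a simplification; a real loudness adjustment would normally be driven by a perceptual loudness measure such as the one in ITU-R BS.1770.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdjustCommand:
    attribute: str    # "volume" or "loudness"
    change_db: float  # positive boosts, negative attenuates


# Hypothetical amounts; the text gives no numeric values.
BOOST_PRIMARY = AdjustCommand("volume", 6.0)
CUT_NOISE = AdjustCommand("loudness", -12.0)


def make_commands(
    categories: dict[str, Optional[str]], primary: str, targets: str = "all"
) -> dict[str, AdjustCommand]:
    """Build one adjustment command per channel that needs one.

    `targets` selects which channels may receive commands: "primary",
    "secondary", or "all". Unknown categories (None) never get a command,
    and secondary categories other than noise are left unadjusted here.
    """
    commands: dict[str, AdjustCommand] = {}
    for cid, cat in categories.items():
        if cat is None:
            continue
        is_primary = cat == primary
        if (targets == "primary" and not is_primary) or (
            targets == "secondary" and is_primary
        ):
            continue
        if is_primary:
            commands[cid] = BOOST_PRIMARY
        elif cat == "noise":
            commands[cid] = CUT_NOISE
    return commands
```

For the Figure 1 example this yields a volume boost for 110b and a loudness cut for 110d, matching the last example in the paragraph above.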

In some embodiments, the audio output module 150 receives the audio signals whose audio attributes have been adjusted, mixes them, and outputs a mixed signal. In some embodiments, the audio output module 150 may also receive at least one adjusted audio signal together with at least one unadjusted audio signal and mix them into the output. The mixed signal output by the audio output module 150 may therefore consist of some or all of the adjusted signals together with some or all of the unadjusted signals, depending on the requirements of the actual product.
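The output module's behavior can be sketched as an additive mix in which only some channels carry a gain. This assumes signals are equal-rate float sample lists in the range [-1, 1] and that mixing is a plain sum; the hard clip at the end is an added safety step, not something the text describes, and `output_mix` is a hypothetical name.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def output_mix(
    signals: dict[str, list[float]], gains_db: dict[str, float]
) -> list[float]:
    """Mix all (non-empty) channels; channels missing from gains_db are unadjusted."""
    length = min(len(s) for s in signals.values())
    mixed = [0.0] * length
    for cid, sig in signals.items():
        gain = db_to_gain(gains_db.get(cid, 0.0))
        for i in range(length):
            mixed[i] += gain * sig[i]
    # Hard-clip to full scale [-1, 1] (an assumed safety step, not from the text).
    return [max(-1.0, min(1.0, x)) for x in mixed]
```

Passing an empty `gains_db` mixes every channel unadjusted; passing gains for a subset reproduces the case of adjusted and unadjusted signals mixed together.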

In some embodiments, the category identification modules 120a-120d may be unable to identify the audio category of a signal. In that case, the corresponding adjustment module 130a-130d passes the signal of unknown category directly to the audio output module 150 without processing it.
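The pass-through rule for unknown categories amounts to a guard in front of the gain stage. A minimal sketch of one adjustment module, assuming a command is a dB change or `None` when no command was issued; the name `adjustment_module` is hypothetical.

```python
from typing import Optional


def adjustment_module(
    signal: list[float], category: Optional[str], change_db: Optional[float]
) -> list[float]:
    """Apply a dB change, or pass the signal through untouched.

    A signal whose category is unknown (None), or for which no command was
    issued, is forwarded to the output module unchanged.
    """
    if category is None or change_db is None:
        return list(signal)
    gain = 10 ** (change_db / 20)
    return [gain * s for s in signal]
```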

In some embodiments, the attribute calculation module 140 presets a priority for each audio category and generates the adjustment commands according to these priorities. For example, the primary audio signal has the highest priority, so its volume, loudness, and other audio attributes are preferentially raised to the largest value among all the audio signals.
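One possible way to turn category priorities into adjustment commands is to give each priority level a fixed relative gain, so the highest level is untouched and each lower level is attenuated a little more. This is a sketch of a swapped-in rule, not the method stated in the text; the 3 dB step and the name `gains_from_priority` are hypothetical.

```python
from typing import Optional


def gains_from_priority(
    categories: dict[str, Optional[str]],
    priority: dict[str, int],
    step_db: float = -3.0,
) -> dict[str, float]:
    """Map each channel to a gain in dB from its category's priority level.

    Level 1 (highest, e.g. the primary category) gets 0 dB and each lower
    level is attenuated by a further `step_db`. Channels whose category is
    unknown or has no priority get no entry, so they pass through unchanged.
    """
    return {
        cid: (priority[cat] - 1) * step_db
        for cid, cat in categories.items()
        if cat in priority
    }
```

A relative scheme like this keeps the primary signal on top only if the inputs start at comparable levels; an absolute rule that lifts the primary to the loudest signal's level is the alternative.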

In this way, users can freely set the primary audio category according to actual needs, strengthening audio attributes such as the volume and loudness of the signals to be emphasized and weakening the signals that cause interference. This makes the main subject of the overall output easier to recognize.

Refer to Figure 2, which is a flowchart of an intelligent audio adjustment and mixing method 200 according to some embodiments of this application. The method 200 may be performed by the intelligent audio adjustment and mixing system 100 of Figure 1.

In step S210, a primary audio category is preset. In some embodiments, there are multiple audio categories (for example human voice, instrument sound, and noise), and one of them is set as the primary audio category. The user can choose the category to be emphasized according to the actual application; for example, human voice may be preset as the primary audio category.

In step S220, a plurality of audio signals are received. For example, the audio signals may come from an audio capture module or a multimedia file; this application does not limit the source of the audio signals.

In step S230, the audio category of each audio signal is identified. For example, the category identification module 120a identifies the signal from the audio signal receiving module 110a as instrument sound or music; module 120b identifies the signal from 110b as human voice; module 120c identifies the signal from 110c as ambient background sound; and module 120d identifies the signal from 110d as noise.

In step S240, a primary audio signal whose audio category matches the primary audio category is determined from the audio signals. Because the audio category of each signal was identified in step S230, each category can be checked against the primary audio category. In this embodiment, the signal from the audio signal receiving module 110b is human voice and is therefore the primary audio signal; the other audio signals may be determined to be secondary audio signals.

In step S250, it is determined whether the primary audio signal is interfered with by the other audio signals. In some embodiments, the attribute calculation module 140 detects how much the primary audio signal is interfered with from the volume and/or loudness of all the received audio signals. For example, if the volume and/or loudness of a secondary audio signal is close to that of the primary audio signal, the primary signal may be obscured in the overall mixed output, for instance by the masking effect, so that the user cannot clearly hear its content. If the primary audio signal is determined to be interfered with, step S260 is performed.
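A simple level-based stand-in for the interference check in step S250 compares RMS levels in dBFS: the primary counts as interfered with if any secondary signal comes within a margin of it. This is a proxy, not a psychoacoustic masking model; the 6 dB margin and the names `rms_dbfs` and `is_interfered` are hypothetical.

```python
import math


def rms_dbfs(signal: list[float]) -> float:
    """RMS level in dB relative to full scale 1.0; silence maps to -inf."""
    if not signal:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in signal) / len(signal))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def is_interfered(
    primary: list[float], secondaries: list[list[float]], margin_db: float = 6.0
) -> bool:
    """True if any secondary is within margin_db of (or louder than) the primary."""
    p = rms_dbfs(primary)
    return any(rms_dbfs(s) > p - margin_db for s in secondaries)
```

A primary at 0.5 peak amplitude (about -6 dBFS RMS for this square-like test signal) is flagged when a secondary sits at about -8 dBFS, but not when the secondary is near -20 dBFS.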

In step S260, the adjustment commands are used to adjust an audio attribute of each audio signal and of the primary audio signal. For example, the volume and/or loudness of the primary audio signal is increased, and/or the volume and/or loudness of the secondary audio signals is decreased.
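Step S260 can be sketched as two fixed gains: one for primary signals and one for all other known signals. The +3 dB and -6 dB defaults and the name `apply_adjustments` are hypothetical; setting `boost_db=0` or `cut_db=0` gives the "and/or" variants that adjust only one side. The input is assumed to hold only signals with an identified category, since unknown signals bypass adjustment.

```python
def apply_adjustments(
    signals: dict[str, list[float]],
    primary_ids: set[str],
    boost_db: float = 3.0,
    cut_db: float = -6.0,
) -> dict[str, list[float]]:
    """Raise the primary signals by boost_db and lower all others by cut_db.

    `signals` is assumed to contain only channels with a known category.
    """
    adjusted = {}
    for cid, sig in signals.items():
        gain = 10 ** ((boost_db if cid in primary_ids else cut_db) / 20)
        adjusted[cid] = [gain * s for s in sig]
    return adjusted
```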

In step S270, the audio signals are output to an audio output module. For example, the signals whose audio attributes have been adjusted are sent to the audio output module to be mixed and output.

In some embodiments, if the primary audio signal is determined not to be interfered with by the other audio signals, the method proceeds directly to step S270. In other words, when the primary audio signal is not interfered with, its audio attributes need not be adjusted and the mix can be output directly.
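Putting steps S210 to S270 together, one possible end-to-end sketch classifies every signal, splits primary from secondary, adjusts only when interference is detected, and always mixes, with unknown-category signals passed through untouched. The classifier is injected as a function; the margin and gain defaults, the RMS-based interference test, and the name `method_200` are assumptions for illustration, not values from the text.

```python
import math
from typing import Callable, Optional

Signal = list[float]


def rms_dbfs(sig: Signal) -> float:
    """RMS level in dB relative to full scale 1.0; silence gives -inf."""
    rms = math.sqrt(sum(s * s for s in sig) / len(sig)) if sig else 0.0
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def scale(sig: Signal, db: float) -> Signal:
    gain = 10 ** (db / 20)
    return [gain * s for s in sig]


def method_200(
    signals: dict[str, Signal],                  # S220: received signals
    classify: Callable[[Signal], Optional[str]],
    primary_category: str = "voice",             # S210: preset category
    margin_db: float = 6.0,
    boost_db: float = 3.0,
    cut_db: float = -6.0,
) -> Signal:
    # S230: identify each signal's category (None = unknown).
    cats = {cid: classify(sig) for cid, sig in signals.items()}
    # S240: split known signals into primary and secondary.
    primary = [c for c, cat in cats.items() if cat == primary_category]
    secondary = [c for c, cat in cats.items()
                 if cat is not None and cat != primary_category]
    out = dict(signals)  # unknown-category signals stay as received
    # S250: interfered if a secondary is within margin_db of the quietest primary.
    if primary and secondary:
        p_level = min(rms_dbfs(signals[c]) for c in primary)
        if any(rms_dbfs(signals[c]) > p_level - margin_db for c in secondary):
            # S260: boost primary signals and cut secondary signals.
            for c in primary:
                out[c] = scale(signals[c], boost_db)
            for c in secondary:
                out[c] = scale(signals[c], cut_db)
    # S270: mix everything (adjusted or not) by sample-wise summation.
    return [sum(frame) for frame in zip(*out.values())]
```

When the check in S250 finds no interference, the function skips S260 and mixes the signals as received, mirroring the direct jump to S270 described above.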

In some embodiments, if the audio category of an audio signal cannot be identified in step S230, that signal of unknown category is output directly to the audio output module 150.

In some embodiments, the method 200 presets a priority for each audio category and generates adjustment commands for the audio attributes according to these priorities. For example, the audio category of the primary audio signal has the highest priority, so the volume, loudness, and other audio attributes of the primary audio signal are enhanced first, for example by raising them to the largest value among all the audio signals, so that the primary signal stands out from the others.
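"Raising the primary signal to the largest value among all the audio signals" can be read, for the volume attribute, as matching the primary's RMS level to the loudest other signal. A minimal sketch under that reading; using RMS as the level measure and the name `gain_to_top` are assumptions.

```python
import math


def rms(sig: list[float]) -> float:
    return math.sqrt(sum(s * s for s in sig) / len(sig)) if sig else 0.0


def gain_to_top(primary: list[float], others: list[list[float]]) -> float:
    """dB gain that lifts the primary's RMS level to the loudest other signal.

    Returns 0.0 when the primary is already the loudest (or is silent, since
    no finite gain can raise silence).
    """
    p = rms(primary)
    top = max((rms(o) for o in others), default=0.0)
    if p == 0.0 or top <= p:
        return 0.0
    return 20 * math.log10(top / p)
```

For a primary at half the RMS of the loudest other signal, the returned gain is about 6.02 dB.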

The intelligent audio adjustment method is not limited to audio mixing. For example, after the intelligent audio adjustment described in the above embodiments, all (or some) of the adjusted audio signals and all (or some) of the unadjusted audio signals may be output separately, for use in multi-channel sound reinforcement, multi-channel output, or any other application connected to an audio output; no limitation is imposed here.
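For the separate-output variant, the signals are routed to distinct output channels instead of being summed. A minimal sketch, assuming equal-length per-channel sample lists and an interleaved frame layout (one tuple of samples per time step); the channel order and the name `to_multichannel` are hypothetical.

```python
def to_multichannel(
    signals: dict[str, list[float]], order: list[str]
) -> list[tuple[float, ...]]:
    """Interleave per-channel signals into frames instead of mixing them.

    Each returned tuple holds one sample per output channel, in `order`.
    """
    channels = [signals[cid] for cid in order]
    return list(zip(*channels))
```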

In summary, whereas conventional audio adjustment methods designate a specific piece of hardware or a specific channel as the primary audio source, the intelligent audio adjustment and mixing system and method of this application set a primary audio category appropriate to each scenario and identify the primary audio source by determining the category of each audio source, making the primary source easier to hear. The primary audio source can thus change dynamically as the primary audio category changes, and its audio attributes are adjusted accordingly, so that the emphasized audio category is switched automatically between sources.

The foregoing outlines the features of several embodiments so that those skilled in the art may better understand the aspects of this application. Those skilled in the art should appreciate that they may readily use the foregoing as a basis for designing or modifying other variations that carry out the same purposes and/or achieve the same advantages as the embodiments introduced herein, without departing from the spirit and scope of this application. The foregoing is illustrative of this application, and its scope of protection is defined by the claims.

100: Intelligent audio adjustment and mixing system; 110a-110d: Audio signal receiving modules; 120a-120d: Category identification modules; 130a-130d: Adjustment modules; 140: Attribute calculation module; 150: Audio output module; 200: Intelligent audio adjustment and mixing method; S210-S270: Steps

The aspects of this disclosure are best understood from the following detailed description when read with the accompanying drawings. Note that, in accordance with practical needs of illustration, the features in the drawings are not necessarily drawn to scale; their dimensions may be arbitrarily increased or reduced for clarity of discussion. Figure 1 is a schematic diagram of an intelligent audio adjustment and mixing system according to some embodiments of this application. Figure 2 is a flowchart of an intelligent audio adjustment and mixing method according to some embodiments of this application.

Domestic deposit information: None. Foreign deposit information: None.


Claims

An intelligent audio adjustment and mixing system, adapted to process a plurality of audio signals, includes the following modules:

- A category identification module configured to identify each of the audio signals to obtain a corresponding audio category.
- An attribute calculation module coupled to the category identification module. The attribute calculation module determines, from the audio signals, a primary audio signal whose audio category conforms to a primary audio category.
- An adjustment module coupled to the attribute calculation module. The adjustment module adjusts an audio attribute of each of the audio signals and/or of the primary audio signal respectively, and then outputs a mixed audio signal.

The attribute calculation module is configured to set priorities for the audio categories, and the audio category of the primary audio signal has the highest priority. The attribute calculation module generates an adjustment command according to the priorities of the audio categories and sends the adjustment command back to the adjustment module. The adjustment module uses the adjustment command to process the audio attributes of the audio signals respectively, so as to remix and output the audio signals.

The intelligent audio adjustment and mixing system as described in claim 1, wherein the audio attributes include a volume and/or a loudness of each audio signal, and the audio categories of the audio signals include a voice, an instrument sound, and/or noise.

The intelligent audio adjustment and mixing system as described in claim 2, wherein the primary audio category has the highest priority, such that the audio attribute of the primary audio signal is preferentially enhanced.

The intelligent audio adjustment and mixing system as described in claim 1, wherein the attribute calculation module is configured to preset the primary audio category, so as to determine, from the audio signals, the primary audio signal whose audio category matches the primary audio category.

An intelligent audio adjustment and mixing method includes the following steps:

- Setting priorities for audio categories.
- Receiving a plurality of audio signals.
- Identifying each of the audio signals to obtain a corresponding audio category.
- Determining, from the audio signals, a primary audio signal whose audio category matches a primary audio category. The audio category of the primary audio signal has the highest priority.
- Adjusting an audio attribute of each of the audio signals and/or of the primary audio signal respectively, and then mixing and outputting the audio signals.
- Generating an adjustment instruction according to the priorities of the audio categories.
- Using the adjustment instruction to process the audio attributes of the audio signals respectively, so as to remix and output the audio signals.

The intelligent audio adjustment and mixing method as described in claim 5, further includes: presetting the primary audio category.

The intelligent audio adjustment and mixing method as described in claim 5, wherein the audio attributes include a volume and/or a loudness of each audio signal, and the audio categories of the audio signals include a voice, an instrument sound, and/or noise.

The intelligent audio adjustment and mixing method as described in claim 5 further includes: generating the adjustment instruction according to the priority of the audio category, wherein the audio category of the primary audio signal has the highest priority.

The intelligent audio adjustment and mixing method as described in claim 8, wherein the primary audio category has the highest priority, such that the value of the audio attribute of the primary audio signal is preferentially enhanced.

An intelligent audio adjustment and mixing method includes the following steps:

- Setting priorities for audio categories.
- Receiving at least two audio signals.
- Identifying at least one of the at least two audio signals to obtain a corresponding audio category.
- Generating an adjustment instruction according to the priorities of the audio categories.
- Using the adjustment instruction to adjust an audio attribute of at least one of the at least two audio signals, and then mixing and outputting the audio signals.
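The claims leave several things open: the classifier, the priority values, and the concrete form of the adjustment instruction. The Python sketch below shows one possible way the claimed flow could fit together. It is a sketch under stated assumptions, not the patented implementation:

- Each signal is a list of float samples in [-1, 1].
- Category labels come from some upstream classifier (not shown). A signal that was not identified is `None`, as in claim 10.
- Priorities come from a hypothetical `PRIORITY` table.
- The "volume and/or loudness" attribute is approximated by RMS level in dBFS, not by a perceptual loudness measure.
- The adjustment instruction is a list of per-signal linear gains.

All names and constants are illustrative and do not come from the patent. This includes `PRIORITY`, `PRIMARY_TARGET_DBFS`, `DUCK_DB`, `select_primary`, `adjustment_gains`, and `mix`.

```python
import math
from typing import Dict, List, Optional

# Hypothetical priority table (higher = more important). The claims only
# require that the primary audio category has the highest priority.
PRIORITY: Dict[str, int] = {"voice": 3, "instrument": 2, "noise": 1}
PRIMARY_TARGET_DBFS = -20.0  # assumed RMS target for the primary signal
DUCK_DB = 12.0               # assumed margin that other signals sit below it


def rms_dbfs(samples: List[float]) -> float:
    """RMS level in dB relative to full scale (a simple volume proxy)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def gain_to(samples: List[float], target_dbfs: float) -> float:
    """Linear gain that moves the RMS level to target_dbfs (1.0 for silence)."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return 1.0
    return 10 ** ((target_dbfs - level) / 20)


def rank(category: Optional[str]) -> int:
    """Priority of a category; unidentified (None) or unknown ranks 0."""
    return PRIORITY.get(category, 0) if category is not None else 0


def select_primary(categories: List[Optional[str]],
                   preset: Optional[str] = None) -> int:
    """Index of the primary audio signal (assumes at least one signal).

    If a primary category is preset (claims 4 and 6) and a signal matches it,
    the first match wins. Otherwise the highest-priority category wins, with
    ties going to the earliest signal.
    """
    if preset is not None and preset in categories:
        return categories.index(preset)
    return max(range(len(categories)), key=lambda i: rank(categories[i]))


def adjustment_gains(signals: List[List[float]],
                     categories: List[Optional[str]],
                     preset: Optional[str] = None) -> List[float]:
    """Adjustment instruction: one linear gain per signal."""
    primary = select_primary(categories, preset)
    gains = []
    for i, s in enumerate(signals):
        if i == primary:
            # The primary signal is brought to the target level (boost or cut).
            gains.append(gain_to(s, PRIMARY_TARGET_DBFS))
        else:
            # Other signals are only attenuated, never boosted, so they end
            # up at least DUCK_DB below the primary target.
            gains.append(min(1.0, gain_to(s, PRIMARY_TARGET_DBFS - DUCK_DB)))
    return gains


def mix(signals: List[List[float]], gains: List[float]) -> List[float]:
    """Apply gains, sum sample by sample, and scale down if the mix would clip."""
    n = min(len(s) for s in signals)
    out = [sum(g * s[k] for s, g in zip(signals, gains)) for k in range(n)]
    peak = max((abs(x) for x in out), default=0.0)
    return [x / peak for x in out] if peak > 1.0 else out
```

With these assumed constants, consider two inputs:

- A speech channel at about -6 dBFS RMS receives a gain of 0.2, which brings it to -20 dBFS.
- An instrument channel at -20 dBFS receives a 12 dB cut.

A quiet noise channel is left at unity gain rather than boosted. This is why the non-primary gains are capped at 1.0.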