TWI899919B - Method for creation of linearly interpolated head related transfer functions - Google Patents
Method for creation of linearly interpolated head related transfer functions
- Publication number: TWI899919B (TW113111623A)
- Authority: TW (Taiwan)
- Prior art keywords: ear, HRTFs, audio data, delay, audio
- Landscapes: Stereophonic System (AREA)
Description
本發明係關於自原始頭部相關傳輸函數(HRTF)建立修改之HRTF。 The present invention relates to creating a modified head-related transfer function (HRTF) from an original HRTF.
除非本文中另有指示,否則此章節中所描述之方法不是此申請案中之申請專利範圍之先前技術且不藉由包含於此章節中而被認為係先前技術。 Unless otherwise indicated herein, the methods described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
雙耳音訊信號包括意欲用於透過兩個(左側及右側)各自耳朵向一收聽者回放之兩個音訊聲道。雙耳回放可經由靠近各耳朵放置之揚聲器或透過耳機(包含耳罩式及入耳式耳機)達成。 A binaural audio signal consists of two audio channels intended for playback to a listener through each of their two ears (left and right). Binaural playback can be achieved via speakers placed close to each ear or via headphones (including over-ear and in-ear headphones).
雙耳信號可藉由使用一對頭部相關傳輸函數(HRTF)濾波器回應處理一源音訊信號來產生。HRTF回應可依諸多方式界定,包含作為時域脈衝回應或作為頻域回應。HRTF回應通常成對分組以針對各耳朵傳感器提供一回應。 Binaural signals can be generated by processing a source audio signal using a pair of head-related transfer function (HRTF) filter responses. HRTF responses can be defined in a variety of ways, including as time-domain impulse responses or as frequency-domain responses. HRTF responses are typically grouped in pairs to provide a response for each ear sensor.
當用於處理一音訊信號時,一HRTF濾波器對可用於向一收聽者提供模擬將在音訊信號自一特定到達方向呈現時會出現之聲音(在各耳朵處)之一體驗。不同HRTF濾波器對將產生不同聲源方向之錯覺。 When used to process an audio signal, an HRTF filter pair can be used to provide a listener with an experience that simulates the sound (at each ear) that would occur if the audio signal were presented from a specific arrival direction. Different HRTF filter pairs will produce the illusion of different sound source directions.
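The paragraph above can be illustrated with a minimal sketch, assuming the HRTF pair is available as two short time-domain impulse responses. The filter values and the names `convolve` and `render_binaural` are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def render_binaural(source, hrtf_left, hrtf_right):
    """Filter one source signal with an HRTF pair to get the two ear signals."""
    return convolve(source, hrtf_left), convolve(source, hrtf_right)


# Toy (made-up) HRTF pair: the right-ear response is delayed by two
# samples and attenuated relative to the left-ear response.
source = [1.0, 0.5]
h_left = [0.9, 0.1]
h_right = [0.0, 0.0, 0.5]

left, right = render_binaural(source, h_left, h_right)
```

In this toy case the right-ear signal starts two samples later and at lower level than the left-ear signal, which is the kind of interaural cue a different HRTF pair would change.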
與一特定到達方向相關聯之一對參考HRTF濾波器可藉由量測自位於相同方向上之某一距離處之一聲源至一收聽者之耳朵之各者之聲學傳輸函數來判定。替代地,參考HRTF濾波器可藉由其他方式來判定,包含數值模擬或一人體模型之聲學量測。 A pair of reference HRTF filters associated with a particular arrival direction can be determined by measuring the acoustic transfer functions from a sound source located at some distance in that direction to each of a listener's ears. Alternatively, the reference HRTF filters can be determined by other means, including numerical simulation or acoustic measurement of a mannequin.
一對修改之HRTF濾波器可不同於一對聲學量測之HRTF濾波器,同時仍向一收聽者提供來自相同方向之一聲音之所要印象。特定而言,左側及右側修改之HRTF濾波器之高頻部分之間的相位差可實質上不同於左側及右側參考HRTF濾波器之高頻部分之間的相位差,所感知的收聽者體驗沒有顯著損失。此係可行的,因為一高頻範圍中之耳間相位差相對於一收聽者之感知而言很大程度上不重要。 A pair of modified HRTF filters can differ from a pair of acoustically measured HRTF filters while still providing a listener with the desired impression of a sound originating from the same direction. Specifically, the phase difference between the high-frequency portions of the left and right modified HRTF filters can be substantially different from the phase difference between the high-frequency portions of the left and right reference HRTF filters, without a significant loss in perceived listener experience. This is possible because interaural phase differences in the high-frequency range are largely insignificant to a listener's perception.
一HRTF集合函數係鑑於一到達方向來判定左耳及右耳HRTF濾波器之一函數:$(h_l(t), h_r(t)) \leftarrow H(x, y, z)$ (1) An HRTF set function is a function that determines the left-ear and right-ear HRTF filters for a given direction of arrival: $(h_l(t), h_r(t)) \leftarrow H(x, y, z)$ (1)
在方程式1中,HRTF集合函數H(x,y,z)具有呈一3D單位向量(x,y,z)之形式之一到達方向,且函數返回一對左/右耳HRTF濾波器。 In Equation 1, the HRTF set function $H(x, y, z)$ takes as input a direction of arrival in the form of a 3D unit vector $(x, y, z)$, and returns a pair of left-ear and right-ear HRTF filters.
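One simple way to realize an HRTF set function of the form in Equation 1 is a lookup over a table of measured directions. The sketch below assumes a nearest-direction lookup, which is an illustrative choice and not the method described in this document; the table contents, the axis assignment, and the name `make_hrtf_set` are hypothetical.

```python
import math


def make_hrtf_set(table):
    """Build H(x, y, z) from a table of (unit_direction, (h_left, h_right)).

    Assumption: the nearest stored direction (largest dot product with the
    requested direction) is returned; no interpolation is performed.
    """
    def H(x, y, z):
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            raise ValueError("direction of arrival must be non-zero")
        ux, uy, uz = x / norm, y / norm, z / norm
        best = max(
            table,
            key=lambda entry: entry[0][0] * ux + entry[0][1] * uy + entry[0][2] * uz,
        )
        return best[1]

    return H


# Hypothetical three-entry table; the filter taps are made up.
table = [
    ((1.0, 0.0, 0.0), ([1.0, 0.0], [1.0, 0.0])),
    ((0.0, 1.0, 0.0), ([0.9, 0.1], [0.0, 0.5])),
    ((0.0, 0.0, 1.0), ([0.7, 0.2], [0.7, 0.2])),
]
H = make_hrtf_set(table)
h_l, h_r = H(0.9, 0.1, 0.0)  # closest stored direction is (1, 0, 0)
```

A nearest-direction lookup needs the whole table at run time, which is part of the motivation for the more compact representations discussed later in the document.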
相對於此等及其他考慮,呈現本文中所製作之本發明。 It is with respect to these and other considerations that the present invention made herein is presented.
描述用於處理音訊信號之技術。本文中所描述之各種實例提供用於建立及使用具有替代之高頻相位回應之修改之HRTF濾波器之系統、方法及/或裝置。 Techniques for processing audio signals are described. Various examples described herein provide systems, methods, and/or devices for creating and using modified HRTF filters with alternative high-frequency phase responses.
根據一些實例性實施例,一種用於包含一或多個處理器之一控制系統之音訊處理方法可涉及:由該控制系統獲得一第一組頭部相關傳輸函數(HRTF);及由該控制系統將該第一組HRTF變換為一第二組 HRTF。在一些實例性實施例中,該變換可涉及用該第二組HRTF中之全通濾波器替換該第一組HRTF之延遲分量。在一些實例性實施例中,該變換可涉及調整該第二組HRTF中之該等全通濾波器之各者之一相位回應使得:針對低於該對應全通濾波器之一相關聯臨限頻率之頻率,各耳間相位回應實質上係線性的,且針對高於該對應全通濾波器之該相關聯臨限頻率之頻率,各耳間相位具有減小之耳間相位差。 According to some exemplary embodiments, an audio processing method for a control system comprising one or more processors may involve: obtaining, by the control system, a first set of head-related transfer functions (HRTFs); and transforming, by the control system, the first set of HRTFs into a second set of HRTFs. In some exemplary embodiments, the transformation may involve replacing a delayed component of the first set of HRTFs with an all-pass filter in the second set of HRTFs. In some exemplary embodiments, the transformation may involve adjusting a phase response of each of the all-pass filters in the second set of HRTFs such that: for frequencies below a threshold frequency associated with the corresponding all-pass filter, each interaural phase response is substantially linear, and for frequencies above the threshold frequency associated with the corresponding all-pass filter, each interaural phase has a reduced interaural phase difference.
在一些實例性實施例中,該方法可涉及輸出該第二組HRTF。根據一些實例性實施例,輸出該第二組HRTF可涉及儲存該第二組HRTF,將該第二組HRTF傳送至經組態以處理音訊資料之一裝置,提供該第二組HRTF以供進一步處理,或其等之組合。 In some example embodiments, the method may involve outputting the second set of HRTFs. According to some example embodiments, outputting the second set of HRTFs may involve storing the second set of HRTFs, transmitting the second set of HRTFs to a device configured to process audio data, providing the second set of HRTFs for further processing, or a combination thereof.
根據一些實例性實施例,該方法可涉及:由該控制系統基於該第二組HRTF來界定一組基本濾波器。該組基本濾波器可具有少於該第二組HRTF之構件。在一些實例性實施例中,該方法可涉及:由該控制系統獲得呈一輸入音訊格式之輸入音訊資料之一位元流;及由該控制系統組合該輸入音訊資料與該組基本濾波器之一或多個基本濾波器以產生左音訊資料及右音訊資料。 According to some example embodiments, the method may involve: defining, by the control system, a set of basic filters based on the second set of HRTFs. The set of basic filters may have fewer components than the second set of HRTFs. In some example embodiments, the method may involve: obtaining, by the control system, a bit stream of input audio data in an input audio format; and combining, by the control system, the input audio data with one or more basic filters of the set of basic filters to generate left audio data and right audio data.
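Combining input audio with a small set of basic filters can be sketched as a weighted sum, assuming each HRTF is approximated as a linear blend of the basic filters. The weights, filter taps, and function names below are hypothetical; how the weights and basic filters are derived is not shown here.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def blend_filters(weights, basis):
    """Weighted sum of equal-length basic filters: one approximated HRTF."""
    return [sum(w * b[i] for w, b in zip(weights, basis))
            for i in range(len(basis[0]))]


def render_with_basis(signal, weights, basis):
    """Filter the signal with each basic filter, then mix with the weights."""
    branches = [convolve(signal, b) for b in basis]
    return [sum(w * br[i] for w, br in zip(weights, branches))
            for i in range(len(branches[0]))]


# Made-up values for one ear and one direction of arrival.
basis = [[1.0, 0.5, 0.0], [0.0, 1.0, -0.5]]
weights = [0.75, 0.25]
x = [1.0, 2.0]

via_branches = render_with_basis(x, weights, basis)
via_blend = convolve(x, blend_filters(weights, basis))
```

Because convolution is linear, both routes give the same output, so the filtering cost scales with the number of basic filters rather than with the number of HRTFs in the library.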
在一些實例性實施例中,該方法可涉及:由該控制器輸出該左音訊資料及該右音訊資料。根據一些實例性實施例,輸出該左音訊資料及該右音訊資料可涉及儲存該左音訊資料及該右音訊資料,傳送該左音訊資料及該右音訊資料,由該控制系統將該左音訊資料及該右音訊資料提供至一組揚聲器以供回放,提供該左音訊資料及該右音訊資料以供進一步處理,或其等之組合。 In some example embodiments, the method may involve: outputting, by the controller, the left audio data and the right audio data. According to some example embodiments, outputting the left audio data and the right audio data may involve storing the left audio data and the right audio data, transmitting the left audio data and the right audio data, providing, by the control system, the left audio data and the right audio data to a set of speakers for playback, providing the left audio data and the right audio data for further processing, or a combination thereof.
根據一些實例性實施例,該變換亦可涉及:自該第一組HRTF獲得左耳HRTF及右耳HRTF;自該等左耳HRTF之各者識別一左耳非延遲脈衝回應及一左耳延遲;自該等右耳HRTF之各者識別一右耳非延遲脈衝回應及一右耳延遲。在一些實例性實施例中,該變換亦可涉及:產生左耳全通濾波器,該等左耳全通濾波器之各者至少部分基於該等左耳延遲之一例項。根據一些實例性實施例,該變換亦可涉及:產生右耳全通濾波器,該等右耳全通濾波器之各者至少部分基於該等右耳延遲之一例項。在一些實例性實施例中,該變換亦可涉及:將該左耳及右耳非延遲脈衝回應之例項與該左耳及右耳全通濾波器之對應例項組合以產生該第二組HRTF之HRTF對。 According to some exemplary embodiments, the transformation may also involve: obtaining left-ear HRTFs and right-ear HRTFs from the first set of HRTFs; identifying a left-ear non-delayed impulse response and a left-ear delay from each of the left-ear HRTFs; and identifying a right-ear non-delayed impulse response and a right-ear delay from each of the right-ear HRTFs. In some exemplary embodiments, the transformation may also involve generating left-ear all-pass filters, each of the left-ear all-pass filters being based at least in part on an instance of the left-ear delays. According to some exemplary embodiments, the transformation may also involve generating right-ear all-pass filters, each of the right-ear all-pass filters being based at least in part on an instance of the right-ear delays. In some exemplary embodiments, the transformation may also involve combining instances of the left-ear and right-ear non-delayed impulse responses with corresponding instances of the left-ear and right-ear all-pass filters to generate HRTF pairs of the second set of HRTFs.
在一些實例性實施例中,該方法亦可涉及:基於提取之左耳延遲及右耳延遲之一或多者來產生修改之左耳延遲值及右耳延遲值。該等左耳全通濾波器及右耳全通濾波器可基於該等修改之左耳延遲值及該等修改之右耳延遲值。 In some exemplary embodiments, the method may also involve generating modified left-ear delay values and right-ear delay values based on one or more of the extracted left-ear delay and right-ear delay. The left-ear all-pass filters and the right-ear all-pass filters may be based on the modified left-ear delay values and the modified right-ear delay values.
根據一些實例性實施例,產生該等修改之左耳延遲值及右耳延遲值之例項可涉及判定一提取之左耳延遲與一提取之右耳延遲之間的一差。在一些此等實例性實施例中,產生該等修改之左耳延遲值及右耳延遲值之例項可涉及判定一提取之左耳延遲與一提取之右耳延遲之間的一最大預期差。根據一些實例性實施例,一提取之左耳延遲與一提取之右耳延遲之間的一差可等於一對應修改之左耳延遲值與一修改之右耳延遲值之間的一差。 According to some example embodiments, generating the modified left-ear delay values and right-ear delay values may involve determining a difference between an extracted left-ear delay and an extracted right-ear delay. In some such example embodiments, generating the modified left-ear delay values and right-ear delay values may involve determining a maximum expected difference between an extracted left-ear delay and an extracted right-ear delay. According to some example embodiments, a difference between an extracted left-ear delay and an extracted right-ear delay may be equal to a difference between a corresponding modified left-ear delay value and a modified right-ear delay value.
在一些實例性實施例中,該等修改之左耳延遲值及該等修改之右耳延遲值可對應於平滑函數。根據一些實例性實施例,該等修改之左耳延遲值及該等修改之右耳延遲值之各對可包含一較低延遲值及一較高延遲值。在一些實例中,該較低延遲值可具有小於該較高延遲值之延遲變動。在一些實例性實施例中,該等非延遲脈衝回應可為最小相位濾波器回應。 In some exemplary embodiments, the modified left-ear delay values and the modified right-ear delay values may correspond to smooth functions. According to some exemplary embodiments, each pair of a modified left-ear delay value and a modified right-ear delay value may include a lower delay value and a higher delay value. In some examples, the lower delay value may have less delay variation than the higher delay value. In some exemplary embodiments, the non-delayed impulse responses may be minimum-phase filter responses.
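The delay modification described above can be sketched as follows, assuming one particular rule: the lower delay of each pair is pinned to a fixed base value, and the higher delay carries the whole interaural difference. This rule, the parameter `base_delay`, and the function name are assumptions made for illustration; the document only requires that the left/right difference be preserved in some embodiments.

```python
def modify_delays(delay_left, delay_right, base_delay):
    """Return (new_left, new_right) with the same left-minus-right difference.

    Assumption: the lower delay of the pair is set to base_delay, so it does
    not vary with direction, and the higher delay is base_delay plus the
    magnitude of the interaural difference.
    """
    difference = delay_left - delay_right
    if difference >= 0:
        # The right ear has the lower delay.
        return base_delay + difference, base_delay
    # The left ear has the lower delay.
    return base_delay, base_delay - difference


# Delays in samples (made-up values).
new_left, new_right = modify_delays(30.0, 12.0, base_delay=5.0)
```

Under this rule the lower delay has no variation at all across directions, which is an extreme case of the lower delay value varying less than the higher one.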
根據一些實例性實施例,自該等左耳及右耳HRTF之各者提取各左耳非延遲脈衝回應、各右耳非延遲脈衝回應、各左耳延遲及各右耳延遲可涉及:判定該第一組HRTF之一原始HRTF濾波器之一頻率回應;判定該原始HRTF濾波器之一振幅回應;及判定一新的非延遲最小相位濾波器之一最小相位頻率回應。在一些此等實例性實施例中,自該等左耳及右耳HRTF之各者提取各左耳非延遲脈衝回應、各右耳非延遲脈衝回應、各左耳延遲及各右耳延遲可涉及:判定該原始HRTF濾波器之一相位回應及該新的非延遲最小相位濾波器之一相位回應;及至少部分基於該原始HRTF濾波器之該相位回應及該新的非延遲最小相位濾波器之該相位回應來判定與該原始HRTF濾波器相關聯之一延遲。在一些此等實例性實施例中,判定該最小相位頻率回應可涉及實施涉及該原始HRTF濾波器之該振幅回應之一希爾伯特變換(Hilbert transform)。根據一些實例性實施例,判定與該原始HRTF濾波器相關聯之該延遲亦可至少部分基於在300Hz至1600Hz之一範圍內之一延遲量測頻率。 According to some exemplary embodiments, extracting each left-ear non-delayed impulse response, each right-ear non-delayed impulse response, each left-ear delay, and each right-ear delay from each of the left-ear and right-ear HRTFs may involve: determining a frequency response of an original HRTF filter of the first set of HRTFs; determining an amplitude response of the original HRTF filter; and determining a minimum-phase frequency response of a new non-delayed minimum-phase filter. In some such exemplary embodiments, the extraction may also involve: determining a phase response of the original HRTF filter and a phase response of the new non-delayed minimum-phase filter; and determining a delay associated with the original HRTF filter based at least in part on the phase response of the original HRTF filter and the phase response of the new non-delayed minimum-phase filter. In some such exemplary embodiments, determining the minimum-phase frequency response may involve performing a Hilbert transform involving the amplitude response of the original HRTF filter. According to some exemplary embodiments, determining the delay associated with the original HRTF filter may also be based at least in part on a delay measurement frequency in a range of 300 Hz to 1600 Hz.
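The extraction steps above can be sketched with the standard real-cepstrum (homomorphic) construction, which is one discrete way of applying the Hilbert-transform relationship between log-magnitude and minimum phase. Treating this as the intended computation is an assumption, as are the 48 kHz sample rate, the 32-tap toy filter, and the function names; the naive DFT is used only to keep the sketch free of third-party libraries.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse includes the 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def split_min_phase_and_delay(h, sample_rate, measure_hz):
    """Return (minimum-phase impulse response, delay in samples).

    Assumes len(h) is even, the magnitude response has no zeros, and the
    measurement frequency maps to a DFT bin index of at least 1.
    """
    n = len(h)
    spectrum = dft(h)                                   # frequency response
    log_mag = [math.log(abs(v)) for v in spectrum]      # amplitude response
    cepstrum = [v.real for v in dft(log_mag, inverse=True)]
    # Fold the real cepstrum onto non-negative quefrencies.
    folded = [0.0] * n
    folded[0] = cepstrum[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cepstrum[i]
    folded[n // 2] = cepstrum[n // 2]
    min_spectrum = [cmath.exp(v) for v in dft(folded)]  # minimum-phase response
    h_min = [v.real for v in dft(min_spectrum, inverse=True)]
    # Excess phase at the measurement frequency, read as a pure delay.
    k = round(measure_hz * n / sample_rate)
    omega = 2.0 * math.pi * k / n
    excess_phase = cmath.phase(spectrum[k] / min_spectrum[k])
    return h_min, -excess_phase / omega


# Toy filter: the minimum-phase response [1, 0.5] delayed by 3 samples.
h = [0.0] * 32
h[3], h[4] = 1.0, 0.5
h_min, delay = split_min_phase_and_delay(h, sample_rate=48000, measure_hz=1500)
```

With these toy values the 1500 Hz measurement frequency falls on the first DFT bin, inside the 300 Hz to 1600 Hz range mentioned above, where the excess phase has not yet wrapped and can be read directly as a delay.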
在一些實例性實施例中,該組基本濾波器可具有比該第二組HRTF少至少一數量級之構件。根據一些實例性實施例,針對高於該臨限頻率之頻率,一全通相位回應可偏離一線性斜坡相位回應且可平滑接近零相位。 In some exemplary embodiments, the set of basic filters may have at least an order of magnitude fewer components than the second set of HRTFs. According to some exemplary embodiments, for frequencies above the threshold frequency, an all-pass phase response may deviate from a linear-ramp phase response and may smoothly approach zero phase.
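An all-pass response whose phase follows a linear ramp below a threshold and then returns to zero can be sketched as below. The raised-cosine taper, the 1500 Hz threshold, the 6000 Hz zero-phase frequency, and the function names are all assumptions chosen for illustration; the document does not specify the shape of the transition in this section.

```python
import cmath
import math


def allpass_phase(freq_hz, delay_s, threshold_hz, zero_hz):
    """Phase in radians: pure-delay ramp up to threshold_hz, zero from zero_hz.

    Assumption: between the two frequencies the phase is scaled by a
    raised-cosine taper, which is continuous but only one possible shape.
    """
    if freq_hz <= threshold_hz:
        return -2.0 * math.pi * freq_hz * delay_s
    if freq_hz >= zero_hz:
        return 0.0
    phase_at_threshold = -2.0 * math.pi * threshold_hz * delay_s
    taper = 0.5 * (1.0 + math.cos(
        math.pi * (freq_hz - threshold_hz) / (zero_hz - threshold_hz)))
    return phase_at_threshold * taper


def allpass_response(freq_hz, delay_s, threshold_hz, zero_hz):
    """Complex all-pass frequency response: unit magnitude at every frequency."""
    return cmath.exp(1j * allpass_phase(freq_hz, delay_s, threshold_hz, zero_hz))


# Made-up delays of 0.5 ms (left) and 0.2 ms (right).
low = allpass_phase(1000.0, 0.0005, 1500.0, 6000.0)    # on the linear ramp
high_left = allpass_phase(8000.0, 0.0005, 1500.0, 6000.0)
high_right = allpass_phase(8000.0, 0.0002, 1500.0, 6000.0)
```

Below the threshold the left and right phases differ exactly as two pure delays would, while at 8000 Hz both phases are zero, so the interaural phase difference there is reduced to nothing in this sketch.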
根據一些實例性實施例,該控制系統可對應於沉浸式語音及音訊服務(IVAS)之一編解碼器之至少部分。 According to some example embodiments, the control system may correspond to at least a portion of a codec of an immersive voice and audio service (IVAS).
根據一些進一步實施例,一種非暫時性電腦可讀媒體可儲存由一或多個處理器執行時引起該一或多個處理器執行本文中所揭示之方法之任一者之操作之指令。 According to some further embodiments, a non-transitory computer-readable medium may store instructions that, when executed by one or more processors, cause the one or more processors to perform the operations of any of the methods disclosed herein.
根據一些額外實例性實施例,一種音訊處理器裝置可經組態以處理輸入音訊資料。在一些實例性實施例中,該音訊處理器裝置可包含經組態以接收該輸入音訊資料之一接收器單元及一電腦單元。根據一些實例性實施例,該電腦單元可經組態以擷取一第一組頭部相關傳輸函數(HRTF)且將該第一組HRTF變換為一第二組HRTF。在一些實例性實施例中,該變換可涉及用該第二組HRTF中之全通濾波器替換該第一組HRTF之延遲分量。在一些實例性實施例中,該變換可涉及調整該第二組HRTF中之該等全通濾波器之各者之一相位回應使得:針對低於該對應全通濾波器之一相關聯臨限頻率之頻率,各耳間相位回應實質上係線性的,且針對高於該對應全通濾波器之該相關聯臨限頻率之頻率,各耳間相位具有減小之耳間相位差。 According to some additional exemplary embodiments, an audio processor device may be configured to process input audio data. In some exemplary embodiments, the audio processor device may include a receiver unit configured to receive the input audio data and a computer unit. According to some exemplary embodiments, the computer unit may be configured to extract a first set of head-related transfer functions (HRTFs) and transform the first set of HRTFs into a second set of HRTFs. In some exemplary embodiments, the transformation may involve replacing a delayed component of the first set of HRTFs with an all-pass filter in the second set of HRTFs. In some exemplary embodiments, the transformation may involve adjusting a phase response of each of the all-pass filters in the second set of HRTFs such that: for frequencies below a threshold frequency associated with the corresponding all-pass filter, each interaural phase response is substantially linear, and for frequencies above the threshold frequency associated with the corresponding all-pass filter, each interaural phase has a reduced interaural phase difference.
在一些實例性實施例中,該電腦單元可經組態以輸出該第二組HRTF。根據一些實例性實施例,輸出該第二組HRTF可涉及儲存該第二組HRTF,將該第二組HRTF傳送至經組態以處理音訊資料之一裝置,提供該第二組HRTF以供進一步處理,或其等之組合。 In some example embodiments, the computer unit may be configured to output the second set of HRTFs. According to some example embodiments, outputting the second set of HRTFs may involve storing the second set of HRTFs, transmitting the second set of HRTFs to a device configured to process audio data, providing the second set of HRTFs for further processing, or a combination thereof.
根據一些實例性實施例,該電腦單元可經進一步組態以基於該第二組HRTF來界定一組基本濾波器。該組基本濾波器可具有少於該第二組HRTF之構件。在一些實例性實施例中,該電腦單元可經進一步組態以獲得呈一輸入音訊格式之輸入音訊資料之一位元流且組合該輸入音訊資料與該組基本濾波器之一或多個基本濾波器以產生左音訊資料及右音訊 資料。 According to some exemplary embodiments, the computer unit may be further configured to define a set of basic filters based on the second set of HRTFs. The set of basic filters may have fewer components than the second set of HRTFs. In some exemplary embodiments, the computer unit may be further configured to receive a bit stream of input audio data in an input audio format and combine the input audio data with one or more basic filters in the set of basic filters to generate left audio data and right audio data.
在一些實例性實施例中,該電腦單元可經進一步組態以輸出該左音訊資料及該右音訊資料。根據一些實例性實施例,輸出該左音訊資料及該右音訊資料可涉及儲存該左音訊資料及該右音訊資料,傳送該左音訊資料及該右音訊資料,由該控制系統將該左音訊資料及該右音訊資料提供至一組揚聲器以供回放,提供該左音訊資料及該右音訊資料以供進一步處理,或其等之組合。 In some exemplary embodiments, the computer unit may be further configured to output the left audio data and the right audio data. According to some exemplary embodiments, outputting the left audio data and the right audio data may involve storing the left audio data and the right audio data, transmitting the left audio data and the right audio data, providing the left audio data and the right audio data to a set of speakers for playback by the control system, providing the left audio data and the right audio data for further processing, or a combination thereof.
根據一些實例性實施例,該音訊處理器裝置可包含經組態以儲存該等第一HRTF、該等第二HRTF、該左音訊資料、該右音訊資料、該輸入音訊資料或其等之組合之一儲存裝置。在一些此等實例性實施例中,該儲存裝置可包含一隨機存取記憶體、一唯讀記憶體、一非暫時性電腦可讀媒體或其等之組合。 According to some exemplary embodiments, the audio processor device may include a storage device configured to store the first HRTFs, the second HRTFs, the left audio data, the right audio data, the input audio data, or a combination thereof. In some such exemplary embodiments, the storage device may include a random access memory, a read-only memory, a non-transitory computer-readable medium, or a combination thereof.
在一些實例性實施例中,該音訊處理器裝置可對應於沉浸式語音及音訊服務(IVAS)之一編解碼器之至少部分。 In some example embodiments, the audio processor device may correspond to at least a portion of a codec of an immersive voice and audio service (IVAS).
本文中所描述之實施例通常可描述為技術,其中術語「技術」可係指(若干)系統、(若干)裝置、(若干)方法、(若干)電腦可讀指令、(若干)模組、(若干)組件、硬體邏輯及/或(若干)操作,如由本文中所應用之內文所建議。 The embodiments described herein may generally be described as techniques, where the term "technique" may refer to system(s), device(s), method(s), computer-readable instruction(s), module(s), component(s), hardware logic, and/or operation(s), as suggested by the context used herein.
除上文所明確描述之特徵及技術益處之外的特徵及技術益處將自閱讀以下[實施方式]及檢視相關聯圖式明白。提供本[發明內容]以依一簡化形式介紹一系列技術,且不意在識別由隨附申請專利範圍界定之所主張標的物之關鍵或基本特徵。 Features and technical benefits other than those explicitly described above will be apparent from a reading of the following Detailed Description and a review of the associated drawings. This Summary is provided to introduce a selection of techniques in a simplified form and is not intended to identify key or essential features of the claimed subject matter as defined by the appended claims.
100:配置 100: Configuration
101:設備 101: Equipment
105:介面系統 105: Interface System
110:控制系統 110: Control System
111:脈衝回應 111: Impulse response
115:選用記憶體系統 115: Optional memory system
120:選用麥克風系統 120: Optional microphone system
125:選用揚聲器系統 125: Optional speaker system
130:選用感測器系統 130: Optional sensor system
135:選用顯示器系統 135: Optional display system
138:延遲處理區塊 138: Delayed processing block
140:延遲 140: Delay
140L:延遲 140L: Delay
140R:延遲 140R: Delay
141:中央處理單元(CPU) 141: Central Processing Unit (CPU)
141L:新的延遲 141L: New Delay
141R:新的延遲 141R: New Delay
142:唯讀記憶體(ROM) 142: Read-Only Memory (ROM)
143:隨機存取記憶體(RAM) 143: Random Access Memory (RAM)
144:匯流排 144: Bus
145:輸入/輸出(I/O)介面 145: Input/Output (I/O) Interface
146:輸入單元 146: Input unit
147:輸出單元 147: Output unit
148:儲存單元 148: Storage Unit
149:通信單元 149: Communication unit
150:驅動器 150:Driver
151:可移除媒體 151: Removable Media
160:電子處理器 160: Electronic Processor
161:記憶體 161:Memory
162:編碼軟體 162:Encoding software
163:解碼軟體 163: Decoding software
170:沉浸式語音及音訊服務(IVAS)編解碼器 170: Immersive Voice and Audio Services (IVAS) Codec
171:IVAS編碼器 171:IVAS Encoder
172:空間編碼器 172: Spatial Encoder
173:核心音訊編碼器 173: Core audio encoder
174:IVAS解碼器 174:IVAS Decoder
175:核心音訊解碼器 175: Core Audio Decoder
176:空間解碼器/呈現器 176: Spatial Decoder/Renderer
180:中間值 180: median value
181:s平面濾波器極點 181: s-plane filter pole
182:濾波器極點 182: Filter Pole
200:頭部 200: Head
211:脈衝回應 211: Impulse response
211L:L HRTF 211L:L HRTF
211R:R HRTF 211R:R HRTF
250:HRTF變換子區塊 250: HRTF transformation sub-block
251:HRTF處理區塊 251: HRTF processing block
251L:左HRTF處理區塊 251L: Left HRTF processing block
251R:右HRTF處理區塊 251R: Right HRTF processing block
252:全通產生器 252: All-pass generator
252L:全通濾波器產生器 252L: All-pass filter generator
252R:全通濾波器產生器 252R: All-pass filter generator
253:卷積程序 253: Convolution process
253L:左HRTF產生區塊 253L: Left HRTF generation block
253R:右HRTF產生區塊 253R: Right HRTF generation block
272:延遲處理區塊 272: Delayed processing block
273:映射區塊 273: Mapping Block
274:變換區塊 274: Transformation Block
275:全通運算區塊 275: All-pass computation block
311:非延遲脈衝回應 311: Non-delayed impulse response
311L:非延遲脈衝回應 311L: Non-delayed impulse response
311R:非延遲脈衝回應 311R: Non-delayed impulse response
411:線性相位回應 411: Linear Phase Response
412:替代相位回應 412: Alternative Phase Response
500:配置 500: Configuration
501:配置 501: Configuration
503:更詳細視圖 503: More detailed view
511:因果全通濾波器 511:Causal All-Pass Filter
512:全通濾波器 512: All-pass filter
512L:全通濾波器 512L: All-pass filter
512R:全通濾波器 512R: All-pass filter
520:原始HRTF庫 520: Original HRTF Library
521:HRTF變換區塊 521: HRTF Transformation Block
522:基本濾波器產生區塊 522: Basic filter generation block
523:基本濾波器 523: Basic Filter
524:到達方向 524: Arrival Direction
525:加權係數產生區塊 525: Weighting coefficient generation block
526:加權係數 526: Weighting coefficient
527:加權係數及基本濾波器組合區塊 527: Weighting coefficient and basic filter combination block
528:左耳HRTF濾波器 528: Left ear HRTF filter
529:右耳HRTF濾波器 529: Right Ear HRTF Filter
530:音訊產生區塊 530: Audio generation block
531:音訊信號 531: Audio signal
532:音訊輸入及基本濾波器組合區塊 532: Audio input and basic filter combination block
533:左耳音訊信號 533: Left ear audio signal
534:右耳音訊信號 534: Right ear audio signal
541:修改之HRTF庫 541: Modified HRTF library
611:左耳HRTF 611: Left ear HRTF
711:修改之右耳HRTF對 711: Modified Right Ear HRTF
711L:左耳HRTF 711L: Left ear HRTF
711R:右耳HRTF 711R: Right Ear HRTF
771A:延遲 771A: Delay
771B:延遲 771B: Delay
771C:延遲 771C: Delay
772A:延遲 772A: Delay
772B:延遲 772B: Delay
772C:延遲 772C: Delay
801:X軸 801: X-axis
802:Y軸 802: Y axis
803:Z軸 803: Z axis
811:脈衝回應 811: Impulse response
911:脈衝回應 911: Impulse response
2100:方法 2100: Methods
2105:區塊 2105: Block
2110:區塊 2110: Block
2115:區塊 2115: Block
現將僅藉由實例方式參考附圖描述本發明之實施例,其中:圖1A係展示能夠實施本發明之各種態樣之一設備之組件之實例的一方塊圖;圖1B繪示可用於實施本發明之各種態樣之一實例性裝置架構之一示意方塊圖;圖1C繪示在可用於實施本發明之各種態樣之圖1B之裝置架構中實施之一實例性CPU之一示意方塊圖;圖1D係根據一或多個實施例之用於編碼及解碼一沉浸式語音及音訊服務(IVAS)位元流之IVAS編碼器/解碼器(「編解碼器」)框架之一方塊圖;圖1E係展示以一收聽者之頭部為中心之一笛卡爾座標系統的一圖式;圖2係展示一左耳參考HRTF的一曲線圖;圖3係展示一右耳參考HRTF的一曲線圖;圖4係展示其中移除延遲之一右耳HRTF濾波器的一曲線圖;圖5係展示兩個替代濾波器之相位回應的一曲線圖;圖6係展示兩個替代濾波器之相位回應的一曲線圖;圖7係展示一左耳經修改HRTF的一曲線圖;圖8係展示一右耳經修改HRTF的一曲線圖;圖9係展示具有新增延遲之一左耳HRTF的一曲線圖;圖10係展示一右耳經修改HRTF的一曲線圖;圖11係展示一HRTF濾波器之修改方案的一圖式;圖12係展示具有一相關聯延遲之一全通脈衝回應之形成的一圖式;圖13係展示一原始HRTF庫轉換為一更緊湊HRTF基本集合的一圖式;圖14係展示用於高效運算HRTF之一緊湊HRTF基本集合的一圖式;圖15係展示用於處理一基於場景之音訊信號之一緊湊HRTF基本集合的一圖式;圖16展示根據一些實施方案之圖13至圖15之HRTF變換區塊之額外細節;圖17展示根據一些實施方案之圖16之HRTF變換字區塊之額外細節;圖18、圖19及圖20展示可由圖17之延遲處理區塊實施之函數之實例;及圖21係概述可由諸如本文中所揭示之設備或系統之一設備或系統執行之一方法之一個實例的一流程圖。 Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which: FIG. 1A is a block diagram showing an example of components of an apparatus capable of implementing various aspects of the present invention; FIG. 1B is a schematic block diagram of an example device architecture that can be used to implement various aspects of the present invention; FIG. 1C is a schematic block diagram of an example CPU implemented in the device architecture of FIG. 1B that can be used to implement various aspects of the present invention; FIG. 1D is a block diagram of an immersive voice and audio services (IVAS) encoder/decoder ("codec") framework for encoding and decoding an IVAS bitstream according to one or more embodiments; FIG. 1E is a diagram showing a Cartesian coordinate system centered on a listener's head; FIG. 2 is a graph showing a left-ear reference HRTF; FIG. 3 is a graph showing a right-ear reference HRTF; FIG. 4 is a graph showing a right-ear HRTF filter with the delay removed; FIG. 5 is a graph showing the phase responses of two alternative filters; FIG. 6 is a graph showing the phase responses of two alternative filters; FIG. 7 is a graph showing a modified left-ear HRTF; FIG. 8 is a graph showing a modified right-ear HRTF; FIG. 9 is a graph showing a left-ear HRTF with an added delay; FIG. 10 is a graph showing a modified right-ear HRTF; FIG. 11 is a diagram showing a modification scheme for an HRTF filter; FIG. 12 is a diagram showing the formation of an all-pass impulse response with an associated delay; FIG. 13 is a diagram showing the conversion of an original HRTF library into a more compact HRTF basis set; FIG. 14 is a diagram showing a compact HRTF basis set used for efficient computation of HRTFs; FIG. 15 is a diagram showing a compact HRTF basis set used for processing a scene-based audio signal; FIG. 16 shows additional details of the HRTF transform block of FIGS. 13 to 15 according to some implementations; FIG. 17 shows additional details of the HRTF transform sub-block of FIG. 16 according to some implementations; FIGS. 18, 19, and 20 show examples of functions that may be implemented by the delay processing block of FIG. 17; and FIG. 21 is a flow chart outlining one example of a method that may be performed by an apparatus or system such as those disclosed herein.
相關申請案之交叉參考 Cross-reference to related applications
本申請案主張2023年3月29日申請之美國臨時申請案第63/455,539號、2023年11月2日申請之美國臨時申請案第63/595,752號及2024年3月19日申請之美國臨時申請案第63/567,376號之優先權,該等案之全部內容以引用方式併入本文中。 This application claims priority to U.S. Provisional Application No. 63/455,539, filed on March 29, 2023, U.S. Provisional Application No. 63/595,752, filed on November 2, 2023, and U.S. Provisional Application No. 63/567,376, filed on March 19, 2024, the entire contents of which are incorporated herein by reference.
本發明係關於自原始HRTF建立修改之HRTF,使得修改之HRTF可由一線性混合更高效近似同時保留原始HRTF之心理聲學性質。本文中所描述係與處理HRTF濾波器以產生適合於用於基於線性內插之一組濾波器中之修改之HRTF濾波器的技術。在以下描述中,出於解釋目的,闡述數種實例及特定細節以提供對本發明之一透徹理解。然而,熟習技術者將明白,如由申請專利範圍界定之本發明可單獨或與下文所描述之其他特徵組合地包含此等實例中之一些或全部特徵,且可進一步包含本文中所描述之特徵及概念之修改方案及等效物。 The present invention relates to creating a modified HRTF from an original HRTF, such that the modified HRTF can be more efficiently approximated by a linear blend while preserving the psychoacoustic properties of the original HRTF. Described herein are techniques for processing HRTF filters to produce modified HRTF filters suitable for use in a filter set based on linear interpolation. In the following description, for purposes of explanation, several examples and specific details are set forth to provide a thorough understanding of the present invention. However, those skilled in the art will appreciate that the present invention, as defined by the claims, may encompass some or all of the features of these examples, alone or in combination with other features described below, and may further encompass modifications and equivalents of the features and concepts described herein.
在以下描述中,詳述各種系統、裝置、方法、程序及流程。儘管特定步驟可依某一順序描述,但此順序僅為了方便及清楚。一特定步驟可重複超過一次,可在其他步驟之前或之後發生(即使此等步驟否則依另一順序描述),且可與其他步驟並行發生。僅當必須在開始第二步驟之前完成第一步驟時,才需在一第一步驟之後進行一第二步驟。當內文不清楚時,將明確指出此一情形。 In the following description, various systems, devices, methods, programs, and processes are described in detail. Although certain steps may be described in a particular order, this order is for convenience and clarity only. A particular step may be repeated more than once, may occur before or after other steps (even if those steps would otherwise be described in a different order), and may occur concurrently with other steps. A second step may be performed after a first step only if the first step must be completed before starting the second step. When the context is unclear, this will be explicitly stated.
在本文件中,使用術語「及」、「或」及「及/或」。此等術語應被理解為具有一包含性含義。例如,「A及B」可意謂至少以下:「A及B兩者」、「至少A及B兩者」。作為另一實例,「A或B」可意謂至少以下:「至少A」、「至少B」、「A及B兩者」、「至少A及B兩者」。作為另一實例,「A及/或B」可意謂至少以下:「A及B」、「A或B」。當想要一互斥或時,則將明確註明(例如,「A或B」、「A及B之至多一者」)。 Throughout this document, the terms "and," "or," and "and/or" are used. These terms should be understood to have an inclusive meaning. For example, "A and B" may mean at least the following: "both A and B," "at least both A and B." As another example, "A or B" may mean at least the following: "at least A," "at least B," "both A and B," "at least both A and B." As another example, "A and/or B" may mean at least the following: "A and B," "A or B." When an exclusive or is intended, this will be explicitly stated (e.g., "A or B," "at most one of A and B").
術語「包含」及其變體應被理解為意謂「包含但不限於」之開放式術語。術語「一個實例性實施方案」及「一實例性實施方案」應 被理解為「至少一個實例性實施方案」。術語「另一實施方案」應被理解為「至少一個其他實施方案」。術語「判定(determined、determines或determining)」應被理解為獲得、接收、運算、計算、估計、預測或推導。另外,在以下描述及申請專利範圍中,除非另有定義,否則本文中所使用之所有技術及科學術語具有相同於本發明所屬技術之一般人員所共同理解之含義。 The term "including" and its variations should be understood as open-ended terms meaning "including, but not limited to." The terms "one exemplary embodiment" and "an exemplary embodiment" should be understood to mean "at least one exemplary embodiment." The term "another embodiment" should be understood to mean "at least one other embodiment." The terms "determined," "determines," or "determining" should be understood to mean to obtain, receive, calculate, compute, estimate, predict, or infer. Furthermore, in the following description and claims, unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs.
本文件描述與諸如區塊、元件、組件、電路等之結構相關聯之各種處理函數。一般而言,此等結構可由一或多個電腦程式所控制之一處理器實施。 This document describes various processing functions associated with structures such as blocks, components, assemblies, and circuits. Generally, these structures can be implemented by a processor controlled by one or more computer programs.
各種縮寫詞可貫穿本發明出現且在相關聯申請專利範圍及/或圖式中如下列出。其他常用縮寫詞及技術術語可為了簡潔起見自此清單排除。因此,下文提供縮寫詞之一簡短清單以供讀者參考。 Various abbreviations may be used throughout the present invention and in the associated patent claims and/or drawings, as listed below. Other commonly used abbreviations and technical terms may be excluded from this list for the sake of brevity. Therefore, a brief list of abbreviations is provided below for the reader's reference.
IVAS-沉浸式語音及音訊服務 IVAS - Immersive Voice and Audio Services
HRTF-頭部相關傳輸函數 HRTF-Head Related Transfer Function
LPC-線性預測式編碼 LPC - Linear Predictive Coding
CLDFB-複雜低延遲濾波器組 CLDFB-Complex Low Delay Filter Bank
SBA-基於場景之音訊 SBA-Scene-Based Audio
SPAR-空間重建,一空間音訊編碼技術 SPAR - Spatial Reconstruction, a spatial audio coding technique
DirAC-定向音訊編碼,另一空間音訊編碼技術 DirAC - Directional Audio Coding, another spatial audio coding technology
MD-元資料 MD-Metadata
BS-位元流 BS-Bitstream
HOA-高階立體音響 HOA - Higher-Order Ambisonics
FOA-一階立體音響 FOA - First-Order Ambisonics
MDFT-修改之離散傅里葉變換 MDFT-Modified Discrete Fourier Transform
MDCT-修改之離散餘弦變換 MDCT-Modified Discrete Cosine Transform
圖1A係展示能夠實施本發明之各種態樣之一設備之組件之實例的一方塊圖。如同本文中所提供之其他圖,圖1A中所展示之元件之類型及數目僅藉由實例方式提供。其他實施方案可包含更多、更少及/或不同類型及數目之元件。根據一些實例,設備101可為或可包含經組態用於執行本文中所揭示之方法之至少一些之一裝置,諸如一智慧型音訊裝置、一膝上型電腦、一蜂巢式電話、一平板裝置、一智慧家庭集線器等。在一些此等實施方案中,設備101可為或可包含經組態用於執行本文中所揭示之方法之至少一些之一伺服器。 FIG1A is a block diagram illustrating an example of components of a device capable of implementing various aspects of the present invention. As with other figures provided herein, the types and numbers of components shown in FIG1A are provided by way of example only. Other embodiments may include more, fewer, and/or different types and numbers of components. According to some examples, device 101 may be or may include a device configured to perform at least some of the methods disclosed herein, such as a smart audio device, a laptop, a cellular phone, a tablet device, a smart home hub, etc. In some such embodiments, device 101 may be or may include a server configured to perform at least some of the methods disclosed herein.
在此實例中,設備101包含一介面系統105及一控制系統110。在一些實施方案中,介面系統105可經組態用於將一第一組HRTF提供至控制系統110。在一些實例中,介面系統105可經組態用於輸出控制系統110處理第一組HRTF(諸如一第二組HRTF、基於第二組HRTF之一組基本濾波器、用基本濾波器之一或多者處理之音訊資料(諸如左耳音訊資料及右耳音訊資料)等)之一或多個結果。 In this example, device 101 includes an interface system 105 and a control system 110. In some implementations, interface system 105 may be configured to provide a first set of HRTFs to control system 110. In some examples, interface system 105 may be configured to output one or more results of control system 110 processing the first set of HRTFs, such as a second set of HRTFs, a set of basis filters based on the second set of HRTFs, or audio data (such as left-ear audio data and right-ear audio data) processed with one or more of the basis filters.
介面系統105可包含一或多個網路介面及/或一或多個外部裝置介面(諸如一或多個通用串列匯流排(USB)介面)。根據一些實施方案,介面系統105可包含一或多個無線介面。介面系統105可包含用於實施一使用者介面之一或多個裝置,諸如一或多個麥克風、一或多個揚聲器、一顯示器系統、一觸控感測器系統及/或一姿勢感測器系統。在一些實例中,介面系統105可包含位於控制系統110與一記憶體系統(諸如圖1A中所展示之選用記憶體系統115)之間的一或多個介面。然而,在一些例項 中,控制系統110可包含一記憶體系統。 The interface system 105 may include one or more network interfaces and/or one or more external device interfaces (such as one or more Universal Serial Bus (USB) interfaces). According to some embodiments, the interface system 105 may include one or more wireless interfaces. The interface system 105 may include one or more devices for implementing a user interface, such as one or more microphones, one or more speakers, a display system, a touch sensor system, and/or a gesture sensor system. In some examples, the interface system 105 may include one or more interfaces between the control system 110 and a memory system (such as the optional memory system 115 shown in FIG. 1A ). However, in some examples, control system 110 may include a memory system.
控制系統110可(例如)包含一通用單晶片或多晶片處理器、一數位信號處理器(DSP)、一專用積體電路(ASIC)、一場可程式化閘陣列(FPGA)或其他可程式化邏輯裝置、離散閘或電晶體邏輯及/或離散硬體組件。 The control system 110 may, for example, include a general purpose single-chip or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components.
在一些實施例中,控制系統110可常駐於超過一個裝置中。例如,控制系統110之一部分可常駐於一環境內之一裝置中(諸如一膝上型電腦、一平板電腦、一智慧音訊裝置等)且控制系統110之另一部分可常駐於環境外之一裝置中(諸如一伺服器)。在其他實例中,控制系統110之一部分可常駐於一環境內之一裝置中且控制系統110之另一部分可常駐於環境之一或多個其他裝置中。 In some embodiments, control system 110 may reside on more than one device. For example, a portion of control system 110 may reside on a device within an environment (e.g., a laptop, a tablet, a smart audio device, etc.) and another portion of control system 110 may reside on a device outside the environment (e.g., a server). In other examples, a portion of control system 110 may reside on a device within an environment and another portion of control system 110 may reside on one or more other devices within the environment.
在一些實施方案中,控制系統110可經組態用於至少部分執行本文中所揭示之方法。根據一些實例,控制系統110可經組態用於接收一第一組HRTF且用於將第一組HRTF變換為一第二組HRTF。第二組HRTF可藉由一線性混合比第一組HRTF更高效近似,同時保留第一組HRTF之心理聲學性質。在一些此等實例中,變換可涉及用第二組HRTF中之全通濾波器替換第一組HRTF之延遲分量。根據一些此等實例,變換可涉及調整第二組HRTF中之全通濾波器之各者之一相位回應使得:針對低於對應全通濾波器之一相關聯臨限頻率之頻率,各耳間相位回應實質上係線性的,且針對高於對應全通濾波器之相關聯臨限頻率之頻率,各耳間相位具有減小之耳間相位差。 In some implementations, the control system 110 can be configured to at least partially perform the methods disclosed herein. According to some examples, the control system 110 can be configured to receive a first set of HRTFs and transform the first set of HRTFs into a second set of HRTFs. The second set of HRTFs can be more efficiently approximated than the first set of HRTFs by linear blending while preserving the psychoacoustic properties of the first set of HRTFs. In some such examples, the transformation can involve replacing the delayed components of the first set of HRTFs with all-pass filters in the second set of HRTFs. According to some such examples, the transformation may involve adjusting a phase response of each of the all-pass filters in the second set of HRTFs such that: for frequencies below a threshold frequency associated with the corresponding all-pass filter, each interaural phase response is substantially linear, and for frequencies above the threshold frequency associated with the corresponding all-pass filter, each interaural phase has a reduced interaural phase difference.
在一些實例中,控制系統110可經組態用於基於第二組HRTF來界定一組基本濾波器。基本濾波器組可具有少於第二組HRTF之構件。在此內文中,第二組HRTF之一「構件」係第二組HRTF中之HRTF之一者。類似地,基本濾波器組之一「構件」係基本濾波器組之基本濾波器之一者。根據一些實例,基本濾波器組可具有比第二組HRTF少至少一數量級之構件。例如,在一些例項中,第二組HRTF可具有數百或數千個構件,而基本濾波器組可包含少於100個構件、少於50個構件或甚至少於20個構件。 In some examples, control system 110 may be configured to define a set of basis filters based on the second set of HRTFs. The set of basis filters may have fewer members than the second set of HRTFs. In this context, a "member" of the second set of HRTFs is one of the HRTFs in the second set. Similarly, a "member" of the set of basis filters is one of the basis filters in that set. According to some examples, the set of basis filters may have at least an order of magnitude fewer members than the second set of HRTFs. For example, in some instances, the second set of HRTFs may have hundreds or thousands of members, whereas the set of basis filters may include fewer than 100 members, fewer than 50 members, or even fewer than 20 members.
根據一些實例,控制系統110可經組態用於經由介面系統105接收呈一輸入音訊格式之輸入音訊資料之一位元流。輸入音訊格式可(例如)為一立體音響音訊格式、一基於音訊物件之音訊格式(諸如Dolby Atmos™)、一基於聲道之音訊格式等。在一些實例中,控制系統110可經組態用於組合輸入音訊資料與基本濾波器組之一或多個基本濾波器以產生左音訊資料及右音訊資料,諸如左耳音訊資料及右耳音訊資料。在一些此等實例中,控制系統110可經組態用於經由介面系統105輸出左音訊資料及右音訊資料。輸出左音訊資料及右音訊資料可涉及儲存左音訊資料及右音訊資料,傳輸左音訊資料及右音訊資料,將左音訊資料及右音訊資料提供至一組揚聲器以供回放,提供左音訊資料及右音訊資料以供進一步處理或其等之組合。 According to some examples, control system 110 may be configured to receive, via interface system 105, a bitstream of input audio data in an input audio format. The input audio format may, for example, be an Ambisonics audio format, an object-based audio format (such as Dolby Atmos™), a channel-based audio format, etc. In some examples, control system 110 may be configured to combine the input audio data with one or more basis filters of the set of basis filters to produce left audio data and right audio data, such as left-ear audio data and right-ear audio data. In some such examples, control system 110 may be configured to output the left audio data and the right audio data via interface system 105. Outputting the left and right audio data may involve storing them, transmitting them, providing them to a set of loudspeakers for playback, providing them for further processing, or combinations thereof.
在一些實例中,控制系統110可經組態用於實施沉浸式語音及音訊服務(IVAS)之一編解碼器之至少部分。本文中參考圖23描述一些實例。 In some examples, the control system 110 can be configured to implement at least a portion of a codec for an immersive voice and audio service (IVAS). Some examples are described herein with reference to FIG. 23 .
本文中所描述之一些或全部實例可由一或多個裝置根據儲存於一或多個非暫時性媒體上之指令(例如軟體)來執行。此非暫時性媒體可包含諸如本文中所描述之記憶體裝置的記憶體裝置,包含但不限於隨機 存取記憶體(RAM)裝置、唯讀記憶體(ROM)裝置等。一或多個非暫時性媒體可(例如)常駐於圖1A中所展示之選用記憶體系統115中及/或控制系統110中。因此,本發明中所描述之標的物之各種創新態樣可實施於其上儲存有軟體之一或多個非暫時性媒體中。軟體可(例如)包含用於控制至少一個裝置處理音訊資料之指令。軟體可(例如)由諸如圖1A中之控制系統110之一控制系統之一或多個組件執行。 Some or all of the examples described herein may be executed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, and the like. The one or more non-transitory media may, for example, reside in the optional memory system 115 and/or the control system 110 shown in FIG. 1A . Thus, various innovative aspects of the subject matter described herein may be implemented in one or more non-transitory media on which software is stored. The software may, for example, include instructions for controlling at least one device to process audio data. The software may, for example, be executed by one or more components of a control system, such as control system 110 in FIG. 1A .
在一些實例中,設備101可包含圖1A中所展示之選用麥克風系統120。選用麥克風系統120可包含一或多個麥克風。在一些實施方案中,麥克風之一或多者可為另一裝置(諸如揚聲器系統之一揚聲器、一智慧音訊裝置等)之部分或與另一裝置相關聯。 In some examples, device 101 may include the optional microphone system 120 shown in FIG. 1A . Optional microphone system 120 may include one or more microphones. In some implementations, one or more of the microphones may be part of or associated with another device (e.g., a speaker in a speaker system, a smart audio device, etc.).
根據一些實施方案,設備101可包含圖1A中所展示之選用揚聲器系統125。選用揚聲器系統125可包含一或多個揚聲器。揚聲器有時可指稱「揚聲器(speaker)」。在一些實例中,選用揚聲器系統125之至少一些揚聲器可經任意定位。例如,選用揚聲器系統125之至少一些揚聲器可放置於不對應於任何標準規定之揚聲器佈局之位置中(諸如杜比5.1、杜比5.1.2、杜比7.1、杜比7.1.4、杜比9.1、濱崎22.2等)。在一些此等實例中,選用揚聲器系統125之至少一些揚聲器可放置於空間方便之位置中(例如,其中有空間容納揚聲器之位置中)而非任何標準規定之揚聲器佈局中。 According to some implementations, device 101 may include the optional loudspeaker system 125 shown in FIG. 1A. The optional loudspeaker system 125 may include one or more loudspeakers. Loudspeakers may sometimes be referred to herein as "speakers." In some examples, at least some loudspeakers of the optional loudspeaker system 125 may be arbitrarily located. For example, at least some loudspeakers of the optional loudspeaker system 125 may be placed in locations that do not correspond to any standard prescribed loudspeaker layout, such as Dolby 5.1, Dolby 5.1.2, Dolby 7.1, Dolby 7.1.4, Dolby 9.1, Hamasaki 22.2, etc. In some such examples, at least some loudspeakers of the optional loudspeaker system 125 may be placed in locations that are convenient to the space (for example, in locations where there is room to accommodate a loudspeaker) rather than in any standard prescribed loudspeaker layout.
在一些實施方案中,設備101可包含圖1A中所展示之選用感測器系統130。選用感測器系統130可包含一觸控感測器系統、一姿勢感測器系統、一或多個攝影機等。 In some embodiments, device 101 may include the optional sensor system 130 shown in FIG1A . Optional sensor system 130 may include a touch sensor system, a gesture sensor system, one or more cameras, etc.
在一些實施方案中,設備101可包含圖1A中所展示之選用 顯示器系統135。選用顯示器系統135可包含一或多個顯示器,諸如一或多個發光二極體(LED)顯示器。在一些例項中,選用顯示器系統135可包含一或多個有機發光二極體(OLED)顯示器。在其中設備101包含顯示器系統135之一些實例中,感測器系統130可包含靠近顯示器系統135之一或多個顯示器之一觸控感測器系統及/或一姿勢感測器系統。根據一些此等實施方案,控制系統110可經組態用於控制顯示器系統135呈現一圖形使用者介面(GUI),諸如與實施本文中所揭示之方法之一者相關之一GUI。 In some embodiments, device 101 may include optional display system 135, shown in FIG1A . Optional display system 135 may include one or more displays, such as one or more light-emitting diode (LED) displays. In some examples, optional display system 135 may include one or more organic light-emitting diode (OLED) displays. In some examples where device 101 includes display system 135, sensor system 130 may include a touch sensor system and/or a gesture sensor system proximate to one or more displays of display system 135. According to some such implementations, the control system 110 can be configured to control the display system 135 to present a graphical user interface (GUI), such as a GUI associated with implementing one of the methods disclosed herein.
圖1B繪示可用於實施本發明之各種態樣之一實例性裝置架構101(在此實例中,一設備101)之一示意方塊圖。圖1B之設備101係圖1A之設備101之一例項。架構101包含(但不限於)伺服器及用戶端裝置、系統等,其等可經組態以執行參考圖11至圖17及圖21之任何者或全部所描述之方法。如所展示,架構101包含中央處理單元(CPU)141,其能夠根據儲存於(例如)唯讀記憶體(ROM)142中之一程式或自(例如)儲存單元148載入至隨機存取記憶體(RAM)143之一程式來執行各種程序。CPU 141可為(例如)一電子處理器。在此等實例中,CPU 141係圖1A之控制系統110之一例項且ROM 142及RAM 143係記憶體系統115之例項。在RAM 143中,根據需要,亦儲存在CPU 141執行各種程序時所需之資料。CPU 141、ROM 142及RAM 143經由匯流排144彼此連接。輸入/輸出(I/O)介面145亦連接至匯流排144。匯流排144及I/O介面145係圖1A之介面系統105之例項。 FIG1B illustrates a schematic block diagram of an exemplary device architecture 101 (in this example, a device 101) that may be used to implement various aspects of the present invention. The device 101 of FIG1B is an example of the device 101 of FIG1A. The architecture 101 includes, but is not limited to, server and client devices, systems, and the like that may be configured to perform the methods described with reference to any or all of FIG11 through FIG17 and FIG21. As shown, the architecture 101 includes a central processing unit (CPU) 141 that is capable of executing various programs based on a program stored in, for example, a read-only memory (ROM) 142 or a program loaded from, for example, a storage unit 148 into a random access memory (RAM) 143. CPU 141 can be, for example, an electronic processor. In these examples, CPU 141 is an example of control system 110 in FIG. 1A , and ROM 142 and RAM 143 are examples of memory system 115 . RAM 143 also stores data required by CPU 141 to execute various programs, as needed. CPU 141, ROM 142, and RAM 143 are connected to each other via bus 144 . Input/output (I/O) interface 145 is also connected to bus 144 . Bus 144 and I/O interface 145 are examples of interface system 105 in FIG. 1A .
以下組件連接至I/O介面145:輸入單元146,其可包含一鍵盤、一滑鼠或其類似者;輸出單元147,其可包含一顯示器,諸如一液晶顯示器(LCD)及一或多個揚聲器;儲存單元148,其包含一硬碟或另一 適合儲存裝置;及通信單元149,其包含一網路介面卡,諸如一網卡(例如有線或無線)。 The following components are connected to the I/O interface 145: an input unit 146, which may include a keyboard, a mouse, or the like; an output unit 147, which may include a display, such as a liquid crystal display (LCD) and one or more speakers; a storage unit 148, which may include a hard drive or another suitable storage device; and a communication unit 149, which may include a network interface card, such as a network card (e.g., wired or wireless).
在一些實施方案中,輸入單元146包含實現各種格式(例如單聲道、立體聲、空間、沉浸式及其他適合格式)之音訊信號之捕捉之處於不同位置中(取決於主機裝置)之一或多個麥克風。 In some implementations, the input unit 146 includes one or more microphones located in different locations (depending on the host device) that enable capture of audio signals in various formats (e.g., mono, stereo, spatial, immersive, and other suitable formats).
在一些實施方案中,輸出單元147包含具有各種數目之揚聲器之系統。輸出單元147(取決於主機裝置之能力)可呈現各種格式(例如單聲道、立體聲、沉浸式、雙耳及其他適合格式)之音訊信號。 In some embodiments, output unit 147 includes a system with a varying number of speakers. Output unit 147 (depending on the capabilities of the host device) can present audio signals in a variety of formats, such as mono, stereo, immersive, binaural, and other suitable formats.
在一些實施例中,通信單元149經組態以與其他裝置通信(例如,經由一網路)。驅動器150亦根據需要連接至I/O介面145。可移除媒體151(諸如一磁碟、一光碟、一磁光碟、一快閃驅動器或另一適合可移除媒體)安裝於驅動器150上,使得自其讀取之一電腦程式根據需要安裝至儲存單元148中。熟習技術者應理解,儘管設備101經描述為包含上述組件,但在真實應用中,可添加、移除及/或替換此等組件之一些且所有此等修改或更改全部落入本發明之範疇內。 In some embodiments, communication unit 149 is configured to communicate with other devices (e.g., via a network). Drive 150 is also connected to I/O interface 145 as needed. Removable media 151 (e.g., a magnetic disk, an optical disk, a magneto-optical disk, a flash drive, or another suitable removable medium) is mounted on drive 150, allowing a computer program read therefrom to be installed into storage unit 148 as needed. Those skilled in the art will understand that although device 101 is described as including the aforementioned components, in actual applications, some of these components may be added, removed, and/or replaced, and all such modifications or changes fall within the scope of the present invention.
根據本發明之實例性實施例,上述程序可作為電腦軟體程式或在一電腦可讀儲存媒體上實施。例如,本發明之實施例包含含有有形地體現於一機器可讀媒體上之一電腦程式之一電腦程式產品,該電腦程式包含用於執行方法之程式碼。在此等實施例中,電腦程式可經由通信單元149自網路下載及安裝,及/或自可移除媒體151安裝,如圖1B中所展示。 According to exemplary embodiments of the present invention, the aforementioned procedures may be implemented as a computer software program or on a computer-readable storage medium. For example, embodiments of the present invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for executing the method. In such embodiments, the computer program may be downloaded and installed from a network via communication unit 149 and/or installed from removable media 151, as shown in FIG. 1B .
圖1C繪示在可用於實施本發明之各種態樣之圖1B之裝置架構101中實施之一實例性CPU 141之一示意方塊圖。CPU 141包含一電子處理器160及一記憶體161。電子處理器160電氣及/或通信連接至記憶體 161以供雙向通信。記憶體161儲存編碼軟體162及解碼軟體163。記憶體161可為(例如)一ROM、一RAM或另一非暫時性電腦可讀媒體。電子處理器160可實施儲存於記憶體161中之編碼軟體162以執行(尤其)圖21之方法2100。另外,電子處理器160可實施儲存於記憶體161中之解碼軟體163以執行(尤其)參考圖11至圖17及圖21之任何者或全部所描述之方法。 FIG1C illustrates a schematic block diagram of an exemplary CPU 141 implemented in the device architecture 101 of FIG1B , which may be used to implement various aspects of the present invention. CPU 141 includes an electronic processor 160 and a memory 161 . Electronic processor 160 is electrically and/or communicatively coupled to memory 161 for bidirectional communication. Memory 161 stores encoding software 162 and decoding software 163 . Memory 161 may be, for example, a ROM, RAM, or another non-transitory computer-readable medium. Electronic processor 160 may implement encoding software 162 stored in memory 161 to perform, among other things, method 2100 of FIG21 . Additionally, the electronic processor 160 may implement decoding software 163 stored in the memory 161 to perform the methods described with reference to, among other things, any or all of Figures 11 to 17 and 21.
一般而言,本發明之各種實例性實施例可在硬體或專用電路(例如控制電路系統)、軟體、邏輯或其等之任何組合中實施。例如,上文所討論之單元可由控制電路系統(例如,與圖1B之其他組件組合之CPU 141)執行,因此,控制電路系統可執行本發明中所描述之動作。一些態樣可在硬體中實施,而其他態樣可在可由一控制器、微處理器或其他運算裝置(例如控制電路系統)執行之韌體或軟體中實施。儘管本發明之實例性實施例之各種態樣經繪示及描述為方塊圖、流程圖或使用一些其他圖示表示,但應瞭解,作為非限制性實例,本文中所描述之區塊、設備、系統、技術或方法可在硬體、軟體、韌體、專用電路或邏輯、通用硬體或控制器或其他運算裝置、或其等之某一組合中實施。 In general, various exemplary embodiments of the present invention may be implemented in hardware or dedicated circuitry (e.g., control circuitry), software, logic, or any combination thereof. For example, the units discussed above may be executed by the control circuitry (e.g., CPU 141 in combination with other components of FIG. 1B ), and thus, the control circuitry may perform the actions described herein. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device (e.g., control circuitry). Although various aspects of exemplary embodiments of the present invention may be depicted and described as block diagrams, flow charts, or using some other graphical representation, it should be understood that, as non-limiting examples, the blocks, devices, systems, techniques, or methods described herein may be implemented in hardware, software, firmware, dedicated circuits or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.
另外,流程圖中所展示之各種區塊可被視為方法步驟,及/或由電腦程式碼之操作產生之操作,及/或經建構以實行(若干)相關聯功能之複數個耦合邏輯電路元件。例如,本發明之實施例包含含有有形地體現於一機器可讀媒體上之一電腦程式之一電腦程式產品,該電腦程式含有經組態以實行如上文所描述之方法之程式碼。 Additionally, the various blocks shown in the flowcharts may be viewed as method steps, and/or operations resulting from the operation of computer program code, and/or as a plurality of coupled logic circuit elements constructed to perform (some) related functions. For example, embodiments of the present invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code configured to perform the method described above.
在本發明之內文中,一機器可讀媒體可為可含有或儲存由一指令執行系統、設備或裝置使用或結合一指令執行系統、設備或裝置一起使用之一程式之任何有形媒體。機器可讀媒體可為一機器可讀信號媒體 或一機器可讀儲存媒體。一機器可讀媒體可為非暫時性的且可包含(但不限於)一電子、磁、光學、電磁、紅外線或半導體系統、設備或裝置或前述之任何適合組合。機器可讀儲存媒體之較多特定實例將包含以下各項:具有一或多個電線之一電連接、一可攜式電腦磁片、一硬碟、一隨機存取記憶體(RAM)、一唯讀記憶體(ROM)、一可擦除可程式化唯讀記憶體(EPROM或快閃記憶體)、一光纖、一可攜式光碟唯讀記憶體(CD-ROM)、一光學儲存裝置、一磁性儲存裝置或前述之任何適合組合。 In the context of this invention, a machine-readable medium is any tangible medium that can contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be non-transitory and can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include the following: an electrical connection having one or more wires, a portable computer disk, a hard drive, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
用於實行本發明之方法之電腦程式碼可依一或多個程式化語言之任何組合寫入。可將此等電腦程式碼提供至一通用電腦、專用電腦或具有控制電路系統之其他可程式化資料處理設備之一處理器,使得在由電腦或其他可程式化資料處理設備執行時,程式碼引起流程圖及/或方塊圖中所指定之功能/操作被實施。程式碼可作為一單獨軟體封裝在一電腦上完全執行、在電腦上部分執行、在電腦上部分執行及在一遠端電腦上部分執行或在遠端電腦或伺服器上完全執行或分佈於一或多個遠端電腦及/或伺服器上。 Computer program code for implementing the methods of the present invention can be written in any combination of one or more programming languages. Such computer program code can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device having a control circuit system, so that when executed by the computer or other programmable data processing device, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code can be packaged as a separate software and executed entirely on one computer, partially on one computer, partially on one computer and partially on a remote computer, or entirely on a remote computer or server, or distributed across one or more remote computers and/or servers.
圖1D係根據一或多個實施例之用於編碼及解碼一沉浸式語音及音訊服務(IVAS)位元流之IVAS編碼器/解碼器(「編解碼器」)框架170之一方塊圖。預期IVAS支援一系列音訊服務能力,包含(但不限於)單聲道至立體聲升混及完全沉浸式音訊編碼、解碼及呈現。亦預期IVAS由廣泛裝置、終端及網路節點支援,包含(但不限於):行動及智慧型電話、電子平板電腦、個人電腦、會議電話、會議室、虛擬實境(VR)及擴增現實(AR)裝置、家庭影院裝置及其他適合裝置。 FIG. 1D is a block diagram of an Immersive Voice and Audio Services (IVAS) encoder/decoder ("codec") framework 170 for encoding and decoding an IVAS bitstream, according to one or more embodiments. IVAS is expected to support a range of audio service capabilities, including, but not limited to, mono-to-stereo upmixing and fully immersive audio encoding, decoding, and rendering. IVAS is also expected to be supported by a wide range of devices, endpoints, and network nodes, including, but not limited to, mobile phones and smartphones, electronic tablets, personal computers, conference phones, conference rooms, virtual reality (VR) and augmented reality (AR) devices, home theater devices, and other suitable devices.
在此實例中,IVAS編解碼器170包含IVAS編碼器171及IVAS解碼器174。在一些實例中,IVAS編碼器171、IVAS解碼器174或兩者可由圖1A之控制系統110之一或多個例項、圖1B及圖1C之CPU 141等實施。在一些實例中,IVAS編碼器171可由圖1C之編碼軟體162實施且IVAS解碼器174可由圖1C之解碼軟體163實施。根據一些實例,實施IVAS編碼器171、IVAS解碼器174或兩者之一控制系統亦可經組態以執行本文中所揭示之操作之一些或全部,諸如參考圖11至圖17及圖21之一或多者所描述之方法。 In this example, the IVAS codec 170 includes an IVAS encoder 171 and an IVAS decoder 174. In some examples, the IVAS encoder 171, the IVAS decoder 174, or both may be implemented by one or more instances of the control system 110 of FIG. 1A, the CPU 141 of FIG. 1B and FIG. 1C, or the like. In some examples, the IVAS encoder 171 may be implemented by the encoding software 162 of FIG. 1C, and the IVAS decoder 174 may be implemented by the decoding software 163 of FIG. 1C. According to some examples, a control system that implements the IVAS encoder 171, the IVAS decoder 174, or both may also be configured to perform some or all of the operations disclosed herein, such as the methods described with reference to one or more of FIG. 11 to FIG. 17 and FIG. 21.
根據此實例,IVAS編碼器171包含N個聲道之接收輸入空間音訊(例如FOA、HOA)之空間編碼器172。在一些實施方案中,空間編碼器172可經組態以實施空間重建(SPAR)、定向音訊編碼(DirAC)、另一空間音訊編碼技術或其等之組合。在此實例中,空間編碼器172之輸出包含一空間元資料(MD)位元流(BS)及N_dmx個聲道之空間降混。根據此實例,空間MD經量化及熵編碼。在一些實施方案中,量化可包含細化、中等、粗化及特別粗化量化策略且熵編碼可包含霍夫曼或算數編碼。在一些實施方案中,框架可允許在一給定操作模式下不超過3個量化位準;然而,隨著位元速率降低,在一些此等實施方案中,三個位準總體上變得越來越粗糙以滿足位元速率要求。根據此實例,可(例如)基於一單聲道增強語音服務(EVS)編碼單元之核心音訊編碼器173經組態以將空間降混之N_dmx聲道(N_dmx=1至16聲道)編碼成一音訊位元流,其與空間MD位元流組合成傳送至IVAS解碼器174之一IVAS編碼位元流。 According to this example, IVAS encoder 171 includes a spatial encoder 172 that receives N channels of input spatial audio (e.g., FOA, HOA). In some embodiments, spatial encoder 172 can be configured to implement spatial reconstruction (SPAR), directional audio coding (DirAC), another spatial audio coding technique, or a combination thereof. In this example, the output of spatial encoder 172 includes a spatial metadata (MD) bitstream (BS) and a spatial downmix of N_dmx channels. According to this example, the spatial MD is quantized and entropy coded. In some embodiments, quantization can include fine, medium, coarse, and extra-coarse quantization strategies, and entropy coding can include Huffman or arithmetic coding. In some implementations, the framework may allow no more than three quantization levels in a given operating mode; however, as the bit rate decreases, in some such implementations, the three levels generally become coarser to meet the bit rate requirements. According to this example, the core audio encoder 173, which may be based on a mono Enhanced Voice Service (EVS) coding unit, for example, is configured to encode the N_dmx channels ( N_dmx = 1 to 16 channels) of the spatial downmix into an audio bitstream, which is combined with the spatial MD bitstream into an IVAS-encoded bitstream that is transmitted to the IVAS decoder 174.
在此實例中,IVAS解碼器174包含將自IVAS位元流提取之音訊位元流進行解碼以恢復N_dmx音訊聲道之核心音訊解碼器175(例 如EVS解碼器)。根據此實例,空間解碼器/呈現器176(例如SPAR/DirAC)將自IVAS位元流提取之空間MD位元流進行解碼以恢復空間MD,且使用空間MD及一空間升混來合成/呈現輸出音訊聲道以供在具有不同揚聲器組態及能力之各種音訊系統上回放。 In this example, the IVAS decoder 174 includes a core audio decoder 175 (e.g., an EVS decoder) that decodes the audio bitstream extracted from the IVAS bitstream to recover the N_dmx audio channels. According to this example, the spatial decoder/renderer 176 (e.g., SPAR/DirAC) decodes the spatial MD bitstream extracted from the IVAS bitstream to recover the spatial MD, and uses the spatial MD and a spatial upmix to synthesize/render the output audio channels for playback on various audio systems with different speaker configurations and capabilities.
圖1E展示參考一收聽者之頭部之一座標系統之一實例。頭部相關傳輸函數(HRTF)濾波器可用於處理音訊信號以產生雙耳音訊信號,以向一收聽者提供自規定到達方向到達之聲音之錯覺。到達方向可用一(x,y,z)單位向量來界定,其中笛卡爾座標可如圖1E中所展示般界定。根據圖1E中所展示之實例,一座標系之原點大致位於收聽者之頭部200之中心,且X軸801指向前方(在收聽者之鼻子之方向上),Y軸802指向收聽者之左側,且Z軸803向上指向收聽者之頭部之頂部。 FIG1E shows an example of a coordinate system with reference to a listener's head. A head-related transfer function (HRTF) filter can be used to process audio signals to generate binaural audio signals, providing a listener with the illusion that sounds are arriving from a specified arrival direction. The arrival direction can be defined by an ( x, y, z ) unit vector, where Cartesian coordinates can be defined as shown in FIG1E . According to the example shown in FIG1E , the origin of the coordinate system is approximately at the center of the listener's head 200, with the X-axis 801 pointing forward (in the direction of the listener's nose), the Y-axis 802 pointing to the left of the listener, and the Z-axis 803 pointing upward toward the top of the listener's head.
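As an illustration of the coordinate system of FIG. 1E, the following minimal sketch converts an azimuth/elevation pair into an (x, y, z) unit vector with X forward, Y to the listener's left, and Z up. The function name `direction_vector` and the angle convention (azimuth measured counter-clockwise from the front, elevation measured upward from the horizontal plane) are assumptions made for this example and are not taken from the disclosure.

```python
import math


def direction_vector(azimuth_deg, elevation_deg):
    """Return an (x, y, z) unit vector in the head-centred frame of FIG. 1E.

    Assumed convention: azimuth 0 is straight ahead (+X), azimuth 90 is
    the listener's left (+Y), elevation 90 is directly overhead (+Z).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    return (x, y, z)


front = direction_vector(0.0, 0.0)        # along +X
left = direction_vector(90.0, 0.0)        # along +Y
up = direction_vector(0.0, 90.0)          # along +Z
front_left = direction_vector(45.0, 0.0)  # halfway between +X and +Y
```

Under this assumed convention, a front-left source has positive x and positive y components and a zero z component.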
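音訊信號$s(t)$可使用HRTF濾波器來處理以向收聽者提供自由單位向量$(x,y,z)$界定之到達方向到達之(信號$s(t)$)之聲音之錯覺。此程序藉由將輸入音訊信號與一對HRTF濾波器($h_l(t)$及$h_r(t)$)之各者進行卷積來產生兩個耳朵信號$e_l(t)$及$e_r(t)$: An audio signal $s(t)$ may be processed using HRTF filters to give the listener the illusion of a sound (the signal $s(t)$) arriving from the direction of arrival defined by the unit vector $(x,y,z)$. This process produces two ear signals, $e_l(t)$ and $e_r(t)$, by convolving the input audio signal with each of a pair of HRTF filters, $h_l(t)$ and $h_r(t)$:

$$e_l(t) = s(t) \ast h_l(t) \tag{1}$$

$$e_r(t) = s(t) \ast h_r(t) \tag{2}$$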
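The convolution described above may be sketched as follows for discrete-time signals. This is a minimal illustration only: the helper names `convolve` and `render_binaural` and the toy filter taps are assumptions made for the example, not part of the disclosure, and a practical renderer would typically use a fast (for example FFT-based) convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(s, h_l, h_r):
    """Produce the two ear signals by convolving s with each HRTF filter."""
    return convolve(s, h_l), convolve(s, h_r)


# Toy data (hypothetical): the right-ear filter is delayed and attenuated
# relative to the left-ear filter, as for a source on the listener's left.
s = [1.0, 0.5]
h_l = [1.0, 0.0, 0.0]
h_r = [0.0, 0.0, 0.8]
e_l, e_r = render_binaural(s, h_l, h_r)
```

In this toy case the right-ear signal is a copy of the source delayed by two samples and scaled by 0.8.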
HRTF濾波器($h_l(t)$及$h_r(t)$)可根據以下自方向向量$(x,y,z)$推導: The HRTF filters $h_l(t)$ and $h_r(t)$ may be derived from the direction vector $(x,y,z)$ according to:

$$\bigl(h_l(t),\, h_r(t)\bigr) \leftarrow H(x,y,z) \tag{3}$$
H(x,y,z)在本文中指稱一HRTF集合函數,因為此函數適合於運算一組(x,y,z)方向向量之HRTF濾波器。HRTF集合函數針對其產生有效HRTF濾波器之(x,y,z)向量集合在本文中指稱HRTF集合函數之域。 $H(x,y,z)$ is referred to herein as an HRTF set function, because this function is suitable for computing HRTF filters for a set of $(x,y,z)$ direction vectors. The set of $(x,y,z)$ vectors for which the HRTF set function produces valid HRTF filters is referred to herein as the domain of the HRTF set function.
在下文給定之解釋中,時域脈衝回應用於表示濾波器回應。熟習技術者應瞭解,濾波器回應之等效儲存及操縱可在其他域中實行,包含(但不限於)頻域。 In the explanations given below, time-domain impulse responses are used to represent filter responses. Those skilled in the art will appreciate that equivalent storage and manipulation of filter responses may be carried out in other domains, including (but not limited to) the frequency domain.
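一HRTF集合函數可用於建立一HRTF離散庫,其界定一組N個$(x,y,z)$單位向量之左耳及右耳HRTF回應: An HRTF set function may be used to create a discrete HRTF library that defines left-ear and right-ear HRTF responses for a set of $N$ $(x,y,z)$ unit vectors:

$$\bigl\{\, H(x_n, y_n, z_n) : n = 1..N \,\bigr\} \tag{4}$$

且當在方程式4中評估HRTF集合函數時,HRTF離散庫可寫為: When the HRTF set function in Equation 4 is evaluated, the discrete HRTF library may be written as:

$$\bigl\{\, \bigl(h_{l,n}(t),\, h_{r,n}(t)\bigr) : n = 1..N \,\bigr\} \tag{5}$$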
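期望能夠提供用於界定一HRTF集合函數之一構件,藉此由HRTF集合函數產生之各輸出HRTF濾波器由基本濾波器之一線性組合形成。一線性HRTF集合函數可根據方程式6來界定,其中$e_l(t)$及$e_r(t)$濾波器經運算為: It is desirable to provide a means for defining an HRTF set function whereby each output HRTF filter produced by the HRTF set function is formed from a linear combination of basis filters. A linear HRTF set function may be defined according to Equation 6, in which the filters $e_l(t)$ and $e_r(t)$ are computed as:

$$e_l(t) = \sum_{k=1}^{K} g_k^{L}(x,y,z)\, b_k^{L}(t), \qquad e_r(t) = \sum_{k=1}^{K} g_k^{R}(x,y,z)\, b_k^{R}(t) \tag{6}$$

根據方程式6,一組K個左耳基本濾波器$b_k^{L}(t)$及K個右耳基本濾波器$b_k^{R}(t)$與由增益函數$g_k^{L}(x,y,z)$及$g_k^{R}(x,y,z)$界定之權重線性組合。 According to Equation 6, a set of $K$ left-ear basis filters $b_k^{L}(t)$ and $K$ right-ear basis filters $b_k^{R}(t)$ are linearly combined using weights defined by the gain functions $g_k^{L}(x,y,z)$ and $g_k^{R}(x,y,z)$.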
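The linear combination of basis filters described above may be sketched as follows for one ear. The gain functions and basis-filter taps used here are hypothetical placeholders chosen only to make the example runnable; the disclosure does not specify them at this point, and `linear_hrtf` is an assumed name.

```python
def linear_hrtf(direction, gain_functions, basis_filters):
    """Form one HRTF filter as a gain-weighted sum of K basis filters.

    direction      -- (x, y, z) unit vector of the arrival direction
    gain_functions -- K callables g_k(x, y, z) returning scalar weights
    basis_filters  -- K impulse responses of equal length
    """
    x, y, z = direction
    length = len(basis_filters[0])
    out = [0.0] * length
    for g, b in zip(gain_functions, basis_filters):
        weight = g(x, y, z)
        for t in range(length):
            out[t] += weight * b[t]
    return out


# Hypothetical example with K = 2: a constant gain and a gain equal to y.
gains_left = [lambda x, y, z: 1.0, lambda x, y, z: y]
basis_left = [[1.0, 0.5, 0.0], [0.2, -0.1, 0.3]]
h_left = linear_hrtf((0.0, 1.0, 0.0), gains_left, basis_left)
```

Because only the K scalar gains depend on the direction, an input signal can in principle be filtered once by each basis filter and the K outputs mixed with direction-dependent gains, which is one motivation for keeping K small.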
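在一替代實施例中,一對稱HRTF集合函數可使用基本濾波器及增益函數之一較小集合來界定(其中用於方向$(x,y,z)$之左耳HRTF濾波器相同於用於方向$(x,-y,z)$之右耳HRTF): In an alternative embodiment, a symmetric HRTF set function may be defined using a smaller set of basis filters and gain functions (wherein the left-ear HRTF filter for direction $(x,y,z)$ is identical to the right-ear HRTF filter for direction $(x,-y,z)$):

$$e_l(t) = \sum_{k=1}^{K} g_k(x,y,z)\, b_k(t)$$

$$e_r(t) = \sum_{k=1}^{K} g_k(x,-y,z)\, b_k(t) \tag{7}$$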
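The left/right symmetry described above, in which a single set of basis filters and gain functions serves both ears, may be sketched as follows. The names `mix` and `symmetric_hrtf_pair` and the example gains and taps are assumptions made for illustration only.

```python
def mix(direction, gains, basis):
    """Gain-weighted sum of the shared basis filters for one direction."""
    x, y, z = direction
    weights = [g(x, y, z) for g in gains]
    return [sum(w * b[t] for w, b in zip(weights, basis))
            for t in range(len(basis[0]))]


def symmetric_hrtf_pair(direction, gains, basis):
    """Left ear uses (x, y, z); right ear reuses the same data at (x, -y, z)."""
    x, y, z = direction
    left = mix((x, y, z), gains, basis)
    right = mix((x, -y, z), gains, basis)
    return left, right


# Hypothetical shared gains and basis filters (K = 2).
gains = [lambda x, y, z: 1.0, lambda x, y, z: y]
basis = [[1.0, 0.5, 0.0], [0.2, -0.1, 0.3]]

left_a, right_a = symmetric_hrtf_pair((0.6, 0.8, 0.0), gains, basis)
left_b, right_b = symmetric_hrtf_pair((0.6, -0.8, 0.0), gains, basis)
```

Mirroring the source across the median plane (negating y) swaps the two ear filters, so only one set of K basis filters needs to be stored.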
在不損失一般性之情況下,吾人可檢查方程式7之第一行,應理解以下解釋將同樣適用於方程式7之第二行及/或方程式6。 Without loss of generality, we can examine the first line of Equation 7, understanding that the following explanation will also apply to the second line of Equation 7 and/or Equation 6.
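針對一組N個到達方向($(x_n,y_n,z_n)$,$n=1..N$),吾人可依矩陣形式重寫方程式7之第一行(亦省略$e_l(t)$之下標$l$以簡化方程式): For a set of $N$ arrival directions ($(x_n,y_n,z_n)$, $n=1..N$), we can rewrite the first line of Equation 7 in matrix form (also omitting the subscript $l$ of $e_l(t)$ to simplify the equation):

$$\begin{bmatrix} e_1(t) \\ \vdots \\ e_N(t) \end{bmatrix} = \begin{bmatrix} g_1(x_1,y_1,z_1) & \cdots & g_K(x_1,y_1,z_1) \\ \vdots & \ddots & \vdots \\ g_1(x_N,y_N,z_N) & \cdots & g_K(x_N,y_N,z_N) \end{bmatrix} \times \begin{bmatrix} b_1(t) \\ \vdots \\ b_K(t) \end{bmatrix} \tag{8}$$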
吾人依更簡單形式將方程式8重寫為:E(t)=G×B(t) (9) We rewrite Equation 8 in a simpler form: E ( t ) = G × B ( t ) (9)
在方程式9中,行向量$E(t)$界定N個單位向量($(x_n,y_n,z_n)$,$n=1..N$)之一組N個左耳HRTF濾波器回應,且行向量$B(t)$界定一組K個濾波器回應。在一些實施例中,一目標係判定濾波器回應$B(t)$,使得所得HRTF濾波器$E(t)$係一組原始HRTF濾波器回應$E_{orig}(t)$之一緊密近似。 In Equation 9, the column vector $E(t)$ defines a set of $N$ left-ear HRTF filter responses for the $N$ unit vectors ($(x_n,y_n,z_n)$, $n=1..N$), and the column vector $B(t)$ defines a set of $K$ filter responses. In some embodiments, a goal is to determine the filter responses $B(t)$ such that the resulting HRTF filters $E(t)$ are a close approximation of a set of original HRTF filter responses $E_{orig}(t)$.
已知用於判定適合濾波器$B(t)$之各種方法,且根據以下發現一個實例: Various methods for determining suitable filters $B(t)$ are known, and one example is given by:

$$B(t) = G^{+} \times E_{orig}(t) \tag{10}$$

其中$G^{+}$係指矩陣$G$之偽逆(如方程式9中所界定)。 where $G^{+}$ denotes the pseudo-inverse of the matrix $G$ (as defined in Equation 9).
應瞭解,可採用其他方法,其中各方法之目標可為最小化差E(t)-E orig (t)之幅值。 It will be appreciated that other methods may be employed, wherein the objective of each method may be to minimize the magnitude of the difference E ( t ) - Eorig ( t ) .
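The pseudo-inverse approach of Equation 10 may be sketched with NumPy as follows. The gain matrix used here (columns 1, x, y, z evaluated at N random directions) and the synthetic "original" responses are assumptions made so that the example is self-contained; they are not the gain functions or HRTF data of the disclosure. `numpy.linalg.pinv` returns the Moore-Penrose pseudo-inverse, so the result is the least-squares fit that minimizes the magnitude of the difference between the approximated and original responses.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 12, 4, 8  # directions, basis filters, taps per filter

# N random unit direction vectors (x_n, y_n, z_n).
directions = rng.normal(size=(N, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Hypothetical N x K gain matrix G: g_1 = 1, g_2 = x, g_3 = y, g_4 = z.
G = np.column_stack([np.ones(N), directions])

# Synthetic original responses (N x T) that lie exactly in the span of G,
# so that the fit can be checked against a known answer.
B_true = rng.normal(size=(K, T))
E_orig = G @ B_true

# Equation 10: basis filters from the pseudo-inverse of G.
B = np.linalg.pinv(G) @ E_orig

# Equation 9: the HRTF responses reproduced by linear mixing.
E = G @ B
max_error = float(np.max(np.abs(E - E_orig)))
```

With measured HRTFs the residual would generally be non-zero, and its size would depend on K and on how well the responses lend themselves to linear mixing.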
可能需要大量($K$)基本濾波器以提供一合理近似($E(t) \approx E_{orig}(t)$)。使用一線性混合程序(根據方程式6、7或8)之困難在於HRTF濾波器之高頻分量通常可能很難用線性混合來界定。 A large number ($K$) of basis filters may be required to provide a reasonable approximation ($E(t) \approx E_{orig}(t)$). A difficulty with using a linear mixing process (according to Equation 6, 7, or 8) is that the high-frequency components of the HRTF filters can often be difficult to define using linear mixing.
在一些實施例中,原始HRTF濾波器集合$E_{orig}(t)$經修改以產生修改之HRTF濾波器集合$E_{mod}(t)$,其中修改之HRTF濾波器在高頻下之其相位回應方面不同於原始濾波器。針對N個方向之各者,吾人可使用傅里葉變換來界定原始HRTF及修改之HRTF之頻率回應: In some embodiments, the original HRTF filter set $E_{orig}(t)$ is modified to produce a modified HRTF filter set $E_{mod}(t)$, wherein the modified HRTF filters differ from the original filters in their phase response at high frequencies. For each of the $N$ directions, we can use the Fourier transform to define the frequency responses of the original and modified HRTFs:

$$R_{orig,n}(f) = \mathcal{F}\{E_{orig,n}(t)\}, \qquad R_{mod,n}(f) = \mathcal{F}\{E_{mod,n}(t)\} \tag{11}$$
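As a small illustration of comparing two filters through their frequency responses, the following sketch computes a discrete Fourier transform of a toy "original" response and of a copy with its leading delay removed. The tap values are hypothetical, and the example only demonstrates the general point that two filters can share a magnitude response while differing in phase; it is not the modification procedure of the disclosure.

```python
import cmath


def dft(h):
    """Discrete Fourier transform of a finite impulse response."""
    n = len(h)
    return [sum(h[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


# Toy responses: identical taps, but h_orig carries a two-sample delay.
h_orig = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
h_mod = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

R_orig = dft(h_orig)
R_mod = dft(h_mod)

# The magnitude responses match bin by bin ...
mags_equal = all(abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(R_orig, R_mod))

# ... while the phase differs; at bin 1 of 8 a two-sample delay
# contributes a phase shift of -2*pi*1*2/8 = -pi/2 radians.
phase_shift_bin1 = cmath.phase(R_orig[1] / R_mod[1])
```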
頻率回應函數R orig,n (f)及R mod,n (f)係複值的,且因此吾人接著可以說:
圖2、圖3及圖4展示HRTF濾波器之脈衝回應之實例。圖2展示一左耳HRTF濾波器之脈衝回應111,針對到達方向:(x,y,z)=(係左前之一方向)。同樣地,圖3展示用於相同到達方向之右耳HRTF之脈衝回應211。自圖3之檢查可見,脈衝回應211包含0.4ms之一延遲。 Figures 2, 3, and 4 show examples of pulse responses of HRTF filters. Figure 2 shows the pulse response 111 of a left ear HRTF filter for the arrival direction: (x, y, z ) = (This is a direction of the front left.) Similarly, FIG3 shows the impulse response 211 of the right ear HRTF for the same arrival direction. From inspection of FIG3 , it can be seen that the impulse response 211 includes a delay of 0.4 ms.
圖4展示藉由自圖3之脈衝回應211移除0.4ms延遲來建立之 (無延遲)脈衝回應311。圖5展示指示相位回應與頻率之曲線圖之實例。圖3之0.4ms延遲亦可界定為圖5中之一線性相位回應411。另外,圖5中繪製一替代相位回應412,藉此,針對0至1400Hz之間的頻率,此替代相位曲線412緊密匹配線性相位回應411。 Figure 4 shows a (delay-free) pulse response 311 created by removing a 0.4 ms delay from the pulse response 211 in Figure 3 . Figure 5 shows an example of a graph indicating phase response versus frequency. The 0.4 ms delay in Figure 3 can also be defined as a linear phase response 411 in Figure 5 . Additionally, an alternative phase response 412 is plotted in Figure 5 , whereby this alternative phase curve 412 closely matches the linear phase response 411 for frequencies between 0 and 1400 Hz.
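The equivalence between a pure delay and a linear phase response may be checked numerically. A delay of τ seconds has the phase response φ(f) = -2πfτ (a standard result, using the sign convention that a delay produces negative phase). The function names below are assumptions made for this sketch, and the shape of the alternative phase response 412 above 1400 Hz is not modelled here because it is not specified numerically in the text.

```python
import math


def delay_phase(frequency_hz, delay_s):
    """Phase, in radians, of a pure delay at the given frequency."""
    return -2.0 * math.pi * frequency_hz * delay_s


def delay_from_phase(frequency_hz, phase_rad):
    """Recover the delay implied by an (unwrapped) linear phase value."""
    return -phase_rad / (2.0 * math.pi * frequency_hz)


delay_s = 0.4e-3  # the 0.4 ms delay discussed with reference to FIG. 3
phase_at_1400 = delay_phase(1400.0, delay_s)
recovered_delay = delay_from_phase(1400.0, phase_at_1400)
```

At 1400 Hz the 0.4 ms delay corresponds to 0.56 of a cycle, that is, a phase of about -3.52 radians.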
In some embodiments, the original impulse response 211 of FIG. 3 can be modified by removing the 0.4 ms bulk delay to produce the delay-free impulse response 311 of FIG. 4, and the phase response 412 of FIG. 5 can be applied to the impulse response 311 to produce a new filter impulse response that has the correct phase response for frequencies below 1400 Hz. Unfortunately, this can result in a new impulse response that is non-causal, so implementing this filter in a real-time audio process may require adding an extra delay of 3 ms to produce the impulse response: see, for example, the impulse response 911 shown in FIG. 10. To remain compatible with the right-ear response, the example impulse response 111 (the left-ear response) would also need a 3 ms delay, resulting in the impulse response 811 of FIG. 9.
Some disclosed examples involve modifying the original HRTF filters for both the left and right ears to provide an interaural phase difference similar to that shown in the phase response 412 of FIG. 5, without the side effect of an undesired delay (e.g., the 3 ms delay discussed above with reference to FIGS. 9 and 10).
FIG. 6 shows examples of causal all-pass filters. FIGS. 7 and 8 show examples of modified HRTFs that can be produced with causal all-pass filters. In some embodiments, the causal all-pass filters 511 and 512 of FIG. 6 can be applied to the original left-ear and right-ear impulse responses 111 and 311, respectively, to produce the modified left-ear HRTF 611 of FIG. 7 and the modified right-ear HRTF 711 of FIG. 8.
FIG. 11 shows an example of HRTF processing blocks. According to some examples, the blocks of FIG. 11 can be implemented by the control system 110 of FIG. 1A, for example according to instructions stored on a computer-readable medium. In FIG. 11, configuration 100 shows an original HRTF impulse response 211, h(t), which is received and processed by HRTF processing block 251 to determine the bulk delay 140, d (the delay inherent in impulse response 211). According to this example, HRTF processing block 251 also produces a delay-free HRTF h'(t), such that h'(t) = h(t + d).
In the example shown in FIG. 11, the all-pass generator 252 produces the all-pass filter 512, α(t), in response to the delay 140, d, and the convolution process 253 combines the delay-free impulse response 311 with the all-pass filter 512 to produce the modified right-ear HRTF 711, m(t).
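The signal flow of FIG. 11 can be sketched in a few lines. This is a toy illustration under stated assumptions, not the patent's implementation: the delay is taken to be a whole number of samples, the impulse response values are made up, and the all-pass response `alpha` is a hypothetical three-sample truncation of a first-order all-pass.

```python
def remove_delay(h, delay_samples):
    """h'(t) = h(t + d): drop the leading delay (assumed to be an integer sample count)."""
    return h[delay_samples:]

def convolve(a, b):
    """Plain time-domain convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Toy original impulse response h(t) with a 3-sample bulk delay (made-up values).
h_original = [0.0, 0.0, 0.0, 1.0, 0.5, 0.25]
DELAY_SAMPLES = 3

# Hypothetical short all-pass response alpha(t), truncated for illustration only.
alpha = [-0.5, 0.75, 0.375]

h_undelayed = remove_delay(h_original, DELAY_SAMPLES)   # h'(t)
m = convolve(h_undelayed, alpha)                        # modified response m(t)
print(m)  # [-0.5, 0.5, 0.625, 0.375, 0.09375]
```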
The all-pass filter 512, α(t), can be defined as a function such as:

$$\alpha(t) = C(d, t) \tag{13}$$

where the function $C(d, t)$ defines the operation of the all-pass generator 252. We are interested in the phase response of $C(d, t)$:

$$\Phi(f, d) = \arg\big(\mathcal{F}\{C(d, t)\}(f)\big) \tag{14}$$

where $\Phi(f, d)$ denotes the phase response, at frequency f, of the all-pass filter produced by the all-pass generator 252 for the delay value d.
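Let us define $\Phi_0(f) = \arg\big(\mathcal{F}\{C(0, t)\}(f)\big)$, the all-pass phase response produced by the all-pass generator 252 when the delay d = 0. We may refer to this as the zero-delay all-pass. In some embodiments, we may require the all-pass phase response to satisfy: [Equation 15 missing]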
The left side of Equation 15 represents the phase difference between the zero-delay all-pass and the all-pass filter defined for a delay d. This phase difference corresponds to the phase response 412 of FIG. 5. The right side of Equation 15 represents the linear phase ramp expected for a delay d. This corresponds to the linear phase response 411 of FIG. 5.
Thus, Equation 15 expresses the requirement that, in this example, the all-pass phase response 412 should match the linear phase response 411 for frequencies up to $F_p$. In addition, Equation 15 defines an upper limit $d_{max}$ on the range of delay values over which the all-pass generator function $C(d, t)$ is expected to produce valid results. A typical value is $d_{max}$ = 0.7 ms (milliseconds), but in some applications $d_{max}$ may be some other value, such as a value between 0.6 ms and 0.8 ms, between 0.5 ms and 0.8 ms, between 0.6 ms and 0.9 ms, between 0.5 ms and 1.0 ms, and so on.
In some embodiments, a finite set of M delay values $(d_1, d_2, \ldots, d_M)$ is selected spanning the range from 0 to $d_{max}$, and suitable all-pass responses $(\alpha_1(t), \alpha_2(t), \ldots, \alpha_M(t))$ can be precomputed according to an optimization procedure. In this case, the all-pass generator function $C(d, t)$ can be implemented as a lookup table or an interpolation function over the M stored all-pass responses.
In some further embodiments, each of the all-pass responses (e.g., the m-th all-pass filter $\alpha_m(t)$) can be defined as an infinite impulse response (IIR) filter having T conjugate pole pairs and their corresponding conjugate zero pairs. A base set of T filter poles $p_{1,m}, p_{2,m}, \ldots, p_{T,m}$ can be selected, and the filter $\alpha_m(t)$ can then be defined as an all-pass filter with poles $(p_{1,m}, \bar{p}_{1,m}, p_{2,m}, \bar{p}_{2,m}, \ldots, p_{T,m}, \bar{p}_{T,m})$ and zeros $(1/p_{1,m}, 1/\bar{p}_{1,m}, 1/p_{2,m}, 1/\bar{p}_{2,m}, \ldots, 1/p_{T,m}, 1/\bar{p}_{T,m})$.
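Thus, when the base set of all M all-pass responses $(\alpha_1(t), \alpha_2(t), \ldots, \alpha_M(t))$ is defined as IIR filters of order 2T, the set of M all-pass responses is completely defined by the [T × M] complex base poles: [equation missing]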
Given a matrix P representing the base poles of the set of M all-pass filters (each all-pass filter having T complex base poles), and given the corresponding set of delay values $(d_1, d_2, \ldots, d_M)$, a polynomial approximation can be formed so that the base poles can be defined as a polynomial function of d. This polynomial approximation can be carried out by known methods, including (but not limited to) the MATLAB polyfit function.
In some embodiments, the number of complex base poles is T = 3, and fourth-degree polynomials can be used to compute the base pole values as a function of the delay d. According to this embodiment, the procedure for computing an all-pass filter α(t) is carried out by the following sequence of operations:
1. Given the delay d, compute the base poles $(p_1, p_2, p_3)$ according to: $p_t = c_{1,t} + c_{2,t}\,d + c_{3,t}\,d^2 + c_{4,t}\,d^3 + c_{5,t}\,d^4$
2. Form an IIR filter with poles $p_1, \bar{p}_1, p_2, \bar{p}_2, p_3, \bar{p}_3$ and corresponding zeros $1/p_1, 1/\bar{p}_1, 1/p_2, 1/\bar{p}_2, 1/p_3, 1/\bar{p}_3$
3. Compute the impulse response of the IIR filter to form the all-pass response α(t)
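The three steps above can be sketched as follows. This is a minimal illustration, assuming the polynomials yield z-domain poles directly; the coefficient table `COEFFS` is entirely made up (chosen only so that the poles stay inside the unit circle for the delay used), and the function names are hypothetical. The numerator is the reversed denominator, which places each zero at the reciprocal of a pole.

```python
def polyval_ascending(coeffs, d):
    """Step 1: evaluate c1 + c2*d + c3*d^2 + c4*d^3 + c5*d^4 by Horner's rule."""
    acc = 0j
    for c in reversed(coeffs):
        acc = acc * d + c
    return acc

# Hypothetical coefficients c_{1..5,t} for T = 3 base poles (illustration only).
COEFFS = [
    [0.50 + 0.30j, 0.10 + 0.05j, 0.0, 0.0, 0.0],
    [0.20 + 0.60j, -0.05 + 0.10j, 0.0, 0.0, 0.0],
    [-0.30 + 0.40j, 0.02, 0.01, 0.0, 0.0],
]

def base_poles(d):
    return [polyval_ascending(c, d) for c in COEFFS]

def denominator_from_poles(poles):
    """Step 2: expand the product of (1 - p z^-1)(1 - conj(p) z^-1); coefficients are real."""
    a = [1.0]
    for p in poles:
        section = [1.0, -2.0 * p.real, abs(p) ** 2]
        new = [0.0] * (len(a) + 2)
        for i, x in enumerate(a):
            for j, y in enumerate(section):
                new[i + j] += x * y
        a = new
    return a

def allpass_impulse_response(poles, length):
    """Step 3: run a unit impulse through the all-pass IIR filter b/a with b = reversed(a)."""
    a = denominator_from_poles(poles)
    b = a[::-1]
    x = [1.0] + [0.0] * (length - 1)
    y = []
    for n in range(length):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

alpha = allpass_impulse_response(base_poles(0.4), 256)
# An all-pass filter has unit magnitude at every frequency, so its energy is 1.
energy = sum(v * v for v in alpha)
```

Checking that the impulse-response energy is close to 1 is a quick sanity test that the pole/zero pairing really produces an all-pass response.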
The three steps shown above illustrate the use of polynomial functions as a convenient way to compute the poles of a filter. Of course, a polynomial may give only an approximation to the "best" poles, and it is well known that small errors in pole locations can lead to large changes in the resulting filter response. Some alternative methods involve applying a nonlinear function (such as Equation 18) to the polynomial values ($J_l$ and $J_{l+1}$). This nonlinear function can be defined so that small errors in the polynomial values ($J_l$ and $J_{l+1}$) no longer lead to large errors in the pole locations. In addition, some such examples involve computing pole locations in the "s-domain" and then mapping them to the "z-domain." According to some examples, this mapping can be implemented with the MATLAB function bilinear. Other choices of nonlinear mapping function can be used, and these nonlinear mapping functions can produce poles in the z-domain, the s-domain, or other domains.
FIG. 12 shows additional transformation steps that can be implemented by the all-pass generator 252 of FIG. 11, according to some embodiments. In the example shown in FIG. 12, the all-pass generator 252 receives a selected delay 140, d, which is processed by delay processing block 272 to produce a set of intermediate values 180. In this example, the intermediate values 180 are then mapped by additional nonlinear processing to form a set of filter poles 182. The filter poles 182 are then processed by the all-pass computation block 275 to form the all-pass filter 512, α(t), of FIGS. 11 and 12.
According to some examples corresponding to the blocks shown in FIG. 12, delay processing block 272 applies a polynomial function as described above to output a set of intermediate values 180, which in some examples may be a set of numbers $J_l$. In some such examples, mapping block 273 applies a nonlinear mapping procedure to the set of intermediate values 180, for example by implementing Equation 18, to produce an output, which in one example is a set of s-plane filter poles. In some examples, bilinear transform block 274 converts the s-domain pole locations to produce an output, which in one example comprises z-domain pole locations (filter poles 182). In the example shown in FIG. 12, the all-pass computation block 275 computes an output, which in this example is the all-pass filter 512, α(t) (an impulse response). In alternative examples, the output may be a phase response, a frequency response, or any other representation that can be used to define an all-pass filter response.
When a simple function such as a polynomial is used to generate filter poles, those skilled in the art will appreciate that small errors in the polynomial output can translate into large errors in the final all-pass response when the poles are very close to the unit circle. Nonlinear processing, such as that applied in FIG. 12 to transform the intermediate values 180 into the filter poles 182, allows the processing of delay processing block 272 to be implemented more efficiently.
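In some embodiments, the processing of delay processing block 272 is implemented as a set of L polynomial functions that produce L intermediate values 180. For example, $J_l = \mathrm{Poly}_l(d)$ $(l \in 1..L)$. The intermediate values 180 can then be used to produce the s-plane filter poles 181, in this example by mapping block 273. For example, two intermediate values $J_l$ and $J_{l+1}$ can be used to define a complex s-plane pole $P_n$: [Equation 18 missing]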
Alternatively, a single intermediate value (e.g., $J_l$) can be used by mapping block 273 to define a single real s-plane pole $P_n$, according to $P_n = -2\pi J_l$.
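One possible form of this mapping is sketched below. It mirrors the MAP_2_S_POLES function in the MATLAB listing later in this description, written iteratively rather than recursively; whether that function is identical to Equation 18 is an assumption, and the Python names are illustrative. Pairs of intermediate values become conjugate pole pairs, and a leftover single value becomes a real pole.

```python
import cmath
import math

def map_to_s_poles(values):
    """Map intermediate values J to s-plane poles (sketch of the listing's MAP_2_S_POLES).

    Each pair (J_l, J_{l+1}) yields a conjugate pole pair; a final unpaired
    value yields one real pole at -2*pi*J_l.
    """
    poles = []
    i = 0
    while len(values) - i >= 2:
        radius, shape = values[i], values[i + 1]
        # atan() bounds the angle to (0, pi/2), so the real part of the pole
        # stays negative for any shape value as long as radius > 0.
        angle = math.atan(shape) / 2.0 + math.pi / 4.0
        for sign in (1.0, -1.0):
            poles.append(-2.0 * math.pi * radius * cmath.exp(1j * sign * angle))
        i += 2
    if len(values) - i == 1:
        poles.append(complex(-2.0 * math.pi * values[i]))
    return poles

poles = map_to_s_poles([1.0, 0.0, 0.5])
```

Because the arctangent bounds the pole angle, an error in the second value of a pair cannot push the pole into the right half-plane, which illustrates the robustness that the nonlinear mapping is intended to provide.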
The s-plane filter poles 181 can then be transformed (in this example, by transform block 274) into z-plane poles (filter poles 182). For example, the MATLAB bilinear function can be used by transform block 274 to apply this transform: P(N) = bilinear(P(N), 1, 1, 48000), or this transform: P(N) = bilinear(P(N), 1, 1, 48000, Fp), where Fp denotes the upper frequency limit (as used in Equation 15) and 48000 is the sample rate in this embodiment. It will be appreciated that alternative sample rates can be used, including (but not limited to) 16000, 32000, 44100, or 96000.
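For readers without MATLAB, the textbook bilinear transform of a single pole can be sketched as below. This is the standard formula with optional frequency prewarping, assumed here to correspond to what the MATLAB call computes for each pole; the function name and argument names are illustrative.

```python
import cmath
import math

def bilinear_pole(s, fs, fp=None):
    """Map an s-plane pole (rad/s) to the z-plane with the bilinear transform.

    Without fp, z = (1 + s/(2*fs)) / (1 - s/(2*fs)).
    With fp (Hz), the constant is prewarped so that the frequency fp maps
    exactly onto the same frequency on the unit circle.
    """
    if fp is None:
        k = 2.0 * fs
    else:
        k = 2.0 * math.pi * fp / math.tan(math.pi * fp / fs)
    return (1.0 + s / k) / (1.0 - s / k)

# A stable s-plane pole lands inside the unit circle.
z_stable = bilinear_pole(-2.0 * math.pi * 1000.0, 48000.0)

# With prewarping at 1400 Hz, the point s = j*2*pi*1400 maps to exp(j*2*pi*1400/fs).
z_edge = bilinear_pole(2j * math.pi * 1400.0, 48000.0, fp=1400.0)
```

Prewarping matters here because the phase match of Equation 15 is specified up to a particular frequency, so it is natural to make that frequency map without distortion.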
Those skilled in the art will appreciate that other nonlinear processing methods can be used to map a selected delay 140, d, to a set of all-pass poles (filter poles 182). In an alternative embodiment, the polynomial functions applied by delay processing block 272 can be used to define the frequency and Q of the poles, and the nonlinear mapping procedure applied by mapping block 273 can convert the frequency and Q values into respective pole locations. In another embodiment, the nonlinear mapping procedure applied by mapping block 273 can determine z-domain pole locations directly, eliminating the need for the bilinear transform of transform block 274.
It will also be appreciated that, by forming an additional conjugate pole (for each complex pole in the set of filter poles 182) and by forming each filter zero as the reciprocal of the corresponding pole, an all-pass filter response can be derived, in this example by the all-pass computation block 275, and that this all-pass filter will be causal.
FIG. 13 illustrates a process for generating a set of basic filters from a set of HRTFs. In some examples, the blocks of FIG. 13 can be implemented at least in part by the control system 110 of FIG. 1A. FIG. 13 shows a configuration 500 in which an original HRTF library 520 is processed (in this example, by HRTF transformation block 521) to produce a modified HRTF library 541. In this example, the interaural delay components inherent in the HRTF filters of the original HRTF set are replaced by all-pass filters that satisfy Equation 15, and the modified HRTF library has reduced interaural phase at frequencies above $F_p$. The modified HRTF library 541 is processed (in this example, by basic filter generation block 522) according to a fitting procedure, such as that of Equation 10, to produce a set of basic filters 523.
The set of basic filters 523 has fewer members than the modified HRTF set. In this context, a "member" of the set of basic filters 523 is one of the basic filters in that set, and a "member" of the modified HRTF set is one of the HRTFs in the modified HRTF set. According to some examples, the set of basic filters 523 can have at least an order of magnitude fewer members than the modified HRTF set. For example, in some instances the modified HRTF set can have hundreds or thousands of members, while the set of basic filters 523 can contain fewer than 100 members, fewer than 50 members, or even fewer than 20 members. The set of basic filters 523 thus forms a compact representation of the original HRTF library 520.
FIG. 14 illustrates a process for generating a set of basic filters from a set of HRTFs and for forming left and right HRTF filters from the set of basic filters. In some examples, the blocks of FIG. 14 can be implemented at least in part by the control system 110 of FIG. 1A. FIG. 14 shows a configuration 501 in which an original HRTF library 520 is processed by HRTF transformation block 521 to produce a modified HRTF library 541, which is then processed by basic filter generation block 522 to produce a set of basic filters 523. An arrival direction 524 (which can be specified in spherical coordinates $(\theta, \phi)$, as a unit vector (x, y, z), or in other forms known in the art) is processed by weighting coefficient generation block 525 to form weighting coefficients 526. In some embodiments, the weighting coefficients can be defined according to spherical harmonic panning equations, and the basic filters can likewise be adapted to be compatible with the spherical harmonic panning equations, for example $g_k(x, y, z)$ in Equation 7.
In this example, the weighting coefficient and basic filter combination block 527 combines the weighting coefficients 526 with the basic filters 523 to form left-ear and right-ear HRTF filters (528 and 529, respectively) that represent the modified HRTF for the specified arrival direction. When the basic filters represent a symmetric HRTF set, the combination block 527 can be implemented, for example, according to Equation 7. When the basic filters represent an HRTF set that includes asymmetry, the combination block 527 can be implemented, for example, according to Equation 6.
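The combination step can be pictured as a weighted sum of the basic filters, sample by sample. Equations 6 and 7 are not reproduced in this section, so the sketch below only assumes the general linear-mixing form (one weight per basic filter); the weights and filter values are made up and the function name is hypothetical.

```python
def mix_basic_filters(weights, basic_filters):
    """Linear mix: out[n] = sum over k of weights[k] * basic_filters[k][n]."""
    length = len(basic_filters[0])
    out = [0.0] * length
    for g, b in zip(weights, basic_filters):
        for n in range(length):
            out[n] += g * b[n]
    return out

# Two made-up basic filters of three samples each, with made-up weights.
basic_filters = [[1.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0]]
weights = [1.0, 0.5]

hrtf_one_ear = mix_basic_filters(weights, basic_filters)
print(hrtf_one_ear)  # [1.0, 1.0, 0.0]
```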
FIG. 15 illustrates a process for generating a set of basic filters from a set of HRTFs and for forming left and right audio signals using the set of basic filters. In some examples, the blocks of FIG. 15 can be implemented at least in part by the control system 110 of FIG. 1A. FIG. 15 shows a configuration 502 in which an original HRTF library 520 is processed by HRTF transformation block 521 to produce a modified HRTF library 541, which is then processed by basic filter generation block 522 to produce a set of basic filters 523. According to this example, an audio generation block 530 produces audio signals 531 in a form associated with a scene-based audio format, such as Ambisonics or Higher-Order Ambisonics. The audio generation block 530 can be, or can include, an audio decoder adapted to produce a multichannel audio stream from a transmitted or stored encoded bitstream. Alternatively, the audio generation block 530 can be, or can include, an audio capture and/or processing device adapted to produce scene-based audio signals 531 representing a spatial audio scene.
According to this example, the audio input and basic filter combination block 532 is adapted to combine the audio signals 531 with the basic filters 523 to produce left-ear and right-ear audio signals (533 and 534, respectively). In some examples, the combination block 532 can be configured to implement a convolution process, which can be carried out by time-domain or frequency-domain methods known in the art.
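A time-domain version of this combination can be sketched as follows: each scene-based channel is convolved with the basic filter for one ear, and the results are summed. This is an illustration under the assumption of one basic filter per channel and per ear; the signal and filter values are made up, and the function names are hypothetical.

```python
def convolve(a, b):
    """Plain time-domain convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def render_ear(channels, ear_filters):
    """Sum over channels of (channel convolved with that channel's basic filter)."""
    n_out = len(channels[0]) + len(ear_filters[0]) - 1
    out = [0.0] * n_out
    for signal, filt in zip(channels, ear_filters):
        for i, v in enumerate(convolve(signal, filt)):
            out[i] += v
    return out

# Two made-up scene-based channels and the matching left-ear basic filters.
channels = [[1.0, 0.0], [0.0, 1.0]]
left_filters = [[1.0, 2.0], [3.0, 0.0]]

left_signal = render_ear(channels, left_filters)
print(left_signal)  # [1.0, 5.0, 0.0]
```

The right-ear signal would be produced the same way with the right-ear filters; a practical implementation would more likely use block-based frequency-domain convolution.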
FIG. 16 shows additional details of the HRTF transformation block of FIGS. 13 to 15, according to some implementations. In some examples, the blocks of FIG. 16 can be implemented at least in part by the control system 110 of FIG. 1A. FIG. 16 shows a more detailed view 503 of the process in the upper part of FIGS. 13 to 15 (the conversion from an "original" HRTF library 520 to a "modified" HRTF library 541). According to this example, each left/right HRTF pair is processed by a corresponding HRTF transformation sub-block 250.
FIG. 17 shows additional details of the HRTF transformation sub-block of FIG. 16, according to some implementations. In some examples, the blocks of FIG. 17 can be implemented at least in part by the control system 110 of FIG. 1A. FIG. 17 shows an example of the HRTF transformation sub-block 250, in which the L and R HRTFs (211L and 211R) are processed by the left HRTF processing block 251L and the right HRTF processing block 251R, respectively, to extract the undelayed impulse responses 311L/R and the delays 140L/R. The two delays 140L/R are then processed by delay processing block 138 to produce new simplified delays 141L/R. The simplified delays are each processed by all-pass filter generators 252L and 252R to form all-pass filters 512L/R. According to this example, the modified left and right HRTF generation blocks 253L and 253R are configured to combine the undelayed impulse responses 311L/R with the all-pass filters 512L/R to form the modified HRTF pair 711L/R.
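The delay processing block 138 of FIG. 17 is configured to respond to the difference between the delays 140L/R to produce the new delays 141L/R. FIGS. 18, 19, and 20 illustrate examples of functions that can be implemented by the delay processing block 138 of FIG. 17. For FIGS. 18, 19, and 20, the corresponding functions for the modified delays $d'_L$ (771A, 771B, 771C) and $d'_R$ (772A, 772B, 772C) are, according to FIG. 18: [equations missing]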
In Equation 21, $d_{max}$ denotes the maximum expected value of $|d_L - d_R|$.
An important property of the function implemented by the delay processing block 138 is that it produces modified delays $d'_L$ and $d'_R$ that satisfy $d'_L - d'_R = d_L - d_R$, so that the interaural delay difference between the left (L) and right (R) HRTFs is preserved.
The function of FIG. 20 (which is the same function used in the example MATLAB code shown below) is defined to have the following properties: (a) the delays of both ears are smooth functions, and (b) the ear with the lower delay (usually also the ear with the larger amplitude) will have smaller delay variation (because the slope of the curve is lower when the delay is lower).
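The function can be sketched by transcribing the EAR_DELS_HI_RES expression from the MATLAB listing, which the text identifies as the function of FIG. 20. The Python names are illustrative, and the input is assumed to be given as separate left and right delays rather than as a precomputed difference.

```python
import math

def modified_delays(d_left, d_right, d_max):
    """Return (d'_L, d'_R) following the EAR_DELS_HI_RES expression in the listing.

    Both outputs share a common smooth offset, so their difference always
    equals the original interaural difference d_left - d_right.
    """
    itd = d_left - d_right
    common = 0.5 * d_max * (1.0 - (2.0 / math.pi) * math.cos(itd * math.pi / (2.0 * d_max)))
    return 0.5 * itd + common, -0.5 * itd + common

# Example: 0.5 ms and 0.2 ms original delays, with d_max = 0.7 ms.
new_left, new_right = modified_delays(0.5e-3, 0.2e-3, 0.7e-3)

# At the maximum interaural difference the far-side delay carries the whole
# difference and the near-side delay falls to (numerically) zero.
edge_left, edge_right = modified_delays(0.7e-3, 0.0, 0.7e-3)
```

The common offset is what keeps both modified delays non-negative while flattening the curve for the ear with the smaller delay.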
In some further embodiments, the left HRTF processing block 251L and the right HRTF processing block 251R can be adapted to produce undelayed impulse responses 311L and 311R, respectively, that are minimum-phase filter responses.
According to some embodiments, the left HRTF processing block 251L and the right HRTF processing block 251R can be implemented according to the following steps:
1. Determine the frequency response of an original HRTF filter, for example $R_{orig,n}(f)$ as defined in Equation 11.
2. Determine the magnitude response of the original HRTF filter: $M(f) = |R_{orig,n}(f)|$
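3. Determine the frequency response of a new (undelayed) minimum-phase filter according to a method known in the art, using the Hilbert transform: [equation missing]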
4. Determine the phase responses of the original filter and the undelayed filter:

$$\Psi_{orig}(f) = \mathrm{unwrap}(\mathrm{angle}(R_{orig,n}(f)))$$

$$\Psi_{minp}(f) = \mathrm{unwrap}(\mathrm{angle}(R'(f)))$$

where the angle( ) function extracts the phase component of a complex frequency response, and the unwrap( ) operation is a method known in the art for removing discontinuities in the extracted phase response by adding a multiple of 2π at each frequency (e.g., the MATLAB unwrap() function).
5. Determine the delay associated with the original HRTF filter as: [equation missing]
In some examples, the delay measurement frequency $f_d$ is chosen as a value in the range 300 Hz to 1600 Hz. In one detailed example, $f_d$ = 1200 Hz. In some other examples, $f_d$ can be a value in the range 300 Hz to 1200 Hz, in the range 600 Hz to 1800 Hz, in the range 1000 Hz to 1400 Hz, or in some other frequency range.
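The idea behind steps 4 and 5 can be illustrated with a toy example. The delay equation of step 5 is not reproduced here, so the sketch assumes the delay is the excess phase (original minus minimum-phase) at $f_d$ divided by $-2\pi f_d$. To keep the example short, the minimum-phase counterpart is known by construction rather than derived with a Hilbert transform; all names and values are illustrative.

```python
import cmath
import math

def dft_bin(h, k, n_fft):
    """Single DFT bin k of the sequence h, treated as zero-padded to n_fft."""
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(h))

def unwrapped_phase_at(h, target_bin, n_fft):
    """Unwrapped phase at target_bin, accumulated bin by bin starting from DC."""
    prev = cmath.phase(dft_bin(h, 0, n_fft))
    total = prev
    for k in range(1, target_bin + 1):
        raw = cmath.phase(dft_bin(h, k, n_fft))
        step = raw - prev
        step -= 2.0 * math.pi * round(step / (2.0 * math.pi))  # remove 2*pi jumps
        total += step
        prev = raw
    return total

def estimate_delay(h_orig, h_minp, f_d, fs, n_fft):
    """Delay in seconds from the excess phase at the DFT bin nearest f_d (assumed formula)."""
    k = round(f_d * n_fft / fs)
    f_bin = k * fs / n_fft
    excess = unwrapped_phase_at(h_orig, k, n_fft) - unwrapped_phase_at(h_minp, k, n_fft)
    return -excess / (2.0 * math.pi * f_bin)

FS = 48000.0
h_minp = [1.0, 0.5]              # minimum-phase toy response (zero inside the unit circle)
h_orig = [0.0] * 19 + h_minp     # same response delayed by 19 samples (about 0.4 ms)

d = estimate_delay(h_orig, h_minp, 1200.0, FS, 512)
```

Here `d` recovers the 19-sample delay, i.e. 19/48000 seconds, because the excess phase of a pure delay is exactly linear in frequency.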
In some embodiments, the delay d and the minimum-phase response $R'(f)$, determined according to the method above, are used to determine the modified HRTF response $R_{mod,n}(f)$ according to the following steps:
1. For each original left-ear and right-ear HRTF filter (for a given arrival direction), use the procedure above to determine the delay d and the minimum-phase response $R'(f)$, and label them as:
Left-ear delay = $d_L$; left-ear undelayed filter = $R'_L(f)$
Right-ear delay = $d_R$; right-ear undelayed filter = $R'_R(f)$
2. Determine the maximum interaural delay difference: [equation missing]
3. Determine new left-ear and right-ear delay values $d'_L$ and $d'_R$ such that $d'_L - d'_R = d_L - d_R$, where $d'_L$ and $d'_R$ can be determined according to Equation 21.
4. Determine the all-pass filter frequency responses: [equation missing]
5. Determine the new modified HRTF filters:
Left ear: [equation missing]
Right ear: [equation missing]
The following MATLAB implementation provides additional detail in accordance with the disclosed methods. An example of a MATLAB function that determines a set of HRTF basic filters from an existing HRTF library is shown below:

    function IR_DATA = GENERATE_HOA_HRIRS_MOD_LENS(ORDER, SOFA_PATH, ...
            SOFA_FILE_NAME, IR_LEN)
    % HRIR convertor - takes sphere-sampled HRIRs and converts them to
    % HOA HRIRs.
    %
    % ORDER          - HOA order to be converted to.
    % SOFA_PATH      - path to the directory that contains the SOFA files to be
    %                  converted.
    % SOFA_FILE_NAME - file name of the HRTFs to be converted
    % IR_LEN         - length of the IRs to be used.
    %

    % Load in the support coefs
    load('HRTF_SUPPORT_COEFS.MAT', 'HRTF_SUPPORT_COEFS');
    RMSSPHERE  = HRTF_SUPPORT_COEFS(ORDER).RMSSPHERE;
    LR_ODD     = HRTF_SUPPORT_COEFS(ORDER).LR_ODD;
    XYZ_TO_PAN = HRTF_SUPPORT_COEFS(ORDER).XYZ_TO_PAN;
    ALLPASS    = HRTF_SUPPORT_COEFS(ORDER).AP;

    % Choose a hi-res set of positions to sample the input HRTFs
    VS_HI_RES = load("SPHERE_PACKING_2562.MAT");
    VS_HI_RES = VS_HI_RES.VS_HI_RES;
    N = 512;

    % Fetch the HRTFs, and figure out the ITD for every direction
    H = HRTF_LIBRARY_LOADER();
    H.READSOFA(char(fullfile(SOFA_PATH, SOFA_FILE_NAME)));
    IRS_HI_RES = H.XYZ_TO_IR(VS_HI_RES);
    FRS_HI_RES = M_DFT(IRS_HI_RES, N); % freq x ears x vs
    FRS_HI_RES_MINP = MAG2MIN_PHASE(FRS_HI_RES);
    EXCESS_PHASE = squeeze(unwrap(diff(angle(FRS_HI_RES), 1, 2) - ...
        diff(angle(FRS_HI_RES_MINP), 1, 2)));
    BIN1200 = ceil(1200/24000*N);
    ITD_HI_RES = EXCESS_PHASE(BIN1200,:)' / ((BIN1200-0.5)/N*24000*2*pi);
    MAXDEL = max(ITD_HI_RES, [], 'all');

    % Create 2 ears
    EAR_DELS_HI_RES = (repmat(ITD_HI_RES, 1, 2) .* [0.5 -0.5]) + ...
        0.5*MAXDEL .* (1 - 2/pi*cos(ITD_HI_RES * pi/2 / MAXDEL));
    MRS_HI_RES = abs(FRS_HI_RES_MINP);

    % Generate permutation
    [~, PERM] = ismembertol(...
        VS_HI_RES', VS_HI_RES'.*[1,-1,1], ...
        1e-4, "ByRows", true);
    MRS_HI_RES(:,2,:) = MRS_HI_RES(:,2,PERM);
    NEW_FREQRESP_L = MAG2MIN_PHASE(squeeze(mean(MRS_HI_RES, 2))) .* ...
        M_DFT(GET_ALLPASS_IRS(ALLPASS, EAR_DELS_HI_RES(:, 1) * 48000), N, 1);

    % Create solving weights
    WEIGHTS = abs(NEW_FREQRESP_L);
    WEIGHTS(WEIGHTS < 0.1) = 0.1;
    WEIGHTS = 1 ./ (sqrt(sqrt(WEIGHTS)));

    % Solve to compute the HOA frequency responses.
    [M, ~] = size(XYZ_TO_PAN);
    FREQRESP_HOA = zeros(M, N);
    for K=1:N
        AW = NEW_FREQRESP_L(K,:) .* WEIGHTS(K,:);
        BW = XYZ_TO_PAN .* WEIGHTS(K,:);
        FREQRESP_HOA(:,K) = AW * pinv(BW, 0);
    end
    FREQRESP_HOA_ABS2 = real(FREQRESP_HOA.*conj(FREQRESP_HOA));
    FREQRESP_HOA = FREQRESP_HOA .* ...
        MAG2MIN_PHASE(((FREQRESP_HOA_ABS2' * RMSSPHERE.^2) .^ (-0.5)), 1).';

    % Convert back to IRs
    IR_HOA = M_IDFT(FREQRESP_HOA.', [], 1);
    IR_HOA = cat(3, IR_HOA, (IR_HOA .* (1-2*LR_ODD)'));

    % Put matrix dimensions in the right order
    IR_HOA = permute(IR_HOA, [3, 1, 2]);

    % Get the IRs to the right length
    IR_HOA = IR_HOA(:,1:IR_LEN,:) .* ...
        sin(interp1([0,150/192*IR_LEN,IR_LEN+1],[1,1,0]*pi/2, 1:IR_LEN));
    IR = permute(IR_HOA, [2, 1, 3]);
    IR_DATA = IR;

The GET_ALLPASS_IRS() function may be defined according to the MATLAB code below.

    function IR = GET_ALLPASS_IRS(ALLPASS, DELS)
        XSET = POLY2XSET(ALLPASS.PROTO_POLY, ...
            (DELS(:)'-16)*ALLPASS.UPPER_FREQ/ALLPASS.PROTO_BW);
        IR = MAKEIRS(XSET, ALLPASS.UPPER_FREQ);
    end

    function XSET = POLY2XSET(P, DELS_SAMPLES)
        XSET = zeros(size(P,1), numel(DELS_SAMPLES));
        for K = 1:size(P,1)
            XSET(K,:) = polyval(P(K,:),DELS_SAMPLES(:)'/32);
        end
    end

    function IRS = MAKEIRS(X,BW)
        IRS = MAP_POLES2IRS(MAP2POLES(X,BW));
    end

    function P = MAP2POLES(X,BW)
        P = MAP_2_S_POLES(X) * BW;
        for K = 1:size(P(:,:),2)
            P(:,K) = bilinear(P(:,K),P(:,K),1,48000,BW);
        end
    end

    function IRS = MAP_POLES2IRS(P)
        IRS = zeros(512,size(P(:,:),2));
        for K = 1:size(IRS,2)
            [~,A] = zp2tf(P(:,K),P(:,K),1);
            IRS(:,K) = filter(fliplr(A),A, [1;zeros(511,1)]);
        end
    end

    function P = MAP_2_S_POLES(X)
        ORDER = size(X,1);
        if ORDER==0
            P=[];
        elseif ORDER==1
            P=-2*pi*X;
        else
            ANG = atan(X(2,:))/2+pi/4;
            P = [-2*pi*X(1,:).*exp(1i*[1;-1]*ANG) ; MAP_2_S_POLES(X(3:end,:))];
        end
    end
FIG. 21 is a flow diagram outlining one example of a method that may be performed by an apparatus or system such as those disclosed herein. As with other methods described herein, the blocks of method 2100 are not necessarily performed in the order indicated. In some implementations, one or more of the blocks of method 2100 may be performed concurrently. Moreover, some implementations of method 2100 may include more or fewer blocks than shown and/or described. The blocks of method 2100 may be performed by one or more devices, which may be (or may include) a control system, such as the control system 110 shown in FIG. 1A and described above.
In this example, method 2100 is an audio processing method. According to this example, block 2105 involves obtaining, by a control system, a first set of HRTFs. The first set of HRTFs may be, for example, the original HRTF library 520 of FIGS. 13 to 16.
In this example, block 2110 involves transforming, by the control system, the first set of HRTFs into a second set of HRTFs. The second set of HRTFs may be, for example, the modified HRTF library 541 of FIGS. 13 to 16. According to this example, the transformation process of block 2110 involves replacing delay components of the first set of HRTFs with all-pass filters in the second set of HRTFs. In this example, the transformation process of block 2110 also involves adjusting a phase response of each of the all-pass filters in the second set of HRTFs such that, for frequencies below a threshold frequency associated with the corresponding all-pass filter, each interaural phase response is substantially linear and, for frequencies above that threshold frequency, each phase response reduces the interaural phase difference. The threshold frequency may be, for example, the frequency at which the alternative phase curve 412 departs from the linear phase response 411 of FIG. 5. The threshold frequency may be, for example, a frequency in the range of 1300 Hz to 1500 Hz, in the range of 1000 Hz to 1600 Hz, in the range of 1200 Hz to 1600 Hz, etc. In some examples, the threshold frequency may be 1400 Hz.
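As a rough illustration of the behaviour described for block 2110, the following Python sketch evaluates a target interaural phase that follows a linear ramp up to a threshold frequency and then tapers toward zero. The function name, the Gaussian taper and its `sharpness` parameter are assumptions made purely for illustration; the description does not give a closed-form expression for the curve.

```python
import math


def interaural_phase(freq_hz, itd_s, threshold_hz=1400.0, sharpness=1.0):
    """Illustrative interaural phase (radians) for an interaural delay itd_s.

    Below threshold_hz the phase is the linear ramp -2*pi*f*itd_s.
    Above it, a Gaussian taper (an assumed shape, not taken from the source)
    pulls the phase smoothly back toward zero. Value and slope are both
    continuous at the threshold.
    """
    linear = -2.0 * math.pi * freq_hz * itd_s
    if freq_hz <= threshold_hz:
        return linear
    x = freq_hz / threshold_hz
    return linear * math.exp(-sharpness * (x - 1.0) ** 2)


if __name__ == "__main__":
    for f in (500.0, 1400.0, 2000.0, 4000.0, 8000.0):
        print(f, interaural_phase(f, 0.0005))
```

With this shape the phase difference above the threshold is always smaller in magnitude than the linear ramp would give, which is the property the paragraph above asks for.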
In this example, block 2115 involves outputting a result of adjusting the phase response of each of the all-pass filters in the second set of HRTFs. Outputting the result may involve, for example, storing the result, transmitting the result, providing the result for further processing, or combinations thereof.
According to some examples, method 2100 may involve additional processes, such as those described herein with reference to FIG. 13. In some such examples, method 2100 also may involve defining, by the control system, a set of basis filters based on the second set of HRTFs. The set of basis filters may have fewer members than the second set of HRTFs. In some examples, the set of basis filters may have at least an order of magnitude fewer members than the second set of HRTFs.
In some examples, method 2100 also may involve processes such as those described herein with reference to FIG. 14 or FIG. 15. In some such examples, method 2100 also may involve obtaining, by the control system, a bitstream of input audio data in an input audio format, and combining, by the control system, the input audio data with one or more basis filters of the set of basis filters to produce left audio data and right audio data. According to some such examples, method 2100 also may involve outputting, by the control system, the left audio data and the right audio data. Outputting the left audio data and the right audio data may involve storing the left audio data and the right audio data, transmitting the left audio data and the right audio data, providing, by the control system, the left audio data and the right audio data to a set of loudspeakers for playback, providing the left audio data and the right audio data for further processing (for example, to another module implemented by the control system or to another control system), or combinations thereof.
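The combining step can be pictured as filtering each input channel with its basis filter and summing the results per ear. The Python sketch below is a minimal illustration under stated assumptions: the input is already a set of equal-length channel signals (for example, HOA components), there is one left-ear basis filter per channel, and the right-ear output is obtained by negating the left-right-odd components, which mirrors the final step of the MATLAB listing. The names `render_binaural` and `lr_odd` are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def render_binaural(channels, basis_filters, lr_odd):
    """Return (left, right) sample lists.

    channels      : list of equal-length input channel signals
    basis_filters : one equal-length left-ear FIR filter per channel
    lr_odd        : per-channel flag, True if the component is odd under
                    left-right mirroring (assumed symmetric head)
    """
    out_len = len(channels[0]) + len(basis_filters[0]) - 1
    left = [0.0] * out_len
    right = [0.0] * out_len
    for ch, filt, odd in zip(channels, basis_filters, lr_odd):
        y = convolve(ch, filt)
        sign = -1.0 if odd else 1.0
        for n, v in enumerate(y):
            left[n] += v          # left ear: plain sum of filtered channels
            right[n] += sign * v  # right ear: odd components change sign
    return left, right


if __name__ == "__main__":
    left, right = render_binaural(
        [[1.0, 0.0], [2.0, 0.0]],
        [[1.0, 0.5], [0.25, 0.0]],
        [False, True],
    )
    print(left, right)
```

Because only the basis filters are convolved, the cost scales with the number of channels rather than with the number of directions in the HRTF library.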
According to some examples, the transformation process of block 2110 may involve processes such as those described herein with reference to FIGS. 16 and 17. According to some such examples, block 2110 may involve obtaining left-ear HRTFs and right-ear HRTFs from the first set of HRTFs, identifying a left-ear non-delayed impulse response and a left-ear delay from each of the left-ear HRTFs, and identifying a right-ear non-delayed impulse response and a right-ear delay from the right-ear HRTFs. In some such examples, block 2110 may involve generating left-ear all-pass filters, each of which is based at least in part on an instance of the left-ear delay, and generating right-ear all-pass filters, each of which is based at least in part on an instance of the right-ear delay. In some such examples, block 2110 may involve combining instances of the left-ear and right-ear non-delayed impulse responses with corresponding instances of the left-ear and right-ear all-pass filters to produce HRTF pairs of the second set of HRTFs.
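To make the recombination step concrete, the Python sketch below builds an all-pass impulse response from a denominator polynomial by using the reversed denominator as the numerator (the same construction the MATLAB listing applies when it filters a unit impulse), and then convolves it with a non-delayed impulse response. It is a minimal sketch that assumes real coefficients with a leading coefficient of 1; the function names are hypothetical.

```python
def allpass_impulse_response(a, length):
    """Impulse response of H(z) = B(z)/A(z) with B = reversed(A), a[0] == 1.

    For real coefficients and poles inside the unit circle this filter has
    unit magnitude at every frequency, so it changes only the phase.
    """
    b = list(reversed(a))
    x = [1.0] + [0.0] * (length - 1)
    y = []
    for n in range(length):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def combine(non_delayed_ir, allpass_ir):
    """Cascade a non-delayed (e.g. minimum-phase) IR with an all-pass IR."""
    return convolve(non_delayed_ir, allpass_ir)


if __name__ == "__main__":
    ap = allpass_impulse_response([1.0, -0.5], 200)
    print(sum(v * v for v in ap))       # energy of an all-pass IR is close to 1
    print(combine([1.0, 0.5], ap)[:4])
```

The energy check is a quick sanity test: an all-pass filter leaves the magnitude spectrum untouched, so its impulse response carries unit energy.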
In some such examples, block 2110 may involve generating modified left-ear delay values and right-ear delay values based on one or more of the extracted left-ear delays and right-ear delays. The left-ear all-pass filters and the right-ear all-pass filters may be based on the modified left-ear delay values and right-ear delay values, respectively. In some examples, generating an instance of the modified left-ear and right-ear delay values may involve determining a difference between an extracted left-ear delay and an extracted right-ear delay. According to some examples, generating an instance of the modified left-ear and right-ear delay values may involve determining a maximum expected difference between an extracted left-ear delay and an extracted right-ear delay.
According to some examples, a difference between an extracted left-ear delay and an extracted right-ear delay may equal a difference between the corresponding modified left-ear delay value and modified right-ear delay value. In some examples, the modified left-ear delay values and the modified right-ear delay values may correspond to smooth functions, such as those shown in FIG. 20. According to some examples, each pair of a modified left-ear delay value and a modified right-ear delay value may include a lower delay value and a higher delay value. In some such examples, the lower delay value may have a smaller delay variation than the higher delay value.
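One way to see these properties together is to transcribe the delay expression from the MATLAB listing into a small function. The Python sketch below is such a transcription, offered as an illustration rather than a definitive implementation: it assumes the interaural delay `itd` and the maximum expected delay `max_itd` are in the same units, and the assignment of the first output to the left ear is an assumption.

```python
import math


def modified_ear_delays(itd, max_itd):
    """Split an interaural delay into two per-ear delays.

    The two outputs always differ by exactly itd. The shared offset is a
    smooth cosine term chosen so that the smaller of the two delays
    flattens out near zero as |itd| approaches max_itd.
    """
    common = 0.5 * max_itd * (
        1.0 - (2.0 / math.pi) * math.cos(itd * math.pi / (2.0 * max_itd))
    )
    first = 0.5 * itd + common    # assumed here to be the left ear
    second = -0.5 * itd + common  # assumed here to be the right ear
    return first, second


if __name__ == "__main__":
    for itd in (-0.0007, -0.00035, 0.0, 0.00035, 0.0007):
        print(itd, modified_ear_delays(itd, 0.0007))
```

At `itd == max_itd` the lower delay reaches zero with zero slope, so small changes in direction near the extreme move the lower delay far less than the higher one.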
In some examples, the non-delayed impulse responses may be minimum-phase filter responses.
According to some examples, extracting each left-ear non-delayed impulse response, each right-ear non-delayed impulse response, each left-ear delay and each right-ear delay from each of the left-ear and right-ear HRTFs may involve: determining a frequency response of an original HRTF of the first set of HRTFs; determining a magnitude response of the original HRTF; determining a minimum-phase frequency response of a new non-delayed minimum-phase filter; determining a phase response of the original HRTF and a phase response of the new non-delayed minimum-phase filter; and determining a delay associated with the original HRTF filter based, at least in part, on the phase response of the original HRTF filter and the phase response of the new non-delayed minimum-phase filter. In some such examples, determining the minimum-phase frequency response may involve implementing a Hilbert transform involving the magnitude response of the original HRTF filter. According to some examples, determining the delay associated with the original HRTF filter also may be based, at least in part, on a delay measurement frequency in a range of 300 Hz to 1600 Hz.
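The extraction steps listed above can be sketched end to end in a few lines of Python. This is a minimal sketch under stated assumptions: the minimum-phase response is obtained by cepstral folding, a standard equivalent of the Hilbert-transform relation between log magnitude and minimum phase (the source does not say how its own routine is implemented); the impulse response has even length and no spectral zeros; and the delay is read from the excess phase at a single low-frequency bin, where no phase wrapping occurs. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def min_phase_response(mag):
    """Minimum-phase frequency response with the given magnitude samples."""
    n = len(mag)
    cep = [c.real for c in idft([math.log(m) for m in mag])]  # real cepstrum
    folded = [0.0] * n
    folded[0] = cep[0]
    folded[n // 2] = cep[n // 2]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cep[i]  # keep only the causal part, doubled
    return [cmath.exp(v) for v in dft(folded)]


def estimate_delay_samples(ir, k):
    """Delay (in samples) from the excess phase at DFT bin k."""
    H = dft(ir)
    Hmin = min_phase_response([abs(h) for h in H])
    excess = cmath.phase(H[k] / Hmin[k])  # original phase minus min phase
    return -excess * len(ir) / (2.0 * math.pi * k)


if __name__ == "__main__":
    ir = [0.0] * 64
    ir[3], ir[4] = 1.0, 0.5   # minimum-phase pair delayed by 3 samples
    print(estimate_delay_samples(ir, 2))
```

In practice the measurement bin would be chosen to fall inside the 300 Hz to 1600 Hz range mentioned above for the sampling rate in use.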
In some examples, for frequencies above the threshold frequency, an all-pass phase response may depart from a linearly ramping phase response and may approach zero phase smoothly. The alternative phase curve 412 of FIG. 5 provides one such example.
According to some examples, a control system configured to implement method 2100 also is configured to implement at least part of a codec for Immersive Voice and Audio Services (IVAS). FIG. 1D shows one such example.
The above description illustrates various embodiments of the present invention along with examples of how aspects of the present invention may be implemented. The above examples and embodiments should not be deemed to be the only embodiments; they are presented to illustrate the flexibility and advantages of the present invention as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents will be evident to those skilled in the art and may be employed without departing from the spirit and scope of the invention as defined by the claims.
2100: Method
2105: Block
2110: Block
2115: Block
Claims (26)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363455539P | 2023-03-29 | 2023-03-29 | |
| US63/455,539 | 2023-03-29 | ||
| US202363595752P | 2023-11-02 | 2023-11-02 | |
| US63/595,752 | 2023-11-02 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202447608A (en) | 2024-12-01 |
| TWI899919B (en) | 2025-10-01 |
Family
ID=94735507
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW113111623A TWI899919B (en) | 2023-03-29 | 2024-03-28 | Method for creation of linearly interpolated head related transfer functions |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI899919B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104205878A (en) * | 2012-03-23 | 2014-12-10 | 杜比实验室特许公司 | Method and system for head-related transfer function generation by linear mixing of head-related transfer functions |
| CN110035376B (en) * | 2017-12-21 | 2021-04-20 | 高迪音频实验室公司 | Audio signal processing method and apparatus for binaural rendering using phase response characteristics |
| US20210168554A1 (en) * | 2019-04-30 | 2021-06-03 | Shenzhen Voxtech Co., Ltd. | Acoustic output apparatus |
| TW202309881A (en) * | 2021-07-08 | 2023-03-01 | 美商博姆雲360公司 | Colorless generation of elevation perceptual cues using all-pass filter networks |
Also Published As
| Publication number | Publication date |
|---|---|
| TW202447608A (en) | 2024-12-01 |