JP2014041240A

JP2014041240A - Time scaling method, pitch shift method, audio data processing device and program

Info

Publication number: JP2014041240A
Application number: JP2012183083A
Authority: JP
Inventors: Yoshihisa Furukawa; 善久古川
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2012-08-22
Filing date: 2012-08-22
Publication date: 2014-03-06

Abstract

PROBLEM TO BE SOLVED: To obtain high-quality sound when phase continuation processing is performed with an FFT-based method to achieve time scaling and pitch shifting.
SOLUTION: An audio data processing device includes an FFT unit 21, which converts digital audio data into an amplitude and a phase for each frequency component, and a phase continuation processing unit 33. The phase continuation processing unit 33 performs a second FFT on the digital audio data at an execution timing that differs from that of the FFT unit 21 by a time expansion/contraction length, takes the difference between the phase obtained by this second FFT and the phase obtained by the FFT unit 21 as the phase change amount, and uses it to estimate the phase after time expansion/contraction, thereby performing phase continuation processing.

Description

The present invention relates to a time scaling method, a pitch shift method, an audio data processing device, and a program for performing time scaling or pitch shifting of digital audio data.

The FFT (Fast Fourier Transform) method is known as a way of realizing "time scaling", which expands or compresses the length of digital audio data on the time axis without changing its pitch, and "pitch shifting" (key control), which changes only the pitch of digital audio data without changing its length on the time axis. For example, Patent Document 1 describes a method of performing time scaling by changing the number of input samples and the number of output samples using the FFT method. Patent Document 2 describes a method of correcting the onset offset of a transient (percussive sound) that arises when the input overlap sample count and the output overlap sample count are changed during FFT-based time scaling.

However, when the FFT method is used, the attack of a percussion sound (rhythm sound) with a sharp attack is smeared along the time axis, and sound quality degrades because the sense of attack is lost. The reason is as follows. When time scaling is realized by changing the number of input and output samples with the FFT method as described above, keeping the phase of the original sound unchanged would make the phase discontinuous with that of the next FFT operation, so a process that makes the phase continuous (hereinafter "phase continuation processing") is required. Furthermore, when pitch shifting is performed, the frequency shift is carried out in the frequency domain, so the phase calculated by the FFT cannot be used as the phase after the frequency shift; phase continuation processing is required for every frequency component after the shift, and as a result the phase becomes completely different from that of the original sound. In other words, because the conventional FFT method must convert the FFT phase into a different value after the frequency shift, the phase relationship between frequency components that must be maintained to preserve the attack is lost, and the loss of the sense of attack cannot be prevented.
To solve this problem, Patent Document 3 reproduces the sense of attack by performing phase reset processing, which uses the FFT phase itself instead of phase continuation processing, whenever an attack sound is detected from the computed time rate of change of the amplitude and/or phase.

Patent Document 1: JP 2007-519967 A (published Japanese translation of a PCT application)
Patent Document 2: US Pat. No. 7,565,289
Patent Document 3: JP 2012-2858 A

However, the conventional FFT method also degrades sound quality in portions other than the attack. FIG. 12 is a conceptual diagram of phase continuation processing according to a conventional example. The upper part of the figure shows 2048-sample FFT frames being input at intervals of the input overlap number N (N = (time expansion/contraction ratio) × 512). The lower part shows 2048-sample IFFT frames being output at intervals of the output overlap number 512 (a fixed value). In the figure, "i" is the FFT operation count and "j" is the FFT frequency bin number. The continuous phase calculation formula of the conventional example is shown as (Formula B) in the middle of the figure, where F_s is the sampling frequency. As this formula shows, conventional phase continuation processing estimates the true frequency from the phase change amount and then computes the phase after time expansion/contraction from the time expansion/contraction length. This causes problems such as (1) accumulation of estimation errors and (2) errors when frequency components are not independent, both of which degrade sound quality.
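Formula B itself appears only in FIG. 12 and is not reproduced here. For comparison, the sketch below shows the textbook phase-vocoder update that matches the description above: estimate the true frequency of bin j from the phase advance across the input hop N, then advance the output phase over the fixed output hop of 512. It is an assumed standard formulation, not necessarily identical to Formula B; frequencies are in radians per sample, so F_s does not appear explicitly, and `princarg` and `conventional_phase` are hypothetical names.

```python
import math


def princarg(x):
    """Wrap a phase value into [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def conventional_phase(prev_out, prev_in, cur_in, j, n_fft, hop_in, hop_out):
    """Standard phase-vocoder update for frequency bin j (assumed formulation).

    prev_out:        output (synthesis) phase of bin j in the previous frame
    prev_in, cur_in: analysis phases of bin j in consecutive input frames
    hop_in:          input overlap N (samples between input frames)
    hop_out:         output overlap (512 in FIG. 12)
    """
    omega_bin = 2.0 * math.pi * j / n_fft            # bin centre frequency, rad/sample
    # Deviation of the measured phase advance from the bin-centre advance, wrapped.
    deviation = princarg(cur_in - prev_in - omega_bin * hop_in)
    omega_true = omega_bin + deviation / hop_in      # estimated "true" frequency
    return prev_out + omega_true * hop_out           # advance over one output hop
```

Because each output phase is built on the previous one, any error in `omega_true` is carried into every later frame, which corresponds to problem (1) above.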

In view of the above problems, an object of the present invention is to provide a time scaling method, a pitch shift method, an audio data processing device, and a program capable of producing high-quality sound when phase continuation processing is performed using the FFT method.

The time scaling method of the present invention executes: a first frequency conversion step of converting digital audio data into an amplitude and a phase for each frequency component; a second frequency conversion step of converting the digital audio data into an amplitude and a phase for each frequency component at an execution timing that differs from that of the first frequency conversion step by a time expansion/contraction length; and a phase estimation step of taking the difference between the phase obtained in the first frequency conversion step and the phase obtained in the second frequency conversion step as the phase change amount and estimating the phase after time expansion/contraction.

The audio data processing device of the present invention includes: first frequency conversion means for converting digital audio data into an amplitude and a phase for each frequency component; second frequency conversion means for converting the digital audio data into an amplitude and a phase for each frequency component at an execution timing that differs from that of the first frequency conversion means by a time expansion/contraction length; and phase estimation means for taking the difference between the phase obtained by the first frequency conversion means and the phase obtained by the second frequency conversion means as the phase change amount and estimating the phase after time expansion/contraction.

With these configurations, the phase change amount is obtained from the difference between the phase obtained in the first frequency conversion step (first frequency conversion means) and the phase obtained in the second frequency conversion step (second frequency conversion means), and the phase after time expansion/contraction is estimated from this phase change amount. Compared with conventional phase calculation processing, which estimates the true frequency from the phase change and then computes the phase after time expansion/contraction from the time expansion/contraction length, there are fewer sources of error. Sound quality degradation caused by FFT-based phase continuation processing can therefore be prevented.
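A minimal sketch of the difference-based estimate, under stated assumptions: the transform is a naive pure-Python DFT with a Hann window (chosen only to keep the example dependency-free), and the offset between the two transforms is a plain parameter `offset` standing in for the time expansion/contraction length. For a stationary sinusoid, the wrapped phase difference between the two frames equals frequency times offset, so no explicit frequency estimate is needed. All function names are hypothetical.

```python
import cmath
import math


def princarg(x):
    """Wrap a phase into [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def frame_phases(x, start, n_fft):
    """Phase of bins 0..n_fft/2 of the Hann-windowed frame x[start:start+n_fft].

    Naive O(n_fft^2) DFT, for illustration only.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n_fft) for k in range(n_fft)]
    seg = [x[start + k] * window[k] for k in range(n_fft)]
    phases = []
    for b in range(n_fft // 2 + 1):
        acc = sum(seg[k] * cmath.exp(-2j * math.pi * b * k / n_fft) for k in range(n_fft))
        phases.append(cmath.phase(acc))
    return phases


def estimate_phase_change(x, pos, offset, n_fft):
    """Per-bin phase change: FFT phase at pos+offset minus FFT phase at pos, wrapped."""
    first = frame_phases(x, pos, n_fft)            # first frequency conversion
    second = frame_phases(x, pos + offset, n_fft)  # second, shifted by `offset`
    return [princarg(b - a) for a, b in zip(first, second)]


def continue_phase(prev_out, phase_change):
    """Next output phase = previous output phase + measured phase change."""
    return [p + d for p, d in zip(prev_out, phase_change)]
```

A real implementation would use an FFT library, and `pos` and `offset` would follow the frame schedule of FIG. 5.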

In the above time scaling method, the time expansion/contraction length is calculated based on the product of the time expansion/contraction ratio and the output overlap number.

With this configuration, when the output overlap number is fixed, the product of the time expansion/contraction ratio and the output overlap number can be used as the input overlap number. In other words, the time expansion/contraction length (amount of time scaling) can be varied by varying the input overlap number.
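The relationship can be checked with simple arithmetic. The sketch assumes, following the description of FIG. 12, that the output overlap number is fixed at 512 and N = ratio × 512; under that convention a ratio above 1 consumes input faster than output is produced, so the output is shorter. The helper names are hypothetical and frame edges are ignored.

```python
def input_overlap(stretch_ratio, output_overlap=512):
    """Input overlap number N = time expansion/contraction ratio x output overlap number."""
    return round(stretch_ratio * output_overlap)


def approx_output_length(n_input, stretch_ratio, output_overlap=512):
    """Rough output length: one output hop per input hop consumed (edges ignored)."""
    hop_in = input_overlap(stretch_ratio, output_overlap)
    frames = n_input // hop_in
    return frames * output_overlap
```

For example, a ratio of 1.5 gives N = 768, and 48000 input samples yield about 31744 output samples, roughly 1/1.5 of the input.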

The above time scaling method further executes a phase calculation processing determination step and a phase calculation processing step. The determination step uses the results of a plurality of phase switching determination processes, each making a different phase switching determination from the computed time rate of change of the amplitude and/or phase, to decide which phase calculation processing to perform: phase reset processing, which resets the phase of each frequency component to the calculation result of the first frequency conversion step itself, or phase continuation processing, which makes the phase continuous on the assumption that the phase of each frequency component has changed continuously from the previous calculation result of the first frequency conversion step, taking time expansion/contraction into account. The phase calculation processing step then performs phase reset processing or phase continuation processing according to the determination result. The first frequency conversion step, the second frequency conversion step, and the phase estimation step are executed when phase continuation processing is performed.

With this configuration, steep sound onsets and the like can be detected by performing a plurality of phase switching determination processes, each making a different determination from the computed time rate of change of the amplitude and/or phase. Because appropriate phase calculation processing (either phase reset processing or phase continuation processing) is performed according to their results, loss of the sense of attack can be prevented. That is, when a steep onset is detected from the computed time rate of change of the amplitude and/or phase, phase reset processing using the FFT phase itself is performed instead of phase continuation processing, so the sense of attack can be reproduced. As a result, high-quality FFT-based time scaling is possible not only for long-tone sounds with gentle attacks (melody sounds) but also for percussion sounds with sharp attacks (rhythm sounds).

In the above time scaling method, the plurality of phase switching determination processes determine the presence or absence of an attack portion for each of different frequency bands. In the phase calculation processing step, phase reset processing is performed when an attack portion is determined to be present, and phase continuation processing is performed when it is determined to be absent.

With this configuration, because the presence or absence of an attack portion is determined separately for different frequency bands, attack portions can be detected accurately.

The above time scaling method further executes a frequency inverse conversion step, which converts each frequency component after phase calculation processing in the phase calculation processing step back into digital audio data, and a time expansion/contraction calculation step, which increases or decreases the number of samples of the inverse-converted digital audio data in proportion to the time expansion/contraction ratio during the frequency inverse conversion.

With this configuration, executing the frequency inverse conversion step and the time expansion/contraction calculation step after the phase calculation processing step realizes time scaling that expands or compresses the length of digital audio data on the time axis without changing its pitch.

The pitch shift method of the present invention executes each step of the above time scaling method together with a sampling rate conversion calculation step, which expands or contracts the data in time and changes its pitch by changing the sampling frequency of the digital audio data. The time expansion/contraction produced by the steps of the time scaling method cancels that produced by the sampling rate conversion calculation step, so only the pitch is changed.
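The cancellation amounts to two duration factors whose product is one. The sketch below is only that arithmetic, not the signal processing; it assumes equal-tempered semitones (not stated in the source), and `pitch_shift_factors` is a hypothetical name.

```python
def pitch_shift_factors(semitones):
    """Return (pitch ratio, SRC duration factor, time-scaling duration factor).

    Assumes equal temperament: s semitones multiply frequency by 2**(s/12).
    """
    pitch = 2.0 ** (semitones / 12.0)
    src_duration = 1.0 / pitch   # SRC output played at the original rate gets shorter as pitch rises
    ts_duration = pitch          # time scaling must stretch by the inverse amount
    return pitch, src_duration, ts_duration
```

Raising the pitch by one octave (12 semitones) halves the duration in the SRC stage, so the time scaling stage must double it.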

In conventional pitch shifting that performs the frequency shift in the frequency domain using the FFT, the phase calculated by the FFT must be converted into a different value after the frequency shift. The phase relationship between frequency components that must be maintained to preserve the attack is therefore lost, phase reset processing cannot be performed correctly, and the loss of the sense of attack cannot be prevented. In contrast, the configuration using the sampling rate conversion method does not perform a frequency shift in the frequency domain, so the phase calculated by the FFT can be used as the phase of the pitch-shifted sound in the attack portion, and phase reset processing prevents the loss of the sense of attack. In addition, because frequency shift processing introduces few sources of error, degradation of sound quality outside the attack portion can also be prevented compared with the conventional FFT method that does not use the sampling rate conversion method, enabling high-quality pitch shifting.

A program of the present invention causes a computer to execute each step of the above time scaling method.

A program of the present invention causes a computer to execute each step of the above pitch shift method.

Executing these programs realizes a time scaling method or a pitch shift method capable of producing high-quality sound when phase continuation processing is performed using the FFT method.

FIG. 1 is a simplified block diagram of a playback device according to the first embodiment and of an audio data processing unit that forms part of it.
FIG. 2 is a block diagram of the audio data processing unit according to the first embodiment.
FIG. 3 is a flowchart showing pitch shift processing by the audio data processing unit.
FIG. 4 is a flowchart showing phase calculation processing according to the first embodiment.
FIG. 5 is a conceptual diagram of phase continuation processing according to the first embodiment.
FIG. 6 is a flowchart showing phase calculation processing according to a second embodiment.
FIG. 7 is a block diagram of an audio data processing unit according to a third embodiment.
FIG. 8 is a supplementary explanatory diagram showing the basis of a reference value.
FIG. 9 is a supplementary explanatory diagram showing the basis of a reference value.
FIG. 10 is a conceptual diagram of peak phase maintenance processing.
FIG. 11 is a flowchart showing phase calculation processing according to the third embodiment.
FIG. 12 is a conceptual diagram of phase continuation processing according to the conventional example and a modification.

Hereinafter, a time scaling method, a pitch shift method, an audio data processing device, and a program according to an embodiment of the present invention are described in detail with reference to the accompanying drawings. In this embodiment, the audio data processing device of the present invention is applied, by way of example, to a playback device such as a CD player.

[First Embodiment]
FIG. 1(a) is a simplified block diagram of the playback device 1. As shown in the figure, the playback device 1 includes a playback unit 2, an audio data processing unit 3 (audio data processing device), a buffer memory 4, and an audio data output unit 5. The playback unit 2 reads music and musical sounds from a medium such as a CD and plays them back. The main part of the audio data processing unit 3 is implemented by a CPU (Central Processing Unit) or a DSP (Digital Signal Processor); it stores the digital audio data played back by the playback unit 2 (hereinafter simply "audio data") in the buffer memory 4 and performs digital signal processing on the audio data read from the buffer memory 4. The buffer memory 4 consists of an input buffer memory (hereinafter "input buffer 4a") and an output buffer memory (hereinafter "output buffer 4b"). The audio data output unit 5 outputs the audio data processed by the audio data processing unit 3 (audio data read from the output buffer 4b) to an external device, such as an output device with an amplifier and speakers.

FIG. 1(b) is a block diagram showing one example of the audio data processing unit 3. The audio data processing unit 3 in FIG. 1(b) has a time scaling unit 11 as its main functional component. The time scaling unit 11 acquires the audio data to be processed from the buffer memory 4 (input buffer 4a) and performs time scaling (time expansion/contraction conversion processing). In this embodiment, time scaling is performed using the FFT method.

FIG. 1(c) is a block diagram showing another example of the audio data processing unit 3. The audio data processing unit 3 in FIG. 1(c) has an SRC (sampling rate conversion) unit 12 and a time scaling unit 11 as its main functional components; that is, it is the audio data processing unit 3 of FIG. 1(b) with the SRC unit 12 added.

The SRC unit 12 performs SRC processing, which changes the sampling frequency of the audio data, before or after time scaling by the time scaling unit 11 (sampling rate conversion calculation step). SRC processing is originally a technique for changing the sampling period of digital audio data; if the sample data newly computed by SRC processing are played back at the original sampling frequency, both the duration and the pitch change. The audio data processing unit 3 of FIG. 1(c) therefore realizes pitch shifting, which changes only the pitch without changing the length on the time axis, by having the SRC unit 12 and the time scaling unit 11 cancel each other's time expansion/contraction of the audio data. The following mainly describes pitch shifting performed by the audio data processing unit 3 shown in FIG. 1(c).
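The effect of SRC followed by playback at the original sampling frequency can be illustrated with a very simple resampler. The source does not say which interpolation the SRC unit 12 uses; linear interpolation is assumed here purely for brevity, and a production SRC would use band-limited filtering to avoid aliasing. `resample_linear` is a hypothetical helper.

```python
def resample_linear(x, pitch):
    """Read x at positions 0, pitch, 2*pitch, ... with linear interpolation.

    Played back at the original sampling frequency, the result is about
    1/pitch times as long and pitch times higher in frequency.
    """
    if pitch <= 0:
        raise ValueError("pitch must be positive")
    out = []
    pos = 0.0
    while pos <= len(x) - 1:
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]  # clamp at the last sample
        out.append(x[i] + frac * (nxt - x[i]))
        pos += pitch
    return out
```

With `pitch = 2.0` every other sample is kept (one octave up, half the duration); with `pitch = 0.5` midpoints are interpolated (one octave down, twice the duration).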

FIG. 2 is a block diagram showing the detailed functional configuration of the audio data processing unit 3. As described above, the audio data processing unit 3 consists of the SRC unit 12 and the time scaling unit 11. In this embodiment, SRC processing is performed first, followed by time scaling. The SRC unit 12 performs SRC processing on the audio data of the original sound.

The time scaling unit 11 consists of an FFT unit 21, a phase calculation unit 22, an inverse FFT unit 23, and a time expansion/contraction calculation unit 24. The FFT unit 21 converts audio data into an amplitude and a phase for each frequency component (first frequency conversion step); that is, it converts time-domain sound into the frequency domain and obtains the amplitude and phase.

The phase calculation unit 22 performs phase calculation processing according to the computed time rate of change of the amplitude. Specifically, it uses a normalized amplitude difference value, obtained by dividing the time rate of change of the amplitude by the amplitude, to perform a plurality of phase switching determination processes, each making a different phase switching determination, and performs phase calculation processing according to their results. These phase switching determination processes serve to detect attack portions. As shown in the figure, the phase calculation unit 22 includes an attack detection unit 31, a phase reset processing unit 32, and a phase continuation processing unit 33. The attack detection unit 31 in turn consists of an all-frequency-band detection unit 31a, a per-frequency-band detection unit (A) 31b, and a per-frequency-band detection unit (B) 31c.
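A minimal sketch of the normalized amplitude difference value, under assumptions the source leaves open: the frame-to-frame amplitude difference stands in for the time rate of change, the current-frame amplitude is the divisor, and a small `eps` (hypothetical) guards silent bins. The function name is also hypothetical.

```python
def normalized_amplitude_difference(prev_amp, cur_amp, eps=1e-12):
    """Per-bin (A_cur - A_prev) / A_cur for two consecutive FFT frames.

    Assumptions: difference of consecutive frames as the time rate of change,
    current amplitude as the divisor, eps to avoid division by zero.
    """
    return [(c - p) / max(c, eps) for p, c in zip(prev_amp, cur_amp)]
```

The sum of these per-bin values is the quantity that the first phase switching determination compares against the thresholds L1 and L2.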

In a first phase switching determination process, one of the plurality of phase switching determination processes, the all-frequency-band detection unit 31a determines whether the sum of the normalized amplitude difference values is equal to or greater than a predetermined threshold L1 (hereinafter the "high threshold"). If the sum was below the high threshold in the previous calculation and is equal to or above it in the current calculation, the unit determines that it has detected an attack portion requiring reset processing over all frequency bands. A typical example is the hit of a low-pitched percussion instrument such as a bass drum. The attack of a low-pitched percussion instrument contains components ranging from the low fundamental that characterizes the instrument's pitch up to a fairly high range, so the phase must be reset over almost the entire frequency band.

When the first phase switching determination process finds that the sum of the normalized amplitude difference values is below the high threshold but equal to or above a predetermined threshold L2 (hereinafter the "low threshold"; L1 > L2), the per-frequency-band detection unit (A) 31b performs a second phase switching determination process. This process binarizes the normalized amplitude difference value of each frequency component with the low threshold, restricted to the high-frequency range, and detects attack portions per frequency component. It can detect hits (attacks) in the mid-to-high range.

When the first phase switching determination process finds that the sum of the normalized amplitude difference values is below the threshold L2, the per-frequency-band detection unit (B) 31c performs a third phase switching determination process. This process binarizes the normalized amplitude difference value of each frequency component with a high threshold and detects the presence or absence of an attack portion per frequency component. It can detect attacks produced by vocals, stringed instruments, and the like.
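The three determination processes can be sketched as one decision function that returns a per-bin reset flag. This is an illustration under assumptions: the per-bin thresholds `bin_low_thr` and `bin_high_thr` and the band boundary `high_band_start` are hypothetical parameters (the source gives no values and does not say whether the per-bin low threshold equals L2), and when the sum stays at or above L1 without a rising edge the sketch assumes no reset, since that case is not described.

```python
def detect_attacks(nad, prev_total, high_thr, low_thr,
                   bin_low_thr, bin_high_thr, high_band_start):
    """Return (per-bin reset flags, total) following the three-tier scheme.

    nad:        normalized amplitude difference per bin
    prev_total: sum of nad in the previous frame
    high_thr (L1) > low_thr (L2) are applied to the sum; the per-bin
    thresholds and high_band_start are hypothetical parameters.
    """
    total = sum(nad)
    n = len(nad)
    if total >= high_thr:
        # First process: reset every bin, but only on the rising edge.
        rising = prev_total < high_thr
        return [rising] * n, total
    if total >= low_thr:
        # Second process: binarize with the low threshold, high band only.
        flags = [b >= high_band_start and v >= bin_low_thr for b, v in enumerate(nad)]
        return flags, total
    # Third process: binarize every bin with the high threshold.
    return [v >= bin_high_thr for v in nad], total
```

The returned `total` would be passed back in as `prev_total` for the next frame so that the rising-edge test of the first process can be evaluated.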

When an attack portion is detected by the all-frequency-band detection unit 31a, the per-frequency-band detection unit (A) 31b, or the per-frequency-band detection unit (B) 31c, the phase reset processing unit 32 resets the phase of each frequency component to the calculation result of the FFT unit 21 itself (hereinafter "phase reset processing").

When no attack portion is detected by the all-frequency-band detection unit 31a, the per-frequency-band detection unit (A) 31b, or the per-frequency-band detection unit (B) 31c, the phase continuation processing unit 33 makes the phase continuous on the assumption that the phase of each frequency component (frequency bin) has changed continuously from the previous calculation result of the FFT unit 21, taking time expansion/contraction into account (hereinafter "phase continuation processing"). In this way, the phase calculation unit 22 of this embodiment selectively performs either phase reset processing or phase continuation processing according to the sum of the normalized amplitude difference values and the values of individual frequency components.

The phase continuation processing unit 33 of this embodiment performs a second FFT at an execution timing that differs from that of the FFT performed by the FFT unit 21 by the time expansion/contraction length (second frequency conversion step). It then takes the difference between the phase obtained by the FFT unit 21 and the phase obtained by the second FFT as the phase change amount and estimates the phase after time expansion/contraction (phase estimation step). In other words, the phase change after time expansion/contraction is obtained directly from the audio data before conversion. Details are described later with reference to FIG. 5.
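For one frame, the phase calculation unit 22 thus chooses per bin between the FFT phase (phase reset) and the previous output phase plus the measured phase change (phase continuation). A minimal sketch; `select_phase` and its argument names are hypothetical, and the phase change is assumed to be the two-FFT difference described in the text.

```python
def select_phase(reset_flags, fft_phase, phase_change, prev_out_phase):
    """Per-bin output phase for one frame.

    reset_flags[b] True  -> phase reset: use the FFT phase as is.
    reset_flags[b] False -> phase continuation: previous output phase plus the
                            phase change measured between the two FFTs.
    """
    return [
        fft if reset else prev + change
        for reset, fft, change, prev in zip(reset_flags, fft_phase, phase_change, prev_out_phase)
    ]
```

Resetting keeps the original inter-bin phase relationships at an attack, while continuation keeps each bin smooth across frames elsewhere.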

The inverse FFT unit 23 converts each frequency component, after phase calculation by the phase calculation unit 22, back into audio data (frequency inverse conversion step); that is, it converts the frequency-domain amplitude and phase into time-domain sound.

The time expansion/contraction calculation unit 24 increases or decreases the number of data samples in proportion to the time expansion/contraction rate during the frequency inverse conversion by the inverse FFT unit 23 (time expansion/contraction calculation step). Specifically, it performs time expansion/contraction so as to cancel the time expansion/contraction applied to the audio data by the SRC unit 12. This is achieved by shifting the time-domain audio data computed by the inverse FFT unit 23 by the number of samples used as the shift at FFT time, scaled in proportion to the time expansion/contraction rate. The audio data processed by the time expansion/contraction calculation unit 24 is output as the converted sound.

For stereo playback, in this embodiment each unit (the SRC unit 12, FFT unit 21, phase calculation unit 22, inverse FFT unit 23, and time expansion/contraction calculation unit 24) processes the left and right channels independently.

Next, the flow of the pitch shift process according to the first embodiment will be described with reference to the flowcharts of FIGS. 3 and 4. First, the audio data processing unit 3 performs initialization (setting the FFT operation count i = 1, S01) and acquires audio data from the input buffer 4a (S02). The SRC unit 12 then performs SRC processing (S03), after which time scaling begins from S04 onward.

Time scaling first multiplies the data by the input window function (a Hanning window) (S04) and performs the i-th FFT (S05). The frequency component index, that is, the FFT frequency bin number j, is set to j = 0 (S06), and the phase and amplitude are calculated (S07). Steps S03 to S07 above are processing steps of the FFT unit 21.

Next, the audio data processing unit 3 performs phase calculation with the phase calculation unit 22 (S08); this process is described later with reference to FIG. 4. After the phase calculation, the audio data processing unit 3 converts the amplitude and phase into a complex number (S09) and determines whether the FFT frequency bin number j has reached half the number of FFT samples n_FFT, that is, whether j == n_FFT/2 (S10). If it has not (S10: No), j is incremented (S11) and the process returns to S07. If it has (S10: Yes), the complex conjugates of the complex data are used as the complex data for the remaining half, the negative frequency components, and an IFFT is performed (S12). Steps S09 to S12 are processing steps of the inverse FFT unit 23.
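Steps S09 to S12 amount to the following NumPy sketch (the function name is hypothetical). It fills the negative-frequency half with complex conjugates as S12 describes; in practice `numpy.fft.irfft` performs the same conjugate-symmetric completion internally.

```python
import numpy as np

def rebuild_and_ifft(amp, phase):
    """S09-S12 sketch: amplitude/phase for bins 0..n_FFT/2 -> complex numbers ->
    full conjugate-symmetric spectrum -> IFFT -> real time-domain frame."""
    half = amp * np.exp(1j * phase)                # S09: polar to complex
    n_fft = 2 * (len(half) - 1)
    full = np.empty(n_fft, dtype=complex)
    full[:len(half)] = half                        # bins 0..n_fft/2
    # S12: negative-frequency bins n_fft/2+1..n_fft-1 are the complex
    # conjugates of bins n_fft/2-1..1, taken in reverse order
    full[len(half):] = np.conj(half[-2:0:-1])
    return np.fft.ifft(full).real                  # imaginary part is ~0
```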

Next, the audio data processing unit 3 multiplies the result by the output window function (a Hanning window) (S13) and, to cancel the SRC rate, moves the output pointer by the input overlap number multiplied by the time expansion/contraction rate (time stretch rate) (S14). The result is then written to the output buffer 4b (added to its existing contents, S15) and output as the converted sound. Steps S13 to S15 are processing steps of the time expansion/contraction calculation unit 24. In this example the output window function is the same Hanning window used before the FFT, but this is not required; a different window function may be chosen.
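The windowing and overlap-add of S13 to S15 can be sketched as below. The sketch assumes the fixed 512-sample output hop of FIG. 5 and a periodic Hann output window. The gain of 1/1.5 is an assumption not stated in the text: it compensates for the summed squared Hann windows (analysis times synthesis) at a hop of one quarter of the frame, so that an unmodified signal is reconstructed exactly.

```python
import numpy as np

N_FFT = 2048
HOP_OUT = 512
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)  # periodic Hann
OLA_GAIN = 1.0 / 1.5   # assumed normalization: sum of Hann^2 at hop N_FFT/4 is 1.5

def overlap_add(frames, hop=HOP_OUT):
    """S13-S15 sketch: window each IFFT frame and add it into the output buffer."""
    out = np.zeros(hop * (len(frames) - 1) + N_FFT)
    for i, frame in enumerate(frames):
        start = i * hop                                          # S14: output pointer
        out[start:start + N_FFT] += frame * WINDOW * OLA_GAIN    # S13 and S15
    return out
```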

The audio data processing unit 3 then moves the input pointer by the input overlap number (S16) and determines whether audio data remains in the input buffer 4a (S17). If data remains (S17: data present), the FFT operation count i is incremented (S18) and the process returns to S02. If no data remains (S17: no data), the pitch shift process ends.

Next, the phase calculation process corresponding to S08 in FIG. 3 will be described with reference to FIG. 4. The audio data processing unit 3 (phase calculation unit 22) first calculates the amplitude difference (S21) and then the normalized amplitude difference value (S22); that is, it divides the time rate of change of the amplitude by the amplitude itself. If the amplitude is 0 or extremely small, however, the division either cannot be performed or may give an inappropriate result, so as exception handling the normalized amplitude difference value is set to 0. The unit then determines whether the sum of the i-th normalized amplitude difference values (denoted "Σi" in FIG. 4) is at or above the high threshold, at or above the low threshold but below the high threshold, or below the low threshold (S23, first phase switching determination process).
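The normalized amplitude difference of S21 and S22 might look like the sketch below. Two details are assumptions because the text leaves them open: the divisor is taken to be the current frame's amplitude, and the small-amplitude cutoff `eps` is a hypothetical value. The frame total used at S23 (Σi in FIG. 4) is then simply the sum of the returned array.

```python
import numpy as np

EPS = 1e-9  # hypothetical cutoff for "zero or very small" amplitudes

def normalized_amp_diff(amp_prev, amp_cur, eps=EPS):
    """S21-S22 sketch: (amp_cur - amp_prev) / amp_cur per bin, set to 0 where
    the amplitude is too small to divide by (exception handling)."""
    diff = amp_cur - amp_prev                 # S21: amplitude difference
    out = np.zeros_like(diff, dtype=float)
    ok = amp_cur > eps
    out[ok] = diff[ok] / amp_cur[ok]          # S22: normalize by the amplitude
    return out
```

Because the difference is divided by the amplitude, scaling both frames by the same factor leaves the result unchanged, which is why the text says quiet attacks are still detected.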

If the sum Σi for the i-th frame is at or above the high threshold (S23: high threshold or above), the unit determines whether the sum Σi-1 for the (i−1)-th frame was also at or above the high threshold (S24). If it was not (S24: No), the phase reset process is performed for all frequency bands (S30); if it was (S24: Yes), the phase continuation process is performed (S31). In other words, the all-frequency-band detection unit 31a determines that an attack portion has been detected when the binarized value for the (i−1)-th calculation is 0 and that for the i-th calculation is 1, and the phase reset processing unit 32 then performs phase reset processing, taking the calculation result of the FFT unit 21 itself as the phase of each frequency component. If no attack portion is detected, the phase continuation processing unit 33 performs phase continuation processing, treating the phase of each frequency component as having changed continuously from the previous calculation result of the FFT unit 21, with time expansion/contraction taken into account.

If the sum of the normalized amplitude difference values is at or above the low threshold but below the high threshold (S23: low threshold or above, below high threshold), the normalized amplitude difference value of each frequency component is binarized with the low threshold (S25), restricted to the high frequency range (S26), and used to determine whether per-frequency reset (A) is needed (S27, second phase switching determination process). If per-frequency reset (A) is needed (S27: Yes), phase reset processing is performed for each such frequency component (S30); if it is not (S27: No), phase continuation processing is performed (S31). That is, the per-frequency-band detection unit (A) 31b determines that an attack portion has been detected when the (i−1)-th binarized value is 0 and the i-th binarized value is 1, and the phase reset processing unit 32 performs the phase reset process; if no attack portion is detected, the phase continuation processing unit 33 performs the phase continuation process.

Finally, if the sum of the normalized amplitude difference values is below the low threshold (S23: below low threshold), the normalized amplitude difference value of each frequency component is binarized with the high threshold (S28) and used to determine whether per-frequency reset (B) is needed (S29, third phase switching determination process). If it is needed (S29: Yes), phase reset processing is performed (S30); if it is not (S29: No), phase continuation processing is performed (S31). That is, the per-frequency-band detection unit (B) 31c determines that an attack portion has been detected when the (i−1)-th binarized value is 0 and the i-th binarized value is 1; the phase reset processing unit 32 then performs the phase reset process, and if no attack portion is detected, the phase continuation processing unit 33 performs the phase continuation process. The "phase calculation process determination step" in the claims corresponds to S23 to S29, and the "phase calculation processing step" corresponds to S30 and S31.
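The three-way branch of S23 to S29 can be written as a single function that returns a per-bin reset mask, as sketched here. The threshold values and `high_band_start` (the first bin of the "high range" in S26) are hypothetical parameters. As the text describes for units 31a, 31b, and 31c, an attack is flagged only on a 0 to 1 transition of the binarized value between frames i−1 and i.

```python
import numpy as np

def reset_mask(nad_prev, nad_cur, low_th, high_th, high_band_start):
    """Per-bin mask: True -> phase reset (S30), False -> phase continuation (S31).
    nad_prev / nad_cur are the normalized amplitude differences of frames i-1, i."""
    sum_prev, sum_cur = nad_prev.sum(), nad_cur.sum()
    n_bins = len(nad_cur)
    if sum_cur >= high_th:                       # S23 -> S24: all-band decision
        return np.full(n_bins, sum_prev < high_th)
    if sum_cur >= low_th:                        # S25-S27: per-frequency reset (A)
        rising = (nad_prev < low_th) & (nad_cur >= low_th)
        rising[:high_band_start] = False         # S26: high range only
        return rising
    # S28-S29: per-frequency reset (B), binarized with the high threshold
    return (nad_prev < high_th) & (nad_cur >= high_th)
```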

Next, the phase continuation process will be described in detail. FIG. 5 is a conceptual diagram of the phase continuation process according to the first embodiment. The upper part of the figure shows 2048-sample FFT frames being input at intervals of the input overlap number N (N = (time expansion/contraction rate) × 512). The lower part shows 2048-sample IFFT frames being output at intervals of the output overlap number, fixed at 512. In the figure, "i" denotes the FFT operation count and "j" the FFT frequency bin number. The continuous-phase formula of the first embodiment is given as (Formula A) in the middle of the figure, where _{in2}θ_{i,j} denotes the input phase calculated by the FFT executed at the "i-th FFT2 execution timing" in the figure.

The "i-th FFT execution timing (current)" in the upper part of the figure is the execution timing of the first FFT, performed by the FFT unit 21, and the "i-th FFT2 execution timing" is that of the second FFT, performed by the phase continuation processing unit 33. The second FFT may also be performed before attack detection by the attack detection unit 31 (that is, before the phase calculation process determination step).

FIG. 12, by contrast, is a conceptual diagram of phase continuation processing according to a conventional example. Its continuous-phase formula is given as (Formula B) in the middle of the figure, where F_s denotes the sampling frequency.

As a comparison of FIG. 5 and FIG. 12 makes clear, the phase continuation process of the first embodiment performs two FFTs per frame (as the i-th frequency conversion process), whereas the conventional example performs only the one at the "i-th FFT execution timing (current)". The continuous-phase formula of the first embodiment (Formula A) is also simpler than the conventional one (Formula B). The conventional example estimates the true frequency from the phase change and then computes the phase after time expansion/contraction from the time expansion/contraction length; the first embodiment performs no such calculation and uses the phase difference between the two FFT results as-is to obtain the phase after time expansion/contraction.
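For contrast, the sketch below shows the textbook phase-vocoder update, which estimates each bin's true frequency and then rescales the phase advance. It is offered only as an assumed stand-in for the conventional approach of FIG. 12, since Formula B is not reproduced in the text. Frequencies here are in radians per sample, so the sampling frequency F_s cancels out. The wrapping step recovers the true frequency only if the deviation from the bin centre stays within ±π over one input hop, which is one plausible source of the error factors the text attributes to the conventional method.

```python
import numpy as np

def conventional_phase_update(phase_prev_in, phase_cur_in, out_phase_prev,
                              hop_in, hop_out, n_fft):
    """Textbook phase-vocoder update: estimate each bin's true frequency from the
    phase change over hop_in, then advance the output phase over hop_out."""
    k = np.arange(len(phase_cur_in))
    omega = 2 * np.pi * k / n_fft                          # bin centre (rad/sample)
    dev = phase_cur_in - phase_prev_in - omega * hop_in    # deviation from centre
    dev = np.angle(np.exp(1j * dev))                       # wrap to (-pi, pi]
    true_omega = omega + dev / hop_in                      # estimated frequency
    return out_phase_prev + true_omega * hop_out
```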

The phase continuation process of this embodiment is described further with reference to FIG. 5. As shown in the upper part of the figure, the "i-th FFT execution timing (current)" follows the "(i−1)-th FFT execution timing" of the FFT unit 21 by the input overlap number (N) of samples, whereas the "i-th FFT2 execution timing" follows that same "(i−1)-th FFT execution timing" by the output overlap number (512) of samples. The "i-th FFT2 execution timing" therefore lags the "i-th FFT execution timing (current)" by the time expansion/contraction change length (512 − N); when N exceeds 512 this value is negative and FFT2 precedes the current FFT.

As a user operation for time scaling, for example, when the playback speed can be varied between 0.5× and 2× (that is, the time expansion/contraction rate can be varied over 0.5 ≤ N/512 ≤ 2), varying the input overlap number N over 256 ≤ N ≤ 1024 realizes a master tempo (a change of playback speed without a change of pitch) from 0.5× to 2×. Consequently, the interval between the "i-th FFT2 execution timing" and the "(i−1)-th FFT execution timing" remains 512 samples regardless of the time expansion/contraction rate, while the interval between the "i-th FFT execution timing (current)" and the previous "(i−1)-th FFT execution timing" varies.
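The timing relations of FIG. 5 reduce to simple arithmetic, sketched below with a hypothetical helper. The input hop is N = speed × 512, FFT2 always sits 512 samples after the (i−1)-th FFT, and its offset from the current i-th FFT is 512 − N.

```python
OUT_HOP = 512   # fixed output overlap number from FIG. 5

def hops_for_speed(speed):
    """Return (input overlap N, offset of FFT2 from the current i-th FFT) in samples."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5x and 2x")
    n_in = round(speed * OUT_HOP)    # N = (time expansion/contraction rate) x 512
    fft2_offset = OUT_HOP - n_in     # FFT2 is OUT_HOP after the (i-1)-th FFT
    return n_in, fft2_offset
```

For example, `hops_for_speed(0.75)` returns `(384, 128)`, and `hops_for_speed(2.0)` returns `(1024, -512)`, the negative offset meaning FFT2 is taken before the current frame.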

As described above, according to the first embodiment, the presence or absence of an attack portion is detected from the normalized amplitude difference values, and depending on the result either the phase reset process, which resets the phase using the FFT calculation result itself, or the phase continuation process, which makes the phase continuous from the previous FFT calculation result with time expansion/contraction taken into account, is performed. This enables high-quality sound conversion.

Furthermore, in the phase continuation process executed when no attack portion is detected, the phase change amount is obtained from the difference between the phases given by the two FFTs, and the phase after time expansion/contraction is estimated from that amount. This involves fewer error factors than the conventional phase calculation (see FIG. 12), and thus prevents the sound-quality degradation that phase continuation with the FFT method would otherwise cause. In addition, the time expansion/contraction length of the time scaling unit 11 is calculated from the time expansion/contraction rate (N/512) multiplied by the output overlap number (512), which gives N, the input overlap number; the time expansion/contraction length (time scaling amount) can therefore be varied by varying N.

Because the attack portion is detected using the normalized amplitude difference value, obtained by dividing the time rate of change of the amplitude by the amplitude, it can be detected accurately and reliably even when the original sound is quiet. Moreover, because the normalized amplitude difference value is used in three phase switching determination processes covering different frequency ranges, attack portions are detected more reliably. For example, a sum of normalized amplitude difference values at or above the high threshold means that the frequency components requiring a phase reset are spread over a wide range, so performing the first phase calculation process over all frequency bands handles the attack reliably. When the sum is below the high threshold, the presence or absence of an attack portion is detected per frequency band, so even small attacks are detected reliably.

In addition, because the SRC unit 12 uses the sampling-rate conversion method, no frequency shift is performed in the frequency domain, and the phase calculated by the FFT can itself be used as the phase of the pitch-shifted sound at attack portions. The phase reset process therefore prevents loss of the sense of attack. Because the frequency shift also introduces few error factors, sound-quality degradation outside attack portions is avoided compared with a conventional FFT method that does not use sampling-rate conversion, enabling high-quality pitch shifting.

[Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to FIG. 6. The first embodiment detected attack portions from the normalized amplitude difference value derived from the time rate of change of the amplitude; this embodiment differs in detecting them from a phase discontinuity degree derived from the time rate of change of the phase. Only the differences from the first embodiment are described below. Components identical to those of the first embodiment carry the same reference signs and are not described in detail, and modifications applicable to those components apply equally to this embodiment.

The audio data processing unit 3 of this embodiment omits the all-frequency-band detection unit 31a, the per-frequency-band detection unit (A) 31b, and the per-frequency-band detection unit (B) 31c from the functional configuration of the first embodiment shown in FIG. 2 (not shown). With this configuration, the phase calculation unit 22 of this embodiment performs phase calculation according to the computed time rate of change of the phase (phase calculation step). Specifically, the attack detection unit 31 uses the phase discontinuity degree, which is the time rate of change of the phase, to determine whether a phase discontinuity has occurred, and phase calculation is performed according to the result. If a phase discontinuity has occurred, an attack portion is judged "present" and the phase reset processing unit 32 performs phase reset processing; if not, the attack portion is judged "absent" and the phase continuation processing unit 33 performs phase continuation processing.

FIG. 6 is a flowchart of the phase calculation process according to the second embodiment. The main flow of the pitch shift process shown in FIG. 3 is the same as in the first embodiment and is not shown again. The audio data processing unit 3 (phase calculation unit 22) of this embodiment first calculates the second-order difference of the phase (S41) and then the phase discontinuity degree (S42). It determines whether a phase discontinuity (an attack portion) is present according to whether the phase discontinuity degree is at or above a predetermined threshold (S43). If it is (S43: present), the phase reset processing unit 32 performs phase reset processing, taking the calculation result of the FFT unit 21 itself as the phase of each frequency component (S44). If it is below the threshold (S43: absent), the phase continuation processing unit 33 performs phase continuation processing, treating the phase of each frequency component as having changed continuously from the previous calculation result of the FFT unit 21, with time expansion/contraction taken into account (S45).
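S41 to S43 can be sketched as follows. The text does not define the phase discontinuity degree beyond "second-order difference of the phase", so the measure used here (the absolute wrapped second difference of each bin's phase over three consecutive analysis frames, assuming a constant input hop) and the 1-radian threshold are assumptions. For a steady sinusoid the per-hop phase advance is constant, so the second difference is close to 0; an attack breaks that pattern.

```python
import numpy as np

def phase_discontinuity(ph_2, ph_1, ph_0):
    """Assumed measure: |wrapped second-order difference| of each bin's phase
    over frames i-2 (ph_2), i-1 (ph_1) and i (ph_0), for a constant hop."""
    d2 = ph_0 - 2 * ph_1 + ph_2                 # S41: second-order difference
    return np.abs(np.angle(np.exp(1j * d2)))    # S42: wrapped magnitude in [0, pi]

def reset_bins(ph_2, ph_1, ph_0, threshold=1.0):
    """S43 sketch: True -> phase reset (S44), False -> phase continuation (S45)."""
    return phase_discontinuity(ph_2, ph_1, ph_0) >= threshold
```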

As described above, by using the time rate of change of the phase, this embodiment can detect attack portions accurately regardless of the volume of the original sound. Moreover, because processing only needs to branch two ways depending on whether a phase discontinuity is present, high-quality time scaling and pitch shifting can be achieved with little computation.

In the embodiment above, the audio data processing unit 3 omits the all-frequency-band detection unit 31a, the per-frequency-band detection unit (A) 31b, and the per-frequency-band detection unit (B) 31c from the functional configuration of the first embodiment shown in FIG. 2, but it may instead have the same functional configuration as the first embodiment. In that case, the all-frequency-band detection unit 31a, the per-frequency-band detection unit (A) 31b, and the per-frequency-band detection unit (B) 31c use the phase discontinuity degree to perform several phase switching determination processes with different criteria, these processes determine whether a phase discontinuity has occurred, and phase calculation is performed according to the result. The phase calculation flow in this case is the same as that of the first embodiment shown in FIG. 4 with the "amplitude difference" of S21 replaced by a "phase difference" and the "normalized amplitude difference value" of S22 replaced by the "phase discontinuity degree", so it is not illustrated.

The attack detection method of the first embodiment, based on the normalized amplitude difference value, and that of the second embodiment, based on the phase discontinuity degree, may also be combined to detect attack portions. This configuration further improves the accuracy of attack detection.

[Third Embodiment]
Next, a third embodiment of the present invention will be described with reference to FIGS. 7 to 11. In this embodiment, the phase calculation process includes a peak phase maintenance process in addition to the phase reset process and the phase continuation process. The "peak phase maintenance process" maintains the phase relationship between a spectral peak of the frequency spectrum and the neighboring frequency bands close to that peak. The following mainly describes how the spectral peaks subject to the peak phase maintenance process are determined and how it is decided whether to maintain the phase relationship. Here too, components identical to those of the embodiments above carry the same reference signs and are not described in detail, and modifications applicable to those components apply equally to this embodiment.

FIG. 7 is a block diagram of the audio data processing unit 3 according to the third embodiment. The audio data processing unit 3 of this embodiment adds a spectral peak detection unit 34, a phase difference calculation unit 35, a phase determination unit 36, and a peak phase maintenance processing unit 37 to the functional configuration of the first embodiment shown in FIG. 2.

The spectral peak detection unit 34 detects spectral peaks, at which the amplitude is a local maximum, in the frequency spectrum produced by the FFT of the FFT unit 21 (first frequency conversion step) (spectral peak detection step). In this embodiment, however, it detects not momentary maxima but spectral peaks that persist over time (hereinafter "continuing-tone peaks"). Specifically, a spectral peak is detected as a continuing-tone peak when, in the previous calculation result of the FFT unit 21, the amplitude was a local maximum at the same frequency band or at a neighboring band close to it, and, in the current calculation result, the amplitude is also a local maximum. The "neighboring frequency bands" used for this determination are the adjacent bands (the two bins, one on each side).
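Continuing-tone peak detection as described above might be sketched like this (function names hypothetical). A bin counts as a local maximum when its amplitude exceeds both neighbours, and it is a continuing-tone peak when the previous frame had a local maximum at the same bin or one bin to either side.

```python
import numpy as np

def local_maxima(amp):
    """Boolean mask of bins whose amplitude exceeds both adjacent bins."""
    peak = np.zeros(len(amp), dtype=bool)
    peak[1:-1] = (amp[1:-1] > amp[:-2]) & (amp[1:-1] > amp[2:])
    return peak

def continuing_peaks(amp_prev, amp_cur):
    """Indices of current local maxima that were local maxima in the previous
    frame at the same bin or at one of the two adjacent bins."""
    prev = local_maxima(amp_prev)
    near_prev = prev.copy()
    near_prev[1:] |= prev[:-1]    # previous peak one bin lower
    near_prev[:-1] |= prev[1:]    # previous peak one bin higher
    return np.flatnonzero(local_maxima(amp_cur) & near_prev)
```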

The phase difference calculation unit 35 calculates the phase difference between the phase of a continuing-tone peak detected by the spectral peak detection unit 34 and the phase of each neighboring frequency band close to that peak (phase difference calculation step).

The phase determination unit 36 determines, according to the phase difference calculated by the phase difference calculation unit 35, whether to maintain the phase relationship between a continuing-tone peak and its neighboring frequency bands (phase determination step). Specifically, if the phase difference lies within a predetermined threshold of a reference value determined by the input window function applied before the FFT by the FFT unit 21 (see S04 in FIG. 3), it determines that the phase relationship between the spectral peak and the neighboring bands is to be maintained. Conversely, if the phase difference deviates from the reference value by more than the threshold, it determines that the relationship is not to be maintained. This can happen, for example, when another sound lies close to the spectral peak (when two tones are close together). The spectral peak and the neighboring bands are then only weakly related, so not maintaining the phase relationship prevents sound-quality degradation. When the input window function is a Hanning window, the reference value for the phase difference is the antiphase (π). This is briefly explained with reference to FIGS. 8 and 9.

FIGS. 8 and 9 show the frequency samples of frequency bins "i−1", "i", and "i+1" in the upper, middle, and lower rows, respectively. As shown by the dotted ellipses in FIG. 8, if the phases computed by the FFT were the same for all three bins, the components would be in antiphase at the center position (the peak position of the window). Therefore, as shown in FIG. 9, when the phases are chosen so that the components align at the center position, reproducing the windowed state (maximum at the center), the FFT phases of the bins on both sides of bin "i" are inverted (see dotted ellipses). That is, as shown in FIG. 8, when a window function whose value is 0 at both ends is used, the phase of the adjacent frequency band is opposite to that of the spectral peak (continuous sound peak). For this reason, the phase determination unit 36 uses antiphase (π) as the "reference value" when determining whether to maintain the phase relationship. If the input window function is not 0 at both ends, the reference value is set according to that window function.
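The antiphase reference can be checked numerically. The sketch below uses only the Python standard library and assumes a periodic Hann window (0 at the first sample, maximum at the frame center) with FFT phases referenced to the first sample of the frame. It computes single DFT bins directly and reports the phase of bins i−1 and i+1 relative to the peak bin i. It also includes a hypothetical `keep_phase_relationship` helper for the phase determination step; the threshold of π/4 is an illustrative value, not one given in the text.

```python
import cmath
import math


def wrap_phase(x):
    """Wrap an angle to [-pi, pi] (math.remainder rounds to the nearest multiple of 2*pi)."""
    return math.remainder(x, 2.0 * math.pi)


def periodic_hann(n_fft):
    """Hann window that is 0 at the frame start and peaks at the frame center n_fft/2."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]


def dft_bin(x, k):
    """One DFT bin, computed directly so the sketch needs no third-party FFT."""
    n_fft = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))


def neighbour_offsets(freq_bins, n_fft=256, phase0=0.7):
    """Phase of bins i-1 and i+1 relative to the peak bin i for a Hann-windowed cosine
    whose frequency is freq_bins (in units of bins, possibly fractional)."""
    w = periodic_hann(n_fft)
    x = [w[n] * math.cos(2.0 * math.pi * freq_bins * n / n_fft + phase0) for n in range(n_fft)]
    i = round(freq_bins)  # bin closest to the tone = spectral peak
    peak_phase = cmath.phase(dft_bin(x, i))
    return [wrap_phase(cmath.phase(dft_bin(x, k)) - peak_phase) for k in (i - 1, i + 1)]


def keep_phase_relationship(peak_phase, neighbour_phase, reference=math.pi, threshold=math.pi / 4):
    """Phase determination: keep the peak/neighbour phase relationship if their phase
    difference lies within `threshold` of `reference` (pi for a Hann window)."""
    deviation = wrap_phase((neighbour_phase - peak_phase) - reference)
    return abs(deviation) <= threshold


if __name__ == "__main__":
    # The magnitude of each printed offset should be close to pi (about 3.1416).
    for f in (20.0, 20.3, 47.6):
        print(f, [round(abs(d), 4) for d in neighbour_offsets(f)])
```

The offset stays at π even when the tone lies between bins, because a Hann window centered at n_fft/2 adds a linear phase of −πk to bin k while its zero-phase spectrum is real and positive across the main lobe.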

Returning to FIG. 7: when the phase determination unit 36 determines that the phase relationship is to be maintained, the peak phase maintaining processing unit 37 applies the peak phase maintaining process to the adjacent frequency band. More specifically, the phase continuation process is applied to the continuous sound peak itself, and the peak phase maintaining process is applied only to the adjacent frequency band. The phase estimation method of the first embodiment (see FIG. 5) can be used for the phase continuation process. On the other hand, when the phase determination unit 36 determines that the phase relationship is not to be maintained, the phase continuation processing unit 33 of this embodiment applies the same phase continuation process as in the first embodiment.

The phase estimation method of the peak phase maintaining process is now described with reference to FIG. 10, which is a conceptual diagram of that process. In this embodiment, the three bins on each side of the continuous sound peak (frequency bin "i"), that is, the six bins "i−3", "i−2", "i−1", "i+1", "i+2", and "i+3", are treated as the adjacent frequency band and subjected to the peak phase maintaining process. In the figure, the horizontal direction is the time axis (each column of seven cells, indicated by a vertical arrow, represents the data of one FFT).

FIG. 10 illustrates time scaling that stretches the time by 3%, from the state before time expansion/contraction shown in the upper row to the state after time expansion/contraction shown in the lower row. The hollow circular arrows on the horizontal arrow for frequency bin "i" indicate the phase change estimated by applying the phase continuation process to the continuous sound peak (bin "i") alone. The vertical arrows indicate that, in the peak phase maintaining process, this phase change of the continuous sound peak is carried over to the adjacent frequency band (the six bins on both sides). Accordingly, when bin j = i is the continuous sound peak, let $\theta_i$ be the peak phase computed from the complex FFT data and $\theta'_i$ the phase of bin "i" after time expansion/contraction computed by the phase continuation process. The phase of the adjacent bin "i+1" after time expansion/contraction is then

$$\theta'_{i+1} = \theta'_i + (\theta_{i+1} - \theta_i)$$

The other five bins are computed in the same way, $\theta'_j = \theta'_i + (\theta_j - \theta_i)$. In this manner, the peak phase maintaining process rotates each adjacent bin by the same phase change that the phase continuation process applied to the continuous sound peak between the previous and current frames, so the original phase offset between the peak and each adjacent bin is preserved after time expansion/contraction.
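The update above can be sketched in a few lines. This is an illustration under stated assumptions: `peak_phase_maintain` is a hypothetical name, phases are plain radians without wrapping, and $\theta'_i$ is assumed to come from a phase continuation step computed elsewhere.

```python
def peak_phase_maintain(orig_phases, peak_bin, new_peak_phase, half_width=3):
    """Peak phase maintaining process for the bins around a continuous sound peak.

    orig_phases    -- phases theta_j of the current FFT frame (radians)
    peak_bin       -- index i of the continuous sound peak
    new_peak_phase -- theta'_i, the stretched peak phase from phase continuation
    half_width     -- bins on each side treated as the adjacent band (3 in the text)
    Returns a dict {bin j: theta'_j} for the adjacent bins only.
    """
    theta_i = orig_phases[peak_bin]
    new_phases = {}
    for j in range(peak_bin - half_width, peak_bin + half_width + 1):
        if j == peak_bin or not 0 <= j < len(orig_phases):
            continue  # skip the peak itself and bins outside the spectrum
        # theta'_j = theta'_i + (theta_j - theta_i): same rotation as the peak
        new_phases[j] = new_peak_phase + (orig_phases[j] - theta_i)
    return new_phases


if __name__ == "__main__":
    phases = [0.1 * k for k in range(12)]
    print(peak_phase_maintain(phases, peak_bin=5, new_peak_phase=2.0))
```

Near the edges of the spectrum the sketch simply skips bins that do not exist; the text does not say how such bins are handled.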

Next, the flow of the phase calculation processing according to the third embodiment is described with reference to FIG. 11. The main flow of the pitch shift processing shown in FIG. 3 is the same as in the first embodiment, so only the differences are described. The audio data processing unit 3 (phase calculation unit 22) of this embodiment first sets the FFT frequency bin number j to 1 (S51) and determines whether bin j is a continuous sound peak (S52), that is, whether the amplitude was a local maximum at the same bin or an adjacent bin in the previous calculation and is also a local maximum in the current calculation. If bin j is a continuous sound peak (S52: Yes), the phase continuation process is performed (S53).
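The continuous sound peak test in S52 can be sketched as follows. This is a minimal version under assumptions not fixed by the text: a "local maximum" is taken to be strictly greater than both neighbours, the first and last bins are never peaks, and the function names are hypothetical.

```python
def is_local_max(mag, j):
    """True if bin j is a strict local maximum of the magnitude spectrum
    (edge bins are excluded because they lack a neighbour on one side)."""
    return 0 < j < len(mag) - 1 and mag[j] > mag[j - 1] and mag[j] > mag[j + 1]


def is_continuous_sound_peak(prev_mag, cur_mag, j):
    """S52: bin j is a local maximum in the current frame, and the previous frame
    had a local maximum at bin j or at an adjacent bin (j - 1 or j + 1)."""
    if not is_local_max(cur_mag, j):
        return False
    return any(is_local_max(prev_mag, k) for k in (j - 1, j, j + 1))


if __name__ == "__main__":
    prev = [0, 1, 5, 1, 0, 0]   # peak at bin 2 in the previous frame
    cur = [0, 0, 2, 6, 1, 0]    # peak has drifted to bin 3
    print([is_continuous_sound_peak(prev, cur, j) for j in range(len(cur))])
```

Allowing the previous peak to sit one bin away tolerates slight frequency drift of a sustained tone between frames.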

If bin j is not a continuous sound peak (S52: No), it is determined whether a phase reset is necessary (S54). The phase determination method of the first embodiment (see S23 to S29 in FIG. 4) can be used for this determination. If a phase reset is necessary (S54: Yes), the phase reset process is performed (S55). If a phase reset is not necessary (S54: No), it is determined whether bin j lies in the vicinity of a continuous sound peak (S56), that is, within the adjacent frequency band of three bins on either side of a continuous sound peak.

If bin j is not in the vicinity of a continuous sound peak (S56: No), the phase continuation process is performed (S53). If it is (S56: Yes), it is determined whether the continuous sound peak is a single tone (S57): the peak is judged to be a single tone when the phase difference between the continuous sound peak and the adjacent frequency band is within the predetermined threshold of the reference value (antiphase). If the peak is a single tone (S57: single tone), the peak phase maintaining process is performed (S58). If it is a compound tone (S57: compound tone), the phase continuation process is performed (S53).
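The branching from S52 to S58 reduces to a small decision function. In this sketch the four boolean inputs are assumed to be computed elsewhere (for example by the continuous sound peak test, the first embodiment's reset determination, and the phase determination against the antiphase reference), and all names are hypothetical.

```python
from enum import Enum


class PhaseProcess(Enum):
    CONTINUATION = "phase continuation"        # S53
    RESET = "phase reset"                      # S55
    PEAK_MAINTAIN = "peak phase maintaining"   # S58


def choose_phase_process(is_peak, reset_needed, near_peak, single_tone):
    """Per-bin choice of phase calculation process following FIG. 11."""
    if is_peak:                       # S52: Yes -> continuous sound peak
        return PhaseProcess.CONTINUATION
    if reset_needed:                  # S54: Yes
        return PhaseProcess.RESET
    if not near_peak:                 # S56: No -> outside the +/-3 bin band
        return PhaseProcess.CONTINUATION
    if single_tone:                   # S57: single tone
        return PhaseProcess.PEAK_MAINTAIN
    return PhaseProcess.CONTINUATION  # S57: compound tone


if __name__ == "__main__":
    print(choose_phase_process(False, False, True, True))
```

Because the peak test comes first, a continuous sound peak always receives phase continuation, even in a frame where a reset would otherwise be chosen.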

After the phase calculation process (S53, S55, or S58), the amplitude and phase are converted into a complex number (S59), as in the first embodiment shown in FIG. 3. It is then determined whether the FFT frequency bin number j has reached half the number of FFT samples, n_FFT (S60). If not (S60: No), j is incremented (S61) and the process returns to S52. If it has (S60: Yes), the process proceeds to S12 in FIG. 3.
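The bin loop (S51, S59 to S61) can be sketched as below. This assumes bins are indexed so that j runs from 1 to n_FFT/2 inclusive, exactly as the flowchart counts them; handling of any other bin (such as DC) is not described here and is left out. The `phase_for_bin` callback is a hypothetical stand-in for S52 to S58.

```python
import cmath
import math


def synthesize_bins(amplitudes, phase_for_bin, n_fft):
    """Visit bins j = 1 .. n_fft/2, take each bin's phase from the chosen phase
    calculation process, and convert amplitude and phase to a complex value."""
    spectrum = {}
    j = 1                                                # S51
    while True:
        phase = phase_for_bin(j)                         # S52-S58 (supplied by caller)
        spectrum[j] = cmath.rect(amplitudes[j], phase)   # S59: polar -> complex
        if j >= n_fft // 2:                              # S60: reached n_FFT/2?
            break
        j += 1                                           # S61
    return spectrum


if __name__ == "__main__":
    amps = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(synthesize_bins(amps, lambda j: math.pi / 2, n_fft=8))
```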

As described above, according to this embodiment, when the phase reset process is not performed, either the phase continuation process or the peak phase maintaining process is applied according to the phase difference between the spectral peak and the adjacent frequency band, which enables high-quality time scaling. A phase difference within the predetermined threshold of the reference value indicates a strong correlation, so maintaining the phase relationship yields high-quality sound. A phase difference deviating from the reference value by more than the threshold indicates a weak correlation, so not maintaining the phase relationship prevents degradation of sound quality. Furthermore, it is determined whether the spectral peak continues over time, and only when it does is its adjacent frequency band subjected to the peak phase maintaining process. This prevents the degradation that would result from maintaining the phase relationship for sounds that do not continue over time.

In the third embodiment, the phase continuation processing unit 33 performs the second frequency conversion for the phase continuation process (the first frequency conversion being performed by the FFT unit 21). Alternatively, the phase calculation unit 22 may perform two frequency conversions in addition to the frequency conversion by the FFT unit 21.

In the third embodiment, the phase continuation process uses the same phase estimation method as the first embodiment (see FIG. 5), but the conventional phase estimation method (see FIG. 12) may be used instead. That is, the true frequency may be estimated from the amount of phase change, and the phase after time expansion/contraction may then be estimated from the time expansion/contraction length.
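For comparison, the conventional method mentioned here corresponds to the textbook phase-vocoder update: estimate the true frequency of a bin from its wrapped phase advance between two analysis frames, then advance the output phase over the stretched hop. The sketch below assumes an analysis hop `hop_in` and a synthesis hop `hop_out` (roughly `hop_in` times the stretch ratio); these names are hypothetical, and this is not the FIG. 5 method of the first embodiment.

```python
import math


def conventional_phase_estimate(prev_phase, cur_phase, prev_out_phase, k, n_fft, hop_in, hop_out):
    """Textbook phase-vocoder estimate for bin k.

    prev_phase, cur_phase -- analysis phases of bin k in consecutive frames (radians)
    prev_out_phase        -- synthesis phase of bin k in the previous output frame
    hop_in, hop_out       -- analysis and synthesis hop sizes in samples
    Returns (new_out_phase, estimated true frequency in radians per sample).
    """
    omega_k = 2.0 * math.pi * k / n_fft          # center frequency of bin k
    expected = omega_k * hop_in                  # advance if the tone sat exactly on bin k
    # Wrapped deviation from the expected advance; valid while |deviation| < pi.
    deviation = math.remainder(cur_phase - prev_phase - expected, 2.0 * math.pi)
    omega_true = omega_k + deviation / hop_in    # estimated true frequency
    new_out_phase = prev_out_phase + omega_true * hop_out
    return new_out_phase, omega_true


if __name__ == "__main__":
    n_fft, hop_in, hop_out = 256, 64, 66         # about a 3% stretch
    omega0 = 2 * math.pi * 10.3 / n_fft          # tone between bins 10 and 11
    prev = math.remainder(0.5, 2 * math.pi)
    cur = math.remainder(0.5 + omega0 * hop_in, 2 * math.pi)
    print(conventional_phase_estimate(prev, cur, 1.0, 10, n_fft, hop_in, hop_out))
```

The frequency estimate is unambiguous only while the true frequency is close enough to the bin center that the deviation stays within ±π over one analysis hop.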

The phase determination method of the first embodiment (see S23 to S29 in FIG. 4) is used to determine whether a phase reset is necessary (the phase switching determination process). In this case, the plurality of phase switching determination processes may be performed using the sum of the normalized amplitude difference values of the left and right stereo channels. For example, when there is a level difference between the left and right channels, sound from a single source ends up with mismatched phases in the two channels unless both are reset at the same time. Making the determination from the summed result therefore synchronizes the phase reset timing of the left and right channels and prevents disturbance of the sound image (localization).

Alternatively, the plurality of phase switching determination processes may use both the sum of the normalized amplitude difference values for the left and right channels and the normalized amplitude difference value of each channel. With this configuration, using the per-channel values also takes the level difference between the channels into account, so disturbance of the sound image can be prevented more reliably.
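The key point of the two stereo variants is that one reset decision is shared by both channels. The sketch below assumes the per-channel normalized amplitude difference values are already computed (their definition belongs to the first embodiment). The threshold parameters are hypothetical, and combining the summed and per-channel tests with a logical OR is an illustrative choice, not something the text specifies.

```python
def stereo_phase_reset(diff_left, diff_right, sum_threshold, channel_threshold=None):
    """Return ONE phase-reset decision shared by the left and right channels.

    diff_left, diff_right -- normalized amplitude difference values of each channel
    sum_threshold         -- hypothetical threshold on the summed value
    channel_threshold     -- optional hypothetical per-channel threshold; when given,
                             the per-channel values are used as well (second variant)
    """
    reset = (diff_left + diff_right) >= sum_threshold
    if channel_threshold is not None:
        # Assumed combination: a strong change in either channel also triggers a reset.
        reset = reset or diff_left >= channel_threshold or diff_right >= channel_threshold
    return reset


if __name__ == "__main__":
    decision = stereo_phase_reset(0.9, 0.0, sum_threshold=1.0, channel_threshold=0.8)
    print({"left": decision, "right": decision})  # both channels reset together
```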

As a modification, even when an attack portion is detected in the phase switching determination process, the phase continuation process may be applied to continuing components, that is, components whose spectral peak continues over time. Excluding such components from the phase reset process makes sounds that continue through the attack less likely to be cut off.

When the total Σi of the normalized amplitude difference values is at or above the high threshold and an attack portion is detected (S24: No in FIG. 4), the phase reset process for the low-frequency components only may be delayed by a predetermined time. Low-frequency sounds have long periods, so their phase has not yet stabilized at the reset timing detected in preprocessing, and a reset at that timing has little effect; delaying it increases the effect of the phase reset process. This improves the sound quality of low-frequency sound that continues after the strike of a bass percussion instrument, such as the shell resonance of a bass drum.
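A delayed low-frequency reset can be expressed as a per-bin schedule. The text only says "a predetermined time", so the cutoff frequency (`low_cutoff_hz`) and the delay in frames (`delay_frames`) below are hypothetical values, and the attack detection itself is assumed to have happened elsewhere.

```python
def phase_reset_frames(attack_frame, n_bins, sample_rate, n_fft,
                       low_cutoff_hz=150.0, delay_frames=2):
    """Frame index at which each bin's phase is reset after a detected attack.

    Bins whose center frequency is below low_cutoff_hz are reset delay_frames
    later than the attack frame; all other bins are reset at the attack frame.
    """
    frames = []
    for k in range(n_bins):
        freq_hz = k * sample_rate / n_fft        # center frequency of bin k
        delay = delay_frames if freq_hz < low_cutoff_hz else 0
        frames.append(attack_frame + delay)
    return frames


if __name__ == "__main__":
    print(phase_reset_frames(attack_frame=10, n_bins=12, sample_rate=44100, n_fft=2048))
```

With 44.1 kHz audio and a 2048-point FFT, each bin spans about 21.5 Hz, so only the first seven bins fall below the assumed 150 Hz cutoff.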

In each of the above embodiments, the audio data processing unit 3 performs pitch shifting (time scaling) while analyzing the audio data written to the buffer memory 4 during playback by the playback unit 2, but it may instead read out and use data analyzed in advance. That is, pitch shifting (time scaling) may be performed in real time while a piece of music is played, or the whole piece or part of it may be pitch-shifted (time-scaled) using previously analyzed data.

Each component of the audio data processing unit 3 described above can also be provided as a program, which may be stored on various recording media (CD-ROM, flash memory, etc.). That is, a program that causes a computer to function as each component of the audio data processing unit 3, and a recording medium on which that program is recorded, are also within the scope of the present invention.

In each of the above embodiments, the audio data processing unit 3 is applied to the playback device 1, but it may also be applied to DJ equipment such as mixers, various electronic musical instruments, and computers (PC applications, tablet applications). It is also useful in audio processing devices that change pitch, such as karaoke machines, voice changers, and speech synthesizers. For example, in DJ equipment that plays different songs in succession, when the keys of consecutive songs are dissonant, applying the present invention improves the sound quality of harmonic mixing, which uses pitch shifting to move a song to a compatible key. Karaoke machines have a function that changes the key to suit the user's voice; to allow key changes without loss of quality, the sound source is often sequenced MIDI data, but applying the present invention enables high-quality key changes even when recorded acoustic audio is used as the source.
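A key change of this kind combines time scaling with sample-rate conversion so that the two length changes cancel and only the pitch moves. The arithmetic for a shift of n semitones is sketched below, assuming equal temperament (ratio 2^(n/12)) and the ordering "time-scale first, then resample". Both assumptions are illustrative, and `pitch_shift_plan` is a hypothetical name.

```python
def pitch_shift_plan(semitones, n_samples):
    """Length bookkeeping for pitch shift = time scaling + sample-rate conversion.

    r = 2 ** (semitones / 12) is the equal-temperament frequency ratio.
    Time scaling by r changes the length but not the pitch; converting the sample
    rate so that the result plays r times faster restores the length and scales
    the pitch by r.
    """
    r = 2.0 ** (semitones / 12.0)
    stretched = n_samples * r     # after time scaling: longer (r > 1), same pitch
    final = stretched / r         # after SRC: original length, pitch multiplied by r
    return r, stretched, final


if __name__ == "__main__":
    for n in (-12, 2, 7, 12):
        print(n, pitch_shift_plan(n, 44100))
```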

Time scaling alone can also be applied, for example to change only the duration of audio without changing its key. Applying the present invention to DJ equipment that plays different songs in succession improves the sound quality of time scaling that changes only the tempo of the songs without changing their key (pitch), known as master tempo. In devices that record and play back audio, it also improves the sound quality of a fast-listening function that keeps the key unchanged during high-speed playback. Other modifications can be made as appropriate without departing from the gist of the present invention.

Reference Signs List: 1 playback device; 2 playback unit; 3 audio data processing unit; 4 buffer memory; 4a input buffer; 4b output buffer; 5 audio data output unit; 11 time scaling unit; 12 SRC unit; 21 FFT unit; 22 phase calculation unit; 23 inverse FFT unit; 24 time expansion/contraction calculation unit; 31 attack detection unit; 32 phase reset processing unit; 33 phase continuation processing unit; 34 spectrum peak detection unit; 35 phase difference calculation unit; 36 phase determination unit; 37 peak phase maintaining processing unit

Claims

1. A time scaling method comprising executing: a first frequency conversion step of converting digital audio data into an amplitude and a phase for each frequency component;
a second frequency conversion step of converting the digital audio data into an amplitude and a phase for each frequency component at an execution timing that differs from the execution timing of the first frequency conversion step by a time expansion/contraction length; and
a phase estimation step of estimating a phase after time expansion/contraction, using the difference between the phase obtained in the first frequency conversion step and the phase obtained in the second frequency conversion step as the amount of phase change.

2. The time scaling method according to claim 1, wherein the time expansion/contraction length is calculated based on the product of a time expansion/contraction ratio and an output overlap number.

3. The time scaling method according to claim 1, further comprising: a phase calculation process determination step of determining, according to the results of a plurality of phase switching determination processes that each make a different phase switching determination using the calculated time rate of change of the amplitude and/or phase, which phase calculation process to perform: a phase reset process, in which the phase of each frequency component is taken directly from the calculation result of the first frequency conversion step, or a phase continuation process, in which the phase of each frequency component is treated as having changed continuously from the previous calculation result of the first frequency conversion step, taking time expansion/contraction into account;
and a phase calculation processing step of performing the phase reset process or the phase continuation process according to the determination result of the phase calculation process determination step,
wherein the first frequency conversion step, the second frequency conversion step, and the phase estimation step are executed when the phase continuation process is performed.

4. The time scaling method according to claim 3, wherein the plurality of phase switching determination processes determine the presence or absence of an attack portion in mutually different frequency bands,
and in the phase calculation processing step, the phase reset process is performed when the plurality of phase switching determination processes determine that an attack portion is present, and the phase continuation process is performed when they determine that no attack portion is present.

5. The time scaling method according to claim 3 or 4, further comprising: a frequency inverse conversion step of converting each frequency component after the phase calculation processing step back into digital audio data;
and a time expansion/contraction calculation step, performed during the inverse conversion in the frequency inverse conversion step, of increasing or decreasing the number of samples of the inversely converted digital audio data in proportion to the time expansion/contraction ratio.

6. A pitch shift method comprising: executing each step of the time scaling method according to any one of claims 1 to 5;
and executing a sampling rate conversion calculation step of changing the sampling frequency of the digital audio data to perform time expansion/contraction together with a pitch change,
wherein the time expansion/contraction in the steps of the time scaling method and the time expansion/contraction in the sampling rate conversion calculation step cancel each other, so that only the pitch is changed.

7. An audio data processing device comprising: first frequency conversion means for converting digital audio data into an amplitude and a phase for each frequency component;
second frequency conversion means for converting the digital audio data into an amplitude and a phase for each frequency component at an execution timing that differs from the execution timing of the first frequency conversion means by a time expansion/contraction length; and
phase estimation means for estimating a phase after time expansion/contraction, using the difference between the phase obtained by the first frequency conversion means and the phase obtained by the second frequency conversion means as the amount of phase change.

8. A program for causing a computer to execute each step of the time scaling method according to any one of claims 1 to 5.

9. A program for causing a computer to execute each step of the pitch shift method according to claim 6.