
WO2002003374A1 - A method for generating a musical tone - Google Patents

A method for generating a musical tone

Info

Publication number
WO2002003374A1
Authority
WO
WIPO (PCT)
Prior art keywords
note
musical
musical tone
based code
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/FI2001/000630
Other languages
English (en)
Inventor
Tero Tolonen
Matti Airas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Elmorex Ltd Oy
Original Assignee
Elmorex Ltd Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Elmorex Ltd Oy filed Critical Elmorex Ltd Oy
Priority to AU2001282156A priority Critical patent/AU2001282156A1/en
Publication of WO2002003374A1 publication Critical patent/WO2002003374A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M19/00 Current supply arrangements for telephone systems
    • H04M19/02 Current supply arrangements for telephone systems providing ringing current or supervisory tones, e.g. dialling tone or busy tone
    • H04M19/04 Current supply arrangements for telephone systems providing ringing current or supervisory tones, e.g. dialling tone or busy tone, the ringing-current being generated at the substations
    • H04M19/041 Encoding the ringing signal, i.e. providing distinctive or selective ringing capability
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0041 Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H1/0058 Transmission between separate instruments or between individual components of a musical system
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H3/00 Instruments in which the tones are generated by electromechanical means
    • G10H3/12 Instruments in which the tones are generated by electromechanical means using mechanical resonant generators, e.g. strings or percussive instruments, the tones of which are picked up by electromechanical transducers, the electrical signals being further manipulated or amplified and subsequently converted to sound by a loudspeaker or equivalent instrument
    • G10H3/125 Extracting or recognising the pitch or fundamental frequency of the picked up signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2230/00 General physical, ergonomic or hardware implementation of electrophonic musical tools or instruments, e.g. shape or architecture
    • G10H2230/005 Device type or category
    • G10H2230/021 Mobile ringtone, i.e. generation, transmission, conversion or downloading of ringing tones or other sounds for mobile telephony; Special musical data formats or protocols therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/011 Files or data streams containing coded musical information, e.g. for transmission
    • G10H2240/046 File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables
    • G10H2240/056 MIDI or other note-oriented file format
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131 Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141 Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171 Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/201 Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
    • G10H2240/241 Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
    • G10H2240/251 Mobile telephone transmission, i.e. transmitting, accessing or controlling music data wirelessly via a wireless or mobile telephone receiver, analogue or digital, e.g. DECT, GSM, UMTS
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171 Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/281 Protocol or standard connector for transmission of analog or digital data to or from an electrophonic musical instrument
    • G10H2240/321 Bluetooth
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131 Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/135 Autocorrelation

Definitions

  • The invention relates to a method for generating a musical tone, such as a ringing tone.
  • The invention is particularly well suited for generating ringing or warning tones for mobile terminals or multimedia devices.
  • The term 'musical tone' refers to ringing tones, warning tones, alarm tones or any other similar type of tone.
  • Previously, users of mobile terminals have been able to download ringing tone melodies created and provided, for example, by network operators.
  • Alternatively, the users may have used web-based tools for creating a melody of their own, or a tool for creating a melody may have been incorporated in the user device, such as a mobile terminal.
  • The latter two methods employ common music notation in their user interfaces, and thus the user has to possess knowledge of music theory, or at least of musical notation, in order to create new melodies.

Disclosure of the Invention
  • An object of the present invention is to provide a method for generating a musical tone, such as a ringing tone, that does not require musical skills.
  • Another object of the invention is a device, such as a network server or a user terminal, which implements the method according to the invention.
  • The invention is based on using a musical seed for providing the musical tone.
  • The musical seed is musical content provided by a user, and it may be in audio format or in note-based code format. If the musical seed is an audio signal, the audio signal is converted into a note-based code by an audio-to-notes conversion. The musical tone is generated on the basis of the note-based code.
  • The audio signal may be produced, for example, by singing, humming, whistling, or by playing an instrument.
  • The method of the invention is preferably executed in a network server; alternatively, the method may be executed in a user terminal. If a network server is employed, a user connects to the server via a wireless or a fixed connection.
  • Suitable connection protocols include, but are not limited to, the Internet Protocol (IP), a wireless voice protocol of the Global System for Mobile Communications (GSM) or the like, wireless data protocols (e.g. data over GSM), the Short Message Service (SMS), the Wireless Application Protocol (WAP), a telephone voice connection, a modem connection, ISDN, an infrared connection, and a local radio connection (e.g. Bluetooth).
  • The user provides a melody, or musical seed, for the tone generation method.
  • The forms of the user input can be categorized into audio formats and note-based code formats.
  • The audio formats include, but are not limited to, waveform audio (digitized audio), encoded audio (obtained by using, for example, speech coding methods, such as methods based on linear prediction, or general audio coding methods, such as the transform codecs in the MPEG family), streaming audio, and audio files in the aforementioned formats.
  • The note-based formats include, but are not limited to, MIDI, MIDI files, ringing tone formats, music representation languages, such as CSound, and MPEG-4 synthetic audio.
  • The server provides a musical tone on the basis of the user's input.
  • In a first embodiment, the musical tone is provided by generating a code sequence corresponding to new melody lines, i.e. a new combination of notes, by using said note-based code as an input for a composing method which produces a new melody, and by converting said new melody into a musical tone.
  • The term 'melody line' refers generally to musical content formed by a combination of notes and pauses.
  • The note-based code may be considered an old melody line.
  • In a second embodiment, the note-based code is converted directly into a musical tone.
  • The second embodiment is similar to the above-described first embodiment, with the distinction that the composing method is not employed; instead, the note-based code is used as such for generating the tone.
  • In a third embodiment, the note-based code is compared to melodies which have been previously stored in a memory; the melody that is the closest match with the note-based code is then selected from the memory and converted into a musical tone.
  • In a fourth embodiment, a code sequence corresponding to new melody lines is generated by using said note-based code as an input for a composing method which produces a new melody.
  • The new melody is compared to melodies which have been previously stored in a memory, and the melody that is the closest match is selected from the memory and converted into a musical tone.
  • The fourth embodiment is thus a combination of the above-described first and third embodiments.
  • Converting the note-based code into a musical tone means converting the note-based code into a tone of a form suitable for delivery to the user or for storage.
  • For example, the note-based code may simply be encoded into the form of a ringing tone in Nokia Smart Messaging format or similar.
  • The musical tone may be stored on the server and/or delivered to the user by using the aforementioned connections and formats.
  • The tone can be delivered to the user terminal, for example, by using vendor-specific means, such as Nokia Smart Messaging, by making the tone available for download at a web site, by downloading the tone directly over IP or via a WAP (Wireless Application Protocol) gateway, or in any other suitable manner.
  • The musical tone may be delivered to the user in the form of common musical notation for editing with some suitable software tool, or in a non-editable form.
  • The tone is delivered for editing, for example, in the form of a common musical notation, such as written notes, or as MIDI code.
  • The server may also include functionality for playback and/or for editing the musical tone.
  • The audio-to-notes conversion method preferably comprises estimating fundamental frequencies of the audio signal for obtaining a sequence of fundamental frequencies, and detecting note events on the basis of that sequence for obtaining the note-based code.
  • The audio signal containing musical information is processed in frames, and the note-based code representing the musical information is constructed at the same time as the input signal is provided.
  • The signal level of a frame is first measured and compared to a predetermined signal level threshold. If the signal level threshold is exceeded, a voicing decision is executed for judging whether the frame is voiced or unvoiced. If the frame is judged voiced, the fundamental frequency of the frame is estimated and quantized for obtaining a quantized present fundamental frequency. Then, it is decided on the basis of the quantized present fundamental frequency whether a note is found. If a note is found, the quantized present fundamental frequency is compared to the fundamental frequency of the previous frame.
  • If the frequencies differ, a note-off event and then, after the note-off event, a note-on event are applied. If the previous and present fundamental frequencies are the same, nothing is done. If the signal level threshold is not exceeded, or if the frame is judged unvoiced, or if a note is not found, it is detected whether a note-on event is currently valid, and if so, a note-off event is applied. The procedure is repeated frame by frame at the same time as the audio signal is received, for obtaining the note-based code.
  • An advantage of the invention is that it can be used by people without knowledge of music theory for producing a musical tone, such as a ringing tone, by providing a musical presentation, for example by singing, humming, whistling or playing an instrument.
  • The invention thus provides a simple method for personalizing mobile terminals and other similar devices. Additionally, self-made musical content can be stored in the form of a musical tone.
  • Figure 1A is a flow diagram illustrating a method according to the invention.
  • Figure 1B is a block diagram illustrating an arrangement according to an embodiment of the invention.
  • Figure 1C is a block diagram illustrating an arrangement according to another embodiment of the invention.
  • Figure 2 illustrates the audio-to-notes conversion according to an embodiment of the invention.
  • Figure 3 is a flow diagram illustrating fundamental frequency estimation according to an embodiment of the invention.
  • Figures 4A and 4B illustrate time-domain windowing.
  • Figures 5A to 6B illustrate an example of the effect of the LPC whitening.
  • Figure 7 is a flow diagram illustrating the audio-to-notes conversion according to an embodiment of the invention.
  • The principle of the invention is to provide a musical tone, i.e. a ringing tone or the like, on the basis of a musical seed given by the user in the form of an audio signal or in the form of a note-based code.
  • Figure 1A is a flow diagram illustrating a method according to the invention for generating a musical tone.
  • In step 1, the musical seed is provided in the form of an audio signal, and this audio signal is converted into a note-based code with an audio-to-notes conversion method in step 3.
  • The audio-to-notes conversion comprises fundamental frequency estimation and note detection.
  • Alternatively, in step 2, the musical seed is provided directly in the form of a note-based code.
  • The note-based code, obtained either by the audio-to-notes conversion or from the user, is used for generating a musical tone in one of steps 4a, 4b, 4c and 4d.
  • In step 4a, the note-based code is used as a seed sequence for a composition method.
  • An automated composition method, which is preferably used for this purpose, is disclosed in [2].
  • This composition method generates code sequences corresponding to new melody lines on the basis of a seed sequence (training sequence).
  • The new melody adapts to changes in the input signal, but it is not necessarily identical to the input. In this way, deficiencies in the input signal, if any, are corrected or smoothed.
  • The new melody lines are then converted into the musical tone.
  • In step 4b, the note-based code is converted directly into a musical tone.
  • This method allows users to sing a melody, for example, and to receive the melody they sang in the form of a ringing tone.
  • In step 4c, the note-based code is compared to melodies stored in a memory to find the melody that is the closest match with the note-based code.
  • The melody that is the closest match is then converted into the musical tone.
  • Step 4d is a combination of steps 4a and 4c.
  • In step 4d, the note-based code is used for generating new melody lines with a composition method; the new melody lines are then compared to the melodies stored in a memory, and the melody corresponding to the closest match is converted into a musical tone.
  • The composition method enables deficiencies in the input signal, if any, to be corrected or smoothed, and therefore the comparison to stored melodies may become easier.
  • The comparison may be based on a distance measure computed on the intervals of the seed sequence, the durations of individual notes in the sequence, the absolute pitches of the notes in the sequence, or other musical information contained in the sequence; a sketch of such a measure follows below.
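
The patent leaves the exact distance measure open, so the following Python sketch is one illustrative possibility: it compares melodies on their interval contour and note durations, as mentioned above. The function names, the weights and the length penalty are assumptions for illustration, not part of the patent.

```python
# Hypothetical interval-based melody distance; weights are illustrative.
from typing import List, Tuple

Note = Tuple[int, float]  # (MIDI pitch, duration in beats)

def interval_distance(seed: List[Note], candidate: List[Note]) -> float:
    """Distance based on melodic intervals and note durations.

    Compares the pitch steps (intervals) between consecutive notes rather
    than absolute pitches, so transposed melodies still match closely.
    """
    n = min(len(seed), len(candidate))
    if n < 2:
        return float("inf")
    dist = 0.0
    for i in range(1, n):
        seed_iv = seed[i][0] - seed[i - 1][0]            # interval in semitones
        cand_iv = candidate[i][0] - candidate[i - 1][0]
        dist += abs(seed_iv - cand_iv)                   # pitch-contour term
        dist += 0.5 * abs(seed[i][1] - candidate[i][1])  # duration term
    # Penalize length mismatch so short fragments do not trivially win.
    dist += 2.0 * abs(len(seed) - len(candidate))
    return dist

def closest_melody(seed, library):
    """Return the stored melody with the smallest distance to the seed."""
    return min(library, key=lambda m: interval_distance(seed, m))
```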
  • In step 5, the musical tone is delivered to the user in the form of a common musical notation for editing with some suitable software tool or for playback.
  • In step 6, the tone is delivered to the user.
  • Step 5 or 6 may also include storing the tone in a file.
  • The file may be, for example, a MIDI file in which sound event descriptions are stored, or a sound file which stores synthesized sound. A minimal sketch of writing such a MIDI file follows below.
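
As an illustration of storing the detected note events, here is a short sketch using the third-party mido library; the patent does not name any particular MIDI toolkit, so the library choice and the event format are assumptions.

```python
# Minimal sketch: store note events in a standard MIDI file with mido.
import mido

def save_as_midi(events, path, ticks_per_beat=480):
    """events: list of (midi_note, duration_in_beats) pairs (an assumed format)."""
    mid = mido.MidiFile(ticks_per_beat=ticks_per_beat)
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for note, beats in events:
        # One note-on/note-off pair per detected note; times are delta ticks.
        track.append(mido.Message('note_on', note=note, velocity=64, time=0))
        track.append(mido.Message('note_off', note=note, velocity=64,
                                  time=int(beats * ticks_per_beat)))
    mid.save(path)

# Example: C-D-E, one quaver each.
save_as_midi([(60, 0.5), (62, 0.5), (64, 0.5)], 'ringing_tone.mid')
```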
  • Figure 1B is a block diagram illustrating an arrangement according to an embodiment of the invention.
  • A user connects from a mobile user terminal 8 or from a fixed user terminal 9 to a server 10a through a suitable connection.
  • The mobile user terminal 8 is typically a mobile phone or some other wireless device, and the fixed user terminal 9 is typically a workstation or a personal computer.
  • The server process may be incorporated in the user terminal, but typically the server is a separate network server.
  • The user provides a musical seed, and the musical seed is transmitted to the server 10a in any suitable form. Some possible data formats and transmission protocols were described above.
  • The server 10a executes the tone generation method according to the invention and returns the generated tone to the user terminal 8 or 9.
  • Figure 1C is a block diagram illustrating an arrangement according to another embodiment of the invention.
  • The arrangement includes a wireless communication network 13 and the Internet 15.
  • The wireless network may be, for example, a GSM (Global System for Mobile Communications) or a UMTS (Universal Mobile Telecommunications System) network.
  • A mobile user terminal 8 and a server 10b are connected to the wireless network.
  • The mobile user terminal 8 is used for providing a musical seed to the server 10b, for example through a voice connection.
  • The server 10b generates a musical tone and returns the musical tone to the mobile user terminal 8, for example in ringing tone format via an SMSC 17 (Short Message Service Center).
  • A fixed user terminal 9 and a server 10c are connected to the Internet.
  • The fixed user terminal 9 is used for providing a musical seed to the server 10c, for example through a voice over IP connection.
  • The mobile user terminal 8 may also be used for providing a musical seed to the server 10c.
  • In that case, the connection between the mobile user terminal 8 and the server 10c is established through a WAP gateway 14, which connects the wireless network and the Internet and provides Internet services to mobile networks, and the server 10c then generates a musical tone.
  • The musical tone is returned to the fixed user terminal 9, for example as audio over IP or by placing the musical tone in a file available for download on an Internet site. To the mobile user terminal 8, the musical tone is transmitted through the WAP gateway.
  • The audio-to-notes conversion according to an embodiment of the invention can be divided into two steps, as shown in Figure 2: fundamental frequency estimation 21 and note detection 22.
  • In step 21, the audio input is segmented into frames in time, and the fundamental frequency of each frame is estimated.
  • The processing of the signal is executed in the digital domain; therefore, the audio input is digitized with an A/D converter prior to the fundamental frequency estimation if it is not already in digital form.
  • Fundamental frequency estimation alone is not sufficient for producing the note-based code. Therefore, in step 22, consecutive fundamental frequencies are further processed for detecting the notes.
  • The autocorrelation function has been widely adopted for fundamental frequency estimation, and it is also preferred in the method according to the invention. However, it is not mandatory for the method of the invention to employ autocorrelation; other fundamental frequency estimation methods can also be applied, and examples can be found in [3].
  • The present estimation algorithm is based on detection of the fundamental period in an audio signal segment (frame). The fundamental period is denoted T_0 (in samples), and it is related to the fundamental frequency as f_0 = f_s / T_0 (Equation 1), where f_s is the sampling frequency in Hz.
  • The fundamental frequency is thus obtained from the estimated fundamental period by using Equation 1.
  • Figure 3 is a flow diagram illustrating the operation of the fundamental frequency (or period) estimation.
  • The input signal is segmented into frames in time, and the frames are treated separately.
  • The input signal Audio In is first filtered with a high-pass filter (HPF) in order to remove the DC component of the signal.
  • The next step 31 in the chain is optional linear predictive coding (LPC) whitening of the spectrum of the signal segment (frame).
  • In step 32, the signal is autocorrelated.
  • The fundamental period estimate is obtained from the autocorrelation function of the signal by using peak detection in step 33.
  • Finally, the fundamental period estimate is filtered with a median filter in order to remove spurious peaks; the whole chain is sketched below.
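
To make the chain of Figure 3 concrete, here is a minimal Python sketch assuming numpy and scipy. It omits the optional LPC whitening (treated separately below), and the frame length, hop size, high-pass cutoff and search range are illustrative assumptions that the patent does not specify.

```python
import numpy as np
from scipy.signal import butter, lfilter, medfilt

def estimate_f0_track(x, fs, frame_len=1024, hop=256, f_min=50.0, f_max=1000.0):
    """Frame-wise fundamental frequency estimation along the lines of Figure 3."""
    # High-pass filter to remove the DC component of the input signal.
    b, a = butter(2, 30.0 / (fs / 2), btype="highpass")
    x = lfilter(b, a, x)

    lag_min = int(fs / f_max)   # shortest period of interest, in samples
    lag_max = int(fs / f_min)   # longest period of interest, in samples
    f0 = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len] * np.hamming(frame_len)
        # Autocorrelation via FFT (step 32); zero-padding keeps it linear.
        spec = np.fft.rfft(frame, 2 * frame_len)
        acf = np.fft.irfft(np.abs(spec) ** 2)[:frame_len]
        # Peak detection in the meaningful lag range (step 33).
        lag = lag_min + int(np.argmax(acf[lag_min:lag_max]))
        f0.append(fs / lag)     # Equation 1: f0 = fs / T0
    # Median filtering removes spurious peaks in the estimate trajectory.
    return medfilt(np.asarray(f0), kernel_size=3)
```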
  • The human voice production mechanism is typically considered a source-filter system, i.e. an excitation signal is created and then filtered by a linear system that models the vocal tract.
  • The excitation signal is periodic, and it is produced at the glottis.
  • The period of the excitation signal determines the fundamental frequency of the tone.
  • The vocal tract may be considered a linear resonator that affects the periodic excitation signal; for example, the shape of the vocal tract determines the vowel that is perceived.
  • The vocal tract can be modeled, for example, by using an all-pole model, i.e. as an Nth-order digital filter with a transfer function of H(z) = 1 / (1 − Σ_{k=1}^{N} a_k z^{−k}) (Equation 3), where a_k are the filter coefficients.
  • The filter coefficients may be obtained by using linear prediction, that is, by solving a linear system involving an autocorrelation matrix and the parameters a_k.
  • The linear system is most conveniently solved using the Levinson-Durbin recursion, which is disclosed, for example, in [4].
  • The whitened signal x(n) is obtained by inverse filtering the non-whitened signal x'(n) by using the inverse of the transfer function in Equation 3, as sketched below.
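
A minimal sketch of this LPC whitening step, assuming numpy and scipy: the predictor coefficients a_k are obtained from the Toeplitz normal equations (scipy's solve_toeplitz uses a Levinson-type recursion, in the spirit of the Levinson-Durbin method cited above), and the frame is then inverse-filtered with A(z). The model order of 12 and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_whiten(frame, order=12):
    """Inverse-filter a frame with an all-pole (LPC) model of its spectrum."""
    # Autocorrelation sequence r(0)..r(order).
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    r = acf[:order + 1]
    # Solve the Toeplitz normal equations R a = r for the predictor a_k.
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    # Inverse filter A(z) = 1 - sum_k a_k z^-k flattens (whitens) the spectrum.
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)
```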
  • Figures 4A and 4B illustrate time-domain windowing.
  • Figure 4A shows a signal windowed with a rectangular window, and Figure 4B shows a signal windowed with a Hamming window. Windowing is not shown in Figure 3, but it is assumed that the signal is windowed before step 32.
  • An example of the effect of LPC whitening is illustrated in Figures 5A to 6B.
  • Figures 5A, 5B and 5C depict a spectrum, an LPC spectrum and an inverse-filtered (whitened) spectrum of the Hamming-windowed signal of Figure 4B, respectively.
  • Figures 6A and 6B illustrate an example of the effect of LPC whitening in the autocorrelation function.
  • Figure 6A illustrates the autocorrelation function of the whitened signal of Figure 5C, and Figure 6B that of the (non-whitened) signal of Figure 5A. It can be seen that local maxima stand out relatively more clearly in the autocorrelation function of the whitened spectrum of Figure 6A than in that of the non-whitened spectrum of Figure 6B. Therefore, this example suggests that it is advantageous to apply LPC whitening to the autocorrelation maximum detection problem.
  • On the other hand, LPC whitening decreases the accuracy of the estimator, particularly for signals that contain high-pitched tones. Therefore, it is not always advantageous to employ LPC whitening, and consequently the present fundamental period estimation can be applied either with or without it.
  • The autocorrelation of the signal is implemented by using short-time autocorrelation analysis, as disclosed in [5].
  • The short-time autocorrelation function operating on a short segment of the signal x(n) is defined as

        φ_k(m) = Σ_{n=0}^{N−1−m} [x(n + k) w(n)] [x(n + k + m) w(n + m)],  0 ≤ m ≤ C − 1,  (Equation 4)

    where C is the number of autocorrelation points to be analyzed, N is the number of samples, and w(n) is the time-domain window function, such as a Hamming window.
  • The length of the time-domain window function w(n) determines the time resolution of the analysis.
  • It is preferable to use a tapered window whose length is at least twice the period of the lowest fundamental frequency. This means that if, for example, 50 Hz is chosen as the lower limit for the fundamental frequency estimation, the minimum window length is 40 ms. At a sampling frequency of 22 050 Hz, this corresponds to 882 samples.
  • Regarding the window length, it is attractive to choose the smallest power of two that is larger than 40 ms (1024 samples at 22 050 Hz). This is because the Fast Fourier Transform (FFT) is used to calculate the autocorrelation function, and the FFT requires a window length that is a power of two.
  • The sequence then has to be zero-padded before the FFT calculation.
  • Zero padding simply refers to appending zeros to the signal segment in order to increase the signal length to the required value.
  • The short-time autocorrelation function is then calculated as φ(m) = IFFT{ |FFT[x(n)]|² }, where x(n) is the windowed signal segment and IFFT denotes the inverse FFT; a numerical check of this shortcut follows below.
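
The following small numerical check, assuming numpy, illustrates both the power-of-two window choice and the equivalence of the FFT shortcut with the direct definition of Equation 4 (for k = 0):

```python
import numpy as np

fs = 22050
N = 1 << int(np.ceil(np.log2(0.040 * fs)))   # smallest power of two >= 40 ms: 1024
C = 512                                      # number of autocorrelation points

rng = np.random.default_rng(0)
x = rng.standard_normal(N) * np.hamming(N)   # windowed test segment

# Direct evaluation of Equation 4 (k = 0).
phi_direct = np.array([np.sum(x[:N - m] * x[m:]) for m in range(C)])

# FFT-based evaluation: zero-pad to 2N so the circular correlation is linear.
phi_fft = np.fft.irfft(np.abs(np.fft.rfft(x, 2 * N)) ** 2)[:C]

assert np.allclose(phi_direct, phi_fft)
print(f"window length: {N} samples, "
      f"max deviation: {np.max(np.abs(phi_direct - phi_fft)):.2e}")
```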
  • The estimated fundamental period T_0 is obtained by peak detection, which searches for the local maximum value of φ_k(m) (the autocorrelation peak), for each k, in a meaningful range of the autocorrelation lag m.
  • The peak detection is further improved by parabolic interpolation, in which a parabola is fitted to three points consisting of a local maximum and the two values adjacent to it; a sketch follows below.
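
A sketch of the parabolic-interpolation refinement, assuming numpy: a parabola through the peak sample and its two neighbours gives a sub-sample estimate of the true autocorrelation maximum.

```python
import numpy as np

def refine_peak(acf: np.ndarray, m: int) -> float:
    """Refine an integer autocorrelation peak lag m to a fractional lag."""
    y0, y1, y2 = acf[m - 1], acf[m], acf[m + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:                  # flat top: keep the integer lag
        return float(m)
    delta = 0.5 * (y0 - y2) / denom   # vertex offset; |delta| < 0.5 at a true peak
    return m + delta                  # fractional T0; then f0 = fs / T0 (Equation 1)
```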
  • The median filter preferably used in the method according to the invention is a three-tap median filter.
  • The above-described method for estimating the fundamental frequency is quite reliable in detecting the fundamental frequency of a sound signal with a single prominent harmonic source (for example voiced speech, singing, or musical instruments that produce harmonic sound). Furthermore, the method derives a time trajectory of the estimated fundamental frequencies that follows the changes in the fundamental frequency of the sound signal.
  • The time trajectory of the fundamental frequencies needs to be further processed for obtaining a note-based code. Specifically, the time trajectory needs to be analyzed into a sequence of event pairs indicating the start, pitch and end of a note; this is referred to as note detection.
  • Note detection thus refers to the forming of note events from the fundamental frequency trajectory.
  • A note event comprises, for example, a starting position (note-on event), a pitch, and an ending position (note-off event) of a note.
  • The time trajectory may also be transformed into a sequence of single length units, such as quavers, according to a user-determined tempo. Pitches are conveniently quantized onto a semitone scale, such as the MIDI pitch scale, as sketched below.
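
A sketch of the semitone quantization onto the MIDI pitch scale; the reference A4 = 440 Hz (MIDI note 69) is the usual convention, which the patent does not mandate.

```python
import math

def hz_to_midi(f0: float) -> int:
    """Quantize a fundamental frequency in Hz to the nearest MIDI note number."""
    return round(69 + 12 * math.log2(f0 / 440.0))

def midi_to_hz(note: int) -> float:
    """Centre frequency of a MIDI note, useful when comparing new estimates."""
    return 440.0 * 2.0 ** ((note - 69) / 12)

# Example: 262 Hz quantizes to MIDI note 60 (middle C).
assert hz_to_midi(262.0) == 60
```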
  • Figure 7 is a flow diagram illustrating the audio-to-notes conversion according to an embodiment of the invention.
  • One frame of the audio signal is investigated at a time.
  • In step 70, the signal level of a frame of the audio signal is measured. Typically, an energy-based signal-level measurement is applied, although it is possible to use more sophisticated methods, e.g. auditorily motivated loudness measurements.
  • In step 71, the signal level obtained from step 70 is compared to a predetermined threshold. If the signal level is below the threshold, it is decided that no tone is present in the current frame; the analysis is therefore aborted and step 76 is executed. If the signal level is above the threshold, a voicing decision (voiced/unvoiced) is made in steps 72 and 73.
  • The voicing decision is made on the basis of the ratio of the signal level at a prominent lag in the autocorrelation function of the frame to the frame energy. This ratio is determined in steps 72 and 73, and the ratio is compared with a predetermined threshold. In other words, it is determined whether there is voice or a pause in the original signal during that frame. If the frame is judged unvoiced in step 73, i.e. it is decided that no prominent harmonic tones are present in the current frame, the analysis is aborted and step 76 is executed. Otherwise, the execution proceeds to step 74, in which the fundamental frequency of the frame is estimated.
  • In practice, the voicing decision is integrated in the fundamental frequency estimation, but logically they are independent blocks and are therefore presented as separate steps.
  • In step 74, the fundamental frequency of the frame is also quantized, preferably onto a semitone scale, such as the MIDI pitch scale.
  • In step 75, median filtering is applied for removing spurious peaks and for deciding whether a note was found. In other words, for example, three consecutive fundamental frequencies are inspected, and if one of them differs greatly from the others, that particular frequency is rejected because it is probably a noise peak. If no note is found in step 75, the execution proceeds to step 76. In step 76, it is detected whether a note-on event is currently valid, and if so, a note-off event is applied. If a note-on event is not valid, nothing is done.
  • If a note is found, the fundamental frequency estimated in step 74 is compared to the fundamental frequency of the currently active note (i.e. of the previous frame). If the values differ, a note-off event is applied to stop the currently active note, and a note-on event is applied to start a new note event. If the fundamental frequency estimated in step 74 is the same as that of the currently active note, nothing is done. This per-frame logic is condensed in the sketch below.
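
Finally, a condensed Python sketch, assuming numpy, of the per-frame decision logic of Figure 7. The thresholds, the search range and the simple energy and voicing measures are illustrative assumptions, and the median-filter check of step 75 is omitted for brevity.

```python
import numpy as np

def frame_to_note(frame, fs, level_thr=1e-4, voicing_thr=0.3,
                  f_min=50.0, f_max=1000.0):
    """Return the quantized MIDI note for a frame, or None (steps 70-74)."""
    if np.mean(frame ** 2) < level_thr:                 # steps 70-71: level check
        return None
    acf = np.fft.irfft(np.abs(np.fft.rfft(frame, 2 * len(frame))) ** 2)
    lo, hi = int(fs / f_max), int(fs / f_min)
    lag = lo + int(np.argmax(acf[lo:hi]))
    if acf[lag] / acf[0] < voicing_thr:                 # steps 72-73: voicing decision
        return None
    f0 = fs / lag                                       # step 74: Equation 1
    return int(round(69 + 12 * np.log2(f0 / 440.0)))    # semitone (MIDI) quantization

def track_notes(frames, fs):
    """Turn per-frame pitches into note-on/note-off events (step 76 onward)."""
    events, active = [], None
    for i, frame in enumerate(frames):
        note = frame_to_note(frame, fs)
        if note is None:                                # step 76: close a hanging note
            if active is not None:
                events.append(("note_off", active, i))
                active = None
        elif note != active:                            # changed pitch: off, then on
            if active is not None:
                events.append(("note_off", active, i))
            events.append(("note_on", note, i))
            active = note
        # note == active: nothing is done
    if active is not None:
        events.append(("note_off", active, len(frames)))
    return events
```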

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to a method for generating a musical tone, such as a ringing tone, which comprises inputting (1, 2) a musical seed and providing (3, 4) the musical tone on the basis of the musical seed. If the musical seed is in the form of a note-based code, the musical tone is generated (4a, 4b, 4c, 4d) on the basis of said note-based code. If the musical seed is in the form of an audio signal, an audio-to-notes conversion is applied (3) to the audio signal in order to generate a note-based code representing the musical seed, and the musical tone is generated (4a, 4b, 4c, 4d) on the basis of said note-based code.
PCT/FI2001/000630 2000-07-03 2001-07-02 A method for generating a musical tone Ceased WO2002003374A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001282156A AU2001282156A1 (en) 2000-07-03 2001-07-02 A method for generating a musical tone

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20001591 2000-07-03
FI20001591A FI20001591A0 (fi) 2000-07-03 2000-07-03 Generation of a musical tone

Publications (1)

Publication Number Publication Date
WO2002003374A1 (fr) 2002-01-10

Family

ID=8558715

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2001/000630 Ceased WO2002003374A1 (fr) A method for generating a musical tone

Country Status (3)

Country Link
AU (1) AU2001282156A1 (fr)
FI (1) FI20001591A0 (fr)
WO (1) WO2002003374A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5202528A (en) * 1990-05-14 1993-04-13 Casio Computer Co., Ltd. Electronic musical instrument with a note detector capable of detecting a plurality of notes sounded simultaneously
US5250745A (en) * 1991-07-31 1993-10-05 Ricos Co., Ltd. Karaoke music selection device
US5616876A (en) * 1995-04-19 1997-04-01 Microsoft Corporation System and methods for selecting music on the basis of subjective content
US5886274A (en) * 1997-07-11 1999-03-23 Seer Systems, Inc. System and method for generating, distributing, storing and performing musical work files

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004049300A1 (fr) * 2002-11-22 2004-06-10 Hutchison Whampoa Three G Ip(Bahamas) Limited Method for generating an audio file on a server upon a request from a mobile phone
WO2004072944A1 (fr) * 2003-02-14 2004-08-26 Koninklijke Philips Electronics N.V. Mobile telecommunication apparatus comprising a melody generator
FR2861527A1 (fr) * 2003-10-22 2005-04-29 Mobivillage Method and system for adapting coded sound sequences to a sound reproduction device
WO2005094053A1 (fr) * 2004-03-05 2005-10-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for providing a signalling melody
WO2006039993A1 (fr) * 2004-10-11 2006-04-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for smoothing a melody line segment
EP1691555A1 (fr) * 2005-02-14 2006-08-16 Sony NetServices GmbH System for providing a music channel with the capability of downloading true ringing tones
WO2006084594A1 (fr) * 2005-02-14 2006-08-17 Sony Netservices Gmbh System for providing a music channel with musical ringtone download capability
WO2008086288A1 (fr) * 2007-01-07 2008-07-17 Apple Inc. Creation and purchase of telephone ringtones
TWI411304B (zh) * 2007-05-29 2013-10-01 Mediatek Inc Electronic device for playing and editing multimedia data

Also Published As

Publication number Publication date
FI20001591A0 (fi) 2000-07-03
AU2001282156A1 (en) 2002-01-14

Similar Documents

Publication Publication Date Title
US6541691B2 (en) Generation of a note-based code
JP5295433B2 (ja) Complexity-scalable perceptual tempo estimation
US7027983B2 (en) System and method for generating an identification signal for electronic devices
EP1252621B1 (fr) Systeme et procede de modification de signaux vocaux
KR101094687B1 (ko) Karaoke system with a song-learning function
CN101983402B (zh) Sound analysis device, method and system, synthesis device, and correction-rule information generation device and method
JP6561499B2 (ja) Speech synthesis apparatus and speech synthesis method
TWI281657B (en) Method and system for speech coding
CN110310621A (zh) Singing synthesis method, apparatus, device and computer-readable storage medium
Edler et al. ASAC–analysis/synthesis audio codec for very low bit rates
WO2002003374A1 (fr) A method for generating a musical tone
WO1997035301A1 (fr) Systeme vocodeur et procede d'estimation de hauteur a l'aide d'une fenetre adaptative d'echantillons de correlation
Rodet et al. Spectral envelopes and additive+ residual analysis/synthesis
US7389231B2 (en) Voice synthesizing apparatus capable of adding vibrato effect to synthesized voice
Helen et al. Perceptually motivated parametric representation for harmonic sounds for data compression purposes
KR100579797B1 (ko) System and method for constructing a speech codebook
JP2006171751A (ja) Speech encoding apparatus and method
CN115171729B (zh) Audio quality determination method and apparatus, electronic device and storage medium
Alexandraki Real-time machine listening and segmental re-synthesis for networked music performance
CN114765029B (zh) Real-time speech-to-singing-voice conversion technique
JP6515945B2 (ja) Chord extraction apparatus and method
Edwards Advanced signal processing techniques for pitch synchronous sinusoidal speech coders
Modegi Evaluation method for quality losses generated by miscellaneous audio signal processings using MIDI encoder tool “Auto-F”
Papanikolaou Speech Codecs analysis, basic arithmetic operations profiling and efficient Hardware mapping
Takara et al. A study on the pitch pattern of a singing voice synthesis system based on the cepstral method.

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP