WO2002003374A1 - Method for generating a musical tone - Google Patents
Method for generating a musical tone
- Publication number
- WO2002003374A1 (PCT/FI2001/000630)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- note
- musical
- musical tone
- based code
- generating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M19/00—Current supply arrangements for telephone systems
- H04M19/02—Current supply arrangements for telephone systems providing ringing current or supervisory tones, e.g. dialling tone or busy tone
- H04M19/04—Current supply arrangements for telephone systems providing ringing current or supervisory tones, e.g. dialling tone or busy tone the ringing-current being generated at the substations
- H04M19/041—Encoding the ringing signal, i.e. providing distinctive or selective ringing capability
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0033—Recording/reproducing or transmission of music for electrophonic musical instruments
- G10H1/0041—Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
- G10H1/0058—Transmission between separate instruments or between individual components of a musical system
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H3/00—Instruments in which the tones are generated by electromechanical means
- G10H3/12—Instruments in which the tones are generated by electromechanical means using mechanical resonant generators, e.g. strings or percussive instruments, the tones of which are picked up by electromechanical transducers, the electrical signals being further manipulated or amplified and subsequently converted to sound by a loudspeaker or equivalent instrument
- G10H3/125—Extracting or recognising the pitch or fundamental frequency of the picked up signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2230/00—General physical, ergonomic or hardware implementation of electrophonic musical tools or instruments, e.g. shape or architecture
- G10H2230/005—Device type or category
- G10H2230/021—Mobile ringtone, i.e. generation, transmission, conversion or downloading of ringing tones or other sounds for mobile telephony; Special musical data formats or protocols therefor
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/011—Files or data streams containing coded musical information, e.g. for transmission
- G10H2240/046—File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables
- G10H2240/056—MIDI or other note-oriented file format
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
- G10H2240/141—Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/171—Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
- G10H2240/201—Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
- G10H2240/241—Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
- G10H2240/251—Mobile telephone transmission, i.e. transmitting, accessing or controlling music data wirelessly via a wireless or mobile telephone receiver, analogue or digital, e.g. DECT, GSM, UMTS
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/171—Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
- G10H2240/281—Protocol or standard connector for transmission of analog or digital data to or from an electrophonic musical instrument
- G10H2240/321—Bluetooth
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/131—Mathematical functions for musical analysis, processing, synthesis or composition
- G10H2250/135—Autocorrelation
Definitions
- the invention relates to a method for generating a musical tone, such as a ringing tone.
- the invention suits particularly well for generating ringing or warning tones for mobile terminals or multimedia devices.
- 'musical tone' refers to ringing tones, warning tones, alarm tones or any other similar type of tone.
- users of mobile terminals have been able to download ringing tone melodies that have been created and provided for example by network operators.
- the users may have used web-based tools for creating a melody of their own or a tool for creating a melody may have been incorporated in the user device, such as a mobile terminal.
- the latter two methods employ common music notation in their user interfaces, and thus the user of these methods has to possess knowledge of music theory, or at least of musical notation, in order to create new melodies.
Disclosure of the Invention
- An object of the present invention is to provide a method for generating a musical tone, such as a ringing tone, without musical skills.
- Another object of the invention is a device, such as a network server or a user terminal, which implements the method according to the invention.
- the invention is based on using a musical seed for providing the musical tone.
- the musical seed is musical content provided by the user, and it may be in audio format or in note-based code format. If the musical seed is an audio signal, the audio signal is converted into a note-based code by an audio-to-notes conversion. The musical tone is generated on the basis of the note-based code.
- the audio signal may be produced for example by singing, humming, whistling, or by playing an instrument.
- the method of the invention is preferably executed in a network server; alternatively, the method may be executed in a user terminal. If a network server is employed, a user connects to the server via a wireless or a fixed connection.
- connection protocols include, but are not limited to, the Internet protocol (IP), a wireless voice protocol of the Global System for Mobile Communications (GSM) or the like, wireless data protocols (e.g. data over GSM), short message service (SMS), wireless application protocol (WAP), telephone voice connection, modem connection, ISDN, infrared connection, local radio connection (e.g. BlueTooth).
- the user provides a melody or a musical seed for the tone generation method.
- the forms of the user input can be categorized into audio formats and note-based code formats.
- the audio formats include, but are not limited to, waveform audio (digitized audio), encoded audio (obtained by using for example speech coding methods, such as methods based on linear prediction, or general audio coding methods, such as the transform codecs in the MPEG family), streaming audio, and audio files in the aforementioned formats.
- the note-based formats include, but are not limited to, MIDI, MIDI files, ringing tone formats, music representation languages, such as CSound, and MPEG-4 synthetic audio.
- the server provides a musical tone on the basis of the user's input.
- the musical tone is provided by generating a code sequence corresponding to new melody lines, i.e. a new combination of notes, by using said note-based code as an input for a composing method which produces a new melody and by converting said new melody into a musical tone.
- the term 'melody line' refers generally to musical content formed by a combination of notes and pauses.
- the note-based code may be considered as an old melody line.
- the note-based code is converted directly into a musical tone.
- the second embodiment is similar to the above-described first embodiment with the distinction that now the composing method is not employed, but the note-based code is used as such for generating the tone.
- the note-based code is compared to melodies which have been previously stored in a memory, then the melody that is the closest match with the note-based code is selected from the memory and converted into a musical tone.
- a code sequence corresponding to new melody lines is generated by using said note-based code as an input for a composing method which produces a new melody.
- the new melody is compared to melodies which have been previously stored in a memory, and the melody that is the closest match with the new melody is selected from the memory and converted into a musical tone.
- the fourth embodiment is a combination of the above described first and third embodiments.
- Converting the note-based code into a musical tone means converting the note-based code into a tone of a suitable form for delivery to the user or for storage.
- the note-based code may be simply encoded into the form of a ringing tone in Nokia Smart Message form or similar.
- the musical tone may be stored on the server and/or delivered to the user by using the aforementioned connections and formats.
- the tone can be delivered to the user terminal for example by using vendor-specific means, such as Nokia Smart Messaging, or by making the tone available for download at a web site or by downloading the tone directly over IP or via WAP (Wireless Application Protocol) gateway or in any other suitable manner.
- the musical tone is delivered to the user either in the form of common musical notation, for editing with some suitable software tool, or in a non-editable form.
- the tone is delivered for editing for example in the form of a common musical notation, such as written notes or as a MIDI code.
- the server includes functionality for playback and/or for editing the musical tone.
- the audio-to-notes conversion method preferably comprises estimating fundamental frequencies of the audio signal for obtaining a sequence of fundamental frequencies and detecting note events on the basis of the sequence of fundamental frequencies for obtaining the note-based code.
- the audio signal containing musical information is processed in frames, and the note-based code representing musical information is constructed at the same time as the input signal is provided.
- the signal level of a frame is first measured and compared to a predetermined signal level threshold. If the signal level threshold is exceeded, a voicing decision is executed for judging whether the frame is voiced or unvoiced. If the frame is judged voiced, the fundamental frequency of the frame is estimated and quantized for obtaining a quantized present fundamental frequency. Then, it is decided on the basis of the quantized present fundamental frequency whether a note is found. If a note is found, the quantized present fundamental frequency is compared to the fundamental frequency of the previous frame.
- if the frequencies differ, a note-off event and then, after the note-off event, a note-on event are applied. If the previous and present fundamental frequencies are the same, nothing is done. If the signal level threshold is not exceeded, or if the frame is judged unvoiced, or if a note is not found, it is detected whether a note-on event is currently valid and, if so, a note-off event is applied. The procedure is repeated frame by frame as the audio signal is received for obtaining the note-based code.
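The frame-by-frame procedure above can be sketched as a small state machine. The frame representation (level, voicing flag, quantized pitch) and the level threshold below are illustrative assumptions, not values from the patent:

```python
# Sketch of the frame-by-frame note detection described above. The frame
# tuples and the threshold value are illustrative assumptions.

def detect_notes(frames, level_threshold=0.01):
    """Turn per-frame (level, voiced, quantized_f0) tuples into note-on /
    note-off events. quantized_f0 is None when no note was found."""
    events = []
    active_pitch = None  # pitch of the currently sounding note, if any

    for i, (level, voiced, f0) in enumerate(frames):
        note_found = (level >= level_threshold) and voiced and f0 is not None
        if not note_found:
            # Silence / unvoiced / no note: close any currently valid note.
            if active_pitch is not None:
                events.append(('note_off', i, active_pitch))
                active_pitch = None
            continue
        if active_pitch is None:
            events.append(('note_on', i, f0))
            active_pitch = f0
        elif f0 != active_pitch:
            # Pitch changed: note-off, then note-on for the new note.
            events.append(('note_off', i, active_pitch))
            events.append(('note_on', i, f0))
            active_pitch = f0
        # Same pitch as the previous frame: nothing is done.
    if active_pitch is not None:
        events.append(('note_off', len(frames), active_pitch))
    return events

frames = [
    (0.0, False, None),  # silence
    (0.5, True, 60),     # note starts (MIDI-style pitch 60)
    (0.5, True, 60),     # same pitch: nothing happens
    (0.5, True, 62),     # pitch change: note-off then note-on
    (0.0, False, None),  # silence: note-off
]
print(detect_notes(frames))
```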
- An advantage of the invention is that it can be used by people without knowledge of music theory to produce a musical tone, such as a ringing tone, by providing a musical presentation, for example by singing, humming, whistling or playing an instrument.
- the invention provides a simple method for personalizing mobile terminals and other similar devices. Additionally, self-made musical content can be stored in the form of a musical tone.
- Figure 1A is a flow diagram illustrating a method according to the invention
- Figure 1 B is a block diagram illustrating an arrangement according to an embodiment of the invention
- Figure 1C is a block diagram illustrating an arrangement according to another embodiment of the invention.
- FIG. 2 illustrates the audio-to-notes conversion according to an embodiment of the invention
- Figure 3 is a flow diagram illustrating fundamental frequency estimation according to an embodiment of the invention
- FIGS. 4A and 4B illustrate time-domain windowing
- Figures 5A to 6B illustrate an example of the effect of the LPC whitening
- Figure 7 is a flow diagram illustrating the audio-to-notes conversion according to an embodiment of the invention.
- the principle of the invention is to provide a musical tone, i.e. a ringing tone or the like, on the basis of a musical seed given by the user in the form of an audio signal or in the form of a note-based code.
- FIG. 1A is a flow diagram illustrating a method according to the invention for generating a musical tone.
- the musical seed is provided in the form of an audio signal, and this audio signal is converted into a note- based code with an audio-to-notes conversion method in step 3.
- the audio-to-notes conversion comprises fundamental frequency estimation and note detection.
- the musical seed is provided in the form of a note-based code.
- the note-based code obtained by the audio-to-notes conversion or from the user is used for generating a musical tone in one of steps 4a, 4b, 4c and 4d.
- step 4a the note-based code is used as a seed sequence for a composition method.
- An automated composition method that is preferably used for this is disclosed in [2].
- This composition method generates code sequences corresponding to new melody lines on the basis of a seed sequence (training sequence).
- the new melody adapts to changes in the input signal, but it is not necessarily identical to the input. In this way, deficiencies in the input signal, if any, are corrected or smoothed.
- the new melody lines are then converted into the musical tone.
- step 4b the note-based code is converted directly into a musical tone.
- This method allows users to sing a melody, for example, and to receive the melody they sang in the form of a ringing tone.
- step 4c the note-based code is compared to melodies stored in a memory to find the melody that is the closest match with the note-based code.
- the melody that is the closest match is then converted into the musical tone.
- Step 4d is a combination of steps 4a and 4c.
- the note-based code is used for generating new melody lines with a composition method and then the new melody lines are compared to the melodies stored in a memory and the melody corresponding to the closest match is converted into a musical tone.
- the composition method enables deficiencies in the input signal, if any, to be corrected or smoothed, and therefore comparison to the stored melodies may become easier.
- the comparison may be based on a distance measure computed on the intervals of the seed sequence, duration of individual notes in the sequence, absolute pitches of the notes in the sequence, or other musical information contained in the sequence.
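One plausible realization of such a comparison uses the interval sequence, which is invariant to transposition. The specific metric, the toy melody library, and all names below are illustrative assumptions, not content from the patent:

```python
# A minimal sketch of an interval-based distance measure for matching a seed
# sequence against stored melodies. The metric and library are assumptions.

def intervals(pitches):
    # Successive pitch differences are transposition-invariant.
    return [b - a for a, b in zip(pitches, pitches[1:])]

def interval_distance(seed, melody):
    a, b = intervals(seed), intervals(melody)
    # Sum of absolute interval differences, plus a penalty for length mismatch.
    return sum(abs(x - y) for x, y in zip(a, b)) + 2 * abs(len(a) - len(b))

def closest_melody(seed, library):
    return min(library, key=lambda name: interval_distance(seed, library[name]))

library = {
    'ode_to_joy':    [64, 64, 65, 67, 67, 65, 64, 62],
    'frere_jacques': [60, 62, 64, 60, 60, 62, 64, 60],
}
seed = [62, 62, 63, 65, 65, 63, 62, 60]  # the first melody, transposed down
print(closest_melody(seed, library))
```

Because only intervals are compared, the seed matches 'ode_to_joy' exactly even though the user sang it in a different key.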
- step 5 the musical tone is delivered to the user in the form of a common musical notation for editing with some suitable software tool or for playback.
- step 6 the tone is delivered to the user.
- Step 5 or 6 may also include storing the tone in a file.
- the file may be for example a MIDI file in which sound event descriptions are stored, or it may be a sound file which stores synthesized sound.
- FIG. 1B is a block diagram illustrating an arrangement according to an embodiment of the invention.
- a user connects from a mobile user terminal 8 or from a fixed user terminal 9 to a server 10a through a suitable connection.
- the mobile user terminal 8 is typically a mobile phone or some other wireless device and the fixed user terminal 9 is typically a workstation or a personal computer.
- the server process may be incorporated in the user terminal, but typically the server is a separate network server.
- the user provides a musical seed, and the musical seed is transmitted to the server 10a in any suitable form. Some possible data formats and transmission protocols were described in the above description.
- the server 10a executes the tone generation method according to the invention and returns the generated tone to the user terminal 8 or 9.
- FIG. 1C is a block diagram illustrating an arrangement according to another embodiment of the invention.
- the arrangement includes a wireless communication network 13 and the Internet 15.
- the wireless network may be for example a GSM or a UMTS (Universal Mobile Telecommunications System) network.
- a mobile user terminal 8 and a server 10b are connected to the wireless network.
- the mobile user terminal 8 is used for providing a musical seed to the server 10b for example through a voice connection.
- the server 10b generates a musical tone and returns the musical tone to the mobile user terminal 8 for example in ringing tone format via SMSC 17 (Short Message Service Center).
- a fixed user terminal 9 and a server 10c are connected to the Internet.
- the fixed user terminal 9 is used for providing a musical seed to the server 10c for example through a voice over IP connection.
- the mobile user terminal 8 may be used for providing a musical seed to the server 10c.
- the connection between the mobile user terminal 8 and the server 10c is established through a WAP gateway 14, which connects the wireless network and the Internet and provides Internet services to mobile networks, and the server 10c then generates a musical tone.
- the musical tone is returned to the fixed user terminal 9 for example as audio over IP or by placing the musical tone into a file available for download on an Internet site. To the mobile user terminal 8 the musical tone is transmitted through the WAP gateway.
- the audio-to-notes conversion according to an embodiment of the invention can be divided into two steps, as shown in Figure 2: fundamental frequency estimation 21 and note detection 22.
- step 21 an audio input is segmented into frames in time and the fundamental frequency of each frame is estimated.
- the processing of the signal is executed in the digital domain; therefore, the audio input is digitized with an A/D converter prior to the fundamental frequency estimation if it is not already in digital form.
- fundamental frequency estimation alone is not sufficient for producing the note-based code. Therefore, in step 22, consecutive fundamental frequencies are further processed for detecting the notes.
- the autocorrelation function has been widely adopted for fundamental frequency estimation, and it is also preferred in the method according to the invention. However, the method of the invention does not require autocorrelation; other fundamental frequency estimation methods can also be applied. Further techniques for fundamental frequency estimation can be found for example in [3].
- the present estimation algorithm is based on detection of the fundamental period in an audio signal segment (frame). The fundamental period is denoted as T0 (in samples), and it is related to the fundamental frequency f0 as f0 = fs / T0 (Equation 1),
- where fs is the sampling frequency in Hz.
- the fundamental frequency is obtained from the estimated fundamental period by using Equation 1.
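Equation 1 in code, with the 22 050 Hz sampling frequency used later in this description:

```python
# Equation 1: the fundamental frequency follows from the estimated
# fundamental period T0 (in samples) and the sampling frequency fs (in Hz).

def fundamental_frequency(t0_samples, fs_hz):
    return fs_hz / t0_samples

# A period of 100 samples at a 22 050 Hz sampling rate corresponds to 220.5 Hz.
print(fundamental_frequency(100, 22050))
```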
- Figure 3 is a flow diagram illustrating the operation of the fundamental frequency (or period) estimation.
- the input signal is segmented into frames in time and the frames are treated separately.
- the input signal Audio In is first filtered with a high-pass filter (HPF) in order to remove the DC component of the signal Audio In.
- the next step 31 in the chain is optional linear predictive coding (LPC) whitening of the spectrum of the signal segment (frame).
- the signal is then autocorrelated.
- the fundamental period estimate is obtained from the autocorrelation function of the signal by using peak detection in step 33.
- the fundamental period estimate is filtered with a median filter in order to remove spurious peaks.
- the human voice production mechanism is typically modeled as a source-filter system: an excitation signal is created and then filtered by a linear system that models the vocal tract.
- the excitation signal is periodic and it is produced at the glottis.
- the period of the excitation signal determines the fundamental frequency of the tone.
- the vocal tract may be considered as a linear resonator that shapes the periodic excitation signal; for example, the shape of the vocal tract determines the vowel that is perceived.
- the vocal tract can be modeled for example by using an all-pole model, i.e. as an Nth-order digital filter with a transfer function of H(z) = 1 / (1 − Σ_{k=1}^{N} a_k z^{−k}) (Equation 3),
- where the a_k are the filter coefficients.
- the filter coefficients may be obtained by using linear prediction, that is by solving a linear system involving an autocorrelation matrix and the parameters a k .
- the linear system is most conveniently solved using the Levinson-Durbin recursion, which is disclosed for example in [4].
- the whitened signal x(n) is obtained by inverse filtering the non-whitened signal x'(n) by using the inverse of the transfer function in Equation 3.
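The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a textbook implementation operating on the autocorrelation sequence of a frame, not code from the patent; note that it uses the error-filter sign convention (a[0] = 1), which differs in sign from the coefficients in Equation 3:

```python
# Textbook Levinson-Durbin recursion: given the autocorrelation sequence
# r[0..order] of a frame, solve the normal equations for the prediction
# error filter coefficients. Sign conventions vary between texts.

def levinson_durbin(r, order):
    a = [0.0] * (order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Symmetric coefficient update.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# An AR(1) process with coefficient 0.5 has autocorrelation r[m] = 0.5**m.
a, err = levinson_durbin([1.0, 0.5, 0.25, 0.125], 1)
print(a)  # [1.0, -0.5]: the recursion recovers the AR coefficient
```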
- Figures 4A and 4B illustrate time-domain windowing.
- Figure 4A shows a signal windowed with a rectangular window, and Figure 4B shows a signal windowed with a Hamming window. Windowing is not shown in Figure 3, but it is assumed that the signal is windowed before step 32.
- An example of the effect of LPC whitening is illustrated in Figures 5A to 6B.
- Figures 5A, 5B and 5C depict a spectrum, an LPC spectrum and an inverse-filtered (whitened) spectrum of the Hamming-windowed signal of Figure 4B, respectively.
- Figures 6A and 6B illustrate an example of the effect of LPC whitening in the autocorrelation function.
- Figure 6A illustrates the autocorrelation function of the whitened signal of Figure 5C, and Figure 6B that of the (non-whitened) signal of Figure 5A. It can be seen that local maxima stand out more clearly in the autocorrelation function of the whitened signal of Figure 6A than in that of the non-whitened signal of Figure 6B. This example therefore suggests that it is advantageous to apply LPC whitening to the autocorrelation maximum detection problem.
- on the other hand, LPC whitening may decrease the accuracy of the estimator, particularly for signals that contain high-pitched tones. Therefore, it is not always advantageous to employ LPC whitening, and consequently the present fundamental period estimation can be applied either with or without it.
- the autocorrelation of the signal is implemented by using short-time autocorrelation analysis, as disclosed in [5].
- the short-time autocorrelation function operating on a short segment of signal x(n) is defined as
- φ_k(m) = Σ_{n=0}^{N−1−m} [x(n + k) w(n)] [x(n + k + m) w(n + m)],  0 ≤ m ≤ C − 1,
- where C is the number of autocorrelation points to be analyzed, N is the number of samples in the window, and w(n) is the time-domain window function, such as a Hamming window.
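A direct implementation of this definition looks as follows. For simplicity a rectangular window is assumed here instead of the Hamming window:

```python
# Direct implementation of the short-time autocorrelation defined above:
# phi_k(m) = sum_n [x(n+k)w(n)][x(n+k+m)w(n+m)]. A rectangular window is
# used as a simplifying assumption; a tapered window would normally be used.

def short_time_autocorrelation(x, k, N, C, w=None):
    if w is None:
        w = [1.0] * N  # rectangular window
    phi = []
    for m in range(C):
        s = 0.0
        for n in range(N - m):
            s += (x[n + k] * w[n]) * (x[n + k + m] * w[n + m])
        phi.append(s)
    return phi

# A period-4 test signal: the autocorrelation peaks again at lag 4.
x = [1.0, 0.0, -1.0, 0.0] * 8
phi = short_time_autocorrelation(x, k=0, N=16, C=8)
print(phi.index(max(phi[1:]), 1))  # lag of the strongest peak after lag 0
```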
- the length of the time-domain window function w(n) determines the time resolution of the analysis.
- it is preferable to use a tapered window whose length is at least twice the period of the lowest fundamental frequency. This means that if, for example, 50 Hz is chosen as the lower limit for the fundamental frequency estimation, the minimum window length is 40 ms. At a sampling frequency of 22 050 Hz, this corresponds to 882 samples.
- it is attractive to choose the window length to be the smallest power of two larger than this minimum, because the Fast Fourier Transform (FFT) is used to calculate the autocorrelation function and the FFT requires the window length to be a power of two.
- the sequence has to be zero-padded before FFT calculation.
- Zero padding simply refers to appending zeros to the signal segment in order to increase the signal length to the required value.
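The window-length bookkeeping described above can be sketched as:

```python
# Window length and zero padding as described above: at least two periods of
# the lowest fundamental frequency, padded to the next power of two for the FFT.

def min_window_samples(f_low_hz, fs_hz):
    # At least twice the period of the lowest fundamental frequency.
    return int(round(2 * fs_hz / f_low_hz))

def next_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad(segment, length):
    # Append zeros to bring the segment up to the required length.
    return segment + [0.0] * (length - len(segment))

n = min_window_samples(50, 22050)    # 40 ms at 22 050 Hz -> 882 samples
print(n, next_power_of_two(n))       # padded to 1024 samples
```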
- the short-time autocorrelation function is calculated as φ(m) = IFFT{ |FFT[x(n)]|² },
- where x(n) is the windowed (zero-padded) signal segment and IFFT denotes the inverse FFT.
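This transform-domain shortcut can be verified against the direct definition. To keep the example dependency-free, a naive O(n²) DFT stands in for the FFT; a real implementation would use an FFT routine:

```python
import cmath

# Autocorrelation via the transform-domain identity above, using a naive DFT
# in place of an FFT so the example stays self-contained.

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def autocorr_via_dft(x):
    # Zero-pad to twice the length to avoid circular wrap-around.
    padded = x + [0.0] * len(x)
    power = [abs(v) ** 2 for v in dft(padded)]
    return [v.real for v in idft(power)][:len(x)]

x = [1.0, 2.0, 3.0, 4.0]
direct = [sum(x[n] * x[n + m] for n in range(len(x) - m)) for m in range(len(x))]
via_dft = autocorr_via_dft(x)
print(direct)  # [30.0, 20.0, 11.0, 4.0], matched by the transform route
```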
- the estimated fundamental period T0 is obtained by peak detection, which searches for the local maximum value of φ_k(m) (the autocorrelation peak) for each frame index k within a meaningful range of the autocorrelation lag m.
- the peak detection is further improved by parabolic interpolation. In parabolic interpolation, a parabola is fitted to the three points consisting of a local maximum and two values adjacent to the local maximum.
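The parabolic refinement can be sketched as follows: fit a parabola to the discrete maximum and its two neighbours and take the vertex as the sub-sample peak position.

```python
# Parabolic interpolation around a discrete peak, as described above.

def parabolic_peak(y_prev, y_peak, y_next):
    """Fractional offset of the parabola vertex relative to the peak index."""
    denom = y_prev - 2 * y_peak + y_next
    if denom == 0:
        return 0.0
    return 0.5 * (y_prev - y_next) / denom

# Samples of y = -(t - 2.3)^2 around its integer maximum at index 2.
samples = [-(t - 2.3) ** 2 for t in (1, 2, 3)]
offset = parabolic_peak(*samples)
print(2 + offset)  # recovers the true maximum position, 2.3
```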
- the median filter preferably used in the method according to the invention is a three-tap median filter.
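A three-tap median filter replaces each sample by the median of itself and its two neighbours, which removes isolated outliers from the period trajectory:

```python
# Three-tap median filter, as preferred above, applied to a fundamental
# period trajectory to remove isolated spurious peaks.

def median3(seq):
    out = list(seq)
    for i in range(1, len(seq) - 1):
        out[i] = sorted(seq[i - 1:i + 2])[1]
    return out

# A single spurious outlier at index 2 is removed.
trajectory = [100, 101, 200, 102, 103]
print(median3(trajectory))  # [100, 101, 102, 103, 103]
```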
- the above-described method for the estimation of the fundamental frequency is quite reliable in detecting the fundamental frequency of a sound signal with a single prominent harmonic source (for example voiced speech, singing, musical instruments that provide harmonic sound). Furthermore, the method derives a time trajectory of the estimated fundamental frequencies so that it follows the changes in the fundamental frequency of the sound signal.
- the time trajectory of the fundamental frequencies needs to be further processed for obtaining a note-based code. Specifically, the time trajectory needs to be analyzed into a sequence of event pairs indicating the start, pitch and end of a note, which is referred to as note detection.
- note detection refers to the forming of note events from the fundamental frequency trajectory.
- a note event comprises for example a starting position (note-on event), pitch, and ending position (note-off event) of a note.
- the time trajectory may be transformed into a sequence of notes of a single unit length, such as quavers, according to a user-determined tempo.
- Figure 7 is a flow diagram illustrating the audio-to-notes conversion according to an embodiment of the invention.
- a frame of the audio signal is investigated at a time.
- in step 70, the signal level of a frame of the audio signal is measured. Typically, an energy-based signal-level measurement is applied, although more sophisticated methods, e.g. auditorily motivated loudness measurements, are possible.
- the signal level obtained from step 70 is compared to a predetermined threshold. If the signal level is below the threshold, it is decided that no tone is present in the current frame, the analysis is aborted, and step 76 is executed. If the signal level is above the threshold, a voicing decision (voiced/unvoiced) is made in steps 72 and 73.
- the voicing decision is made on the basis of the ratio of the signal level at a prominent lag in the autocorrelation function of the frame to the frame energy. This ratio is determined in steps 72 and 73, and the ratio is compared with a predetermined threshold. In other words, it is determined if there is voice or a pause in the original signal during that frame. If the frame is judged unvoiced in step 73, i.e. it is decided that no prominent harmonic tones are present in the current frame, the analysis is aborted and step 76 is executed. Otherwise, the execution proceeds to step 74. In step 74, the fundamental frequency of the frame is estimated.
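A minimal sketch of this voicing decision follows; the threshold value, the search over lags, and the test signals are illustrative assumptions:

```python
import math

# Sketch of the voicing decision above: the ratio of the most prominent
# non-zero-lag autocorrelation value to the frame energy (lag 0) is compared
# against a threshold. The threshold value is an assumption.

def is_voiced(frame, threshold=0.5):
    n = len(frame)
    energy = sum(v * v for v in frame)  # autocorrelation at lag 0
    if energy == 0:
        return False
    # Autocorrelation at every candidate lag; keep the most prominent one.
    best = max(
        sum(frame[i] * frame[i + m] for i in range(n - m))
        for m in range(1, n // 2 + 1)
    )
    return best / energy > threshold

periodic = [math.sin(2 * math.pi * i / 8) for i in range(32)]  # harmonic tone
impulse = [1.0] + [0.0] * 31  # a click: no periodicity, judged unvoiced
print(is_voiced(periodic), is_voiced(impulse))  # True False
```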
- the voicing decision is integrated in the fundamental frequency estimation, but logically they are independent blocks and therefore presented as separate steps.
- the fundamental frequency of the frame is also quantized preferably into a semitone scale, such as a MIDI pitch scale.
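Quantization onto the MIDI semitone scale follows from the standard reference of A4 = 440 Hz = MIDI note 69:

```python
import math

# Quantizing an estimated fundamental frequency onto the MIDI semitone
# scale, using the standard reference A4 = 440 Hz = MIDI note 69.

def frequency_to_midi(f_hz):
    return round(69 + 12 * math.log2(f_hz / 440.0))

print(frequency_to_midi(440.0))   # 69 (A4)
print(frequency_to_midi(261.63))  # 60 (middle C)
```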
- median filtering is applied for removing spurious peaks and for deciding if a note was found or not. In other words, for example three consecutive fundamental frequencies are detected and if one of them differs very much from the others, that particular frequency is rejected because it is probably a noise peak. If no note is found in step 75, the execution proceeds to step 76. In step 76, it is detected if a note-on event is currently valid, and if so, a note-off event is applied. If a note-on event is not valid, nothing is done.
- the fundamental frequency estimated in step 74 is compared to the fundamental frequency of the currently active note (of the previous frame). If the values are different, a note- off event is applied to stop the currently active note, and a note-on event is applied to start a new note event. If the fundamental frequency estimated in step 74 is the same as the fundamental frequency of the currently active note, nothing is done.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2001282156A AU2001282156A1 (en) | 2000-07-03 | 2001-07-02 | A method for generating a musical tone |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FI20001591 | 2000-07-03 | ||
| FI20001591A FI20001591A0 (fi) | 2000-07-03 | 2000-07-03 | Generation of a musical sound |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2002003374A1 true WO2002003374A1 (fr) | 2002-01-10 |
Family
ID=8558715
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/FI2001/000630 Ceased WO2002003374A1 (fr) | 2000-07-03 | 2001-07-02 | Method for generating a musical tone |
Country Status (3)
| Country | Link |
|---|---|
| AU (1) | AU2001282156A1 (fr) |
| FI (1) | FI20001591A0 (fr) |
| WO (1) | WO2002003374A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5202528A (en) * | 1990-05-14 | 1993-04-13 | Casio Computer Co., Ltd. | Electronic musical instrument with a note detector capable of detecting a plurality of notes sounded simultaneously |
| US5250745A (en) * | 1991-07-31 | 1993-10-05 | Ricos Co., Ltd. | Karaoke music selection device |
| US5616876A (en) * | 1995-04-19 | 1997-04-01 | Microsoft Corporation | System and methods for selecting music on the basis of subjective content |
| US5886274A (en) * | 1997-07-11 | 1999-03-23 | Seer Systems, Inc. | System and method for generating, distributing, storing and performing musical work files |
- 2000-07-03 FI FI20001591A patent/FI20001591A0/fi unknown
- 2001-07-02 AU AU2001282156A patent/AU2001282156A1/en not_active Abandoned
- 2001-07-02 WO PCT/FI2001/000630 patent/WO2002003374A1/fr not_active Ceased
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2004049300A1 (fr) * | 2002-11-22 | 2004-06-10 | Hutchison Whampoa Three G Ip(Bahamas) Limited | Method for generating an audio file on a server at the request of a mobile telephone |
| WO2004072944A1 (fr) * | 2003-02-14 | 2004-08-26 | Koninklijke Philips Electronics N.V. | Mobile telecommunication apparatus comprising a melody generator |
| FR2861527A1 (fr) * | 2003-10-22 | 2005-04-29 | Mobivillage | Method and system for adapting coded sound sequences to a sound reproduction device |
| WO2005094053A1 (fr) * | 2004-03-05 | 2005-10-06 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device and method for providing a signaling melody |
| WO2006039993A1 (fr) * | 2004-10-11 | 2006-04-20 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and device for smoothing a melody line segment |
| EP1691555A1 (fr) * | 2005-02-14 | 2006-08-16 | Sony NetServices GmbH | System for providing a music channel with the capability of downloading true ringtones |
| WO2006084594A1 (fr) * | 2005-02-14 | 2006-08-17 | Sony Netservices Gmbh | System for providing a music channel with music-ringtone download capability |
| WO2008086288A1 (fr) * | 2007-01-07 | 2008-07-17 | Apple Inc. | Creation and purchase of telephone ringtones |
| TWI411304B (zh) * | 2007-05-29 | 2013-10-01 | Mediatek Inc | Electronic device for playing and editing multimedia data |
Also Published As
| Publication number | Publication date |
|---|---|
| FI20001591A0 (fi) | 2000-07-03 |
| AU2001282156A1 (en) | 2002-01-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6541691B2 (en) | Generation of a note-based code | |
| JP5295433B2 (ja) | Complexity-scalable perceptual tempo estimation | |
| US7027983B2 (en) | System and method for generating an identification signal for electronic devices | |
| EP1252621B1 (fr) | Systeme et procede de modification de signaux vocaux | |
| KR101094687B1 (ko) | Karaoke system with a song-learning function | |
| CN101983402B (zh) | Sound analysis device, method and system, synthesis device, and correction rule information generation device and method | |
| JP6561499B2 (ja) | Speech synthesis device and speech synthesis method | |
| TWI281657B (en) | Method and system for speech coding | |
| CN110310621A (zh) | Singing synthesis method, apparatus, device, and computer-readable storage medium | |
| Edler et al. | ASAC–analysis/synthesis audio codec for very low bit rates | |
| WO2002003374A1 (fr) | Method for generating a musical tone | |
| WO1997035301A1 (fr) | Vocoder system and method for pitch estimation using an adaptive window of correlation samples | |
| Rodet et al. | Spectral envelopes and additive + residual analysis/synthesis | |
| US7389231B2 (en) | Voice synthesizing apparatus capable of adding vibrato effect to synthesized voice | |
| Helen et al. | Perceptually motivated parametric representation for harmonic sounds for data compression purposes | |
| KR100579797B1 (ko) | System and method for constructing a speech codebook | |
| JP2006171751A (ja) | Speech coding apparatus and method | |
| CN115171729B (zh) | Audio quality determination method and apparatus, electronic device, and storage medium | |
| Alexandraki | Real-time machine listening and segmental re-synthesis for networked music performance | |
| CN114765029B (zh) | Real-time speech-to-singing conversion technique | |
| JP6515945B2 (ja) | Chord extraction device and method | |
| Edwards | Advanced signal processing techniques for pitch synchronous sinusoidal speech coders | |
| Modegi | Evaluation method for quality losses generated by miscellaneous audio signal processings using MIDI encoder tool “Auto-F” | |
| Papanikolaou | Speech Codecs analysis, basic arithmetic operations profiling and efficient Hardware mapping | |
| Takara et al. | A study on the pitch pattern of a singing voice synthesis system based on the cepstral method. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| 122 | Ep: pct application non-entry in european phase | ||
| NENP | Non-entry into the national phase |
Ref country code: JP |