GB2401714A - Selecting audio information - Google Patents
- Publication number
- GB2401714A (application GB0311240A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- audio
- objects
- musical
- audio information
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0033—Recording/reproducing or transmission of music for electrophonic musical instruments
- G10H1/0041—Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
- G10H1/0058—Transmission between separate instruments or between individual components of a musical system
- G10H2230/00—General physical, ergonomic or hardware implementation of electrophonic musical tools or instruments, e.g. shape or architecture
- G10H2230/005—Device type or category
- G10H2230/021—Mobile ringtone, i.e. generation, transmission, conversion or downloading of ringing tones or other sounds for mobile telephony; Special musical data formats or protocols therefor
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/171—Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
- G10H2240/201—Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
- G10H2240/211—Wireless transmission, e.g. of music parameters or control data by radio, infrared or ultrasound
- G10H2240/241—Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
- G10H2240/251—Mobile telephone transmission, i.e. transmitting, accessing or controlling music data wirelessly via a wireless or mobile telephone receiver, analogue or digital, e.g. DECT, GSM, UMTS
- G10H2240/281—Protocol or standard connector for transmission of analog or digital data to or from an electrophonic musical instrument
- G10H2240/295—Packet switched network, e.g. token ring
- G10H2240/305—Internet or TCP/IP protocol use for any electrophonic musical instrument data or musical parameter transmission purposes
Abstract
A first part of an audio information signal is received by a processor, which identifies, from a database of stored primitives, a candidate audio object matching the received audio information. More specifically, the processor uses the musical instrument timbre tools of the MPEG-7 standard to identify the audio objects in the database, which are MPEG-4 musical instruments. A musical work can then be translated into the MPEG-4 format.
Description
APPARATUS FOR SELECTING AUDIO INFORMATION
AND METHOD THEREFOR
The present invention relates to an apparatus for selecting audio information, for example, of the type communicated between a first terminal and a second terminal in a bandwidth efficient manner, such as object-oriented audio information. The present invention also relates to a system employing the above apparatus, and a method of selecting audio information.
In the field of audiovisual communications, the Moving Picture Experts Group (MPEG), a working group of the International Organisation for Standardisation (ISO), is known for developing compression standards and file formats relating to audiovisual information. One of the standards developed by the MPEG is known as MPEG-4.
MPEG-4 enables communication of audiovisual information in a bandwidth efficient manner through the use of an object-oriented paradigm to represent audiovisual scenes. In relation to audio information specifically, MPEG-4 adopted the concept of Structured Audio (SA), which is a means of communicating sound by describing how to produce the sound rather than compressing it. The SA part of the MPEG-4 standard comprises six so-called tools for the construction of an MPEG-4 "implementation". The tools include, inter alia, a Structured Audio Orchestra Language (SAOL) and a Structured Audio Score Language (SASL).
The SAOL is a special synthesis language that permits a synthetic orchestra to be defined by instruments that generate sounds like real musical instruments or prestored sounds. The instruments are each represented as a respective small network of signal processing primitives, such as a random noise generator, a transmission line, and/or a delay, to emulate some specific sounds such as those of a natural acoustic instrument.
The SASL is used to generate programs, or scores, to control the synthesized instruments specified as SAOL instruments by specifying, for example, the note to play, dynamics, tempo, duration, pitch, and how to control the instrument whilst it is being played.
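To make the division of labour between SAOL and SASL concrete, the following is a hypothetical Python sketch (not real SAOL/SASL syntax): an "instrument" is built as a small network of signal processing primitives, and a "score" of note events plays the SASL role of saying what to play and when. All names and values here are illustrative.

```python
import math

def oscillator(freq, t):
    """Primitive: a sine oscillator."""
    return math.sin(2 * math.pi * freq * t)

def decay_envelope(duration, t):
    """Primitive: a linear decay envelope."""
    return max(0.0, 1.0 - t / duration)

def pluck_instrument(freq, duration, t):
    """An 'instrument': a small network of the two primitives above."""
    return oscillator(freq, t) * decay_envelope(duration, t)

# Score entries specify what the SASL layer would: the note, its start
# time and its duration (hypothetical representation).
score = [
    {"freq": 440.0, "start": 0.0, "duration": 0.5},  # A4
    {"freq": 660.0, "start": 0.5, "duration": 0.5},  # E5
]

def render(score, sample_rate=8000):
    """Render the score by evaluating each active instrument per sample."""
    end = max(e["start"] + e["duration"] for e in score)
    samples = []
    for n in range(int(end * sample_rate)):
        t = n / sample_rate
        s = sum(
            pluck_instrument(e["freq"], e["duration"], t - e["start"])
            for e in score
            if e["start"] <= t < e["start"] + e["duration"]
        )
        samples.append(s)
    return samples

audio = render(score)
```

The point of the split is that only the compact instrument definitions and score need be transmitted, not the rendered samples.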
It will be appreciated by those skilled in the art that the MPEG-4 standard enables efficient coding of audio-visual content. However, the amount of audio-visual content available in digital form is growing rapidly, making a search to locate content of interest in a database or in a network more burdensome than ever. In order to address this problem, the MPEG started a new "work item", the "Multimedia Content Description Interface" or MPEG-7.
Whereas the MPEG-4 standard relates to generative descriptions, i.e. a more or less complete set of instructions from which a system can generate a piece of content, the MPEG-7 standard is directed to non-generative descriptions, i.e. descriptions for other purposes, such as searching. MPEG-7 descriptions are therefore capable of existing independently of the content communicated using MPEG-4, in order to provide information about the content.
In order to describe the content flexibly, the MPEG-7 standard consists of so-called descriptors and description schemes that are defined using a modified version of Extensible Markup Language (XML) schema called the Description Definition Language (DDL).
In relation to the audio part of the MPEG-7 standard (MPEG-7 Audio), certain description tools have been predefined for describing audio content in order to provide the "building blocks" for, inter alia, searching and filtering content based upon one or more audio feature, such as spectrum, harmony, timbre or melody.
A given MPEG-7 Audio description tool can be classified as being either: a high-level audio description tool, or a low-level audio description tool. The low-level audio description tools include a group of low-level descriptors for audio features, known as the MPEG-7 "Audio Framework". These low-level descriptors allow the spectral, parametric and temporal features of an audio signal to be described, and include "timbral temporal" and "timbral spectral" descriptors.
The high-level description tools are description schemes for the purposes of: describing musical instrument timbre, sound recognition, describing spoken content, and describing melodies.
In relation to the timbre description tools, timbre descriptors are used to describe perceptual features of musical instrument sound. The descriptors relate to concepts of "attack", "brightness" and "richness" of a sound, but are expressed at a low level in terms of multi-dimensional vectors.
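As an illustration of working with such vectors, the sketch below treats each timbre as a point and matches a note to the nearest instrument. The dimension names mirror the perceptual concepts just mentioned, but the values, instrument names and the plain Euclidean comparison are illustrative assumptions, not the actual MPEG-7 descriptor definitions.

```python
import math

def timbre_vector(attack, brightness, richness):
    """A toy three-dimensional timbre descriptor (real ones are larger)."""
    return (attack, brightness, richness)

def timbre_distance(a, b):
    """Euclidean distance between two timbre descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical descriptor values for two instruments and one isolated note.
violin = timbre_vector(attack=0.2, brightness=0.8, richness=0.7)
flute = timbre_vector(attack=0.3, brightness=0.6, richness=0.3)
note = timbre_vector(attack=0.25, brightness=0.75, richness=0.65)

# The instrument whose descriptor lies closest to the note's descriptor
# is taken as the better timbral match.
best = min([("violin", violin), ("flute", flute)],
           key=lambda item: timbre_distance(item[1], note))
```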
Referring back to the MPEG-4 standard, and using a client-server analogy, the quality of reproduction of an audio work transmitted to a client terminal by a server will depend upon the repertoire of instruments provided by the server to the client terminal. Consequently, in order to reproduce a musical work to a high standard, the most suitable musical instruments needed to play the musical work need to be communicated to the client using the MPEG-4 standard. Unfortunately, no mechanism currently exists to identify a suitable and efficiently numbered collection of SAOL musical instruments to provide a given melody, sound or tune for transmission by the server to the client.
According to a first aspect of the present invention, there is provided an apparatus for selecting audio information from a database of audio objects, the apparatus comprising: a processing unit coupled to a first storage device; wherein the processing unit is arranged to receive, when in use, a first part of the audio information and identify at least one candidate audio object comprising a primitive from the database of audio information corresponding to the first part of the audio information.
Preferably, the audio information corresponds to a musical work.

For the avoidance of doubt, the term "musical work" should not be construed narrowly and should be construed as embracing not only an audio production comprising one or more instrument being, or to be, played, but also one or more vocal track, or noise, or any combination thereof.
Preferably, the database of audio objects comprises a plurality of objects, each object respectively corresponding to a musical instrument or sound. More preferably, at least one of the plurality of objects comprises an arrangement of signal processing primitives defining a musical instrument or sound. Very preferably, the at least one of the plurality of objects further comprises information relating to a playback mechanism or playing style of the musical instrument or sound.
Preferably, the processing unit is arranged to identify candidate audio objects for other parts of the audio information.
Preferably, the candidate audio objects are used to encode at least part of the musical work in a bandwidth efficient format. More preferably, the bandwidth efficient format is an MPEG-4 format.
Preferably, the at least one audio object is identified using an MPEG-7 tool.
More preferably, the MPEG-7 tool is a musical instrument timbre description tool.
Preferably, the processing unit is arranged to reduce the number of candidate instruments identified for the musical work. More preferably, the number of candidate musical instruments is reduced by comparison of audio objects to identify audio objects possessing substantially identical structures. A fuzzy logic technique may be employed to identify the audio objects possessing substantially identical structures.
Preferably, the number of musical instruments is reduced by comparison of audio objects to identify audio objects occupying substantially the same required space in timbre space.
According to a second aspect of the present invention, there is provided an apparatus for translating audio information from a first format to a second format, the apparatus comprising the apparatus for selecting audio information from a database of audio objects comprising one or more respective primitive as set forth above in relation to the first aspect of the present invention; and an encoder arranged to use the candidate audio objects selected to encode the audio information in the second format.
According to a third aspect of the present invention, there is provided a communications system comprising the apparatus as set forth above in relation to the first and/or second aspect of the present invention.
According to a fourth aspect of the present invention, there is provided a method of selecting audio information from a database of audio objects, the method comprising the steps of: receiving a first part of the audio information; and identifying at least one candidate audio object comprising a primitive from the database of audio information corresponding to the first part of the audio information.
Preferably, the audio information corresponds to a musical work.
Preferably, the database of audio objects comprises a plurality of objects, each object respectively corresponding to a musical instrument or sound. More preferably, at least one of the plurality of objects comprises an arrangement of signal processing primitives defining a musical instrument or sound.
Preferably, the method further comprises the step of: identifying candidate audio objects for other parts of the audio information.
Preferably, the method further comprises the step of: using the candidate audio objects to encode at least part of the musical work in a bandwidth efficient format. More preferably, the bandwidth efficient format is an MPEG-4 format.
Preferably, the method further comprises the step of: using an MPEG-7 tool to identify the at least one audio object. More preferably, the MPEG-7 tool is a musical instrument timbre description tool. The MPEG-7 tool may be able to identify more than one audio object.
Preferably, the method further comprises the step of: reducing the number of candidate instruments identified for the musical work. More preferably, the method further comprises the step of: reducing the number of candidate musical instruments by comparison of audio objects to identify audio objects possessing substantially identical structures.
Preferably, the method further comprises the step of: reducing the number of musical instruments by comparison of audio objects to identify audio objects occupying substantially the same required space in timbre space.
According to a fifth aspect of the present invention, there is provided a method of translating audio information from a first format to a second format, the method comprising the method of selecting audio information from a database of audio objects as set forth above in relation to the fourth aspect of the present invention; and further comprising the step of: encoding the audio information into the second format using the candidate audio objects selected.
According to a sixth aspect of the present invention, there is provided a use of an audio object comprising a primitive identified from a database of audio objects using an MPEG-7 Audio tool to encode at least part of an audio work in a bandwidth efficient format.
It is thus possible to provide an apparatus for, and method of selecting audio information that can be communicated between terminals in a bandwidth efficient manner, for example over a Radio Frequency (RF) interface of a cellular telecommunications system, and reproduced upon reception to a sufficiently high standard.
At least one embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which: Figure 1 is a schematic diagram of a system constituting an embodiment of the present invention; Figure 2 is a schematic diagram of an apparatus employed in the system of Figure 1; Figure 3 is a flow diagram of a method executed by the apparatus of Figure 2; and Figure 4 is a schematic diagram of a surface in timbre space.
Throughout the following description, identical reference numerals will be used to identify like parts.
Referring to Figure 1, a cellular telecommunications network 100, for example a Global System for Mobile Communications (GSM) network, comprises a number of functional entities, including a cell 102 supported by a base station 104 coupled to an antenna 106. Since the other functional entities of the cellular network 100 are not of direct relevance to this example or the invention, for the purposes of conciseness they will not be described further. However, a person skilled in the art will readily appreciate the identity and interconnectivity of the other functional entities.
Within the cell 102, in this example, a mobile subscriber terminal 108 is capable of communicating with the base station 104 via a Radio Frequency (RF) interface 110. The mobile terminal 108 is any suitable mobile device, for example a Samsung SGH-S100 cellular telephone, having an MPEG-4 Audio Java agent/client stored therein and an appropriate Graphical User Interface (GUI). The cellular communications network 100 is coupled to a computer network, such as the Internet 112, via a first communications link 114. A server 116 capable of serving audio content is coupled to the Internet 112, via an Internet Service Provider (ISP - not shown), and a second communications link 118.
Referring to Figure 2, the server 116 comprises, inter alia, a Central Processing Unit (CPU) 200 capable of communicating with a first storage device 202, such as a Hard Disk (HD) drive. The HD drive 202 is capable of storing files, such as MPEG-4 and other media files 204, and a database 206 of musical instrument and/or sound XML files.
The database 206 of musical instruments constitutes a collection of synthetic MPEG-4 SAOL musical instruments and/or sounds and MPEG-7 descriptors and/or descriptor schemes respectively associated with each instrument or sound.
The server 116 is also capable of receiving a musical work in a format other than that in which the musical work is intended to be transmitted. In this example, the musical work is to be provided in a format other than MPEG-4, for example an analogue reproduction from a Compact Disc (CD). The musical work is stored on an external or an internal second storage/playback device 208, for example a CD player. The server 116 additionally comprises an input/output port 210 to enable the server 116 to communicate with external equipment.
In operation (Figure 3), a user (not shown) of the mobile terminal 108 accesses the server 116, which, in this example, is hosting a music streaming site. The mobile terminal 108 communicates with the server 116 using a web browser, although any other suitable access protocol known in the art can be employed. Using the GUI of the mobile terminal 108, the user selects a musical work to which the user wishes to listen.
In response to the selection made by the user, the selection is communicated to the server 116 via the network 100 and the Internet 112, whereupon the server 116 retrieves (step 300) the musical work from the second storage device 208.
If the second storage device 208 is a playback device, the musical work can be stored temporarily by the first storage device 202 for playback at a rate acceptable to the CPU 200 of the server 116.
Once the musical work has been retrieved (step 300), a first note is "played" or isolated (step 302) by the CPU 200 and, using the musical instrument timbre description tool provided by the MPEG-7 Audio standard, a number of suitable candidate instruments are identified (step 304) from the database 206 of musical instruments. An instrument or sound is considered suitable if the instrument or sound defines a respective surface in timbre space and a point in timbre space corresponding to a given note being played coincides with the respective surface in timbre space of the instrument or sound.
As mentioned above, timbre space is a multi-dimensional space, but for ease of visualization a schematic three-dimensional representation of a surface 400 defined in timbre space by a suitable candidate musical instrument, and a point in timbre space occupied, for example, by the first note isolated, is shown in Figure 4. It should, of course, be understood that, in this example, timbre space has more than three dimensions. In many musical works, it is not uncommon for a number of musical instruments to be played simultaneously. In such cases, a number of candidate instrument combinations are selected by the CPU 200 as representative of the sound produced by the number of musical instruments, using the MPEG-7 Audio musical instrument timbre description tools, for each instrument being played.
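The suitability test described above can be sketched as follows, under a simplifying assumption: the instrument's surface in timbre space is approximated by a finite set of sampled points, and a note's timbre point "coincides" with the surface if it lies within a small tolerance of any sample. A real implementation would model the surface continuously; all values here are illustrative.

```python
import math

def coincides(note_point, surface_samples, tol=0.1):
    """True if the note's timbre point lies within tol of the surface,
    where the surface is approximated by a list of sampled points."""
    return any(math.dist(note_point, s) <= tol for s in surface_samples)

# Hypothetical surface samples for one candidate instrument, in a toy
# three-dimensional timbre space, plus two notes to test against it.
surface = [(0.1 * i, 0.5, 0.5) for i in range(11)]
note_on_surface = (0.35, 0.52, 0.48)   # close to the sampled surface
note_off_surface = (0.5, 0.9, 0.1)     # far from every sample

suitable = coincides(note_on_surface, surface)
unsuitable = coincides(note_off_surface, surface)
```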
Once one or more candidate instrument has been identified for the first note of the musical work, the CPU 200 determines (step 306) if all notes of the musical work have been analysed, i.e. if the musical work has finished. If further notes exist for analysis, a next note is retrieved (step 308) and candidate instruments identified. The above process (steps 304 to 308) is, of course, repeated until the whole of the musical work has been completed.
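The loop over the notes of the musical work (steps 304 to 308) might be sketched as below. The matcher here is a stand-in — it uses playable pitch ranges instead of the MPEG-7 timbre comparison — and the instrument names, ranges and note values are illustrative.

```python
def find_candidates(note, instrument_db):
    """Stand-in matcher: an instrument is a candidate if the note's
    pitch falls inside the instrument's playable range (hypothetical
    substitute for the timbre-space test)."""
    return {name for name, (lo, hi) in instrument_db.items()
            if lo <= note <= hi}

def analyse_work(notes, instrument_db):
    """Identify candidate instruments for every note of the work
    (mirrors steps 304 to 308: match a note, then fetch the next)."""
    candidates_per_note = []
    for note in notes:
        candidates_per_note.append(find_candidates(note, instrument_db))
    return candidates_per_note

# Illustrative database of playable ranges (Hz) and a short "work".
instrument_db = {
    "violin": (196, 3136),
    "tuba": (29, 440),
    "piccolo": (523, 4186),
}
work = [440, 880, 262]  # note pitches in Hz
per_note = analyse_work(work, instrument_db)
```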
Once the whole of the musical work has been analysed and candidate instruments identified, the candidate instruments are analysed in order to try to rationalise the overall number of candidate instruments that will be used to reproduce the musical work at the mobile terminal 108. Consequently, a first part of the rationalisation process is to identify and remove (step 310) duplicate musical instruments amongst all the candidate musical instruments identified for all notes. This first part can either be carried out by comparing the signal processing primitives that define two given musical instruments being compared, or by analysing the surface in timbre space that each candidate instrument being compared occupies.
A second part of the rationalisation process is to identify candidate musical instruments that will be audibly indistinct or substantially audibly indistinct for the notes that they will be required to play to reproduce the musical work. This comparison can again be achieved by analysing the respective surfaces defined in timbre space by each instrument being compared, and selecting those instruments that define surfaces that are, at least in part, common surfaces in timbre space for the notes that these other instruments are required to play to contribute to the reproduction of the musical work.
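Both parts of the rationalisation can be sketched together, under simplifying proxies: structural duplicates are detected by comparing tuples of primitive names (part one), and "audibly indistinct" instruments by comparing the note ranges they must cover (part two), the latter standing in for the timbre-space surface comparison. All names and values are illustrative.

```python
def rationalise(candidates):
    """candidates: dict mapping instrument name to a pair of
    (structure tuple of primitive names, required note range).
    Keeps the first instrument of each equivalence group."""
    kept = {}
    for name, (structure, note_range) in candidates.items():
        redundant = any(
            structure == s          # part one: identical structure
            or note_range == r      # part two: same required range (proxy)
            for s, r in kept.values()
        )
        if not redundant:
            kept[name] = (structure, note_range)
    return kept

candidates = {
    "violin_a": (("osc", "filter"), (196, 3136)),
    "violin_b": (("osc", "filter"), (196, 3136)),  # structural duplicate
    "viola": (("osc", "delay"), (131, 1760)),
}
kept = rationalise(candidates)
```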
Once the number of candidate instruments has been reduced to the minimum number of instruments possible, the CPU 200 begins encoding (step 312) the musical work in the MPEG-4 format (typically "off-line" unless sufficient processing power is available), using the rationalised number of candidate musical instruments identified, and subsequently, once encoded, transmits MPEG-4 tokens to the mobile terminal 108 via the input/output port 210. The tokens transmitted from the server 116 are communicated to the mobile terminal 108, in this example, via the Internet 112 and the cellular network 100.
Alternatively, the MPEG-4 encoded file can be made available for download.
Upon receipt of a stream of tokens, the mobile terminal 108, using the MPEG-4 software, reproduces the musical work, i.e. plays it, from the stream of tokens received. Of course, the issue of charging for the musical work being streamed can be addressed using any suitable mechanism known in the art. Also, whilst the above example has been described in the context of a cellular network employing the GSM standard, it should be appreciated that communications networks supporting other standards, for example, a Universal Mobile Telecommunications System (UMTS) standard, can be used.
Although the above example has been described as an "on-demand bespoke" example, MPEG-4 encoded files can also be generated in accordance with the above-described technique employing the MPEG-7 musical instrument description tools, and pre-stored for communication or general use at a later point in time.
Alternative embodiments of the invention can be implemented as a computer program product for use with a computer system, the computer program product being, for example, a series of computer instructions stored on a tangible data recording medium, such as a diskette, CD-ROM, ROM, or fixed disk, or embodied in a computer data signal, the signal being transmitted over a tangible medium or a wireless medium, for example microwave or infrared. The series of computer instructions can constitute all or part of the functionality described above, and can also be stored in any memory device, volatile or non-volatile, such as semiconductor, magnetic, optical or other memory device.
Claims (28)
- Claims: 1. An apparatus for selecting audio information from a database of audio objects, the apparatus comprising: a processing unit coupled to a first storage device; wherein the processing unit is arranged to receive, when in use, a first part of the audio information and identify at least one candidate audio object comprising a primitive from the database of audio information corresponding to the first part of the audio information.
- 2. An apparatus as claimed in Claim 1, wherein the audio information corresponds to a musical work.
- 3. An apparatus as claimed in Claim 1 or Claim 2, wherein the database of audio objects comprises a plurality of objects, each object respectively corresponding to a musical instrument or sound.
- 4. An apparatus as claimed in Claim 3, wherein at least one of the plurality of objects comprises an arrangement of signal processing primitives defining a musical instrument or sound.
- 5. An apparatus as claimed in any one of the preceding claims, wherein the processing unit is arranged to identify candidate audio objects for other parts of the audio information.
- 6. An apparatus as claimed in any one of the preceding claims, wherein the candidate audio objects are used to encode at least part of the musical work in a bandwidth efficient format.
- 7. An apparatus as claimed in Claim 6, wherein the bandwidth efficient format is an MPEG-4 format.
- 8. An apparatus as claimed in any one of the preceding claims, wherein the at least one audio object is identified using an MPEG-7 tool.
- 9. An apparatus as claimed in Claim 8, wherein the MPEG-7 tool is a musical instrument timbre description tool.
- 10. An apparatus as claimed in any one of the preceding claims, wherein the processing unit is arranged to reduce the number of candidate instruments identified for the musical work.
- 11. An apparatus as claimed in Claim 10, wherein the number of candidate musical instruments is reduced by comparison of audio objects to identify audio objects possessing substantially identical structures.
- 12. An apparatus as claimed in Claim 10 or Claim 11, wherein the number of musical instruments is reduced by comparison of audio objects to identify audio objects occupying substantially the same required space in timbre space.
- 13. An apparatus for translating audio information from a first format to a second format, the apparatus comprising the apparatus for selecting audio information from a database of audio objects comprising one or more respective primitive as claimed in any one of the preceding claims; and an encoder arranged to use the candidate audio objects selected to encode the audio information in the second format.
- 14. A communications system comprising the apparatus as claimed in any one of the preceding claims.
- 15. A method of selecting audio information from a database of audio objects, the method comprising the steps of: receiving a first part of the audio information; and identifying at least one candidate audio object comprising a primitive from the database of audio information corresponding to the first part of the audio information.
- 16. A method as claimed in Claim 15, wherein the audio information corresponds to a musical work.
- 17. A method as claimed in Claim 15 or Claim 16, wherein the database of audio objects comprises a plurality of objects, each object respectively corresponding to a musical instrument or sound.
- 18. A method as claimed in Claim 17, wherein at least one of the plurality of objects comprises an arrangement of signal processing primitives defining a musical instrument or sound.
- 19. A method as claimed in any one of Claims 15 to 18, further comprising the step of: identifying candidate audio objects for other parts of the audio information.
- 20. A method as claimed in any one of Claims 15 to 19, further comprising the step of: using the candidate audio objects to encode at least part of the musical work in a bandwidth efficient format.
- 21. A method as claimed in Claim 20, wherein the bandwidth efficient format is an MPEG-4 format.
- 22. A method as claimed in any one of Claims 15 to 21, further comprising the step of: using an MPEG-7 tool to identify the at least one audio object.
- 23. A method as claimed in Claim 22, wherein the MPEG-7 tool is a musical instrument timbre description tool.
- 24. A method as claimed in any one of Claims 15 to 23, further comprising the step of: reducing the number of candidate musical instruments identified for the musical work.
- 25. A method as claimed in Claim 24, further comprising the step of: reducing the number of candidate musical instruments by comparison of audio objects to identify audio objects possessing substantially identical structures.
- 26. A method as claimed in Claim 24 or Claim 25, further comprising the step of: reducing the number of candidate musical instruments by comparison of audio objects to identify audio objects occupying substantially the same region in timbre space.
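The candidate-reduction step of Claims 24 to 26 can likewise be sketched: audio objects whose descriptor vectors lie close together in a (hypothetical) timbre space are treated as occupying substantially the same region, and only one representative of each region is kept. The greedy strategy, the descriptor values and the separation threshold are all illustrative assumptions, not details from the patent.

```python
import math

def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prune_candidates(candidates, min_separation):
    """Greedily keep one representative per region of timbre space:
    a candidate is dropped when its descriptor vector lies within
    `min_separation` of an already-kept candidate."""
    kept = []
    for name, desc in candidates:
        if all(distance(desc, kept_desc) > min_separation
               for _, kept_desc in kept):
            kept.append((name, desc))
    return [name for name, _ in kept]

# "viola" sits close to "violin" in this hypothetical timbre space,
# so only one of the pair survives pruning.
candidates = [
    ("violin", (0.15, 2200.0, 0.30)),
    ("viola",  (0.16, 2190.0, 0.28)),
    ("flute",  (0.10, 1800.0, 0.05)),
]
print(prune_candidates(candidates, 50.0))
# → ['violin', 'flute']
```

A tighter threshold would keep all three candidates; the value chosen trades encoding compactness against fidelity to the original instrumentation.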
- 27. A method of translating audio information from a first format to a second format, the method comprising the method of selecting audio information from a database of audio objects as claimed in any one of claims 15 to 26; and further comprising the step of: encoding the audio information into the second format using the candidate audio objects selected.
- 28. Use of an audio object comprising a primitive identified from a database of audio objects using an MPEG-7 Audio tool to encode at least part of an audio work in a bandwidth efficient format.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0311240A GB2401714A (en) | 2003-05-16 | 2003-05-16 | Selecting audio information |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB0311240D0 GB0311240D0 (en) | 2003-06-18 |
| GB2401714A true GB2401714A (en) | 2004-11-17 |
Family
ID=9958178
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| GB0311240A Withdrawn GB2401714A (en) | 2003-05-16 | 2003-05-16 | Selecting audio information |
Country Status (1)
| Country | Link |
|---|---|
| GB (1) | GB2401714A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2000058940A2 (en) * | 1999-03-29 | 2000-10-05 | Gotuit Media, Inc. | Electronic music and programme storage, comprising the recognition of programme segments, such as recorded musical performances, and system for the management and playback of these programme segments |
| EP1087374A1 (en) * | 1999-09-27 | 2001-03-28 | Yamaha Corporation | Method and apparatus for producing a waveform with sample data adjustment based on representative point |
| EP1139332A2 (en) * | 2000-03-30 | 2001-10-04 | Verbaltek, Inc. | Spelling speech recognition apparatus |
| US6308153B1 (en) * | 1996-04-10 | 2001-10-23 | Itt Defense, Inc. | System for voice verification using matched frames |
| US6347299B1 (en) * | 1997-07-31 | 2002-02-12 | Ncr Corporation | System for navigation and editing of electronic records through speech and audio |
- 2003-05-16 GB GB0311240A patent/GB2401714A/en not_active Withdrawn
Also Published As
| Publication number | Publication date |
|---|---|
| GB0311240D0 (en) | 2003-06-18 |
Similar Documents
| Publication | Title |
|---|---|
| US6476306B2 (en) | Method and a system for recognizing a melody |
| US5734119A (en) | Method for streaming transmission of compressed music |
| CN1875639B (en) | Automatically generate personal playlists using implicit user feedback |
| US8363521B2 (en) | Adaptive high fidelity reproduction system |
| JP5840560B2 (en) | Multiple object audio content file generation, editing, and playback method for object-based audio services, and audio preset generation method |
| JP2002358092A (en) | Speech synthesis system |
| KR20040034688A (en) | Dynamic content delivery responsive to user requests |
| CN101652810A (en) | Apparatus for processing mixed signal and method thereof |
| US12164826B2 (en) | Skip behavior analyzer |
| CN101312460A (en) | Method for converting media file of multiple formats into target device supported media file |
| WO2024139162A1 (en) | Audio processing method and apparatus |
| CN1321929A (en) | Method for compressing digital interface file of musical instrument instruction |
| US20090037006A1 (en) | Device, medium, data signal, and method for obtaining audio attribute data |
| US11740862B1 (en) | Method and system for accelerated decomposing of audio data using intermediate data |
| US20060060069A1 (en) | Method and device for enhancing ring tones in mobile terminals |
| US20060137516A1 (en) | Sound searcher for finding sound media data of specific pattern type and method for operating the same |
| US6476307B2 (en) | Method of compressing, transferring and reproducing musical performance data |
| Hellmuth et al. | Advanced audio identification using MPEG-7 content description |
| EP4375984A1 (en) | Method and system for accelerated decomposing of audio data using intermediate data |
| GB2401714A (en) | Selecting audio information |
| US10819884B2 (en) | Method and device for processing multimedia data |
| JP2008225232A (en) | Signal processing method and audio content distribution method |
| Corral García et al. | Enabling interactive and interoperable semantic music applications |
| JP2002041527A (en) | Method and device for music information management |
| Short et al. | Scalability in KOZ audio compression technology |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |