
GB2531700A - Methods for identifying and monitoring use of audio entities - Google Patents


Info

Publication number
GB2531700A
GB2531700A
Authority
GB
United Kingdom
Prior art keywords
data
audio
media
entity
processing
Prior art date
Legal status
Withdrawn
Application number
GB1417856.0A
Other versions
GB201417856D0 (en)
Inventor
Mcguire John
Scanlon Derek
Current Assignee
BIGEARS DIGITAL SERVICES Ltd
Original Assignee
BIGEARS DIGITAL SERVICES Ltd
Priority date
Filing date
Publication date
Application filed by BIGEARS DIGITAL SERVICES Ltd filed Critical BIGEARS DIGITAL SERVICES Ltd
Priority to GB1417856.0A priority Critical patent/GB2531700A/en
Publication of GB201417856D0 publication Critical patent/GB201417856D0/en
Publication of GB2531700A publication Critical patent/GB2531700A/en
Withdrawn legal-status Critical Current


Classifications

    • H04H 60/58 - Arrangements characterised by components specially adapted for monitoring, identification or recognition of audio
    • H04H 60/37 - Arrangements for identifying segments of broadcast information, e.g. scenes or extracting programme ID
    • G06F 16/683 - Retrieval characterised by using metadata automatically derived from the content
    • H04N 21/233 - Processing of audio elementary streams
    • H04N 21/8106 - Monomedia components involving special audio data, e.g. different tracks for different languages
    • H04N 21/8113 - Monomedia components comprising music, e.g. song in MP3 format
    • H04N 21/8352 - Generation of protective data involving content or source identification data, e.g. Unique Material Identifier [UMID]
    • H04N 21/84 - Generation or processing of descriptive data, e.g. content descriptors
    • G10L 15/26 - Speech to text systems
    • H04H 2201/90 - Aspects of broadcast communication characterised by the use of signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

Methods of monitoring and identifying use of audio entities in broadcast, streamed or performed media. A set of media data for the broadcast, streamed or performed media is obtained, and processed to identify a portion of the data set containing audio data. This portion is compared with data for the given audio entity. This data may be an acoustic fingerprint for the given audio entity. An acoustic fingerprint may be generated for the portion of the set of media data, for comparison with the acoustic fingerprint for the given audio entity. Companion log data may also be obtained with the media data and may be used to identify the portion of the media data which is audio data. Processing may also involve identifying a plurality of subset media types, one of which is an audio type. Processing may also involve identifying a plurality of subset audio types. One of the identified audio subsets may be a speech subset, which would be processed to obtain the text of the speech. The step of comparing may comprise identifying a plurality of sub-portions of the audio data and comparing each sub-portion with data for the given audio entity.

Description

Intellectual Property Office Application No. GB1417856.0 RTM Date: 22 February 2016. The following terms are registered trade marks and should be read as such wherever they occur in this document: PRS (Page 1), PPL (Page 1). Intellectual Property Office is an operating name of the Patent Office. www.gov.uk/ipo

Methods for identifying and monitoring use of audio entities
Field of the Invention
The invention relates to methods for identifying and monitoring use of audio entities or sounds, especially music.
Background to the Invention
There is a need for improved recognition of music or other sounds from a variety of sources. Audio or acoustic fingerprinting is well known and is used regularly to identify music or other audio signals. Currently, the use of copyright-protected audio, such as music, in broadcasts, especially television broadcasts, is reported by the broadcaster to the relevant fee collection society, such as PRS or PPL in the UK. There are significant problems associated with self-reporting. The broadcast and streaming across television and radio networks of sound recordings, compositions and public performance annually generates a multi-billion-dollar income, accumulated by collecting societies. However, this income is often collected and distributed using systems and technology that rely on manual procedures and outdated data capture and processing facilities. These procedures are often inefficient and unable to cope with the varied and disparate channels that reproduce sound recordings and compositions in the 21st century.
For example, manual procedures can result in inaccurate reporting of usage, or missed instances of usage. Disputes over usage can result. For certain types of broadcast, societies such as PRS use a sampling approach, collecting data for usage only on certain days. This is particularly the case for public performances. Streaming media on the internet adds further complexity for collecting societies.
Indeed, the European Parliament has become concerned by the inability of these societies to collect and distribute income efficiently and accurately. In July 2012 it committed to legislating on the collective management of copyright and related rights in Europe, due to: "poor financial management (of collecting societies and a) ...need for specific measures focusing on governance and transparency." In addition, although acoustic fingerprinting is known, it can have difficulty with complex input sources. For example, common methods of fingerprinting fail with inputs comprising various media rather than simply audio, for example speech and periods of silence.
The present invention aims to address these problems and provide improvements upon the known devices and methods.
Summary of the Invention
Aspects and embodiments of the invention are set out in the accompanying claims.
In general terms, one embodiment of a first aspect of the invention can provide a method of monitoring use of a given audio entity in broadcast, streamed or performed media, comprising the steps of: obtaining a set of media data for the broadcast, streamed or performed media; processing the set of media data to identify a portion of the data set containing audio data; and comparing the portion with data for the given audio entity.
This provides a simple but reliable means for monitoring use of an audio entity, such as a given piece of music, sound, sound effect, spoken word extract or audio sample. The audio entity and audio sample may be purely audio, or may be audio/visual. Rather than relying on the producer of a broadcast, streamed media or public performance to report use of a piece of audio, the method takes the data itself which was broadcast, streamed or performed (for example, recorded during the performance) and identifies directly from that data any use of the given audio entity (piece of music).
Although a media producer may be relatively accurate in the step of identifying a piece of music that may be or was used (usually because it will be identified during production), the reporting of whether that piece of music was actually used is potentially subject to errors.
In contrast, the present invention uses the data itself from the broadcast, stream or performance; this data necessarily reflects no more or less than what was actually broadcast, streamed or performed, and therefore the present invention's use of this data allows the possibility of complete accuracy as to whether a piece of music was used or not.
The invention also allows for the analysis or monitoring of a non-exclusively audio data input, such as a multimedia or audio/video input, in order to find uses of a particular audio entity. This is in contrast to most prior art methods of audio fingerprinting, which simply analyse an audio signal.
The set of media data for a broadcast may be a broadcast, transport or program stream or the like, such as a bitstream (e.g. MPEG, MP4, FLV), either single or multiplexed. For streamed media the data may similarly be a bitstream, such as MPEG, HLS or MPEG-DASH. For a public or live performance, the data may take the form of a stream compiled from a sound or AV desk, or an audio recording of the performance, for example. The media itself may be multimedia, i.e. a combination of different types of media, such as audio, video, graphics and the like.
The portion of the media data containing audio may therefore be an audio section of a bitstream, or audio and video section, or an audio feed from a live performance, or recording of the audio part of that performance.
Preferably, the data for the given audio entity is audio data. The portion of the media data identified as containing audio data may be converted to an audio format for comparison to the audio entity audio data.
Alternatively, the data for the given audio entity is an acoustic fingerprint for the given audio entity. This may provide greater accuracy and more efficient identification.
Preferably, the method further comprises generating an acoustic fingerprint for the portion of the set of media data for comparison with the acoustic fingerprint for the given audio entity.
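The generate-then-compare idea above can be sketched as follows. This is an illustrative toy only, not the patent's implementation: real fingerprinters use spectral-peak hashing over a time-frequency representation, whereas here each frame's peak-amplitude position stands in for the dominant-frequency analysis, and all function names are invented.

```python
import hashlib

def fingerprint(samples, frame_size=4):
    """Reduce an audio portion to a coarse fingerprint: one short hash per
    frame, derived from the frame's peak position (a crude stand-in for the
    dominant-frequency features a real fingerprinter would extract)."""
    prints = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        peak_index = frame.index(max(frame))
        prints.append(hashlib.md5(str(peak_index).encode()).hexdigest()[:8])
    return prints

def match_score(candidate, reference):
    """Fraction of reference-entity frames whose hash appears in the candidate."""
    if not reference:
        return 0.0
    hits = sum(1 for h in reference if h in set(candidate))
    return hits / len(reference)

entity = fingerprint([0, 3, 1, 0, 2, 5, 1, 0])    # library audio entity
portion = fingerprint([0, 3, 1, 0, 2, 5, 1, 0])   # portion of broadcast data
print(match_score(portion, entity))               # identical input -> 1.0
```

A production system would threshold the score (and require temporally consistent frame matches) before declaring a use of the entity.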
In one embodiment, the media are broadcast media and the step of obtaining comprises obtaining a broadcast data stream, and the step of processing comprises processing the broadcast data stream to identify the portion of the stream containing audio.
Suitably, the method further comprises obtaining companion log data for the set of media data.
This may allow for far greater accuracy in identifying usage of audio entities in the media data; in addition to using the data itself, a log of the data broadcast, streamed or performed is obtained to corroborate analysis of the data, and to aid processing of the data for monitoring or identification. For example, in the case of a broadcast stream, an As Run log may be obtained.
Preferably, the step of processing comprises using the companion log data to process the set of media data to identify the portion containing audio data.
The log may be obtained from the broadcaster of the audio, video or streaming media, or may be included in the data, for example in headers or information packets or the like in a bitstream.
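The use of a companion log to locate the audio-bearing portions can be sketched as below. The field names and event types are assumptions for illustration; a real As-Run log would be parsed from the broadcaster's XML.

```python
# A hypothetical As-Run-style companion log: one entry per transmitted event,
# with start time and duration in seconds.
as_run_log = [
    {"start": 0,   "duration": 30,  "type": "ident"},       # likely music
    {"start": 30,  "duration": 600, "type": "production"},  # likely music
    {"start": 630, "duration": 60,  "type": "news"},        # likely speech
]

def audio_portions(log, audio_types={"ident", "production"}):
    """Return (start, end) windows that the log marks as likely to carry
    music, so fingerprinting effort is focused on those portions."""
    return [(e["start"], e["start"] + e["duration"])
            for e in log if e["type"] in audio_types]

print(audio_portions(as_run_log))   # [(0, 30), (30, 630)]
```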
Suitably, the step of processing comprises processing the set of media data to identify a plurality of subsets of media types.
This provides a means of breaking down complex media into constituents, to make later processing for audio possible, or less problematic. The set may be divided into the identified plurality of subsets of media types, for example into audio and video, or in the case of TV broadcasts into production, adverts, news, idents and the like.
Preferably, the step of processing comprises identifying an audio subset, and processing the audio subset to identify the audio data portion.
Suitably, the step of processing comprises processing the audio data portion to identify a plurality of subsets of audio types. The audio data portion may be divided into the plurality of subsets of audio types, such as music, silence and speech.
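Dividing the audio portion into music, silence and speech subsets could, for instance, use simple frame features such as energy and zero-crossing rate. The thresholds below are illustrative assumptions, not values from the patent; real classifiers use far richer features.

```python
def classify_frame(frame, silence_thresh=0.01, zcr_speech=0.3):
    """Label one frame of samples as silence, speech or music using mean
    energy and zero-crossing rate (ZCR). Purely a toy heuristic."""
    energy = sum(s * s for s in frame) / len(frame)
    if energy < silence_thresh:
        return "silence"
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    zcr = crossings / (len(frame) - 1)
    # Speech alternates voiced/unvoiced sounds, giving a high ZCR;
    # sustained music keeps energy high with a lower ZCR.
    return "speech" if zcr > zcr_speech else "music"

print(classify_frame([0.0, 0.001, -0.001, 0.0]))     # silence
print(classify_frame([0.5, -0.5, 0.5, -0.5, 0.5]))   # speech (high ZCR)
print(classify_frame([0.5, 0.6, 0.55, 0.6, 0.5]))    # music
```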
Preferably, the method comprises identifying a speech audio type subset, and processing data from the speech subset of the audio data portion to obtain text of the speech.
In an embodiment, the method further comprises, before the step of processing: determining a type of the companion log and/or a type of the set of media data; and obtaining a pre-defined schema for the type of companion log and/or set of media data identified, wherein the schema comprises a set of instructions for processing the set of media data.
This allows for efficient processing of the input media data: each type of media and/or log may have an associated set of rules for processing that type of media (with the log), such as what media types are likely to be present (production, adverts, news), how the media data may be divided, or which types of audio might be expected (music, speech, silence) and for how long.
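The pre-defined schema idea might be represented as a lookup from log type to processing recipe. The log-type keys and recipe fields here are invented for illustration; they are not taken from the patent.

```python
# Hypothetical schemas: each recognised companion-log type maps to a set of
# instructions for processing media accompanied by that log type.
SCHEMAS = {
    "tv_as_run": {
        "media_types": ["production", "adverts", "news", "idents"],
        "audio_types": ["music", "speech", "silence"],
        "max_event_seconds": 3600,
    },
    "radio_log": {
        "media_types": ["production", "adverts"],
        "audio_types": ["music", "speech"],
        "max_event_seconds": 14400,
    },
}

def schema_for(log_type):
    """Look up the processing instructions for a recognised log type."""
    try:
        return SCHEMAS[log_type]
    except KeyError:
        raise ValueError(f"no schema defined for log type {log_type!r}")

print(schema_for("tv_as_run")["media_types"])
```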
Suitably, the step of comparing comprises generating a measure of confidence in a match between the portion and data for the given audio entity.
In an embodiment, the step of comparing comprises: comparing the portion with data for a set of given audio entities; and identifying and/or recording a match between the portion and data for one of the set of given audio entities.
For example, the audio portion may be compared with a library of audio entities, to find a match.
Suitably, the step of comparing comprises: employing a plurality of algorithms to compare the portion with data for the given audio entity; and comparing the results of the plurality of algorithms.
In this way, the accuracy of the comparison can be greatly enhanced, confirming positive matches and removing false positives. For example, duplicates can be recognised, ambiguities in returns from the algorithms can be reconciled, and a confidence value can be increased.
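Reconciling the results of several algorithms could be as simple as a majority vote, with the agreement rate serving as the measure of confidence. The hard-coded results below stand in for returns from third-party fingerprinting engines; this is a sketch, not the patent's data-cleaning process.

```python
from collections import Counter

def reconcile(results):
    """results: list of entity IDs (or None for no match), one per algorithm.
    Returns (best_match, confidence), where confidence is the fraction of
    algorithms agreeing on the winner; lone outliers are outvoted."""
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None, 0.0
    best, count = votes.most_common(1)[0]
    return best, count / len(results)

# Three engines agree; a fourth returns a false positive:
match, confidence = reconcile(["track_42", "track_42", "track_42", "track_7"])
print(match, confidence)   # track_42 0.75
```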
In an embodiment, the step of comparing comprises: identifying a plurality of sub-portions of the portion of the set of media data containing audio data; comparing each sub-portion with data for the given audio entity; and comparing the results of each sub-portion comparison.
For example, the audio portion may be sub-divided into sections, or a set of cascading or overlapping sections of the audio portion may be generated, for comparison.
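Overlapping sub-portions can be generated with a sliding window, as sketched below; the window and hop sizes are arbitrary assumptions for illustration.

```python
def sub_portions(samples, window=4, hop=2):
    """Yield overlapping slices of the audio portion, so a short entity
    embedded anywhere in it falls wholly inside at least one window."""
    for start in range(0, max(len(samples) - window, 0) + 1, hop):
        yield samples[start:start + window]

print(list(sub_portions([1, 2, 3, 4, 5, 6], window=4, hop=2)))
# [[1, 2, 3, 4], [3, 4, 5, 6]]
```

Each window would then be fingerprinted and compared independently, and the per-window results combined as described above.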
One embodiment of a second aspect of the invention can provide a method of identifying an audio entity in a set of media data, comprising: processing the set of media data to identify a plurality of subsets of media types; identifying an audio subset, and processing the audio subset to identify an audio data portion; and comparing the portion with data for the given audio entity.
Preferably, the data for the given audio entity is an acoustic fingerprint for the given audio entity.
More preferably, the method further comprises generating an acoustic fingerprint for the portion of the set of media data for comparison with the acoustic fingerprint for the given audio entity.
Suitably, the step of comparing comprises: employing a plurality of algorithms to compare the portion with data for the given audio entity; and comparing the results of the plurality of algorithms.
In an embodiment, the step of comparing comprises: identifying a plurality of sub-portions of the audio data portion; comparing each sub-portion with data for the given audio entity; and comparing the results of each sub-portion comparison.
Suitably, the method further comprises sending a result of the comparison to a software application, wherein the software application is stored on one of: a local user device; a mobile user device; and an internet-accessible server.
One embodiment of a third aspect of the invention can provide a media device storing computer program code adapted, when loaded into or run on a computer, to cause the computer to carry out a method according to any of the above described embodiments.
One embodiment of a fourth aspect of the invention can provide a method for identifying an audio entity from an audio sample, comprising the steps of: a) slicing the sample into slices; and b) using audio fingerprinting to identify an audio entity within the sample.
One embodiment of a fifth aspect of the invention can provide a method for identifying an audio entity from a television broadcast, comprising the steps of: a) identifying the type of programming within the broadcast, optionally by using the As-Run log, and slicing the sample into slices, optionally according to programme type; b) separating the sample into music and non-music; c) pre-processing one or more slices and using audio fingerprinting to identify an audio entity within the sample.
Optionally, the further step of d) integration of external data sources with the results of step c) may be provided.
One embodiment of a sixth aspect of the invention can provide a method for identifying an audio entity from an audio sample, comprising the step of: a) using audio fingerprinting to identify an audio entity within the sample, wherein the step of using audio fingerprinting includes using more than one audio fingerprinting technology or programme.
One embodiment of a seventh aspect of the invention can provide a method for identifying an audio entity from an audio sample, comprising the steps of a) using audio fingerprinting to identify an audio entity within the sample; and b) using a second type of analysis to confirm the identity of the audio entity.
The second type of analysis could be lyric matching; that is to say, any lyrics in the audio sample could be identified and matched to lyrics from known audio entities to identify the audio entity in the sample. Alternatively, it could be using a crowd portal, releasing the audio sample to third parties for them to identify. It is particularly useful to use a second type of analysis when the audio fingerprinting is not able to find a match for the audio entity or only finds a partial match.
If a second type of analysis is used, the method may also comprise the step of feeding the identity of the audio entity back into the audio fingerprinting system used, so as to educate the system as to the identity of that entity and improve the likelihood of its identification in the future.
An embodiment of an eighth aspect of the invention may provide a method for identifying an audio entity in an audio sample and confirming that the inclusion of that audio entity in said audio sample has been recorded for the purpose of the collection of any royalty due. The method may comprise the steps of: a) identifying an audio entity within the sample; and b) applying a mark to the sample, identifying it as having been analysed and any audio entities within it identified.
This is particularly useful for audiovisual samples, such as videos, especially given the recent increase in the posting of videos on websites such as YouTube. Many videos may be posted containing audio or audiovisual entities for which royalties should be collected. It would be advantageous to provide an easy way for such videos to be analysed, to confirm such analysis has taken place and for royalties to be collected. The mark may be any appropriate mark, such as a watermark on the video. The analysis may use any of the techniques mentioned herein. For example, it may include audio fingerprinting, lyric matching or release to a crowd portal. It may also include any of the processing steps.
The invention will now be described in detail by way of example only, with reference to the figures, in which:
Figure 1 is a diagram illustrating a method of identifying or monitoring an audio entity according to an embodiment of the invention;
Figure 2 is a diagram illustrating a method of processing input media according to an embodiment of the invention;
Figures 3 to 5 are diagrams illustrating methods of acoustic fingerprinting management according to embodiments of the invention; and
Figure 6 is a diagram illustrating apparatus for use with embodiments of the invention.
Detailed description of the invention
Terminology and definitions

Transmission Schedule - This is the schedule of events planned for transmission.
As Run Log - This is the textual log (often in XML format) of the events actually transmitted by the broadcaster, as opposed to the Transmission Schedule, which details those events planned for transmission. As embodiments of the invention monitor the transmitted content, it is the Transmission As-Run log (Tx As-Run) that may be required.
Conform Request - A conform request is an XML request transferred to the Conforming system specifying how a specific asset is to be prepared. In addition to the start (SOM) and end (EOM) points, it often details the Video and Audio format, and the presence of additional data such as Audio Description and Subtitles.
Conformed Asset - A conformed asset is a media item prepared from the source material in such a fashion as described in the Conform Request.

Programme Events - These are events within the Transmission Schedule and Tx As-Run which describe main Programming Events. These are distinguished from programmes, as a single programme may be split into multiple parts separated by other events such as commercials.
Commercials - A commercial is an Advertisement event within the broadcast stream.
Promotional Events - Promotional events are those events which promote the broadcaster's output, either within the channel on which the Promotional Event is being carried or on other channels broadcast by the broadcaster (cross-channel Promotion).
Presentation Events - Presentation Events are those events which generally announce the upcoming main event, but would also include Station Idents, Certificates, Content Notifications and other events not categorized elsewhere.
Bumper - Bumpers are those events used to top and tail a Programme event, and are generally used in support of programme sponsorship.

Station Ident - The Station Ident is the Station Logo often transmitted at the start of a Commercial break in programming.

Acronyms
A/V - Audio/Video
API - Application Programming Interface
DHCP - Dynamic Host Configuration Protocol
DTV - Digital TV
EPG - Electronic Programme Guide
HTTP - Hypertext Transfer Protocol
JPEG - Joint Photographic Experts Group
Kbit/s - kilobits per second
Kbps - kilobits per second
Mbit/s - Megabits per second
Mbps - Megabits per second
MPEG - Moving Picture Experts Group
MPTS - Multiple Program Transport Stream
PID - Packet Identifier
QCIF - Quarter Common Intermediate Format (176 x 144)
SCMS - Subscriber Card Management System
SPTS - Single Program Transport Stream
TCP - Transmission Control Protocol
TDT - Time of Day Table
TS - Transport Stream
VBR - Variable Bit Rate
VoD - Video on Demand

The inventors have provided an improved system and method for identifying audio, especially music, used in broadcasts and public performances, to improve the reporting of the use of such audio.
Embodiments of the invention are designed, for example, to accept multiple broadcast streams from either pre-recorded files or live reception and, using additional metadata, split the stream into discrete events and analyse them for music content. In addition, the content may be compressed and archived for audit purposes.
In a simple form, the invention may comprise steps such as those illustrated in Figure 1. Data for media which has been broadcast, streamed or performed is obtained (102), for example a program stream. A portion of that media data containing audio data is identified (104); for example an ident containing music. Data for a given audio entity, such as an nth such audio entity from a library, as part of a search through the entire library, is retrieved (106); for example, this data might be an acoustic fingerprint for the audio entity. The retrieved data is then compared with the audio data portion identified (108); for example, the acoustic or audio fingerprints of the two data sources may be compared.
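The steps of Figure 1 can be sketched as a loop over a fingerprint library. Every function and name here is a hypothetical stand-in for the components the text describes, wired up with trivial string operations so the control flow is visible.

```python
def monitor(media_data, library, identify_audio, compare):
    """Search broadcast data for any library entity (steps 102-108)."""
    portion = identify_audio(media_data)             # step 104: audio portion
    matches = []
    for entity_id, entity_print in library.items():  # step 106: nth entity
        if compare(portion, entity_print):           # step 108: comparison
            matches.append(entity_id)
    return matches

# Toy stand-ins: the "stream" is a string, fingerprints are substrings.
library = {"ident_theme": "abc", "song_1": "xyz"}
result = monitor("....abc....", library,
                 identify_audio=lambda d: d.strip("."),
                 compare=lambda portion, fp: fp in portion)
print(result)   # ['ident_theme']
```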
In one embodiment, the invention includes the following components:
* Media Capture and Metadata Ingest
* Workflow Management
* Media Filtering
    * by Programme Type
    * by Sound Type [Music, Silence, Speech]
    * Speech to Text
* Acoustic Fingerprinting
* Reporting Tools

In more detail, one embodiment of the invention provides a method for identifying an audio entity from a broadcast stream, comprising the following steps:
a) Programme Filter - dividing the broadcast stream into 'slices' defined as discrete 'broadcast events'. These events are defined as News/Weather, Advertising, Sponsorship, Production, Film, Trailer, Station Ident;
b) Music Filter - analysing the resulting slices for the presence of Music, No Music, Voice or Silence;
c) Acoustic Fingerprinting Engine - analysing the resulting parts containing a music sample to identify specific data associated with that music sample. This process is undertaken in three sub-processes:
    a. Pre-Processing - which may include creating multiple versions of the original music sample to improve results from third-party acoustic fingerprinting technologies;
    b. Fingerprinting and Matching - creating a unique profile of the music sample and comparing that to a reference library to find a match, using a range of third-party and owned fingerprinting technologies;
    c. Data cleaning - processing the data returned by the fingerprinting and matching process to remove ambiguity and duplication, to map results from the multiple sources, to provide confidence and accuracy statements, and to initiate new processes for music samples that have not been matched;
d) Data integration - adding to the results of step c) by introduction of additional data from external sources such as open source libraries, partner databases, and via owned 'crowd-sourced' tools;
e) Reporting the results of the analysis;
f) Feeding the additional data mentioned in step d) back into the audio fingerprinting library so as to improve its accuracy.

The audio entity may be a work of music in the form of a sound recording, sound effect, or spoken word, or any such entity readily distinguishable. It is likely to be an entity that may be protected by copyright, the performance of which may require the payment of a licence fee. The entity may be the entire work, or a segment of the work. The sample or media data may be of a broadcast such as a television broadcast, a radio broadcast, an online stream or download, or a public broadcast.
The method for identifying an audio entity from an audio sample may also comprise the step of comparing the sample with an expected sample, for example, comparing data describing the sample with data describing an expected sample. Alternatively, it may simply include the step of identifying the type of audio entity from data describing the nature of the entities within the sample. For example, accompanying data, such as a log, may identify only the types of audio to be found in a section of audio, and the embodiment may identify from the section of audio which of these types of audio are present, and where.
Each of the steps in the above noted more detailed embodiments is detailed in the following sections.
Metadata Capture/Ingest & Workflow Management

The primary function of this part of embodiments of the invention is to manage the media and metadata workflow through the system. The workflow is controlled, via a set of predetermined rules, by the incoming metadata.
The workflow, in simple terms, is defined as follows: 1. The platform receives the Transmission As-Run logs from the broadcaster and then processes these through a Rules Engine to produce Conform Request messages to be delivered to a Video Slicing Component. The Rules Engine contains a pre-defined set of rules for each provider or type of As-Run log expected, and follows the As-Run log to generate the Conform Request, for example, where the input video is to be divided between events listed in the log.
2. Once the slicing has taken place, the video component is transcoded so as to reduce file size and stored on the media archive.
3. The audio component is sent to the analysis engine for further analysis.
4. Metadata required to support the audio analysis is then stored on the central data repository. The platform may also store the metadata in its own database to provide for historical audit and reprocessing if required.
Video Slicing
The video slicing may be performed by a customized application developed for the specific purpose of extracting events from a broadcast stream indexed by the time of transmission recovered from the TDT transmitted within the stream.
A licensed library can be used to perform the video processing functions of the slicer. These are:
* Push Demux
* MPEG Demux
* Sink Filter
Transcode & Archiving
The platform has a number of processes requiring some form of content transcode or other similar processing. These functions are to be performed by a transcoder farm. In order to achieve the required throughput and scalability, the platform implements a scalable transcoder farm such as Ateme or Rhozet, for example. These components are under the control of the primary workflow. As the primary role for transcode is to archive the incoming media, the transcode profile is chosen so as not to compromise the quality of the audio component and to compress the video component to a format suitable for archive review. To this end a profile based on the QCIF standard may be used. Additional roles for the transcoder farm are:
* The preparation of audio formats for fingerprinting, the format depending on the fingerprinting technology to be used
* The preparation of audio / video browse copies for use in the reporting applications
Audio Fingerprinting
Audio fingerprinting is a process where a fragment of digitised audio is analysed to generate a fingerprint. The fingerprint is then compared to a reference library to try and find a matching song, returning as a minimum for any matches an artist name and song title. Fingerprint generation is performed locally. Fingerprint matching may be performed locally or may be outsourced.
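The fingerprinting step itself is delegated to licensed SDKs, but the underlying idea can be illustrated with a deliberately simplified sketch (this is not any vendor's algorithm): each frame of audio is reduced to the index of its dominant frequency band, and the resulting code sequence is hashed into a compact digest that could be compared against a reference library.

```python
import cmath
import hashlib

def band_energies(frame, n_bands=4):
    # Naive O(n^2) DFT over one frame -- fine for a sketch, far too slow
    # for production, where an FFT would be used instead.
    n = len(frame)
    spectrum = [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n)))
                for k in range(n // 2)]
    width = len(spectrum) // n_bands
    return [sum(spectrum[i * width:(i + 1) * width]) for i in range(n_bands)]

def fingerprint(samples, frame_size=64):
    # Per frame, record which frequency band dominates, then hash the
    # code sequence into a compact, comparable digest.
    codes = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        energies = band_energies(samples[i:i + frame_size])
        codes.append(max(range(len(energies)), key=energies.__getitem__))
    return hashlib.sha1(bytes(codes)).hexdigest()
```

Two recordings of the same material yield the same digest, while different material yields a different one; real systems use far more robust features so that noisy, re-encoded copies still match.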
Metadata Processing & Storage
The platform is able to store the samples, either pre- or post-processing, as well as the results of the fingerprint analysis. It is provided with appropriate storage media or access to such storage. It is also able to report on the data obtained, and may include appropriate software to allow the data to be reported in an easy-to-read manner. In particular, it may be able to prepare a report of the fingerprint analysis.
Elements of the invention are described in more detail below.
1. Programme Filter ('Slicer')
In one embodiment, the sample is a television broadcast. When the sample is a television broadcast, the sample comprises the actual programmes broadcast by the broadcaster, as represented in the As Run Log (see definitions). The audio entities within the sample may be filtered according to programme type as detailed in the As Run Log. The sample may be filtered using the As Run Log, according to programme type, for example, advertisement, production, news, promotion etc. This allows identification of the type of programme. This is important in terms of the rights being exploited and the fees that flow from that exploitation.
A high level view of the Slicer is shown in Figure 2.
The sample may then be sliced according to type of programme. The video slicing may be performed by a customized application developed for the specific purpose of extracting events from a broadcast stream indexed by the time of transmission recovered from the TDT (Time and Date Table) transmitted within the stream.
The conform request (202) has been generated by the metadata ingest process on receipt of the As Run log: the As Run log gives precise details of the components and timing of the stream, and therefore the conform request can order division of the stream between those components according to those details. The steps taken are:
a) The TX As Run log is transferred to the input directory specified by a set Provider configuration (a configuration is pre-set for each Provider of incoming run logs)
b) Retrieve the log file and validate it against the schema - the set of rules pre-established for each provider/type of run log received
c) Log file events are processed against the schema rules and stored in the database
d) Send conform requests to the Slicer for each processed entry from the TX As Run log.
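A minimal sketch of steps a) to d), assuming a simplified log representation (the event field names `id`, `type`, `start` and `end`, and the example schema, are hypothetical): each As-Run event is validated against the provider schema and one conform request is emitted per valid entry.

```python
from dataclasses import dataclass

@dataclass
class ConformRequest:
    event_id: str
    programme_type: str
    start: float   # transmission time, seconds
    end: float

# Hypothetical per-provider schema: which programme types are valid.
PROVIDER_SCHEMA = {"types": {"advertisement", "production", "news", "promotion"}}

def process_as_run_log(events, schema=PROVIDER_SCHEMA):
    """Validate each As-Run event against the provider schema and emit
    one ConformRequest per valid event (steps b-d of the ingest flow)."""
    requests = []
    for ev in events:
        if ev["type"] not in schema["types"]:
            continue  # invalid event type: skipped (a real system would log/audit it)
        if ev["end"] <= ev["start"]:
            continue  # nonsensical timing: skipped
        requests.append(ConformRequest(ev["id"], ev["type"], ev["start"], ev["end"]))
    return requests
```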
The media file input (204), such as an MPEG 2 Transport Stream, is first buffered (208) and then demultiplexed (210) to recover the Audio/Video and Timing Components, which are passed to the Time Reader (212). The A/V components are then forwarded to a stream sectioner (214) which will cut and condition the stream in accordance with the conform request (202) and timing keys received from the Control (206) and Time Reader (212) components respectively. The sectioned part of the stream is then conditioned and re-multiplexed (216) and forwarded to the Transcode and Archive component (218) and the Analyser (220).
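The sectioning stage above can be sketched as routing demultiplexed packets into per-event sections by their recovered transmission time (a simplified model with hypothetical names; real transport-stream handling is considerably more involved):

```python
def section_stream(packets, conform_requests):
    """Route demultiplexed A/V packets into per-event sections, keyed by
    the transmission time recovered from the TDT.
    `packets` is an iterable of (tdt_time, payload) pairs and each conform
    request is an (event_id, start, end) window."""
    sections = {event_id: [] for event_id, _, _ in conform_requests}
    for tdt_time, payload in packets:
        for event_id, start, end in conform_requests:
            if start <= tdt_time < end:
                sections[event_id].append(payload)
                break  # each packet belongs to at most one event window
    return sections
```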
2. Music Filter
This process performs pre-filtering to eliminate content that contains no music and therefore has no value to the purpose of the overall processes. This pre-filtering performance is influenced by a number of factors, including:
b) Level of interfering noise, e.g. speech, background noise, etc.
c) Quality of the audio sound track
d) Presence of "impostor" sounds which might be confused with music
e) Music genre
The sample is filtered according to the audio entity type that can be heard, for example silence, music and voice. In particular, the sample may be separated into music and non-music sound. The sample containing music is isolated for further processing - see the next step.
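One illustrative way to perform such a music/non-music pre-filter (a toy heuristic, not the actual classifier) is to label frames by RMS energy and zero-crossing rate: quiet frames are silence, and among the rest, frames with few zero crossings (sustained tones) are treated as music-like. The thresholds below are assumptions for illustration only.

```python
import math

def classify_frame(frame, energy_floor=0.01, zcr_split=0.25):
    """Label one audio frame as 'silence', 'music' or 'other' using
    RMS energy and zero-crossing rate -- illustrative thresholds only."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms < energy_floor:
        return "silence"
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    zcr = crossings / (len(frame) - 1)
    return "music" if zcr < zcr_split else "other"

def music_segments(samples, frame_size=64):
    """Return (start, end) sample ranges labelled music, for onward
    fingerprinting; everything else is dropped by the pre-filter."""
    segs, open_start = [], None
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        label = classify_frame(samples[i:i + frame_size])
        if label == "music" and open_start is None:
            open_start = i
        elif label != "music" and open_start is not None:
            segs.append((open_start, i))
            open_start = None
    if open_start is not None:
        segs.append((open_start, len(samples)))
    return segs
```

Production systems would use spectral features and trained models; the point here is only the shape of the interface: audio in, music-bearing ranges out.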
3. Acoustic Fingerprinting Engine [AFE]
One or more audio fingerprinting technologies may be used. For example, the audio fingerprinting may involve a method in which particular audio landmarks are identified within the sample. Alternatively or additionally it may involve identifying particular patterns of notes or tones within the sample.
The sample or slices may also be subject to other types of analysis to identify audio entities found within them, such as review by a crowd-sourced portal. Using multiple identification techniques increases the likelihood of a correct identification.
Following identification of the presence of an audio entity, a report may be made. The presence of the entity may be recorded and reported to a third party, such as the author of the audio entity or the broadcaster. The method may particularly include the step of pre-processing the samples prior to audio fingerprinting, as shown below.
The resulting sample from process 2 may be processed using the AFE [Acoustic Fingerprinting Engine]. The purpose of the AFE is to process an audio file whose content is unknown and, if it contains music, to identify the song title, performing artist, composer, publisher, amount of music used, and/or any additionally available metadata. The audio files presented to the AFE will be retained in the original file format and length provided by the 'slicer' application and process.
At the core of the AFE component are the fingerprinting SDKs (software development kits), in this example from third-party suppliers. The AFE supports multiple fingerprinting providers and will support future fingerprinting products developed, licensed and/or acquired by the Company. Embodiments of the present invention aim rather to control the SDKs and how / when the fingerprints are taken.
A. Pre-Processing - preparing the sample for analysis. This may include creating multiple versions of the original music sample to improve results from third party acoustic fingerprinting technologies. An example of this process is illustrated in Figure 3.
An audio file 302 can be measured in unit time. From the audio file, sections of the audio can be taken for fingerprinting. These sections could be consecutive, but need not be. In this example, fingerprint sections 1-10 are staggered. Fingerprinting sections 1 to 3 (304) use 2.8 units of audio each, but are started at intervals of 2 units, so that they overlap. Each of the three is sent to a different fingerprinting algorithm or provider.
In this example, 1-3 all identified the same track; this indicates a high likelihood of the match being correct. Track lengths are typically between 2.8 and 6.8 units long.
Fingerprints 4-6 (306) did not identify any tracks. Fingerprints 7 and 8 (308) identified a second track, but fingerprint 9 identified a third -it might therefore be assumed that fingerprint 9 is a false positive.
Other options for this are to send consecutive sections, or to send batches of sections all starting at the same point. The former may not allow enough comparison between the AFEs. The latter will provide a straight comparison between AFEs, but will be more computationally expensive -overlapped sections may therefore be a happy medium, providing sufficient comparison between AFEs while progressing swiftly along the audio file.
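The staggered-window scheme of Figure 3, together with the false-positive reasoning above, can be sketched as follows (the units and the agreement threshold are illustrative):

```python
from collections import Counter

def overlapped_sections(total_units, length=2.8, step=2.0):
    """Figure 3 scheme: fingerprint windows of `length` units started
    every `step` units, so consecutive windows overlap by length - step."""
    starts, t = [], 0.0
    while t + length <= total_units:
        starts.append((round(t, 6), round(t + length, 6)))
        t += step
    return starts

def consensus(matches):
    """Majority vote over per-window match results. `matches` is a list
    of track ids (or None for no match) in window order. A track seen in
    only one isolated window (like fingerprint 9 in Figure 3) is treated
    as a likely false positive and discarded."""
    counts = Counter(m for m in matches if m is not None)
    return [track for track, n in counts.items() if n >= 2]
```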
The inputs to the AFE are audio files. Depending on the architecture, the AFE may:
f) receive an audio / video file requiring transcoding (e.g. into a watch folder or via a HTTP REST API)
g) receive audio files transcoded ready for processing
h) receive a notification that content is available for processing (e.g. via a HTTP REST API) (preferred option)
i) query a queue for content availability (preferred option)
Assuming one of the preferred options, the AFE may:
* issue transcode requests
* await notification of transcode completions
* fetch transcoded files
* start fingerprinting
Figure 4 illustrates the system for processing the separate fingerprinting. The slicer (402) produces the media files which are sent to the AFE controller (404), which sends the sections to the separate AFEs (406). Results are collected by the reporting engine (408), which can be referred to an external data source if needed.
B. Fingerprinting and Matching - creating a unique profile of the music sample and comparing that to a reference library to find a match using a range of third party or proprietary technologies.
The fingerprinting process includes:
* generating a fingerprint from the audio file
* posting the fingerprint to a local or an external server for matching
* processing returned data (typically JSON or XML)
Post-fingerprinting activities include:
* de-duplication of data (fuzzy matching techniques)
* comparing results with analysis of a different part of the same sample - multiple fingerprinting to avoid false results
* sending fingerprinting results to the core data repository, likely via a HTTP REST API
Where fingerprinting has not returned any results, further processing may be required, for example:
* issuing a request to extract a shorter audio segment for processing (e.g. to try and clip non-music audio from the start / end to try and get a match)
Further details of this process are illustrated in Figure 5. The AFE controller (502) sends a segment for transcoding (504). The audio is read (506) and sent for fingerprinting (here by an SDK, 508, 510) with reference to the library 512. The intermediate result (514) is generated, and if more audio is required (516), more is read for fingerprinting. If not, the results are consolidated (518) and reported (520).
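The "processing returned data" and "extract a shorter segment" steps above might look like the following sketch, in which the JSON field names (`matches`, `artist`, `title`) and the clipping strategy are assumptions rather than a specification of any real server's response format:

```python
import json

def parse_match_response(raw_json):
    """Normalise a (hypothetical) JSON match payload from a fingerprint
    server into (artist, title), or None when there is no match."""
    data = json.loads(raw_json)
    if not data.get("matches"):
        return None
    best = data["matches"][0]
    return best["artist"], best["title"]

def match_with_clipping(segment, match_fn, min_len=3):
    """Where fingerprinting returns nothing, retry on a shorter clip with
    the start / end trimmed, in case non-music audio at the edges
    spoiled the fingerprint."""
    while len(segment) >= min_len:
        result = match_fn(segment)
        if result is not None:
            return result
        segment = segment[1:-1]  # clip one unit from each end and retry
    return None
```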
The AFE should be able to:
* manage its own throughput (e.g. fingerprinter 1 has X slots available, fingerprinter 2 has Y slots available)
* have in-built resilience for failure / unavailability of other external and internal components
* hook into any system management tools (e.g. SNMP)
* provide real time and historical performance data (e.g. size of queues, time to process files, etc.)
C. Data Cleaning - processing the data returned by the fingerprinting and matching process to remove ambiguity and duplication, to map results from the multiple sources, to provide confidence and accuracy statements, and to initiate new processes for music samples that have not been matched.
The spellings of the artist name and song title matches returned from the fingerprinters often vary; for example "Guns n' Roses" could also be "Guns 'n' Roses", "Guns'n'Roses", "Guns and Roses", "Guns Roses" etc. The spellings can vary for the same artist on a single fingerprinter across different songs, or across different releases of the same song. When multiple fingerprinters are used, more spelling variations are likely.
Cleaning the data allows different fingerprinting matches to be identified as the same song, and allows all songs by a particular artist to be found. Where variations on the same match are found on a single file, being able to reduce these to a single match improves the confidence in the result being correct.
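A sketch of such cleaning, using simple normalisation plus a fuzzy similarity fallback (the normalisation rules and the 0.85 threshold are illustrative assumptions, not the claimed method):

```python
import re
from difflib import SequenceMatcher

def canonical(name):
    """Normalise an artist/title string: lower-case, unify 'n' / 'and' / '&',
    then strip punctuation and spacing so spelling variants collide."""
    s = name.lower()
    s = re.sub(r"\b(and|n)\b|'n'?|&", "and", s)
    s = re.sub(r"[^a-z0-9]+", "", s)
    return s

def same_artist(a, b, threshold=0.85):
    """Fuzzy equality: exact after normalisation, or a high similarity
    ratio to absorb residual spelling noise."""
    ca, cb = canonical(a), canonical(b)
    return ca == cb or SequenceMatcher(None, ca, cb).ratio() >= threshold
```

With this, "Guns n' Roses", "Guns 'n' Roses", "Guns'n'Roses" and "Guns and Roses" all reduce to a single key, and "Guns Roses" is absorbed by the fuzzy fallback, so matches from different fingerprinters can be de-duplicated before reporting.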
Additional processing helps calculate the confidence measure, as follows. The files being analysed may contain: 1) no music, 2) a single piece of music, 3) multiple individual pieces of music, 4) multiple pieces of music in a continuous mix, or 5) a mixture of 3) and 4). As we need to continually search for multiple music uses, the files must be analysed frequently at different points, for example at 5 second intervals. Although a fingerprinter may return a match, we may not know if this is a correct match or a false positive. Performing one or more validation fingerprints after a shorter interval (e.g. 1s, then 2s, 3s etc.) and comparing the results should give an indication of low confidence or false positive matches. Therefore in addition to the overlapping fingerprint sections shown in Figure 3, some embodiments may also compare additional short-interval fingerprint sections to themselves (i.e. results from the same AFE) or simply sample more frequently as the inputs for a process as in Figure 3.
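The short-interval validation idea can be sketched as re-querying the fingerprinter shortly after a match and scoring agreement (the intervals and the scoring scheme are illustrative):

```python
def validation_confidence(fingerprint_at, t, intervals=(1, 2, 3)):
    """Re-fingerprint shortly after a match at time `t` and score
    agreement: all validation windows returning the same track suggests a
    true match; disagreement flags a likely false positive.
    `fingerprint_at` is any callable mapping a time to a track id or None."""
    first = fingerprint_at(t)
    if first is None:
        return None, 0.0
    agree = sum(1 for dt in intervals if fingerprint_at(t + dt) == first)
    return first, agree / len(intervals)
```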
Referring to Figure 6, the above embodiments of the invention may be conveniently realized as a computer system suitably programmed with instructions for carrying out the steps of the methods according to the invention.
For example, a central processing unit 602 is able to receive media data for analysis via a port 604, which could be a reader for portable data storage media (e.g. CD-ROM, USB), or a connection to a network, for example receiving streaming or broadcast data. Software applications loaded in memory 606 are executed to process the media data in random access memory 608.
The processor 602 in conjunction with the software can perform steps such as obtaining a set of media data for the broadcast, streamed or performed media; processing the set of media data to identify a portion of the data set containing audio data; and comparing the portion with data for the given audio entity. A Man-Machine Interface (MMI) 610 typically includes a keyboard/mouse/screen combination (which allows user input such as initiation of applications, or selection of results) and a screen on which the results are displayed. Alternatively, the MMI 610 could be realised in a mobile device, with connection over a network to a device housing the processor 602.
It will be appreciated by those skilled in the art that the invention has been described by way of example only, and that a variety of alternative approaches may be adopted without departing from the scope of the invention, as defined by the appended claims.

Claims (4)

  CLAIMS
1. A method of monitoring use of a given audio entity in broadcast, streamed or performed media, comprising the steps of: obtaining a set of media data for the broadcast, streamed or performed media; processing the set of media data to identify a portion of the data set containing audio data; and comparing the portion with data for the given audio entity.
  2. A method according to Claim 1, wherein the data for the given audio entity is audio data.
3. A method according to Claim 1, wherein the data for the given audio entity is an acoustic fingerprint for the given audio entity.
4. A method according to Claim 3, further comprising generating an acoustic fingerprint for the portion of the set of media data for comparison with the acoustic fingerprint for the given audio entity.
5. A method according to any preceding claim, wherein the media are broadcast media, wherein the step of obtaining comprises obtaining a broadcast data stream, and wherein the step of processing comprises processing the broadcast data stream to identify the portion of the stream containing audio.
6. A method according to any preceding claim, further comprising obtaining companion log data for the set of media data.
7. A method according to Claim 6, wherein the step of processing comprises using the companion log data to process the set of media data to identify the portion containing audio data.
8. A method according to any preceding claim, wherein the step of processing comprises processing the set of media data to identify a plurality of subsets of media types.
9. A method according to Claim 8, wherein the step of processing comprises identifying an audio subset, and processing the audio subset to identify the audio data portion.
10. A method according to Claim 8 or Claim 9, wherein the step of processing comprises processing the audio data portion to identify a plurality of subsets of audio types.
11. A method according to Claim 10, comprising identifying a speech audio type subset, and processing data from the speech subset of the audio data portion to obtain text of the speech.
12. A method according to any of the Claims 7 to 11 as dependent on Claim 6, further comprising, before the step of processing: determining a type of the companion log and/or a type of the set of media data; and obtaining a pre-defined schema for the type of companion log and/or set of media data identified, wherein the schema comprises a set of instructions for processing the set of media data.
13. A method according to any preceding claim, wherein the step of comparing comprises generating a measure of confidence in a match between the portion and data for the given audio entity.
14. A method according to any preceding claim, wherein the step of comparing comprises: comparing the portion with data for a set of given audio entities; and identifying a match between the portion and data for one of the set of given audio entities.
15. A method according to any preceding claim, wherein the step of comparing comprises: employing a plurality of algorithms to compare the portion with data for the given audio entity; and comparing the results of the plurality of algorithms.
16. A method according to any preceding claim, wherein the step of comparing comprises: identifying a plurality of sub-portions of the portion of the set of media data containing audio data; comparing each sub-portion with data for the given audio entity; and comparing the results of each sub-portion comparison.
17. A method of identifying an audio entity in a set of media data, comprising: processing the set of media data to identify a plurality of subsets of media types; identifying an audio subset, and processing the audio subset to identify an audio data portion; and comparing the portion with data for the given audio entity.
18. A method according to Claim 17, wherein the data for the given audio entity is an acoustic fingerprint for the given audio entity.
19. A method according to Claim 18, further comprising generating an acoustic fingerprint for the portion of the set of media data for comparison with the acoustic fingerprint for the given audio entity.
20. A method according to any of the Claims 17 to 19, wherein the step of comparing comprises: employing a plurality of algorithms to compare the portion with data for the given audio entity; and comparing the results of the plurality of algorithms.
21. A method according to any of the Claims 17 to 20, wherein the step of comparing comprises: identifying a plurality of sub-portions of the audio data portion; comparing each sub-portion with data for the given audio entity; and comparing the results of each sub-portion comparison.
22. A method according to any preceding claim, further comprising sending a result of the comparison to a software application, wherein the software application is stored on one of: a local user device; a mobile user device; and an internet-accessible server.
23. A media device storing computer program code adapted, when loaded into or run on a computer, to cause the computer to carry out a method according to any preceding claim.
GB1417856.0A 2014-10-09 2014-10-09 Methods for identifying and monitoring use of audio entities Withdrawn GB2531700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1417856.0A GB2531700A (en) 2014-10-09 2014-10-09 Methods for identifying and monitoring use of audio entities

Publications (2)

Publication Number Publication Date
GB201417856D0 GB201417856D0 (en) 2014-11-26
GB2531700A true GB2531700A (en) 2016-05-04

Family

ID=52001139

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1417856.0A Withdrawn GB2531700A (en) 2014-10-09 2014-10-09 Methods for identifying and monitoring use of audio entities

Country Status (1)

Country Link
GB (1) GB2531700A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444683B (en) * 2018-12-28 2024-08-20 北京奇虎科技有限公司 Rich text processing method, rich text processing device, computing equipment and computer storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078729A1 (en) * 2009-09-30 2011-03-31 Lajoie Dan Systems and methods for identifying audio content using an interactive media guidance application
WO2011041259A2 (en) * 2009-09-30 2011-04-07 Rovi Technologies Corporation Systems and methods for identifying audio content using an interactive media guidance application
WO2013095893A1 (en) * 2011-12-20 2013-06-27 Yahoo! Inc. Audio fingerprint for content identification
US20140282662A1 (en) * 2013-03-15 2014-09-18 DISH Digital L.L.C. Pre-distribution identification of broadcast television content using audio fingerprints

Also Published As

Publication number Publication date
GB201417856D0 (en) 2014-11-26

Similar Documents

Publication Publication Date Title
US7366461B1 (en) Method and apparatus for improving the quality of a recorded broadcast audio program
EP1474760B1 (en) Fast hash-based multimedia object metadata retrieval
US9563699B1 (en) System and method for matching a query against a broadcast stream
EP2406732B1 (en) Bookmarking system
CN101663900B (en) System and method for monitoring and recognizing broadcast data
US10575126B2 (en) Apparatus and method for determining audio and/or visual time shift
JP5963373B2 (en) Streaming media monitoring
US9424349B2 (en) Restoring program information for clips of broadcast programs shared online
CN110083714B (en) Acquisition, recovery, and matching of unique information from file-based media for automatic file detection
CN100534170C (en) Broadcast program content retrieval and distribution system
US20130018936A1 (en) Interacting with time-based content
US20080140852A1 (en) Device and method for selection of digital radio channels
CN104598541A (en) Identification method and device for multimedia file
CN102411578A (en) Multimedia playing system and method
US20240015342A1 (en) Selective automatic production and distribution of secondary creative content
GB2531700A (en) Methods for identifying and monitoring use of audio entities
US20150026147A1 (en) Method and system for searches of digital content
CN115495600B (en) Video and audio retrieval method based on characteristics
CN1643831B (en) Method and device for acquiring repeated broadcast data
TWI700925B (en) Digital news film screening and notification methods
Serrão et al. Describing Acoustic Fingerprint Technology Integration For Audio Monitoring Systems
HK40012207A (en) Acquisition, recovery, and matching of unique information from file-based media for automated file detection
FR2993742A1 (en) Method for triggering action relative to e.g. audio visual flow on e.g. tablet, during broadcast and restitution of multimedia content, involves obtaining triggering instant according to timestamps and triggering action at instant
WO2014126502A1 (en) Method for identifying media streams and system for implementing same

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)