WO2010038187A1

WO2010038187A1 - Method for data clusters indexing, recognition and retrieval in presence of noise

Info

Publication number: WO2010038187A1
Application number: PCT/IB2009/054243
Authority: WO
Inventors: Fausto Artico
Original assignee: Individual
Current assignee: Individual
Priority date: 2008-09-30
Filing date: 2009-09-28
Publication date: 2010-04-08
Anticipated expiration: 2011-03-30
Also published as: ITVE20080074A1

Abstract

A method for indexing, recognition and retrieval of data clusters is disclosed. Data clusters are extracted from a larger set of generic input data, particularly audio or media files, even in the presence of noise. The method is suitable for many applications, such as: TV measuring services, for example for calculating the broadcast frequency and broadcast times of the adverts transmitted during an entire daily TV programs schedule; calculating the royalties due to copyright owners; recognizing music copyright infringements and plagiarism in musical tracks or advertisements. The method is intrinsically more robust and allows a more reliable and faster recognition of advertisements than any other method known in the art. An embodiment of the invention related to the recognition of adverts is also disclosed. More particularly, the embodiment describes the recognition of over 200 adverts, transmitted during an entire daily TV programs schedule, in less than 20 minutes with a probability of success of nearly 100%.

Description

Title of Invention: METHOD FOR DATA CLUSTERS INDEXING, RECOGNITION AND RETRIEVAL IN PRESENCE OF NOISE

[1]

Technical Field

This invention relates generally to data clusters retrieval. More particularly, it relates to data clusters indexing, recognition and retrieval and the application in the field of advertisements or media content recognition.

[2]

Background Art

[3] Many scientific papers, conference proceedings and patents describe a variety of methods for generating audio fingerprints that are subsequently indexed, recognized and retrieved. These methods represent the basis for automated sound recognition and find application in many industries, particularly in the advertisement and music markets. For example, TV measuring services are interested in calculating the broadcast frequency and broadcast times of the adverts transmitted during an entire daily TV programs schedule. Copyright owners are interested in a fast and reliable calculation of the royalties due, or in recognizing music copyright infringements and plagiarism in musical tracks or advertisements.

[4] For example, U.S. Pat. No 6,990,453 discloses a system and methods for recognizing sound and music signals in high noise and distortion. The method is based on a set of landmark time points and associated fingerprints (i.e. unique identifiers that represent a number of features of the media sample). To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. U.S. Pat. No 7,549,052 ('Generating and matching hashes of multimedia content') discloses a method for generating robust hashes (i.e. short summaries or signatures of data files that can be used to identify the file) for multimedia content, for example audio clips. Other documents known in the art disclose methods for generating acoustic fingerprints (e.g. U.S. Pat. No 93,185,901 - or U.S. Pat. No 6,963,975 'System and method for acoustic fingerprinting').

[5] Said methods are well known to those skilled in the art and are very similar to one another, since they are substantially based on the same basic concepts: the way the human ear perceives sounds, the use of audio fingerprints to recognize audio content, and the application of the Hamming distance or equivalent metrics. Slight differences between the known methods concern the indexing, recognition and retrieval mechanisms of the audio fingerprints. Despite what is claimed in the patents, implementations of these methods in large-scale applications are very poor in terms of reliability and computing speed. For instance, the known methods cannot solve, or cannot efficiently solve, the following problems that are extremely important for the broadcasting and advertisement markets:

[6] analyzing the media file containing an entire daily TV programs schedule in order to calculate the number and the transmission times of the adverts broadcast;

[7] distinguishing precisely between different versions of the adverts transmitted and consequently calculating correctly the royalties due to the copyright owners;

[8] recognizing music copyright infringements and plagiarism in musical tracks or advertisements.

[9] The most important Italian broadcasting and media company does not apply the known methods for identifying adverts within a one-day transmission schedule and instead continues to use manual or semi-manual methods, which it considers, at the moment, more reliable.

[10] A key problem of all of the above prior art audio recognition methods concerns the poor reliability and precision of the audio fingerprint indexing and retrieval processes. These problems arise because the methods known in the art are based on certain hypotheses that, in a wide range of applications, have not proved sufficiently robust and reliable. More particularly, some of the key hypotheses are the following:

[11] inside a given set of fingerprints, at least one audio fingerprint is uncorrupted (this hypothesis does not hold for recordings obtained by means of a mobile phone);

[12] the complexity of the known audio fingerprint generation methods does not permit an efficient and complete generation of all fingerprints of maximum distance N from a given one;

[13] it suffices to permute only the n bits that might have a significant error likelihood (this hypothesis has proven completely unreliable).

[14] As a practical consequence of these hypotheses, almost 5 seconds of query sample recording have to be submitted in order to have a good probability of success in recognizing the entire recording.

[15] Another problem with prior art methods is the generation of complex multi-indexes that require large memory and computing power. Due to the drawbacks described, the known methods have therefore only limited applications in many markets.

[16] Accordingly, it is a primary object of the present invention to provide a method for indexing, recognition and retrieval of data clusters inside larger sets, even in the presence of noise or when corrupted by it.

[17] It is another primary object of the invention to provide a method that allows a faster generation of the fingerprints of maximum Hamming distance.

[18] It is an additional primary object of the present invention to provide a retrieval method that requires only a single index for fast and reliable data cluster recognition and retrieval.

[19] It is a further primary object of the present invention to provide an audio fingerprint retrieval method that allows reliable audio fingerprint recognition and retrieval even when every audio fingerprint is corrupted.

[20] It is an additional primary object of the present invention to provide an audio fingerprint retrieval method that allows a fast and reliable generation of all fingerprints of maximum distance N from a given one.

[21] It is a further object of the invention to provide a recognition method that can recognize data clusters inside a larger set of input data with a computing time at least 4 times faster than the known methods.

[22] It is another primary object of the present invention to provide a method that allows large-scale application of data indexing, search, recognition and retrieval in new markets, such as broadcasting or advertisement.

[23] It is an additional object of the invention to provide a more reliable method that allows sound track recognition inside a larger input data set with a probability of success higher than 95%.

[24] It is another additional object of the invention to provide a method that needs only a few milliseconds of a query sound track recording to recognize the entire sound track with a high probability of success.

[25] It is a further object of the invention to provide a more efficient method that allows search and retrieval of an extremely large number of sound tracks almost in real time.

[26] It is another object of the invention to provide a method for computing the broadcast frequency and broadcast times of adverts transmitted during an entire daily TV programs schedule in a few minutes without human activity.

[27] It is a further object of the invention to provide a method for recognizing music copyright infringements and plagiarism in musical tracks or advertisements.

[28] It is another object of the invention to provide a method for indexing, recognition and retrieval that does not require large computing systems and that shortens the time for data transfer from the central memory to the hard disk and vice versa.

[29] These objects and advantages are attained by the method for indexing, recognition and retrieval of data clusters inside larger sets, even in the presence of noise, described below.

Disclosure of Invention

[30] The present invention provides a method for indexing, recognition and retrieval of arbitrary subsets of input data (e.g. an advert or a media sample) within arbitrary sets of input data (e.g. a TV recording). The method for indexing, recognition and retrieval herein disclosed presents several advantages that finally solve the most relevant issues of the known methods: reliability, robust computing and speed. The invention herein described discloses a method for indexing, recognizing and retrieving fingerprints, not a method for generating fingerprints. Said invention can use any of the methods for generating fingerprints known to those skilled in the art.

[31] The method comprises many steps that are implemented by specific scripts or components. First of all, it is necessary to extract from arbitrary sets of input data the fingerprints that characterize them. This step involves the transformation of data clusters and of data sets into strings of numbers in binary base, using methods that are well known and in the public domain. Technical literature and articles covering this topic are also available on the web and are very precise (for instance, 'Robust Audio Hashing for Content Identification' by J. Haitsma, T. Kalker and J. Oostveen). The data clusters and the input sets are transformed into sets of strings of elements in one or more numerical bases. Said strings are then divided into a plurality of parts, and the elements that compose said parts are linked together. The next step involves the generation of an alternative representation of said parts in one or more numerical bases, not necessarily coincident with those of the first transformation. Said alternative representations of data are used to index the representations of the data sets in which the search for clusters will be performed. By means of a convenient metric (e.g. the Hamming distance), the sets of representations of data within a certain maximum distance from a certain number of representations of data are computed. Said representations are used as samples: first, a retrieval is executed to find all the positions where such representations of data are present inside the transformed search sets; then a test is executed for a preliminary recognition (named 'first recognition') of the cluster representations inside the transformed search sets. Then, for each single cluster, a selection is performed among the recognitions that have passed the preliminary recognition test, in order to keep only the most significant ones. By subsequent iteration, the positions, inside the larger sets of representations, of the representations of data clusters that have passed the selection are eventually extracted and retrieved.

[32] The method herein disclosed is intrinsically robust and allows a reliable search and retrieval of clusters even in the presence of noise (i.e. background noise, fade-in and fade-out, transmission errors, interference, et cetera), since it is not based on the hypothesis that for every recognition there exists at least one audio fingerprint that is not corrupted. The method is also intrinsically more reliable because it generates all the fingerprints of maximum Hamming distance and not only a subset. The method is also faster because it needs only a few milliseconds of a sound track recording to recognize the entire sound track (rather than several seconds of recording) with a high probability of success. The method is suitable for many applications, such as TV measuring services, for example for calculating how many times and when a certain advert is broadcast during an entire daily TV programs schedule. In this way, a correct calculation of the royalties due to the copyright owners can be performed. Music copyright infringements and plagiarism in musical tracks or advertisements can likewise be recognized. By means of the method herein disclosed, advertisers can also assess the effectiveness of an advertising campaign.

[33] The method and its individual steps will now be described in detail with reference to a preferred embodiment related to media content recognition. More particularly, the embodiment concerns the application of the method to indexing, recognizing and retrieving advertisements inside generic TV recording files (e.g. files containing an entire daily TV programs schedule encoded in a known standard). The following description represents a typical example of such an optimal method for the recognition of adverts inside generic TV recording files, but it is given without limiting the generality of the invention herein disclosed.

Best Mode for Carrying out the Invention

[34] In the following exemplary implementation, the scripts are listed in an order that allows them to be called sequentially for the recognition and retrieval of adverts. Although the following detailed description contains many specifics for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. In particular, the order of the scripts described in the following preferred embodiment is not rigid, and variations with regard to the method, the single scripts and the order of the scripts are obvious to the person skilled in the field. Although the term 'file' is used to describe one or more entities, the entities can be in any format for which the necessary values can be calculated. The method is typically implemented as software running on a computer system, with individual steps most efficiently implemented as independent software modules. The invention is not limited to any particular hardware system or programming language. Accordingly, the following preferred embodiment of the invention is set forth without any loss of generality to, and without imposing limitations upon, the claimed invention. The recognition process consists of the following steps.

[35] Script: Variables Initialization and Arrangement

[36] In this script, the variables that characterize the processing are set. The variable bpf (bits per fingerprint) must be set equal to the number of bits that compose each fingerprint, and it must be an exact power of 2. The variable ppf (parts per fingerprint) must be an exact power of 2, less than or equal to the number of bits of the fingerprint, and determines the number of parts into which each fingerprint under evaluation is subdivided. The variable bpp (bits per part) is instead calculated automatically and represents the number of bits of each part of the fingerprints under evaluation. The variable fpl (fingerprints per reading) indicates the number of fingerprints read during each reading of the generic file of fingerprints in which the adverts have to be searched. For example, if the generic file is composed of 14878366 fingerprints and the variable fpl is equal to 50000, then reading and indexing the whole generic file will require several readings (precisely 284), each of which considers only 50000 fingerprints (fewer for the last reading). It is then necessary to set the variables nfpc, smip and smap. The variable nfpc represents the number of fingerprints to be used for a first comparison, i.e. the number of fingerprints of each advert for which the sets of fingerprints within the maximum Hamming distance mdh are calculated. The variables smip and smap represent the minimum and maximum percentage thresholds between which the indexes of first comparison must always be extracted. For example, if an advert is composed of 5182 fingerprints, the variable smip is equal to 0.1 (10%) and the variable smap is equal to 0.9 (90%), then the nfpc fingerprints will be extracted among those whose indexes within the advert are larger than 519 and lower than 4663. The thresholds have been introduced because adverts often have a fade-in and a fade-out, and consequently in these sections the percentage of distortion with respect to the original fingerprints is larger than elsewhere. In this way useless computing is avoided, because fingerprints that can be retrieved only with difficulty in generic files are not taken into account. Such fingerprints present a very large Hamming distance, a situation that should be avoided because the calculations involved increase exponentially. Since noise and other sources of disturbance (e.g. due to cuts of the adverts and to fade-ins and fade-outs) will be present in the generic files, the idea is to calculate, for each fingerprint of first comparison, all the fingerprints within the maximum Hamming distance mdh, and to store them in order to search for them subsequently inside the generic files by means of a perfect match. The variable mdh (maximum Hamming distance) is used to calculate, for each fingerprint of first comparison under evaluation, the set of fingerprints whose Hamming distance from said fingerprint is at most mdh. The variable sdv (threshold of validity) represents instead the threshold of validity for the positive recognition of an advert within the generic file considered (i.e. a one-day TV programs schedule), as explained below. Ttf (time between fingerprints) represents the time between fingerprints, so that it is then possible to reconstruct the transmission times of the adverts from 0 a.m. to 12 p.m. on the day of recording of said generic file.
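The relationships among these variables can be illustrated with a minimal Python sketch. The class name Settings, its method names and the default values of mdh, sdv and ttf are assumptions made for illustration only; they are not taken from the embodiment. Note that ceiling division of the figures quoted above (14878366 fingerprints, fpl equal to 50000) gives 298 readings rather than 284, so one of the quoted figures may contain a typo; the sketch simply reports the computed value.

```python
import math
from dataclasses import dataclass


def is_power_of_two(n: int) -> bool:
    """True when n is an exact power of 2 (1, 2, 4, 8, ...)."""
    return n > 0 and n & (n - 1) == 0


@dataclass
class Settings:
    bpf: int = 32        # bits per fingerprint
    ppf: int = 4         # parts per fingerprint
    fpl: int = 50000     # fingerprints per reading of the generic file
    nfpc: int = 3        # fingerprints of first comparison per advert
    smip: float = 0.1    # minimum percentage threshold
    smap: float = 0.9    # maximum percentage threshold
    mdh: int = 2         # maximum Hamming distance (assumed value)
    sdv: float = 0.8     # threshold of validity (hypothetical value)
    ttf: float = 0.0058  # seconds between fingerprints (hypothetical value)

    def __post_init__(self) -> None:
        if not is_power_of_two(self.bpf):
            raise ValueError("bpf must be an exact power of 2")
        if not is_power_of_two(self.ppf) or self.ppf > self.bpf:
            raise ValueError("ppf must be a power of 2 not larger than bpf")
        if not 0.0 <= self.smip < self.smap <= 1.0:
            raise ValueError("thresholds must satisfy 0 <= smip < smap <= 1")

    @property
    def bpp(self) -> int:
        """Bits per part, derived automatically from bpf and ppf."""
        return self.bpf // self.ppf

    def readings(self, total_fingerprints: int) -> int:
        """Number of readings needed to scan the whole generic file."""
        return math.ceil(total_fingerprints / self.fpl)

    def window(self, advert_length: int) -> tuple:
        """Lowest and highest advert index usable for a first comparison."""
        low = math.ceil(self.smip * advert_length)
        high = math.floor(self.smap * advert_length)
        return low, high
```

With the default values, Settings().bpp evaluates to 8 and Settings().window(5182) to (519, 4663), in agreement with the example in the text.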

[37] Script: Advert Structures Preparation.

[38] The files containing the fingerprints of the adverts are arranged in a suitable way and other structures are prepared. Since each advert is short, the basic idea is that it is possible to create for each of them a matrix that is stored and saved as a new file for further use. Such a matrix contains all the fingerprints of the advert arranged in rows. If, for example, an advert is composed of 5182 fingerprints of 32 bits each, after the processing of said advert a matrix of 5182 rows and 32 columns will be stored and saved (one row for each fingerprint). These matrices will be useful in the script 'Recognition Adverts' for validating or rejecting the recognition of the advert inside the generic files, considering the several docking indexes extracted during the processing, as explained below.
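A minimal sketch of such a matrix, in Python, is given here for illustration. It assumes that each fingerprint is held as an unsigned integer and that the most significant bit goes in the first column; neither convention is stated in the text, and the function name advert_matrix is hypothetical.

```python
def advert_matrix(fingerprints, bpf=32):
    """Return one row per fingerprint and one column per bit.

    Assumption: column 0 holds the most significant bit.
    """
    return [
        [(fp >> (bpf - 1 - col)) & 1 for col in range(bpf)]
        for fp in fingerprints
    ]
```

An advert of 5182 fingerprints of 32 bits would thus yield a list of 5182 rows of 32 entries each.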

[39] Script: Generic File Structures Preparation.

[40] This script generates several structures concerning the generic files, as well as the indexing substructures of each generic file. For example, if a file is composed of 14878366 fingerprints and the variable fpl is equal to 50000, then 284 readings are required (each of 50000 fingerprints, with the exception of the last one) to sub-index the generic file completely; the fingerprints, of 32 bits, are divided into 4 parts (8 bits per part), where the number of possible combinations of 8 bits is 256. During each reading, 50000 fingerprints are taken from the generic file, each is broken into 4 parts, and each part is converted into a decimal number. The 4 decimal numbers are the addresses used inside an indexing structure composed of 4 parts, each part with 256 fields (the possible decimal values of the 8-bit parts). The index of the fingerprint considered inside the generic file is then inserted at each of these addresses. For example, after the third reading, we have just read the fingerprints with indexes from 100001 to 150000 inside the generic file considered. Taking the fingerprint corresponding to index 100001, breaking it into 4 parts and converting said parts into decimal numbers, we finally obtain the numbers 130 78 27 253. Therefore, we access the indexing structure and insert the index 100001 in positions pic(1).num(130), pic(2).num(78), pic(3).num(27) and pic(4).num(253), appending said index as the last one if there are already other indexes in these positions. We then continue in the same way up to index 150000, after which the structure pic is stored and saved; the process continues cyclically, generating, storing and saving a new structure pic for each reading until the last one. It is necessary to store each piece of pic (284 at the end, i.e. the number of readings performed for the generic file), because they will subsequently be used to build the comprehensive index of the generic file itself during the step of extraction of the docking indexes.

[41] Script: Fingerprint Parts Generation.

[42] This script creates and stores a matrix with all the possible combinations of bpp bits (bits per part). Said matrix is subsequently used to generate all the fingerprints within the maximum Hamming distance mdh from a given one, more efficiently than the methods known in the art, avoiding useless repetitions and generating each such fingerprint reliably, by means of indexing substructures analogous to those created after the readings of the generic files.
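One possible reading of this step is sketched below in Python: all 2^bpp values of a part are enumerated once and grouped by their Hamming distance from a given part. The function name parts_by_distance and the list-of-lists layout are assumptions for illustration; the embodiment does not specify how the matrix is laid out.

```python
def parts_by_distance(part, bpp, mdh):
    """Group every bpp-bit value by its Hamming distance from `part`.

    Returns a list of mdh + 1 buckets; bucket d holds the values that
    differ from `part` in exactly d bits. Values further away than mdh
    are discarded.
    """
    buckets = [[] for _ in range(mdh + 1)]
    for candidate in range(1 << bpp):
        distance = bin(candidate ^ part).count("1")
        if distance <= mdh:
            buckets[distance].append(candidate)
    return buckets
```

For an 8-bit part and mdh equal to 2, the three buckets contain 1, 8 and 28 values respectively (the binomial coefficients of 8 over 0, 1 and 2).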

[43] Script: Extraction of Fingerprint Indexes of First Comparison.

[44] The fingerprint indexes of first comparison are extracted; such indexes will then be used to generate the sets of fingerprints within the maximum Hamming distance mdh. If, for example, nfpc is equal to 3, an advert is composed of 5182 fingerprints, and the minimum and maximum thresholds smip and smap are set equal to 0.1 (10%) and 0.9 (90%), we first calculate the thresholds (about 519 and 4663); the interval between 519 and 4663 is then divided into three nearly equal parts, and from each of these parts a fingerprint index of first comparison is extracted.
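The extraction can be sketched in Python as follows. The text does not say which index is taken from each part; the sketch assumes the midpoint of each part, and the function name first_comparison_indexes is hypothetical.

```python
import math


def first_comparison_indexes(advert_length, nfpc, smip, smap):
    """Pick one index from each of nfpc nearly equal parts of the window.

    Assumption: the midpoint of each part is chosen.
    """
    low = math.ceil(smip * advert_length)
    high = math.floor(smap * advert_length)
    span = high - low
    # Boundaries of the nfpc parts, computed with integer arithmetic.
    edges = [low + (span * k) // nfpc for k in range(nfpc + 1)]
    return [(edges[k] + edges[k + 1]) // 2 for k in range(nfpc)]
```

For an advert of 5182 fingerprints with nfpc equal to 3 and thresholds of 0.1 and 0.9, this choice yields the indexes 1209, 2590 and 3972.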

[45] Script: Fingerprint Of Maximum Distance Generation.

[46] Each advert recorded in a file is considered, and the fingerprints of first comparison of each advert are retrieved by means of the indexes previously extracted. For each fingerprint retrieved, the set of fingerprints of maximum Hamming distance mdh is calculated in two phases. In the first phase, the script 'Generation Fingerprint Parts Of Maximum Distance' is called so that, for every fingerprint considered, an index-linked structure is generated with all the parts within the maximum Hamming distance mdh from the parts of the fingerprint. Such a structure is composed of a number of fields equal to the number of parts of the fingerprint considered, and each of these fields is further composed of other fields, numbered from 0 to mdh, which contain the parts within the maximum Hamming distance from the corresponding part of the fingerprint under evaluation (field 0 contains all the parts at distance 0, field 1 contains all the parts at distance 1, ..., field mdh contains all the parts at distance mdh). In the second phase, the script 'Sets Of Fingerprints Of Maximum Distance' is called. Using the index-linked structure just created, said script combines the several parts within the maximum Hamming distance mdh from the parts of the fingerprint under evaluation, avoiding useless calculations, and reliably generates at each iteration a fingerprint within the maximum Hamming distance mdh from the given one. Each set of fingerprints of maximum Hamming distance mdh is then stored and saved (it will be re-used subsequently in the script 'Extraction Docking Indexes').
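The two phases can be sketched in Python as below. The sketch assumes that fingerprints are unsigned integers split into ppf parts from the most significant bits downwards, and it distributes a distance budget of mdh over the parts so that every fingerprint is produced exactly once. The function names are hypothetical, and the actual scripts may organise the combination differently.

```python
from itertools import combinations


def part_neighbours(part, bpp, distance):
    """All bpp-bit values at Hamming distance exactly `distance` from `part`."""
    result = []
    for positions in combinations(range(bpp), distance):
        flipped = part
        for pos in positions:
            flipped ^= 1 << pos
        result.append(flipped)
    return result


def fingerprints_within(fp, bpf, ppf, mdh):
    """All bpf-bit fingerprints at Hamming distance at most mdh from `fp`."""
    bpp = bpf // ppf
    mask = (1 << bpp) - 1
    parts = [(fp >> (bpp * (ppf - 1 - i))) & mask for i in range(ppf)]

    # Phase 1: for each part, one bucket per distance 0..mdh.
    tables = [
        [part_neighbours(p, bpp, d) if d <= bpp else [] for d in range(mdh + 1)]
        for p in parts
    ]

    # Phase 2: combine the buckets, never spending more than mdh in total.
    found = []

    def extend(part_no, prefix, budget):
        if part_no == ppf:
            found.append(prefix)
            return
        for d in range(budget + 1):
            for candidate in tables[part_no][d]:
                extend(part_no + 1, (prefix << bpp) | candidate, budget - d)

    extend(0, 0, mdh)
    return found
```

For a 32-bit fingerprint split into 4 parts and mdh equal to 2, the result holds 1 + 32 + 496 = 529 distinct fingerprints, including the original one.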

[47] Script: Docking Indexes Extraction

[48] For each advert, each set of fingerprints of maximum Hamming distance mdh of each fingerprint of first comparison is used to search for perfect matches with the fingerprints inside the generic files. Since the recordings are not heavily disturbed and several fingerprints of first comparison (7 or 8) are considered, the basic idea is that it is enough to generate sets of fingerprints of short maximum Hamming distance (1 or 2). In this way, it was possible to successfully retrieve the elements of such sets inside the generic files, store their positional indexes inside the file and finally use said positional indexes in the script 'Recognition Adverts'. To do so efficiently, all the indexes of the generic file considered are loaded, starting from its previously indexed pieces. Each set of fingerprints of maximum Hamming distance mdh of each advert, for each fingerprint of first comparison, is then considered. The parts of each fingerprint of such a set are considered in turn: the sets of fingerprint indexes of the generic file having said parts in the positions indicated are extracted, and the intersection of these sets gives the set of fingerprint indexes characterized by a perfect match in the generic file under evaluation. The procedure is repeated for each fingerprint of the set; the sets obtained are finally intersected among them and the resulting set is stored. In this way we have the indexes of the fingerprints of the generic file that are within a maximum Hamming distance mdh from the fingerprint of first comparison of the advert considered.
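The indexing of the generic file and the lookup by intersection can be sketched together in Python. The sketch builds the index in a single pass rather than in readings of fpl fingerprints, uses 1-based positions as in the text, and borrows the name pic from the description; split_parts, build_index and docking_indexes are hypothetical names. The text speaks of intersecting the per-fingerprint results; the sketch assumes instead that a position matching any one member of the set qualifies, and therefore merges the results with a union. That reading is an assumption.

```python
from collections import defaultdict


def split_parts(fp, bpf=32, ppf=4):
    """Break a fingerprint into ppf parts, most significant part first."""
    bpp = bpf // ppf
    mask = (1 << bpp) - 1
    return [(fp >> (bpp * (ppf - 1 - i))) & mask for i in range(ppf)]


def build_index(fingerprints, bpf=32, ppf=4):
    """pic[part_no][value] lists the 1-based positions having that part value."""
    pic = [defaultdict(list) for _ in range(ppf)]
    for position, fp in enumerate(fingerprints, start=1):
        for part_no, value in enumerate(split_parts(fp, bpf, ppf)):
            pic[part_no][value].append(position)
    return pic


def docking_indexes(pic, candidates, bpf=32, ppf=4):
    """Positions of the generic file that exactly match any candidate."""
    found = set()
    for fp in candidates:
        parts = split_parts(fp, bpf, ppf)
        # A perfect match must appear in the posting list of every part.
        hits = set(pic[0].get(parts[0], ()))
        for part_no in range(1, ppf):
            hits &= set(pic[part_no].get(parts[part_no], ()))
            if not hits:
                break
        found |= hits  # assumption: union over the candidate set
    return sorted(found)
```

With this layout, a fingerprint whose parts are 130, 78, 27 and 253 is registered under pic[0][130], pic[1][78], pic[2][27] and pic[3][253], which mirrors the pic(1)...pic(4) example with 0-based part numbers.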

[49] Script: Adverts Recognition

[50] The adverts are searched in the several generic files, starting from the docking fingerprint indexes extracted. If, for example, 3 fingerprint indexes of first comparison have been extracted for an advert, then 3 sets of fingerprints of maximum Hamming distance mdh are created, and for each of these sets the docking fingerprint indexes with a perfect match inside the generic files are extracted. For example, let us suppose that the fingerprint of first comparison of an advert with 5182 fingerprints has index 1602, and that the generation of the set of fingerprints of maximum Hamming distance mdh has led to the extraction, from one of the generic files, of the docking fingerprint indexes 23451 and 55456. With such indexes, knowing that they are associated with the fingerprint of index 1602 of the advert, it is possible to superimpose the matrix of the advert, of 5182 rows (one per fingerprint) and 32 columns (one per bit of the fingerprint), so that its row 1602 is superimposed on the fingerprint of index 23451 or 55456 of the generic file considered, while the remaining rows are naturally superimposed on the other fingerprints of interest in the generic file: from index 23451-(1602-1) to index 23451+(5182-1602) in the first case, and from index 55456-(1602-1) to index 55456+(5182-1602) in the second case. With such superimpositions we can calculate the percentage of equal bits and, if it is higher than the test threshold sdv, the advert considered is recognized inside the generic file under evaluation. Before the end of the procedure, since several fingerprints of first comparison have been extracted for each advert, an elimination procedure is performed to retain only some recognitions and to avoid useless superimpositions in the drawing of graphs and in the creation of the recognition files containing the transmission times. For instance, if the advert has been recognized 5 times with docking fingerprint indexes 76543, 77000, 77345, 20000 and 20100 and bit match percentages of 60%, 90%, 77%, 75% and 40%, the elimination procedure takes the docking fingerprint index with the highest match percentage not yet considered, that is 90%; then it calculates the interval defined by the starting point and the end point of the advert inside the generic file, given the docking fingerprint index 77000 and the corresponding fingerprint index of first comparison; further, it deletes all the results that correspond to docking fingerprint indexes belonging to said interval, that is those of 60% and 77%, keeping only the data of the recognition with the maximum match percentage, that is 90%; it then continues in the same way with the next docking fingerprint index with the maximum match percentage not yet considered, that is 20000, until all the remaining docking fingerprint indexes have been considered.
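The superimposition test and the elimination procedure can be sketched in Python as follows. Fingerprints are assumed to be unsigned integers, positions are 1-based as in the text, and a recognition is represented as a tuple (docking index, first-comparison index, match percentage); the function names and this tuple layout are assumptions for illustration.

```python
def match_percentage(advert, generic, dock, fc_index, bpf=32):
    """Percentage of equal bits when advert row fc_index sits on position dock.

    Returns None when the superimposed advert would fall outside the file.
    """
    start = dock - (fc_index - 1)
    end = dock + (len(advert) - fc_index)
    if start < 1 or end > len(generic):
        return None
    window = generic[start - 1:end]
    differing = sum(bin(a ^ g).count("1") for a, g in zip(advert, window))
    return 100.0 * (1 - differing / (bpf * len(advert)))


def eliminate(recognitions, advert_length):
    """Keep the best recognition of each overlapping group.

    Recognitions are visited by decreasing match percentage; each one kept
    removes every remaining recognition whose docking index falls inside
    the interval that the advert occupies in the generic file.
    """
    remaining = sorted(recognitions, key=lambda r: r[2], reverse=True)
    kept = []
    while remaining:
        dock, fc_index, percent = remaining.pop(0)
        start = dock - (fc_index - 1)
        end = dock + (advert_length - fc_index)
        kept.append((dock, fc_index, percent))
        remaining = [r for r in remaining if not (start <= r[0] <= end)]
    return kept
```

Applied to the five recognitions of the example, with an advert of 5182 fingerprints and a first-comparison index of 1602 assumed for all of them, eliminate retains the recognitions at docking indexes 77000 (90%) and 20000 (75%). A recognition would be accepted beforehand only if its match percentage exceeds the threshold sdv.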

[51] Script: Advert Data Plotting

[52] The graphs of the recognized advert transmissions are plotted, and the text files of the transmissions of every advert are created. Each text file indicates the starting time and ending time of each advert transmission.
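The conversion from fingerprint positions to transmission times can be sketched in Python as below. It assumes that fingerprint k of the generic file covers the time span from (k-1)*ttf to k*ttf seconds after midnight; this convention, the function names and any value of ttf used with them are assumptions for illustration.

```python
def transmission_interval(dock, fc_index, advert_length, ttf):
    """Start and end of a recognized advert, in seconds after midnight."""
    first = dock - (fc_index - 1)
    last = dock + (advert_length - fc_index)
    return (first - 1) * ttf, last * ttf


def clock(seconds):
    """Format a number of seconds after midnight as HH:MM:SS."""
    whole = int(seconds)
    return f"{whole // 3600:02d}:{whole % 3600 // 60:02d}:{whole % 60:02d}"
```

Each line of a recognition text file could then be written from clock(start) and clock(end) for every recognition that survived the elimination procedure.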

[53] The example described here refers to fingerprints with a length of 32 bits (the binary base is therefore used), each subdivided into 4 parts that are converted to decimal base. It is obvious to those skilled in the art, however, that the fingerprints can be of any length, with elements in one or more numerical bases of any kind, and that the numerical bases into which the values of the parts (given by the concatenation of the elements that compose them) are converted can also be of any kind and need not coincide with the previous ones. The method described can be used in all fields of data retrieval where it is necessary to recognize and retrieve clusters of aggregated data (the single advert in this example) inside a set containing the data of the generic file (the entire daily TV programs schedule).

[54] Data that should be equal (parts of the advert in its own recording and parts of the recording of the day on which the advert was broadcast) may instead differ, to a greater or lesser extent, due to the presence of noise or other interference (the Hamming distance among 'semi-equal' fingerprints is greater than 0 in this case), which prevents the use of a perfect match to retrieve the adverts. With the type of indexing performed and the method of calculating distances used in this example (a method that can differ from the one used to calculate Hamming distances), it has been possible to implement an efficient retrieval of 'semi-equal' data, the identification of their positions, and the subsequent recognition of the data clusters searched within larger sets. Recognition relies on test thresholds on the percentage of perfect match and on procedures that avoid multiple recognitions of the same data cluster within a specific range of a recognition, still to be considered, that is characterized by a higher percentage of perfect match.

[55] In the embodiment described, the Hamming distance was used for the similarity and retrieval calculations only for convenience, since the binary base was used. Any criterion can, however, be used to calculate the distances for recognizing 'semi-equal' data.

[56] By means of the described method, with the scripts written in the C language, it was possible to recognize a set of 218 adverts distributed over a whole day of recording of the Italian network Canale 5, with a success rate of 99.5%, in less than 20 minutes. The program starts after the adverts and the entire daily TV programs schedule have been converted into the corresponding fingerprint forms.

Claims

[Claim 1] A method for indexing, recognition and retrieval of arbitrary subsets of input data within arbitrary sets of input data, comprising the steps of: considering arbitrary subsets of input data and arbitrary sets of input data; transforming, dividing and converting said arbitrary subsets of input data and said arbitrary sets of input data to obtain alternative representations of the data of said arbitrary subsets of input data and of the data of said arbitrary sets of input data; using said alternative representations to index said arbitrary sets of input data where the search of said arbitrary subsets of input data is performed; by means of generic criteria for distance calculation, calculating the representations to a maximum distance from said alternative representations of a given number of data of said arbitrary subsets of input data; retrieving all the positions inside said arbitrary sets of input data of said representations to a maximum distance from said alternative representations of a given number of data of said arbitrary subsets of input data within said alternative representations of data of said arbitrary sets of input data; storing only the positions and the more meaningful subsets of data within said arbitrary sets of input data with respect to said arbitrary subsets of input data by means of one or more generic tests of similarity; listing the positions of said more meaningful subsets of data within said arbitrary sets of input data with respect to said arbitrary subsets of input data.

[Claim 2] A method as recited in claim 1) comprising the steps of: considering arbitrary subsets of input data and arbitrary sets of input data; transforming said arbitrary subsets of input data and said arbitrary sets of input data into sets of entities composed by one or more elements represented in one or more numerical bases; dividing said entities composed by one or more elements into a plurality of parts and further linking together the elements composing said parts in a sequential way; further converting said parts into other numerical bases, wherein said numerical bases are not necessarily the same numerical bases considered in the preceding steps, to obtain alternative representations of the data of said arbitrary subsets of input data and of the data of said arbitrary sets of input data.

[Claim 3] A method of indexing of arbitrary sets of input data as recited in claim 1), wherein arbitrary sets of input data are transformed in sets of entities composed of one or more elements, wherein said elements are represented in one or more numerical bases by means of structures characterized by a number of parts equal to the number of parts that said entities are subdivided in, wherein each of said parts is characterized by a number of fields that at maximum is equal to the number of possible values obtainable from the transformation of a single said part to the chosen numerical base and each of said fields containing the set of indexes of entities positions in the said sets of entities with the corresponding value of field in the part considered.

[Claim 4] A method for transforming arbitrary sets of input data as recited in one or more of the preceding claims, characterized in that the structures in the form specified in claim 3), are applied for solving problems related to generic data indexing, recognition and retrieval or related to the generation of representations to a given maximum distance from the distance of the data under evaluation.

[Claim 5] A method as recited in one or more of the preceding claims, comprising a filter for extracting the representations of the positions of the data under evaluation, within the representations sets of arbitrary sets of input data, by means of structures in the form specified in claim 3).

[Claim 6] A filter as recited in claim 5) comprising the steps of: considering for each of said data under evaluation its sets of representations to a given maximum distance; considering each representation of maximum distance from the representation of a given data under evaluation, by extracting, from the structure as recited in claim 3), the sets of indexes of representations having the parts of representation of maximum distance considered in the positions indicated; performing the intersection of two or more of said sets to obtain the set of indexes of representations in the said arbitrary sets of input data with a perfect match with at least one representation of maximum distance from the representation of the given data under evaluation; iterating the preceding step for each representation of a given set of representations of maximum distance and finally intersecting two or more of the sets obtained in order to obtain a resulting set; storing the resulting set of the positions indexes in said arbitrary sets of input data.

[Claim 7] A filter as recited in one or more of the preceding claims, further comprising a method for selecting and storing only the positions and the more meaningful subsets of data within said arbitrary sets of input data with respect to said arbitrary subsets of input data by means of one or more generic tests of similarity.

[Claim 8] The method as recited in any of the preceding claims, characterized in that the input data comprise audio or media files, including audio, sounds, voices, recorded music, video clips, images, text, radio broadcast programs, television broadcast programs, adverts and any multimedia combinations of individual media types.

[Claim 9] The method as recited in any of the preceding claims, for sound or music content indexing, recognition and retrieval.

[Claim 10] A computer system programmed to perform the method steps as claimed in any of the preceding claims.