WO2014002064A1

WO2014002064A1 - System and method for media library navigation and recommendation

Info

Publication number: WO2014002064A1
Application number: PCT/IB2013/055315
Authority: WO
Inventors: Alexandre Alahi; Pierre Vandergheynst; Kirell BENZI
Original assignee: Ecole Polytechnique Federale de Lausanne EPFL
Current assignee: Ecole Polytechnique Federale de Lausanne EPFL
Priority date: 2012-06-29
Filing date: 2013-06-28
Publication date: 2014-01-03
Anticipated expiration: 2014-12-29

Description

SYSTEM AND METHOD FOR MEDIA LIBRARY NAVIGATION AND RECOMMENDATION

FIELD OF THE INVENTION

The invention relates to a system and method to optimize navigation and recommendation in a distributed network of media libraries. BACKGROUND

Nowadays, people have access to an overwhelming amount of digital media content such as music, photos, videos, multimedia presentations... In the field of music, for instance, thousands of music tracks are typically stored on distributed end user devices such as computers, smart phones and mp3 players, while further unlimited music download is available from internet-based platforms such as Pandora.com and Napster.com. Due to this deluge of content, there is a necessity for the generation of automatic content matching based on personal tastes.

The conventional approach to ease navigation within a digital media content library consists in classifying the items according to their metadata description, such as genre, author, title, keywords. This is however very limited as the genres are typically defined in a very broad manner, such as rock, pop or classical in the case of a music library, so there are still hundreds of thousands of titles within a single category. Moreover, music genre classification usually applies to an artist or album globally rather than at the song level, so the famous slow song "Still loving you" from the hard-rock group Scorpions will end up misclassified into the "metal" category.

In order to ease navigation within one's own library for more convenient media content usage, a popular approach, popularized by Apple iTunes for instance, consists in enabling the end user to generate his/her own content classification, such as user playlists in the case of music or photo albums in the case of photos. However, this is a cumbersome process for the end user as it requires manual browsing of the whole library in search for relevant items, ideally in a way that optimizes a smooth transition from one item to another; furthermore, it needs to be manually repeated for each playlist or photo album, possibly several times for a given item, for instance in the case of music when the end user wants to include it into a first playlist to listen while professionally working but also into a second playlist dedicated to fitness practice.

In order to ease discovery within an online library fore more personal media content recommendation, another approach has been developed in the past decade that is based on collaborative filtering, taking advantage of the diversity of end users interacting with the online library to analyze their preferences (from review surveys to purchase history) and to derive typical content item matching patterns. In the online Amazon library for instance, this approach is explicit as the recommendation for further purchase displays with the message "users who purchased this item also purchased...". In Apple iTunes, it is more implicit as the Genius recommendation engine analyses one's listening preferences in combination with other end users purchasing preferences in the background to filter and suggest relevant content recommendation. This approach still suffers from major limitations; first, it tends to overweight top chart content that is widely purchased and reviewed, to the detriment of less popular artists; second, it assumes that an end user has a monolithic taste without capturing the various needs and contexts into which one looks for specific media content, such as moods or emotional contexts (energizing activities, romance time...).

In order to more objectively address the content matching classification, a number of alternative, automatic matching analysis algorithms have also been developed to classify the content items not only according to their explicit content metadata, such as genre, artist, album and song title in the case of music, but also according to their inner content characteristics, by applying signal processing analysis. An example of such a solution is described in the PCT patent application WO2005/031517 by Holm and Hicken "Audio fingerprinting system and method" wherein frequency analysis of an audio item is applied to derive a fingerprint matrix of the audio item features. Different audio items can then be compared by means of the well-known Euclidian distance: the shortest the distance, the closest the two items, which can then be considered as matching each other. By applying the feature analysis and distance computation to the whole set of content items in the end user library, it becomes possible to automatically generate matching playlists. However the signal processing analysis algorithms do not perfectly match the perceptual preferences of end users, so the end resulting playlist is not always discerning from an end user perspective. In order to solve this particular issue, US patent application 2012/0023403 by Herberger and Tost proposes a dynamic playlist generation method where an audio signature is first generated for each audio item in the end user database to create a model of the audio items, describing their unique features and matching them in a conventional way; the user can then select a starting song for playlist generation, and modify the respective weights of the various song features (such as tempo, singing, male or female voice, instruments, melody...) according to his/her personal taste to adjust the matching algorithm computation and consequently re-generate a user-personalized playlist. For instance the end user may request that more songs with a similar tempo are included into the playlist by overweighting the tempo feature measurement. One main drawback of this method is that it requires some understanding of the underlying musical features by the end user, which is fine with music experts but less common in the general public. Another drawback is that it only addresses the problem of local library navigation; as the method requires significant computation, in particular in the initialization step, it is not possible to apply the dynamic, personalized playlist generation at a larger scale to provide personalized recommendation navigation through huge online libraries.

In US patent 6,545,209 by Flannery, Deeds and Schrock, a combination of human perceptual classification and digital signal processing classification of songs is used in combination with a learning process to derive a set of converging classification rules suitable to a first set of songs. The rules can then be generalized to apply to other sets of songs or new songs. One major issue in deploying such a solution in practice is that the learning process is not automatic. It explicitly requires to organize an initial classification by music expert, followed by two further stages of validation, of the initial classification and the proposed converging classification rule outcome respectively, by "groover" scrutiny from non experts. Moreover, the algorithm remain media centric in building clusters of matching media entities around a given media item, out of which a directed graph of matching entities can be built and adjusted. Thus, in order to classify a whole database of media items, the solution requires significant computation to match all media items in a roll up table through an exhaustive search space; even if a heuristic is proposed to prune the search around a reference song by limiting the search to its sub-genre, this approach ultimately depends on the sub- genre definitions by musical experts and does not benefit from extended crowd feedback analysis. Last but not least, all the media library navigation and recommendation solutions identified so far directly manage libraries of digital media, either at the end user client side or at the media provider server side, or both. This binds the end user to a specific media provider (such as iTunes for instance) and requires the navigation and recommendation solution provider to deal with complex digital rights management issues.

There is therefore a need for an improved method and system of navigation and recommendation into a media library, that can manage and connect a diversity of existing media items, that facilitates the discovery of new media without relying upon a specific service provider, that is adaptive to the end user behavior and preferences as transparently as possible, and that can be implemented efficiently in a distributed network of multiple heterogeneous media libraries.

SHORT DESCRIPTION OF THE INVENTION

The present invention proposes a method to assist the selection of media items comprising the steps of:

- extracting intrinsic features of the media items having a media reference and storing said media items intrinsic features in association with said media references as graph media item nodes into a database, - calculating for a pair of extracted media items, a similarity value representing a level of similarity between the extracted intrinsic features of the pair of media items,

- determining a proximity measurement based on at least said similarity value and at least one user preference parameter,

- storing the proximity measurement in association with the respective pair of media references of said media items as graph proximity edges into a database if the similarity value is above a threshold,

- establishing a selection path, from a given media item node, by connecting said media item to another media item through the edge having the highest degree of proximity measurement.

In the proposed media library navigation and recommendation system and method, a distributed network of media libraries is represented by a graph of media proximity edges connecting media item nodes. Each end user client initially computes a media proximity subgraph. Starting from a given media item node, the end user client can then derive a media player navigation list in the end user library by searching and navigating along the shortest proximity path between media items in the undirected graph. When the end user client is connected to the network server, it uploads its subgraph to the server which consolidates it into a global graph by merging it with the various subgraphs already uploaded by other end user clients. The server further provides media player recommendations to enrich the end user library around a given node item by searching for the optimal proximity path in the undirected consolidated graph that is not yet represented into the end user subgraph.

In order for the graph to represent the proximity between media items as closely as possible to the end user subjective perception, the media proximity edges are computed by the end user client as a function of a number of features such as:

The media library items similarity computation (automated proximity computation); The end user interactions with the local media library similarity navigation paths (e.g. like/dislike on playlists paths, classification of media items in different orthogonal categories, according to the user listening preferences and/or moods).

Moreover, the media proximity edges are further dynamically updated in a personalized way by analyzing:

The user taste and behavior, based on the end user interactions with the local media library similarity navigation paths and with the distributed media library recommendations;

The user connectivity, reputation and proximity with regards to other connected end users;

The navigation history.

Based on those various factors, the server can compute on the fly a personalized navigation list as well as further recommendations at the request of any end user, so that the latter benefits from a navigation and recommendation system adapted to his/her media library as closely as possible to his/her profile and media preferences.

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 shows a client device.

FIG. 2 shows a distributed network of client devices and server.

FIG. 3 shows the high-level architecture of the client device mass memory according to the present invention.

FIG. 4 represents a media proximity graph in which the nodes represent the media items and the edges represent the proximity between them - the shorter the edge the closer the media items.

FIG. 5 represents an exemplary user interface for proximity-based navigation into a media library. FIG. 6 represents three different end user media proximity graphs with a subset of shared nodes and edges.

FIG. 7 represents a merged media proximity graph as consolidated by the server. FIG. 8 represents the high-level architecture of the media proximity navigation and recommendation server.

FIG. 9 represents in bold the nodes and edges that can be recommended out of a given end user subgraph in the consolidated server graph, based on proximity.

FIG. 10 shows the high-level architecture of a media proximity navigation and recommendation server enhanced with an end user taste analyzer functionality.

DETAILED DESCRIPTION FIG. 1 illustrates an exemplary client device 100 comprising computing means 110, memory means 120, communication means 130, and user interaction means 140. Examples of such devices are for instance conventional personal computers, handheld devices such as mobile phones or tablets, photo and video cameras. Communication means may be wired or wireless connections, enabling for instance access to the Internet and remote servers. Memory means may be internal or external mass storage, such as a computer hard disk or flash memory. User interaction means may be embedded such as a tactile screen in the case of a tablet, or external such as for instance a keyboard, a mouse, a remote control, and a display screen.

FIG. 2 represents a network environment where a server 200, is connected with a number of client devices 100a, 100b, 100 through a communication network 250. Communication network 250 may be LAN, WAN, intranet, the Internet, etc. Client devices and network servers may communicate by means of various communication protocols, such as for instance HTTP in the case of a web server. A client device 100a, 100b, 100c, may connect only from time to time to one server 200 through communication network 250 to access services such as media discovery, streaming or download. Client devices may also connect from time to time to one server 200 to report local media usage, user profile or user preferences. Client devices may also connect to each other from time to time in a peer-to-peer mode to exchange information and data in relation with their local media library. In the system and method according to the present invention, a digital media player software enabling user friendly navigation runs on client device 100 using its user interface UI 140 as represented by FIG. 3. The client device 100 comprises a computer readable medium 310 in which a number of digital media items are stored, such as music files, in its memory or storage unit 120. Those items may be organized in a single library such as iTunes or in separate libraries.

At installation time, the end user initializes the digital media player software by selecting the digital media items he/she wishes the digital media player 300 to handle. The digital media player 300 processes each media item file to extract a set of item features, comprising:

explicit related metadata from the digital item file format, such as author, title, album and duration information in the case of music. This information can be used as the basis for content identification, intrinsic content features, such as rhythm, chroma, tempo, instruments, that can be extracted from digital signal analysis of the content item and represented in a condensed form as a content item fingerprint.

In the case of audio, various methods may be used at this stage, such as fingerprinting methods based on Mel Frequency Cepstral Coefficients analysis, or the thumbnail fingerprint proposed by Bartsch in "Audio thumbnailing of popular music using chroma-based representations", IEEE Transactions on Multimedia, vol.7, no. l , Feb 2005. In order to facilitate further processing and storage of the implicit content features extracted from the content item analysis, they are preferably represented by a compact fingerprint value that uniquely characterizes the content item, such as the efficient binary audio fingerprint proposed by Jang et al in "Pairwise boosted audio fingerprint", IEEE Transactions of information forensics and security, Vol. 4, no,4, Dec 2009. In the system and method according to the present invention, various fingerprinting methods may be used; we refer to the fingerprinting method result as the item fingerprint in the remainder of this disclosure. The extracted content features, such as the media metadata, intrinsic features and fingerprint, are stored as the properties of media item nodes in a computer readable medium 320, in the memory or storage unit 120 of client 100.

For each possible pair of media items, their initial similarity can be measured as a function of the distance between the two media items extracted features. Intrinsic content features, such as rhythm, chroma, tempo, instruments, voice will produce an array of values that can be compared to qualify the similarity of two media items. Various distance computation methods may be used to this end, such as for instance the Hamming distance or the Euclidian distance in the case of feature vectors. A similarity value can be also calculated by applying similarity feature weights in order to adapt the influence of each individual similarity feature in the final measurement.

In the following, we assume that the highest the similarity measurement, the closest the two media items; however the reverse convention is also possible. The resulting media item similarity measurements are stored as content proximity edges in a computer readable medium 320, in the memory or storage unit 120 of client 100. In the following we will use indifferently the terminology similarity value, similarity measure or similarity measurement, however this shall not be considered as limitative; as known to those skilled in the art, the similarity measurement may also be represented in the form of a vector or matrix.

As represented by FIG. 4, the media items and their relative proximity are represented into an undirected graph where the nodes represent the media items and the edges connecting two nodes represent the proximity between them.

In order to optimize computational and memory optimization, the graph does not need to be complete; a threshold can be defined to only store the edges whose proximity is above a given threshold. For instance in the graph of FIG. 4 media item nodes A and M are close enough to media item node S for the edges S-A and S-M to be recorded in the media proximity database 320, but that is not the case for media item nodes N, C, T and E so they are not directly linked to media item node S.

Once the graph is built for the whole set of selected media items, it becomes possible to derive a navigation list, such as a playlist in the case of music or a photo album in the case of photo, by searching the shortest transition between media items from any starting media item node or between any two media item nodes, according to the initial digital signal processing analysis and proximity measurement. For instance with reference to FIG.4, the end user can select item S as the starting media item node and a navigation list will be automatically proposed as S-M-N-T corresponding to the shortest path from node S to node T. In the case of the graph represented by FIG.4, with reference to FIG.3, the client device 100 stores a list of content items S, A, M, N, C, T, E in its media item nodes database 310 and a list of content item proximity paths S-A, S-M, A-N, M-N, M-T, N-T, T-E, C-E, N-C, N-E, C-A in its media proximity edges database 320. In other embodiments, the media item nodes and the media proximity edges may be stored as different records in the same database.

It is also possible to constrain the navigation by user-defined parameters such as the choice of both a start and end media items, or even a list of media items that the end user wishes to have into the navigation list. In the case of music, it is also possible to hard-constrain the total navigation time for the navigation list, for instance as a maximum duration, or even as a series of desired transition times between selected media item nodes in a list of media items to be navigated through. Furthermore, multiple navigation lists, corresponding to different end points from a given starting node or different path variants along a series of forced navigation path nodes, may also be proposed by the navigation method and system.

In order to further optimize the navigation, it is also possible to apply some further heuristics derived from the current navigation history in the media item navigation and recommendation method. For instance it is desirable to avoid duplication of a media item from the navigation selection based on the shortest proximity path, or to avoid the navigation selection to repeat the same artist due to the inherent similarity of his/her different works (same voice strongly influencing the similarity comparison, for instance). To this end, the navigation and recommendation method may also include a further on-the-fly, dynamic processing step to underweight or overweight the edges paths based on the recent navigation history.

FIG.5 shows an exemplary smart player user interface operating according the proposed method and system. The user can select a song out of a library of music items 500 so the player automatically proposes a navigation playlist 510 as the shortest path in the graph that is represented by a chain of music items as nodes 520, 521, 522 and proximity edges 530, 531, starting from the initial song 520. The end user can experiment the playlist and vote on the relevance of the proximity connections 530, 531 by simply valuing them in a positive (+) or negative (-) way. Respective positive and/or negative valuation numbers are recorded accordingly for each proximity edge in the media proximity edge database 320. For instance, + 1 for a positive valuation, -1 for a negative valuation, 0 by default. Other valuation measurements are possible. For instance a slider may be used in the user interface to represent an extended valuation scale and the valuation numbers are then chosen within the extended valuation range.

Thus, the end user interface is kept as simple as possible; in particular no musical or technical expertise on the underlying media item features characterization and/or similarity measurement algorithm is required to adjust the underlying graph representation. The player can also suggest one or more alternative navigation playlists as the next shortest paths in the graph, in particular when the user undervalues a proximity connection - an alternative can then be proposed.

It may occur that some nodes are left unconnected in the undirected graph, when no other media item matching the proximity threshold requirement can be found in the whole library. This is more likely for smaller libraries and/or content items with uncommon features - for instance a Japanese traditional musical piece will unlikely fit most pop and rock content as the rhythm, instruments and singing significantly differ. It may however happen that another end user client 100b has processed the same musical piece as well as a collection of other similar pieces that are automatically connected in a relevant navigation list out of this specific content item. In order to enrich the graph of end user client 100a accordingly, the network connectivity can be used to provide a distributed media library navigation and recommendation system and method in which multiple client 100a, 100b, 100c connect, share and enrich their respective undirected graph data through a server 200 in charge with dynamically collecting and consolidating a global undirected graph encompassing the multiple graphs of connected clients 100a, 100b, 100c.

In practice, the graphs from various clients 100a, 100b, 100c only partially share some of their media item nodes and media proximity edges. FIG. 6 represents an example of three graphs from clients 100a, 100b, 100c respectively where content item nodes A, B, C, E are shared by at least two graphs as well as proximity edge C-E. By comparing and merging the graphs the server 200 can build and consolidate a global graph, as shown on FIG. 7 with reference to the examples illustrated in FIG.6. The graph is typically represented in server 200 by a database of media item nodes recording all local media item nodes as uploaded by the end users clients 100a, 100b, 100c, as well as a database of media proximity edges that may have been overvalued and/or undervalued by the end users. In alternate embodiments, the media item nodes and the media proximity edges databases may be merged, or each database may be further split; for instance in order to more efficiently manage a very large library of music, it is possible to identify "database bridging" edge candidates that connect two otherwise isolated subgraphs, and store each subgraph in a separate database together with its "database bridging" information to connect to the other database as relevant. The databases may be hosted in the memory or storage unit of server 200. The memory or storage unit may be internal or external, local or remote to the server 200. The databases may also be attached to the same server 200 or distributed over a set of servers. For instance different servers may be associated with a service offering additional music purchase or streaming from specific catalogues and genres, and it may be relevant to specialize the corresponding database upload to the most relevant part of the end users graphs.

FIG. 8 represents the components of the server 200 system. A proxy component 810 comprises communication means to manage the communications with the clients through communication network 250. Example of communication means are an Ethernet port with TCP/IP and HTTP/HTTPS protocols; other communication means are also possible. The proxy communication receives every client request and then dispatches them to the appropriate component of server 200. When the server component response is ready, the proxy component 810 forwards it to the client that initiated the request. The proxy component 810 is also in charge with securing the communications, by means of authentication and/or encryption methods, as well as identifying the client version and dispatching the request accordingly, as known by one skilled in the art.

In order to transmit its undirected graph data to the server 200, each client 100a, 100b, 100c uploads separately the information data characterizing each node and each edge from its graph to a merging component 820 in the server 200 through the proxy component 810. The merger component 820 merges the uploaded client graphs into a consolidated overall graph, represented by a media item nodes database 824 and a proximity edge database 822. In other embodiments, the media item nodes and the media proximity edges may be stored as different records in the same database by the merger component 820.

In the media node database 824, each media item is associated with a media node unique identifier and a set of features associated with the media item, such as for instance in the case of music, the metadata information (title, genre, artist name, album name, year, duration, ...), the music intrinsic features and its the corresponding fingerprint value.

In the proximity database 822, each edge is associated with an edge unique identifier, the media node unique identifiers of the two end nodes connected by the edge, their initial computed proximity value, the positive valuation number cumulated from end users valuation, the negative valuation number cumulated from end users valuation, and the adjusted proximity value dynamically computed by server 200 from the end users data (by default, equal to the initial computed proximity value).

Before sending the content item and its whole set of features, the client sends the fingerprint of the song to the proxy component 810 of server 200. In a possible embodiment, the proxy component looks for a match of the fingerprint in the media nodes database to determine if the content item node is already recorded. For the sake of efficiency, in another possible embodiment, rather than looking for a fingerprint match directly into the merging component database, the proxy component 810 is responsible for associating a unique content item node identifier to each content item fingerprint by means of a lookup table. For each uploaded media fingerprint corresponds exactly one and only one media node identifier. This lookup table can be implemented as a key-value store. Other embodiments are also possible, as will be recognized by one skilled in the art. For instance, in order to associate multiple media items that are directly correlated but have a different fingerprint value, for instance different resolutions of the same picture as different items or the long, short and remix versions of the same song as different items, a matching tree may be employed.

If the media item fingerprint is not found in the server 200 records, a unique media node identifier is created for it by the proxy module, a new entry is created in the fingerprint proxy lookup table, and/or a new record is created in the media node database 824 to store the uploaded media item information from the client. If it is not possible to add it immediately, the request is put in a node queue waiting to be processed. A background process reads this queue and processes the media item nodes one after the other to add them into the proxy lookup table and/or the media node database 824.

The media node information is then uploaded by the client and added to the media node database. In the case of music, it can then be linked to an existing artist and/or album record in the media node database 824, or used to create the artist and/or album records if not registered yet. The media item is also linked with the user identifier of the uploading user. If the media item is found in the media node database 824, only the uploading user identifier is further recorded in association with the existing content item record. The merging component 820 may also check that the uploaded metadata information corresponds to the one recorded (in the case of music, correct artist and album information, year, meta genre, etc ..) and provide a corrective proposition feedback to the client through the proximity component 810 as relevant.

A special case may occur when the client is disconnected from the server before the media node upload can be processed out of the node queue by the merging component 820. In that case the media node is added to the graph as an empty placeholder, without user information, to be updated later on when the client connects back.

The merging component 820 also handles every edge upload request. If the edge is not found in the media proximity edges database 822, it is added to the database. If it is not possible to add it immediately, it is put in a queue waiting to be processed. This queue is preferably separate from the node queue, but other embodiments are also possible. A background process reads this queue and processes the edge items one after the other to add them into the media proximity edges database 822. When the background process processes the edge, it first verifies that both ends exist as node records in the media nodes database. It then verifies that the edge does not already exist, and if not, creates it. As long as none of the end nodes already exist in the media nodes database, the edge is not popped out from the queue. The background process tries to insert it regularly until it is inserted successfully. This particular case occurs when an edge insertion request arrives before the two end nodes (media items) are created in the media item database. The proximity edge information as uploaded by the client is then added to the server media proximity edges database 822. In particular the edge positive or negative valuation information is recorded. The proximity edge is also linked with the user identifier of the uploading user. When the edge is found in the media proximity edges database 822, the edge positive and/or negative valuation information from the uploading user are used to update the database records. The value may simply be added to the existing recorded value. In an alternate embodiment, the value may be weighted by a trust ratio associated with the uploading user. Once the positive and/or negative valuation information is updated for a given edge, the merger component 820 updates the adjusted proximity value record accordingly, for instance by increasing the proximity distance as a function of the recorded negative valuation numbers and/or decreasing the proximity distance as a function of the recorded positive valuation numbers and/or adjusting the proximity distance as a function of the positive and the negative valuation numbers.

One significant advantage of the proposed method and system is that the computational load in building and adjusting the overall proximity graph is distributed over a number of clients, which makes it cost effective and scalable without requiring significant processing power on the server side. Moreover, thanks to the undirected graph representation, only one edge needs to be processed for each pair of content items, and thanks to the proximity threshold heuristics, it is not necessary to build complete graphs.

From the merged graph, it is possible for the server 200 to suggest further content recommendations to connected client 100a, 100b, 100c by means of a further recommendation manager component 830, as depicted on FIG. 8. For instance with reference to FIG. 6 and FIG. 7, content item G next to content item C in client 100c subgraph may be recommended to client 100a whose subgraph also includes content item C, but not content item G. FIG. 9 shows in bold the candidate nodes and edges to be recommended accordingly to client 100a.

Last but not least, the proposed method is adaptive: it is possible for the end user to provide his/her valuation feedback to adjust the graph beyond the initial automatic computation to capture more accurately the subjective perception of the content navigation relevance. The feedback of other users can also be taken into account, so that each user has an adjusted content item proximity measurement constantly modified as the number of feedback on the proximity edges by other end users increases.

The latter information is particularly useful to provide more relevant recommendations to end users. For instance with reference to FIG. 6 and FIG. 7, if the proximity edge C-G has been positively valued in the subgraph of client 100c, it is even more relevant for the server to recommend the connection of content item G and content item C to client 100a. Conversely, if the proximity edge C- G has been negatively valued in the subgraph of client 100c, it is less relevant for the server to recommend the connection of content item G and content item C to client 100a. Thus, in a possible further embodiment, the server recommendation component 830 recommends further connections to client 100a by looking at the proximity edges that are connected to content items of the client 100a subgraph by only one end (i.e. the other end node does not belong to the client 100a subgraph) and sorting them according to their adjusted proximity value, so that the closer media items are recommended first.

In another possible embodiment, the server recommendation component 830 operates under request by the client through the proxy component 810 to propose recommendations specifically associated with a media item. Proximity edges associated with this media item and not yet part of the client graph can then be proposed as recommendation paths and enrich the diversity of playlists that can be generated on the client side.

When experimenting with user valuation feedback on the navigation playlists, in the case of music, we also observed that the proximity relevance depends on the end user to a certain extent. Professional musicians for instance tend to be more demanding in the relevance of the proximity paths than amateurs. For people primarily listening to music as a background atmosphere, a common general flavor (sometimes just the matching genre or author) is often enough. Some users will therefore interact more than others with the graph to adapt it to their needs, and in a more or less specific way. By analyzing this interaction, some common patterns can actually be found between different end users from the way they interact on the common part of their respective graphs. It is therefore relevant to classify end users in clusters of "similar taste", which can be used to dynamically provide them with more relevant recommendations according to their taste, without requiring explicit classification on their behalf. Some users may also wish to explicitly invite their social network friends to use the same service and have their friends considered as privileged, more trusted users in updating their own graph; in that case, special weight may be allocated to the recommendations from those users' graphs.

In order to find users with similar tastes, the server 200 further comprises a taste analyzer component 1000, as depicted by FIG. 10, to analyze the user taste based on a number of user specific features. Some of the latter features can be pre-analyzed in the client side and transmitted by the client to the server analyzer tool 1000 through the server proxy component 810, such as the set of media nodes in the user client database, and the set of media proximity edges the end user has updated by positive or negative valuations. Features related to the interaction of the end user with the system may also be used, such as the number of recommendations requested, the number of recommendations further liked (positive valuation of the corresponding edge), the number of bug reports, the number of invited friends, etc. Those features are represented by a taste matrix of different features for each end user. The proximity between two end users is then computed as the distance between them; the shorter the distance is, the closer the users will be. Based on this matrix distance, the taste analyzer component 1000 dynamically identifies a cluster of users that are "music mates" for each user, for instance by selecting a set of the 5, 10 or 20 closest users.

For efficiency purpose, rather than applying an exhaustive search on the set of registered users, the analyzer tool looks at the media proximity edges and/or at the media nodes uploaded by a client to look for other users sharing the same media proximity edge and/or media node in their own graph. The latter information is particularly useful to provide more relevant recommendations to the end users by focusing the recommendation search to users from the same cluster. For instance with reference to FIG. 6 and FIG. 7, if client 100b is part of the client 100a closest users cluster but not client 100c, it is even more relevant for the server to recommend to client 100a the connection of content item B and content item C, from subgraph of client 100b, rather than the connection of content item G and content item C, from subgraph of client 100c. Thus, in a possible further embodiment, the server 200 recommends further connections to client 100a by looking in the proximity edge database 822 at the proximity edges that are connected to media nodes of the client 100a subgraph by only one end (i.e. the other end node does not belong to the client 100a subgraph) and sorting them according to their user identifier (i.e. the other end node belongs to the subgraph of a client identified in client 100a user proximity cluster) and their adjusted proximity value, so that the closer media items in connected graphs of users from client 100a user cluster are recommended first.

We also observed that for a given end user, the proximity relevance also depends to the time of the day, mood or activity to another extent. Therefore, one way to further adapt the media library to the end user needs is to categorize the media item nodes in a set of non-overlapping, orthogonal categories, for instance in terms of mood (energy, romance, relaxation, for instance) or activity (work, fitness, lounge, party, for instance). A media item must belong to only one category, exclusively, into a given set of categories. However, several different sets may be defined by the proposed system, for instance the mood set and the activity set, and the end user can then classify the content item in both. The resulting classification, in addition to the undervaluation or overvaluation of the proximity edges, provides further information on the user profile that can be analyzed by the taste analyzer component 1000 to match similar end user profiles. Accordingly, the recommendation component 830 can then provide more relevant recommendations, matching more accurately the user preferences.

In a possible further embodiment, the end user is associated with a reputation ratio. The reputation for an end user is computed based on a number of factors, such as the size of the user client database, the heterogeneity of the user client database, the number of uploaded content items further connected to other end users graphs, the computed connectivity with other users (number of connected graphs by at least one media node), the popularity measured as a the number of invited friends, the number of media classifications in the category sets, the number of positive proximity valuations, the number of negative proximity valuations, the number of contributions to the network improvement (bug reports, metadata tagging, etc), the number of requested recommendations, the number of liked recommendations... These factors can be weighted and summed up to measure a reputation score for each end user. The reputation score can then be used by the merging component 820 to underweight or overweight the end user valuations by a trust factor as a function of the reputation score. It can also be used, as well as by the recommendation component and/or the taste analyzer component in their respective search heuristics. It can also be used to reward the user in association with the underlying business model, for instance by using it as a virtual money currency to buy e-goodies, special features, new content, themes, etc.

As will be understood by one skilled in the art, the proposed system and method offers a number of advantages over the prior art solutions for media library navigation and recommendation. The preferences of the end users are taken into account both at a local (client) and global (server) level to dynamically adapt the media library navigation and recommendation to the preferences, behavior and profile of the end user. The proposed system and method does not operate on a specific media library, but rather on the connections between media library items, independently from the underlying media library source, represented into an undirected graph whose nodes correspond to the media items and edges to the connections between them. The proximity connections between two different media library items depend on the computed static similarity between their intrinsic media features, but also adapt to the end user taste as dynamically analyzed from his/her interaction with the navigation and recommendation lists in the proposed method and system. Moreover, the use of an undirected graph representation combined with a number of heuristics to optimize the navigation and recommendation search into the graph avoids the computational burden of exhaustive search in the media library at runtime, and the memory overhead of the prior art media item centric navigation and recommendation methods.

Claims

WHAT IS CLAIMED IS

1. A method to assist the selection of media items comprising the steps of:

- extracting intrinsic features of the media items having a media reference and storing said media items intrinsic features in association with said media references as graph media item nodes into a database,

- calculating for a pair of extracted media items, a similarity value representing a level of similarity between the extracted intrinsic features of the pair of media items,

2. The method of claim 1 or 2, further comprising the step of:

- evaluating multiple selection paths, from a given media item node, by connecting said media item to another media item through the edge having a high degree of proximity ;

- selecting at least one selection path out of the multiple selection paths, by dynamically underweighting and/or overweighting the path edges;

3. The method of claim 2, wherein underweighting and/or overweighting the path edges is based on the current selection history.

4. The method of any preceding claims, wherein it comprises the steps of:

- selecting a first media item to render to a user,

- selecting a second media item to render to the user having the highest degree of proximity measurement between the first media item and the second media item, - receiving a command from the user qualifying the selection of the next media item, and

- updating the proximity measurement according to the user command, said command increasing or decreasing the proximity measurement.

5. The method of any preceding claims, wherein the set of media items and the proximity measurements define an undirected graph, the nodes of the graph representing the media items and the edges connecting two nodes represent the proximity measurement between them.

6. A method to assist the selection of media items according to any preceding claims, further comprising the steps of:

- merging graph media item nodes and proximity edges from at least two end user clients into a database,

- determining a new proximity measurement for proximity edges shared by at least two end users based on their respective end user preferences parameters.

7. The method to claim 6, further comprising the step of:

- transmitting the updated proximity measurement to said end users.

8. The method to claims 6 or 7, further comprising the step of:

- storing the updated proximity measurement into a database.

9. A method to assist the selection of media items according to claim 6, 7 or 8, further comprising the steps of:

- receiving a selection path request to start from a given media item node from one end user client, - establishing a selection path, from said given media item node, by connecting said media item node to another media item node through the edge having the highest degree of updated proximity measurement,

- transmitting said selection path to said end user client, in the form of additional graph media item nodes and/or graph proximity edges to be merged into the end user client graph.

10. The method of claim 6, wherein a first graph is defined for a first user and a second graph for a second user, the first graph having a first media item connected with a second media item according to a first proximity measurement, the second graph having a third media item connected with a fourth media item according to a second proximity measurement, the first media item being the same as the third media item, the graph merging comprising the steps of:

- comparing the first proximity measurement with the second proximity measurement, and proposing the fourth media item to the first user instead of the second media item according to the result of the comparison.

11. Client device comprising computing means, memory means storing media items, said computing means comprising means for :

- storing the proximity measurement in association with the respective pair of media references of said media items as graph proximity edges into a database if the similarity value is above a threshold, - establishing a selection path, from a given media item node, by connecting said media item to another media item through the edge having the highest degree of proximity measurement.