FR2910661A1

FR2910661A1 - Electronic document e.g. image, organizing method for use in e.g. personal computer, involves modifying set of labels from documents of new sub-directory by subtracting label corresponding to sub-directory

Info

Publication number: FR2910661A1
Application number: FR0611347A
Authority: FR
Inventors: Jerome Besombes; Franck Meyer
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2006-12-22
Filing date: 2006-12-22
Publication date: 2008-06-27
Also published as: WO2008081129A2; WO2008081129A3; EP2102770A2

Abstract

The method involves labeling the unlabeled documents of a set of documents (5) by prediction based on the labeled documents and on descriptive data of the documents. One sub-directory is created per distinct label among the set of labels associated with the documents of the current directory. Each created sub-directory is associated with the documents of the current directory whose set of labels includes the label corresponding to that sub-directory. The sets of labels of the documents of the new sub-directory are then modified by subtracting the label corresponding to the sub-directory. Independent claims are also included for the following: (1) a computer program product downloadable from a communication network and/or recorded on a medium, comprising program code instructions for executing the method for organizing electronic documents; and (2) multimedia electronic equipment comprising a multimedia document storage unit and implementing the method for organizing electronic documents.

Description

Method and device for organizing electronic documents, corresponding computer program product and multimedia electronic equipment.

1. FIELD OF THE INVENTION

The field of the invention is that of computerized systems (for example personal computers, whether or not connected to the Internet). More specifically, the invention relates to a technique for organizing electronic documents in such a computerized system, helping a user to create and/or modify a tree organization of these electronic documents.

Managing the electronic documents made available to the user of a computerized system is becoming a central issue. The ever-growing number of these documents, and of their sources (results of a search engine query, received e-mails, uploads from digital cameras, purchases of music files, etc.), makes them difficult to use unless the user sets up an organization of the data.

2. PRIOR ART

Like any computer file, electronic documents can be organized manually in the form of a tree. In a file management system such as Windows File Explorer (registered trademark), the user creates directories and sub-directories, then distributes all the documents among them by successively moving each document, or groups of selected documents. This method has the advantage of producing an organization that matches the user's wishes exactly: each document is placed in the chosen directory, which is itself created and placed in the tree by the user. Each directory then contains a set of documents and/or sub-directories representing a group that is homogeneous according to one or more criteria specific to the user. For example, photos can be classified by theme (events, people, etc.), e-mails by date or by sender, music files by performer, etc.

In addition, many electronic document management systems let a user label documents. A document can be associated with one or more labels (typically a word representative of the data contained in the document). For example, photo-sharing websites (Yahoo Flickr (registered trademark), AOL Pictures (registered trademark), etc.) or personal photo management software (Google Picasa (registered trademark), Microsoft Digital Image Suite 2006 (registered trademark)) offer this possibility, which goes beyond the strict scope of conventional file management. A set of image files labeled with a common label naturally forms a coherent set and, consequently, a set of labels constitutes an organization of the labeled documents. The systems cited above all make it possible to display all the documents labeled with a given label, in the same way that the documents of a single directory can be displayed. Moreover, this label-based organization of data has the property of multi-membership: the same document can be tagged with several different labels. The document will therefore appear each time one of its labels is used as a display criterion. Unlike the manual organization of data (in the above case of a file management system such as Windows (registered trademark) XP Explorer), this organization is purely logical: the documents remain in their initial storage unit and labeling causes no file to be moved. The organization is visible only on display and, in particular, a document to which several labels are assigned is not copied, even though it may appear in several distinct sets of documents.

Two distinct known techniques for organizing data have been presented above: a manual technique using a conventional hierarchical file management system, and a manual technique using labels associated with the files (with a logical organization).
The manual technique using a conventional hierarchical file management system has the advantage of giving users an organization of their data that matches their wishes, since they have built each directory and sub-directory and moved each document into the appropriate directory. The major drawback of this technique appears when the number of documents becomes large. To build such an organization, each document must be identified by the user in order to determine the directory into which it must be moved (and possibly to create that directory). Beyond a few dozen documents, identifying each document becomes particularly tedious, all the more so if the documents are spread across different storage systems.

The other known technique mentioned above, which uses labels, does not remove the need to consider each document one by one, since at least one label must be associated with each document by the user. In addition, current systems offer only a single-level organization: they can display the set of documents associated with a particular label (which can be seen as a directory), but this representation is not recursive. It is impossible to select, within a directory corresponding to a label $x_1$, the set of documents that are also labeled with a label $x_2$ (which would constitute a sub-directory of the directory formed by the label $x_1$).

There is currently no method for managing documents with multiple labels, classifying them automatically while handling multi-membership and allowing the labels of unlabeled documents to be inferred, and maintaining the resulting hierarchy through a real-time HMI (Human-Machine Interface).

3. SUMMARY OF THE INVENTION

In a particular embodiment of the invention, a method is proposed for organizing electronic documents from a set of documents comprising labeled documents, a labeled document being associated with a set of labels. At least one labeled document is associated with a set of at least two labels. The method comprises the following steps:

a) labeling the unlabeled documents of the set of documents, by prediction based on the labeled documents and on descriptive data of the documents;

starting from a current directory associated with a set of labeled documents:

b) creating one sub-directory per distinct label among the set of labels associated with the documents of the current directory;

for each sub-directory created:

c) associating with it the documents of the current directory whose label set includes at least one label corresponding to said created sub-directory;

d) modifying the label sets of the documents of the new sub-directory by subtracting the label corresponding to the sub-directory;

starting from each sub-directory created, which becomes the current directory:

e) iterating steps b), c) and d) until the sub-directories obtained are associated only with documents whose label set is empty.

Thus, in this particular embodiment, the invention rests on an entirely new and inventive approach combining automatic labeling of documents with automatic construction of a tree organization from the labels of the labeled documents.

Unlike the known manual technique based on the user's use of a conventional hierarchical file management system (see discussion above), the tedious task of building a tree organization from a current directory does not fall to the user.
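Steps b) to e) above can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Directory` and `build_tree` are hypothetical, documents are represented by identifiers, and each document's label set is assumed to be already known (step a) is not covered here).

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical node of the tree organization."""
    name: str
    documents: list = field(default_factory=list)  # identifiers of associated documents
    children: dict = field(default_factory=dict)   # label -> Directory


def build_tree(name, docs):
    """Build the tree for `docs`, a mapping document id -> set of remaining labels."""
    node = Directory(name=name, documents=sorted(docs))
    # Step b): one sub-directory per distinct label found in the current directory.
    labels = set().union(*docs.values())
    for label in sorted(labels):
        # Step c): keep the documents carrying this label.
        # Step d): subtract the label from their label sets.
        members = {doc: ls - {label} for doc, ls in docs.items() if label in ls}
        # Step e): recurse; the recursion stops when every label set is empty.
        node.children[label] = build_tree(label, members)
    return node


tree = build_tree("root", {"a.jpg": {"holiday", "beach"}, "b.jpg": {"holiday"}})
```

In this example `a.jpg` ends up under both `holiday/beach` and `beach/holiday`, which reflects the multi-membership property discussed in the prior-art section.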
The tree is built automatically and recursively, level by level, taking into account the labels associated with the electronic documents. The present invention therefore saves time in building the tree organization and optimizes the resources of the computerized system in which the technique for organizing electronic documents according to the invention is implemented.

In addition, thanks to the automatic labeling of unlabeled documents, the user does not have to label all the documents, which avoids tedious work when the number of documents becomes large.

In the present description, a document being associated with a sub-directory and that document being contained (or placed) in that sub-directory are treated as synonymous.

Advantageously, associating a given document with a sub-directory means that said given document is physically moved from an initial storage unit to said given sub-directory.

In this case, the tree organization is generated automatically but then used like a tree created manually by the user with a conventional hierarchical file management system.

According to an advantageous variant, associating a given document with a sub-directory means that said given document remains in an initial storage unit and that a unique identifier of said given document is placed in said given sub-directory.

In this variant, the tree organization is a logical organization whose sub-directories contain only identifiers of electronic documents. The identifier of each document is unique and defines the complete path to the storage location of that document.
For example, with a centralized storage system such as a hard disk, the identifier defines the path from the root of the hard disk to the sub-directory containing the document (that is, the path in the tree of the storage system). With a decentralized storage system, the identifier is for example a URL giving the access path to the document on the Internet.

Advantageously, step b) is preceded by the following step:

- if the number of different labels associated with the documents of the current directory is greater than a predetermined threshold S, the labels are sorted in decreasing order of frequency and the first S labels are selected, only the selected labels being taken into account in the following steps.

In this way, the processing resources (in particular processor and memory) needed to build the tree organization are reduced.

Advantageously, if there exist first and second labels such that:

- the following condition holds: for all labeled electronic documents whose label set comprises said second label, said label set also comprises said first label, and
- the following condition holds: for all labeled electronic documents whose label set comprises said first label, said label set also comprises said second label,

then one of said first and second labels is removed only during said steps b) to e), each labeled electronic document recovering all its labels at the end of said steps b) to e).

Taking this first type of label inclusion into account simplifies the resulting tree organization by pruning, and optimizes the processing resources needed to build that tree organization.
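The two optimizations just described (keeping the S most frequent labels, and detecting labels carried by exactly the same documents) might look as follows. This is only a sketch: the function names are hypothetical, and the alphabetical tie-break between equally frequent labels is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def select_top_labels(docs, s):
    """Return the labels to keep when more than `s` distinct labels are present.

    `docs` maps a document id to its set of labels.
    """
    counts = Counter(label for labels in docs.values() for label in labels)
    if len(counts) <= s:
        return set(counts)  # threshold not exceeded: keep every label
    # Decreasing frequency; ties broken alphabetically (assumption).
    ranked = sorted(counts, key=lambda label: (-counts[label], label))
    return set(ranked[:s])


def equivalent_label_pairs(docs):
    """Return pairs of labels that are carried by exactly the same documents."""
    extent = {}
    for doc, labels in docs.items():
        for label in labels:
            extent.setdefault(label, set()).add(doc)
    names = sorted(extent)
    return [(a, b) for i, a in enumerate(names)
            for b in names[i + 1:] if extent[a] == extent[b]]


docs = {
    "p1": {"holiday", "beach", "sea"},
    "p2": {"holiday", "beach", "sea"},
    "p3": {"holiday", "family"},
}
kept = select_top_labels(docs, 2)
pairs = equivalent_label_pairs(docs)
```

For each pair returned by `equivalent_label_pairs`, one of the two labels could be set aside while the tree is built and restored afterwards, as the text describes.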
Advantageously, if there exist first and second labels such that:

- the following condition holds: for all labeled electronic documents whose label set comprises said second label, said label set also comprises said first label, and
- the following condition does not hold: for all labeled electronic documents whose label set comprises said first label, said label set also comprises said second label,

then, during said steps b) to e), said second label causes a sub-directory to be created only under the sub-directory derived from the first label.

Taking this second type of label inclusion into account likewise simplifies the resulting tree organization by pruning, and optimizes the processing resources needed to build that tree organization.

Advantageously, said labeling step comprises a step of generating a matrix V as follows:

- $V_{ij} = U_{ij}$ if $U_{ij} \neq 0$
- $V_{ij} = P(i,j)$ if $U_{ij} = 0$

with i: an electronic document among the set of electronic documents; j: a label among a set of possible labels; P: a prediction function based on descriptive data of the electronic documents; and U: a matrix of the electronic documents and labels assigned by a user, such that $U_{ij} \neq 0$ means that the user has decided whether or not to associate label j with electronic document i, and $U_{ij} = 0$ means that the user has expressed no opinion on whether label j should be associated with electronic document i.

Using such a matrix V simplifies the computations involved in automatic labeling.

Advantageously, $U_{ij} = 1$ if the user has associated label j with electronic document i, and $U_{ij} = -1$ if the user has decided not to associate label j with electronic document i.
P(i,j) returns a value between -1 and +1: the more negative the value of P(i,j), the less likely it is that electronic document i is associated with label j; conversely, the more positive the value of P(i,j), the more likely it is that electronic document i is associated with label j.

Advantageously, said labeling step comprises a step of thresholding said matrix V into a matrix W as follows: if $V_{ij} > \alpha$ then $W_{ij} = 1$, otherwise $W_{ij} = 0$, where $\alpha$ (alpha) is a determined threshold between 0 and 1. Only the non-zero elements $W_{ij}$ are used in said steps b) to e).

Thus, the threshold alpha determines which labels are actually used in building the tree.

According to an advantageous characteristic, the method comprises a step of limiting the number of labels automatically assigned to an electronic document.

In a particular embodiment, this limitation is implemented through the above-mentioned alpha thresholding. In this case the threshold alpha can be modified by the user, who can thus dynamically limit the number of labels automatically assigned to a document during the automatic labeling step.

Advantageously, said steps b) to e) are followed by the following steps:

- modification by a user, via a human/machine interface, of at least one label set among the label sets associated with the labeled electronic documents;
- deletion of a tree organization resulting from the execution of said steps b) to e);
- new execution of said steps b) to e), taking into account said at least one label set modified by the user.

Thus, a first mechanism is offered to the user for modifying a tree organization already obtained, in order to correct it or make it evolve. This first mechanism relies on recomputing the tree.
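The thresholding of V into W described above is simple enough to show in a few lines. The sketch below assumes V is stored as a list of rows (one row per document, one column per label); the function name `threshold_matrix` is hypothetical.

```python
def threshold_matrix(v, alpha):
    """Return W, where W[i][j] = 1 if V[i][j] > alpha and 0 otherwise."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha is expected to lie between 0 and 1")
    return [[1 if value > alpha else 0 for value in row] for row in v]


# Two documents, three labels; values of 1 and -1 come from the user,
# the others from the prediction function.
v = [[1, -1, 0.7],
     [0.2, 0.9, -0.4]]
w_loose = threshold_matrix(v, 0.5)
w_strict = threshold_matrix(v, 0.8)
```

Raising alpha from 0.5 to 0.8 drops the third label of the first document, which illustrates how a user-adjustable alpha limits the number of automatically assigned labels.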
According to an advantageous variant, said steps b) to e) are followed by the following steps:

- modification by a user, via a human/machine interface, of at least one label set among the label sets associated with the labeled electronic documents;
- for each labeled electronic document whose associated label set has been modified by the user, called a modified document:
  * said modified document is removed from each directory or sub-directory with which it is associated;
  * said modified document is then reclassified starting from a root directory: it is associated with each sub-directory whose corresponding label is included in the label set associated with said modified document, then, for each sub-directory with which the modified document has been associated, the association is repeated recursively in the lower-level sub-directories, if any.

Thus, a second mechanism is offered to the user for modifying a tree organization already obtained. This second mechanism relies on a local adaptation of the tree, without the tree being entirely recomputed.

Advantageously, said step of modification by a user of at least one label set is carried out by:

- a move/paste operation, performed by the user via a graphical interface, of a representation of the document to be modified to a target sub-directory associated with documents whose label sets comprise one or more desired labels; and
- an automatic assignment, to said document to be modified, of the label corresponding to said target directory, replacing the label or labels previously assigned to said document to be modified.

In this way, the user can easily modify labels.
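The second mechanism (local adaptation: remove the modified document everywhere, then re-place it from the root) can be sketched as below. The `Directory` class and function names are hypothetical, and keeping the document associated with the root directory is an assumption made for this illustration. No directory is created or deleted; only associations change.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical node of an already built tree organization."""
    name: str
    documents: set = field(default_factory=set)
    children: dict = field(default_factory=dict)  # label -> Directory


def remove_document(node, doc):
    """Remove `doc` from this directory and from every sub-directory."""
    node.documents.discard(doc)
    for child in node.children.values():
        remove_document(child, doc)


def place_document(node, doc, labels):
    """Associate `doc` with each matching sub-directory, recursively."""
    for label, child in node.children.items():
        if label in labels:
            child.documents.add(doc)
            place_document(child, doc, labels)


def reclassify(root, doc, new_labels):
    """Re-place a document whose label set was modified by the user."""
    remove_document(root, doc)
    root.documents.add(doc)  # assumption: the root stays associated with all documents
    place_document(root, doc, new_labels)


root = Directory("root", {"a.jpg"}, {
    "holiday": Directory("holiday", {"a.jpg"},
                         {"beach": Directory("beach", {"a.jpg"})}),
    "birthday": Directory("birthday"),
})
reclassify(root, "a.jpg", {"birthday"})
```

Because only the paths touched by the modified document are visited for re-placement, the rest of the tree is left as it was, in line with the "local adaptation" described in the text.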
Advantageously, said step of modification by a user of at least one label set is carried out by:

- a selection, performed by the user via the human/machine interface, of at least one sub-directory to be deleted;
- for each sub-directory to be deleted, an automatic removal, from the label sets of the labeled electronic documents, of the label corresponding to the sub-directory to be deleted, as well as of the label corresponding to each sub-directory located between the sub-directory to be deleted and a root directory.

Here again, the user can easily modify labels.

In another embodiment, the invention relates to a computer program product downloadable from a communication network and/or recorded on a computer-readable medium and/or executable by a processor, this computer program product comprising program code instructions for executing the steps of the above method (in at least one of its various embodiments) when said program is executed on a computer.

In another embodiment, the invention relates to a device for organizing electronic documents from a set of documents comprising labeled documents, a labeled document being associated with a set of labels. At least one labeled document is associated with a set of at least two labels.
The device comprises:

a) means for labeling the unlabeled documents of the set of documents, by prediction based on the labeled documents and on descriptive data of the documents;
b) means for creating, applied to a current directory associated with a set of labeled documents, one sub-directory per distinct label among the set of labels associated with the documents of the current directory;
c) means for associating, applied to each sub-directory created, the documents of the current directory whose label set includes at least one label corresponding to said created sub-directory;
d) means for modifying the label sets of the documents of the new sub-directory by subtracting the label corresponding to the sub-directory;
e) the means b), c) and d) being applied iteratively to each sub-directory created, which becomes the current directory, until the sub-directories obtained are associated only with documents whose label set is empty.

More generally, the organizing device comprises means for implementing the organizing method as described above (in any of its various embodiments).

In another embodiment, the invention relates to multimedia electronic equipment comprising means for storing multimedia documents and means for implementing the document organizing method as described above (in any of its various embodiments).

4. LIST OF FIGURES

Other characteristics and advantages of embodiments of the invention will become apparent on reading the following description, given by way of indicative and non-limiting example (not all embodiments of the invention are limited to the characteristics and advantages of the embodiments described below), and the appended drawings, in which:

- figure 1 shows an overall diagram of a particular embodiment of the organizing method according to the invention;
- figure 2 shows a diagram of a phase of vectorizing the electronic documents, included in a particular embodiment of the organizing method according to the invention;
- figure 3 shows a diagram of a phase of creating Q predictive models, included in the automatic labeling phase of figure 5;
- figure 4 shows a diagram of a phase of building a matrix V, included in the automatic labeling phase of figure 5;
- figure 5 shows a diagram of a phase of automatically labeling the electronic documents, included in a particular embodiment of the organizing method according to the invention;
- figure 6 shows a diagram of a phase of building a tree, included in a particular embodiment of the organizing method according to the invention;
- figures 7, 8 and 9 show the results of three successive steps in building an example tree, by implementing the construction phase illustrated in figure 6;
- figure 10 illustrates the result of taking label inclusions into account in the tree of figure 9;
- figure 11 illustrates the result of taking a limitation of the number of labels into account in the tree of figure 10;
- figure 12 illustrates an example of a document being moved by the user in the tree of figure 11;
- figure 13 illustrates the tree resulting from the move illustrated in figure 12;
- figure 14 shows an example HMI of a system implementing a particular embodiment of the organizing method according to the invention; and
- figure 15 shows the structure of an organizing device according to a particular embodiment of the invention.

5. DETAILED DESCRIPTION

In all the figures of this document, identical elements are designated by the same reference numeral.

5.1 Overview

In a particular embodiment, illustrated by the overall diagram of figure 1, the organizing method according to the invention comprises the following steps (also called phases hereafter):

Phase 0: Prior encoding of the documents (vectorization).
This first phase is not illustrated in figure 1. It consists in encoding each document to represent it in vector form. This type of encoding is known to those skilled in the art whenever automatic classification tasks are performed.

Phase 1: Definition of labels for certain documents.
This phase 1 is referenced P1 in figure 1. The user defines labels and associates them with certain chosen documents. Note that the user is in no way obliged to label all the documents; a small proportion will suffice. Thus, from a set of stored documents 5, a new set of documents 6 is obtained, of which a first subset 6A contains unlabeled documents and a second subset 6B contains labeled documents.

Phase 2: Automatic labeling of all documents by the system.
This phase 2 is referenced P2 in figure 1. The system implementing the method automatically labels the documents (it generalizes all possible labels to the documents already labeled and to those having no label) by means of a multi-label classification module. A set 7 of documents that are all labeled is obtained.
Phase 3: Automatic generation of the classification plan by the system.
This phase 3 is referenced P3 in figure 1. The system implementing the method automatically generates a tree 8 (also called a tree organization) enabling the documents to be classified automatically. The nodes of this tree, which correspond to directories, are generated and named automatically.

Phase 4: Modification of the classification plan by interaction between the user and the system.
This phase 4 is referenced P4 in figure 1. The user can then modify the tree to correct it or make it evolve, in particular using the conventional interactions of a file management system of the Windows Explorer (registered trademark) type. The user can notably perform the following actions:

- move documents from one directory or sub-directory to another;
- change the labels of certain documents;
- add new documents, labeled or not, which will be filed automatically in the tree.

The system will adapt, and will adapt the classification automatically, either by a local adaptation or by going back through phase 2.

The system therefore limits the user's workload in organizing the documents:

- by labeling only some of the documents, the user can then generate a classification tree automatically. This classification tree will take the multi-label aspect of the documents into account;
- by interactively modifying the classification obtained, the user will cause the classification tree to be adapted so that it matches the user's wishes. Here too, a minimum of actions will be needed because the system adapts its classification tree (also called classification plan) automatically.
Compared with a fully automatic system, this system has the advantage that a result can be corrected and manipulated interactively, so as to actually obtain the desired classification tree.

5.2 Vectorization of the documents

We propose a particular embodiment which, without being limiting as to the techniques employed, shows the feasibility of the invention. This embodiment is first of all based on a vector representation of the documents. With this representation, each document is seen by the system as a vector of $\mathbb{R}^N$. After the vectorization phase, the set of documents will therefore be represented by a matrix D containing, in rows, the vector description of each document, in columns, the properties measured on these documents, and, at the intersection of a row i and a column j, the value $D_{ij}$ corresponding to property j for document i.

In other words, the vectorization phase consists in transforming a document (already in the form of a computer file) into a vector of numerical values (a vector in $\mathbb{R}^N$). For an image, for example, this may consist of a set of measurements, of functions applied to the bitmap representation of the image. For a text document, this may be done by computing, for a set of predefined words, the word counts of each document. For a database record, this may consist of numerical recodings of the record, using a complete disjunctive coding of any non-numerical values. The vectorization of objects is considered to be a known field of the state of the art, typical of and specific to each application domain (signal processing, text mining, image, data mining, etc.), and is therefore not detailed further here.

A diagram of the vectorization phase is given in figure 2. A document storage unit 21 is considered.
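Two of the vectorization examples mentioned above, word counts over a predefined vocabulary and complete disjunctive (one-hot) coding of a non-numerical value, can be illustrated as follows. This is a sketch only: the function names, the lowercase tokenization rule and the sample vocabulary are assumptions, and the text deliberately leaves the choice of vectorization technique open.

```python
import re
from collections import Counter


def vectorize_text(text, vocabulary):
    """Count occurrences of each predefined (lowercase) word in `text`."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in vocabulary]


def one_hot(value, categories):
    """Complete disjunctive coding: one 0/1 column per possible category."""
    return [1 if value == category else 0 for category in categories]


vocabulary = ["beach", "holiday", "birthday"]
row = vectorize_text("Beach holiday, beach photos", vocabulary)
kind = one_hot("jpg", ["jpg", "png"])
```

Each resulting list plays the role of one row of the matrix D; concatenating `row` and `kind` would give a single descriptive vector for a document that has both text and a file type.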
As long as a non-vectorized document remains (step 22), a non-vectorized document d is selected (step 23), then a vectorization method is chosen according to the type of the document (step 24). A vector $V_d$ of $\mathbb{R}^N$ is obtained, which can be stored in a storage unit 26 for vectorized documents.

5.3 Labeling of certain documents by the user

The system implementing the organizing method allows the user to label certain documents via an HMI which may, for example, resemble the one described below with reference to figure 14. At system level, this amounts to associating labels with certain documents.

For example, a picture of a seaside holiday could be labeled by the user with the labels "holiday", "beach" and "Atlantic", while a picture of a family birthday could be labeled with the labels "Birthday", "Jean", etc.

The HMI device and the associated document management, which allow a user to label documents, are considered simple and easily implemented by those skilled in the art and are therefore not described further.

Note that in the embodiment described below, every document d has, for each label x, an associated value -1, 0 or +1, corresponding respectively to "the user has explicitly said that document d does not have this label x", "the assignment of document d to label x is not known", and "the user has given document d the label x".

5.4 Automatic labeling of the documents

It is now assumed that the user has a set $E = \{d_1, \ldots, d_Q\}$ of documents, of which a subset $T = \{t_1, \ldots, t_k\}$ is labeled with various labels. We denote by $L = \{x_1, \ldots, x_M\}$ the set of all labels used.
For each document d of T and for each label x of L, document d is in one of the following three cases:
• either it has been labeled by the user with label x, and we write $d_x = +1$;
• or the user has explicitly or implicitly asked the system to remove label x from document d, and we write $d_x = -1$ (see section 5.6 below for how negative labels are introduced);
• or the document currently has no information from the user regarding label x, and we write $d_x = 0$.

A practical way of representing the documents and the labels assigned by the user is a matrix with the documents as rows and the labels as columns. Let U be the matrix of documents and labels assigned by the user. A second matrix is defined, corresponding to the labels assigned to the documents automatically by the system. V denotes this matrix of documents and labels generated automatically by the system. The matrix V is generated as follows:

$$V_{ij} = \begin{cases} U_{ij} & \text{if } U_{ij} \neq 0 \\ P(i,j) & \text{if } U_{ij} = 0 \end{cases}$$

with i a document, j a label and P a prediction function.

P(i,j) is a prediction function, implemented by a module that automatically predicts labels from the description of the documents (matrix D). P(i,j) gives the prediction that document i has or does not have label j, and returns a value between -1 and +1. For negative values, document i is predicted not to have label j. For positive values, document i is predicted to have label j. The more negative the value P(i,j), the less likely it is that the document has label j; conversely, the more positive the value, the more likely it is that the document has label j.

Several methods can be used to implement a prediction function for every document and every label.
A simple method is to build, for each label j, a classification model $C_j$ of the neural network, decision tree or Bayesian network type. Each label j is thus associated with a model built from the examples described by matrix D, with label j as the target variable. For each zero value of $U_{ij}$, the classification model $C_j$ is used: it takes as input the description of document i in D and returns the predicted value for j, which is assigned in matrix V.

To implement a predictive model (also called a classification model) associated with each label j using the descriptive data of matrix D, a person skilled in the art may for example refer to the following document: Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.

Figure 3 illustrates the creation of Q predictive models (also called classifiers), one for each label, so that labels can be assigned to documents on the basis of their initial descriptive information alone (derived from their prior representation as vectors). Matrices D and U are available. To build a predictive model $P_x$ associated with label x, the descriptive data of matrix D are used as descriptive variables and the data of the x-th column of matrix U as the target variable. This predictive model $P_x$ can predict the probability of label x for any document from the descriptive data of that document. By way of example, the box referenced 31_1 symbolizes the generation of a predictive model associated with label 1, from matrix D and column 1 (referenced 30_1) of matrix U. Similarly, the box referenced 31_Q symbolizes the generation of a predictive model associated with label Q, from matrix D and column Q (referenced 30_Q) of matrix U.
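As an illustration of the one-model-per-label idea, here is a minimal sketch in Python. It is not the classifier of the embodiment (which mentions neural networks, decision trees or Bayesian networks): a deliberately simple nearest-centroid scorer stands in for $C_j$, and the names `train_label_model` and `predict`, as well as the sample data, are hypothetical. The only property it aims to respect is that the score lies between -1 and +1.

```python
import math


def train_label_model(D, U, j):
    """Build a stand-in model for label j from the rows of D where U[i][j] is +1 or -1."""
    def centroid(rows):
        # Component-wise mean of the vectors, or None when there is no example.
        return [sum(col) / len(rows) for col in zip(*rows)] if rows else None

    positives = [D[i] for i in range(len(D)) if U[i][j] == 1]
    negatives = [D[i] for i in range(len(D)) if U[i][j] == -1]
    return centroid(positives), centroid(negatives)


def predict(model, vector):
    """Return a score in [-1, +1]; positive means 'probably has the label'."""
    pos, neg = model
    if pos is None or neg is None:
        return 0.0  # assumption: without both kinds of example, stay undecided
    d_pos = math.dist(vector, pos)
    d_neg = math.dist(vector, neg)
    if d_pos + d_neg == 0:
        return 0.0
    return (d_neg - d_pos) / (d_neg + d_pos)


# Hypothetical data: five documents described by two properties, one label.
D = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 4.0], [1.0, 0.0]]
U = [[1], [1], [-1], [-1], [0]]  # the last document has no user opinion

model = train_label_model(D, U, 0)
score = predict(model, D[4])  # close to the positive examples, so score > 0
```

Any real classifier with a signed confidence output could replace the centroid scorer without changing how the result is used.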
The box referenced 32 in Figure 3 symbolizes the Q predictive models obtained, one specialized for each label.

Figure 4 illustrates the construction of matrix V, which integrates the labels supplied by the user and those inferred by the system. The set (referenced 32, as in Figure 3) of the Q predictive models 32_1, ..., 32_Q is used to generate matrix V from matrices D and U.

Figure 5 presents an overall diagram of this automatic document labeling phase. The vectorization phase is assumed to have been carried out already (pre-coding of the documents). A set of documents is available (referenced 6, as in Figure 1), of which a first subset 6A contains unlabeled documents and a second subset 6B contains labeled documents. In a step 51, the matrix U of the documents having at least one label is formed. In a step 52, automatic predictive models are created from matrix U and matrix D. In a step 53, matrix V is generated, integrating the user labels and the labels predicted by the system. In a step 54, a matrix W is generated by thresholding matrix V with the parameter alpha (see section 5.5 below). In a step 55, matrix W is adapted if necessary by modifying the threshold alpha, to limit the number of labels per document.

5.5 Construction of the tree

To build the tree, the result of thresholding matrix V into a matrix W is used. A threshold alpha (modifiable by the user) determines which labels are actually considered positive. By default, alpha = 0.5. Matrix V is thresholded according to the following principle:

Begin
    for each cell $V_{ij}$:
        if $V_{ij}$ > alpha then $W_{ij}$ = 1
        else $W_{ij}$ = 0
    End for
End thresholding.

The construction of the tree is carried out recursively, level by level.
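The generation of V (step 53) and its thresholding into W (step 54) can be sketched as follows. This is only a sketch: the function names `build_v` and `threshold` are hypothetical, and the prediction function is passed in as a plain lookup table of made-up scores rather than a trained model.

```python
def build_v(U, predict):
    """Keep the user's value where it is non-zero, otherwise use the prediction P(i, j)."""
    return [
        [float(u) if u != 0 else predict(i, j) for j, u in enumerate(row)]
        for i, row in enumerate(U)
    ]


def threshold(V, alpha=0.5):
    """W[i][j] is 1 when V[i][j] is strictly greater than alpha, else 0."""
    return [[1 if v > alpha else 0 for v in row] for row in V]


# Hypothetical example: three documents, two labels.
U = [[1, 0], [-1, 0], [0, 0]]
scores = {(0, 1): 0.8, (1, 1): -0.3, (2, 0): 0.6, (2, 1): 0.2}  # made-up predictions

V = build_v(U, lambda i, j: scores[(i, j)])
W = threshold(V)        # default alpha = 0.5
W_strict = threshold(V, alpha=0.7)  # a higher alpha keeps fewer labels per document
```

Raising alpha, as in the last line, is the lever the text describes for limiting the number of labels per document (step 55).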
At the first level, all the documents are located in a single directory which constitutes the root R of the tree under construction. Each document is associated with one or more labels according to matrix W.

5.5.1 Complete tree

A variant of Bordat's algorithm, described below, is then applied to the content of matrix W (see the following document: Bordat, J.P., Calcul pratique du treillis de Galois d'une correspondance, Math. Sci. Hum., 1986, no. 96, pp. 31-47).

The set of labels $L = \{x_1, \ldots, x_m\}$ gives rise to a set of direct subdirectories $\{R_1, \ldots, R_m\}$ of the root directory (that is, to child nodes), each new subdirectory $R_i$ corresponding to exactly one label $x_i$ of L.

At this stage, a threshold gamma sets the maximum number of directories or subdirectories allowed at a given node of the tree. If the number of labels exceeds gamma, the labels are sorted in descending order of frequency and the first gamma labels are selected. The other labels are temporarily hidden. An additional directory named "Other" is created to hold every document having a temporarily hidden label.

Each directory contains the documents labeled with the corresponding label. The same document may therefore belong to several subdirectories if several different labels are associated with it. A new labeling function is then defined for each new directory as follows. For every document d in directory $R_i$, the associated labels are $L_{R_i}(d) = L(d) \setminus \{x_i\}$. That is, in any directory $R_i$ of the tree corresponding to a label $x_i$, the set of labels (also called label set) of each document d in $R_i$ is the set of labels of d in the parent directory of $R_i$, from which label $x_i$ has been removed.
If a document has an empty set of labels in a directory of the tree (which is the case of every document at some level of the tree, since in going down from one level to the next each document loses one of its labels), it remains in that current directory and is therefore not moved into child subdirectories. This process is then iterated on each subdirectory $R_i$, taking the new labels into account. A directory containing only documents without labels gives rise to no subdirectory.

Figure 6 presents an overall diagram of this tree construction phase. A set of labeled documents is available (referenced 7, as in Figure 1). In a step 61, a copy of the set of labeled documents is placed in a root directory R. For each label x not contained in another label x' (condition step 62), a new subdirectory $R_x$ is built, within the limit of gamma subdirectories (step 63). For each document d labeled with x according to matrix W (condition step 64), document d is placed in subdirectory $R_x$ (step 65), then label x is removed from the label set associated with document d contained in subdirectory $R_x$ (step 66). Finally, the tree construction is applied recursively to subdirectory $R_x$ (step 67).

Take the following example of a set D of six documents, $D = \{d_0, d_1, \ldots, d_5\}$, labeled with five labels ($x_1$, $x_2$, $x_3$, $x_4$, $x_5$). More precisely, the label sets are: $L(d_0) = \{x_1, x_2, x_5\}$, $L(d_1) = \{x_1, x_3\}$, $L(d_2) = \{x_2, x_5\}$, $L(d_3) = \{x_1, x_4\}$, $L(d_4) = \{x_1\}$ and $L(d_5) = \{x_3\}$.

Figure 7 shows the root directory R containing these six documents. The set of labels $x_i$ associated with each document is also shown.
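The recursive construction can be sketched on this six-document example as follows. This is a minimal sketch of the complete tree only: it ignores the gamma limit and the label-inclusion test of step 62, the function name `build_tree` and the dictionary layout are hypothetical, and the first document is assumed to be $d_0$.

```python
def build_tree(labelled_docs):
    """Build a directory node from a mapping {document: set of remaining labels}."""
    node = {"documents": dict(labelled_docs), "children": {}}
    # One sub-directory per distinct label still present in this directory.
    labels = sorted(set().union(*labelled_docs.values()))
    for x in labels:
        # Documents carrying x go into R_x, with x removed from their label set.
        sub = {d: ls - {x} for d, ls in labelled_docs.items() if x in ls}
        node["children"][x] = build_tree(sub)
    return node


docs = {
    "d0": {"x1", "x2", "x5"},
    "d1": {"x1", "x3"},
    "d2": {"x2", "x5"},
    "d3": {"x1", "x4"},
    "d4": {"x1"},
    "d5": {"x3"},
}

root = build_tree(docs)
r1 = root["children"]["x1"]
# r1["documents"] holds d0, d1, d3 and d4 with label sets
# {x2, x5}, {x3}, {x4} and the empty set respectively.
```

The recursion stops on its own because each level removes one label from every document it passes down, so a directory whose documents all have empty label sets creates no children.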
As illustrated in Figure 8, the first step of the tree construction gives five subdirectories $R_1$, $R_2$, $R_3$, $R_4$ and $R_5$, corresponding respectively to labels $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$. $R_1$ contains documents $d_0$, $d_1$, $d_3$ and $d_4$, with labels $L_{R_1}(d_0) = \{x_2, x_5\}$, $L_{R_1}(d_1) = \{x_3\}$, $L_{R_1}(d_3) = \{x_4\}$ and $L_{R_1}(d_4) = \{\}$. $R_2$ contains documents $d_0$ and $d_2$, with labels $L_{R_2}(d_0) = \{x_1, x_5\}$ and $L_{R_2}(d_2) = \{x_5\}$. $R_3$ contains documents $d_1$ and $d_5$, with labels $L_{R_3}(d_1) = \{x_1\}$ and $L_{R_3}(d_5) = \{\}$. $R_4$ contains document $d_3$, with label $L_{R_4}(d_3) = \{x_1\}$. $R_5$ contains documents $d_0$ and $d_2$, with labels $L_{R_5}(d_0) = \{x_1, x_2\}$ and $L_{R_5}(d_2) = \{x_2\}$.

The tree construction process continues at the next level. Thus, subdirectory $R_1$ gives rise to subdirectories $R_{12}$, $R_{13}$, $R_{14}$ and $R_{15}$. $R_{12}$ contains document $d_0$ with label $x_5$. $R_{13}$ contains document $d_1$ without label. $R_{14}$ contains document $d_3$ without label. $R_{15}$ contains document $d_0$ with label $x_2$. Subdirectory $R_2$ gives rise to subdirectories $R_{21}$ and $R_{25}$. $R_{21}$ contains document $d_0$ with label $x_5$. $R_{25}$ contains documents $d_0$ with label $x_1$ and $d_2$ without label. Subdirectory $R_3$ gives rise to ...

The management of the electronic documents made available to a user of a computerized system is becoming a central issue. Indeed, the ever-increasing number of these documents and the variety of their sources (results of a search engine query, received e-mails, uploads from digital cameras, purchases of music files, ...) make them difficult to use unless the user sets up an organization of his data.

2. PRIOR ART

Like any computer file, electronic documents can be organized manually in the form of a tree structure. In a file management system such as Windows Explorer (registered trademark), the user creates directories and subdirectories, then distributes all his documents among them by successively moving each document or group of selected documents.
This method has the advantage of producing an organization that matches the user's wishes exactly: each document is placed in the chosen directory, which was itself created and placed in the tree by the user. Each directory then contains a set of documents and/or subdirectories representing a group that is homogeneous according to one or more criteria specific to the user. For example, photos can be classified by theme (events, people, ...), e-mails by date or by sender, music files by performer, and so on.

In addition, many electronic document management systems allow a user to label documents. A document may be associated with one or more labels (typically a word representative of the data contained in the document). For example, photo-sharing websites (Yahoo Flickr (registered trademark), AOL Pictures (registered trademark), ...) and personal photo management software (Google Picasa (registered trademark), Microsoft Digital Image Suite 2006 (registered trademark)) offer this possibility, which goes beyond the strict framework of conventional file management. A set of image files carrying a common label naturally forms a coherent whole and, consequently, a set of labels constitutes an organization of the labeled documents. The systems mentioned above all make it possible to display all the documents carrying a given label, in the same way as the documents of a given directory can be displayed. Moreover, this label-based organization has the property of multi-membership: the same document can carry several different labels. The document then appears each time one of its labels is used as a display criterion. Unlike the manual organization of data (in the aforementioned case of a file management system such as Windows XP Explorer (registered trademark)), this organization is purely logical: the documents remain in their initial storage unit and labeling causes no file movement.
The organization is visible only on display and, in particular, a document to which several labels are assigned is not copied, although it may appear in several distinct sets of documents.

Two distinct known techniques for organizing data have been presented above: a manual technique using a conventional hierarchical file management system, and a manual technique using labels attached to the files (with a logical organization).

The manual technique using a conventional hierarchical file management system has the advantage of giving the user an organization of his data that matches his wishes, since he has built each directory and subdirectory and has moved each document into the appropriate directory. Its major drawback appears when the number of documents becomes large. To build such an organization, the user must identify each document in order to determine the directory into which it should be moved (and possibly create that directory). Beyond a few dozen documents, identifying each document becomes particularly tedious, especially if the documents are spread over different storage systems.

The other known technique, using labels, does not remove the need to consider each document one by one, since the user must associate at least one label with each document. In addition, current systems offer only a single level of organization: they can display the set of documents associated with a particular label (which can be seen as a directory), but this representation is not recursive. It is impossible to select, within a directory corresponding to a label $x_1$, all the documents that also carry a label $x_2$ (which would constitute a subdirectory of the directory formed by label $x_1$).
There is currently no method for managing multi-label documents that classifies them automatically while handling multi-membership, allows labels to be inferred for unlabeled documents, and maintains the hierarchy thus built through a real-time HMI (Human Machine Interface).

3. SUMMARY OF THE INVENTION

In a particular embodiment of the invention, a method is provided for organizing electronic documents from a set of documents comprising labeled documents, a labeled document being associated with a set of labels. At least one labeled document is associated with a set of at least two labels. The method comprises the following steps:
a) labeling the unlabeled documents of the set of documents, by prediction based on the labeled documents and on descriptive data of the documents;
from a current directory associated with a set of labeled documents:
b) creating one subdirectory per distinct label among the set of labels associated with the documents of the current directory;
for each subdirectory created:
c) associating with it the documents of the current directory whose label set comprises at least one label corresponding to said created subdirectory;
d) modifying the label sets of the documents of the new subdirectory by removing the label corresponding to the subdirectory;
from each subdirectory created, which becomes the current directory:
e) iterating steps b), c) and d) until the subdirectories obtained are associated only with documents whose label set is empty.

Thus, in this particular embodiment, the invention is based on a completely new and inventive approach combining automatic labeling of documents with automatic construction of a tree organization from the labels of the labeled documents.
Indeed, unlike the known manual technique based on the use of a conventional hierarchical file management system (see discussion above), the user is not left with the tedious task of building a tree organization from a current directory. The tree is built automatically and recursively, level by level, taking into account the labels associated with the electronic documents. The invention thus saves time in building the tree organization and optimizes the resources of the computerized system in which the document organization technique according to the invention is implemented.

In addition, thanks to the automatic labeling of unlabeled documents, the user does not have to label all the documents, which avoids tedious work when the number of documents becomes large.

In the present description, the fact that a document is associated with a subdirectory and the fact that this document is contained (or placed) in this subdirectory are considered synonymous.

Advantageously, associating a given document with a given subdirectory means that said document is physically moved from an initial storage unit to said subdirectory. In this case, the tree organization is generated automatically but is then used like a tree created manually by the user with a conventional hierarchical file management system.

According to an advantageous variant, associating a given document with a given subdirectory means that said document remains in an initial storage unit and that a unique identifier of said document is placed in said subdirectory. In this variant, the tree organization is a logical organization whose subdirectories contain only identifiers of electronic documents. The identifier of each document is unique and defines the complete path to the storage location of this document.
For example, with a centralized storage system such as a hard disk, the identifier defines the path from the root of the hard disk to the subdirectory containing the document (that is, in the tree of the storage system). With a decentralized storage system, the identifier is for example a URL indicating the path of the document on the Internet.

Advantageously, step b) is preceded by the following step: if the number of distinct labels associated with the documents of the current directory is greater than a predetermined threshold S, the labels are sorted in descending order of frequency and the first S labels are selected, only the selected labels being taken into account in the following steps. This reduces the processing resources (processor and memory in particular) needed to build the tree organization.

Advantageously, if there are first and second labels such that:
- the following condition is satisfied: for all the labeled electronic documents whose label set comprises said second label, said label set also comprises said first label, and
- the following condition is satisfied: for all the labeled electronic documents whose label set comprises said first label, said label set also comprises said second label,
then one of said first and second labels is deleted only for the duration of said steps b) to e), each labeled electronic document recovering all its labels at the end of said steps b) to e).

Taking this first type of label inclusion into account simplifies the resulting tree organization by pruning it, and optimizes the processing resources needed to build it.
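The first type of label inclusion described above (two labels that always occur together) can be detected by comparing, for each label, the set of documents that carry it. The sketch below assumes label sets stored in a dictionary; the function name `equivalent_label_pairs` and the sample data are hypothetical.

```python
from itertools import combinations


def equivalent_label_pairs(labelled_docs):
    """Return the pairs of labels carried by exactly the same documents."""
    docs_by_label = {}
    for doc, labels in labelled_docs.items():
        for label in labels:
            docs_by_label.setdefault(label, set()).add(doc)
    return [
        (a, b)
        for a, b in combinations(sorted(docs_by_label), 2)
        if docs_by_label[a] == docs_by_label[b]
    ]


# Hypothetical label sets: "beach" and "sea" always appear together.
docs = {
    "photo1": {"beach", "sea"},
    "photo2": {"family"},
    "photo3": {"beach", "sea", "family"},
}

pairs = equivalent_label_pairs(docs)  # [("beach", "sea")]
```

For each pair found, one of the two labels could be set aside while the tree is built and restored afterwards, which is the pruning the paragraph describes.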
Advantageously, if there are first and second labels such that:
- the following condition is satisfied: for all the labeled electronic documents whose label set comprises said second label, said label set also comprises said first label, and
- the following condition is not satisfied: for all the labeled electronic documents whose label set comprises said first label, said label set also comprises said second label,
then, during said steps b) to e), said second label gives rise to a subdirectory only under the subdirectory derived from the first label.

Taking this second type of label inclusion into account likewise simplifies the resulting tree organization by pruning it, and optimizes the processing resources needed to build it.

Advantageously, said labeling step comprises a step of generating a matrix V as follows:

$$V_{ij} = \begin{cases} U_{ij} & \text{if } U_{ij} \neq 0 \\ P(i,j) & \text{if } U_{ij} = 0 \end{cases}$$

with i an electronic document among the set of electronic documents, j a label among a set of possible labels, P a prediction function based on descriptive data of the electronic documents, and U a matrix of the electronic documents and labels assigned by a user, such that $U_{ij} \neq 0$ means that the user has decided whether or not to associate label j with electronic document i, and $U_{ij} = 0$ means that the user has given no opinion on whether label j should be associated with electronic document i. Using such a matrix V simplifies the calculations related to automatic labeling.

Advantageously, $U_{ij} = 1$ if the user has associated label j with electronic document i, and $U_{ij} = -1$ if the user has decided not to associate label j with electronic document i.
P(i,j) returns a value between -1 and +1: the more negative the value of P(i,j), the less likely it is that electronic document i is associated with label j and, conversely, the more positive the value of P(i,j), the more likely it is that electronic document i is associated with label j.

Advantageously, said labeling step comprises a step of thresholding said matrix V into a matrix W as follows: if $V_{ij} > \alpha$ then $W_{ij} = 1$, otherwise $W_{ij} = 0$, with alpha a predetermined threshold between 0 and 1. Only the non-zero elements $W_{ij}$ are used in said steps b) to e). The threshold alpha thus determines which labels are actually used in building the tree.

According to an advantageous characteristic, the method comprises a step of limiting the number of labels automatically assigned to an electronic document. In a particular embodiment, this limitation is implemented by means of the aforementioned alpha thresholding. The threshold alpha is in this case modifiable by the user, who can thus dynamically limit the number of labels automatically assigned to a document during the automatic labeling step.

Advantageously, said steps b) to e) are followed by the following steps:
- modification by a user, via a human/machine interface, of at least one set of labels among the sets of labels associated with the labeled electronic documents;
- deletion of the tree organization resulting from the execution of said steps b) to e);
- new execution of said steps b) to e), taking into account said at least one set of labels modified by the user.

A first mechanism is thus offered to the user to modify a tree organization already obtained, in order to correct it or change it. This first mechanism is based on a new computation of the tree.
According to an advantageous variant, said steps b) to e) are followed by the following steps:
- modification by a user, via a human/machine interface, of at least one set of labels among the sets of labels associated with the labeled electronic documents;
- for each labeled electronic document whose associated set of labels has been modified by the user, called the modified document:
  * said modified document is removed from each directory or subdirectory with which it is associated;
  * said modified document is then reclassified starting from a root directory: it is associated with each subdirectory whose corresponding label is included in the set of labels associated with said modified document, then, for each subdirectory with which the modified document has been associated, the association is repeated recursively in the lower-level subdirectories, if they exist.

A second mechanism is thus offered to the user to modify a tree organization already obtained. This second mechanism is based on a local adaptation of the tree, without recomputing it completely.

Advantageously, said step of modification by a user of at least one set of labels is carried out by:
- a move/paste, performed by the user via a graphical interface, of a representative of the document to be modified to a target subdirectory associated with documents whose label sets comprise one or more desired labels; and
- an automatic assignment, to said document to be modified, of the label corresponding to said target directory, replacing the label or labels previously assigned to said document to be modified.

In this way, the user can easily modify labels.
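The second mechanism (local adaptation without recomputing the tree) can be sketched as follows. The tree layout, with each node holding a set of documents and a dictionary of child nodes keyed by label, and the names `make_node`, `remove_document`, `place_document` and `reclassify` are assumptions made for this illustration only.

```python
def make_node(documents=(), children=None):
    """Hypothetical directory node: a set of documents plus children keyed by label."""
    return {"documents": set(documents), "children": children or {}}


def remove_document(node, doc):
    """Remove the document from this directory and from every sub-directory."""
    node["documents"].discard(doc)
    for child in node["children"].values():
        remove_document(child, doc)


def place_document(node, doc, labels):
    """Associate the document with this directory, then with matching existing children."""
    node["documents"].add(doc)
    for label, child in node["children"].items():
        if label in labels:
            place_document(child, doc, labels - {label})


def reclassify(root, doc, new_labels):
    """Local adaptation: delete the modified document everywhere, then re-place it from the root."""
    remove_document(root, doc)
    place_document(root, doc, set(new_labels))


# Hypothetical tree in which "photo" currently sits under "beach".
root = make_node({"photo"}, {
    "beach": make_node({"photo"}, {"family": make_node()}),
    "family": make_node(set(), {"beach": make_node()}),
})

reclassify(root, "photo", {"family"})  # the user replaced "beach" by "family"
```

Only existing subdirectories are visited, which matches the wording "if they exist"; creating a directory for a label that has none yet would need the full construction step instead.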
Advantageously, said step of modification by a user of at least one set of labels is carried out by:
- a selection, made by the user via the human/machine interface, of at least one subdirectory to be deleted;
- for each subdirectory to be deleted, an automatic deletion, in the label sets of the labeled electronic documents, of the label corresponding to the subdirectory to be deleted, as well as of the label corresponding to each subdirectory located between the subdirectory to be deleted and a root directory.

Here again, the user can easily modify labels.

In another embodiment, the invention relates to a computer program product downloadable from a communication network and/or recorded on a computer-readable medium and/or executable by a processor, this computer program product comprising program code instructions for executing the steps of the aforesaid method (in at least one of its various embodiments) when said program is executed on a computer.

In another embodiment, the invention relates to a device for organizing electronic documents from a set of documents comprising labeled documents, a labeled document being associated with a set of labels. At least one labeled document is associated with a set of at least two labels.
The device comprises:
a) means for labeling the unlabeled documents of the set of documents, by prediction based on the labeled documents and on descriptive data of the documents;
b) means for creating, applied to a current directory associated with a set of labeled documents, one subdirectory per distinct label among the set of labels associated with the documents of the current directory;
c) means for associating, applied to each created subdirectory, the documents of the current directory whose label set comprises at least one label corresponding to said created subdirectory;
d) means for modifying the label sets of the documents of the new subdirectory by removing the label corresponding to the subdirectory;
e) the means b), c) and d) being applied iteratively to each subdirectory created, which becomes the current directory, until the subdirectories obtained are associated only with documents whose label set is empty.

More generally, the organization device comprises means for implementing the organization method as described above (in any one of its various embodiments).

In another embodiment, the invention relates to multimedia electronic equipment comprising means for storing multimedia documents and means for implementing the document organization method as described above (in any one of its various embodiments).

4. LIST OF FIGURES

Other features and advantages of embodiments of the invention will become apparent on reading the following description, given by way of indicative and non-limiting example (not all the embodiments of the invention are limited to the features and advantages of the embodiments described hereinafter), and from the accompanying drawings, in which:
- Figure 1 shows an overall diagram of a particular embodiment of the organization method according to the invention;
- Figure 2 shows a diagram of a phase of vectorization of the electronic documents, included in a particular embodiment of the organization method according to the invention;
- Figure 3 shows a diagram of a phase of creation of Q predictive models, included in the automatic labeling phase of Figure 5;
- Figure 4 shows a diagram of a phase of construction of a matrix V, included in the automatic labeling phase of Figure 5;
- Figure 5 shows a diagram of a phase of automatic labeling of the electronic documents, included in a particular embodiment of the organization method according to the invention;
- Figure 6 shows a diagram of a phase of construction of a tree, included in a particular embodiment of the organization method according to the invention;
- Figures 7, 8 and 9 show the results of three successive steps in the construction of an example tree, by implementing the construction phase illustrated in Figure 6;
- Figure 10 illustrates the result of taking label inclusions into account in the tree of Figure 9;
- Figure 11 illustrates the result of taking a limitation of the number of labels into account in the tree of Figure 10;
- Figure 12 illustrates an example of a user moving a document in the tree of Figure 11;
- Figure 13 illustrates the tree resulting from the move illustrated in Figure 12;
- Figure 14 shows an example HMI of a system implementing a particular embodiment of the organization method according to the invention; and
- Figure 15 shows the structure of an organization device according to a particular embodiment of the invention.

5. DETAILED DESCRIPTION

In all the figures of this document, identical elements are designated by the same numerical reference.

5.1

In a particular embodiment, illustrated by the overall diagram of Figure 1, the organization method according to the invention comprises the following steps (also called phases hereafter):

Phase 0: Prior encoding of the documents (vectorization). This first phase is not illustrated in Figure 1. It consists of encoding each document so as to represent it in vector form. This type of encoding is known to those skilled in the art for automatic classification tasks.
Phase 1: Definition of labels for certain documents. This phase 1 is referenced P1 in Figure 1. The user defines labels and associates them with certain documents of his choice. Note that the user is not obliged to label all his documents; a small proportion suffices. Thus, from a set of stored documents 5, a new set of documents 6 is obtained, of which a first subset 6A contains unlabeled documents and a second subset 6B contains labeled documents.

Phase 2: Automatic labeling of all the documents by the system. This phase 2 is referenced P2 in Figure 1. The system implementing the method labels the documents automatically (it generalizes all the possible labels to the documents already labeled and to those without a label) by means of a multi-label classification module. A set 7 of documents that are all labeled is obtained.

Phase 3: Automatic generation of the classification plan by the system. This phase 3 is referenced P3 in Figure 1. The system implementing the method automatically generates a tree 8 (also called a tree organization) for classifying the documents automatically. The nodes of this tree, which correspond to directories, are generated and named automatically.

Phase 4: Modification of the classification plan by interaction between the user and the system. This phase 4 is referenced P4 in Figure 1. The user can then modify the tree to correct it or change it. In particular, he can use the conventional interactions of a file management system of the Windows Explorer (registered trademark) type. In particular, he can:
- move documents from one directory or subdirectory to another;
- change the labels of certain documents;
- add new documents, labeled or not, which will be placed in the tree automatically.
The system adapts the classification automatically, either by local adaptation or by going back to phase 2.
The system therefore limits the workload required of the user to organize his documents: by labeling only part of the documents, the user can have a classification tree generated automatically, and this tree takes into account the multi-label nature of the documents; by modifying the resulting classification interactively, the user causes the classification tree to adapt to his wishes. Here too, a minimum of actions is required, since the system adjusts its classification tree (also called classification plan) automatically. Compared with a fully automatic system, this system has the advantage that the result can be corrected and manipulated interactively until the desired classification tree is actually obtained.

5.2 Vectorization of documents

We propose a particular embodiment which, without limiting the techniques that may be employed, shows the feasibility of the invention. This embodiment relies first of all on a vector representation of the documents. With this representation, each document is seen by the system as a vector of $\mathbb{R}^N$. After the vectorization phase, the set of documents is therefore represented by a matrix $D$ containing in each row the vector description of one document, in each column one property measured on the documents, and at the intersection of row $i$ and column $j$ the value $D_{i,j}$ of property $j$ for document $i$.

In other words, the vectorization phase consists of transforming a document (already in the form of a computer file) into a vector of numerical values (a vector of $\mathbb{R}^N$). For an image, for example, this can be a set of measurements, that is, functions applied to the bitmap representation of the image. For a text document, this can be done by computing, for a set of predefined words, the word counts in each document. For a database record, this may consist of a numerical recoding of the record, with a complete encoding of any non-numeric values.
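As an illustration of the word-count case mentioned above, the following minimal sketch builds a small matrix $D$ from two text documents. The vocabulary, the sample texts and the function names are hypothetical and are not taken from the source, which leaves the vectorization technique open.

```python
from collections import Counter

# Hypothetical predefined vocabulary; the source does not specify one.
VOCABULARY = ["beach", "holiday", "birthday", "cake"]


def vectorize_text(text, vocabulary=VOCABULARY):
    """Return the count of each vocabulary word in `text` (a vector of R^N)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def build_matrix_d(documents, vocabulary=VOCABULARY):
    """Matrix D: one row per document, one column per measured property."""
    return [vectorize_text(doc, vocabulary) for doc in documents]


documents = [
    "Holiday at the beach beach volleyball",
    "Birthday cake for John",
]
D = build_matrix_d(documents)
print(D)
```

Each row of `D` is the vector of one document and each column corresponds to one predefined word, which matches the row and column convention of the matrix $D$ described above.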
The vectorization of objects is considered a known field of the state of the art, specific to each application domain (signal processing, text mining, image processing, data mining, ...), and is therefore not specified further here.

A diagram of the vectorization phase is given in FIG. 2. A document storage unit 21 is considered. As long as a non-vectorized document remains (step 22), a non-vectorized document $d$ is selected (step 23), and a vectorization method is chosen according to the type of the document (step 24). A vector $V_d$ of $\mathbb{R}^N$ is obtained, which can be stored in a vectorized document storage unit 26.

5.3 Labeling of certain documents by the user

The system implementing the organization method allows the user to label certain documents via an HMI, which can for example be similar to the one described below in connection with FIG. 14. At the system level, this amounts to associating certain documents with labels. For example, a seaside holiday image may be labeled by the user with the labels "holidays", "beach" and "Atlantic", while a family birthday image may be labeled "birthday", "John", etc.

The HMI device and the associated document management, which allow a user to label documents, are considered simple and easy to implement for those skilled in the art and are therefore not described further.

Note that in the embodiment described below, every document $d$ has, for each label $x$, an associated value of -1, 0 or +1, corresponding respectively to "the user has explicitly said that document $d$ does not have label $x$", "the assignment of document $d$ to label $x$ is unknown", and "the user has given document $d$ the label $x$".

5.4 Automatic labeling of documents

It is now assumed that the user has a set $E = \{d_1, \ldots, d_Q\}$ of documents, of which a subset $T = \{t_1, \ldots, t_K\}$ is labeled with various labels. We denote by $L = \{x_1, \ldots, x_M\}$ the set of labels used.
For each document $d$ of $T$ and for each label $x$ of $L$, the document $d$ is in one of the following three cases:
• either it has been labeled by the user with the label $x$, and we write $d_x = +1$;
• or the user has explicitly or implicitly asked the system to remove the label $x$ from document $d$, and we write $d_x = -1$ (see section 5.6 below for how negative labels are introduced);
• or the document currently has no information from the user regarding the label $x$, and we write $d_x = 0$.

A convenient way to represent the documents and the labels assigned by the user is a matrix with the documents in rows and the labels in columns. Let $U$ be this matrix of documents and labels assigned by the user.

We define a second matrix, which corresponds to the labels assigned to the documents automatically by the system. We denote by $V$ the matrix of documents and labels generated automatically by the system. The matrix $V$ is generated as follows:

$$V_{i,j} = \begin{cases} U_{i,j} & \text{if } U_{i,j} \neq 0 \\ P(i,j) & \text{if } U_{i,j} = 0 \end{cases}$$

with $i$ a document, $j$ a label and $P$ a prediction function.

$P(i,j)$ is implemented by an automatic label prediction module that relies on the description of the documents (matrix $D$). $P(i,j)$ gives the prediction that document $i$ has or does not have label $j$ and returns a value between -1 and +1. For negative values, document $i$ does not have label $j$; for positive values, document $i$ has label $j$. The more negative the value $P(i,j)$, the less likely it is that the document has label $j$; conversely, the more positive the value, the more likely it is that the document has label $j$.

Several methods make it possible to implement a prediction function for any document and any label. A simple method is to build, for each label $j$, a classification model $C_j$ of the neural network, decision tree or Bayesian network type.
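The rule defining $V$ can be sketched as follows. This is only a sketch under stated assumptions: the source suggests neural networks, decision trees or Bayesian networks for the prediction function, whereas the example below swaps in a much simpler distance-weighted vote over the documents whose value for the label is known. The function names and the sample matrices are hypothetical.

```python
import math


def predict(i, j, D, U):
    """Stand-in for P(i, j), NOT the classifier family named in the source.

    Distance-weighted vote of the documents whose value for label j is
    known (+1 or -1). Returns a score in [-1, +1]; 0.0 if nothing is known.
    """
    num = den = 0.0
    for k, row in enumerate(U):
        if k == i or row[j] == 0:
            continue
        weight = 1.0 / (1.0 + math.dist(D[i], D[k]))
        num += weight * row[j]
        den += weight
    return num / den if den else 0.0


def build_v(D, U):
    """V[i][j] = U[i][j] if the user gave information, else P(i, j)."""
    return [
        [U[i][j] if U[i][j] != 0 else predict(i, j, D, U)
         for j in range(len(U[i]))]
        for i in range(len(U))
    ]


# Hypothetical data: rows are documents; columns of D are properties,
# columns of U are labels.
D = [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 1], [0, 0, 2, 1]]
U = [[+1, -1],   # document 0: has label 0, explicitly lacks label 1
     [0, 0],     # document 1: no information from the user
     [-1, +1],   # document 2: lacks label 0, has label 1
     [0, 0]]     # document 3: no information from the user
V = build_v(D, U)
```

User-provided values are copied unchanged into `V`, while the zero entries of `U` are replaced by a predicted score whose sign indicates whether the label is assigned.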
Each label $j$ is therefore associated with a model created from the examples described by the matrix $D$, with label $j$ as the target variable. For each zero value $U_{i,j}$, the classification model $C_j$ is used: it takes as input the description of document $i$ in $D$ and returns the predicted value for $j$, which is assigned in the matrix $V$. To implement a predictive model (also called classification model) associated with each label $j$ using the descriptive data of the matrix $D$, the skilled person can for example refer to the following document: Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.

FIG. 3 illustrates the creation of Q predictive models (also called classifiers), one for each label, so that labels can be assigned to documents on the basis of their initial descriptive information alone (derived from their prior representation as vectors). The matrices $D$ and $U$ are available. To construct a predictive model $P_x$ associated with the label $x$, the descriptive data of the matrix $D$ are used as descriptive variables and the data of the $x$-th column of the matrix $U$ as the target variable. This predictive model $P_x$ is able to predict the probability of the label $x$ for any document, based on the descriptive data of that document. For example, the frame referenced $31_1$ symbolizes the generation of a predictive model associated with label 1, from the matrix $D$ and column 1 (referenced $30_1$) of the matrix $U$. Similarly, the frame referenced $31_Q$ symbolizes the generation of a predictive model associated with label Q, from the matrix $D$ and column Q (referenced $30_Q$) of the matrix $U$. The frame referenced 32 in FIG. 3 symbolizes the Q predictive models obtained, one specialized for each label.

FIG. 4 illustrates the construction of the matrix $V$, which integrates the labels provided by the user and those inferred by the system. The set (referenced 32 as in FIG. 3) of the Q predictive models $32_1, \ldots, 32_Q$ is used to generate the matrix $V$ from the matrices $D$ and $U$.

FIG. 5 shows an overall diagram of this phase of automatic labeling of documents. It is assumed that the vectorization phase (prior encoding of the documents) has already been performed. A set of documents is available (referenced 6 as in FIG. 1), of which a first subset 6A contains unlabeled documents and a second subset 6B contains labeled documents. In a step 51, the matrix $U$ of the documents having at least one label is built. In a step 52, the predictive models are created automatically from the matrix $U$ and the matrix $D$. In a step 53, the matrix $V$ is generated, integrating the user labels and the labels predicted by the system. In a step 54, a matrix $W$ is generated by thresholding the matrix $V$ with the parameter alpha (see section 5.5 below). In a step 55, the matrix $W$ is optionally adapted by modifying the alpha threshold, to limit the number of labels per document.

5.5 Construction of the tree structure

To build the tree structure, the result of thresholding the matrix $V$, stored in a matrix $W$, is used. An alpha threshold (modifiable by the user) determines which labels are actually considered positive. By default, alpha = 0.5. The matrix $V$ is thresholded according to the following principle:

Start thresholding
    for each cell $V_{i,j}$:
        if $V_{i,j} >$ alpha then $W_{i,j} = 1$ else $W_{i,j} = 0$
    end for
End thresholding

The tree is built recursively, level by level. At the first level, all the documents are in a single directory, which constitutes the root $R$ of the tree under construction. Each document is associated with one or more labels according to the matrix $W$.

5.5.1 Complete tree structure

A variant of the Bordat algorithm (see the following document: Bordat, J. P., Calcul pratique du treillis de Galois d'une correspondance, Math. Sci. Hum., 1986, no. 96, pp. 31-47) is then applied to the contents of the matrix $W$, as described hereinafter.

The set of labels $L = \{x_1, \ldots, x_m\}$ gives rise to a set of subdirectories $\{R_1, \ldots, R_m\}$ located directly under the root directory (that is, child nodes), each new subdirectory $R_i$ corresponding to exactly one label $x_i$ of $L$.

At this point, a gamma threshold sets the maximum number of directories or subdirectories that a given node of the tree can hold. If the number of labels exceeds gamma, the labels are sorted in descending order of frequency and the first gamma labels are selected. The other labels are temporarily hidden. A complementary directory named "Other" is created to receive every document having a temporarily hidden label.

Each directory contains the documents labeled with the corresponding label. The same document can therefore belong to several subdirectories if several different labels are associated with it.

A new labeling function is then defined for each new directory as follows. For any document $d$ in the directory $R_i$, the associated labels are $L_{R_i}(d) = L(d) \setminus \{x_i\}$. That is, in any directory $R_i$ of the tree corresponding to a label $x_i$, the label set of each document $d$ in $R_i$ is the label set of $d$ in the parent directory of $R_i$, from which the label $x_i$ has been removed.

If a document has an empty label set in a directory of the tree (which is the case for every document at some level of the tree, since each document loses one of its labels when going down one level), it remains in this current directory and is not moved to child subdirectories.

This process is then iterated on each subdirectory $R_i$, taking the new labels into account. A directory containing only unlabeled documents does not give rise to any subdirectory.

FIG. 6 shows an overall diagram of this tree construction phase. A set of labeled documents is available (referenced 7 as in FIG. 1). In a step 61, a copy of all the labeled documents is placed in a root directory $R$.
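The thresholding rule and the recursive construction of section 5.5.1 can be sketched as follows. This is a sketch under stated assumptions, not the variant of the Bordat algorithm itself: the nested-dictionary representation, the function names, the tie-breaking by label name, the handling of "Other" as a leaf directory and the scores in `V` are all hypothetical. The scores are chosen so that thresholding yields the label sets of the six-document example used in this section.

```python
from collections import Counter


def threshold(V, alpha=0.5):
    """W[i][j] = 1 if V[i][j] > alpha, else 0."""
    return [[1 if v > alpha else 0 for v in row] for row in V]


def label_sets(W, docs, labels):
    """Map each document to the set of labels marked 1 in its row of W."""
    return {d: {x for x, w in zip(labels, row) if w}
            for d, row in zip(docs, W)}


def build_tree(doc_labels, gamma=10):
    """Recursively build {"docs": [...], "subdirs": {label: node}}.

    doc_labels maps each document of the current directory to the labels
    it still carries at this level.
    """
    node = {"docs": sorted(doc_labels), "subdirs": {}}
    freq = Counter(x for labels in doc_labels.values() for x in labels)
    # Most frequent labels first; ties broken by name (an assumption).
    ranked = sorted(freq, key=lambda x: (-freq[x], x))
    kept, hidden = ranked[:gamma], set(ranked[gamma:])
    for x in kept:
        # Documents carrying x go into R_x and lose x there.
        children = {d: labels - {x}
                    for d, labels in doc_labels.items() if x in labels}
        node["subdirs"][x] = build_tree(children, gamma)
    if hidden:
        # "Other" receives documents with a temporarily hidden label;
        # it is kept as a leaf here (an assumption).
        others = [d for d, labels in doc_labels.items() if labels & hidden]
        node["subdirs"]["Other"] = {"docs": sorted(others), "subdirs": {}}
    return node


DOCS = ["d", "d1", "d2", "d3", "d4", "d5"]
LABELS = ["x1", "x2", "x3", "x4", "x5"]
V = [  # hypothetical scores
    [0.9, 1, -1, 0.1, 1],
    [1, -0.2, 1, 0.3, -1],
    [0.4, 1, -1, -1, 0.8],
    [1, -1, 0.0, 1, -0.6],
    [1, 0.5, -1, -1, 0.2],
    [-1, -1, 0.6, 0.1, -1],
]
sets = label_sets(threshold(V), DOCS, LABELS)
tree = build_tree(sets)
```

In this representation, the path `x1`, `x2`, `x5` from the root corresponds to the subdirectory written $R_{125}$ in the example, and a directory whose documents have no remaining label has an empty `subdirs` dictionary.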
For each label $x$ not contained in another label $x'$ (condition step 62), a new subdirectory $R_x$ is built, within the limit of gamma subdirectories (step 63). For each document $d$ labeled $x$ according to the matrix $W$ (condition step 64), the document $d$ is placed in the subdirectory $R_x$ (step 65), and the label $x$ is then deleted from the label set associated with the document $d$ contained in the subdirectory $R_x$ (step 66). Finally, the construction of the tree structure is applied recursively to the subdirectory $R_x$ (step 67).

Let us take the following example of a set $D$ of six documents, $D = \{d, d_1, \ldots, d_5\}$, labeled with five labels $(x_1, x_2, x_3, x_4, x_5)$. More precisely, the label sets are: $L(d) = \{x_1, x_2, x_5\}$, $L(d_1) = \{x_1, x_3\}$, $L(d_2) = \{x_2, x_5\}$, $L(d_3) = \{x_1, x_4\}$, $L(d_4) = \{x_1\}$ and $L(d_5) = \{x_3\}$.

FIG. 7 shows the root directory $R$ containing these six documents. The set of labels $x_i$ associated with each document is also represented.

As illustrated in FIG. 8, the first tree construction step yields five subdirectories $R_1$, $R_2$, $R_3$, $R_4$ and $R_5$, corresponding respectively to the labels $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$. $R_1$ contains the documents $d$, $d_1$, $d_3$ and $d_4$ with the labels $L_{R_1}(d) = \{x_2, x_5\}$, $L_{R_1}(d_1) = \{x_3\}$, $L_{R_1}(d_3) = \{x_4\}$ and $L_{R_1}(d_4) = \{\}$. $R_2$ contains the documents $d$ and $d_2$ with the labels $L_{R_2}(d) = \{x_1, x_5\}$ and $L_{R_2}(d_2) = \{x_5\}$. $R_3$ contains the documents $d_1$ and $d_5$ with the labels $L_{R_3}(d_1) = \{x_1\}$ and $L_{R_3}(d_5) = \{\}$. $R_4$ contains the document $d_3$ with the labels $L_{R_4}(d_3) = \{x_1\}$. $R_5$ contains the documents $d$ and $d_2$ with the labels $L_{R_5}(d) = \{x_1, x_2\}$ and $L_{R_5}(d_2) = \{x_2\}$.

The tree construction process continues on the next level. Thus, the subdirectory $R_1$ gives rise to the subdirectories $R_{12}$, $R_{13}$, $R_{14}$ and $R_{15}$. $R_{12}$ contains the document $d$ with label $x_5$. $R_{13}$ contains the document $d_1$ without label. $R_{14}$ contains the document $d_3$ without label. $R_{15}$ contains the document $d$ with label $x_2$. The subdirectory $R_2$ gives rise to the subdirectories $R_{21}$ and $R_{25}$.
$R_{21}$ contains the document $d$ with label $x_5$. $R_{25}$ contains the document $d$ with label $x_1$ and the document $d_2$ without label. The subdirectory $R_3$ gives rise to the subdirectory $R_{31}$, which contains the document $d_1$ without label. The subdirectory $R_4$ gives rise to the subdirectory $R_{41}$, which contains the document $d_3$ without label.
The subdirectory $R_5$ gives rise to the subdirectory $R_{51}$, which contains the document $d$ with label $x_2$, and to the subdirectory $R_{52}$, which contains the document $d$ with label $x_1$ and the document $d_2$ without label.

The process continues with the next level. The subdirectory $R_{12}$ gives rise to the subdirectory $R_{125}$ containing the document $d$. Since the subdirectory $R_{13}$ contains only the document $d_1$ without label, it does not give rise to any new subdirectory. The same holds for the subdirectories $R_{14}$, $R_{31}$ and $R_{41}$. The subdirectory $R_{15}$ gives rise to the subdirectory $R_{152}$ containing the document $d$. The subdirectory $R_{21}$ gives rise to the subdirectory $R_{215}$ containing the document $d$. The subdirectory $R_{25}$ gives rise to the subdirectory $R_{251}$ containing the document $d$. The subdirectory $R_{51}$ gives rise to the subdirectory $R_{512}$ containing the document $d$. The subdirectory $R_{52}$ gives rise to the subdirectory $R_{521}$ containing the document $d$. None of the documents contained in the subdirectories of this level has any label, so the process stops at this level. FIG. 9 shows the complete tree at the end of the process.

5.5.2 Pruning

5.5.2.1 Inclusion of labels

To simplify the tree structure obtained, we now show how to remove some of its redundant parts. Suppose that, for a label $x$ and a label $x'$, the following property holds: all the documents labeled $x'$ are also labeled $x$. In this case we say that $x$ contains $x'$.

If $x'$ also contains $x$, the labels $x$ and $x'$ are equivalent. In this case, the tree construction process makes no distinction between $x$ and $x'$, for example by deleting one of the two labels. This deletion is valid only during construction, after which the documents recover all their labels.
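The containment and equivalence relations between labels can be checked directly on the label sets. The sketch below uses the six-document example of section 5.5.1; the function names and the dictionary representation are hypothetical.

```python
def documents_by_label(doc_labels):
    """Invert the mapping: label -> set of documents carrying it."""
    index = {}
    for doc, labels in doc_labels.items():
        for x in labels:
            index.setdefault(x, set()).add(doc)
    return index


def contains(index, x, y):
    """True if x contains y: every document labeled y is also labeled x."""
    return index[y] <= index[x]


def equivalent(index, x, y):
    """True if x and y label exactly the same documents."""
    return contains(index, x, y) and contains(index, y, x)


doc_labels = {
    "d": {"x1", "x2", "x5"},
    "d1": {"x1", "x3"},
    "d2": {"x2", "x5"},
    "d3": {"x1", "x4"},
    "d4": {"x1"},
    "d5": {"x3"},
}
index = documents_by_label(doc_labels)
print(equivalent(index, "x2", "x5"))  # x2 and x5 label the same documents
print(contains(index, "x1", "x4"))    # every x4 document also carries x1
```

On this data, `x2` and `x5` label the same two documents, and the only document labeled `x4` is also labeled `x1`, whereas `x1` does not contain `x3` because `d5` carries `x3` without `x1`.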

In our example, the label $x_2$ is equivalent to the label $x_5$, which implies the deletion of the directories $R_5$, $R_{51}$, $R_{52}$, $R_{512}$, $R_{521}$, $R_{125}$, $R_{15}$, $R_{152}$, $R_{215}$, $R_{25}$ and $R_{251}$ (deletion of the label $x_5$ during construction of the tree).

Furthermore, if $x$ contains $x'$ without $x'$ containing $x$, then the usefulness of the label $x'$ lies in distinguishing from one another certain elements labeled $x$. Thus, in the tree creation process, the label $x'$ causes the construction of a new subdirectory only in the part of the tree located under the directories derived from the label $x$.

In our example, the directories $R_4$ and $R_{41}$ must be deleted, since the label $x_1$ contains the label $x_4$. On the other hand, the directory $R_{14}$ derived from $x_4$ is retained, since it is located under the directory $R_1$ derived from $x_1$.

Finally, the structure of the tree organization that takes label inclusion into account is given in FIG. 10.

5.5.2.2 Limitation of the number of labels

In order to simplify the structure, it may be useful to limit the number of labels automatically assigned to a document by the system during the first execution phase. For example, this number can be limited to the maximum number of labels assigned to an already labeled document.

Let us return to our example, assuming that the maximum number of labels initially assigned to a document is 2. We have seen that 3 labels are automatically assigned to the document $d$ (namely $x_1$, $x_2$ and $x_5$). In this example, one of the labels assigned to the document $d$ should therefore be removed.

The limitation can be implemented by means of the alpha thresholding.
Starting from the matrix $V$, the alpha threshold is modified in small steps (for example 0.001) as long as the average number of predicted positive labels per document in the resulting matrix $W$ is greater than a number set by the user; with the rule $V_{i,j} >$ alpha, raising the threshold is what lowers the number of positive labels. FIG. 11 shows the tree obtained after limiting the number of labels (the label $x_1$ is the one that was removed from the label set associated with the document $d$).

5.6 Modifications of the tree structure

We have seen that the data are organized in two successive phases: the first phase labels the unlabeled documents and the second builds the tree. The positions occupied in the tree by the documents that were not initially labeled depend on how the process labeled them. The user may then wish to correct one or more document labels. In this case, the system takes these corrections into account (deletion and/or addition of labels on documents chosen by the user) according to one of the following processes:

1) Immediate modification of the tree. The current tree is completely deleted. The labels automatically assigned to documents not labeled by the user are deleted. The process is run again (automatic labeling of the documents, then construction of the tree), including the new labels.

2) Local update with deferred readjustment of the tree. Each document whose associated label set has been modified is removed from all the subdirectories in which it appears. It is then reclassified on its own, starting from the top of the hierarchy, by assigning it to each subdirectory for which it has the corresponding label. For each subdirectory to which the document has been assigned, the procedure is repeated recursively in its subdirectories, if any. After a certain number of modifications, the desired process is carried out. This number of modifications depends on the power of the machine running the organization method.
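The local update of process 2) can be sketched as follows. The tree representation (a dictionary with a set of documents and a dictionary of subdirectories keyed by label) and the function names are hypothetical. In line with the deferred readjustment described above, the sketch only uses subdirectories that already exist and never creates new ones.

```python
def make_node(docs=(), **subdirs):
    """Directory node: a set of documents and subdirectories keyed by label."""
    return {"docs": set(docs), "subdirs": dict(subdirs)}


def remove_document(node, doc):
    """Remove doc from this directory and from every subdirectory."""
    node["docs"].discard(doc)
    for child in node["subdirs"].values():
        remove_document(child, doc)


def place_document(node, doc, labels):
    """Place doc in node, then in each existing subdirectory whose label it
    carries, dropping that label on the way down."""
    node["docs"].add(doc)
    for x, child in node["subdirs"].items():
        if x in labels:
            place_document(child, doc, labels - {x})


def reclassify(root, doc, new_labels):
    """Local update: take doc out of the tree, then reinsert it from the top."""
    remove_document(root, doc)
    place_document(root, doc, set(new_labels))


# Hypothetical small tree: root -> x1 -> x3, and root -> x2.
root = make_node(
    {"d", "d1"},
    x1=make_node({"d1"}, x3=make_node({"d1"})),
    x2=make_node({"d"}),
)
# The label set of d is changed from {x2} to {x1, x3}.
reclassify(root, "d", {"x1", "x3"})
```

After the call, `d` has left the `x2` directory and appears under `x1` and under `x1`/`x3`; the rest of the tree is untouched until a full rebuild is triggered.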
5.6.1 Moving a document

When the system is implemented with a graphical interface, the following mechanism can be integrated: to modify the labels assigned to a document, the user moves a representative (icon) of this document to a subdirectory containing documents that have the desired labels (if such a subdirectory exists).

In our example, we have seen that the labels $x_2$ and $x_5$ are assigned to the document $d$ automatically. If the user selects the document $d$ in the directory $R_2$ and moves it to the directory $R_1$ (see FIG. 12), then the label $x_1$ is automatically assigned to it (with the value +1) in place of the labels $x_2$ and $x_5$ (which take the value -1) (see FIG. 13).

Note that moving a document that was not initially labeled provides the process with a new labeled document as input. The automatic labeling of other documents may therefore change (through the modification of the neighborhood spheres). This change is desirable, since it generalizes a correction made by the user to documents other than the one explicitly moved.

5.6.2 Deleting part of the tree

The user may consider one or more subdirectories unnecessary. In this case, they can be deleted as follows: the user selects a subdirectory to delete. This subdirectory was created from the labels $\{x_{i_1}, \ldots, x_{i_q}\}$ (that is, the labels that gave rise to the selected directory and to the directories on the path from the root to the selected directory). The system then automatically deletes the whole tree, removes the labels $\{x_{i_1}, \ldots, x_{i_q}\}$ from all the label sets associated with the documents, and runs a new automatic construction of the tree that takes the new labeling of the documents into account.

5.6.3 Modification of labels

At any time, the user can modify (add, delete, replace) one or more labels of one or more documents.
The system takes this modification into account as follows:
• Addition of a new label y: the label is added to the list of labels L and to the matrices U, V and W.
• Addition of an existing label y to a document d that did not have this label: $U_{d,y} = +1$, and the label is associated with the document in question.
• Deletion of a label y from the set of labels associated with a document d (which previously had this label, either by an assignment of the user or by a prediction of the system): $U_{d,y} = -1$, and the label is negatively associated with the document in question.

5.7 Example of an HMI of a system

Figure 14 shows an example of an HMI of a system implementing a particular embodiment of the organizing method according to the invention. The following three zones can be distinguished:
• a zone 141 containing the tree;
• a zone 142 containing a set of possible labels; and
• a zone 143 containing representatives (icons) of the documents contained in the selected directory or sub-directory.
In this example, the sub-directory CLOUDS ("nuages" in French) is selected (for this reason it appears in gray in the tree of zone 141).

5.8 Device implementing the organizing method

Figure 15 shows the simplified structure of an organizing device according to one particular embodiment of the invention, which comprises a RAM 153 and a processing unit 151, equipped for example with a microprocessor and driven by a computer program 152 implementing the organizing method according to the invention (for example the particular embodiment described above in connection with Figures 1 to 14). At initialization, the code instructions of the computer program 152 are, for example, loaded into the RAM before being executed by the processor of the processing unit 151. The processing unit 151 receives electronic documents as input 150 and allows the user, via an HMI, to label some of them.
The processing unit 151 processes all the documents (some labeled by the user, others not) according to the instructions of the program 152, in order to obtain a tree organization 154 (also called a tree structure or classification tree). The processing unit 151 outputs this tree organization 154 and allows the user, via an HMI, to modify it. The device according to the invention can be integrated into multimedia electronic equipment comprising means for storing multimedia documents (images, music files, written documents, etc.). This equipment is, for example, a music file player with a graphical interface, a pocket computer or electronic organizer, or a mobile phone with or without a camera. The method of organizing electronic documents according to the invention can thus be implemented on this type of equipment to classify the stored multimedia content.
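As an illustration of the label-modification rules of section 5.6.3, the following sketch updates the matrix U and recomputes V and W with the rules stated in the claims ($V_{ij} = U_{ij}$ when the user has given an opinion, $P(i, j)$ otherwise; $W_{ij} = 1$ when $V_{ij}$ exceeds alpha). It is only a sketch under assumptions: the nested-dictionary representation, the function names and the constant stand-in used for the prediction function P are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Matrix = Dict[str, Dict[str, float]]  # document -> label -> value


def add_new_label(U: Matrix, labels: List[str], y: str) -> None:
    """New label y: extend the label list and give every document a 0 entry
    (0 means the user has expressed no opinion)."""
    if y not in labels:
        labels.append(y)
        for row in U.values():
            row[y] = 0.0


def assign_label(U: Matrix, d: str, y: str) -> None:
    """The user associates label y with document d: U[d][y] = +1."""
    U[d][y] = 1.0


def remove_label(U: Matrix, d: str, y: str) -> None:
    """The user removes label y from document d: U[d][y] = -1."""
    U[d][y] = -1.0


def predict_and_threshold(
    U: Matrix, P: Callable[[str, str], float], alpha: float
) -> Tuple[Matrix, Dict[str, Dict[str, int]]]:
    """V keeps the user's decisions and fills the gaps with P; W thresholds V."""
    V = {d: {y: (u if u != 0 else P(d, y)) for y, u in row.items()}
         for d, row in U.items()}
    W = {d: {y: 1 if v > alpha else 0 for y, v in row.items()}
         for d, row in V.items()}
    return V, W


# Illustrative data: document d has no user opinion on x2 and x5.
labels = ["x2", "x5"]
U = {"d": {"x2": 0.0, "x5": 0.0}}

add_new_label(U, labels, "x1")   # new label
assign_label(U, "d", "x1")       # user adds x1 to d
remove_label(U, "d", "x2")       # user removes x2 from d


def constant_prediction(d: str, y: str) -> float:
    """Hypothetical stand-in for P, always returning 0.6."""
    return 0.6


V, W = predict_and_threshold(U, constant_prediction, alpha=0.5)
```

With these illustrative values, x1 is kept by the user's decision, x2 is rejected by the user's decision, and x5 is kept only because the stand-in prediction exceeds the threshold.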

Claims

1. A method for organizing electronic documents from a set of documents comprising labeled documents, a labeled document being associated with a set of labels, characterized in that at least one labeled document is associated with a set of labels comprising at least two labels, and in that the method comprises the following steps:
a) labeling (P2) of the non-labeled documents of the set of documents, by prediction based on the labeled documents and on descriptive data of the documents;
from a current directory associated with a set of labeled documents:
b) creation (P3; 63) of one sub-directory per distinct label among the set of labels associated with the documents of the current directory;
for each sub-directory created:
c) association (P3; 65) with said sub-directory of the documents of the current directory whose label set includes at least one label corresponding to said created sub-directory;
d) modification (P3; 66) of the label sets of the documents of the new sub-directory by subtracting the label corresponding to the sub-directory;
for each sub-directory created, which becomes the current directory:
e) iteration of steps b), c) and d) until sub-directories are obtained that are associated only with documents whose set of labels is empty.

2. Method according to claim 1, characterized in that associating a given document with a given sub-directory means that said given document is physically moved from an initial storage unit to said given sub-directory.

3. Method according to claim 1, characterized in that associating a given document with a given sub-directory means that said given document remains in an initial storage unit and that a unique identifier of said given document is placed in said given sub-directory.

4. Method according to any one of claims 1 to 3, characterized in that step b) is preceded by the following step: - if the number of different labels associated with documents in the current directory is greater than a predetermined threshold S, then the labels are sorted in descending order of frequency and the first S labels are selected, only the selected labels being taken into account to perform the following steps.

5. Method according to any one of claims 1 to 4, characterized in that, if there are first and second labels such that:
- the following condition is satisfied: for all the labeled electronic documents whose label set comprises said second label, said label set also comprises said first label, and
- the following condition is satisfied: for all the labeled electronic documents whose label set comprises said first label, said label set also comprises said second label,
then one of said first and second labels is deleted only during said steps b) to e), each labeled electronic document recovering all its labels at the end of said steps b) to e).

6. Method according to any one of claims 1 to 5, characterized in that, if there are first and second labels such that:
- the following condition is satisfied: for all the labeled electronic documents whose label set comprises said second label, said label set also comprises said first label, and
- the following condition is not satisfied: for all the labeled electronic documents whose label set comprises said first label, said label set also comprises said second label,
then, during said steps b) to e), said second label causes the creation of a sub-directory only under the sub-directory created from the first label.

7. Method according to any one of claims 1 to 6, characterized in that said labeling step comprises a step of generating a matrix V as follows:
- $V_{ij} = U_{ij}$ if $U_{ij} \neq 0$
- $V_{ij} = P(i, j)$ if $U_{ij} = 0$
with i: an electronic document among the set of electronic documents, j: a label among a set of possible labels, P: a prediction function based on descriptive data of the electronic documents, and U: a matrix of electronic documents and labels assigned by a user, such that $U_{ij} \neq 0$ means that the user has decided whether or not to associate the label j with the electronic document i, and $U_{ij} = 0$ means that the user has given no opinion as to whether the label j must be associated with the electronic document i.

8. Method according to claim 7, characterized in that:
- $U_{ij} = 1$ if the user has associated the label j with the electronic document i;
- $U_{ij} = -1$ if the user has decided not to associate the label j with the electronic document i;
- $P(i, j)$ returns a value between -1 and +1: the more negative the value of $P(i, j)$, the less likely it is that the electronic document i is associated with the label j, and conversely the more positive the value of $P(i, j)$, the more likely it is that the electronic document i is associated with the label j.

9. Method according to claim 8, characterized in that said labeling step comprises a step of thresholding said matrix V into a matrix W as follows: if $V_{ij} > \alpha$ then $W_{ij} = 1$, otherwise $W_{ij} = 0$, where $\alpha$ (alpha) is a determined threshold between 0 and 1, and in that only the non-zero elements $W_{ij}$ are used in said steps b) to e).

10. Method according to any one of claims 1 to 9, characterized in that it comprises a step of limiting the number of labels automatically allocated to an electronic document.

11. Method according to any one of claims 1 to 10, characterized in that said steps b) to e) are followed by the following steps:
- modification by a user, via a human/machine interface, of at least one set of labels among the sets of labels associated with the labeled electronic documents;
- deletion of a tree organization resulting from the execution of said steps b) to e);
- new execution of said steps b) to e), taking into account said at least one set of labels modified by the user.

12. Method according to any one of claims 1 to 10, characterized in that said steps b) to e) are followed by the following steps:
- modification by a user, via a human/machine interface, of at least one set of labels among the sets of labels associated with the labeled electronic documents;
- for each labeled electronic document whose associated set of labels has been modified by the user, called the modified document:
* said modified document is removed from each directory or sub-directory with which it is associated; then
* said modified document is reclassified starting from a root directory: said modified document is associated with each sub-directory whose corresponding label is included in the set of labels associated with said modified document, then, for each sub-directory with which the modified document has been associated, the association is repeated recursively in the lower-level sub-directories, if they exist.

13. Method according to any one of claims 11 and 12, characterized in that said step of modification by a user of at least one set of labels is performed by:
- a move/paste, performed by the user via a graphical interface, of a representative of the document to be modified to a target sub-directory associated with documents whose label sets comprise one or more desired labels; and
- an automatic assignment, to said document to be modified, of the label corresponding to said target sub-directory, replacing the label or labels previously assigned to said document to be modified.

14. Method according to any one of claims 11 and 12, characterized in that said step of modification by a user of at least one set of labels is performed by:
- a selection, made by the user via the human/machine interface, of at least one sub-directory to be deleted;
- for each sub-directory to be deleted, an automatic deletion, from the label sets of the labeled electronic documents, of the label corresponding to the sub-directory to be deleted, as well as of the label corresponding to each sub-directory located between the sub-directory to be deleted and a root directory.

15. Computer program product downloadable from a communication network and/or recorded on a computer-readable medium and/or executable by a processor, characterized in that it comprises program code instructions for executing the steps of the method according to at least one of claims 1 to 14, when said program is run on a computer.

16. Device for organizing electronic documents from a set of documents comprising labeled documents, a labeled document being associated with a set of labels, characterized in that at least one labeled document is associated with a set of labels comprising at least two labels, and in that the device comprises:
a) means for labeling the non-labeled documents of the set of documents, by prediction based on the labeled documents and on descriptive data of the documents;
b) means, applied to a current directory associated with a set of labeled documents, for creating one sub-directory per distinct label among the set of labels associated with the documents of the current directory;
c) means, applied to each sub-directory created, for associating with it the documents of the current directory whose label set includes at least one label corresponding to said created sub-directory;
d) means for modifying the label sets of the documents of the new sub-directory by subtracting the label corresponding to the sub-directory;
e) the means b), c) and d) being applied iteratively for each sub-directory created, which becomes the current directory, until sub-directories are obtained that are associated only with documents whose set of labels is empty.

17. Multimedia electronic equipment comprising means for storing multimedia documents, characterized in that it comprises means for implementing the method of organizing documents according to any one of claims 1 to 14.
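
By way of illustration only, steps b) to e) of claim 1 can be sketched as a recursive function. This is a minimal sketch under assumptions, not the claimed implementation: the function name `build_tree` and the nested-dictionary representation of the tree are hypothetical, and the labeling step a) is assumed to have already produced one label set per document.

```python
from typing import Dict, Set


def build_tree(docs: Dict[str, Set[str]]) -> dict:
    """Build the tree for a current directory.

    `docs` maps each document of the current directory to its remaining
    label set. Returns a node with its documents and one child per label.
    """
    node = {"documents": sorted(docs), "children": {}}
    # b) one sub-directory per distinct label found in the current directory
    distinct_labels = sorted(set().union(*docs.values()))
    for label in distinct_labels:
        # c) keep the documents carrying this label,
        # d) and subtract the label from their label sets
        sub_docs = {d: ls - {label} for d, ls in docs.items() if label in ls}
        # e) iterate with the sub-directory as the new current directory;
        # the recursion stops when every remaining label set is empty
        node["children"][label] = build_tree(sub_docs)
    return node


# Illustrative data: document a carries x1, document d carries x2 and x5.
tree = build_tree({"a": {"x1"}, "d": {"x2", "x5"}})
```

In this example the document d is reachable through both x2/x5 and x5/x2, since a sub-directory is created for each label at each level.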