
HK1176722B - Ranking based on facial image analysis - Google Patents


Info

Publication number: HK1176722B
Application number: HK13103280.8A
Authority: HK (Hong Kong)
Prior art keywords: images, image, user, metadata, block
Other languages: Chinese (zh)
Other versions: HK1176722A1 (en)
Inventors: E. Krupka, I. Abramovski, I. Kviatkovsky
Original assignee: Microsoft Technology Licensing, LLC
Priority claimed from US 12/784,498 (US9465993B2)

Description

Ranking based on facial image analysis
Background
A collection of images may reflect what is important to the collection's creator. For example, a user's personal image collection may be accumulated over an extended period of time and may reflect important elements of that person's life, such as important people. Many people have image collections that include a wide variety of images, for example, from snapshots taken on a mobile phone to composed photographs taken with a digital camera during a vacation.
Disclosure of Invention
A user's collection of images may be analyzed to identify faces of people within the images, and then clusters of similar faces may be created, where each cluster may represent a person. Clusters may be ranked in order of size to determine the relative importance of the associated person to the user. This ranking may be useful in many social applications to filter and present content that may be of interest to the user. In one usage scenario, clustering may be used to identify images from a second user's image collection that may be relevant or interesting to the first user. The ranking may also be based on user interaction with the images, as well as other inputs unrelated to the images. The rankings may be incrementally updated as new images are added to the user's collection.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Brief Description of Drawings
In the drawings:
FIG. 1 is a diagram illustration of an embodiment showing a system with a social network and an image matching system.
FIG. 2 is a diagram illustration of an example embodiment showing an example image.
FIG. 3 is a flow diagram illustration of an embodiment showing a method for determining a ranking of people from an image.
FIG. 4 is a flowchart illustration of an embodiment showing a method for finding a matching image based on face analysis.
FIG. 5 is a flowchart illustration of an embodiment showing a method for pre-processing of face analysis.
FIG. 6 is a flowchart illustration of an embodiment showing a method for setting a threshold with a training set.
FIG. 7 is a flowchart illustration of an embodiment showing a method for event matching.
FIG. 8 is a flow diagram illustration of an embodiment showing a method for finding an image of a friend using event matching.
FIG. 9 is a flowchart illustration of an embodiment showing a method for using event matching to find an image of an event attended by a user.
FIG. 10 is a diagram illustration of an example embodiment showing a user interface with output of event matches.
FIG. 11 is a flowchart illustration of an embodiment showing a method for creating clusters.
FIG. 12 is a flowchart illustration of an embodiment showing a method for matching images using clustering.
Detailed Description
Facial image analysis and comparison across a user's image collection may be used to rank preferences or priorities related to the user's friends or family. A user's image collection may reflect the user's interest in, or emotional attachment to, the people pictured in it. The number of images of a particular person may be used as a proxy for the importance of that person to the user.
Facial image analysis may be performed on the set of images to identify faces within the images and create face objects, which may be stored as metadata for the images. Similar face objects may be grouped together into face clusters. The size of a cluster may be used as a measure of the importance of the person associated with that cluster.
The ranking determined from the facial image analysis may be used to present relevant information to the user. For example, news sources or other information related to different users may be prioritized and presented to the user with more relevant information in more prominent locations in the user interface.
Throughout the specification and claims, reference to the term "image" may include still images, such as photographs or digital still images, as well as video images or motion picture images. The concepts discussed for processing images may be applicable to either still or moving images, and in some embodiments both still and moving images may be used.
Throughout this specification, like reference numerals refer to like elements throughout the description of the figures.
When elements are referred to as being "connected" or "coupled," the elements can be directly connected or coupled together or one or more intervening elements may also be present. In contrast, when elements are referred to as being "directly connected" or "directly coupled," there are no intervening elements present.
The present subject matter may be embodied as devices, systems, methods, and/or computer program products. Accordingly, some or all of the inventive subject matter may be embodied in hardware and/or in software (including firmware, resident software, micro-code, state machines, gate arrays, etc.). Furthermore, the present subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an instruction execution system. Note that the computer-usable or computer-readable medium could be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other suitable medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" may be defined as a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
When the subject matter is embodied in the general context of computer-executable instructions, the embodiment may comprise program modules, executed by one or more systems, computers, or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
FIG. 1 is a diagram of an embodiment 100, showing client and server components for a social network. Embodiment 100 is a simplified example of a network environment that may include a client device and a social networking service accessed over a network.
The diagram of FIG. 1 shows the functional components of the system. In some cases, a component may be a hardware component, a software component, or a combination of hardware and software. Some components may be application level software, while other components may be operating system level components. In some cases, the connection of one component to another component may be a tight connection, where two or more components operate on a single hardware platform. In other cases, the connection may be made over a network connection that spans a long distance. Each embodiment may use different hardware, software, and interconnection architectures to achieve the described functionality.
Embodiment 100 illustrates one example of a social network in which a user may have a collection of images. The social network may be a web application in which individual users may establish accounts in the social network and may manage a collection of images within the social network. A service operating within the social network infrastructure may analyze and compare the image sets.
The social network of embodiment 100 may be any type of social network in which there may be explicit or implicit relationships between users. In some social networks, relationships may be expressed by one user formally establishing a relationship with another user. Some social networks may establish a one-way relationship through this relationship declaration, while other social networks may establish a relationship when both users agree to the relationship.
Some social networks may have informal relationships between users. For example, an informal relationship may be established when two users exchange email messages, or when users communicate using another mechanism. For example, a social network may be established for users communicating in a chat room, instant messaging service, or other mechanism. In some cases, a contact list of a person in an email system or mobile phone may be used as an implicit relationship for purposes of establishing social network relationships.
In some social networks, a user may determine how images within their image collection may be shared. In some cases, the user may select images that may be shared to friends for whom a relationship exists. In other cases, the user may grant permission to any user with whom the image is shared.
The social network may be a formal social network in which each user may create an account to access the social network. In many such embodiments, the user may access the social network through a web browser, and the social network may be a web application. In many such embodiments, a user may upload images within a social network environment to create a collection of images.
In a less formal version of the social network, the user may store and manage the image collection on a personal computer or in a repository that is personally controlled or managed by the user. In such a social network, a user may identify various storage locations from which images may be shared with others. In some such social networks, social network relationships may be maintained using an infrastructure that may simply be an address exchange, forum, or other mechanism by which members may connect to one another.
The client device 102 may have a set of hardware components 104 and software components 106. Client device 102 may represent any type of device that may communicate with social network service 136.
The hardware component 104 may represent a typical architecture of a computing device, such as a desktop or server computer. In some embodiments, the client device 102 may be a personal computer, a gaming console, a network device, an interactive self-service terminal (kiosk), or other device. Client device 102 may also be a portable device such as a laptop computer, netbook computer, personal digital assistant, mobile phone, or other mobile device.
The hardware components 104 may include a processor 108, random access memory 110, and non-volatile storage 112. The hardware components 104 may also include one or more network interfaces 114 and user interface devices 116. In many cases, the client device 102 may include a camera 118 or scanner 120 that may capture images that may become part of the user's image collection.
The software components 106 may include an operating system 122 on which various applications, such as a web browser 124, may execute. In many social networking applications, web browser 124 may be used to communicate with social networking service 136 to access the social networking application. In other embodiments, specialized client applications may communicate with the social networking service to provide a user interface. In some such embodiments, such a client application may perform many of the functions described for the social network service 136.
The client device 102 may have a local image library 126 that may include images collected from many different sources, such as the camera 118, the scanner 120, or other devices that may have image capture capabilities. The local image repository 126 may include images stored on other devices, such as on servers within a local area network or within a cloud storage service.
The client device 102 may have several applications that may allow a user to view and manage the local image library 126. Examples of such applications may be an image editor 130 and an image browser 132. In some cases, a client device may have several such applications.
The local image library 126 may include still images and video images. In some embodiments, the still images and video images may be stored in different libraries and may be accessed, edited, and manipulated with different applications.
In some embodiments, the client device 102 may have an image preprocessor 128. The image preprocessor may analyze the image content and various metadata associated with the image prior to associating the image with the social network. Preprocessing may perform facial image analysis, background analysis, color histograms, or other analysis on images available to the client. In other embodiments, some or all of the functions performed by the image preprocessor 128 may be performed by the social networking service 136. When the image preprocessor 128 is located on the client device 102, such operations may be offloaded from the server device.
Client device 102 may connect to social networking service 136 through network 134. In some embodiments, network 134 may be a wide area network such as the Internet. In some embodiments, network 134 may include a local area network that may be connected to a wide area network through a gateway or other device.
In some embodiments, the client device 102 may be connected to the network 134 through a hardwired connection, such as an Ethernet connection, for example. In other embodiments, the client device 102 may connect to the network 134 through a wireless connection, such as a cellular telephone connection or other wireless connection.
Various users of the social network may be connected using various client devices 138.
The social networking service 136 may operate on a hardware platform 140. The hardware platform 140 may be a single server device having a hardware platform similar to the hardware components 104 of the client device 102. In certain embodiments, the hardware platform 140 may be a virtualized or cloud-based hardware platform operating on two or more hardware devices. In some embodiments, the hardware platform may be a large data center in which thousands of computer hardware platforms may be used.
In some embodiments, social networking service 136 may operate within operating system 142. In embodiments with a cloud-based execution environment, the concept of a separate operating system 142 may not exist.
Social network 144 may include a plurality of user accounts 146. Each user account 146 may include metadata 148 related to the account, as well as relationships 150 that may be established between two or more users.
User account metadata 148 may include information about the user, such as the user's name, home address, location, and the user's likes and dislikes, education, and other relevant information. Some social networks may have emphasis on work related information, which may include items like work history, professional associations, or other work related information. Other social networks may emphasize friends and family relationships, where personal items may be emphasized. In some social networks, a very large amount of personal metadata 148 may be included, while other social networks may have a very small amount of personal metadata 148.
The relationship 150 may associate one user account to another. In some embodiments, the relationship may be a one-way relationship, where the first user may share information with the second user but the second user may not be able to reply and may not share information or share a limited amount of information with the first user. In other embodiments, the relationship may be a two-way relationship in which each user agrees to share information with each other.
In still other embodiments, a user may allow some or all of their information to be shared to anyone, including people who are not members of the social network. Some such embodiments may allow a user to identify a subset of information that may be shared to anyone, as well as a subset that may be shared with other members of a social network. Some embodiments may allow a user to define a subset that is shared with different groups of social network members.
Each user account 146 may include one or more image collections 152. The image collection 152 may include images 154. Each image 154 may include metadata 156, which may be general metadata such as a timestamp, location information, image size, title, and various tags. The tags may include identifiers for different social network members with which the image is to be related.
In some embodiments, the image metadata 156 may contain metadata derived from the image content. For example, face analysis may be performed to identify any faces within an image and create a face representation or face vector. The face representation may be used, for example, for comparison with other images. Other image content that may be used to derive metadata may include texture analysis of background regions or personal apparel, color histograms of entire images or portions of images, or other analysis.
The image metadata 156 may be used to create clusters 158. The clusters 158 may be images or groupings of elements from images. For example, the face representation may be analyzed to identify clusters that may contain similar face representations. Similarly, clusters may be created by grouping image analysis results from background regions of an image.
In certain embodiments, the clusters 158 may be created by grouping images based on metadata. For example, several images taken over a certain period of time may be grouped together as a cluster, or images tagged with the same tagging parameters may form a cluster. Examples of using clustering may be found in embodiments 1100 and 1200 set forth later in this specification.
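The metadata-based grouping described above can be sketched as a time-window pass over timestamps. This is a minimal illustration under assumptions not in the source: the dictionary image structure and the two-hour `max_gap` value are invented for the example.

```python
from datetime import datetime, timedelta

def cluster_by_time(images, max_gap=timedelta(hours=2)):
    """Group images whose timestamps fall within max_gap of the
    previous image into the same cluster (a rough "event")."""
    ordered = sorted(images, key=lambda img: img["timestamp"])
    clusters = []
    for img in ordered:
        if clusters and img["timestamp"] - clusters[-1][-1]["timestamp"] <= max_gap:
            clusters[-1].append(img)
        else:
            clusters.append([img])
    return clusters

photos = [
    {"name": "a.jpg", "timestamp": datetime(2010, 5, 20, 14, 0)},
    {"name": "b.jpg", "timestamp": datetime(2010, 5, 20, 14, 30)},
    {"name": "c.jpg", "timestamp": datetime(2010, 5, 21, 9, 0)},
]
groups = cluster_by_time(photos)
print([len(g) for g in groups])  # → [2, 1]
```

The same shape of loop works for grouping by any shared tag instead of a time window.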
In some embodiments, social network service 136 may include an image preprocessor 160 that may analyze images to derive image metadata. The image preprocessor 160 may be used in situations where the client device 102 may not have the image preprocessor 128 or when image preprocessing is not performed prior to analysis. An example of the preprocessing step can be shown in embodiment 500 presented later in this specification.
The comparison engine 162 may compare two or more images using image analysis techniques or metadata analysis to determine the clusters 158. Examples of the operation of the comparison engine 162 may be found in portions of the embodiment 400 set forth later in this specification.
The ranking engine 164 may compare the various clusters 158 to extract information, such as a ranking or importance to the image or information attached to the image. An example of the operation of the ranking engine 164 can be found in embodiment 300, set forth later in this specification.
The analysis engine 166 may analyze and compare the image sets to identify matches between the image sets. The analysis engine 166 may use metadata analysis and image content analysis to identify matches.
In many embodiments, the social network service 136 may operate with a web service 168, and the web service 168 may communicate with a browser or other application operating on a client device. Web services 168 may receive requests in the form of hypertext transfer protocol (HTTP) and respond with web pages or other HTTP compliant responses. In some embodiments, web service 168 may have an Application Programming Interface (API) through which applications on client devices may interact with the social networking service.
FIG. 2 is a diagram of an example embodiment 200 showing two images that may be analyzed by image analysis. The embodiment 200 shows two images 202, 204 showing a birthday party and a sailboat trip, respectively. These images may represent example images that may be found in the user's image collection.
The image 202 may represent a birthday party with two people. From the image 202, two faces 206 and 208 may be identified. The faces 206 and 208 may be identified using a number of different face recognition mechanisms or algorithms.
Once identified, the faces 206 and 208 may be processed to create a representation of the face. The representation may be a face vector or other representation that may allow different faces to be numerically compared to each other.
In certain embodiments, additional image analysis may be performed. For example, apparel regions 210 and 212 may be identified by determining geometric relationships from faces 206 and 208, respectively, and capturing portions of the image that may be relevant to apparel worn by the corresponding person.
Image analysis of apparel may be used to compare two images to determine whether the images were taken at the same event. This conclusion may be drawn when two images contain similar faces and these images additionally contain similar dress textures or color histograms. This analysis may assume that the images represent the same event because the people in the images are wearing the same clothing.
Additionally, the background region 214 may be analyzed for texture analysis, color histogram, or other analysis. These results may be compared to other images to determine similarities and matches between the images.
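One hedged sketch of comparing apparel or background regions is a coarse color histogram scored by histogram intersection. The bin count, pixel-tuple input format, and similarity measure are illustrative assumptions; a real system might also use texture features.

```python
def color_histogram(pixels, bins=4):
    """Coarse normalized RGB histogram over bins**3 buckets."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1
    total = float(len(pixels))
    return [h / total for h in hist]

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two regions dominated by the same shirt color score high;
# a differently colored region scores zero overlap.
red_shirt = [(220, 30, 30)] * 90 + [(240, 240, 240)] * 10
same_shirt = [(210, 40, 35)] * 95 + [(250, 250, 250)] * 5
blue_shirt = [(30, 30, 220)] * 100

sim_same = histogram_similarity(color_histogram(red_shirt), color_histogram(same_shirt))
sim_diff = histogram_similarity(color_histogram(red_shirt), color_histogram(blue_shirt))
print(sim_same > 0.9, sim_diff)  # → True 0.0
```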
In image 204, faces 216 and 218 may be identified and captured. Because faces 216 and 218 may be relatively small, clothing-region analysis may not be performed for the people in image 204, but the background region 220 may be identified and analyzed.
FIG. 3 is a flow diagram illustration of an embodiment 300 showing a method for determining a ranking of a person from a collection of images. Embodiment 300 is an example of a method that may be performed by a comparison engine and a ranking engine, such as comparison engine 162 and ranking engine 164 of embodiment 100.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
Embodiment 300 may operate on the premise that the number of occurrences of a person's face in a user's image collection can serve as an approximation of the user's interest in that person, or of that person's importance to the user.
Faces within an image may be analyzed, compared, and grouped together into clusters. Based on the size of the cluster, the individuals associated with the cluster may be ranked.
At block 302, a set of images may be received. The set of images may be preprocessed to identify faces and face representations. An example of this preprocessing method can be shown in embodiment 500 presented later in this specification.
In block 304, each image may be processed. For each image in block 304, if no face is present in block 306, the process may return to block 304 to process the next image. If one or more faces appear in the image in block 306, each face may be processed separately in block 308. For each face in block 308, the face object and an associated image reference may be added to a list in block 310. The image reference may be a pointer or other indicator identifying the image from which the face was taken.
After all the images in block 304 have been processed, the resulting list may be sorted in block 312.
At block 314, the list may be analyzed to identify clusters based on a threshold. A cluster may define a set of face representations associated with a single person.
One mechanism to determine clustering may be to consider the face representation as a vector. The similarity between any two vectors can be considered as a distance in vector space. When multiple face representations reflect many different images of the same person, then the face representation vector may create a vector cluster.
In many embodiments, a threshold may be used as part of the mechanism to determine whether a given face representation is "close" to another face representation in order to be a match. The threshold may be determined in a number of different ways, and one such way may be shown in embodiment 600.
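The cluster-and-threshold mechanism can be sketched as a greedy pass over face vectors. The Euclidean metric, centroid linkage, and threshold value below are illustrative assumptions, not the claimed method:

```python
import math

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(vectors):
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]

def cluster_faces(face_vectors, threshold):
    """Assign each face vector to the nearest existing cluster if its
    distance to the cluster centroid is within the threshold;
    otherwise start a new cluster (a new person)."""
    clusters = []
    for vec in face_vectors:
        best, best_d = None, threshold
        for c in clusters:
            d = distance(vec, centroid(c))
            if d <= best_d:
                best, best_d = c, d
        if best is not None:
            best.append(vec)
        else:
            clusters.append([vec])
    return clusters

faces = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
clusters = cluster_faces(faces, threshold=1.0)
print([len(c) for c in clusters])  # → [2, 2]
```

A smaller threshold splits people apart; a larger one risks merging different people into one cluster, which is why the threshold is tuned as in embodiment 600.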
In block 316, each cluster may be analyzed. For each cluster in block 316, if any member of the cluster does not have a tag or other associated metadata in block 318, the process may return to block 316 to process another cluster.
If one or more members of the cluster in block 318 contain tags or other metadata, these tags may be applied to other cluster members in block 320. In some cases, the user may be presented with a user interface device at block 322, where the user may or may not approve the tag. If the user approves the tag in block 324, the tag may be applied to all members of the cluster in block 326. If the user does not approve the tag in block 324, the tag is not applied to the members in block 328.
In many social networking applications, a user may tag an image with, for example, an identifier of a particular person. The process of blocks 316 through 328 may represent a method by which such tags may be automatically applied to other images. In some embodiments, the label applied to a member of a cluster may be a label associated with a person that the cluster may represent. A simple example may be a tag defining the name of the person.
The clusters may be analyzed in block 330 to rank the clusters according to size. The ranking may reflect the relative importance of the person to the user. Cluster ranking may be used in block 332 to prioritize people in various applications.
For example, news sources may include messages, status updates, or other information related to people in the user's social network. Those items related to important persons may be highlighted or presented in a manner that captures the user's attention. Other items about people who do not appear frequently in the user's image collection may be presented in a secondary or non-emphasized manner.
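The ranking of block 330 reduces to sorting clusters by size. The dictionary structure used for a cluster here is an assumption made for illustration:

```python
def rank_people(clusters):
    """Order clusters (one per person) by descending size; cluster
    size approximates the person's importance to the user."""
    return sorted(clusters, key=lambda c: len(c["faces"]), reverse=True)

clusters = [
    {"person": "alice", "faces": ["f1", "f2", "f3"]},
    {"person": "bob", "faces": ["f4"]},
    {"person": "carol", "faces": ["f5", "f6"]},
]
print([c["person"] for c in rank_people(clusters)])  # → ['alice', 'carol', 'bob']
```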
FIG. 4 is a flowchart illustration of an embodiment 400 showing a method for finding a matching image based on face analysis. Embodiment 400 is one example of a method that may be performed by a comparison engine, such as analysis engine 166 of embodiment 100.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
Embodiment 400 illustrates an example of a method that may compare images from a second set of images to a first set of images to identify images in the second set of images that contain the same person as the first set of images.
At block 402, a second set of images may be received. At block 404, the second set of images may be preprocessed. One example of a method for preprocessing may be shown in embodiment 500 presented later in this specification.
At block 406, each image in the second set of images may be processed. For each image in block 406, if no face is found in block 408, the process may return to block 406 to process the next image.
If a face is found at block 408, each face object may be processed at block 410. For each face object in block 410, a comparison may be made to the clusters of the first set of images to find the closest match in block 412. If the match does not satisfy the threshold at block 414, the process may return to block 410 to process the next face object. If the match is within the threshold at block 414, the image is associated with the cluster at block 416.
After all the images in block 406 are processed, the result may be a list of images from the second set of images that match the clusters in the first set of images. In block 418, the list may be ordered and presented to the user according to a ranking, which may be determined from the process of embodiment 300.
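The matching loop of blocks 406 through 418 might be sketched as below, assuming precomputed face vectors and cluster centroids; the data structures, distance metric, and rank field are hypothetical conveniences for the example.

```python
import math

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def find_matching_images(second_images, first_clusters, threshold):
    """For each face in the second collection, find the closest
    first-collection cluster (blocks 410-416); keep the image when
    the closest match is within the threshold."""
    matched = []
    for img in second_images:
        for face_vec in img["faces"]:
            best = min(first_clusters, key=lambda c: dist(face_vec, c["centroid"]))
            if dist(face_vec, best["centroid"]) <= threshold:
                matched.append({"image": img["name"], "person": best["person"],
                                "rank": best["rank"]})
                break  # one matching face is enough to keep the image
    # Block 418: order the results by the first user's person ranking.
    return sorted(matched, key=lambda m: m["rank"])

first_clusters = [
    {"person": "alice", "centroid": [0.0, 0.0], "rank": 1},
    {"person": "bob", "centroid": [5.0, 5.0], "rank": 2},
]
second_images = [
    {"name": "party.jpg", "faces": [[4.9, 5.1]]},
    {"name": "beach.jpg", "faces": [[0.1, 0.1]]},
    {"name": "mountain.jpg", "faces": [[9.0, 0.0]]},
]
matches = find_matching_images(second_images, first_clusters, threshold=1.0)
print([m["image"] for m in matches])  # → ['beach.jpg', 'party.jpg']
```

The image containing a stranger's face (mountain.jpg) fails the threshold test and is dropped, which matches the false-positive-averse behavior discussed in embodiment 600.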
FIG. 5 is a flowchart illustration of an embodiment 500 showing a method for pre-processing of face analysis. Embodiment 500 is an example of a method that may be performed by an image pre-processor, such as image pre-processor 128 of client 102 or pre-processor 160 of social networking service 136 of embodiment 100.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
The pre-processing of embodiment 500 may identify faces for all images in the set of images and create a face vector or some other numerical representation of the face image.
An image file may be received at block 502 and may be scanned to identify all faces at block 504.
If a face is found at block 506, each face may be processed separately at block 508. For each face in block 508, the image may be cropped to the face in block 510, and a face object may be created from the cropped image in block 512. A face vector may be created at block 514, which may be a numerical representation of the face image. At block 516, the face vector and the face object may be stored as metadata for the image.
After all faces have been processed in block 508, the process may loop back to block 502 if another image is available in block 518, otherwise the process stops in block 520.
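The pre-processing of blocks 502 through 516 might be sketched as shown below. The detector, crop, and vector functions here are trivial stand-ins; a real system would use an actual face detector and feature extractor, and the data layout is an assumption made for illustration.

```python
def detect_faces(image):
    # Placeholder for a real face detector; here we pretend each
    # image already carries its face bounding boxes.
    return image.get("face_boxes", [])

def crop(image, box):
    # A real implementation would crop pixel data; we just record the box.
    return {"source": image["name"], "box": box}

def face_vector(face_obj):
    # Stand-in numerical representation: the box geometry itself.
    x, y, w, h = face_obj["box"]
    return [x, y, w, h]

def preprocess(images):
    """Blocks 502-516: scan each image, crop each face, build a face
    vector, and store both as metadata on the image."""
    for image in images:
        image["metadata"] = []
        for box in detect_faces(image):
            face_obj = crop(image, box)
            image["metadata"].append(
                {"face_object": face_obj, "face_vector": face_vector(face_obj)}
            )
    return images

photos = [{"name": "img1", "face_boxes": [(10, 20, 30, 30)]},
          {"name": "img2", "face_boxes": []}]
preprocess(photos)
print(photos[0]["metadata"][0]["face_vector"])  # → [10, 20, 30, 30]
```

Images with no detected faces, like the second image above, end up with empty metadata, corresponding to the path from block 506 back to block 518.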
Fig. 6 is a flowchart illustration of an embodiment 600 showing a method for setting a threshold with a set of training images. Embodiment 600 is an example of a method that can collect example images from friends of a user and use these example images to set a threshold that can minimize false positive comparisons.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
Embodiment 600 may determine a threshold setting that minimizes false positive comparisons when comparing image sets. In many social networking applications, a relatively high confidence threshold may be useful to minimize the likelihood of incorrectly identifying a match. When selecting a photo or video image from the second user's image collection to match the first user's image collection, incorrect matches may give the user low confidence in the matching process. However, a missed match, i.e., a match exists but the threshold does not allow the match to be detected, may not significantly compromise the confidence of the user.
The process of embodiment 600 collects representative images from a collection of images of friends of a user to use as a training set for comparison. The facial comparison may differ based on the ethnicity, skin tone, and other physical characteristics of those persons associated with the user. The selected images may be from friends of the user's friends and may reflect possible physical characteristics of the people in the user's image collection.
The process of embodiment 600 may attempt to remove from the training set any person that may be in the user's image collection. This may be performed by checking any tags associated with the images of the friends to ensure that the tags do not match the friends of the user.
At block 602, friends of the user may be identified. The friends of the user may be determined from relationships within the social network, as well as any other source. In some cases, a user may belong to several social networks, each having a different set of relationships. In such cases, as many of those relationships as possible may be considered.
At block 604, each friend of the user may be processed. For each friend in block 604, each image in the friend's image collection is processed at block 606. For each image in block 606, a label associated with the image may be identified at block 608. If the tag is associated with a friend of the user at block 610, the image is not considered at block 610. By excluding the user's friends at block 610, the training set may not include images that may be matches to the user, but may include images of people having similar characteristics to people that may be in the user's image collection.
If the label indicates that the image may not be relevant to the user at block 610, the image is selected for use in the training set at block 612. In many cases, the images selected for the training set may be a subset of all the images in the friend's image set. For example, a process may select one of every 100 or 1000 candidate images as part of a training set. In some embodiments, a random selection may be made for the training set.
After the images for the training set are selected in blocks 604 through 612, face pre-processing may be performed on the training set in block 614. The pre-processing may be similar to that of embodiment 500.
The match threshold may be set to a default value at block 616.
At block 618, each image of the user's image set may be processed to set a threshold such that none of the images in the user's image set match the training set. For each image in block 618, if the image does not contain a face at block 620, the process returns to block 618.
When the image contains faces in block 620, each face may be processed in block 622. For each face in block 622, the face object may be compared to face objects in the training set to find the most similar face object in block 624. If the similarity is less than the threshold in block 626, the process may return to block 622. If the similarity is greater than the threshold in block 626, the threshold may be adjusted in block 628 so that the threshold is higher than the similarity.
After all images in the user's image collection are processed in block 618, the current threshold may be stored in block 630 and used for subsequent comparisons.
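The calibration loop of blocks 616 through 630 might be sketched as follows. A similarity function in [0, 1] and a small margin added above the highest training similarity are both illustrative assumptions; the embodiment does not prescribe a particular metric.

```python
def calibrate_threshold(user_faces, training_faces, similarity,
                        default=0.5, margin=0.01):
    """Raise the match threshold just above the highest similarity seen
    between user faces and the known non-matching training faces, so
    that the training set yields no false positive matches."""
    threshold = default  # block 616: default value
    for face in user_faces:  # block 618: each user image
        best = max(similarity(face, t) for t in training_faces)  # block 624
        if best > threshold:  # block 626
            threshold = best + margin  # block 628: adjust above similarity
    return threshold  # block 630: store for later comparisons

def cosine_like(a, b):
    # Toy similarity in [0, 1]: 1 / (1 + squared distance).
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))

user = [[0.0, 0.0], [1.0, 1.0]]
training = [[0.2, 0.1], [5.0, 5.0]]
print(calibrate_threshold(user, training, cosine_like))
```

After calibration, every user face has a best training similarity strictly below the stored threshold, which is the property block 618 is meant to guarantee.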
FIG. 7 is a flowchart illustration of an embodiment 700 showing a method for event matching. Embodiment 700 is a simplified example of a method that may be performed by an analysis engine, such as analysis engine 166 of embodiment 100.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
Embodiment 700 is an example of a method that may be used to detect events from metadata. The metadata may be metadata derived from the image, such as from face analysis or other image analysis. The metadata may also be metadata that is not derived from the image, such as a title, a timestamp, or location information.
Embodiment 700 may infer an event from the intersection of two user's image sets. This intersection may occur when both users attend the same event and both take an image of the event. For example, two users may attend a birthday party or a family party and take a picture of the family of the party. In another example, two users may attend a meeting, sporting event, or other public event, and may take an image of the meeting. In some cases, users may know each other's attendance at an event, while in other cases, users may not know that another person has attended.
At block 702, a set of images may be received from a first user. At block 704, a set of images may be received from a second user. In some embodiments, the received information may be only metadata related to the images in the collection, and not the actual images themselves.
The metadata from each image set may be compared to find a match at block 706. Matching may be based on image analysis, such as finding a matching face in images from two different sets. Matching may be based on metadata analysis, such as finding images with matching timestamps, tags, location information, or other metadata.
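A coarse metadata match of the kind described for block 706 might look like the following sketch. The time window, distance bound, and metadata layout are illustrative assumptions rather than values from the embodiment.

```python
def coarse_match(meta_a, meta_b, max_seconds=3600, max_km=1.0):
    """Block 706-style preliminary match: two images pair up if their
    timestamps and locations are roughly close, or if they share a tag."""
    close_time = abs(meta_a["timestamp"] - meta_b["timestamp"]) <= max_seconds
    dx = meta_a["location"][0] - meta_b["location"][0]
    dy = meta_a["location"][1] - meta_b["location"][1]
    close_place = (dx * dx + dy * dy) ** 0.5 <= max_km
    shared_tag = bool(set(meta_a["tags"]) & set(meta_b["tags"]))
    return (close_time and close_place) or shared_tag

a = {"timestamp": 1000, "location": (0.0, 0.0), "tags": ["party"]}
b = {"timestamp": 2500, "location": (0.3, 0.4), "tags": []}
c = {"timestamp": 999999, "location": (50.0, 50.0), "tags": ["party"]}
print(coarse_match(a, b), coarse_match(a, c))  # True True
```

The disjunction keeps the match deliberately loose: either spatiotemporal proximity or a shared tag is enough to flag a candidate pair for the finer evaluation in later blocks.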
In many cases, a match may be determined with a certain tolerance or allowed level of deviation. The match identified in block 706 may allow a large amount of deviation or tolerance, and thus each match may be further evaluated in a later step. The match in block 706 may be a coarse or preliminary match that may be further refined to identify matches with greater certainty.
The result of block 706 may be pairs of images, one from each set. In some cases, the result may be a set of images from each collection that share similar metadata.
At block 708, each set of matched images may be compared. For each set of matched images in block 708, the metadata may be compared in block 710 to determine whether an event may be inferred.
Events may be inferred based on several factors. Some factors may be highly weighted, while other factors may have secondary importance. The determination of whether a match indicates an event may be made using various heuristics or formulas, and such heuristics or formulas may depend on the implementation. For example, some embodiments may have a large amount of metadata available, while other embodiments may have fewer metadata parameters. Some embodiments may have complex image analysis, while other embodiments may have less complex or even no image analysis.
One highly weighted factor may be the case where the second user identifies the first user in one of the second user's images. Such metadata explicitly identifies a link between the two image sets and indicates that the two users may have been in the same place at the same time.
In some embodiments, a user may tag images of people in their collection that have information from their social network. In such embodiments, a user may manually select an image and create a tag identifying a friend in the image. Some such embodiments may allow a user to point at a face and attach a tag to a location on the image. Such tags may be considered reliable indicators and given a higher weight than other metadata.
Another highly weighted factor may be close proximity in space and time. Very close timestamps and physical location information may indicate that two users were at the same place at the same time. In some embodiments, an image may include the location at which the image was taken and the direction in which the camera was facing when the image was taken. When such metadata is available, overlap of the areas covered by two images can be evidence of an event.
Certain images may be tagged with various descriptors that are manually added by the user. For example, an image may be tagged with "birthday party of Anna" or "technical meeting". When images from two image sets are similarly tagged, the tag may be a good indicator of an event.
The matching may be analyzed using image analysis to identify common events. For example, a facial image match between images in two sets may be a good indicator of an event that two users attended and captured. Face image matches may be further confirmed by similar background image regions and by apparel analysis of the person associated with the matched face.
When identifying common events, different combinations of factors may be used in different situations and in different embodiments. For example, in some cases, events may be determined by image analysis alone, even when metadata is not relevant. For example, one user may have purchased a camera device and may never correctly set the time and date in the camera, or may set the time to a different time zone than another user. In this case, the timestamp metadata may be incorrect, but the image analysis may identify common events.
In another example, the metadata may identify common events even though the image analysis may not be able to identify any common faces, backgrounds, or other similarities.
Different embodiments may have different thresholds for identifying events. In a typical social networking use of embodiment 700, analysis may be performed to automatically apply tags to images based on events. In such an embodiment, a higher degree of certainty may be desirable so that incorrect labels are not introduced as noise into the image set. In another use, matching may be used to identify possible events that a user may manually examine to determine if an event did in fact occur. In this use, the threshold for determining an event may have a much lower degree of certainty than in other use cases.
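One illustrative way to combine the factors above into an event decision is a weighted score compared against a threshold. The weights, factor names, and threshold below are assumptions chosen for illustration; the embodiment only states that some factors are weighted more highly than others.

```python
def infer_event(evidence, threshold=1.0):
    """Combine weighted evidence factors into a single event decision.
    Weights and threshold are illustrative, not from the embodiment."""
    weights = {
        "tagged_by_other_user": 1.0,   # explicit tag: highly weighted
        "close_in_space_time": 0.8,    # matching timestamps and location
        "matching_descriptor_tag": 0.6,
        "matching_faces": 0.5,
        "matching_background": 0.3,
    }
    score = sum(weights[f] for f in evidence if f in weights)
    return score >= threshold

print(infer_event({"tagged_by_other_user"}))                   # True
print(infer_event({"matching_faces", "matching_background"}))  # False
print(infer_event({"close_in_space_time", "matching_faces"}))  # True
```

Lowering the threshold corresponds to the candidate-suggestion use case described above, where matches are shown to the user for manual confirmation rather than tagged automatically.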
If an event is not determined in block 712, the process may return to block 708 to process another match.
If an event is identified in block 712, all images associated with the event may be identified in block 714. A metadata tag may be defined for the event in block 716 and the tag may be applied to the image in block 718.
The image associated with the event may be determined by identifying images that are related to the matching image or share common metadata or other features. For example, two images may be matched, each image from a set of images. Once the images are matched, any relevant images of the matched images in their respective sets may be identified at block 714.
The metadata tag in block 716 may be generated by scanning the related images to determine if an event tag is associated with any of the related images. For example, one of the images collected in block 714 may be tagged with an event tag such as "birthday of Anna". At block 718, the label may then be applied to all relevant images.
In some embodiments, the event tag of block 716 may be an automatically generated event tag that may identify how a match was determined. For example, a match determined by common metadata with time and location information may have a tag that includes the place and date, such as "Yellowstone, February 22, 2010". Each embodiment may have a different mechanism for determining the tag.
In some embodiments, the tag applied in block 718 may not be visible to the user. Such a tag may be used by a social network to link different sets of images together to provide enhanced search or browsing capabilities, without exposing the tag to the user for viewing or modification.
FIG. 8 is a flowchart illustration of an embodiment 800 showing a method for event matching between a user's collection of images and a user's friend's collection of images. Embodiment 800 is one use scenario for the event matching method described in embodiment 700.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
Embodiment 800 compares a user's image collection to the image collection of one of the user's friends. The comparison may identify an event shared by the two users, and may identify images in the friend's collection that the user may want to add to his or her own collection.
Embodiment 800 may be a powerful tool for linking two image collections together in a social network. In some applications, two users may know that they are attending the same event and may wish to share their images with each other. In other uses, the user may not remember to attend the same event or may not realize that both people are there. The method of embodiment 800 may enhance user interaction by identifying intersections in their lives and allowing them to share events through their images.
At block 802, a set of images of a user may be received. At block 804, friends of the user may be identified, and each friend may be processed at block 806. For each friend in block 806, an event match may be performed between the user and the user's friend to identify a common event at block 808. Event matching may be performed in a similar manner as described in embodiment 700.
At block 810, each new event found in block 808 may be analyzed. For each new event in block 810, an image matching the event may be selected from the friend's collection of images in block 812. Any metadata from the image selected from the image collection of friends may be identified at block 814 and applied to the image of the user related to the event at block 816.
Operations of blocks 814 and 816 may propagate tags and other metadata from the image collection of friends to the image collection of the user. In certain embodiments, the user may be given the option of approving or disapproving tagging. Tags and other metadata can enrich a user's image collection by automatically or semi-automatically applying useful tags.
At block 818, the images of the friends may be presented to the user and the images may be grouped by event. An example of a user interface may be shown in embodiment 1000, which is presented later in this specification.
After each event is processed in block 810, the user may browse the images of the friends and select one or more images of the friends in block 820. At block 822, the selected image may be added to the user's image collection.
FIG. 9 is a flowchart illustration of an embodiment 900 showing a method for event matching between a user's friend pair. Embodiment 900 is one use scenario of the event matching method described in embodiment 700.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
Embodiment 900 compares two image collections belonging to the user's friends to identify events that may be inferred from those two friends' collections. Images from the inferred event may be presented to the user, and the user may add these images to the user's own collection of images.
Embodiment 900 may be useful in a social networking scenario where a user may or may not have attended an event and may wish to view images of the event, adding some of these images to the user's set of images. For example, a grandparent who is unable to attend a grandchild's party may wish to see images of the party. The party may be inferred by analyzing collections of images from two or more persons who attended it. By inferring the event from an analysis of the image collections, all relevant images of the event may be collected and presented to the grandparent for his or her enjoyment.
Embodiment 900 operates in a similar manner to embodiment 800, but the image sets used for event matching may both come from the user's friends, rather than comparing the user's own set with a friend's set.
At block 902, friends of the user may be identified and placed in a list. Friends may be identified through a social network. At block 904, each friend can be processed. For each friend in block 904, each remaining friend on the friends list may be analyzed in block 906. The remaining friends are those for which the image collection has not been processed. For each remaining friend in block 906, an event matching process may be performed between the image sets of the two friends to identify a common event in block 908. The processes of blocks 904 and 906 may be arranged such that each pair of friends may be processed to identify a common event.
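The pairwise arrangement of blocks 904 through 908 amounts to visiting each unordered pair of friends exactly once, which might be sketched as follows. The toy event matcher based on shared tags is an illustrative assumption standing in for the full event-matching process of embodiment 700.

```python
from itertools import combinations

def find_common_events(friends, collections, event_match):
    """Blocks 904-908: compare every unordered pair of friends' image
    collections exactly once and gather any common events found."""
    events = []
    for a, b in combinations(friends, 2):
        events.extend(event_match(collections[a], collections[b]))
    return events

def shared_tags(coll_a, coll_b):
    # Toy event matcher: any tag shared between the two collections
    # stands in for an inferred common event.
    return sorted({t for img in coll_a for t in img["tags"]}
                  & {t for img in coll_b for t in img["tags"]})

collections = {
    "Ann": [{"tags": ["party", "beach"]}],
    "Bob": [{"tags": ["party"]}],
    "Cara": [{"tags": ["ski"]}],
}
print(find_common_events(["Ann", "Bob", "Cara"], collections, shared_tags))  # → ['party']
```

Using `combinations` ensures that each pair is processed once and that a friend's collection is never compared against itself, matching the "remaining friends" bookkeeping of block 906.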
At block 910, each common event may be processed. For each common event in block 910, some embodiments may include a verification in block 912 to determine whether the user is likely present.
The verification of block 912 may be used to avoid showing the user events to which he or she was not invited. For example, two friends of the user may go out together for an evening to which the user was not invited. To avoid hurting the user's feelings, some embodiments may include a verification, such as block 912, to prevent the user from discovering that the event occurred. In other embodiments, as in the grandparent example described above, the verification of block 912 may be omitted.
In some social networks, a user may be able to select whether to share events with other users, and may be able to select which users may view their common events and which users may not.
At block 914, images from the image collection of friends may be selected from the common event and presented to the user in event groupings at block 916. After all common events are processed in block 910, the user may browse and select images in block 918 and may add the selected images to the user's collection in block 920.
FIG. 10 is a diagram illustration of an example embodiment 1000 showing a user interface with results from an event matching analysis. Embodiment 1000 is a simplified example of a user interface that may be used to present results of an event matching analysis, such as the event matching analysis of embodiments 800 or 900, to a user.
The user interface 1002 may display the results of the event matching process. In user interface 1002, results from three events are shown. Event 1004 may have the label "birthday party", event 1006 may have the label "beach holiday", and event 1008 may have the label "ski vacation". The various labels may be identified from tags found in the friends' image collections. In some cases, the label may be determined from an image of the user that matches the detected event.
Each event may be presented with a source of the image. For example, an event 1004 may have an image source 1010 "from the collection of mom and Joe". Event 1006 may have an image source 1012 "from Joe's collection" and event 1008 may have an image source 1014 "from Lora's collection". The image source may be created using the user's labels for his or her friends.
The user interface 1002 may also include various metadata about the event. For example, the event 1004 may be presented with metadata 1016 indicating which friends of the user were determined to be at the event. Similarly, events 1006 and 1008 may have metadata 1018 and 1020, respectively.
Each event may have a selection of images presented. Event 1004 is shown with images 1022, 1024, and 1026. Event 1006 is shown with images 1028 and 1030, and event 1008 is shown with image 1032. Next to each image may be a button or other mechanism by which the user can select one or more images to be added to the user's image collection.
The user interface of embodiment 1000 is but one example of certain components that may be presented to a user as a result of image matching analysis, such as event matching. The user interface may be a mechanism by which a user may browse the results of the matching analysis and perform operations on the results.
FIG. 11 is a flowchart illustration of an embodiment 1100 showing a method for creating clusters that can be used to match images. Embodiment 1100 is a simplified example of one way that clusters can be created by analyzing a single set of images and grouping the images. Clustering may be used in image comparison analysis and metadata comparison analysis.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
Embodiment 1100 may illustrate a simplified method for creating image clusters. A cluster may be a group of images that may share common features and may be useful when grouping faces and grouping images as a whole.
Clusters may be created by identifying vectors representing images and by grouping the vectors together. A cluster may have a centroid and a radius, and a numerical comparison may be made between an image and a cluster to compute the "distance" between them and thereby determine a match.
At block 1102, a set of images may be received, and at block 1104, each image in the set of images may be analyzed. In embodiments using face recognition, the image may be a face object that is cropped from a larger image that may contain only facial features of a person. In such embodiments, the analysis may create a vector representing the facial object. In other embodiments, the entire image may be analyzed to create an image vector.
At block 1106, the image may be analyzed to create an image vector. The image vector may contain a numerical representation of the individual elements of the image, including facial image analysis, apparel analysis, background image analysis, and texture analysis.
In certain embodiments, the analysis of block 1106 may create several image vectors. For example, an image with two faces may be represented with two image vectors representing faces, two image vectors representing apparel for two persons, and one or more vectors representing various textures in a background image or image.
After each image is analyzed in block 1104, the images may be grouped together in block 1108. The grouping may use both metadata groupings and image analysis groupings. One mechanism for grouping may be to group images together on separate or orthogonal grouping axes for each metadata category or image analysis type. For example, a grouping axis may be established for facial image analysis. On this axis, all face image representations or vectors can be grouped. Separately, each image may be grouped according to different metadata, such as a timestamp or location.
Within each axis, clusters may be identified at block 1110. The definition of clusters can be controlled using thresholds that can limit the clustering to strict groupings of images. Clustering may be used to represent actual matches of images with a high degree of certainty, so that other operations such as image comparison and ranking may have a high degree of certainty.
Each axis on which the images are grouped may have a different threshold for identifying clusters. For example, facial image matches may have a relatively strict threshold, such that only matches with a very high degree of similarity can be considered clusters. Conversely, images matched by background image analysis may have a less restrictive threshold, such that a wider range of images may be grouped.
Each cluster may have a centroid and a radius calculated in block 1112. The centroid and radius may be used to determine a match when comparing other images to the set of images. At block 1114, the clusters as well as the centroids and radii may be stored.
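Blocks 1108 through 1114 might be sketched with a simple greedy grouping. The distance metric, the join threshold, and the incremental centroid update are illustrative choices, not a clustering algorithm prescribed by the embodiment.

```python
def build_clusters(vectors, threshold):
    """Blocks 1108-1114: greedily group vectors whose distance to a
    cluster's centroid is within the threshold, then record each
    cluster's centroid and radius for later matching."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def centroid(members):
        n = len(members)
        return [sum(v[i] for v in members) / n for i in range(len(members[0]))]

    clusters = []
    for v in vectors:
        for c in clusters:
            if dist(v, c["centroid"]) <= threshold:
                c["members"].append(v)
                c["centroid"] = centroid(c["members"])  # update centroid
                break
        else:
            clusters.append({"members": [v], "centroid": list(v)})
    for c in clusters:  # block 1112: radius of each cluster
        c["radius"] = max(dist(m, c["centroid"]) for m in c["members"])
    return clusters

vecs = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0]]
result = build_clusters(vecs, threshold=1.0)
print(len(result))  # → 2
```

A strict threshold, as the embodiment suggests, keeps clusters tight so that later centroid and radius comparisons carry a high degree of certainty.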
FIG. 12 is a flowchart illustration of an embodiment 1200 showing a method for matching images using centroid and radius analysis of clusters. Embodiment 1200 may illustrate one way in which the images analyzed by embodiment 1100 may be used to identify matches between a user's image collection and a friend's image collection, and then select the most appropriate or best match to display to the user.
Other embodiments may use different ordering, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, in a synchronous or asynchronous manner. The steps selected here are chosen to illustrate some of the principles of operation in a simplified form.
At block 1202, a set of images of a user may be received, and at block 1204, a set of images of friends may be received. At block 1205, the set of images of the user's friends may be pre-processed. An example of pre-processing an image may be embodiment 500. The preprocessing of embodiment 500 may be applied to facial image analysis and may be extended to background image analysis, texture analysis, color histogram analysis, apparel analysis, and other image analysis preprocessing.
The preprocessing of block 1205 may correspond to any analysis performed prior to clustering the set of images for the user.
At block 1206, each image in the friend's set of images may be analyzed. For each image in block 1206, each cluster associated with the user's image collection may be analyzed at block 1208.
As described in embodiment 1100, each image set may include a plurality of clusters in a plurality of orthogonal axes. Each cluster may represent an important aspect or element of the user's image collection, and these aspects may be used to compare with images from the friend's image collection.
For each cluster in block 1208, at block 1210, a distance from the analyzed image to the nearest cluster may be determined. At block 1212, if the distance is within the centroid matching threshold, then at block 1218, the image is associated with the cluster.
If the distance is not within the centroid matching threshold at block 1212, a distance to the nearest neighbor may be determined at block 1214. If the distance to the nearest neighbor is not within the neighbor threshold at block 1216, then no match is determined.
The nearest neighbors may be images within a cluster. The nearest neighbor evaluation may identify images that fall outside of a cluster but are very close to one of the images grouped with the cluster. In an exemplary embodiment, the neighbor threshold may be smaller when compared to the centroid threshold.
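The two-stage test of blocks 1210 through 1218, a centroid check followed by a nearest-neighbor fallback with a tighter threshold, might be sketched as follows; the distance metric and data layout are assumptions made for illustration.

```python
def two_stage_match(image_vec, cluster, centroid_threshold, neighbor_threshold):
    """Blocks 1210-1218: first test against the cluster centroid; if that
    misses, fall back to the nearest member of the cluster using a
    tighter neighbor threshold."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    if dist(image_vec, cluster["centroid"]) <= centroid_threshold:
        return True  # block 1212: within centroid matching threshold
    nearest = min(dist(image_vec, m) for m in cluster["members"])
    return nearest <= neighbor_threshold  # blocks 1214-1216

cluster = {"centroid": [0.0, 0.0], "members": [[0.0, 0.0], [2.0, 0.0]]}
print(two_stage_match([0.5, 0.0], cluster, 1.0, 0.5))  # centroid hit → True
print(two_stage_match([2.3, 0.0], cluster, 1.0, 0.5))  # neighbor hit → True
print(two_stage_match([5.0, 0.0], cluster, 1.0, 0.5))  # no match → False
```

The second example shows the case described above: an image outside the cluster radius that still matches because it lies very close to one of the cluster's member images.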
After all images in the friend's image collection are analyzed in block 1206, the friend's image may be selected for presentation to the user.
At block 1220, the clusters of the user may be ranked by size. The ranking may serve as a representation of importance to the user. In block 1222, each cluster may be evaluated. For each cluster in block 1222, the matching images may be compared to the cluster in block 1224 to find the images closest to their nearest neighbors, and the images closest to the cluster centroid may be found in block 1226. The best match may be determined in block 1228 and added to the user interface display in block 1230.
The process of blocks 1220 through 1230 may identify those matches that may be most relevant to the user and most likely to be good matches. Relevance may be determined by the ranking of clusters derived from the user's image collection. The best matches may be those images that are closest to the centroid of a cluster, or very close to another image in the cluster as represented by the nearest neighbor.
Image matching may be prone to noise, and many image matching algorithms may lead to false positive results, where images are incorrectly matched. In social networking applications with image matching, user satisfaction with the matching mechanism may be higher when a quality match is presented to the user.
The process of blocks 1220 through 1230 may select the best match from the available matches to present to the user. This process may select a representative match for each cluster and present each match to the user so that the user can view a wide variety of matches.
After the images are selected, the images organized by clusters may be presented to the user at block 1232. At block 1234, the user may browse and select images, and at block 1236, the images may be added to the user's collection.
In some embodiments, the user may be able to drill down into the matches of a certain cluster to see additional matches. In this case, the process of blocks 1220 through 1230 may be used to organize and select the most appropriate image from the subset of images that match a particular cluster.
The foregoing description of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the subject matter to the precise form disclosed, and other modifications and variations may be possible in light of the above teachings. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the appended claims be construed to include other alternative embodiments except insofar as limited by the prior art.

Claims (15)

1. An image matching method performed on a computer processor, the method comprising:
receiving image metadata from a first image collection associated with a first user, the image metadata including processed facial objects from the first image collection;
analyzing the image metadata to identify similar facial objects having matching criteria relative to a first threshold;
grouping the similar facial objects into a plurality of clusters, wherein each cluster defines a set of facial objects in the image metadata that are related to a single person;
ranking the plurality of clusters according to at least a size of the plurality of clusters to determine a relative importance of a person defined by the plurality of clusters to the first user; and
using the rankings of the plurality of clusters to prioritize, in various applications, people defined by the plurality of clusters;
wherein the method further comprises:
receiving image metadata from a second set of images associated with a second user;
comparing image metadata from the first image collection with image metadata from the second image collection to find matching images in the first and second image collections that share similar image metadata;
for each set of matched images,
comparing metadata associated with the set of matching images to determine whether an event can be inferred, and
where an event is identified, identifying all images associated with the event from the first set of images and the second set of images, respectively, defining a metadata tag for the event, and applying the metadata tag to all identified images associated with the event.
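The grouping and ranking steps recited above can be illustrated with a minimal sketch; a greedy threshold-based grouping is used here purely for illustration, since the claim does not prescribe a particular clustering algorithm, and all names are hypothetical:

```python
import math

def cluster_faces(face_vectors, threshold):
    """Greedily group face vectors: a vector joins the first cluster
    whose seed member is within `threshold`, else it starts a new cluster."""
    clusters = []
    for v in face_vectors:
        for c in clusters:
            if math.dist(v, c[0]) <= threshold:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters

def rank_by_size(clusters):
    # Larger clusters first: more photos of a face suggests the person
    # is more important to the owner of the collection.
    return sorted(clusters, key=len, reverse=True)

faces = [[0, 0], [0.1, 0.1], [5, 5], [0.2, 0], [5.1, 5.0]]
ranked = rank_by_size(cluster_faces(faces, threshold=1.0))
print([len(c) for c in ranked])  # [3, 2]
```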
2. The method of claim 1, wherein the image metadata is generated by a second method comprising:
for each of the images in the first set of images, analyzing the image to identify faces within the image, processing the faces to determine a face vector for each of the faces, and storing the face vectors into the image metadata.
3. The method of claim 1, further comprising:
determining, for each of the plurality of clusters, an identity of a person associated with at least one of the images in the cluster.
4. The method of claim 3, further comprising:
ranking the identities of the persons based on the sizes of the plurality of clusters.
5. The method of claim 1, further comprising:
determining an identity of a person in one of the images in a first cluster, and labeling at least one of the images in the first cluster with the identity of the person.
6. The method of claim 5, wherein the identity of the person is determined from a social networking application.
7. The method of claim 1, wherein the first threshold is determined by:
comparing the image metadata to a set of comparison image metadata derived from a second set of images, the comparison image metadata including face vectors having a low probability of matching the faces of the facial objects, the first threshold being determined such that the first set of images does not match the second set of images.
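One way to realize this calibration, sketched under the assumption that face vectors are compared by Euclidean distance (the claim does not specify a metric): set the threshold just below the smallest distance between any face vector in the user's collection and any vector in the comparison set, so that no cross-set pair would be treated as a match:

```python
import math

def calibrate_threshold(user_vectors, comparison_vectors, margin=0.9):
    """Pick a matching threshold strictly below every cross-set distance,
    so the user's faces never match the unlikely comparison faces."""
    min_cross = min(math.dist(u, c)
                    for u in user_vectors
                    for c in comparison_vectors)
    return min_cross * margin

users = [[0.0, 0.0], [0.2, 0.1]]
comparison = [[3.0, 4.0], [6.0, 8.0]]
t = calibrate_threshold(users, comparison)
print(t)  # strictly below the smallest cross-set distance
```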
8. The method of claim 7, wherein the second set of images is created by:
identifying a first user associated with the first set of images;
identifying a set of users associated with the first user; and
creating the second set of images by identifying images within a set of images associated with each of the set of users.
9. The method of claim 7, wherein the second set of images is a default set of images.
10. An image matching system, comprising:
a social network comprising a plurality of users, each of the plurality of users having user metadata and a set of images;
a ranking engine that:
receives image metadata from a first set of images associated with a first user, the image metadata including processed facial objects from the images;
analyzes the image metadata to identify similar facial objects having matching criteria relative to a first threshold;
groups the similar facial objects into a plurality of clusters, wherein each cluster defines a group of facial objects in the image metadata that are associated with a single person;
determines a ranking of the plurality of clusters based at least on the size of the plurality of clusters to determine a relative importance of the person defined by the plurality of clusters to the first user; and
uses the rankings of the plurality of clusters to prioritize, in various applications, people defined by the plurality of clusters; and
an analysis engine that:
receives image metadata from a second set of images associated with a second user;
compares image metadata from the first image collection with image metadata from the second image collection to find matching images in the first and second image collections that share similar image metadata; and
for each set of matched images,
compares metadata associated with the set of matching images to determine whether an event can be inferred, and
where an event is identified, identifies all images associated with the event from the first set of images and the second set of images, respectively, defines a metadata tag for the event, and applies the metadata tag to all identified images associated with the event.
11. The system of claim 10, further comprising:
a comparison engine that compares the clusters to image metadata from a second set of images to identify a set of images from the second set of images that have similar facial objects to the plurality of clusters; and
presents at least a portion of the second set of images according to the ranking.
12. The system of claim 10, wherein the image metadata is determined by a method comprising:
for each of the images in the first set of images, analyzing the image to identify faces within the image, processing the faces to determine a face vector for each of the faces, and storing the face vectors into the image metadata.
13. The system of claim 12, wherein the method is performed by a device prior to uploading the image to the social network.
14. The system of claim 13, wherein the device comprises an image capture device.
15. The system of claim 13, wherein the device comprises a browser operable to access the social network.

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US30902910P 2010-03-01 2010-03-01
US61/309,029 2010-03-01
US12/784,498 US9465993B2 (en) 2010-03-01 2010-05-21 Ranking clusters based on facial image analysis
US12/784,498 2010-05-21
PCT/US2011/026356 WO2011109250A2 (en) 2010-03-01 2011-02-25 Ranking based on facial image analysis

Publications (2)

Publication Number Publication Date
HK1176722A1 (en) 2013-08-02
HK1176722B (en) 2016-04-29
