WO2025018140A1 - Information processing device, information processing method, and information processing program
- Publication number: WO2025018140A1
- Application number: PCT/JP2024/023875
- Authority: WIPO (PCT)
- Prior art keywords: image, appearance, unit, information processing, information
- Prior art date
- Legal status: Pending
Classifications
- A—HUMAN NECESSITIES
- A45—HAND OR TRAVELLING ARTICLES
- A45D—HAIRDRESSING OR SHAVING EQUIPMENT; EQUIPMENT FOR COSMETICS OR COSMETIC TREATMENTS, e.g. FOR MANICURING OR PEDICURING
- A45D44/00—Other cosmetic or toiletry articles, e.g. for hairdressers' rooms
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- This disclosure relates to an information processing device, an information processing method, and an information processing program.
- the above-mentioned conventional technologies do not necessarily provide adequate support to the user in changing a specific state to a target state.
- the above-mentioned conventional technologies merely present changes in the state of the facial surface texture or face shape in an easy-to-understand manner, and do not take into consideration the ability to appropriately guide a user who is in a specific state to a target state that the user is aiming for.
- this disclosure proposes an information processing device, an information processing method, and an information processing program that can appropriately support a user in changing a specific state to a target state.
- an information processing device includes an acquisition unit that acquires an object image, which is an image of a first object, and a reference image relating to a second object different from the first object; a conversion unit that generates a converted image in which the first object is converted based on the object image and the reference image; an estimation unit that estimates a processing procedure for changing the appearance of the first object to one based on the second object based on the converted image and the object image; and a generation unit that generates an image of the first object whose appearance has been changed in accordance with the processing procedure as an output image based on the object image.
- FIG. 1 is a diagram showing an overall view of information processing according to an embodiment.
- FIG. 2 is a diagram illustrating an example of a system according to the embodiment.
- FIG. 3 is a diagram illustrating an example of the configuration of a server device according to the embodiment.
- FIG. 4 is a diagram illustrating an example of the configuration of a learning device according to the embodiment.
- FIG. 5 is an explanatory diagram illustrating a preprocessing technique in the learning phase.
- FIG. 6 is an explanatory diagram illustrating a learning method in the learning phase.
- FIG. 7 is a diagram (1) showing an example of an estimation method realized by the information processing according to the embodiment.
- FIG. 8 is a diagram (2) showing an example of an estimation method realized by the information processing according to the embodiment.
- FIG. 9 is a diagram illustrating an example of the configuration of a server device according to a first modified example (Modification 1).
- FIG. 10 is a flowchart showing an estimation process procedure according to Modification 1.
- FIG. 11 is a flowchart showing a malfunction detection process procedure according to Modification 1.
- FIG. 12 is a block diagram showing an example of a hardware configuration of a computer corresponding to the information processing device according to the embodiment.
- One or more of the embodiments (including examples, variations, and application examples) described below can be implemented independently. However, at least a portion of the embodiments described below may be implemented in appropriate combination with at least a portion of another embodiment. These embodiments may include novel features that are different from one another. Thus, these embodiments may contribute to solving different purposes or problems and may provide different effects.
- the proposed technology of the present disclosure may be suitably applied to assist in tasks that require multiple processing procedures.
- the proposed technology of the present disclosure assists a user in a task so that the user approaches a target appearance.
- the proposed technology of the present disclosure may be suitably applied in a scene such as assisting in face makeup work to approach a target makeup state, assisting in hair and makeup work to approach a target hairstyle state, and assisting in painting work to approach a target painting state.
- information processing in the case where the proposed technology of the present disclosure is applied to assist in face makeup work will be described, but similar information processing is also applicable to other scenes.
- because facial features, i.e., the facial shape and the base color of the face, differ between the user and the makeup model, the final result will often be different if the user only refers to the face photo of the makeup model.
- what is desired is makeup application guidance that takes into account one's own facial shape, the base color of the face, and the lighting environment when applying makeup.
- this disclosure proposes a system that uses a 3D Morphable Model (3DMM) that can express facial shape, base facial color, and even various lighting environments by adjusting lighting using parameters, to help a user get closer to their desired makeup look even if there are differences in facial shape, base facial color, or lighting environment between the user and a makeup model that represents the makeup look the user is aiming for.
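- as a rough illustration of how such a parametric representation separates the factors mentioned above (face shape, base facial color, and lighting), the following Python sketch models a fitted face as independent parameter sets; the class, field names, and dimensionalities are assumptions for illustration and are not taken from this disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FittedFace3DMM:
    """Hypothetical container for a 3DMM fit of a single face image."""
    shape_coeffs: np.ndarray       # identity / face-shape parameters
    expression_coeffs: np.ndarray  # facial-expression parameters
    albedo_coeffs: np.ndarray      # base skin color / texture parameters
    sh_lighting: np.ndarray        # spherical-harmonics lighting coefficients (9 per RGB channel)
    pose: np.ndarray               # head pose (rotation + translation)

def reconstruct_vertices(model_basis: dict, face: FittedFace3DMM) -> np.ndarray:
    """Linear 3DMM: mean shape plus weighted shape/expression basis vectors."""
    return (model_basis["mean_shape"]
            + model_basis["shape_basis"] @ face.shape_coeffs
            + model_basis["expression_basis"] @ face.expression_coeffs)
```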
- a facial image of a makeup model in a target makeup state is subjected to a conversion process to estimate the appearance of the makeup model in a live photograph.
- the appearance of the makeup model is converted so that the facial shape, etc., resembles that of the user while maintaining the texture information of the makeup model.
- the makeup application procedure between the two parties is inferred using a machine learning model obtained through learning.
- the makeup workflow refers to the multiple steps leading up to the completion of the makeup, with the target makeup state defined as the completed stage. Then, for each step, instructions on the makeup content are automatically generated. Also, for each step, a short video is generated showing how to apply the makeup and tips. Also, for each step, an image is generated that reflects how the user will look at that step on a 3D (three-dimensional) face model.
- the technology proposed in this disclosure does not need to perform all of the following: automatically generating makeup instructions, generating a short video showing how to apply the makeup and tips, and generating a 3D face model that reflects the user's appearance; it is sufficient to perform at least one of them.
- FIG. 1 is a diagram showing an overall image of information processing according to the embodiment.
- FIG. 1 conceptually shows the contents of (i) to (iii) above.
- FIG. 1 also shows a scene in which a person P1 (user) in a natural state without makeup, i.e., in a pre-makeup state, requests the presentation of a work procedure for approaching the target makeup state by using a face image of a person Px (makeup model) in a target makeup state as a reference image.
- a server device 100 which is an example of an information processing device, estimates and presents a work procedure by the information processing according to the embodiment.
- person P1 uses user device 10 to input face image IM1 (an example of an object image) of his/her own face (an example of a first object) before makeup (appearance before makeup) to server device 100 (step S1).
- face image IM1 shows the appearance of the face of person P1 before makeup.
- face image IM1 may be, for example, a still image captured by the imaging function of user device 10, or a moving image.
- the person P1 also inputs, as a reference image, a face image IM2 showing the makeup state that the person P1 is aiming for (an example of a target appearance), that is, the face of a makeup model Px (an example of a second object), to the server device 100 (step S2).
- the face image IM2 shows the facial appearance of the makeup model Px in the makeup state that the person P1 is aiming for.
- the face image IM2 may be, for example, an image found through a web search, or a face image of another person photographed by the person P1.
- the makeup model Px may be a celebrity or actress that the person P1 likes.
- the makeup model Px may be a close relative of the person P1 (for example, a family member or friend), or the person P1 himself when he has created makeup that he likes.
- the face image IM2 may be a single still image or a moving image.
- the server device 100 uses a makeup procedure estimation model M (machine learning model) to estimate an operation procedure (processing procedure) for changing the pre-makeup state to the target makeup state (step S3).
- the server device 100 inputs a 3D face model, which is a three-dimensional image generated by applying the facial image IM1 to the 3DMM, and a 3D face model, which is a three-dimensional image generated by applying the facial image IM2 to the 3DMM, into the makeup procedure estimation model M, and estimates the operation procedure based on the output result.
- Each 3D face model is adjusted so that the texture of the face surface other than the makeup is unified between the person P1 and the makeup model Px.
- the server device 100 adjusts to match conditions other than the makeup so that only the difference in texture before and after makeup can be accurately compared between the person P1 and the makeup model Px.
- conditions other than the makeup include the lighting environment in the space (e.g., the shooting space) where the face image IM1 was obtained.
- the server device 100 also performs a process of converting one of the two 3D face models based on the other 3D face model.
- the server device 100 converts the face shape of the 3D face model generated based on face image IM2 to match the face shape of the 3D face model generated based on face image IM1.
- the server device 100 can obtain a 3D face model in an unmade-up state having the skin base color of person P1, and a 3D face model in a target make-up state having the face shape of person P1.
- the 3D face model obtained here has unified conditions other than makeup as a result of the above-mentioned adjustment process. Therefore, the server device 100 inputs the 3D model after the adjustment process and conversion process into the makeup procedure estimation model M.
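- the overall flow up to this point (fit both face images to the 3DMM, unify the lighting conditions, transfer the face shape of the person P1 to the fit of the makeup model Px, and then query the makeup procedure estimation model M) can be summarized by a sketch such as the following; the helper callables stand in for the processing of the adjustment, conversion, and estimation stages and are assumptions, not an actual implementation.

```python
from typing import Callable, List

def estimate_makeup_procedure(
    object_image,                   # face image IM1 of person P1 (pre-makeup)
    reference_image,                # face image IM2 of makeup model Px (target state)
    fit_3dmm: Callable,             # image -> 3DMM fit (3D face model)
    unify_lighting: Callable,       # 3DMM fit -> fit with lighting removed/unified
    transfer_face_shape: Callable,  # (user fit, reference fit) -> converted reference fit
    procedure_model: Callable,      # (user fit, converted fit) -> ordered list of steps
) -> List[str]:
    """Sketch of steps S1-S3: from two face images to an estimated work procedure."""
    fit_user = fit_3dmm(object_image)          # 3D face model of person P1
    fit_reference = fit_3dmm(reference_image)  # 3D face model of makeup model Px

    # Adjustment: make everything except the makeup comparable.
    fit_user = unify_lighting(fit_user)
    fit_reference = unify_lighting(fit_reference)

    # Conversion: give the reference fit the user's face shape while
    # keeping its made-up surface texture.
    fit_reference = transfer_face_shape(fit_user, fit_reference)

    # Estimation: the makeup procedure estimation model M outputs the steps.
    return procedure_model(fit_user, fit_reference)
```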
- FIG. 1 shows an example in which the server device 100 has estimated 10 steps of work procedures (some omitted) based on information output from the makeup procedure estimation model M.
- the example shows the server device 100 estimating the first step "putting in colored contact lenses," the second step “applying makeup base,” the third step “foundation + eye shadow,” ... the eighth step “drawing eyebrows,” the ninth step “putting on a wig,” and the tenth step “lips.”
- when the server device 100 has estimated the work procedure as shown in FIG. 1, it generates output information to be output (presented) to the person P1 based on the estimation result (step S4). Specifically, the server device 100 generates an instruction sentence SM to be presented to the person P1, and a 3D face model FM that reflects, as the work result, how the appearance of the person P1 will change if the work procedure is actually performed.
- the server device 100 generates an instruction sentence SM1 instructing the first step based on the first step, "Put in colored contact lenses.”
- the server device 100 also generates a 3D face model FM1 that reflects the change in appearance that occurs in the face of person P1 when performing the first step, for the 3D face model generated based on face image IM1.
- the server device 100 processes the 3D face model before makeup is applied, changing the appearance of person P1 to one in which colored contact lenses are worn.
- the server device 100 generates an instruction sentence SM2 instructing the second step based on the second step "apply makeup base.”
- the server device 100 also generates a 3D face model FM2 that further reflects the changes in appearance that will occur in the face of the person P1 when performing the work in the second step, from the 3D face model FM1 that reflects the changes in appearance up to the first step.
- the server device 100 further processes the 3D face model FM1 that reflects the colored contact lens state, changing the appearance of the person P1 to one in which makeup base is applied.
- the server device 100 generates an instruction sentence SM3 instructing the third step based on the third step, "foundation + eye shadow.”
- the server device 100 also generates a 3D face model FM3 that further reflects the changes in appearance that will occur in the face of the person P1 when the work in the third step is performed, from the 3D face model FM2 that reflects the changes in appearance up to the second step.
- the server device 100 further processes the 3D face model FM2 with a makeup base applied, changing the appearance of the person P1 to one with foundation and eye shadow applied.
- the server device 100 generates an instruction sentence SM8 instructing the eighth step based on the eighth step, "Draw eyebrows.”
- the server device 100 also generates a 3D face model FM8 that further reflects the changes in appearance that will occur in the face of person P1 when performing the eighth step, from a 3D face model FM7 (not shown) that reflects the changes in appearance up to the seventh step.
- the server device 100 further processes the 3D face model FM7 to change the appearance of person P1 to one with drawn eyebrows.
- the server device 100 generates an instruction sentence SM9 instructing the ninth step based on the ninth step, "Put on a wig.”
- the server device 100 also generates a 3D face model FM9 that further reflects the changes in appearance that will occur in the face of person P1 when performing the ninth step, from the 3D face model FM8 that reflects the changes in appearance up to the eighth step.
- the server device 100 further processes the 3D face model FM8 with drawn eyebrows, thereby changing the appearance of person P1 to one wearing a wig.
- the server device 100 generates an instruction sentence SM10 instructing the tenth step based on the tenth step "lips".
- the server device 100 also generates a 3D face model FM10 that further reflects the changes in appearance that will occur in the face of person P1 when working on the tenth step, from the 3D face model FM9 that reflects the changes in appearance up to the ninth step.
- the server device 100 further processes the 3D face model FM9 in a state where a wig is worn, thereby changing the appearance of person P1 to a state where lipstick is applied.
- the server device 100 controls the output so that the output information generated in step S4 is output to the user device 10 of the person P1 (step S5).
- the server device 100 may output the instruction sentence and the 3D face model FM in a corresponding state for each work procedure.
- the server device 100 outputs the instruction sentence SM1 and the 3D face model FM1 in a corresponding state.
- the server device 100 also outputs the instruction sentence SM2 and the 3D face model FM2 in a corresponding state.
- the server device 100 also outputs the instruction sentence SM3 and the 3D face model FM3 in a corresponding state.
- the person P1 can view the 3D face model FM from various viewpoints using the user device 10.
- the person P1 can freely rotate the 3D face model FM using the user device 10.
- FIG. 2 is a diagram showing an example of a system according to an embodiment.
- Fig. 2 shows a system 1 as an example of a system according to an embodiment. Information processing according to the embodiment is realized in the system 1.
- system 1 includes a user device 10, a learning device 30, and a server device 100. Furthermore, user device 10, learning device 30, and server device 100 are connected via network N so as to be able to communicate with each other via wired or wireless communication. There is no limit to the number of user devices 10, learning devices 30, and server devices 100 included in system 1.
- the user device 10 is an information processing terminal used by a person who wishes to receive guidance in changing the appearance of a first object to an appearance based on a second object different from the first object, i.e., a target appearance.
- the user device 10 is a smartphone, a wearable device, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, a PDA (Personal Digital Assistant), etc.
- An application that enables the transmission and reception of information between the user device 10 and the server device 100 may be installed in the user device 10.
- the learning device 30 learns an estimation model for estimating a processing procedure for changing the appearance of the first object to a target appearance.
- the estimation model may be learned using various known machine learning techniques as appropriate.
- the estimation model may be learned using a machine learning technique for supervised learning, such as SVM (Support Vector Machine).
- the estimation model may also be learned using a machine learning technique for unsupervised learning.
- the estimation model may also be learned using a deep learning technique.
- the estimation model may also be learned using various deep learning techniques, such as DNN (Deep Neural Network), RNN (Recurrent Neural Network), and CNN (Convolutional Neural Network).
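- purely as an illustration (the disclosure does not fix any particular architecture), a small convolutional network that scores the next work step from a pair of "current appearance" and "target appearance" images could look like the following PyTorch sketch; the channel sizes and the fixed step vocabulary are assumptions.

```python
import torch
import torch.nn as nn

class MakeupStepEstimator(nn.Module):
    """Toy CNN: concatenated (current, target) face images -> next-step logits."""

    def __init__(self, num_steps: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, stride=2, padding=1),  # two RGB images stacked
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_steps)  # one logit per candidate work step

    def forward(self, current_img: torch.Tensor, target_img: torch.Tensor) -> torch.Tensor:
        x = torch.cat([current_img, target_img], dim=1)  # (B, 6, H, W)
        x = self.features(x).flatten(1)                  # (B, 64)
        return self.classifier(x)                        # (B, num_steps)

# Example usage with dummy UV-map-sized inputs.
model = MakeupStepEstimator()
logits = model(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
```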
- the server device 100 is a cloud computer that plays a central role in performing information processing according to the embodiment.
- the server device 100 inputs image information (e.g., an original image and a reference image) acquired via the user device 10 to a machine learning model generated by the learning device 30, and estimates a processing procedure for changing the state shown in the original image to the target state shown in the reference image based on the output information from the model.
- Fig. 3 is a diagram showing an example of the configuration of the server device 100 according to the embodiment.
- the server device 100 includes a communication unit 110, a storage unit 120, and a control unit 130.
- the communication unit 110 is realized by, for example, a network interface card (NIC) etc.
- the communication unit 110 is connected to the network N by wire or wirelessly, and transmits and receives information between the user device 10 and the learning device 30.
- the storage unit 120 is realized by, for example, a semiconductor memory element such as a random access memory (RAM) or a flash memory, or a storage device such as a hard disk, an optical disk, etc.
- the storage unit 120 has a model data storage unit 121, an image data storage unit 122, and an estimation procedure data storage unit 123.
- the model data storage unit 121 stores data of an estimation model for estimating a processing procedure for changing a given state of a first object to a target state.
- the image data storage unit 122 stores various image data used in the information processing according to the embodiment.
- the estimation procedure data storage unit 123 stores a processing procedure for changing a given state of the first object to a target state, and data generated based on the processing procedure.
- (Control unit 130) The control unit 130 is realized by a central processing unit (CPU), a micro processing unit (MPU), or the like executing various programs (e.g., the information processing program according to the embodiment) stored in a storage device inside the server device 100 using a RAM as a working area.
- the control unit 130 is also realized by an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
- control unit 130 has an image acquisition unit 131, an adjustment unit 132, a conversion unit 133, an estimation unit 134, a generation unit 135, and an output control unit 136, and realizes or executes the functions and actions of the information processing described below.
- the internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3, and may be other configurations as long as they perform the information processing described below.
- the connection relationships between the processing units in the control unit 130 are not limited to the connection relationships shown in FIG. 3, and may be other connection relationships.
- the image acquisition unit 131 acquires an image input by a user as an image to be used for estimating a processing procedure. For example, the image acquisition unit 131 acquires an object image which is an image of a first object in a specific appearance state. The image acquisition unit 131 also acquires an image of a second object having an appearance targeted by the first object as a reference image. The image acquisition unit 131 also stores the object image and the reference image in the image data storage unit 122.
- the adjustment unit 132 adjusts the object image and the reference image so as to unify conditions other than the appearance of the first object shown in the object image and the target appearance shown in the reference image. For example, the adjustment unit 132 removes the information of the lighting environment estimated based on the object image from the object image and removes the information of the lighting environment estimated based on the reference image from the reference image, thereby unifying the lighting environment conditions between the object image and the reference image. Furthermore, the adjustment unit 132 may unify the lighting environment conditions between the object image and the reference image by correcting the reference image using the information of the lighting environment estimated based on the object image, in a state in which the information of the lighting environment estimated based on the reference image has been removed from the reference image (see the sketch below).
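- one common way to realize this kind of lighting removal in a 3DMM pipeline, assumed here for illustration rather than stated in the disclosure, is to model shading with low-order spherical harmonics and divide it out of the observed facial texture; re-lighting with the coefficients estimated from the object image then unifies the conditions.

```python
import numpy as np

def sh_basis(normals: np.ndarray) -> np.ndarray:
    """First 9 real spherical-harmonic basis functions for unit normals (N, 3) -> (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z ** 2 - 1.0),
        1.092548 * x * z, 0.546274 * (x ** 2 - y ** 2),
    ], axis=1)

def remove_lighting(observed_tex: np.ndarray, normals: np.ndarray,
                    sh_coeffs: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Divide out the estimated shading to approximate a lighting-free texture.

    observed_tex: (N, 3) per-vertex RGB texture of the fitted 3D face model.
    sh_coeffs:    (9, 3) estimated lighting coefficients, one column per color channel.
    """
    shading = sh_basis(normals) @ sh_coeffs            # (N, 3) per-vertex shading
    return observed_tex / np.clip(shading, eps, None)

def apply_lighting(albedo: np.ndarray, normals: np.ndarray,
                   sh_coeffs: np.ndarray) -> np.ndarray:
    """Re-light a texture with lighting estimated from the object image (person P1 side)."""
    return albedo * (sh_basis(normals) @ sh_coeffs)
```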
- the conversion unit 133 generates a converted image by converting one of the object image and the reference image based on the other. For example, the conversion unit 133 generates a converted image by converting the reference image so that feature information of the second object extracted from the reference image matches feature information of the first object extracted from the object image. Taking the support of face makeup work as an example, the conversion unit 133 converts the reference image so that the face shape of the makeup model extracted from the reference image matches the face shape of the user extracted from the object image (see the shape-transfer sketch below).
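- with a parametric face model, matching the face shape of the second object to that of the first object while keeping the made-up texture can be as simple as swapping the shape-related coefficients of the two fits; the sketch below assumes the hypothetical FittedFace3DMM container introduced earlier and is only illustrative.

```python
from dataclasses import replace  # works with the FittedFace3DMM dataclass sketched earlier

def transfer_face_shape(fit_user, fit_reference):
    """Return a copy of the reference (makeup model Px) fit whose face shape
    matches the user (person P1), while its made-up texture is kept."""
    return replace(
        fit_reference,
        shape_coeffs=fit_user.shape_coeffs.copy(),
        expression_coeffs=fit_user.expression_coeffs.copy(),
    )
```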
- the estimation unit 134 estimates a processing procedure for changing the appearance of the first object to an appearance based on the second object, based on the converted image generated by the conversion unit 133 and the object image. Specifically, the estimation unit 134 estimates a processing procedure for changing the appearance of the first object to the target appearance that the first object is aiming for, based on output information from an estimation model that receives as input a pair of the converted image and the object image. The target appearance is possessed by a second object different from the first object.
- the generation unit 135 generates, as an output image, an image of the first object whose appearance has changed according to the processing procedure estimated by the estimation unit 134, based on the object image. For example, the generation unit 135 generates, as the output image, an image in which the appearance of the first object according to the processing procedure is reflected as a work result.
- the generation unit 135 also generates, as output information to be output together with the output image, an instruction sentence instructing the user to perform a task in accordance with the processing procedure.
- the generation unit 135 may also generate, as output information, a detailed sentence that explains the content of the instruction sentence in more detail, based on a predetermined language model and the instruction sentence.
- the output control unit 136 presents to the user the output information generated by the generation unit 135. Specifically, the output control unit 136 controls the output so that the output information generated by the generation unit 135 is output to the user device 10.
- the instruction text and detailed text may be output in text format or audio format.
- Fig. 4 is a diagram showing an example of the configuration of the learning device 30 according to the embodiment.
- the learning device 30 includes a communication unit 31, a storage unit 32, and a control unit 33.
- the communication unit 31 is realized by, for example, a network interface card (NIC), etc.
- the communication unit 31 is connected to the network N by wire or wirelessly, and transmits and receives information to and from the server device 100.
- the storage unit 32 is realized by, for example, a semiconductor memory element such as a random access memory (RAM) or a flash memory, or a storage device such as a hard disk or an optical disk.
- the storage unit 32 has a model data storage unit 32a.
- the model data storage unit 32a stores data of an estimation model for estimating a processing procedure for changing the appearance of a first object to a target appearance.
- (Control unit 33) The control unit 33 is realized by a CPU, an MPU, or the like executing various programs (e.g., an information processing program according to the embodiment) stored in a storage device inside the learning device 30 using a RAM as a working area.
- the control unit 33 is realized by an integrated circuit such as an ASIC or an FPGA.
- control unit 33 has an acquisition unit 33a, a generation unit 33b, and a learning unit 33c, and realizes or executes the functions and actions of the information processing described below.
- the internal configuration of the control unit 33 is not limited to the configuration shown in FIG. 4, and may be other configurations as long as they perform the information processing described below.
- the connection relationships between the processing units in the control unit 33 are not limited to the connection relationships shown in FIG. 4, and may be other connection relationships.
- the acquisition unit 33a acquires information constituting the learning data.
- the acquisition unit 33a acquires a video including images of an action of changing the appearance of a predetermined object related to the first object into the appearance of a completed state, and speech information within the video explaining the content of the action.
- the generation unit 33b generates learning data by combining a video consisting of images of an action that changes the appearance of a specified object related to the first object to the appearance of a completed state with speech information within the video that explains the content of the action.
- the learning unit 33c uses the learning data to train a model to learn the relationship between images before and after a change in appearance of a predetermined object and an action caused by the change in appearance. For example, when a pair of a transformed image and an object image is input, the learning unit 33c trains the model to learn the above relationship so as to output information on a processing procedure for changing the appearance of a first object to a target appearance, thereby generating an estimation model.
- the target appearance is defined by the transformed image.
- FIG. 5 is an explanatory diagram explaining the preprocessing method in the learning phase.
- FIG. 5 shows a scene in which learning data is generated using a video VD consisting of a group of images of makeup movements that change an arbitrary person Py from a pre-makeup state to a made-up state, and speech information within the video VD that explains the contents of the makeup movements.
- FIG. 5 shows an example in which learning data is generated based on a makeup video with audio.
- the video VD is acquired by the acquisition unit 33a and processed into learning data by the generation unit 33b.
- the generation unit 33b generates learning data using the video VD, which is composed of a group of images of makeup movements that change an arbitrary person Py from a pre-makeup state to a made-up state, and speech information in the video VD that explains the contents of the makeup movements.
- the generation unit 33b extracts, from among the actions performed in the video VD, actions necessary for facial makeup (actions to bring the makeup to a completed state) in association with the timestamps at which the actions necessary for facial makeup were performed.
- the generation unit 33b may extract pairs of actions necessary for facial makeup and timestamps at which the actions necessary for facial makeup were performed by analyzing speech information (audio data) included in the video VD.
- subtitle information in which the speech information (audio data) included in the video VD is transcribed and timestamps may be inserted in advance as metadata into the video VD. In such cases, the generation unit 33b can extract pairs of actions necessary for facial makeup and timestamps at which the actions necessary for facial makeup were performed based on the metadata.
- FIG. 5 shows an example in which the generation unit 33b extracts a pair of play time "3:50" and a necessary action "skin care” as a pair of a timestamp and an action required for facial makeup. It also shows an example in which the generation unit 33b extracts a pair of play time “4:55” and a necessary action “base” as a pair of a timestamp and an action required for facial makeup. It also shows an example in which the generation unit 33b extracts a pair of play time "7:20” and a necessary action "foundation” as a pair of a timestamp and an action required for facial makeup.
- the generation unit 33b searches within the video VD for a range that corresponds to the extracted necessary action for each necessary action.
- the generation unit 33b searches for a range in the video VD where the necessary action "skin care” was actually performed, based on the pair of playback time "3:50" and the necessary action "skin care".
- Figure 5 shows an example in which the generation unit 33b searches for video range RA1 as the range in the video VD where the necessary action "skin care" was actually performed.
- the generation unit 33b also searches for a range in the video VD where the necessary action "base" was actually performed, based on the pair of the playback time "4:55" and the necessary action "base."
- Figure 5 shows an example in which the generation unit 33b searches for a video range RA2 as a range in the video VD where the necessary action "base" was actually performed.
- the generation unit 33b also searches for a range in the video VD where the necessary action "foundation" was actually performed, based on the pair of the playback time "7:20" and the necessary action "foundation."
- Figure 5 shows an example in which the generation unit 33b searches for a video range RA3 as a range in the video VD where the necessary action "foundation" was actually performed.
- the generation unit 33b also searches for a range in the video VD where the necessary action "concealer” was actually performed, based on the pair of the playback time "7:38" and the necessary action "concealer.”
- Figure 5 shows an example in which the generation unit 33b searches for a video range RA4 as a range in the video VD where the necessary action "concealer” was actually performed.
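- the preprocessing illustrated in FIG. 5 can be sketched as follows for the subtitle-metadata case: parse (timestamp, action) pairs from the transcribed lines and treat each required action as running until the next extracted timestamp; the subtitle format and the keyword list are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical keywords that mark actions required for facial makeup.
REQUIRED_ACTIONS = ("skin care", "base", "foundation", "concealer", "eyebrows", "lips")

def parse_timestamp(ts: str) -> int:
    """'7:20' -> seconds."""
    minutes, seconds = ts.split(":")
    return int(minutes) * 60 + int(seconds)

def extract_action_ranges(subtitles: List[Tuple[str, str]],
                          video_length_s: int) -> List[Tuple[str, int, int]]:
    """subtitles: list of (timestamp, transcribed line) pairs taken from the video metadata.

    Returns (action, start_s, end_s) triples, each range ending where the next
    required action begins (a simple stand-in for the range search in FIG. 5).
    """
    hits = []
    for ts, line in subtitles:
        for action in REQUIRED_ACTIONS:
            if action in line.lower():
                hits.append((action, parse_timestamp(ts)))
                break
    hits.sort(key=lambda h: h[1])
    ranges = []
    for i, (action, start) in enumerate(hits):
        end = hits[i + 1][1] if i + 1 < len(hits) else video_length_s
        ranges.append((action, start, end))
    return ranges

# Example mirroring FIG. 5: "skin care" at 3:50, "base" at 4:55, "foundation" at 7:20.
print(extract_action_ranges(
    [("3:50", "Start with skin care"), ("4:55", "Now the base"), ("7:20", "Apply foundation")],
    video_length_s=8 * 60,
))
```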
- FIG. 6 is an explanatory diagram for explaining the learning method in the learning phase.
- the example in FIG. 5 is still used.
- FIG. 6 shows a scene in which learning data is generated and learning processing is performed based on the pair of the playback time "17:30" and the required action "lip".
- the generation unit 33b obtains a representative frame image FL12 showing the face from a video range RA12 in which the required action "lips" was actually performed in the video VD.
- the generation unit 33b also obtains a representative frame image FL11 showing the face from a video range RA11 in which the required action "eyebrows", which is one action before the required action "lips", was actually performed. In this way, the generation unit 33b obtains representative frames showing the face before and after each action required for facial makeup.
- the generating unit 33b generates a combination of the frame image FL11, the frame image FL12, and the action resulting from the state change from the appearance of the person Py shown in the frame image FL11 (appearance before applying lipstick) to the appearance of the person Py shown in the frame image FL12 (appearance after applying lipstick), i.e., the required action "lips," as one piece of learning data.
- the learning unit 33c trains the model to learn the relationship between frame image FL11, frame image FL12, and the required action "lips.” For example, the learning unit 33c learns that the task of "applying lipstick” is required to change the appearance shown in frame image FL11 (appearance before applying lipstick) to the appearance shown in frame image FL12 (appearance after applying lipstick). Note that the learning unit 33c may also learn the difference from the completed makeup state by using frame images of the completed makeup state as learning data.
- although FIG. 6 focuses on the required action "lips," learning data is generated in the same way for the change in appearance before and after each extracted required action. For example, the generation unit 33b acquires a representative frame image FL2 of facial features from the video range RA2 in which the required action "base" was actually performed in the video VD. The generation unit 33b also acquires a representative frame image FL1 of facial features from the video range RA1 in which the required action "skin care," which is one action before the required action "base," was actually performed.
- the generating unit 33b generates a combination of frame image FL1, frame image FL2, and an action resulting from the change in appearance of the person Py shown in frame image FL1 (appearance before the base is applied) to the appearance of the person Py shown in frame image FL2 (appearance after the base is applied), i.e., the required action "base", as one piece of learning data.
- the learning unit 33c also trains the model to learn the relationship between frame image FL1, frame image FL2, and the required action "base." For example, the learning unit 33c learns that the task of "applying the base" is required to change the appearance shown in frame image FL1 (appearance before the base is applied) to the appearance shown in frame image FL2 (appearance after the base is applied).
- the learning unit 33c learns the change in appearance before and after each required action from a large number of videos VD, rather than from a single video VD. As a result, the learning unit 33c can generate an estimation model with higher accuracy.
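- a minimal sketch of this training step, assuming the toy MakeupStepEstimator sketched earlier and pre-extracted representative frame tensors (tensor shapes and optimizer choice are assumptions):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, frame_before, frame_after, action_ids):
    """One gradient step: learn that `action_ids` changes `frame_before` into `frame_after`.

    frame_before / frame_after: (B, 3, H, W) representative frames (e.g., FL11 / FL12).
    action_ids: (B,) integer ids of the required action (e.g., "lips").
    """
    logits = model(frame_before, frame_after)  # (B, num_steps)
    loss = F.cross_entropy(logits, action_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with dummy data, reusing the MakeupStepEstimator from the earlier sketch.
model = MakeupStepEstimator(num_steps=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = train_step(model, optimizer,
                  torch.randn(4, 3, 128, 128), torch.randn(4, 3, 128, 128),
                  torch.randint(0, 32, (4,)))
```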
- a method of estimating a work procedure, which is realized by the information processing according to the embodiment, will be described with reference to Fig. 7 and Fig. 8.
- Fig. 7 continues to use the example content in Fig. 1, and describes an adjustment processing method for unifying conditions between face image IM1 (object image) and face image IM2 (reference image), and a conversion processing method for matching human features between face image IM1 and face image IM2.
- Fig. 8 describes a method of estimating a work procedure and a method of outputting information based on the estimation result.
- Fig. 7 is a diagram (1) showing an example of an estimation method realized by the information processing according to the embodiment.
- Fig. 7 shows an example in which a person P1 (user) inputs a face image IM1 of his/her own face in a pre-makeup state (appearance before makeup) to the server device 100 using the user device 10.
- Fig. 7 also shows an example in which the person P1 inputs a face image IM2 of a person in a makeup state (target appearance) that the person P1 is aiming for, i.e., a makeup model Px, to the server device 100 as a reference image.
- a makeup state target appearance
- the image acquisition unit 131 acquires a facial image IM1 in response to an image input by the person P1 (step S101).
- the generation unit 135 applies the facial image IM1 to a three-dimensional prediction model (e.g., 3DMM) to generate a 3D facial model FMx of the person P1 (step S102).
- the facial surface texture in the 3D facial model FMx includes the appearance features before makeup and the skin base color (skin color) of the person P1.
- the facial surface texture in the 3D facial model FMx is also affected by the light source (e.g., shadow information) used in the space where the facial image IM1 was captured.
- the adjustment unit 132 estimates information about the lighting environment in the space in which the facial image IM1 was captured based on the 3D face model FMx (step S103). For example, the adjustment unit 132 may estimate, as information about the lighting environment, the intensity of light from a light source used in the capture space of the facial image IM1, the angle at which light is irradiated from the light source to the person P1, etc.
- the adjustment unit 132 removes the information on the lighting environment estimated in step S103 from the 3D face model FMx (step S104). For example, the adjustment unit 132 removes the influence of the light source used in the shooting space of the face image IM1 on the appearance of the face image IM1 based on the information on the lighting environment. As a result, by removing the influence of the light source, the adjustment unit 132 can obtain a 3D face model FMxx as the face surface texture in which the appearance features before makeup and the base skin color of the person P1 are reflected in a state and color that are close to the real thing.
- the image acquisition unit 131 also acquires a facial image IM2 in response to an image input by the person P1 (step S201).
- the generation unit 135 then applies the facial image IM2 to a three-dimensional prediction model (e.g., 3DMM) to generate a 3D facial model FMy of the makeup model Px (step S202).
- the facial surface texture in the 3D facial model FMy includes features of the post-makeup state.
- the facial surface texture in the 3D facial model FMy is in a state where it is influenced by the light source (e.g., shadow information) used in the space where the facial image IM2 is captured.
- the adjustment unit 132 estimates information about the lighting environment in the space in which the facial image IM2 was captured based on the 3D facial model FMy (step S203). For example, the adjustment unit 132 may estimate, as information about the lighting environment, the intensity of light from a light source used in the capture space of the facial image IM2, the angle at which light is irradiated from the light source to the makeup model Px, etc.
- the adjustment unit 132 removes the information on the lighting environment estimated in step S203 from the 3D face model FMy (step S204). For example, the adjustment unit 132 removes the influence of the light source used in the shooting space of the face image IM2 on the appearance of the face image IM2 based on the information on the lighting environment. As a result, by removing the influence of the light source, the adjustment unit 132 can obtain a 3D face model FMyy as the face surface texture in which the appearance features of the made-up state are reflected in a state and color that are close to the real thing.
- steps S104 and S204, which remove the lighting environment information from both 3D face models, are adjustment processes for unifying conditions (lighting conditions) other than the pre-makeup and post-makeup states between the person P1 and the makeup model Px.
- the adjustment unit 132 may perform an adjustment process for matching the lighting conditions on the makeup model Px side to the lighting conditions on the person P1 side, rather than simply removing the lighting environment information.
- the adjustment unit 132 may correct the 3D face model FMy using the lighting environment information estimated in step S103 (lighting conditions on the person P1 side) in a state in which the lighting environment information estimated in step S203 (lighting conditions on the makeup model Px side) is removed from the 3D face model FMy. More specifically, the adjustment unit 132 applies the lighting environment information estimated in step S103 to the 3D face model FMy from which the lighting environment information estimated in step S203 has been removed, thereby correcting the facial surface texture of the 3D face model FMy so that the facial surface texture of the 3D face model FMy corresponds to the lighting conditions on the person P1 side.
- the conversion unit 133 extracts facial feature information (step S305). Specifically, the conversion unit 133 extracts facial feature information of the person P1 from the 3D face model FMxx, and extracts facial feature information of the makeup model Px from the 3D face model FMyy. For example, the conversion unit 133 may extract facial shape information of the person P1 from the 3D face model FMxx, and extract facial shape information of the makeup model Px from the 3D face model FMyy.
- the facial shape information may include not only information indicating the facial contour, but also information indicating the concavities and convexities of the face (for example, nose shape, nose height, lip shape, lip thickness, etc.).
- the conversion unit 133 converts the 3D face model FMyy so that the facial feature information of the makeup model Px matches the facial feature information of the person P1 based on the feature information extracted in step S305 (step S306).
- the conversion unit 133 converts the shape of the 3D face model FMyy so that the facial shape of the makeup model Px matches the facial shape of the person P1.
- the conversion unit 133 can obtain a 3D face model FMyyx in which the facial feature information of the makeup model Px matches the facial feature information of the person P1.
- the conversion unit 133 may also perform UV mapping (step S307). Specifically, the conversion unit 133 performs UV mapping on the 3D face model FMxx to obtain a 2D face image UVG1 as a two-dimensional UV map. The conversion unit 133 also performs UV mapping on the 3D face model FMyyx to obtain a 2D face image UVG2 as a two-dimensional UV map.
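- the UV mapping step can be sketched as a simple splat of per-vertex colors into a fixed-size 2D UV image, assuming the face model supplies a UV coordinate for every vertex; proper triangle rasterization is omitted for brevity, so this is only an illustration of how the 2D face images UVG1 and UVG2 could be obtained.

```python
import numpy as np

def uv_map_texture(vertex_colors: np.ndarray, vertex_uvs: np.ndarray,
                   size: int = 256) -> np.ndarray:
    """Splat per-vertex RGB colors (N, 3) into a (size, size, 3) UV image.

    vertex_uvs: (N, 2) UV coordinates in [0, 1] supplied by the face model topology.
    A real implementation would rasterize the mesh triangles; nearest-pixel
    splatting is enough to illustrate the idea.
    """
    uv_image = np.zeros((size, size, 3), dtype=np.float32)
    cols = np.clip((vertex_uvs[:, 0] * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip(((1.0 - vertex_uvs[:, 1]) * (size - 1)).astype(int), 0, size - 1)
    uv_image[rows, cols] = vertex_colors
    return uv_image

# Example with five dummy vertices.
img = uv_map_texture(np.random.rand(5, 3), np.random.rand(5, 2))
```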
- the texture of the face surface other than the makeup is unified between the 2D face image UVG1 (person P1) and the 2D face image UVG2 (makeup model Px).
- the face shape and lighting conditions are unified between the 2D face image UVG1 and the 2D face image UVG2 obtained by the process up to this point, and simply, only the difference between the state before makeup and the state after makeup remains.
- the server device 100 can accurately extract only the difference in the face surface texture before and after makeup, and in addition to these comparisons, it becomes possible to accurately estimate the work procedure by using the skin base color as a hint.
- the method of estimation processing performed after step S307 will be described in FIG. 8.
- the conversion unit 133 may perform the reverse process rather than converting the 3D face model FMyy so as to match the facial feature information of the makeup model Px to the facial feature information of the person P1. Specifically, the conversion unit 133 may convert the 3D face model FMxx so as to match the facial feature information of the person P1 to the facial feature information of the makeup model Px.
- [7-2. Information processing method (2)] Fig. 8 is a diagram (2) showing an example of an estimation method realized by the information processing according to the embodiment.
- the estimation unit 134 inputs a UV map or a three-dimensional face model to the makeup procedure estimation model M (step S401).
- the estimation unit 134 inputs the 2D face image UVG1 and the 2D face image UVG2 generated in step S307 of FIG. 7 to the makeup procedure estimation model M.
- the estimation unit 134 may further input a three-dimensional face model to the makeup procedure estimation model M.
- the estimation unit 134 may input a set of the 2D face image UVG1 and the 3D face model FMxx and a set of the 2D face image UVG2 and the 3D face model FMyyx to the makeup procedure estimation model M.
- the estimation unit 134 may adopt a method in which the three-dimensional face model is input while the UV map is not input.
- the estimation unit 134 may simply input the 3D face model FMxx and the 3D face model FMyyx to the makeup procedure estimation model M.
- the learning method of the makeup procedure estimation model M is as described with reference to FIGS. 5 and 6.
- the estimation unit 134 estimates the makeup procedure for changing the pre-makeup state shown in the facial image IM1 to the target makeup state shown in the facial image IM2 (step S402).
- FIG. 8 shows an example in which the estimation unit 134 estimates the first step "putting in colored contact lenses," the second step “applying makeup base,” the third step “foundation + eye shadow,” ... the eighth step “drawing eyebrows,” the ninth step “putting on a wig,” and the tenth step “lips.”
- the estimation unit 134 does not necessarily estimate a 10-step work procedure. For example, when a face image IM1 with makeup partially applied is input, rather than a face image IM1 without makeup applied, the estimation unit 134 may estimate a work procedure consisting of fewer steps. Furthermore, depending on the base skin color of person P1, the estimation unit 134 may estimate a work procedure consisting of more steps. The number of steps and the contents of the work procedure can change with the situation in this way because the information processing according to the embodiment is not simply rule-based estimation, but uses a machine learning model to bring the user's appearance closer to the target state (see the decoding sketch below).
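- one way to obtain such a variable number of steps, assumed here for illustration, is to decode the work procedure sequentially: predict a step, update the current appearance with its result, and stop when the model emits an end-of-procedure signal; the callables and the end token are hypothetical.

```python
from typing import Callable, List

END_OF_PROCEDURE = "done"

def decode_work_procedure(
    current_appearance,           # e.g., 2D face image UVG1 of person P1
    target_appearance,            # e.g., 2D face image UVG2 of makeup model Px
    predict_next_step: Callable,  # (current, target) -> step name, wrapping model M
    apply_step: Callable,         # (current, step) -> appearance after the step
    max_steps: int = 20,
) -> List[str]:
    """Greedy sequential decoding of the work procedure (sketch)."""
    steps: List[str] = []
    for _ in range(max_steps):
        step = predict_next_step(current_appearance, target_appearance)
        if step == END_OF_PROCEDURE:
            break  # target makeup state reached
        steps.append(step)
        # Update the appearance so the next prediction starts from this step's result,
        # mirroring how FM1, FM2, ... are built on top of one another.
        current_appearance = apply_step(current_appearance, step)
    return steps
```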
- the generation unit 135 also generates an instruction statement SM that instructs the user to perform the work procedure based on the work procedure estimated in step S402 (step S403).
- FIG. 8 shows an example in which the generation unit 135 generates an instruction statement SM1 that instructs the user to perform the first procedure based on the first procedure, "Put in colored contact lenses.”
- the generating unit 135 also generates a 3D face model FM for each work procedure estimated in step S402, in which the appearance of the person P1 after the change is reflected as the work result when the work procedure indicated in the work procedure is actually performed (step S404).
- the generation unit 135 generates a 3D face model FM1 that reflects the change in appearance that occurs in the appearance of the face of the person P1 when the work is performed in the first step for the 3D face model FMxx.
- the generation unit 135 processes the 3D face model FMxx in a state before makeup, thereby changing the appearance of the person P1 to a state in which the person P1 is wearing colored contact lenses.
- the generating unit 135 also generates a 3D face model FM2 that further reflects the changes in appearance that will occur in the face of the person P1 when working in the second step, from the 3D face model FM1 that reflects the changes in appearance up to the first step. Specifically, the generating unit 135 further processes the 3D face model FM1 that reflects the colored contact lens state, changing the appearance of the person P1 to one in which they are wearing a makeup base.
- the generating unit 135 also generates a 3D face model FM3 that further reflects the changes in appearance that will occur in the face of the person P1 when the work is performed in the third step, from the 3D face model FM2 that reflects the changes in appearance up to the second step. Specifically, the generating unit 135 further processes the 3D face model FM2 in a state where a makeup base has been applied, changing the appearance of the person P1 in a state where they have applied foundation and eye shadow.
- the generating unit 135 generates a 3D face model FM8 that further reflects the changes in appearance that will occur in the face of the person P1 when the work is performed in the eighth step from the 3D face model FM7 that reflects the changes in appearance up to the seventh step.
- the server device 100 further processes the 3D face model FM7 to change the appearance of the person P1 to a state in which the person has drawn eyebrows.
- the generating unit 135 also generates a 3D face model FM9 that further reflects the changes in appearance that will occur in the face of the person P1 when the ninth step is performed on the 3D face model FM8 that reflects the changes in appearance up to the eighth step. Specifically, the generating unit 135 further processes the 3D face model FM8 with drawn eyebrows to change the appearance of the person P1 to one in which he or she is wearing a wig.
- the generating unit 135 also generates a 3D face model FM10 that further reflects the changes in appearance that will occur in the face of the person P1 when the work is performed in the tenth step, from the 3D face model FM9 that reflects the changes in appearance up to the ninth step. Specifically, the generating unit 135 further processes the 3D face model FM9 in a state where a wig is worn, thereby changing the appearance of the person P1 to a state where he or she is wearing lipstick.
- the generation unit 135 may input the instruction SM generated in step S403 into the large-scale language model LLM (step S405) and, based on the output information, further generate a detailed instruction that explains the content of the instruction SM in more detail (step S406). For example, based on the instruction SM1 and the large-scale language model, the generation unit 135 can generate a detailed instruction that follows the sequence: "Place the lens on the tip of the index finger of your dominant hand” -> "Pull down the lower eyelid with the middle finger of your dominant hand” -> “Once the lens is properly placed on the pupil, slowly release the finger that was holding down the eyelid.”
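- as a hedged sketch of this step (the disclosure does not specify a particular language model or API), the detailed instruction can be obtained by prompting whatever large-scale language model is available and splitting its answer into ordered sub-steps; the prompt wording and the callable interface are assumptions.

```python
from typing import Callable, List

def generate_detailed_instructions(instruction_sm: str,
                                   llm_complete: Callable[[str], str]) -> List[str]:
    """Expand an instruction sentence SM into ordered, detailed sub-steps.

    llm_complete is a stand-in for whichever large-scale language model LLM is used;
    the prompt text is an assumption for illustration.
    """
    prompt = (
        "Break the following makeup instruction into short, ordered sub-steps, "
        "one per line, for a beginner:\n" + instruction_sm
    )
    response = llm_complete(prompt)
    return [line.strip() for line in response.splitlines() if line.strip()]

# Example with a dummy model that simply echoes the FIG. 8 sub-steps.
demo = generate_detailed_instructions(
    "Put in colored contact lenses.",
    lambda _prompt: ("Place the lens on the tip of the index finger of your dominant hand\n"
                     "Pull down the lower eyelid with the middle finger of your dominant hand\n"
                     "Once the lens sits on the pupil, slowly release the eyelid"),
)
```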
- the output control unit 136 controls the output so that output information associating the instruction SM with the 3D face model FM for each work procedure is output to the user device 10 of person P1 (step S407).
- the method of outputting the detailed instruction is not limited.
- the output control unit 136 may cause the user device 10 to output a detailed instruction corresponding to the selected instruction SM.
- the output control unit 136 may cause the detailed instruction to be displayed together with the instruction SM.
- the server device 100 may be implemented in various aspects other than the above embodiment. Hereinafter, the server device 100 according to the first modified example (Modification 1) of the present disclosure will be referred to as a "server device 100A."
- when the server device 100A detects an erroneous user action, it re-estimates the procedure for changing the current makeup state to the target state, starting from the current makeup state. This function may be newly incorporated into the server device 100 according to the embodiment described in FIG. 3, and the detailed functional configuration will be described below.
- Fig. 9 is a diagram showing an example of the configuration of the server device 100A according to the modified example 1.
- the server device 100A has a communication unit 110, a storage unit 120, and a control unit 130A.
- the communication unit 110 and the storage unit 120 are the same as those in Fig. 3, and therefore description thereof will be omitted.
- (Control unit 130A) The control unit 130A is realized by a CPU, an MPU, or the like executing various programs (e.g., the information processing program according to Modification 1) stored in a storage device inside the server device 100A using a RAM as a working area.
- the control unit 130A is also realized by an integrated circuit such as an ASIC or an FPGA.
- the control unit 130A has an image acquisition unit 131, an adjustment unit 132, a conversion unit 133, an estimation unit 134, a generation unit 135, an output control unit 136, and a detection unit 137, and realizes or executes the functions and actions of the information processing described below. In this way, compared to the server device 100, the control unit 130A newly has a detection unit 137.
- the internal configuration of the control unit 130A is not limited to the configuration shown in FIG. 9, and may be other configurations as long as they perform the information processing described below.
- the connection relationships of each processing unit in the control unit 130A are not limited to the connection relationships shown in FIG. 9, and may be other connection relationships.
- (Image acquisition unit 131) In the above embodiment, an example has been shown in which the image acquisition unit 131 acquires, as input information for the estimation process, one object image showing an original state before the appearance is changed toward a target state. For example, an example has been shown in which the image acquisition unit 131 acquires, as input information, a face image IM1 showing a state before makeup is applied as one still image captured by the imaging function of the user device 10.
- the image acquisition unit 131 may successively acquire object images in which actions that change the appearance of the first object are captured in real time.
- One such case is, for example, a scene in which a user uses the user device 10 to capture a video in real time of the user gradually applying makeup from an unapplied state.
- Another possible scene is one in which the user uses the user device 10 to capture still images or video of the user's makeup application process.
- the image acquisition unit 131 successively acquires object images captured in real time.
- the detection unit 137 detects an erroneous action of the user based on the object images sequentially acquired by the image acquisition unit 131. For example, the detection unit 137 may detect, as the erroneous action, a deviation between an action procedure actually performed to change the appearance of the first object and a processing procedure estimated so far.
- (Estimation unit 134) When an erroneous operation is detected by the detection unit 137, the estimation unit 134 re-estimates the processing procedure for changing the current appearance of the first object into the target appearance by using the latest object image among the object images acquired successively.
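- the interplay between the detection unit 137 and the re-estimation by the estimation unit 134 can be sketched as follows; the helper callables for identifying the current action and for re-estimating the procedure are assumptions that stand in for the processing described in FIG. 10 and FIG. 11.

```python
from typing import Callable, List, Optional

def check_for_erroneous_action(
    latest_object_image,        # newest face image IM1 captured in real time
    expected_steps: List[str],  # work procedure estimated so far
    completed_count: int,       # how many steps person P1 has finished
    identify_action: Callable,  # image -> name of the action currently being performed
    reestimate: Callable,       # latest image -> re-estimated work procedure
) -> Optional[List[str]]:
    """Sketch of detection unit 137 plus the re-estimation trigger.

    Returns a new work procedure when the observed action deviates from the
    estimated one, and None while the user is still on track.
    """
    if completed_count >= len(expected_steps):
        return None                                   # procedure already finished
    observed = identify_action(latest_object_image)   # step S1102
    expected = expected_steps[completed_count]        # step that should happen next
    if observed != expected:                          # deviation detected (step S1103)
        # Re-estimate from the *current* appearance toward the target appearance.
        return reestimate(latest_object_image)
    return None
```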
- Fig. 10 is a flowchart showing the procedure of the estimation process according to Modification 1.
- Fig. 10 assumes a usage scene in which person P1, while checking the work procedure estimated from a face image IM1 corresponding to the state before makeup and a face image IM2 corresponding to the makeup state targeted by person P1, additionally inputs a face image IM1 showing the makeup state at the current point of application.
- The image acquisition unit 131 determines whether or not a new face image IM1 has been acquired (step S1001). While the image acquisition unit 131 has not acquired a new face image IM1 (step S1001; No), it waits until a new face image IM1 can be acquired. On the other hand, when a face image IM1 captured in real time of person P1 applying makeup is input to the server device 100A by the user device 10 (during this time, person P1 applies makeup while looking at the work procedure presented by the server device 100A), the image acquisition unit 131 determines that a new face image IM1 has been acquired.
- The new face image IM1 may be a single still image or a moving image.
- When a new face image IM1 is acquired (step S1001; Yes), the estimation unit 134 estimates the work procedure for changing the current makeup state shown in the new face image IM1 to the target makeup state, based on the acquired new face image IM1 and the previously input face image IM2 (step S1002).
- More specifically, the estimation unit 134 estimates the work procedure by inputting the 2D face image UVG1 (or 3D face model FMxx) generated based on the new face image IM1 and the 2D face image UVG2 (or 3D face model FMyyx) generated based on the face image IM2 into the makeup procedure estimation model M. As described with reference to FIG. 7, adjustment processing by the adjustment unit 132 and conversion processing by the conversion unit 133 are performed before the UV maps (the 2D face image UVG1 and the 2D face image UVG2) are obtained.
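- As a rough illustration of how these stages could be chained, the following sketch wires stub versions of the adjustment, conversion (UV mapping), and estimation stages together. The names adjust_conditions, to_uv_map, and MakeupProcedureModel are hypothetical stand-ins for the units and model described above, not an API defined by the disclosure.

```python
def adjust_conditions(img_current, img_target):
    # Placeholder for the adjustment unit 132: remove estimated lighting and
    # align other conditions unrelated to the makeup appearance itself.
    return img_current, img_target


def to_uv_map(img):
    # Placeholder for the conversion unit 133: project the face image onto a
    # UV texture, i.e. the 2D face image (UVG1 / UVG2) described above.
    return img


class MakeupProcedureModel:
    # Placeholder for the makeup procedure estimation model M.
    def predict(self, uv_current, uv_target):
        return ["apply makeup base", "apply foundation", "draw eyebrows"]


def estimate_work_procedure(face_image_current, face_image_target, model):
    adjusted_current, adjusted_target = adjust_conditions(face_image_current,
                                                          face_image_target)
    uv_current = to_uv_map(adjusted_current)
    uv_target = to_uv_map(adjusted_target)
    return model.predict(uv_current, uv_target)   # estimation unit 134


steps = estimate_work_procedure("IM1 (current)", "IM2 (target)",
                                MakeupProcedureModel())
print(steps)
```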
- The generation unit 135 then generates output information to be presented to person P1 based on the work procedure estimated in step S1002 (step S1003). Specifically, the generation unit 135 generates, for each work step, an instruction statement SM and a 3D face model reflecting the result of performing that work step.
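- A minimal sketch of this per-step output generation (step S1003) follows, assuming each work step is a short text label and stubbing the rendering of the 3D face model; the wording of the instruction statement and the helper render_face_after_step are illustrative assumptions.

```python
def render_face_after_step(face_model, step):
    # Placeholder: a real implementation would re-render the 3D face model
    # with the result of performing `step` applied to it.
    return dict(face_model, last_applied=step)


def generate_output(work_steps, base_face_model):
    outputs, face_model = [], base_face_model
    for number, step in enumerate(work_steps, start=1):
        face_model = render_face_after_step(face_model, step)
        outputs.append({
            "instruction": f"Step {number}: {step}.",  # instruction statement SM
            "preview": face_model,                     # model after this step
        })
    return outputs


print(generate_output(["apply makeup base", "apply foundation"], {"person": "P1"}))
```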
- In this way, the server device 100A repeats the estimation process in response to input of a face image IM1 in the middle of makeup application. Then, if the server device 100A detects an erroneous action based on a comparison between the makeup actions actually performed by person P1 and the work procedure estimated up to that point, it presents the output information generated in step S1003 to person P1 as the re-estimation result. This point is explained in more detail with reference to FIG. 11.
- [8-3. Processing procedure (2)] Fig. 11 is a flowchart showing the procedure of the erroneous-action detection process according to Modification 1.
- The image acquisition unit 131 determines whether or not a new face image IM1 has been acquired (step S1101). If the image acquisition unit 131 has not acquired a new face image IM1 (step S1101; No), it waits until a new face image IM1 can be acquired.
- If a new face image IM1 is acquired (step S1101; Yes), the detection unit 137 performs image analysis on the newly acquired face image IM1 and identifies the makeup operation that is currently being performed (step S1102).
- The detection unit 137 then compares the work procedure estimated so far by the estimation unit 134 with the actual makeup operations identified in step S1102, and detects whether there is a discrepancy between the estimated work procedure and the actual makeup operations (step S1103). For example, the detection unit 137 may compare the earliest estimation result (i.e., the work procedure estimated based on the face image IM1 corresponding to the pre-makeup state and the face image IM2 corresponding to the makeup state targeted by person P1) with the actual makeup operations.
- If the detection unit 137 does not detect any discrepancy between the estimated work procedure and the actual makeup operations (step S1103; No), the process returns to step S1101.
- If a discrepancy between the estimated work procedure and the actual makeup operations is detected (step S1103; Yes), the output control unit 136 acquires output information corresponding to the currently acquired face image IM1 as information on the re-estimated work procedure (step S1104). Specifically, the output control unit 136 acquires the output information generated in the procedure of FIG. 10 using the new face image IM1 acquired in step S1101.
- The output control unit 136 then controls the output so that the acquired output information is output to the user device 10 of person P1 (step S1105).
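- Pulling steps S1101 to S1105 together, a minimal control-flow sketch could look like the following. Frame acquisition, action identification, re-estimation, and output delivery are stubbed (acquire_new_face_image, identify_makeup_action, and the callback parameters are hypothetical names), and only the branching mirrors the description above.

```python
import time


def acquire_new_face_image():
    # Placeholder for step S1101: return a new frame from the user device, or None.
    return None


def identify_makeup_action(image):
    # Placeholder for step S1102: image analysis of the current makeup action.
    return "apply makeup base"


def run_detection_loop(estimated_procedure, reestimate, send_to_user_device):
    performed = []
    while True:
        image = acquire_new_face_image()
        if image is None:                                      # S1101; No -> wait
            time.sleep(0.1)
            continue
        performed.append(identify_makeup_action(image))        # S1102
        if performed == estimated_procedure[:len(performed)]:  # S1103; No
            continue
        output_info = reestimate(image)                        # S1104: re-estimate
        send_to_user_device(output_info)                       # S1105: present result
```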
- Modification 2: Next, Modification 2 of the present disclosure will be described.
- Like the process according to Modification 1 of the present disclosure, the process according to Modification 2 may also be performed by the server device 100A.
- In Modification 2, when the detection unit 137 detects a deviation between the estimated work procedure and the actual makeup actions, it may dynamically determine a new target appearance based on the makeup actions that are actually being performed. For example, suppose the detection unit 137 compares the estimated work procedure with the actual makeup actions and detects a deviation in which the action of "applying makeup base" was performed while the action of "putting in colored contact lenses" was skipped. In such a case, the detection unit 137 may determine, for example, a "natural makeup" state that does not look unnatural even without "colored contact lenses" as the new target appearance. The detection unit 137 may determine the new target appearance according to a rule base, or may use a machine learning model to estimate a makeup state that suits the person without "colored contact lenses."
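- A minimal sketch of such a rule-based determination follows, with an invented rule table that maps a skipped step to a fallback target look; the disclosure only states that a rule base or a machine learning model may be used, so the entries below are purely illustrative.

```python
FALLBACK_TARGETS = {
    "put in colored contact lenses": "natural makeup that suits the eyes without colored contact lenses",
    "apply false eyelashes": "natural makeup without false eyelashes",
}


def determine_new_target(skipped_step, default_target):
    # Fall back to the original target look if no rule matches the skipped step.
    return FALLBACK_TARGETS.get(skipped_step, default_target)


print(determine_new_target("put in colored contact lenses", "original target look"))
```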
- In this case, the estimation unit 134 may estimate a procedure for changing the current makeup state shown in the new face image IM1 to the new target appearance, based on the new face image IM1 and a face image IMx having the new target appearance.
- The output control unit 136 may then recommend the estimated procedure to person P1.
- For example, the output control unit 136 may present the procedure for changing the appearance to the new target appearance together with a comment such as "It looks like you are not wearing colored contact lenses. In this state, why not try applying makeup using the following procedure?".
- In the above embodiments, an example has been described in which the server device 100 estimates, as the processing procedure for changing the appearance of the first object to an appearance based on the second object, a processing procedure for changing the appearance of the first object to a target appearance targeted for the first object.
- However, the server device 100 does not necessarily need to estimate a processing procedure for changing the appearance of the first object to such a target appearance; for example, it may estimate a processing procedure for changing the appearance of the first object to an arbitrary appearance desired by the user, or a processing procedure for changing the appearance of the first object to an appearance preferred by the user.
- Fig. 12 is a block diagram showing a hardware configuration example of a computer corresponding to the information processing device according to the embodiment of the present disclosure. Note that Fig. 12 shows an example of the hardware configuration of a computer corresponding to the information processing device according to each embodiment, and the hardware configuration is not limited to that shown in Fig. 12.
- The computer 1000 has a CPU (Central Processing Unit) 1100, a RAM (Random Access Memory) 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600.
- The CPU 1100 operates based on the programs stored in the ROM 1300 or the HDD 1400, and controls each component. For example, the CPU 1100 loads the programs stored in the ROM 1300 or the HDD 1400 into the RAM 1200, and executes processes corresponding to the various programs.
- The ROM 1300 stores boot programs such as the BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts up, as well as programs that depend on the hardware of the computer 1000.
- The HDD 1400 is a computer-readable recording medium that non-transitorily records programs executed by the CPU 1100 and data used by such programs. Specifically, the HDD 1400 records program data 1450.
- The program data 1450 is an example of an information processing program for realizing the information processing method according to an embodiment of the present disclosure, and of data used by the information processing program.
- The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (e.g., the Internet).
- The CPU 1100 receives data from other devices and transmits data generated by the CPU 1100 to other devices via the communication interface 1500.
- The input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000.
- The CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600.
- The CPU 1100 also transmits data to an output device such as a display device, a speaker, or a printer via the input/output interface 1600.
- The input/output interface 1600 may also function as a media interface that reads programs and the like recorded on a specific recording medium.
- Examples of media include optical recording media such as DVDs (Digital Versatile Discs) and PDs (Phase change rewritable Disks), magneto-optical recording media such as MOs (Magneto-Optical Disks), tape media, magnetic recording media, and semiconductor memories.
- For example, the CPU 1100 of the computer 1000 implements the various processing functions of the units shown in Fig. 3 and the like by executing a program loaded onto the RAM 1200.
- That is, the CPU 1100, the RAM 1200, and the like work together with software (the information processing program loaded onto the RAM 1200) to implement the information processing method by the information processing device according to an embodiment of the present disclosure.
- The present disclosure can also be configured as follows.
- (1) An information processing device comprising: an acquisition unit that acquires an object image that is an image of a first object and a reference image related to a second object different from the first object; a conversion unit that generates a converted image in which the first object is converted, based on the object image and the reference image; an estimation unit that estimates, based on the converted image and the object image, a processing procedure for changing the first object into an appearance based on the second object; and a generation unit that generates, as an output image, an image of the first object whose appearance has been changed in accordance with the processing procedure, based on the object image.
- (2) The information processing device according to (1), wherein the acquisition unit acquires, as the reference image, an image of the second object having a target appearance targeted for the first object as the appearance based on the second object, and the conversion unit generates, based on one of the object image and the reference image, the converted image in which the other of the object image and the reference image is converted into an image of the first object reflecting the target appearance.
- an adjustment unit that adjusts, between the object image and the reference image, conditions other than the appearance of the first object and the appearance of the target
- wherein the adjustment unit removes information about a lighting environment estimated based on the object image from the object image and removes information about a lighting environment estimated based on the reference image from the reference image, thereby eliminating differences in lighting environment conditions between the object image and the reference image.
- The information processing device described in any one of (2) to (6), further comprising a learning unit that generates a model using a pair of the converted image and the object image as input, wherein the estimation unit estimates the processing procedure for changing the appearance of the first object to the appearance of the target based on output information of the model.
- The learning unit uses, as learning data, a combination of a video consisting of images of actions that change the appearance of a specified object related to the first object to a completed state and speech information within the video that explains the content of the actions, and trains the model to learn the relationship between before-and-after images showing the change in appearance of the specified object and the actions that caused the change in appearance.
- The generation unit generates, as the output image, an image of the first object in which an appearance corresponding to the processing procedure is reflected as a work result in the appearance of the first object.
- The generation unit generates an instruction statement instructing the user to perform a task in accordance with the processing procedure, as output information that is output together with the output image.
- The information processing device according to (11), wherein the generation unit further generates, as the output information, a detailed sentence that explains the content of the instruction statement in more detail, based on a predetermined language model and the instruction statement.
- The information processing device described in any one of (2) to (12), wherein the acquisition unit sequentially acquires, as the object image, object images in which an action that changes the appearance of the first object is captured in real time, the information processing device further comprising a detection unit that detects an erroneous action based on the sequentially acquired object images, wherein, when an erroneous action is detected by the detection unit, the estimation unit re-estimates the processing procedure for changing the current appearance of the first object to the appearance of the target using the latest object image among the sequentially acquired object images.
- wherein, when the erroneous action is detected, the detection unit identifies a new target appearance associated with the action procedure actually being performed to change the appearance of the first object;
- (16) An information processing method executed by an information processing device, the information processing method comprising: acquiring an object image, the object image being an image of a first object, and a reference image relating to a second object different from the first object; generating a transformed image in which the first object is transformed, based on the object image and the reference image; estimating, based on the transformed image and the object image, a processing procedure for changing the first object into an appearance based on the second object; and generating, as an output image, an image of the first object whose appearance has been changed in accordance with the processing procedure, based on the object image.
- (17) an acquisition step of acquiring an object image, the object image being an image of a first object, and a reference image relating to a second object different from the first object; a transformation step of generating a transformed image in which the first object is transformed based on the object image and the reference image; an estimation step of estimating a processing procedure for changing the first object into an appearance based on the second object based on the transformed
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
An information processing device according to the present invention comprises an acquisition unit, a conversion unit, an estimation unit, and a generation unit. The acquisition unit acquires an object image that is an image of a first object, and also acquires, as a reference image, an image of a second object that is in a target state for the first object. Based on one of the object image and the reference image, the conversion unit converts the other of the object image and the reference image into an image of the first object that represents the target state. Based on the object image and the converted image obtained by the conversion, the estimation unit estimates a processing procedure for changing the first object from a prescribed state to the target state. Based on the object image, the generation unit generates, as an output image to be output to a user, an image of the first object in which the state of the first object has been changed in accordance with the processing procedure.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023115843 | 2023-07-14 | | |
| JP2023-115843 | 2023-07-14 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025018140A1 (fr) | 2025-01-23 |
Family
ID=94281883
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2024/023875 (WO2025018140A1, pending) | | 2023-07-14 | 2024-07-02 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025018140A1 (fr) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2008102440A1 (fr) * | 2007-02-21 | 2008-08-28 | Tadashi Goino | Dispositif et procédé de création d'image de visage maquillé |
| JP2016055202A (ja) * | 2016-01-26 | 2016-04-21 | パナソニックIpマネジメント株式会社 | メイクアップ支援装置およびメイクアップ支援方法 |
| JP2020526809A (ja) * | 2017-07-13 | 2020-08-31 | シセイドウ アメリカズ コーポレイション | 仮想顔化粧の除去、高速顔検出およびランドマーク追跡 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24842934; Country of ref document: EP; Kind code of ref document: A1 |