CN119206006A - Three-dimensional model data processing method, device, equipment, medium and product - Google Patents
- Publication number
- CN119206006A (application number CN202411753118.1A)
- Authority
- CN
- China
- Prior art keywords
- model
- dimensional model
- dimensional
- object components
- component
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Processing Or Creating Images (AREA)
Abstract
The application discloses a three-dimensional model data processing method, device, equipment, medium and product, relating to the field of image processing. The method comprises: obtaining a global view of a first object, the first object comprising at least two object components; segmenting the global view to obtain local views respectively corresponding to the at least two object components; converting the local views respectively corresponding to the at least two object components into component three-dimensional models; and combining the component three-dimensional models respectively corresponding to the at least two object components to obtain an object three-dimensional model of the first object. Because each object component has independent model data, the generated object three-dimensional model has higher editing flexibility in downstream applications: when the object three-dimensional model needs local adjustment during downstream use, only the model data of the component three-dimensional model to be adjusted needs to be edited, which reduces the downstream workload on the object three-dimensional model.
Description
Technical Field
The present application relates to the field of image processing, and in particular, to a method, apparatus, device, medium, and product for processing three-dimensional model data.
Background
With the development of artificial intelligence (Artificial Intelligence, AI) technology, three-dimensional models can be built from two-dimensional images by AI, greatly simplifying the creation process of three-dimensional models.
In the related art, a two-dimensional image is directly input into a pre-trained AI model, and the AI model generates the three-dimensional model corresponding to the object shown in the two-dimensional image.
However, the three-dimensional model generated by the above scheme has a problem of poor model flexibility.
Disclosure of Invention
The embodiment of the application provides a three-dimensional model data processing method, a device, equipment, a medium and a product. The technical scheme is as follows.
In one aspect, a three-dimensional model data processing method is provided, the method comprising:
acquiring a global view of a first object, wherein the global view comprises a two-dimensional image for displaying the first object from a first view angle direction, and the first object comprises at least two object components;
dividing the global view into local views respectively corresponding to the at least two object components, wherein each local view comprises a two-dimensional image showing the corresponding object component from the first view direction;
converting the partial views respectively corresponding to the at least two object components into a component three-dimensional model;
and combining the component three-dimensional models respectively corresponding to the at least two object components to obtain an object three-dimensional model of the first object.
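The four claimed steps can be sketched as a minimal pipeline. All class and function names below are illustrative placeholders rather than terms from the patent, and the segmentation and reconstruction bodies are stubs standing in for the learned models described later:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LocalView:
    component_id: str
    image: list  # 2D pixel grid cropped from the global view


@dataclass
class ComponentModel:
    component_id: str
    vertices: list  # 3D points of the reconstructed component mesh


def segment(global_view: list, components: List[str]) -> List[LocalView]:
    """Step 1: split the global view into one local view per object component."""
    return [LocalView(c, global_view) for c in components]  # placeholder crop


def to_3d(view: LocalView) -> ComponentModel:
    """Step 2: convert a local view into a component three-dimensional model."""
    return ComponentModel(view.component_id, [])  # placeholder reconstruction


def combine(models: List[ComponentModel]) -> Dict[str, ComponentModel]:
    """Step 3: assemble the component models into the object three-dimensional model."""
    return {m.component_id: m for m in models}


def build_object_model(global_view: list, components: List[str]):
    local_views = segment(global_view, components)
    part_models = [to_3d(v) for v in local_views]
    return combine(part_models)


obj = build_object_model(global_view=[[0]], components=["head", "torso"])
```

Because every component keeps its own `ComponentModel`, downstream editing can touch one entry of the result without disturbing the others, which is the flexibility benefit the claims emphasize.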
In another aspect, there is provided a three-dimensional model data processing apparatus, the apparatus including:
an acquisition module, configured to acquire a global view of a first object, the global view comprising a two-dimensional image showing the first object from a first view direction, the first object comprising at least two object components;
a segmentation module, configured to segment the global view into local views respectively corresponding to the at least two object components, each local view comprising a two-dimensional image showing the corresponding object component from the first view direction;
a generation module, configured to convert the local views respectively corresponding to the at least two object components into component three-dimensional models;
and a combination module, configured to combine the component three-dimensional models respectively corresponding to the at least two object components to obtain an object three-dimensional model of the first object.
In some alternative embodiments, the segmentation module includes:
a first obtaining unit, configured to obtain location distribution information of the at least two object components in the global view, the location distribution information being used to indicate the distribution of the at least two object components in the global view;
and a segmentation unit, configured to segment the first object in the global view based on the location distribution information to obtain local views respectively corresponding to the at least two object components.
In some optional embodiments, the first obtaining unit is further configured to identify, based on image semantics of the global view, object components existing in the global view, and obtain location distribution information corresponding to the at least two object components respectively;
alternatively, the first obtaining unit is further configured to receive, in the global view, a marking operation corresponding to each of the at least two object components, the marking operation marking at least one marking point for each object component, and to generate the location distribution information corresponding to each of the at least two object components based on the marking points indicated by the marking operation.
In some optional embodiments, the segmentation unit is further configured to input location distribution information corresponding to the global view and the at least two object components to a region segmentation model, so as to obtain a local view corresponding to the at least two object components, where the region segmentation model is a machine learning model that is trained in advance and is used for segmenting an area where each object component is located from the global view.
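The region segmentation model itself is a pre-trained machine learning model and is not specified further in the text. As a rough, non-authoritative illustration of consuming point-style location distribution information, the toy function below assigns each foreground pixel to the component whose marked point is nearest; a real promptable segmentation network would replace this logic:

```python
def segment_by_markers(global_view, markers):
    """Toy stand-in for the region segmentation model: each foreground pixel
    (non-zero value) is labelled with the component whose marked point is
    nearest, producing one boolean mask per object component."""
    h, w = len(global_view), len(global_view[0])
    masks = {cid: [[False] * w for _ in range(h)] for cid, _ in markers}
    for y in range(h):
        for x in range(w):
            if global_view[y][x] == 0:  # 0 = background pixel
                continue
            # the nearest marker point claims this pixel
            cid = min(markers, key=lambda m: (m[1][0] - y) ** 2 + (m[1][1] - x) ** 2)[0]
            masks[cid][y][x] = True
    return masks


view = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
# one marking point per component, as produced by the marking operation
masks = segment_by_markers(view, [("head", (0, 0)), ("torso", (3, 3))])
```

Each returned mask plays the role of a per-component region from which the corresponding local view is cropped.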
In some optional embodiments, the generating module is further configured to input the partial views corresponding to the at least two object components into a three-dimensional modeling model, to obtain a component three-dimensional model corresponding to the at least two object components, where the three-dimensional modeling model is a machine learning model obtained by training in advance and used for outputting the component three-dimensional model according to the partial views.
In some alternative embodiments, the apparatus further comprises:
a completion module, configured to input the local views corresponding to the at least two object components into an image generation model to obtain third views in which the occluded parts of the local views have been completed, the image generation model being a machine learning model trained in advance to fill in image content for the occluded regions of object components in local views;
the generating module is further configured to input a third view corresponding to each of the at least two object components into the three-dimensional modeling model, to obtain a component three-dimensional model corresponding to each of the at least two object components.
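The image generation model that completes occluded regions is likewise a learned (inpainting-style) model. The sketch below is only a stand-in, assuming a per-row nearest-visible-pixel fill, to show the interface: a local view plus an occlusion mask in, a completed third view out:

```python
def complete_occlusion(local_view, occluded):
    """Toy stand-in for the image generation (completion) model: every
    occluded pixel is filled from the nearest visible pixel in its row."""
    out = [row[:] for row in local_view]
    for y, row in enumerate(out):
        visible = [x for x in range(len(row)) if not occluded[y][x]]
        for x in range(len(row)):
            if occluded[y][x] and visible:
                nearest = min(visible, key=lambda v: abs(v - x))
                out[y][x] = row[nearest]
    return out


# a 2x3 local view whose top-right pixel is occluded by another component
patched = complete_occlusion(
    [[5, 5, 0], [7, 7, 7]],
    [[False, False, True], [False, False, False]],
)
```

Feeding the completed third view, rather than the raw local view, into the three-dimensional modeling model avoids reconstructing holes where one component hid another.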
In some alternative embodiments, the combining module includes:
a second acquisition unit, configured to acquire first position information of the at least two object components in three-dimensional space, the first position information being used to indicate the respective positions of the at least two object components in the three-dimensional space when they compose the first object;
and the combining unit is used for combining the component three-dimensional models respectively corresponding to the at least two object components according to the first position information to obtain an object three-dimensional model of the first object.
In some optional embodiments, the generating module is further configured to convert the global view into a registered three-dimensional model corresponding to the first object;
The second obtaining unit is further configured to determine the first position information of the at least two object components according to a mapping relationship between the component three-dimensional model and the registration three-dimensional model, where the component three-dimensional model and the registration three-dimensional model correspond to the at least two object components respectively.
In some alternative embodiments, the combination module further comprises:
a mapping unit, configured to establish a first mapping relationship between a component three-dimensional model corresponding to the at least two object components and a first registration area in the global view, where the first registration area is used to indicate areas where the at least two object components are located in the global view;
The mapping unit is further configured to establish a second mapping relationship between a first registration area corresponding to each of the at least two object components in the global view and a second registration area of the first object, where the second registration area is used to indicate an area where the first object is located in the global view;
the mapping unit is further configured to establish a third mapping relationship between the second registration area of the first object in the global view and the registration three-dimensional model;
The mapping unit is further configured to determine a fourth mapping relationship between the component three-dimensional model and the registration three-dimensional model, where the component three-dimensional model and the registration three-dimensional model correspond to the at least two object components respectively, according to a mapping chain formed by the first mapping relationship, the second mapping relationship and the third mapping relationship;
The second obtaining unit is further configured to determine the first location information of the at least two object components according to the fourth mapping relationship.
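The mapping chain described above (component model to first registration area, to second registration area, to registered three-dimensional model) amounts to composing the three relationships. A schematic composition over dictionary mappings, with made-up point labels, might look like:

```python
def compose(*mappings):
    """Follow each key through every mapping in turn, dropping points that
    fall out of the chain."""
    result = dict(mappings[0])
    for m in mappings[1:]:
        result = {k: m[v] for k, v in result.items() if v in m}
    return result


# first relationship: component-model point -> point in the first registration area
first = {"p0": "u0", "p1": "u1"}
# second relationship: first registration area -> second registration area
second = {"u0": "r0", "u1": "r1"}
# third relationship: second registration area -> registered 3D model point
third = {"r0": "q0", "r1": "q1"}

# fourth relationship: component-model point -> registered 3D model point
fourth = compose(first, second, third)
```

The resulting fourth relationship links points on each component three-dimensional model directly to points on the registered three-dimensional model, from which the first position information is read off.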
In some optional embodiments, the fourth mapping relationship includes at least two first point pair relationships respectively corresponding to the at least two object components, the first point pair relationships being used to indicate a mapping relationship of a mapping point pair formed by a first position point in the component three-dimensional model and a second position point in the registration three-dimensional model;
the second obtaining unit is further configured to screen the at least two first point pair relationships corresponding to the ith object component according to their mapping accuracy, to obtain a second point pair relationship, where i is a positive integer;
The second obtaining unit is further configured to determine the first location information according to a second point-to-point relationship corresponding to the at least two object components respectively.
In some optional embodiments, the second obtaining unit is further configured to sample at least one third point pair relationship from the at least two first point pair relationships corresponding to the ith object component;
The second obtaining unit is further configured to determine mapping accuracies corresponding to the at least two first point pair relationships respectively by using the at least one third point pair relationship as a reference transformation relationship;
The second obtaining unit is further configured to screen the at least two first point pair relationships based on a threshold according to mapping accuracy corresponding to the at least two first point pair relationships, so as to obtain the second point pair relationship.
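This sample-score-threshold procedure resembles RANSAC-style outlier screening. A hedged one-dimensional sketch, assuming the reference transformation is a simple offset between the sampled pair's points:

```python
import random


def filter_point_pairs(pairs, threshold=0.5, rounds=20, seed=0):
    """Sample a point pair, treat its offset as the reference transformation,
    score every pair by how well it agrees (mapping accuracy), and keep the
    largest set of pairs passing the threshold."""
    rng = random.Random(seed)
    best = []
    for _ in range(rounds):
        src, dst = rng.choice(pairs)          # sampled third point pair
        offset = dst - src                    # reference transformation
        inliers = [(s, d) for s, d in pairs
                   if abs((d - s) - offset) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    return best


# 1D point pairs; the last pair is a mismatched mapping (outlier)
pairs = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.1), (3.0, 9.0)]
second_pairs = filter_point_pairs(pairs)
```

A real implementation would sample whole rigid transformations in 3D rather than scalar offsets, but the screening logic is the same.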
In some alternative embodiments, the combination module further comprises:
The registration unit is used for registering the at least two object components and the registration three-dimensional model in a first registration mode based on the first position information to obtain registration results respectively corresponding to the at least two object components;
the combination unit is further configured to combine the component three-dimensional models corresponding to the at least two object components respectively based on the registration result, to obtain an object three-dimensional model of the first object.
In some optional embodiments, the registration unit is further configured to register model data of the three-dimensional model of the component in a second registration mode according to a positional relationship between the at least two object components indicated by the first position information, to obtain registration model data corresponding to the at least two object components respectively, where the registration model data is used to indicate model data obtained after deformation of the three-dimensional model of the component is completed in the second registration mode;
The combination unit is further configured to combine the registration model data corresponding to the at least two object components according to the registration result, so as to obtain an object three-dimensional model of the first object.
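Combining component models according to the first position information can be pictured, under the simplifying assumption that registration reduces to translating each component mesh to its position, as merging offset vertex and face lists:

```python
def combine_components(models, positions):
    """Translate each component mesh to its position from the first position
    information, then merge all meshes into one vertex/face list."""
    vertices, faces = [], []
    for name, (verts, fcs) in models.items():
        ox, oy, oz = positions[name]          # component position in 3D space
        base = len(vertices)                  # re-index faces after the merge
        vertices += [(x + ox, y + oy, z + oz) for x, y, z in verts]
        faces += [tuple(i + base for i in f) for f in fcs]
    return vertices, faces


# a single triangle reused as a stand-in for each component mesh
tri = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
verts, faces = combine_components(
    {"head": tri, "torso": tri},
    {"head": (0.0, 2.0, 0.0), "torso": (0.0, 0.0, 0.0)},
)
```

In the patent's second registration mode the component meshes would additionally be deformed before merging; this sketch keeps only the placement-and-merge step.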
In another aspect, a computer device is provided, where the computer device includes a processor and a memory, where the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, where the at least one instruction, the at least one program, the set of codes, or the set of instructions are loaded and executed by the processor to implement a three-dimensional model data processing method according to any one of the embodiments of the present application.
In another aspect, a computer readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement a three-dimensional model data processing method according to any one of the embodiments of the present application.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the three-dimensional model data processing method according to any one of the above embodiments.
The technical scheme provided by the application at least comprises the following beneficial effects.
In the process of generating a three-dimensional model from a two-dimensional image, the global view of the first object is segmented into local views respectively corresponding to at least two object components, and a component three-dimensional model is generated from each local view; the object three-dimensional model of the first object is then obtained by combining the component three-dimensional models of the at least two object components. Decomposing the first object into at least two object components refines the generation granularity of the object three-dimensional model: independent model data is generated for each object component, which avoids the problem that all components of the resulting object three-dimensional model belong to the same model data with the object components stuck together. The object three-dimensional model therefore has higher editing flexibility in downstream applications. During downstream use, when the object three-dimensional model needs local adjustment, only the model data of the component three-dimensional model to be adjusted needs to be edited, reducing the downstream workload on the object three-dimensional model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a computer system provided in accordance with an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a three-dimensional model data processing method provided by an exemplary embodiment of the present application;
FIG. 3 is a three-view diagram of a first object provided by an exemplary embodiment of the present application;
FIG. 4 is a flow chart of a three-dimensional model data processing method provided by another exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of an implementation of a marking operation provided by an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a model structure of a visual segmentation large model provided in accordance with an exemplary embodiment of the application;
FIG. 7 is a schematic diagram of the partitioning of object components provided by an exemplary embodiment of the present application;
FIG. 8 is a flow chart of a three-dimensional model data processing method provided by yet another exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of a model structure provided by an exemplary embodiment of the present application;
FIG. 10 is a schematic diagram of an image completion process provided by an exemplary embodiment of the present application;
FIG. 11 is a flow chart of a three-dimensional model data processing method provided by yet another exemplary embodiment of the present application;
FIG. 12 is a schematic illustration of a process for assembling a three-dimensional model of a component provided by an exemplary embodiment of the present application;
FIG. 13 is a schematic diagram comparing the effect of three-dimensional model generation performed directly by the CraftsMan model with three-dimensional model generation performed by the method of the embodiment of the application;
FIG. 14 is a block diagram of a three-dimensional model data processing apparatus provided in an exemplary embodiment of the present application;
FIG. 15 is a block diagram of a three-dimensional model data processing apparatus according to another exemplary embodiment of the present application;
fig. 16 is a schematic diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of promoting an understanding of the principles and advantages of the application, reference will now be made in detail to the embodiments of the application, some but not all of which are illustrated in the accompanying drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," and the like in this disclosure are used to distinguish between similar items having substantially the same function and role. It should be understood that "first" and "second" imply no logical or chronological dependency and do not limit quantity or order of execution.
In order to improve the editing freedom of the three-dimensional model generated by the AI, the embodiment of the application provides a three-dimensional model data processing method. Referring to fig. 1, a block diagram of a computer system for providing a hardware environment for a three-dimensional model data processing method according to an exemplary embodiment of the present application is shown, where the computer system includes a terminal 110 and a server 120.
The terminal 110 has installed and runs an application program that supports three-dimensional model generation. The device type of the terminal 110 includes at least one of a game console, a desktop computer, a smart phone, a tablet computer, and a laptop portable computer.
Server 120 includes at least one of a single server, a plurality of servers, a cloud computing platform, and a virtualization center. The server 120 is used to provide back-end services for applications that support three-dimensional model generation. Optionally, the server 120 undertakes the primary computing work and the terminal 110 the secondary computing work; or the server 120 undertakes the secondary computing work and the terminal 110 the primary computing work; or the server 120 and the terminal 110 cooperate using a distributed computing architecture.
It should be noted that the server 120 may be implemented as a physical server or may be implemented as a cloud server in the cloud. In some embodiments, the server 120 described above may also be implemented as a node in a blockchain system.
Terminal 110 is connected to server 120 via a wireless network or a wired network.
In some embodiments, taking the collaborative implementation of the three-dimensional model data processing method by the terminal 110 and the server 120 as an example: a user sends the global view 101 of a first object for which three-dimensional model generation is required to the server 120 through the terminal 110. After receiving the global view 101 of the first object, the server 120 forwards it to a three-dimensional model generation service, which segments the global view 101 into local views respectively corresponding to at least two object components, converts those local views into component three-dimensional models, and combines the component three-dimensional models respectively corresponding to the at least two object components to obtain the object three-dimensional model 104 of the first object. The server 120 then transmits the generated object three-dimensional model 104 to the terminal 110, which displays the object three-dimensional model 104 on its display screen.
In other embodiments, the three-dimensional model data processing method may also be implemented independently by the terminal 110 or the server 120. For example, when the method is executed independently by the terminal 110: the user loads the global view 101 stored in the terminal 110 into an application program supporting three-dimensional model generation; the application program segments the global view 101 to obtain local views respectively corresponding to at least two object components; by calling an AI processing chip and a graphics processor (Graphics Processing Unit, GPU), it converts the local views into component three-dimensional models; finally, it combines the component three-dimensional models respectively corresponding to the at least two object components to obtain the object three-dimensional model 104 of the first object. For example, when the method is executed independently by the server 120: upon detecting a three-dimensional model generation requirement for the first object, the server 120 reads the global view 101 from a database and sends it to the three-dimensional model generation service, which segments the global view 101 into local views respectively corresponding to at least two object components, converts those local views into component three-dimensional models, and combines the component three-dimensional models to obtain the object three-dimensional model 104 of the first object.
Those skilled in the art will appreciate that the number of devices described above may be greater or lesser. Such as the above-mentioned devices may be only one, or the above-mentioned devices may be several tens or hundreds, or more. The number of devices and the types of the devices are not limited in the embodiment of the application.
In some embodiments, the method provided by the embodiment of the present application may be applied to a cloud application scenario, so that the calculation of the data logic generated by the three-dimensional model is completed through the cloud server, and the terminal 110 is responsible for displaying the three-dimensional model generating process.
Referring to fig. 2, a flowchart of a three-dimensional model data processing method according to an exemplary embodiment of the present application is shown, where the method is implemented by the server 120 in fig. 1 as an example, and the method includes the following steps 210 to 240.
At step 210, a global view of a first object is acquired.
The first object is an object instance of the three-dimensional model to be generated. A three-dimensional model refers to a model structure constructed on a computer device that appears as a three-dimensional volume. The three-dimensional model may be implemented as a three-dimensional virtual model including a model that is represented as a virtual entity in a three-dimensional virtual scene, and is schematically implemented as a virtual character model, a virtual animal model, a virtual building model, or the like in the virtual scene.
The first object is implemented as at least one of a human object, a building object, an animal object, an ornament object, a prop object, a container object, and the like. Taking the first object as an example of the virtual character object in the game, a three-dimensional model corresponding to the virtual character object is generated according to the virtual character view of the virtual character object.
In an embodiment of the application, the first object comprises at least two object components, wherein the object components are used for indicating the components constituting the first object, and in one example, the first object is implemented as a virtual character, and the object components comprise a skull portion, a hair ornament portion, a hairstyle portion, a trunk portion, an extremity portion, a clothing portion, and a shoe portion.
Optionally, the at least two object components are implemented as a set of all components in the first object, or the at least two object components are implemented as components of a portion of the first object.
The global view is a two-dimensional image for showing the first object, and in an embodiment of the application, the global view comprises a two-dimensional image showing the first object from a first viewing angle direction.
Alternatively, the first viewing angle direction may be implemented as a single viewing angle direction, or may be implemented as at least two viewing angle directions.
Optionally, the global view includes at least one of a front view (Front View), a left view (Left View), a right view (Right View), a top view (Top View), a bottom view (Bottom View), and a back view (Rear View) of the first object. The front view shows the first object from the front; the left view shows it from the left side; the right view shows it from the right side; the top view shows it looking down from above; the bottom view shows it looking up from below; and the back view shows it from behind.
When the first view direction is implemented as at least two view directions, the global view comprises a view showing the first object from at least two view directions, i.e. the global view is a multi-view of the first object. Alternatively, the multiple views described above may be implemented as two views, three views, six views, etc., without limitation.
In one example, taking a global view implemented as a three-view of the first object as shown in fig. 3, there is shown a three-view 300 of the first object provided by one exemplary embodiment of the present application, the three-view 300 including a front view 301 showing the first object from a front view direction, a side view 302 showing the first object from a side view direction, and a rear view 303 showing the first object from a rear view direction.
Alternatively, the global view of the first object may be implemented as an image drawn by a computer device, or the global view of the first object may be implemented as an image obtained by capturing an entity of the first object by an image capturing device.
Step 220, segmenting the global view to obtain local views respectively corresponding to at least two object components.
In an embodiment of the application, the partial view comprises a two-dimensional image showing the corresponding object component from a first viewing angle direction. Illustratively, the first object includes n object components, and the ith partial view includes a two-dimensional image showing the ith object component from the first viewing angle direction, where i and n are positive integers, n is greater than or equal to 2, and i is less than or equal to n.
In some embodiments, the global view is segmented according to the position distribution of the at least two object components in the global view to obtain the local views respectively corresponding to the at least two object components. Schematically, position distribution information of the at least two object components in the global view is obtained, where the position distribution information indicates how the at least two object components are distributed in the global view, and the first object in the global view is segmented based on the position distribution information to obtain the local views respectively corresponding to the at least two object components. The image area of each object component in the global view is segmented with reference to the acquired position distribution information of that component, so that a local view is obtained; using the position distribution information as prior information improves the segmentation accuracy when the image area corresponding to each object component is segmented from the global view.
Alternatively, the position distribution information of the at least two object components in the global view may be indicated by manual annotation, or the position distribution information of the at least two object components in the global view may be determined by automatic recognition by computer vision techniques.
Optionally, the position distribution information includes fuzzy position indication information or precise position indication information. The fuzzy position indication information may be indicated by a marker point, a marker box, or a position description text, i.e., the fuzzy position indication information only indicates the approximate position of an object component in the global view. The precise position indication information may be indicated by a mask carrying explicit boundary information or region information, i.e., the precise position indication information can indicate the specific position of an object component in the global view.
Optionally, when the position distribution information is implemented as fuzzy position indication information, the process of segmenting the global view according to the position distribution information is implemented through a pre-trained region segmentation model, where the region segmentation model is a machine learning model obtained in advance for segmenting the region where each object component is located from the global view. Schematically, the position distribution information and the global view are input into the region segmentation model to obtain the local views respectively corresponding to the at least two object components.
Alternatively, when the position distribution information is implemented as precise position indication information, the global view is segmented along the region edges of the image region to which each object component belongs, as provided by the position distribution information (the precise position indication information), to obtain the local view corresponding to that object component.
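The segmentation by precise position indication information can be sketched as follows. This is an illustrative stdlib-only sketch, not the patent's implementation: the global view is a nested list of pixel values, and the mask is a same-sized grid of 0/1 flags for one object component; the local view is the masked region cropped to its bounding box.

```python
# Hedged sketch: cropping a local view from a global view using a
# precise per-component mask. Function name and data layout are
# illustrative assumptions, not taken from the source text.

def crop_local_view(global_view, mask, background=0):
    """Extract the masked component, cropped to the mask's bounding box."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return []  # the mask selects nothing
    r0, r1 = min(rows), max(rows)
    c0, c1 = min(cols), max(cols)
    return [
        [global_view[r][c] if mask[r][c] else background
         for c in range(c0, c1 + 1)]
        for r in range(r0, r1 + 1)
    ]

# Example: a 4x4 view with one component marked in the mask.
view = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
local = crop_local_view(view, mask)  # pixels outside the mask are blanked
```

A production pipeline would typically operate on image arrays and keep per-component alpha channels instead of a flat background value.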
At step 230, the partial views corresponding to the at least two object components, respectively, are converted into a three-dimensional model of the component.
In the embodiment of the application, after obtaining the local view corresponding to each object component of the first object, the local view is converted into the component three-dimensional model, wherein the component three-dimensional model is stored in computer equipment in the form of model data, and the local view is converted into the model data corresponding to the component three-dimensional model by way of example.
Wherein the model data is data defining attribute information of a three-dimensional model of the component of the object component, enabling the object component to be accurately rendered and displayed in the computer graphics application. Wherein the attribute information includes at least one of geometry, appearance, and model behavior of the object component.
Optionally, the model data is implemented as at least one of mesh (Mesh) data, voxel (Voxel) data, and point cloud (Point Cloud) data.
The mesh is a data structure used in three-dimensional computer graphics to represent the surface of a three-dimensional model. A mesh is composed of a set of vertices (Vertices), edges (Edges), and faces (Faces), which together define the geometry of the object. Optionally, the mesh cells formed by the edges may be implemented as polygons such as triangles, quadrilaterals, pentagons, and the like. In a mesh, the vertices are the basic points forming the mesh; each vertex has coordinates (x, y, z) in three-dimensional space, represents a position on the mesh, and is the minimum unit forming the three-dimensional model. The edges are line segments connecting two vertices; they define the connection relations among the vertices and form the basis of the faces. A face is a polygon formed by a plurality of vertices and edges, namely the mesh cell; the faces define the surface of the object in the three-dimensional model, and a plurality of faces combined together form the complete appearance of the object.
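The vertex/edge/face structure described above can be captured in a minimal sketch. The class layout is an illustration under simplified assumptions (faces as tuples of vertex indices, edges derived rather than stored), not the patent's storage format.

```python
# Minimal mesh sketch following the Vertices/Edges/Faces description.
from dataclasses import dataclass, field

@dataclass
class Mesh:
    vertices: list = field(default_factory=list)  # (x, y, z) tuples
    faces: list = field(default_factory=list)     # tuples of vertex indices

    def edges(self):
        """Derive the edge set from the faces (each edge stored once)."""
        es = set()
        for face in self.faces:
            for i in range(len(face)):
                a, b = face[i], face[(i + 1) % len(face)]
                es.add((min(a, b), max(a, b)))
        return es

# A single triangle: 3 vertices, 1 face, 3 derived edges.
tri = Mesh(vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)], faces=[(0, 1, 2)])
```

Real mesh formats (e.g. glTF or OBJ) store the same information as flat index and position buffers for efficiency.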
A voxel is the counterpart in three-dimensional space of a pixel (Pixel) in a two-dimensional image; a voxel is the basic unit of three-dimensional volume data. Each voxel occupies a cubic volume in three-dimensional space, the size of which depends on the resolution of the entire three-dimensional model. Voxels are typically used to represent volumetric information of a three-dimensional model in three-dimensional space and may store a variety of data about the volume, such as geometric occupancy, density, color, and the like. Each voxel in a three-dimensional model represented by voxel data has a well-defined spatial position, given by its coordinates (x, y, z) in a three-dimensional coordinate system.
The point cloud is a model data representation method for a three-dimensional model. It is composed of a large number of three-dimensional coordinate points that together describe the surface of an object or a scene. That is, the point cloud is a stored set formed by the spatial coordinates of each sampling point on the surface of the three-dimensional object. The point cloud data includes the three-dimensional coordinates (x, y, z) of each sampling point, and at least one of laser reflection intensity (Intensity) and color information.
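A point cloud as described above is simply a collection of attributed sample points. The sketch below (illustrative layout, not the patent's format) stores each point as a dict with coordinates plus an intensity attribute, and computes the cloud's centroid as a basic example of operating on such data.

```python
# Point-cloud sketch: each sampling point stores (x, y, z) plus optional
# attributes such as intensity or color.

def centroid(points):
    """Average position of the sampled surface points."""
    n = len(points)
    xs, ys, zs = zip(*[(p["x"], p["y"], p["z"]) for p in points])
    return (sum(xs) / n, sum(ys) / n, sum(zs) / n)

cloud = [
    {"x": 0.0, "y": 0.0, "z": 0.0, "intensity": 0.8},
    {"x": 2.0, "y": 0.0, "z": 0.0, "intensity": 0.5},
    {"x": 1.0, "y": 3.0, "z": 0.0, "intensity": 0.9},
]
center = centroid(cloud)
```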
Alternatively, the method of converting the partial views respectively corresponding to the at least two object components into the three-dimensional model of the component may be implemented as at least one of the following.
First, manual conversion.
In some embodiments, the local views corresponding to at least two object components are sent to at least one modeling terminal, and a modeling person models the object components according to the local views at the modeling terminal to obtain model data corresponding to the object components.
Second, by AI technology conversion.
In some embodiments, the partial views corresponding to the at least two object components are input into a three-dimensional modeling model to obtain component three-dimensional models corresponding to the at least two object components, wherein the three-dimensional modeling model is a machine learning model which is obtained through training in advance and is used for outputting the component three-dimensional model according to the partial views.
Optionally, the three-dimensional modeling model is implemented as at least one of a Wonder3D model, a diffusion model based on score distillation sampling (Score Distillation Sampling, SDS), a CraftsMan model, a diffusion model based on the Make-It-3D framework (a framework that generates a high-quality 3D model from a single picture via diffusion priors), a Unique3D model, a Stable Video 3D (SV3D) model, and the like; any artificial intelligence model that converts a two-dimensional image into a three-dimensional model can be used.
And 240, combining the component three-dimensional models respectively corresponding to the at least two object components to obtain an object three-dimensional model of the first object.
Illustratively, after the component three-dimensional model corresponding to each object component is generated, the object three-dimensional model of the first object is obtained by combining the component three-dimensional models of the object components. In the embodiment of the application, the model data corresponding to the component three-dimensional models are combined to obtain model data representing the object three-dimensional model.
In some embodiments, the object three-dimensional model is obtained by combining the component three-dimensional models respectively corresponding to the at least two object components according to the positions of the at least two object components in three-dimensional space. Schematically, first position information of the at least two object components in the three-dimensional space is obtained, where the first position information indicates the position of each of the at least two object components in the three-dimensional space when they form the first object, and the component three-dimensional models respectively corresponding to the at least two object components are combined according to the first position information to obtain the object three-dimensional model of the first object.
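The combination step can be sketched as follows: each component mesh is translated to the anchor position given by the first position information, while the per-component model data stays separately addressable so that later component-level edits remain possible. Names and data layout are illustrative assumptions, not the patent's format.

```python
# Hedged sketch of combining component three-dimensional models using
# first position information (3D anchor per component).

def combine_components(component_meshes, first_positions):
    """Translate every component's vertices to its 3D anchor and keep the
    result keyed by component name, preserving independent model data."""
    object_model = {}
    for name, vertices in component_meshes.items():
        dx, dy, dz = first_positions[name]
        object_model[name] = [(x + dx, y + dy, z + dz) for x, y, z in vertices]
    return object_model

parts = {"head": [(0, 0, 0), (1, 0, 0)], "torso": [(0, 0, 0)]}
anchors = {"head": (0, 5, 0), "torso": (0, 2, 0)}
combined = combine_components(parts, anchors)
```

Keeping the combined model keyed by component (rather than flattening all vertices into one buffer) is what makes the downstream per-component editing described later cheap.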
In some embodiments, the first position information of the at least two object components in the three-dimensional space may be converted from second position information of the at least two object components in a two-dimensional plane. Illustratively, the second position information of the at least two object components in the two-dimensional plane is acquired from the global view, and the second position information is converted into the first position information based on a spatial mapping relationship between the three-dimensional space and the two-dimensional plane.
In some embodiments, when the global view includes at least two views showing the first object from at least two first view directions, second position information of at least two object components in a two-dimensional plane is acquired from the at least two views, respectively, and the second position information is converted into the first position information based on a spatial mapping relationship between a three-dimensional space and a two-dimensional plane corresponding to each view.
Illustratively, the first position information includes at least one three-dimensional coordinate point corresponding to the object component in a three-dimensional coordinate system corresponding to the three-dimensional space. The three-dimensional coordinate system is a coordinate system established with a specified point in a specified three-dimensional space (for example, an edge point at the lower right corner of the space in the three-dimensional space or a spatial center point in the three-dimensional space) as an origin, and with x-axis, y-axis, and z-axis extending in three mutually perpendicular directions as coordinate axes.
Illustratively, the second position information includes at least one two-dimensional coordinate point corresponding to the object component in a planar coordinate system corresponding to the global view. The plane coordinate system is a coordinate system established by taking a designated pixel in the global view as an origin, a first direction in a plane in which the global view is located as an x-axis and a second direction in the plane in which the global view is located as a y-axis, wherein the first direction and the second direction are two directions perpendicular to each other.
And a space mapping relation exists between the three-dimensional coordinate points and the two-dimensional coordinate points. Alternatively, the at least one two-dimensional coordinate point is implemented as a center of gravity position, a center position, or any other position belonging to the object component.
Alternatively, the mapping method for obtaining the first position information in three-dimensional space from the second position information in the two-dimensional plane may be at least one of perspective projection (Perspective Projection), orthographic projection (Orthographic Projection), depth information prediction and addition, feature-based mapping, and parameterized mapping, which is not limited here.
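For the orthographic-projection option, the lift from 2D to 3D can be illustrated very simply: under an orthographic projection the front view supplies (x, y) and a left/right view supplies depth z. The sketch below assumes axis-aligned views with a shared scale and origin, which is a simplification of any real calibration.

```python
# Illustrative orthographic 2D-to-3D mapping (simplified assumptions:
# axis-aligned views, shared scale, shared origin).

def lift_to_3d(front_xy, side_zy):
    """Front view gives (x, y); the side view gives (z, y)."""
    x, y = front_xy
    z, y_side = side_zy
    # In a consistent pair of views the two heights should agree.
    assert abs(y - y_side) < 1e-6, "views disagree on component height"
    return (x, y, z)

pos3d = lift_to_3d(front_xy=(1.5, 2.0), side_zy=(0.75, 2.0))
```

Perspective projection or learned depth prediction would replace this with a camera model or a network inference, respectively.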
In the embodiment of the application, after the three-dimensional model of the first object is obtained by combining the component three-dimensional models of at least two object components, the three-dimensional model of the first object can be applied to downstream scenes such as animation driving, game modeling, medical scene simulation, building scene simulation, cultural relic restoration research simulation and the like.
In some embodiments, during downstream application, when there is a component adjustment requirement for a specified object component of the at least two object components, the model data of the component three-dimensional model corresponding to the specified object component is acquired, together with the first position information used in the combination stage by the component three-dimensional models corresponding to the at least two object components. The model data of the component three-dimensional model corresponding to the specified object component is adjusted according to the component adjustment requirement to obtain adjusted model data, and, based on the position of the specified object component in the three-dimensional space indicated by the first position information, the model data of the component three-dimensional model corresponding to the specified object component within the model data of the object three-dimensional model is replaced by the adjusted model data, so that a component-adjusted object three-dimensional model is obtained. Optionally, the component adjustment requirement includes at least one of model deformation adjustment, model color adjustment, and model position adjustment. Taking model data implemented as mesh data as an example, model deformation adjustment or model position adjustment is implemented by adjusting the mesh vertex positions in the model data of the component three-dimensional model, and model color adjustment is implemented by adjusting the color information corresponding to the mesh cells.
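The local-editing workflow above can be sketched as a single replacement of one component's entry in the object model. The data layout (a dict of component name to vertex list) is an illustrative assumption; the point is that only the specified component's model data is touched.

```python
# Sketch of local editing: adjust one component's model data, leaving
# every other component of the object three-dimensional model untouched.

def adjust_component(object_model, name, adjust):
    """Apply a per-vertex adjustment to one component and return a new
    object model with only that component's entry replaced."""
    adjusted = dict(object_model)
    adjusted[name] = [adjust(v) for v in object_model[name]]
    return adjusted

model = {"hat": [(0, 10, 0)], "body": [(0, 0, 0), (0, 5, 0)]}
# Model position adjustment: move the hat up by one unit.
edited = adjust_component(model, "hat", lambda v: (v[0], v[1] + 1, v[2]))
```

With a monolithic (single-buffer) model, the same edit would require first identifying which vertices belong to the hat, which is exactly the adhesion problem the per-component decomposition avoids.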
In summary, in the process of generating a three-dimensional model from a two-dimensional image, the global view of the first object is segmented into local views corresponding to at least two object components, and a component three-dimensional model corresponding to each object component is generated from its local view, so that the object three-dimensional model of the first object is obtained by combining the component three-dimensional models of the at least two object components. Decomposing the first object into at least two object components refines the generation granularity of the object three-dimensional model and produces independent model data for each object component, which avoids the problem that all components of the resulting object three-dimensional model belong to the same model data and the object components adhere to one another; the object three-dimensional model therefore has higher editing flexibility in downstream applications. During downstream application, when the object three-dimensional model requires local adjustment, local editing is supported by adjusting the model data corresponding to the relevant component three-dimensional model, which reduces the workload of applying the object three-dimensional model downstream.
In the embodiment of the application, when the object three-dimensional model of the first object is generated through the global view, the following flow steps are included.
And 1, dividing the object component.
In some alternative embodiments, the local views respectively corresponding to the at least two object components are segmented from the global view by a region segmentation model. Referring to fig. 4, a flowchart of a three-dimensional model data processing method according to an exemplary embodiment of the application is shown; the method includes steps 221 to 222, where steps 221 to 222 are sub-steps of step 220.
Step 221, obtaining position distribution information of at least two object components in the global view.
Wherein the location distribution information is used to indicate a distribution of at least two object components in the global view.
Alternatively, the manner of acquiring the position distribution information may be implemented as at least one of the following.
First, manual labeling.
Schematically, a user marks each object component in the global view through a terminal, and the terminal generates position distribution information of at least two object components according to the marking result of the user.
Alternatively, the user may effect the annotation by indicating at least one marker point when annotating each object component in the global view. Illustratively, a marking operation corresponding to at least two object components in the global view is received, wherein the marking operation is used for marking at least one marking point corresponding to the at least two object components, and position distribution information corresponding to the at least two object components is generated based on the marking point indicated by the marking operation.
In some embodiments, marker point position information of marker points corresponding to the object components is recorded as position distribution information corresponding to the object components. That is, the computing device (terminal or server) determines the location of each object component from the marker points.
In some embodiments, different object components are indicated by different styles of marker points. Optionally, the pattern of the mark points includes at least one of mark point color, mark point shape, mark point size, mark point transparency, and mark point identification. For example, a head portion of the first subject is marked with red mark points, a garment portion of the first subject is marked with green mark points, and a torso portion of the first subject is marked with blue mark points.
In one example, taking the marking operation indicating three marker points for each object component as an example, fig. 5 shows a schematic diagram of the implementation procedure of the marking operation provided in one exemplary embodiment of the present application. On the provided global view 500 of the first object, the user indicates the headwear component of the first object through marker point A 510 indicated by the marking operation, indicates the clothing component of the first object through marker point B 520, and indicates the waistband component of the first object through marker point C 530, so that the position distribution information of the object components of the first object is indicated through the above marker points.
Optionally, when the user annotates each object component in the global view, the annotation may be implemented by indicating the region range to which the object component corresponds. Illustratively, receiving region range labeling operations corresponding to at least two object components in the global view, wherein the region range labeling operations are used for labeling image areas corresponding to the at least two object components respectively, and generating position distribution information corresponding to the at least two object components respectively based on the image areas indicated by the region range labeling operations.
In one example, a user draws a region range corresponding to each object component on a global view through a virtual brush tool provided by the terminal, and the computing device determines a location of each object component according to the region range drawn by the virtual brush tool.
The position distribution of each object component is indicated through manual annotation, so that the segmentation accuracy in the process of segmenting and acquiring the local view corresponding to the object component can be improved, and the accuracy of generating the downstream three-dimensional model is further ensured.
Second, automatic detection.
In some embodiments, based on image semantics of the global view, object components existing in the global view are identified, and location distribution information corresponding to at least two object components respectively is obtained.
In some embodiments, the global view is input to the component recognition model, and the position distribution information corresponding to at least two object components respectively is output. The component recognition model is a model which is obtained through pre-training and used for recognizing positions corresponding to all object components in the global view.
Alternatively, the component recognition model may be implemented by a neural network model such as a convolutional neural network (Convolutional Neural Network, CNN), a feed-forward neural network (Feedforward Neural Network, FNN), a residual network (ResNet), a Transformer, and the like, which is not particularly limited here.
In some embodiments, the training process of the component recognition model is implemented as follows. Sample image data and sample position information are acquired, where the sample image data is a two-dimensional image to be subjected to image recognition and the sample position information annotates the distribution of each component in the sample image data. The sample image data is input into the component recognition model to be trained to obtain predicted position information, a first loss value between the predicted position information and the sample position information is determined through a first loss function, and the model parameters of the component recognition model to be trained are iteratively adjusted according to the first loss value until convergence, so as to obtain the component recognition model.
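For one common choice of the first loss function, mean squared error (MSE), the loss-value computation in the loop above looks as below. This is a pure-Python sketch of only the loss step; the actual model is a neural network and the parameter update would be done by an optimizer.

```python
# Computing the first loss value between predicted and sample position
# information, using the MSE option (illustrative sketch only).

def mse_loss(predicted, target):
    """Mean squared error between predicted and sample position values."""
    assert len(predicted) == len(target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

loss = mse_loss(predicted=[0.5, 1.5, 2.0], target=[1.0, 1.0, 2.0])
```

In a real training loop this scalar would be backpropagated to iteratively adjust the model parameters until convergence.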
Alternatively, the first loss function may be implemented as at least one of a cross-entropy loss (Cross-Entropy Loss) function, a mean squared error loss (Mean Squared Error Loss, MSE) function, a logarithmic loss function, a least absolute deviations loss (Least Absolute Deviations Loss, L1 Loss) function, and the like, which is not limited here.
In some embodiments, to improve recognition accuracy of the component recognition model, a Prompt word (Prompt) corresponding to at least two object components is input to the component recognition model while the global view is input.
Optionally, the prompt word includes at least one of name information, color information, volume information, and component sequence information respectively corresponding to the at least two object components. In one example, taking a prompt that includes the name information of the object components, the prompt input to the component recognition model along with the global view is "recognize the hat worn by the character, the clothing worn by the character, and the body torso of the character in the global view".
That is, through the AI model, each object component in the global view is automatically identified under a simple text prompt, and the position distribution corresponding to each object component is output; this reduces the need for manual operation and improves the efficiency of generating the whole model.
Step 222, inputting the global view and the position distribution information corresponding to the at least two object components to the region segmentation model, so as to obtain the local view corresponding to the at least two object components.
In the embodiment of the application, the region segmentation model is a machine learning model which is obtained through pre-training and used for segmenting the region where each object component is located from the global view. That is, the region segmentation model segments each region in the global view as a complete view to obtain local views to which at least two object components respectively correspond.
Alternatively, the above-mentioned region segmentation model may be implemented by a neural network model such as a convolutional neural network, a feed-forward neural network, a residual network, a Transformer, and the like, which is not particularly limited here.
In some embodiments, when the position distribution information is obtained through manual annotation, after the global view and the position distribution information are input into the region segmentation model, the region segmentation model identifies the region corresponding to each object component in the global view according to the marker points for each object component in the position distribution information, and segments the image according to the identification result to obtain the local view corresponding to each object component.
In other embodiments, when the position distribution information is obtained through recognition by the component recognition model, the component recognition model and the region segmentation model may be implemented as different processing units of the same model; that is, the pre-trained AI model includes a component recognition unit and a region segmentation unit. After the global view and the prompt words corresponding to the object components are input into the AI model, the component recognition unit identifies, in the global view, the position distribution information corresponding to the at least two object components whose image semantics match the text semantics of the prompt words, and the region segmentation unit segments the global view according to the position distribution information to obtain the local views respectively corresponding to the at least two object components.
In some embodiments, the training process of the region segmentation model is implemented as follows. Sample image data and sample annotation information are acquired, where the sample image data includes sample pairs each consisting of a sample global view and sample local views: the sample global view is a two-dimensional image displaying a sample object, and the sample local views are two-dimensional images displaying at least two sample components of the sample object. The sample annotation information is manually annotated and indicates the position distribution information of each sample component in the sample global view of the sample pair. The sample global view and the sample annotation information are input into the region segmentation model to be trained to obtain predicted local views, a second loss value between the predicted local views and the sample local views is determined through a second loss function, and the model parameters of the region segmentation model to be trained are iteratively adjusted according to the second loss value until convergence, so as to obtain the region segmentation model.
Alternatively, the second loss function may be implemented as at least one of a cross entropy loss function, a mean square error loss function, a logarithmic loss function, a minimum absolute deviation loss function, and the like, which is not limited herein.
In one example, the image segmentation process for the at least two object components is schematically illustrated with the region segmentation model implemented as the Segment Anything Model (SAM), a large vision segmentation model.
SAM is a model that performs segmentation by means of "prompting". As shown in fig. 6, which illustrates a schematic diagram of a Segment Anything Model (SAM) 600 according to an exemplary embodiment of the present application, the SAM 600 includes three modules: an image encoder (Image Encoder) 610, a prompt encoder (Prompt Encoder) 620, and a mask decoder (Mask Decoder) 630.
In the embodiment of the present application, the image encoder 610 receives the input global view 601, encodes the global view 601, and outputs an image encoded representation (Image Embedding) of the global view 601; the prompt encoder 620 receives the input prompt information 602, encodes the prompt information 602, and outputs a prompt encoded representation corresponding to the prompt information 602. The image encoded representation and the prompt encoded representation are input together into the mask decoder 630, which converts the two encoded representations into at least two masks (Masks) 603 respectively corresponding to the at least two object components; by applying the at least two masks 603 to the global view 601, the local views respectively corresponding to the at least two object components can be obtained.
In the embodiment of the present application, the prompt information 602 is the obtained position distribution information corresponding to the at least two object components. Alternatively, the prompt information 602 may be implemented as at least one of a point (point), a box (box), a text (text), or a mask (mask) indicating the position distribution of the at least two object components.
In some embodiments, the image encoder 610 is implemented using a Vision Transformer (ViT) pre-trained via the masked autoencoder (Masked Autoencoders, MAE) learning method.
In some embodiments, a different structure of the prompt encoder 620 is selected according to the type of the input prompt information. Illustratively, prompts may be divided into sparse prompts and dense prompts, where the sparse prompts include marker points, marker boxes, and descriptive text, and the dense prompts include region-range masks. For sparse prompts, the prompt encoder 620 encodes marker points and marker boxes by positional encoding and encodes descriptive text by text encoding; for dense prompts, the prompt encoder 620 encodes region-range masks by convolutional layers.
In some embodiments, the mask decoder 630 employs a modified Transformer decoder block and a dynamic mask prediction head. The modified Transformer decoder block updates all feature representations in both directions (from the prompt encoded representation to the image encoded representation, and from the image encoded representation to the prompt encoded representation) using prompt self-attention and cross-attention. After running two decoder blocks, the image embedding is upsampled, the upsampling result is mapped by a multi-layer perceptron (Multilayer Perceptron, MLP) to a dynamic linear classifier, the classifier then calculates, for each position in the global view, the foreground mask probability of each object component, and finally the mask corresponding to each object component is output according to the foreground mask probabilities.
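The final step of the decoder, turning per-position foreground probabilities into a binary mask for one object component, can be illustrated as a simple thresholding pass. The 0.5 threshold is an assumption for illustration; SAM's actual decoder is a learned network, not this sketch.

```python
# Illustrative sketch: per-position foreground mask probabilities for one
# object component thresholded into a binary mask.

def probs_to_mask(probs, threshold=0.5):
    """Mark every position whose foreground probability clears the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]

probs = [[0.9, 0.2],
         [0.6, 0.4]]
component_mask = probs_to_mask(probs)
```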
In some embodiments, the training process of the SAM 600 is implemented as follows. A sample global view, sample masks, and sample annotation information are acquired, where the sample global view is a two-dimensional image displaying a sample object, the sample masks indicate the regions where at least two sample components are located in the sample global view, and the sample annotation information is manually annotated and indicates the position distribution information of each sample component in the sample global view. The sample global view and the sample annotation information are input into the SAM to be trained to obtain predicted masks respectively corresponding to the at least two sample components, a second loss value between the predicted masks and the sample masks is determined through a second loss function, and the model parameters of the SAM to be trained are iteratively adjusted according to the second loss value until convergence, so as to obtain the SAM 600.
In some embodiments, in order to improve the model training effect of the SAM 600, during training, a candidate mask with high confidence is detected from the prediction masks, the candidate mask is filled into the sample global view to obtain a masked sample global view, the other un-annotated parts in the masked sample global view are annotated manually to obtain supplementary annotation information, and the supplementary annotation information is used as sample annotation information to participate in the training process of the model.
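Detecting high-confidence candidate masks may be sketched as follows; the mean-foreground-probability criterion and the 0.9 threshold are hypothetical choices, not taken from the embodiment.

```python
import numpy as np

def select_candidates(pred_probs, threshold=0.9):
    """pred_probs: (K, H, W) predicted foreground probabilities.
    A mask is a high-confidence candidate when its mean foreground
    probability over the predicted region exceeds `threshold`
    (the 0.9 value is a hypothetical choice)."""
    fg = pred_probs > 0.5                         # predicted foreground region
    conf = np.array([p[m].mean() if m.any() else 0.0
                     for p, m in zip(pred_probs, fg)])
    return np.nonzero(conf >= threshold)[0]       # indices of candidate masks

probs = np.stack([np.full((4, 4), 0.99),          # confident mask -> kept
                  np.full((4, 4), 0.55)])         # uncertain mask -> dropped
keep = select_candidates(probs)
```

The kept candidates are the ones that would be filled into the sample global view before the remaining parts are annotated manually.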
In one example, taking three views of the first object as the global view, as shown in fig. 7, which illustrates a segmentation schematic diagram of object components provided in an exemplary embodiment of the present application, after the three views 710 of the first object and the manually marked sets of marker points of each object component are input into the SAM, a mask corresponding to each object component is obtained, and the three views 710 of the first object are segmented by using the masks to obtain three views 720 corresponding to each object component.
In summary, in the process of generating the three-dimensional model through the two-dimensional image, the global view of the first object is divided into the local views corresponding to the at least two object components, the component three-dimensional model corresponding to each object component is generated from its local view, and the object three-dimensional model of the first object is obtained by combining the component three-dimensional models of the at least two object components. Because the first object is decomposed into at least two object components, the generation granularity of the object three-dimensional model is refined and independent model data are generated for each object component, which avoids the problem that all components in the finally obtained object three-dimensional model of the first object belong to the same model data and the object components are stuck together, so that the object three-dimensional model has higher editing flexibility in downstream applications. In the downstream application process, when the object three-dimensional model needs to be locally adjusted, adjusting the model data corresponding to a single component three-dimensional model is supported to realize local editing, thereby reducing the workload of using the object three-dimensional model in downstream applications.
In the embodiment of the application, efficient and accurate image segmentation is realized by combining the pre-trained region segmentation model with a small amount of manually provided labeling information or prompt text for the object components, and ensuring the accuracy and segmentation efficiency of the local views serving as the segmentation result improves the model accuracy of the subsequent three-dimensional model generation and the execution efficiency of the whole flow.
Step 2: generating the component three-dimensional models of the object components.
In some alternative embodiments, the process of converting the local view into the component three-dimensional model is implemented by using a three-dimensional modeling model. Please refer to fig. 8, which illustrates a flowchart of a three-dimensional model data processing method according to an exemplary embodiment of the present application, where the flowchart includes step 231, and step 231 is a sub-step of step 230.
Step 231, inputting the local views corresponding to the at least two object components into the three-dimensional modeling model to obtain the component three-dimensional models corresponding to the at least two object components.
In the embodiment of the application, after the local view corresponding to each object component of the first object is obtained, the local view is converted into the component three-dimensional model, wherein the component three-dimensional model is stored in computer equipment in the form of model data; illustratively, the local view is converted into the model data corresponding to the component three-dimensional model.
Model data is data defining the attribute information of the component three-dimensional model of an object component, enabling the object component to be accurately rendered and displayed in a computer graphics application. The attribute information includes at least one of the geometry, appearance, and model behavior of the object component.
Optionally, the model data is implemented as at least one of mesh (Mesh) data, voxel (Voxels) data, and point cloud (Point Cloud) data.
In the embodiment of the application, the three-dimensional modeling model is a model which is obtained through pre-training and is used for determining a corresponding component three-dimensional model according to the local view and outputting model data of the component three-dimensional model. The input of the three-dimensional modeling model is image data (i.e., the local view and/or the global view), and the output is model data (e.g., mesh data, point cloud data) of the three-dimensional model.
In some embodiments, the training process of the three-dimensional modeling model is realized by acquiring sample view data and sample model data, wherein the sample view data is a two-dimensional image of a sample object to be generated by the three-dimensional model, the sample model data is model data corresponding to the three-dimensional model of the sample object, the sample view data is sent to the three-dimensional modeling model to be trained to obtain prediction model data of a prediction three-dimensional model, a third loss value between the prediction model data and the sample model data is determined through a third loss function, and model parameters of the three-dimensional modeling model to be trained are iteratively adjusted according to the third loss value until convergence is achieved to obtain the three-dimensional modeling model.
Alternatively, the third loss function may be implemented as at least one of a cross entropy loss function, a mean square error loss function, a logarithmic loss function, a minimum absolute deviation loss function, and the like, which is not limited herein.
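The candidate loss functions listed above may, for illustration, be written as:

```python
import numpy as np

def mse_loss(pred, target):
    """Mean square error loss."""
    return np.mean((pred - target) ** 2)

def cross_entropy_loss(probs, labels, eps=1e-12):
    """Binary cross entropy; `probs` are predicted probabilities in (0, 1)."""
    probs = np.clip(probs, eps, 1 - eps)   # avoid log(0)
    return -np.mean(labels * np.log(probs)
                    + (1 - labels) * np.log(1 - probs))

def l1_loss(pred, target):
    """Least absolute deviation (L1) loss."""
    return np.mean(np.abs(pred - target))
```

Any of these can serve as the third loss function between the prediction model data and the sample model data, depending on how the model data are represented.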
Optionally, the three-dimensional modeling model is implemented as at least one of a Wonder3D model, a diffusion model based on score distillation sampling (Score Distillation Sampling, SDS), a CRAFTSMAN model, a diffusion model based on the Make-It-3D framework (a framework that generates a high-quality 3D model from a single picture through diffusion priors), a Unique3D model, a Stable Video 3D (SV3D) model, or any other artificial intelligence model capable of converting a two-dimensional image into a three-dimensional model, which is not limited herein.
In one example, taking the three-dimensional modeling model implementation as CRAFTSMAN model as an example, the generation process of model data of at least two object components is schematically illustrated, as shown in fig. 9, which shows a schematic diagram of a model structure provided by an exemplary embodiment of the present application, the CRAFTSMAN model 900 is a model implemented based on a diffusion model, and the CRAFTSMAN model 900 includes an encoder 910, a U-Net network structure 920, and a decoder 930, where the U-Net network structure 920 includes a forward diffusion network 921 and a backward diffusion network 922.
Wherein, the encoder 910 and the decoder 930 are each implemented as a residual network (Residual Network, ResNet). The encoder 910 compresses the image representation of the local view corresponding to the object component into low-resolution features (a downsampling process), and the decoder 930 decodes the low-resolution features into the mesh data of the component three-dimensional model of the object component (an upsampling process).
The U-Net network structure 920 includes a multi-view (Multiple View, MV) diffusion model structure and a normal-based geometry refiner. The MV diffusion model is used for completing the first-stage generation flow, namely, generating a coarse mesh with a smooth geometric shape according to the image features of the input local view; this process operates on a latent space learned from a latent-set-based 3D representation and can generate a coarse geometric shape with regular mesh topology in a short time. The normal-based geometry refiner is used for completing the second-stage generation flow, namely, refining the coarse mesh of the previous stage by using enhanced multi-view normal maps generated by 2D normal diffusion, so that surface details are enhanced, and finally the mesh data of the component three-dimensional model are output.
In the training process, the training sample data 901 is encoded and input into the CRAFTSMAN model 900 to obtain predicted mesh data, the CRAFTSMAN model 900 is trained based on the effect of the predicted mesh data, and the CRAFTSMAN model 900 learns the feature mapping transformation from a two-dimensional image to a three-dimensional model until the loss function converges to an optimal point, so as to obtain the optimized CRAFTSMAN model 900.
In the embodiment of the present application, in the generation of the model data for the i-th object component, the input of the CRAFTSMAN model 900 includes the local view 903 of the i-th object component; the local view 903 is encoded by the encoder 910 to obtain an image-encoded representation, and the image-encoded representation is input to the U-Net network structure 920. That is, the local view 903 is encoded into the latent space z_init by the encoder 910.
The forward diffusion network 921 in the CRAFTSMAN model 900 is used to sample Gaussian noise from a standard Gaussian distribution; the sampled Gaussian noise feature representation and the image-encoded representation obtained by encoding the local view 903 are input into the backward diffusion network 922, and the backward diffusion network 922 is used to reduce the noise to obtain a three-dimensional model feature representation. Illustratively, the backward diffusion network 922 performs iterative denoising toward the component three-dimensional model corresponding to the local view 903, with the local view 903 serving as the denoising condition, until denoising is completed, thereby obtaining the three-dimensional model feature representation. The three-dimensional model feature representation is input to the decoder 930 and decoded by the decoder 930 to obtain the mesh data 906 corresponding to the i-th object component.
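The iterative denoising control flow may be sketched conceptually as follows: a toy loop that starts from sampled noise and repeatedly pulls the sample toward the conditioning representation. It illustrates the loop structure only and is not a faithful diffusion sampler.

```python
import numpy as np

def iterative_denoise(noise, condition, steps=50):
    """Toy sketch of conditional iterative denoising: the sample is
    nudged toward the conditioning representation at every step.
    The 0.2 step size and 50 steps are hypothetical choices."""
    x = noise.copy()
    for _ in range(steps):
        x = x + 0.2 * (condition - x)   # denoising step conditioned on the view
    return x

rng = np.random.default_rng(2)
cond = np.ones(8)                        # stand-in image-encoded representation
out = iterative_denoise(rng.normal(size=8), cond)
```

After enough steps the noise sample has converged to a representation consistent with the condition, which in the embodiment would then be decoded into mesh data.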
In some alternative embodiments, the input of CRAFTSMAN model 900 also includes a hint word (not shown in the figures) that is used to control the model generation effect by CRAFTSMAN model 900 in generating a three-dimensional model based on partial view 903. When the input of CRAFTSMAN model 900 also includes a hint word, encoder 910 also includes a text encoder that performs encoding of the hint word.
Optionally, the hint word includes at least one of generation style hint information and attribute hint information. The generation style hint information is used for indicating the model style of the component three-dimensional model generated based on the local view 903, and optionally includes at least one of a canvas style, a cartoon style, a photo style, a digital art style, and the like. The attribute hint information is used for indicating the model attributes of the component three-dimensional model generated based on the local view 903, and optionally includes at least one of resolution, size, model data format, model size (storage space occupation amount), and model quality.
In some embodiments, to enhance the generation of model data when generating model data based on a local view, image completion is performed on the local view prior to generating the model data. The method comprises the steps of inputting a local view corresponding to at least two object components into an image generation model to obtain a third view after the shielding part in the local view is complemented, and inputting the third view corresponding to at least two object components into a three-dimensional modeling model to obtain model data corresponding to at least two object components.
The image generation model is a machine learning model which is obtained through pre-training and is used for filling in the image content of the occluded area of the object component in the local view. Alternatively, the image generation model may be implemented as a neural network model such as a convolutional neural network, a feedforward neural network, a residual network, or a Transformer, which is not particularly limited herein.
In some embodiments, the training process of the image generation model is realized by acquiring a sample incomplete view and a sample complete view, wherein the sample incomplete view is a two-dimensional image with regional defects for displaying a sample object, the sample complete view is a two-dimensional image for displaying a complete sample object, the sample incomplete view is sent to the image generation model to be trained to obtain a predicted complete view, a fourth loss value between the predicted complete view and the sample complete view is determined through a fourth loss function, and model parameters of the image generation model to be trained are iteratively adjusted according to the fourth loss value until convergence to obtain the image generation model.
In some embodiments, the sample incomplete view may be an image obtained by performing a random mask on the sample complete view.
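Producing a sample incomplete view by random masking may be sketched as follows; the rectangular mask shape and the 0.3 mask fraction are hypothetical choices.

```python
import numpy as np

def random_mask(view, mask_frac=0.3, rng=None):
    """Produce a sample incomplete view by zeroing a random axis-aligned
    rectangle covering roughly `mask_frac` of each side length."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = view.shape[:2]
    mh, mw = int(h * mask_frac), int(w * mask_frac)
    y = int(rng.integers(0, h - mh + 1))   # random top-left corner
    x = int(rng.integers(0, w - mw + 1))
    out = view.copy()                      # keep the complete view intact
    out[y:y + mh, x:x + mw] = 0.0          # region defect
    return out

complete = np.ones((32, 32))
incomplete = random_mask(complete, rng=np.random.default_rng(0))
```

The (incomplete, complete) pair then serves directly as one training sample pair for the image generation model.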
Alternatively, the fourth loss function may be implemented as at least one of a cross entropy loss function, a mean square error loss function, a logarithmic loss function, a minimum absolute deviation loss function, and the like, which is not limited herein.
In one example, the image generation model is implemented as a latent diffusion model (Latent Diffusion Model, LDM). The latent diffusion model is a deep learning model for generating an image by diffusion in a latent space. The latent diffusion model can learn image features by noising and denoising the image, so the latent diffusion model needs to implement noising and denoising of the image through a forward process and a backward process; that is, the latent diffusion model performs noise-adding processing on the image through a forward diffusion network and performs noise-reduction processing on the noised image through a backward diffusion network.
In the training process of the potential diffusion model, training data of the model comprises a sample data pair formed by an occluded view and a complete view of a training sample, wherein the occluded view can be obtained by randomly masking the complete view.
As shown in fig. 10, which illustrates a schematic diagram of an image completion process provided by an exemplary embodiment of the present application, the local view 1001 corresponding to each object component obtained by dividing the global view is input into the LDM 1000 for image completion, so as to obtain the third view 1002.
Wherein, after the local view 1001 is input into the LDM 1000, the LDM 1000 performs noise-adding processing on the local view 1001 to obtain a noise image, which may be expressed as dx = f(x, t)dt + g(t)dw. After the LDM 1000 adds noise to the local view 1001, in order to complement the object component in the local view 1001 into the generation target, iterative noise reduction is performed on the noise image, which may be expressed as dx = [f(x, t) - g(t)² ∇_x log p_t(x)]dt + g(t)dw̄, where w represents a standard Wiener process (Standard Wiener Process, also known as Brownian motion), f(x, t) is the drift coefficient of x_t, g(t) is the diffusion coefficient of x_t, x_t is the image sample corresponding to time t, w̄ represents the standard Wiener process when time flows backwards from T to 0, p_t(x) is the data distribution of x_t, and ∇_x log p_t(x) represents the data distribution score.
In the noise-adding process performed by the LDM 1000 on the local view 1001, the diffusion duration is a continuous time variable t ∈ [0, T]. The data distribution corresponding to time t = 0 may be expressed as p_0(x), and the data distribution corresponding to time t = T may be expressed as p_T(x). The diffusion duration is used to indicate the number of steps required to generate the third view 1002 from the noise image in the iterative noise reduction process. In order to learn the image features, the computer device needs to sample real samples x_0 ~ p_0(x) through the discretized reverse process.
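For illustration, the discretized reverse process may be sketched in its deterministic probability-flow form for a toy one-dimensional case in which the score of p_t(x) is known in closed form: f = 0, g = 1, and the data concentrated at a point mu, so that p_t = N(mu, t) and ∇_x log p_t(x) = -(x - mu)/t. This sketches the discretization pattern only, not the LDM itself.

```python
def reverse_probability_flow(x_T, mu, T=1.0, eps=1e-3, steps=1000):
    """Integrate dx/dt = f - 0.5 * g^2 * score backwards from t = T to eps
    with explicit Euler steps, for the toy case f = 0, g = 1."""
    x = float(x_T)
    dt = (T - eps) / steps
    t = T
    for _ in range(steps):
        score = -(x - mu) / t          # closed-form score of p_t = N(mu, t)
        x = x + 0.5 * score * dt       # one discretized reverse step
        t -= dt
    return x

restored = reverse_probability_flow(2.5, mu=0.5)
```

Integrating the reverse process from t = T down to a small eps pulls the noisy sample back toward the data point mu, mirroring how the iterative noise reduction recovers the third view from the noise image.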
In summary, in the process of generating the three-dimensional model through the two-dimensional image, the global view of the first object is divided into the local views corresponding to the at least two object components, the component three-dimensional model corresponding to each object component is generated from its local view, and the object three-dimensional model of the first object is obtained by combining the component three-dimensional models of the at least two object components. Because the first object is decomposed into at least two object components, the generation granularity of the object three-dimensional model is refined and independent model data are generated for each object component, which avoids the problem that all components in the finally obtained object three-dimensional model of the first object belong to the same model data and the object components are stuck together, so that the object three-dimensional model has higher editing flexibility in downstream applications. In the downstream application process, when the object three-dimensional model needs to be locally adjusted, adjusting the model data corresponding to a single component three-dimensional model is supported to realize local editing, thereby reducing the workload of using the object three-dimensional model in downstream applications.
In the embodiment of the application, the conversion process of obtaining the three-dimensional model from the two-dimensional image is realized through an AI model such as the CRAFTSMAN model, so that the need for manual modeling is reduced, the labor cost is lowered, the generation efficiency of the three-dimensional model is improved, and the production efficiency of related businesses using the three-dimensional model is further improved.
In the embodiment of the application, the parts of an object component that are occluded or missed during segmentation in the segmented local view are complemented by the pre-trained image generation model, and three-dimensional model generation is then realized based on the complemented third view.
Step 3: assembling the component three-dimensional models of the object components.
In some alternative embodiments, when combining the component three-dimensional models corresponding to the at least two object components, first position information of the at least two object components is determined by using a registration three-dimensional model directly generated from the global view, so that the component three-dimensional models are combined according to the first position information to obtain the object three-dimensional model. Referring to fig. 11, a flowchart of a three-dimensional model data processing method according to an exemplary embodiment of the application is shown, where the flowchart includes steps 241 to 243, and steps 241 to 243 are sub-steps of step 240.
Step 241, converting the global view into a registered three-dimensional model corresponding to the first object.
In the embodiment of the application, the registration three-dimensional model is a three-dimensional model obtained by directly converting the global view based on the first object, and is used as a reference basis for mutual splicing of the three-dimensional models of the components in the process of combining the three-dimensional models of the components so as to ensure the accuracy of the process of combining the three-dimensional models of the components.
In some embodiments, the global view is input into a three-dimensional modeling model to obtain model data corresponding to the registered three-dimensional model, wherein the three-dimensional modeling model is a model which is obtained through pre-training and is used for determining the corresponding registered three-dimensional model according to the global view and outputting the model data of the registered three-dimensional model.
Optionally, the three-dimensional modeling model is implemented as at least one of a Wonder3D model, a diffusion model based on score distillation sampling (Score Distillation Sampling, SDS), a CRAFTSMAN model, a diffusion model based on the Make-It-3D framework (a framework that generates a high-quality 3D model from a single picture through diffusion priors), a Unique3D model, a Stable Video 3D (SV3D) model, or any other artificial intelligence model capable of converting a two-dimensional image into a three-dimensional model.
In the embodiment of the present application, the conversion manner of converting the global view into the registered three-dimensional model is the same as the conversion manner of converting the local view (or the third view) of the object component into the component three-dimensional model, and will not be described herein.
Step 242, determining first position information of at least two object components according to the mapping relation between the component three-dimensional model and the registration three-dimensional model, which are respectively corresponding to the at least two object components.
In the embodiment of the application, as the mapping relation of the 2D component-3D component exists between the global view and the component three-dimensional model of the object component, the mapping relation of the 2D component-2D whole exists between the object component in the global view and the whole object corresponding to the first object, and the mapping relation of the 2D whole-3D whole exists between the global view and the registration three-dimensional model of the whole object corresponding to the first object, the mapping relation of the 3D component-3D whole between the component three-dimensional model of the object component and the registration three-dimensional model representing the whole object of the first object can be obtained by combining the mapping chains formed by the three mapping relations.
Schematically: a first mapping relation between the component three-dimensional models corresponding to the at least two object components and first registration areas in the global view is established, where the first registration areas are used for indicating the areas where the at least two object components are located in the global view; a second mapping relation between the first registration areas corresponding to the at least two object components in the global view and a second registration area of the first object is established, where the second registration area is used for indicating the area where the first object is located in the global view; a third mapping relation between the second registration area of the first object in the global view and the registration three-dimensional model is established; a fourth mapping relation between the component three-dimensional models corresponding to the at least two object components and the registration three-dimensional model is determined according to the mapping chain formed by the first mapping relation, the second mapping relation and the third mapping relation; and the first position information of the at least two object components is determined according to the fourth mapping relation.
After the first mapping relation, the second mapping relation and the third mapping relation are obtained, the mapping relation between the component three-dimensional model and the second registration area can be obtained through the first mapping relation and the second mapping relation, and the mapping relation between the component three-dimensional model and the registration three-dimensional model can then be obtained by combining this mapping relation with the third mapping relation, thereby obtaining the fourth mapping relation. In this way, the mapping relation in three-dimensional space between the component three-dimensional model and the registration three-dimensional model is established through the two-dimensional image.
In one example, the fourth mapping relation is taken as the first position information indicating the positions of the at least two object components in the three-dimensional space.
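The mapping chain described above may be sketched as the composition of per-point correspondence tables; the point identifiers below are hypothetical stand-ins for 3D-component points, global-view pixels, first-object region pixels, and registration-model points.

```python
def compose(*maps):
    """Compose point-correspondence maps left to right:
    compose(m1, m2)[k] == m2[m1[k]], keeping only keys that survive
    every hop of the chain."""
    def chained(key):
        for m in maps:
            if key not in m:
                return None          # correspondence breaks along the chain
            key = m[key]
        return key
    out = {}
    for key in maps[0]:
        val = chained(key)
        if val is not None:
            out[key] = val
    return out

# Hypothetical ids: component-model point -> global-view pixel ->
# first-object region pixel -> registered-model point.
m1 = {'c0': 'px3', 'c1': 'px7'}      # first mapping relation (3D part -> 2D part)
m2 = {'px3': 'obj3', 'px7': 'obj7'}  # second mapping relation (2D part -> 2D whole)
m3 = {'obj3': 'r3'}                  # third mapping relation; 'obj7' has no match
fourth = compose(m1, m2, m3)         # fourth mapping relation (3D part -> 3D whole)
```

Only the correspondences that survive all three hops appear in the fourth mapping relation, which is exactly the mapping-chain behaviour described above.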
For example, as shown in fig. 12, which is a schematic diagram illustrating a combination process of a three-dimensional model of a component provided by an exemplary embodiment of the present application, a global view 1201 generates a registered three-dimensional model 1202 through a three-dimensional modeling model 1210, a local view 1203 corresponding to each object component generates a three-dimensional model 1204 corresponding to each object component through the three-dimensional modeling model 1210, a mapping relationship 1205 between points in the three-dimensional model 1204 and points in the registered three-dimensional model 1202 is obtained through the mapping chain, and first position information of the object component is indicated through the mapping relationship 1205.
In some embodiments, since there may be some wrong mapping relationships in the actual calculation process, these wrong mapping relationships may negatively affect the final combination result, so the fourth mapping relationship obtained is screened to ensure the accuracy of the mapping relationship used in combination. The fourth mapping relation comprises at least two first point pair relations corresponding to at least two object components respectively, wherein the first point pair relations are used for indicating mapping relations of mapping point pairs formed by first position points in the three-dimensional model of the component and second position points in the registration three-dimensional model, the at least two first point pair relations are screened according to mapping accuracy conditions of the at least two first point pair relations corresponding to the ith object component to obtain second point pair relations, i is a positive integer, and the first position information is determined according to the second point pair relations corresponding to the at least two object components respectively.
In some embodiments, a random sample consensus (Random Sample Consensus, RANSAC) algorithm is used to remove wrong mapping relations. Schematically, the at least two first point pair relations corresponding to the i-th object component are sampled to obtain at least one third point pair relation; the at least one third point pair relation is taken as a reference transformation relation to determine the mapping accuracy corresponding to each of the at least two first point pair relations; and the at least two first point pair relations are screened based on a threshold according to their respective mapping accuracies, so as to obtain the second point pair relations.
In some embodiments, when screening at least two first point pair relationships, the first point pair relationship with the mapping accuracy higher than or equal to the threshold value is reserved, the first point pair relationship lower than the threshold value is screened out, and the reserved first point pair relationship is determined as a second point pair relationship.
Illustratively, the threshold is a preconfigured threshold.
By way of example, a certain number of mapping point pairs are randomly sampled, the mapping relation of the mapping point pairs is calculated, then all the mapping points are applied to the mapping relation, the number of the mapping point pairs within a certain error range is counted, the process is repeated for a plurality of times, and the mapping relation with the best fitting degree is selected as a final result. Therefore, the wrong mapping relation can be effectively eliminated, and the accuracy of the three-dimensional model of the combined part is improved.
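The sampling-and-counting procedure above may be sketched as follows; for simplicity the reference transformation is a pure translation estimated from a single sampled pair (a hypothetical simplification; real registration would also estimate a rotation).

```python
import numpy as np

def ransac_filter(src, dst, iters=100, thresh=0.1, rng=None):
    """RANSAC over candidate point pairs (src[i] -> dst[i]).
    Returns a boolean inlier mask for the best-fitting transformation."""
    if rng is None:
        rng = np.random.default_rng()
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = int(rng.integers(len(src)))
        t = dst[i] - src[i]                            # reference transformation
        err = np.linalg.norm(src + t - dst, axis=1)    # mapping accuracy per pair
        inliers = err <= thresh                        # pairs within the error range
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers                     # keep the best-fitting relation
    return best_inliers

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
dst = src + np.array([2.0, 3.0])
dst[3] = [0.0, 0.0]                                    # one wrong correspondence
good = ransac_filter(src, dst, rng=np.random.default_rng(0))
```

The wrong correspondence never agrees with the translation fitted from the consistent pairs, so it is screened out while the correct point pair relations are retained as the second point pair relations.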
In some embodiments, the first position information corresponding to the at least two object components is composed by a second point pair relationship corresponding to the at least two object components, respectively, that is, the combination of the three-dimensional model of the component can be realized through the second point pair relationship.
Step 243, combining the three-dimensional models of the components corresponding to the at least two object components according to the first position information to obtain the three-dimensional model of the first object.
In the embodiment of the application, the first position information includes the second point pair relations corresponding to the at least two object components respectively. When the component three-dimensional models corresponding to the at least two object components are combined according to the first position information, the positions of the component three-dimensional models are determined by taking the registration three-dimensional model as a reference through the second point pair relations; that is, the component three-dimensional models of the at least two object components are sequentially combined onto the registration three-dimensional model according to the positions indicated by the second point pair relations, so as to realize the combination process of the component three-dimensional models.
In some embodiments, the at least two object components are registered with the registered three-dimensional model in a first registration mode based on the first position information to obtain registration results respectively corresponding to the at least two object components, and the component three-dimensional models respectively corresponding to the at least two object components are combined based on the registration results to obtain the object three-dimensional model of the first object.
Alternatively, the first registration mode described above may be implemented as a rigid registration (Rigid Registration) mode, a fast point feature histogram (Fast Point Feature Histograms, FPFH) mode, a signature of histograms of orientations (Signature of Histograms of OrienTations, SHOT) descriptor mode, an iterative closest point (Iterative Closest Point, ICP) algorithm mode, a normal distributions transform (Normal Distributions Transform, NDT) algorithm mode, or the like.
In one example, the first registration mode is implemented as a rigid registration mode, i.e. a process of combining the three-dimensional models of the components of the at least two object components by rigid registration. Illustratively, according to the first position information, performing rigid registration on at least two object components and the registration three-dimensional model respectively to obtain registration results corresponding to the at least two object components, and according to the registration results, combining model data corresponding to the at least two object components respectively to obtain the object three-dimensional model of the first object.
Rigid registration is a technique applied to three-dimensional point clouds or meshes, the aim of which is to find an optimal rigid transformation such that one point set (the source) coincides as much as possible with another point set (the target) after this transformation. Mathematically, rigid registration can be expressed as minimizing a metric function that measures the difference between the transformed point cloud and the target point cloud. Rigid transformations include rotation and translation but do not include scaling or deformation, so all vertices share one transformation matrix.
In one example, the points (source) of the component three-dimensional model of an object component generated from the partial view and the points (target) in the registration three-dimensional model are made to coincide as closely as possible by minimizing an objective function, where the objective function is as shown in Equation One.
Equation One:

$$\min_{R,\,t}\ \sum_{i} \left\| R\,p_i + t - q_i \right\|^{2}$$

Where $R$ is the rotation matrix, $t$ is the translation vector, $p_i$ is a point in the component three-dimensional model generated from the partial view, and $q_i$ is the point in the registration three-dimensional model corresponding to $p_i$.
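Under the assumption that the point correspondences are already known, the minimization over $R$ and $t$ has a closed-form solution via singular value decomposition (the Kabsch method). The sketch below in NumPy is illustrative only; the function name and toy data are not part of the patent.

```python
import numpy as np

def solve_rigid(p, q):
    """Closed-form minimizer of sum_i ||R p_i + t - q_i||^2 (Kabsch/SVD).

    p: (N, 3) points from the component three-dimensional model.
    q: (N, 3) corresponding points in the registration model, same order.
    """
    p_bar, q_bar = p.mean(axis=0), q.mean(axis=0)
    H = (p - p_bar).T @ (q - q_bar)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                         # optimal rotation
    t = q_bar - R @ p_bar                      # optimal translation
    return R, t
```

When correspondences are exact, the recovered rotation and translation reproduce the target points with zero residual, which is why methods such as ICP alternate between estimating correspondences and calling this closed-form step.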
In one example, rigid registration is performed on the individual object components using Procrustes analysis to combine the component three-dimensional models. Procrustes analysis is a method of achieving optimal alignment between objects in two or three dimensions; specifically, it computes the optimal rotation, translation, and scaling transformations that minimize the distance between one shape and another. Illustratively, the Procrustes distance (Procrustes Distance) between the component three-dimensional model of an object component generated from the partial view and the model region corresponding to that object component in the registration three-dimensional model is obtained, and the position and size of the component three-dimensional model are adjusted according to the Procrustes distance so as to overlap the corresponding model region in the registration three-dimensional model, thereby realizing the combination of the object components.
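The idea of the Procrustes distance can be illustrated with SciPy's `scipy.spatial.procrustes`. Note that this routine standardizes both inputs (centering and scaling) before finding the optimal rotation, so it is a sketch of the alignment concept rather than the patent's exact procedure; the toy shapes below are hypothetical.

```python
import numpy as np
from scipy.spatial import procrustes

# Toy component-model points and the matching region of the registration
# model: here the second shape is a translated-and-scaled copy of the first.
a = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
b = 2.0 * a + np.array([3.0, -1.0])

# mtx1/mtx2 are the standardized, optimally aligned shapes; `disparity`
# is the Procrustes distance (sum of squared pointwise differences).
mtx1, mtx2, disparity = procrustes(a, b)
```

Because `b` differs from `a` only by translation and uniform scaling, the optimal alignment makes the two shapes coincide and the disparity is essentially zero; for genuinely different shapes the disparity quantifies the residual mismatch that the position/size adjustment described above seeks to minimize.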
In the application of the embodiment of the application, by using Procrustes analysis in combination with the fourth mapping relationship between the known component three-dimensional model of each object component and the registration three-dimensional model of the first object, rigid registration of each object component with the registration three-dimensional model can be performed rapidly, thereby realizing the combination of the at least two object components.
When the component three-dimensional models of the object components are combined, the shape and the size of the object indicated by the model data of the component three-dimensional models can be guaranteed to be unchanged in the combination process through rigid registration, the real geometric characteristics of the component three-dimensional models are guaranteed not to be influenced in the combination process, and the accuracy of the model combination process is improved.
In some embodiments, before rigid registration is performed, the model data corresponding to each object component is further registered in a second registration mode. Schematically, according to the positional relationship between the at least two object components indicated by the first position information, the model data of the component three-dimensional models is registered in the second registration mode to obtain registration model data respectively corresponding to the at least two object components, where the registration model data indicates the model data obtained after the deformation of the component three-dimensional model is completed in the second registration mode; the registration model data respectively corresponding to the at least two object components is then combined according to the registration result to obtain the object three-dimensional model of the first object.
Optionally, the second registration mode includes at least one of a non-rigid registration (non-rigid deformation processing) mode, a deep-learning-based deformation registration network mode, a diffusion-model-based unsupervised deformation registration mode, a physical-constraint-based deformation registration mode, and the like.
In one example, taking the second registration mode as non-rigid registration, schematically, non-rigid registration is performed on the model data of the at least two object components according to the positional relationship indicated by the first position information, obtaining registration model data respectively corresponding to the at least two object components, where the registration model data indicates the model data obtained after non-rigid deformation of the component three-dimensional model is completed through non-rigid registration; the registration model data respectively corresponding to the at least two object components is then combined according to the registration result to obtain the object three-dimensional model of the first object.
That is, in order to solve the problems of model interpenetration (clipping) and joining between the at least two object components, non-rigid deformation processing needs to be performed on the components. Non-rigid deformation refers to modifying the shape of an object while maintaining certain properties of the object. In the process of combining the component three-dimensional models of the at least two object components, adjusting the model data through non-rigid deformation makes the connection between the at least two object components more natural and avoids phenomena such as gaps and overlaps.
Schematically, after the registration result is obtained, when the model data corresponding to each object component is combined according to the registration result, the model representation units in the model data of each object component are combined into the three-dimensional space where the registration three-dimensional model is located according to the mapping relationship indicated by the registration result, thereby realizing the combination of the model data. For example, taking mesh data as the model data, the mesh data includes the meshes (including vertices, edges, and faces) forming the component three-dimensional model; taking the registration result as indicating a mapping relationship between mesh vertices, a first vertex in the component three-dimensional model is configured to the position of a second vertex in the registration three-dimensional model that has a mapping relationship with the first vertex, thereby realizing the combination of the model data of each object component based on the registration three-dimensional model.
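The vertex-placement step described above can be sketched as follows. This is a simplified illustration, not the patent's implementation: the dictionary-based mapping, the function name, and the choice to leave unmapped vertices in place are all assumptions.

```python
import numpy as np

def combine_by_mapping(component_vertices, registration_vertices, vertex_map):
    """Move component-model vertices into the registration model's space.

    vertex_map: dict {component vertex index -> registration vertex index},
    i.e. the 'first vertex -> second vertex' relation indicated by the
    registration result. Unmapped vertices are left unchanged (an
    illustrative choice).
    """
    out = component_vertices.copy()
    for src_idx, dst_idx in vertex_map.items():
        # place the first vertex at the position of its mapped second vertex
        out[src_idx] = registration_vertices[dst_idx]
    return out
```

In a full pipeline this placement would be applied per component, after which edges and faces follow their (now repositioned) vertices.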
Alternatively, the non-rigid deformation may be achieved by a variety of methods, such as physics-based simulation, energy-minimization-based optimization methods, and the like.
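As a toy illustration of the energy-minimization flavor of non-rigid deformation, the sketch below iteratively pulls each free vertex toward the average of its neighbors, lowering a Laplacian smoothness energy while anchored vertices keep the overall shape; this can close small seams between combined components. All names and parameters are hypothetical, not from the patent.

```python
import numpy as np

def laplacian_relax(vertices, neighbors, free, iters=50, step=0.5):
    """Toy energy-minimization deformation.

    Repeatedly moves each vertex in `free` toward the average of its
    neighbors; vertices not listed in `free` act as fixed anchors.
    neighbors: list of neighbor-index lists, one per vertex.
    """
    v = vertices.astype(float).copy()
    for _ in range(iters):
        for i in free:
            avg = v[neighbors[i]].mean(axis=0)   # local Laplacian target
            v[i] += step * (avg - v[i])          # relax toward the target
    return v
```

With a small step size the update converges geometrically: a free vertex between two fixed anchors ends up at their midpoint, which is the minimizer of the local smoothness energy.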
Illustratively, as shown in FIG. 12, after the mapping 1205 is obtained, the object three-dimensional model 1206 corresponding to the first object is finally obtained by combining the rigid registration 1220 and the non-rigid registration 1230.
In summary, in the process of generating a three-dimensional model from a two-dimensional image, the global view of the first object is divided into local views respectively corresponding to the at least two object components, the component three-dimensional model corresponding to each object component is generated from its local view, and the object three-dimensional model of the first object is obtained by combining the component three-dimensional models of the at least two object components. Decomposing the first object into at least two object components refines the generation granularity of the object three-dimensional model and generates independent model data for each object component, which solves the problem that all components in the finally obtained object three-dimensional model of the first object belong to the same model data and the object components are adhered to one another, so that the object three-dimensional model has higher editing flexibility in downstream applications. In downstream applications, when the object three-dimensional model needs local adjustment, adjusting the model data corresponding to the relevant component three-dimensional model is supported to realize local editing, thereby reducing the workload of applying the object three-dimensional model downstream.
In the embodiment of the application, the global view is directly converted into the three-dimensional model of the first object, and the three-dimensional model is used as a registration three-dimensional model to assist in establishing the position information of the object component in the three-dimensional space, so that the combination efficiency of the model data of the object component is improved under the guidance of the registration three-dimensional model.
In the embodiment of the application, the mapping chains of the 2D component-3D component, the 2D component-2D whole and the 2D whole-3D whole are established by registering the three-dimensional model, so that the mapping relation between the 3D component and the 3D whole is known through the mapping chains, the combination of the object component parts is realized through the mapping relation, and the combination efficiency of the component three-dimensional model of the combined object component parts is improved.
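The mapping chain described above amounts to composing the 2D component-3D component, 2D component-2D whole, and 2D whole-3D whole relations to obtain the 3D component-3D whole relation. A minimal sketch of such composition, with purely hypothetical identifiers, follows; a pair survives only if every link in the chain resolves.

```python
def compose(*maps):
    """Compose point-pair mappings left to right; keys whose chain breaks
    at any link are dropped from the result."""
    def chain(key):
        for m in maps:
            if key not in m:
                return None
            key = m[key]
        return key
    return {k: chain(k) for k in maps[0] if chain(k) is not None}

# Hypothetical chain: 3D component point -> 2D component pixel
#                  -> 2D global pixel    -> 3D registration-model point
comp3d_to_pix = {"c0": "p0", "c1": "p1"}
pix_to_global = {"p0": "g0", "p1": "g1"}
global_to_reg3d = {"g0": "r0"}          # "g1" has no 3D correspondence

fourth_mapping = compose(comp3d_to_pix, pix_to_global, global_to_reg3d)
```

Here `fourth_mapping` plays the role of the fourth mapping relationship: it links 3D component points directly to 3D registration-model points without ever matching the two 3D models against each other.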
According to the three-dimensional model data processing method provided by the embodiment of the application, decomposing the first object into at least two object components refines the generation granularity of the object three-dimensional model, and independent model data is generated for each object component. This solves the problem that all components in the finally obtained object three-dimensional model of the first object belong to the same model data and the object components are adhered to one another, so that the object three-dimensional model has higher editing flexibility in downstream applications; when the object three-dimensional model needs local adjustment, the model data corresponding to the object components supports local editing, thereby reducing the workload of applying the object three-dimensional model downstream.
Illustratively, as shown in fig. 13, an effect comparison schematic is shown between three-dimensional model generation implemented directly through the CraftsMan model and three-dimensional model generation implemented by the method according to an embodiment of the present application.
In the generation result 1310 obtained directly from the CraftsMan model, the global view of the first object is input directly to the CraftsMan model, which outputs mesh data of the first object, and the three-dimensional model of the first object is indicated by this mesh data. In the generation result 1310, because the three-dimensional models of all object components of the first object are represented by the same mesh, there is a problem of adhesion between the object components.
In the generation result 1320 of the first object produced by the method provided by the embodiment of the application, because the model data corresponding to each object component is generated independently, the three-dimensional model of the first object is obtained by combining the independently generated component three-dimensional models of the object components, thereby solving the problem of adhesion between the object components and better meeting the requirements of physical simulation in downstream applications, such as model driving and collision between object components.
It should be noted that, before and during the process of collecting the relevant data of the user, the present application may display a prompt interface, a popup window or output voice prompt information, where the prompt interface, popup window or voice prompt information is used to prompt the user to collect the relevant data currently, so that the present application only starts to execute the relevant step of obtaining the relevant data of the user after obtaining the confirmation operation of the user to the prompt interface or popup window, otherwise (i.e. when the confirmation operation of the user to the prompt interface or popup window is not obtained), the relevant step of obtaining the relevant data of the user is finished, i.e. the relevant data of the user is not obtained. In other words, all user data collected by the present application is collected with the consent and authorization of the user, and the collection, use and processing of relevant user data requires compliance with relevant laws and regulations and standards of the relevant country and region.
Referring now to FIG. 14, a block diagram illustrating a three-dimensional model data processing apparatus according to an exemplary embodiment of the present application is shown, the apparatus including the following modules.
An acquisition module 1410 configured to acquire a global view of a first object, the global view including a two-dimensional image showing the first object from a first perspective, the first object including at least two object components;
a segmentation module 1420, configured to segment the global view into local views respectively corresponding to the at least two object components, where the local views include two-dimensional images showing the at least two corresponding object components from the first view direction;
A generating module 1430, configured to convert the partial views corresponding to the at least two object components into a three-dimensional model of the component;
And the combining module 1440 is configured to combine the component three-dimensional models corresponding to the at least two object components respectively to obtain an object three-dimensional model of the first object.
In some alternative embodiments, as shown in fig. 15, the partitioning module 1420 includes:
A first obtaining unit 1421, configured to obtain location distribution information of the at least two object components in the global view, where the location distribution information is used to indicate a distribution situation of the at least two object components in the global view;
a dividing unit 1422, configured to divide the first object in the global view based on the location distribution information, so as to obtain local views respectively corresponding to the at least two object components.
In some optional embodiments, the first obtaining unit 1421 is further configured to identify, based on image semantics of the global view, object components existing in the global view, and obtain location distribution information corresponding to the at least two object components respectively;
The first obtaining unit 1421 is further configured to receive a marking operation corresponding to each of the at least two object components in the global view, where the marking operation is used to mark at least one marking point corresponding to each of the at least two object components, and generate location distribution information corresponding to each of the at least two object components based on the marking point indicated by the marking operation.
In some optional embodiments, the segmentation unit 1422 is further configured to input location distribution information corresponding to the global view and the at least two object components to a region segmentation model, so as to obtain local views corresponding to the at least two object components, where the region segmentation model is a machine learning model that is trained in advance and used for segmenting an area where each object component is located from the global view.
In some optional embodiments, the generating module 1430 is further configured to input the partial views corresponding to the at least two object components into a three-dimensional modeling model, to obtain a three-dimensional model of the component corresponding to the at least two object components, where the three-dimensional modeling model is a machine learning model that is obtained by training in advance and is used for outputting the three-dimensional model of the component according to the partial views.
In some alternative embodiments, the apparatus further comprises:
A complement module 1450, configured to input local views corresponding to the at least two object components respectively to an image generation model, to obtain a third view after the occlusion part in the local views is complemented, where the image generation model is a machine learning model that is obtained in advance and is used for filling the image content of the occluded area of the object component in the local view;
The generating module 1430 is further configured to input the third views corresponding to the at least two object components into the three-dimensional modeling model to obtain a three-dimensional model of the component corresponding to the at least two object components.
In some alternative embodiments, the combining module 1440 includes:
A second acquisition unit 1441 configured to acquire first position information of the at least two object components in a three-dimensional space, where the first position information is used to indicate positions of the at least two object components in the three-dimensional space that respectively correspond when the first object is composed;
and a combining unit 1442, configured to combine the component three-dimensional models corresponding to the at least two object components according to the first position information, so as to obtain an object three-dimensional model of the first object.
In some optional embodiments, the generating module 1430 is further configured to convert the global view into a registered three-dimensional model corresponding to the first object;
The second obtaining unit 1441 is further configured to determine the first position information of the at least two object components according to a mapping relationship between the component three-dimensional model and the registration three-dimensional model, where the component three-dimensional model and the registration three-dimensional model correspond to the at least two object components respectively.
In some alternative embodiments, the combining module 1440 further comprises:
a mapping unit 1443, configured to establish a first mapping relationship between the component three-dimensional model respectively corresponding to the at least two object components and a first registration area in the global view, where the first registration area is used to indicate an area where the at least two object components are respectively located in the global view;
The mapping unit 1443 is further configured to establish a second mapping relationship between a first registration area corresponding to each of the at least two object components in the global view and a second registration area of the first object, where the second registration area is used to indicate an area where the first object is located in the global view;
The mapping unit 1443 is further configured to establish a third mapping relationship between the second registration area of the first object in the global view and the registered three-dimensional model;
the mapping unit 1443 is further configured to determine a fourth mapping relationship between the component three-dimensional model and the registration three-dimensional model, where the component three-dimensional model and the registration three-dimensional model correspond to the at least two object components respectively, according to a mapping chain formed by the first mapping relationship, the second mapping relationship, and the third mapping relationship;
the second obtaining unit 1441 is further configured to determine the first position information of the at least two object components according to the fourth mapping relationship.
In some optional embodiments, the fourth mapping relationship includes at least two first point pair relationships respectively corresponding to the at least two object components, the first point pair relationships being used to indicate a mapping relationship of a mapping point pair formed by a first position point in the component three-dimensional model and a second position point in the registration three-dimensional model;
the second obtaining unit 1441 is further configured to screen the at least two first point pair relationships according to the mapping accuracy of the at least two first point pair relationships corresponding to the ith object component to obtain a second point pair relationship, where i is a positive integer;
The second obtaining unit 1441 is further configured to determine the first position information according to a second point pair relationship corresponding to the at least two object components respectively.
In some optional embodiments, the second obtaining unit 1441 is further configured to sample at least one third point pair relationship from the at least two first point pair relationships corresponding to the ith object element;
the second obtaining unit 1441 is further configured to determine mapping accuracies corresponding to the at least two first point pair relationships respectively by using the at least one third point pair relationship as a reference transformation relationship;
the second obtaining unit 1441 is further configured to screen the at least two first point pair relationships based on a threshold according to the mapping accuracies corresponding to the at least two first point pair relationships, to obtain the second point pair relationship.
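The sample-evaluate-threshold screening performed by the second obtaining unit resembles a RANSAC-style procedure. The sketch below is a simplified illustration: it assumes the reference transformation derived from a sampled pair is a pure 2D translation, which is an assumption for brevity, and all names and parameters are hypothetical.

```python
import numpy as np

def ransac_filter(pairs, trials=100, threshold=0.1, seed=0):
    """RANSAC-style screening of (source, target) point pairs.

    Each trial samples one pair as the reference transformation (a pure
    translation here), scores every pair's mapping accuracy against it,
    and keeps the largest threshold-consistent set of pairs."""
    rng = np.random.default_rng(seed)
    src = np.array([p[0] for p in pairs], float)
    dst = np.array([p[1] for p in pairs], float)
    best = np.zeros(len(pairs), bool)
    for _ in range(trials):
        i = rng.integers(len(pairs))
        t = dst[i] - src[i]                      # reference transformation
        err = np.linalg.norm((src + t) - dst, axis=1)
        inliers = err < threshold                # mapping-accuracy screen
        if inliers.sum() > best.sum():
            best = inliers
    return [p for p, keep in zip(pairs, best) if keep]
```

Pairs whose mapping disagrees with the best reference transformation (second point pair relationships that fail the accuracy threshold) are discarded, which is the screening effect described above.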
In some alternative embodiments, the combining module 1440 further comprises:
A registration unit 1444, configured to register the at least two object components and the registered three-dimensional model in a first registration mode based on the first position information, so as to obtain registration results that respectively correspond to the at least two object components;
The combining unit 1442 is further configured to combine the component three-dimensional models corresponding to the at least two object components respectively based on the registration result, to obtain an object three-dimensional model of the first object.
In some optional embodiments, the registration unit 1444 is further configured to register, according to a positional relationship between the at least two object components indicated by the first position information, model data of the three-dimensional model of the component in a second registration mode, to obtain registration model data corresponding to the at least two object components respectively, where the registration model data is used to indicate model data obtained after the deformation of the three-dimensional model of the component is completed in the second registration mode;
The combining unit 1442 is further configured to combine the registration model data corresponding to the at least two object components according to the registration result, to obtain an object three-dimensional model of the first object.
It should be noted that, in the three-dimensional model data processing apparatus provided in the foregoing embodiment, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be performed by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to perform all or part of the functions described above. In addition, the three-dimensional model data processing device and the three-dimensional model data processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments, and are not repeated herein.
The application also provides a computer device, which comprises a processor and a memory, wherein at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to realize the three-dimensional model data processing method provided by each method embodiment. It should be noted that the computer device may be a computer device as provided in fig. 16 below.
Referring to fig. 16, a schematic diagram of a computer device according to an exemplary embodiment of the present application is shown. Specifically, the computer device 1600 includes a central processing unit (Central Processing Unit, CPU) 1601, a system Memory 1604 including a random access Memory (Random Access Memory, RAM) 1602 and a Read Only Memory (ROM) 1603, and a system bus 1605 connecting the system Memory 1604 and the central processing unit 1601. The computer device 1600 also includes a basic input/output system (I/O system) 1606 to facilitate transfer of information between various devices within the computer, and a mass storage device 1607 for storing an operating system 1613, application programs 1614, and other program modules 1615.
The basic input/output system 1606 includes a display 1608 for displaying information and an input device 1609, such as a mouse, keyboard, etc., for user input of information. Wherein the display 1608 and the input device 1609 are both coupled to the central processing unit 1601 by way of an input output controller 1610 coupled to the system bus 1605. The basic input/output system 1606 may also include an input/output controller 1610 for receiving and processing input from a keyboard, mouse, or electronic stylus, among a plurality of other devices. Similarly, the input-output controller 1610 also provides output to a display screen, printer, or other type of output device.
The mass storage device 1607 is connected to the central processing unit 1601 by a mass storage controller (not shown) connected to the system bus 1605. The mass storage device 1607 and its associated computer-readable media provide non-volatile storage for the computer device 1600. That is, the mass storage device 1607 may include a computer readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
The computer readable medium may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that the computer storage medium is not limited to the above. The system memory 1604 and mass storage device 1607 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by the one or more central processing units 1601, the one or more programs containing instructions for implementing the three-dimensional model data processing method provided by the respective method embodiments described above, and the central processing unit 1601 executes the one or more programs to implement the three-dimensional model data processing method provided by the respective method embodiments described above.
According to various embodiments of the invention, the computer device 1600 may also operate through a network, such as the Internet, to remote computers connected to the network. That is, the computer device 1600 may be connected to the network 1612 through a network interface unit 1611 coupled to the system bus 1605, or alternatively, the network interface unit 1611 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes one or more programs stored in the memory, the one or more programs including steps for performing the three-dimensional model data processing method provided by the embodiment of the present invention, which are executed by the computer device.
The embodiment of the application also provides computer equipment, which comprises a memory and a processor, wherein at least one instruction, at least one section of program, code set or instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded by the processor and realizes the three-dimensional model data processing method.
The embodiment of the application also provides a computer readable storage medium, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the readable storage medium, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the three-dimensional model data processing method.
The application also provides a computer program product which, when run on a computer, causes the computer to execute the three-dimensional model data processing method provided by the method embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program for instructing related hardware, and the program may be stored in a computer readable storage medium, which may be a computer readable storage medium included in the memory of the above embodiments, or may be a computer readable storage medium existing separately and not incorporated in the terminal. The computer readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the three-dimensional model data processing method. Alternatively, the computer readable storage medium may include Read-Only Memory (ROM), Random Access Memory (RAM), a Solid State Drive (SSD), an optical disk, or the like. The random access memory may include Resistive Random Access Memory (ReRAM) and Dynamic Random Access Memory (DRAM), among others. The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc. The foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the application are intended to be included within the scope of the application.
Claims (17)
1. A method for processing three-dimensional model data, the method comprising:
acquiring a global view of a first object, wherein the global view comprises a two-dimensional image for displaying the first object from a first view angle direction, and the first object comprises at least two object components;
dividing the global view into local views respectively corresponding to the at least two object components, wherein each local view comprises a two-dimensional image displaying the corresponding object component from the first view angle direction;
converting the local views respectively corresponding to the at least two object components into component three-dimensional models;
and combining the component three-dimensional models respectively corresponding to the at least two object components to obtain an object three-dimensional model of the first object.
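The three-step flow claimed above (divide, convert, combine) can be sketched as follows. This is an illustrative outline only: `segment_components`, `local_view_to_3d`, and `assemble` are hypothetical stand-ins for the segmentation, image-to-3D, and combination stages, not an implementation disclosed by the patent.

```python
from typing import Dict, List

def segment_components(global_view: str, components: List[str]) -> Dict[str, str]:
    # Stand-in for dividing the global view into one local view per component.
    return {c: f"{global_view}#{c}" for c in components}

def local_view_to_3d(local_view: str) -> dict:
    # Stand-in for the image-to-3D modeling model.
    return {"mesh_of": local_view}

def assemble(part_models: Dict[str, dict]) -> dict:
    # Stand-in for combining component models into the object model.
    return {"object_model": part_models}

def process(global_view: str, components: List[str]) -> dict:
    local_views = segment_components(global_view, components)         # step 1: divide
    parts = {c: local_view_to_3d(v) for c, v in local_views.items()}  # step 2: convert
    return assemble(parts)                                            # step 3: combine

result = process("front_view.png", ["head", "torso"])
```

The point of the sketch is the data flow: each component gets its own local view, each local view gets its own 3D model, and only at the end are the per-component models merged.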
2. The method according to claim 1, wherein the dividing the global view into local views respectively corresponding to the at least two object components comprises:
acquiring position distribution information of the at least two object components in the global view, wherein the position distribution information is used for indicating the distribution of the at least two object components in the global view; and
dividing the first object in the global view based on the position distribution information to obtain the local views respectively corresponding to the at least two object components.
3. The method of claim 2, wherein the obtaining location distribution information of the at least two object components in the global view comprises:
identifying, based on image semantics of the global view, the object components present in the global view to obtain the position distribution information respectively corresponding to the at least two object components; or
receiving marking operations for the at least two object components in the global view, wherein the marking operations mark at least one marking point corresponding to each of the at least two object components, and generating the position distribution information respectively corresponding to the at least two object components based on the marking points indicated by the marking operations.
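One plausible encoding of the "position distribution information" derived from marking operations is a per-component bounding box and centroid over the user's marked points. The patent does not specify the data format; the encoding below is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def position_distribution(mark_points: List[Tuple[str, Point]]) -> Dict[str, dict]:
    """Group user-marked points by component name and summarize each
    component's location as a bounding box plus centroid."""
    by_comp = defaultdict(list)
    for comp, pt in mark_points:
        by_comp[comp].append(pt)
    info = {}
    for comp, pts in by_comp.items():
        xs, ys = zip(*pts)
        info[comp] = {
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
            "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        }
    return info

info = position_distribution([("head", (50, 10)), ("head", (60, 20)), ("torso", (55, 80))])
```

A bounding box or point prompt of this kind is the sort of conditioning input a promptable segmentation model (claim 4's region segmentation model) would typically accept.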
4. A method according to claim 2 or 3, wherein the dividing the first object in the global view based on the position distribution information to obtain the local views respectively corresponding to the at least two object components comprises:
inputting the global view and the position distribution information respectively corresponding to the at least two object components into a region segmentation model to obtain the local views respectively corresponding to the at least two object components;
wherein the region segmentation model is a pre-trained machine learning model for segmenting, from the global view, the regions where the object components are located.
5. A method according to any one of claims 1 to 3, wherein the converting the local views respectively corresponding to the at least two object components into component three-dimensional models comprises:
inputting the local views respectively corresponding to the at least two object components into a three-dimensional modeling model to obtain the component three-dimensional models respectively corresponding to the at least two object components, wherein the three-dimensional modeling model is a pre-trained machine learning model for outputting a component three-dimensional model according to a local view.
6. The method according to claim 5, wherein the inputting the local views respectively corresponding to the at least two object components into the three-dimensional modeling model to obtain the component three-dimensional models respectively corresponding to the at least two object components comprises:
inputting the local views respectively corresponding to the at least two object components into an image generation model to obtain third views in which the occluded parts of the local views are completed, wherein the image generation model is a pre-trained machine learning model for filling in the image content of the occluded regions of the object components in the local views; and
inputting the third views respectively corresponding to the at least two object components into the three-dimensional modeling model to obtain the component three-dimensional models respectively corresponding to the at least two object components.
7. A method according to any one of claims 1 to 3, wherein the combining the component three-dimensional models respectively corresponding to the at least two object components to obtain the object three-dimensional model of the first object comprises:
acquiring first position information of the at least two object components in a three-dimensional space, wherein the first position information is used for indicating the respective positions in the three-dimensional space of the at least two object components of the first object; and
and combining the component three-dimensional models respectively corresponding to the at least two object components according to the first position information to obtain an object three-dimensional model of the first object.
8. The method of claim 7, wherein the obtaining first positional information of the at least two object components in three-dimensional space comprises:
converting the global view into a registration three-dimensional model corresponding to the first object;
and determining the first position information of the at least two object components according to mapping relations between the registration three-dimensional model and the component three-dimensional models respectively corresponding to the at least two object components.
9. The method according to claim 8, wherein the determining the first position information of the at least two object components according to a mapping relationship between the component three-dimensional model and the registration three-dimensional model, which correspond to the at least two object components, respectively, includes:
establishing a first mapping relation between the component three-dimensional models respectively corresponding to the at least two object components and first registration areas in the global view, wherein the first registration areas are used for indicating the areas where the at least two object components are respectively located in the global view;
establishing a second mapping relation between the first registration areas respectively corresponding to the at least two object components in the global view and a second registration area of the first object, wherein the second registration area is used for indicating the area where the first object is located in the global view;
establishing a third mapping relation between the second registration area of the first object in the global view and the registration three-dimensional model;
determining a fourth mapping relation between the component three-dimensional models respectively corresponding to the at least two object components and the registration three-dimensional model according to a mapping chain formed by the first mapping relation, the second mapping relation, and the third mapping relation; and
determining the first position information of the at least two object components according to the fourth mapping relation.
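The mapping chain of claim 9 composes three relations into the fourth. Reducing each relation to a lookup table, the composition can be sketched as follows; the tables and point names here are purely illustrative, not data structures from the patent.

```python
def compose(*mappings):
    # Chain point mappings left to right: x -> m3(m2(m1(x))).
    def chained(x):
        for m in mappings:
            x = m[x]
        return x
    return chained

# Illustrative tables:
# first:  component-model point -> point in its local-view region
# second: local-view region point -> point in the object's region
# third:  object region point -> point on the registration 3D model
first  = {"p_model": "p_local"}
second = {"p_local": "p_object"}
third  = {"p_object": "p_reg3d"}

fourth = compose(first, second, third)  # the derived fourth mapping relation
```

The design point is that no direct correspondence between a component model and the registration model needs to be estimated; it falls out of composing the three simpler relations.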
10. The method according to claim 9, wherein the fourth mapping relation comprises at least two first point pair relations respectively corresponding to the at least two object components, each first point pair relation being used to indicate the mapping relation of a mapping point pair formed by a first position point in the component three-dimensional model and a second position point in the registration three-dimensional model;
the determining the first position information of the at least two object components according to the fourth mapping relation comprises:
screening the at least two first point pair relations according to the mapping accuracy condition of the at least two first point pair relations corresponding to an i-th object component to obtain a second point pair relation, wherein i is a positive integer; and
and determining the first position information according to the second point pair relations respectively corresponding to the at least two object components.
11. The method according to claim 10, wherein the screening the at least two first point pair relations according to the mapping accuracy condition of the at least two first point pair relations corresponding to the i-th object component to obtain the second point pair relation comprises:
sampling at least one third point pair relation from the at least two first point pair relations corresponding to the i-th object component;
determining, with the at least one third point pair relation as a reference transformation relation, the mapping accuracies respectively corresponding to the at least two first point pair relations; and
screening the at least two first point pair relations based on a threshold value according to the mapping accuracies respectively corresponding to the at least two first point pair relations to obtain the second point pair relation.
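The scheme in claim 11 — sample reference point pairs, use them to define a candidate transformation, score every pair's mapping accuracy, and keep the pairs that pass a threshold — mirrors the classic RANSAC procedure. A minimal 2D sketch follows; restricting the transformation to a pure translation and using an L1 residual are simplifying assumptions for illustration, not the patent's choices.

```python
import random
from typing import List, Tuple

Pair = Tuple[Tuple[int, int], Tuple[int, int]]  # (source point, target point)

def filter_point_pairs(pairs: List[Pair], threshold: float = 1.0,
                       samples: int = 20, seed: int = 0) -> List[Pair]:
    """RANSAC-style inlier filtering: each sampled pair defines a candidate
    translation (the 'reference transformation relation'); every pair is
    scored by its residual under it, and the largest consensus set wins."""
    rng = random.Random(seed)
    best: List[Pair] = []
    for _ in range(samples):
        src, dst = rng.choice(pairs)                 # sampled third point pair
        tx, ty = dst[0] - src[0], dst[1] - src[1]    # reference transformation
        inliers = [
            (s, d) for s, d in pairs
            if abs(s[0] + tx - d[0]) + abs(s[1] + ty - d[1]) <= threshold
        ]
        if len(inliers) > len(best):
            best = inliers
    return best

# Three pairs consistent with a (+5, +5) translation, one gross outlier.
pairs = [((0, 0), (5, 5)), ((1, 0), (6, 5)), ((2, 2), (7, 7)), ((9, 9), (0, 1))]
good = filter_point_pairs(pairs)
```

Because outliers rarely agree with any sampled transformation, the surviving second point pair relations are the mutually consistent ones.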
12. The method according to claim 8, wherein the combining the component three-dimensional models respectively corresponding to the at least two object components according to the first position information to obtain the object three-dimensional model of the first object comprises:
registering the at least two object components with the registration three-dimensional model in a first registration mode based on the first position information to obtain registration results respectively corresponding to the at least two object components; and
combining the component three-dimensional models respectively corresponding to the at least two object components based on the registration results to obtain the object three-dimensional model of the first object.
13. The method according to claim 12, wherein the combining the component three-dimensional models respectively corresponding to the at least two object components based on the registration results to obtain the object three-dimensional model of the first object comprises:
registering model data of the component three-dimensional models in a second registration mode according to the positional relation between the at least two object components indicated by the first position information to obtain registration model data respectively corresponding to the at least two object components, wherein the registration model data indicate the model data obtained after the component three-dimensional models are deformed in the second registration mode; and
combining the registration model data respectively corresponding to the at least two object components according to the registration results to obtain the object three-dimensional model of the first object.
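A minimal sketch of the final combination step, under the simplifying assumption that each component's registration result reduces to one rigid offset; the actual second registration mode of claims 12 and 13 may also deform the component models, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]

def translate(verts: List[Vec], offset: Vec) -> List[Vec]:
    # Rigidly move a component's vertices by the registered offset.
    return [(x + offset[0], y + offset[1], z + offset[2]) for x, y, z in verts]

def combine(parts: Dict[str, List[Vec]], registration: Dict[str, Vec]) -> List[Vec]:
    """Place each component model at its registered position, then merge
    all vertices into one object model."""
    merged: List[Vec] = []
    for name, verts in parts.items():
        merged.extend(translate(verts, registration.get(name, (0.0, 0.0, 0.0))))
    return merged

parts = {"head": [(0.0, 0.0, 0.0)], "torso": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]}
offsets = {"head": (0.0, 2.0, 0.0)}  # hypothetical registration result
merged = combine(parts, offsets)
```

Components with no registration entry stay at the origin, which keeps the merge well defined even when registration fails for a part.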
14. A three-dimensional model data processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire a global view of a first object, wherein the global view comprises a two-dimensional image displaying the first object from a first view angle direction, and the first object comprises at least two object components;
a segmentation module, configured to divide the global view into local views respectively corresponding to the at least two object components, wherein each local view comprises a two-dimensional image displaying the corresponding object component from the first view angle direction;
a generation module, configured to convert the local views respectively corresponding to the at least two object components into component three-dimensional models; and
a combination module, configured to combine the component three-dimensional models respectively corresponding to the at least two object components to obtain an object three-dimensional model of the first object.
15. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program that is loaded and executed by the processor to implement the three-dimensional model data processing method of any of claims 1 to 13.
16. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the three-dimensional model data processing method of any one of claims 1 to 13.
17. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the three-dimensional model data processing method of any one of claims 1 to 13.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411753118.1A CN119206006B (en) | 2024-12-02 | 2024-12-02 | Three-dimensional model data processing method, device, equipment, medium and product |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119206006A true CN119206006A (en) | 2024-12-27 |
| CN119206006B CN119206006B (en) | 2025-03-25 |
Family
ID=94061797
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411753118.1A Active CN119206006B (en) | 2024-12-02 | 2024-12-02 | Three-dimensional model data processing method, device, equipment, medium and product |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119206006B (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119888089A (en) * | 2025-01-15 | 2025-04-25 | 清华大学 | Method and device for generating three-dimensional dental model |
| CN120031835A (en) * | 2025-01-23 | 2025-05-23 | 南京大学 | A diverse shape completion method based on component analysis and decomposition |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112270755A (en) * | 2020-11-16 | 2021-01-26 | Oppo广东移动通信有限公司 | Three-dimensional scene construction method and device, storage medium and electronic equipment |
| CN112270736A (en) * | 2020-11-16 | 2021-01-26 | Oppo广东移动通信有限公司 | Augmented reality processing method and device, storage medium and electronic equipment |
| US20210049397A1 (en) * | 2018-10-16 | 2021-02-18 | Tencent Technology (Shenzhen) Company Limited | Semantic segmentation method and apparatus for three-dimensional image, terminal, and storage medium |
| CN113436338A (en) * | 2021-07-14 | 2021-09-24 | 中德(珠海)人工智能研究院有限公司 | Three-dimensional reconstruction method and device for fire scene, server and readable storage medium |
| WO2022016867A1 (en) * | 2020-07-20 | 2022-01-27 | 浙江商汤科技开发有限公司 | Method for reconstructing a three-dimensional grid model and apparatus thereof, device and storage medium |
| CN115953535A (en) * | 2023-01-03 | 2023-04-11 | 深圳华为云计算技术有限公司 | Three-dimensional reconstruction method, device, computing device and storage medium |
| WO2024114321A1 (en) * | 2022-11-30 | 2024-06-06 | 腾讯科技(深圳)有限公司 | Image data processing method and apparatus, computer device, computer-readable storage medium, and computer program product |
| US20240394968A1 (en) * | 2021-09-27 | 2024-11-28 | Nextdoor Co., Ltd | Apparatus for generating 3-dimensional object model and method thereof |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Pantoja-Rosero et al. | Generating LOD3 building models from structure-from-motion and semantic segmentation | |
| CN109544677B (en) | Indoor scene main structure reconstruction method and system based on depth image key frame | |
| Nishida et al. | Procedural modeling of a building from a single image | |
| Rematas et al. | Novel views of objects from a single image | |
| CN119206006B (en) | Three-dimensional model data processing method, device, equipment, medium and product | |
| CN117671138A (en) | Digital twin modeling method and system based on SAM large model and NeRF | |
| CN111127631B (en) | Three-dimensional shape and texture reconstruction method, system and storage medium based on single image | |
| EP4150577A1 (en) | Learning articulated shape reconstruction from imagery | |
| CN118071932A (en) | Three-dimensional static scene image reconstruction method and system | |
| Savarese et al. | Shadow carving | |
| Xue et al. | HSR: holistic 3d human-scene reconstruction from monocular videos | |
| CN120374855A (en) | Interactive point cloud instance segmentation method based on three-dimensional Gaussian scattered field | |
| CN120014178B (en) | A 3D reconstruction method for unbounded scenes based on drone aerial photography | |
| Yin et al. | [Retracted] Virtual Reconstruction Method of Regional 3D Image Based on Visual Transmission Effect | |
| CN119850880B (en) | Efficient scene representation method for task-oriented robot | |
| CN119741419A (en) | Training method and device for diffusion model for Gaussian primitive completion | |
| CN119963766A (en) | Method, device, equipment, medium and program product for three-dimensional reconstruction | |
| CN119006742A (en) | Human body three-dimensional reconstruction method and system based on deep learning | |
| Patterson et al. | Landmark-based re-topology of stereo-pair acquired face meshes | |
| CN117911613A (en) | Dense reconstruction system, method, electronic device and storage medium | |
| Kelly et al. | Visiongpt-3d: A generalized multimodal agent for enhanced 3d vision understanding | |
| Han et al. | 3D human model reconstruction from sparse uncalibrated views | |
| CN119580053B (en) | Automatic data conversion fusion method for three-dimensional model component | |
| Ochmann | Automatic reconstruction of parametric, volumetric building models from 3D point clouds | |
| CN120014140B (en) | UV texture image generation method, device and equipment based on UV mapping transformation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||