
CN119478919A - A method and device for object recognition based on three-dimensional reconstruction - Google Patents

A method and device for object recognition based on three-dimensional reconstruction

Info

Publication number
CN119478919A
Authority
CN
China
Prior art keywords
point cloud
nerf
reconstruction
sub
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411525920.5A
Other languages
Chinese (zh)
Inventor
孙峰奇
白龙
尹超
陈志斌
徐皓
张凯凯
陈曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Online Information Technology Co Ltd
Original Assignee
China Unicom Online Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Online Information Technology Co Ltd filed Critical China Unicom Online Information Technology Co Ltd
Priority to CN202411525920.5A priority Critical patent/CN119478919A/en
Publication of CN119478919A publication Critical patent/CN119478919A/en
Pending legal-status Critical Current


Classifications

    • G06V20/64 Three-dimensional objects (Scenes; Scene-specific elements; Type of objects)
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods (Neural networks)
    • G06T15/005 General purpose rendering architectures (3D [Three Dimensional] image rendering)
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06V10/764 Recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space; Mappings, e.g. subspace methods
    • G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to an object recognition method and device based on three-dimensional reconstruction, belonging to the field of computer image processing, and the method comprises: a three-dimensional reconstruction step based on multi-view geometry; a textureless region detection step; a three-dimensional reconstruction step based on NeRF; a point cloud fusion step and a point cloud classification step. The present invention can be applied in film and television production, virtual reality and augmented reality to improve the recognition efficiency and recognition accuracy of objects and reduce production costs.

Description

Object identification method and device based on three-dimensional reconstruction
Technical Field
The present invention relates to the field of computer image processing, and in particular, to an object recognition method and apparatus based on three-dimensional reconstruction.
Background
In the field of three-dimensional reconstruction, the current mainstream schemes include methods based on multi-view geometry and NeRF.
Methods based on multi-view geometry are the traditional approach that predates the rise of neural networks, typified by the COLMAP software. COLMAP extracts geometric information from two-dimensional images taken at multiple viewing angles and derives a three-dimensional structure. The pipeline mainly comprises feature extraction and matching, SfM-based sparse reconstruction, MVS-based dense reconstruction, and BA-based camera calibration and optimization, after which a three-dimensional model is obtained.
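As a rough illustration of how such a pipeline is commonly driven in practice (not part of the patent), the sketch below runs the sparse stages through the pycolmap bindings. The paths are hypothetical placeholders, and the exact function names may vary across pycolmap versions.

```python
import os
import pycolmap

# Hypothetical layout: input photos live in scene/images.
database = "scene/database.db"
images = "scene/images"
sparse_out = "scene/sparse"
os.makedirs(sparse_out, exist_ok=True)

pycolmap.extract_features(database, images)   # per-image SIFT features
pycolmap.match_exhaustive(database)           # pairwise feature matching
# Incremental SfM with bundle adjustment; returns reconstructions keyed by index.
maps = pycolmap.incremental_mapping(database, images, sparse_out)
maps[0].write(sparse_out)                     # save the first (usually largest) model
```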
NeRF (Neural Radiance Fields) is a neural-network-based three-dimensional reconstruction and rendering technique capable of generating a high-quality three-dimensional scene representation from a set of two-dimensional images. The core idea is to learn, through a multi-layer perceptron (MLP) network, the color and density distribution of each spatial position and view angle in the scene, thereby generating a volume-rendered output. Multi-view-based schemes, by contrast, cannot handle areas without texture.
NeRF training takes images and camera parameters as input. The algorithm applies a high-frequency encoding to each three-dimensional coordinate and feeds it into the neural network, which outputs the color (RGB) and density (the transparency used in volume rendering) of that point. The generated image is compared with the real input image and the network parameters are adjusted so that the generated image better matches the real one. The NeRF-based scheme, however, suffers from low computational efficiency.
In the field of point-cloud-based three-dimensional scene recognition, PointNet uses a shared multi-layer perceptron to learn per-point features, so unordered point clouds can be processed directly. PointNet++ was subsequently proposed on this basis, adding multi-level neighborhood feature extraction and aggregation so that the local and global structures of point cloud data are better captured. DGCNN constructs the adjacency relations of point cloud data through a dynamic graph, capturing the topological structure between points more effectively. However, the reconstructed model has an irregular topology and is difficult to re-render.
Disclosure of Invention
The invention aims to provide an object recognition method and device based on three-dimensional reconstruction that remedy the defects of the prior art. The technical problem to be solved by the invention is addressed by the following technical scheme.
The first aspect of the invention provides an object identification method based on three-dimensional reconstruction, which comprises the following steps:
A three-dimensional reconstruction step based on multi-view geometry, in which a multi-view-geometry-based three-dimensional reconstruction algorithm takes multiple scene images as input and outputs a point cloud of the scene;
a textureless region detection step, in which a threshold is set according to the depth information obtained during the multi-view geometric three-dimensional reconstruction; for the depth information of each photo, if the depth value is below the threshold and the region contains no point cloud, the region is judged to be textureless;
a NeRF-based three-dimensional reconstruction step, in which the input images are processed with NeRF to generate a corresponding point cloud, realized through the following sub-steps:
an image input sub-step: NeRF relies on multi-view image input; the images come from multiple camera angles and cover different view angles of the target scene to ensure the completeness and accuracy of the reconstruction;
a neural network modeling sub-step: NeRF uses an MLP (multi-layer perceptron) neural network to fit the radiance field of the scene; the network receives each camera pose and ray direction as input and outputs the color and density value of each 3D spatial position, allowing NeRF to generate a continuous three-dimensional representation and to reconstruct depth and geometry information even in textureless areas;
a volume rendering sub-step: NeRF renders the density and color values of three-dimensional space back into a two-dimensional image through volume rendering, progressively optimizing its loss function to generate high-quality rendered images matching the input photos;
a point cloud extraction sub-step: after the three-dimensional scene reconstruction is completed, points with higher density are extracted from the NeRF model as point cloud data; these points represent object boundaries or other important geometric features in the scene;
a point cloud conversion sub-step: the scene output by NeRF is converted into a three-dimensional point cloud in which each point is represented by its position and color value; for a textureless region, NeRF fits its geometric features from the spatial information of the input pictures, thereby generating the corresponding point cloud;
a point cloud fusion step, in which the point clouds obtained from NeRF and from multi-view geometry are aligned, corrected, and integrated so that every point cloud lies in the same coordinate system; and
a point cloud classification step, in which objects are identified from the point cloud with a deep-learning classification model; each classified point cloud corresponds to a specific object category in the asset library.
With reference to the first aspect, the above-mentioned point cloud fusion step further includes:
A camera pose acquisition sub-step: in the multi-view geometric reconstruction, each photo corresponds to one camera pose, comprising the camera's rotation and translation information;
a NeRF point cloud generation sub-step: the point cloud extracted from NeRF represents a denser, optimized spatial point distribution in the scene, including color and depth information;
a multi-view geometry point cloud sub-step: a set of preliminary point clouds from different angles is generated using SfM (structure from motion) or an MVS (multi-view stereo) matching algorithm;
a fusion sub-step: the alignment is refined with the iterative closest point (ICP) algorithm so that the two sets of point clouds are accurately matched in three-dimensional space.
With reference to the first aspect, the above-mentioned point cloud classifying step further includes:
A point cloud preprocessing sub-step: the fused point cloud is preprocessed by denoising, downsampling, and point cloud normalization to reduce the computational burden and improve classification accuracy;
a feature extraction sub-step: point cloud classification relies on feature extraction algorithms that turn the geometric information and spatial distribution of each point into a high-dimensional feature vector; the feature extraction algorithms include local curvature, normal vector estimation, and deep-learning-based point cloud convolutional neural networks;
a classification model training sub-step: deep-learning classification models such as PointNet and PointCNN process the sparse, irregular point cloud data and classify each point cloud based on global and local features;
an object recognition and segmentation sub-step: the objects to be recognized are classified out of the point cloud; each classified point cloud corresponds to a specific object category in the asset library.
With reference to the first aspect, the above-mentioned point cloud classification step further includes a point cloud homogenization step:
A mesh conversion sub-step: the point cloud data are converted into a triangular mesh representation using a mesh generation algorithm such as Marching Cubes or Poisson Surface Reconstruction; the vertex connectivity of the triangular mesh describes the surface structure of the object more precisely;
a fixed-resolution sampling sub-step: after the mesh representation is generated, the mesh is resampled at a set resolution; a uniform sampling algorithm generates new point cloud data whose points are distributed more evenly in space;
a point cloud homogenization sub-step: the resulting point cloud data have approximately equal spacing between points and cover the different regions of the object uniformly, avoiding overly concentrated point density.
With reference to the first aspect, the above-mentioned point cloud classifying step further includes an object classifying step based on PCA:
A PCA feature extraction sub-step: principal component analysis (PCA) is performed on the homogenized point cloud data; by computing the covariance matrix, the first 10 principal components are extracted, and these components describe most of the geometric features of the object;
a principal component alignment sub-step: the extracted principal components are aligned with those of similar objects in the asset library; aligning the first principal component ensures that the two point clouds are compared in the same direction; the alignment involves a rigid-body transformation that brings the objects into geometric alignment for subsequent classification;
a principal component distance calculation sub-step: after alignment, the principal component distance between the current object and each similar object in the asset library is computed; the distance is based on the Euclidean metric and measures the similarity of two objects in the low-dimensional feature space; objects with smaller principal component distances are considered more similar to the current object;
a most-similar-object return sub-step: according to the computed principal component distances, the closest object in the asset library is selected as the classification result most similar to the current object.
A second aspect of the present invention provides an object recognition device based on three-dimensional reconstruction, the device comprising:
a three-dimensional reconstruction module based on multi-view geometry, which uses a multi-view-geometry-based three-dimensional reconstruction algorithm, taking multiple scene images as input and outputting a point cloud of the scene;
a textureless region detection module, which sets a threshold according to the depth information obtained during the multi-view geometric three-dimensional reconstruction; for the depth information of each photo, if the depth value is below the threshold and the region contains no point cloud, the region is judged to be textureless;
a NeRF-based three-dimensional reconstruction module, which processes the input images with NeRF to generate a corresponding point cloud, realized through the following sub-modules:
an image input sub-module: NeRF relies on multi-view image input; the images come from multiple camera angles and cover different view angles of the target scene to ensure the completeness and accuracy of the reconstruction; for a textureless region, NeRF's neural network can be used to fit color and geometric features;
a neural network modeling sub-module: NeRF uses an MLP (multi-layer perceptron) neural network to fit the radiance field of the scene; the network accepts each camera pose and ray direction as input and outputs the color and density value of each 3D spatial position, allowing NeRF to generate a continuous three-dimensional representation and to reconstruct depth and geometry information even in textureless areas;
a volume rendering sub-module: NeRF renders the density and color values of three-dimensional space back into a two-dimensional image through volume rendering, progressively optimizing its loss function to generate high-quality rendered images matching the input photos;
a point cloud extraction sub-module: after the three-dimensional scene reconstruction is completed, points with higher density are extracted from the NeRF model as point cloud data; these points represent object boundaries or other important geometric features in the scene;
a point cloud conversion sub-module: the scene output by NeRF is converted into a three-dimensional point cloud in which each point is represented by its position and color value; for a textureless region, NeRF fits its geometric features from the spatial information of the input pictures, thereby generating the corresponding point cloud;
a point cloud fusion module, which aligns, corrects, and integrates the point clouds obtained from NeRF and from multi-view geometry, ensuring that every point cloud lies in the same coordinate system; and
a point cloud classification module, which identifies objects from the point cloud with a deep-learning classification model; each classified point cloud corresponds to a specific object category in the asset library.
The invention combines the multi-view and NeRF schemes for three-dimensional reconstruction; it can be applied to film and television production, virtual reality, and augmented reality, improving the recognition efficiency and accuracy of objects while reducing production costs.
Drawings
FIG. 1 is a flow chart of the steps of a three-dimensional reconstruction-based object recognition method of the present invention;
Fig. 2 is an internal structural diagram of the electronic device provided by the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Abbreviations and key terms commonly used in the art are defined as follows:
A point cloud refers to a set of discrete three-dimensional data points acquired by three-dimensional scanning or stereovision, etc., each point containing spatial coordinates (e.g., x, y, z). The point cloud is generally used for representing the surface structure of an object or a scene, and is widely applied to the fields of computer vision, 3D reconstruction, automatic driving, virtual reality and the like.
A mesh is a three-dimensional geometry made up of vertices, edges, and faces, typically used to represent the surface of an object. The mesh model can accurately describe the shape of objects, and is particularly suitable for scene and object representation in three-dimensional modeling, animation, game development, and computer graphics.
NeRF is a neural network based 3D scene representation method that enables high quality three-dimensional reconstruction from 2D images by mapping arbitrary points in three-dimensional space to color and volume density. NeRF is widely used in the fields of new view angle synthesis, 3D reconstruction, virtual reality and the like.
Multi-view geometry is the field that studies the geometric relationships used to derive three-dimensional structure from images acquired from multiple perspectives. By analyzing multiple images of the same scene shot by cameras at different positions, multi-view geometric techniques can reconstruct the three-dimensional shape of the scene; they are widely applied to three-dimensional reconstruction and pose estimation in computer vision.
Classification is a task in machine learning that aims at assigning input data into predefined categories or labels. Classification algorithms typically identify patterns or features in input data by training models and categorize new data based on these features. Classification is widely used in the fields of image recognition, text analysis, speech recognition, etc.
PCA (principal component analysis) is a statistical method for dimension reduction that preserves the principal features or variances of data by transforming high-dimensional data into a low-dimensional space. PCA is widely applied to scenes such as data compression, feature extraction, noise filtration and the like, and has obvious effect especially in processing a high-dimensional data set.
As shown in fig. 1, the specific implementation process of the present invention is as follows:
1) Three-dimensional reconstruction based on multi-view geometry. Any mainstream multi-view-geometry-based three-dimensional reconstruction algorithm can be used; its input is multiple scene images and its output is a point cloud of the scene.
2) Textureless region detection. A threshold is set using the depth information obtained during the multi-view geometric three-dimensional reconstruction. For the depth information of each photo, if the depth value is below the threshold and the region contains no point cloud, the region is judged to be textureless, as sketched below.
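A minimal NumPy sketch of this check follows. The threshold value and the per-image point-projection mask are our assumptions about how the depth maps are organized, not details given in the patent text.

```python
import numpy as np

def textureless_mask(depth: np.ndarray, point_mask: np.ndarray,
                     depth_thresh: float) -> np.ndarray:
    """Flag pixels as textureless when the MVS depth estimate falls below the
    threshold and no reconstructed point projects into the pixel.

    depth:      (H, W) per-pixel depth from the multi-view reconstruction
    point_mask: (H, W) bool, True where the scene point cloud projects
    """
    return (depth < depth_thresh) & (~point_mask)
```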
3) Three-dimensional reconstruction based on NeRF.
A) Image input. NeRF relies on multi-view photo input. The pictures may come from multiple camera angles, covering different perspectives of the target scene, preferably evenly distributed, to ensure the completeness and accuracy of the reconstruction. For textureless areas in particular, NeRF's neural network can be used to fit color and geometric features.
B) Neural network modeling. NeRF uses an MLP (multi-layer perceptron) neural network to fit the radiance field of a scene. The network accepts each camera pose and ray direction as inputs and outputs the color and density value of each 3D spatial location. This allows NeRF to generate a continuous three-dimensional representation that can reconstruct depth and geometry information even in textureless areas.
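A minimal PyTorch sketch of such a radiance-field MLP is given below. The layer sizes and frequency counts are illustrative defaults from the NeRF literature, not values specified by the patent.

```python
import torch
import torch.nn as nn

def positional_encoding(x: torch.Tensor, num_freqs: int) -> torch.Tensor:
    """High-frequency encoding of coordinates, as used by NeRF."""
    out = [x]
    for i in range(num_freqs):
        out.append(torch.sin((2.0 ** i) * torch.pi * x))
        out.append(torch.cos((2.0 ** i) * torch.pi * x))
    return torch.cat(out, dim=-1)

class TinyNeRF(nn.Module):
    def __init__(self, pos_freqs: int = 10, dir_freqs: int = 4, hidden: int = 256):
        super().__init__()
        self.pos_freqs, self.dir_freqs = pos_freqs, dir_freqs
        pos_dim = 3 * (1 + 2 * pos_freqs)
        dir_dim = 3 * (1 + 2 * dir_freqs)
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.sigma_head = nn.Linear(hidden, 1)           # volume density
        self.rgb_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())     # view-dependent color

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        h = self.trunk(positional_encoding(xyz, self.pos_freqs))
        sigma = torch.relu(self.sigma_head(h))           # keep density non-negative
        rgb = self.rgb_head(
            torch.cat([h, positional_encoding(view_dir, self.dir_freqs)], dim=-1))
        return rgb, sigma
```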
C) Volume rendering. NeRF renders the density and color values of three-dimensional space back into a two-dimensional image through volume rendering, progressively optimizing its loss function to generate a high-quality rendered image matching the input photo.
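The standard volume-rendering composite this sub-step refers to can be sketched as follows; this is the generic formulation C = Σ T_i α_i c_i from the NeRF literature, not code taken from the patent.

```python
import torch

def composite_rays(rgb: torch.Tensor, sigma: torch.Tensor, z_vals: torch.Tensor):
    """rgb: (R, S, 3) sample colors, sigma: (R, S) densities,
    z_vals: (R, S) sample depths along each of R rays with S samples."""
    deltas = z_vals[..., 1:] - z_vals[..., :-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[..., :1])], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)                   # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)         # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = alpha * trans                                    # w_i = T_i * alpha_i
    pixel = (weights[..., None] * rgb).sum(dim=-2)             # composited pixel color
    return pixel, weights
```

Training then minimizes a photometric loss such as `((pixel - gt_pixel) ** 2).mean()` against the input photos, which is the "gradually optimizing its loss function" step.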
D) Point cloud extraction. After the reconstruction of the three-dimensional scene is completed, points with higher density are extracted from the NeRF model as point cloud data. These points typically represent object boundaries or other important geometric features in the scene.
E) Conversion to a point cloud. The scene output by NeRF can be converted into a three-dimensional point cloud, each point of which is represented by its position and color value. For a textureless region, NeRF fits its geometric features from the spatial information of the input pictures, thereby generating the corresponding point cloud.
4) Point cloud fusion.
A) Camera pose acquisition. In multi-view geometric reconstruction, each photo corresponds to one camera pose (comprising the camera's rotation and translation information). These poses are used to determine the spatial position of each picture in the scene.
B) NeRF-generated point cloud. The point cloud extracted from NeRF represents a denser, optimized spatial point distribution in the scene, including color and depth information.
C) Multi-view geometry point clouds. A set of preliminary point clouds from different angles can also be generated by conventional multi-view geometry algorithms such as SfM (structure from motion) or MVS (multi-view stereo matching). These point clouds are typically sparse but can provide additional geometric constraints on the scene.
D) Fusion process. Point cloud fusion involves aligning, correcting, and integrating the point clouds obtained from the different methods (NeRF and multi-view geometry). Aligning the camera pose information ensures that every point cloud lies in the same coordinate system. The fusion can be further refined with the iterative closest point (ICP) algorithm or other techniques so that the two sets of point clouds match exactly in three-dimensional space, as in the sketch below.
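A minimal fusion sketch with Open3D's ICP registration follows. The file names, the identity initialization, and the 0.02 correspondence threshold are placeholder assumptions; in practice the initial transform would come from the camera poses above.

```python
import numpy as np
import open3d as o3d

nerf_pcd = o3d.io.read_point_cloud("nerf_points.ply")   # hypothetical inputs
mvs_pcd = o3d.io.read_point_cloud("mvs_points.ply")

# Camera poses put both clouds in roughly the same frame; ICP refines that.
reg = o3d.pipelines.registration.registration_icp(
    nerf_pcd, mvs_pcd,
    max_correspondence_distance=0.02,
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
nerf_pcd.transform(reg.transformation)   # apply the refined rigid transform

fused = nerf_pcd + mvs_pcd               # concatenate the aligned clouds
```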
5) Point cloud classification.
A) Point cloud preprocessing. The fused point cloud is preprocessed, e.g. by denoising, downsampling, and point cloud normalization, to reduce the computational load and improve classification accuracy.
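Assuming Open3D is used, these three preprocessing operations might look like the following; the voxel size and outlier parameters are illustrative, not values from the patent.

```python
import numpy as np
import open3d as o3d

def preprocess(pcd: o3d.geometry.PointCloud) -> o3d.geometry.PointCloud:
    pcd = pcd.voxel_down_sample(voxel_size=0.01)              # downsampling
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20,
                                            std_ratio=2.0)    # denoising
    pts = np.asarray(pcd.points)                              # normalization:
    pts -= pts.mean(axis=0)                                   # center at origin,
    pts /= np.linalg.norm(pts, axis=1).max()                  # scale to unit sphere
    pcd.points = o3d.utility.Vector3dVector(pts)
    return pcd
```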
B) Feature extraction. Point cloud classification typically relies on feature extraction algorithms that turn the geometric information and spatial distribution of each point into a high-dimensional feature vector. Common methods include local curvature, normal vector estimation, and deep-learning-based point cloud convolutional networks (e.g., PointNet).
C) Classification model training. Deep-learning classification models, particularly PointNet and PointCNN, can process sparse, irregular point cloud data and classify each point cloud based on global and local features.
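A stripped-down PointNet-style classifier, kept as a sketch (per-point shared MLP followed by a symmetric max-pool, omitting PointNet's input/feature transform networks), is shown below; the layer sizes and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PointNetTiny(nn.Module):
    def __init__(self, num_classes: int = 40):
        super().__init__()
        # 1x1 convolutions act as a shared MLP applied to every point
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, N, 3) unordered points
        f = self.point_mlp(pts.transpose(1, 2))   # (B, 1024, N) per-point features
        g = f.max(dim=2).values                   # symmetric pooling: order invariance
        return self.head(g)                       # class logits per cloud
```

The max-pool is what makes the network indifferent to point ordering, which is why PointNet-family models can consume raw, unordered point clouds.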
D) Object recognition and segmentation. After classification is complete, the recognized objects are segmented out of the point cloud. Each classified point corresponds to a specific object category in the asset library.
6) Point cloud homogenization.
A) Point cloud to mesh conversion. The point cloud data are converted into a triangular mesh representation using a mesh generation algorithm (e.g., Marching Cubes or Poisson Surface Reconstruction). This conversion describes the surface structure of the object more precisely through the connectivity of the mesh vertices.
B) Fixed-resolution sampling. After the mesh representation is generated, the mesh is resampled at a set resolution. A uniform sampling algorithm generates new point cloud data whose points are distributed more evenly in space. This is critical for the subsequent analysis and comparison.
C) Point cloud homogenization result. The resulting point cloud data have approximately equal spacing between points and cover the different regions of the object uniformly, avoiding overly concentrated point density. Steps A and B are sketched below.
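With Open3D, sub-steps A and B might be sketched as follows; the input file, the Poisson octree depth, and the target point count are assumptions for illustration.

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("fused.ply")   # hypothetical fused cloud
pcd.estimate_normals()                        # Poisson needs oriented normals

# A) Mesh conversion via Poisson Surface Reconstruction
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)

# B) Fixed-resolution resampling: uniform sampling over the mesh surface
# yields evenly spaced points regardless of the original point density.
uniform_pcd = mesh.sample_points_uniformly(number_of_points=4096)
```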
7) Object classification based on PCA.
A) PCA feature extraction. Principal component analysis (PCA) is performed on the homogenized point cloud data. The goal of PCA is to project the high-dimensional point cloud data into a low-dimensional feature space, typically by computing the covariance matrix and extracting the first few principal components. One may choose to extract the first 10 principal components, which are usually enough to describe most of the geometric features of the object.
B) Principal component alignment. The extracted principal components are aligned with the principal components of similar objects in the asset library. Comparing the two sets of point clouds in the same direction is typically ensured by aligning the first principal component. The alignment may involve a rigid-body transformation so that the objects are geometrically aligned for subsequent classification.
C) Principal component distance calculation. After alignment, the distance between the principal components of the current object and those of each similar object in the asset library is computed. This distance may be based on a metric such as Euclidean distance to measure the similarity of two objects in the low-dimensional feature space. Objects with smaller principal component distances are considered more similar to the current object.
D) Return of the most similar object. According to the computed principal component distances, the closest object in the asset library is selected. This object is the classification result most similar to the current object and can typically be used for further recognition or classification tasks. These sub-steps are sketched below.
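A NumPy sketch of sub-steps A through D follows. Note that PCA on raw (N, 3) coordinates yields at most 3 components, so extracting 10 presumes PCA is run on the per-point high-dimensional feature vectors from the earlier feature-extraction sub-step; that reading, like the sign-flip alignment standing in for the rigid alignment, is our assumption.

```python
import numpy as np

def top_components(features: np.ndarray, k: int = 10) -> np.ndarray:
    """Top-k principal directions of an (N, D) feature matrix, D >= k."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)           # (D, D) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order].T                     # (k, D) component matrix

def component_distance(query: np.ndarray, candidate: np.ndarray) -> float:
    """Euclidean distance after sign-aligning each candidate component
    with the query (a simple stand-in for the rigid alignment step)."""
    signs = np.sign((query * candidate).sum(axis=1, keepdims=True))
    signs[signs == 0] = 1.0
    return float(np.linalg.norm(query - signs * candidate))

def most_similar(query_feats: np.ndarray, library: dict) -> str:
    """Return the asset-library key with the smallest component distance."""
    q = top_components(query_feats)
    return min(library,
               key=lambda name: component_distance(q, top_components(library[name])))
```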
Fig. 2 illustrates the physical structure of an electronic device, which may be an intelligent terminal. The electronic device includes a processor, an internal memory, and a network interface connected by a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for the operation of the operating system and the computer program. The network interface of the electronic device is used to communicate with external terminals over a network connection.
It will be appreciated by those skilled in the art that the structure shown in fig. 2 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the electronic device to which the present inventive arrangements are applied, and that a particular electronic device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
The invention has been proposed for use in a communications cloud drive product. It provides innovative solutions for multiple industries by extracting key information and classifying it intelligently. In the fields of film and television production, virtual reality (VR), and augmented reality (AR), the technology can markedly improve creation efficiency and reduce production costs. For example, complex backgrounds or props can be built quickly for film and television effects, realistic environment and object models can be generated automatically during game design, and richer, more personalized user experiences can be provided in VR/AR applications. With the rapid growth of the digital content industry, the demand for efficient generation of high-quality visual content keeps increasing. The invention therefore not only meets existing market demand but also has the potential to open up new application scenarios, such as interactive teaching material production in online education.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application.
In the above detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, like numerals typically identify like components unless context indicates otherwise. The illustrated embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. An object recognition method based on three-dimensional reconstruction, characterized in that the method comprises:
a three-dimensional reconstruction step based on multi-view geometry, using a multi-view-geometry-based three-dimensional reconstruction algorithm whose input is multiple scene images and whose output is a point cloud of the scene;
a textureless region detection step: a threshold is set using the depth information obtained during the multi-view geometric three-dimensional reconstruction; for the depth information of each photo, if the depth information is below the set threshold and the region contains no point cloud, the region is judged to be textureless;
a NeRF-based three-dimensional reconstruction step: the input images are processed with NeRF to generate a corresponding point cloud, realized through the following sub-steps:
an image input sub-step: NeRF relies on multi-view image input; the images come from multiple camera angles and cover different view angles of the target scene to ensure the completeness and accuracy of the reconstruction; for textureless regions, NeRF's neural network is used to fit color and geometric features;
a neural network modeling sub-step: NeRF uses an MLP (multi-layer perceptron) neural network to fit the radiance field of the scene; the network accepts each camera pose and ray direction as input and outputs the color and density value of each 3D spatial position, allowing NeRF to generate a continuous three-dimensional representation and to reconstruct depth and geometry information even in textureless regions;
a volume rendering sub-step: NeRF renders the density and color values of three-dimensional space back into a two-dimensional image through volume rendering, progressively optimizing its loss function to generate high-quality rendered images matching the input photos;
a point cloud extraction sub-step: after the three-dimensional scene reconstruction is completed, points with higher density are extracted from the NeRF model as point cloud data, the points representing object boundaries or other important geometric features in the scene;
a point cloud conversion sub-step: the scene output by NeRF is converted into a three-dimensional point cloud in which each point is represented by its position and color value; for a textureless region, NeRF fits its geometric features from the spatial information of the input images, thereby generating the corresponding point cloud;
a point cloud fusion step: the point clouds obtained from NeRF and from multi-view geometry are aligned, corrected, and integrated to ensure that every point cloud lies in the same coordinate system; and
a point cloud classification step: objects are identified from the point cloud with a deep-learning classification model; each classified point cloud corresponds to a specific object category in the asset library.
2. The object recognition method based on three-dimensional reconstruction according to claim 1, characterized in that the point cloud fusion step further comprises:
a camera pose acquisition sub-step: in the multi-view geometric reconstruction, each photo corresponds to one camera pose, including the camera's rotation and translation information; the poses are used to determine the spatial position of each picture in the scene;
a NeRF point cloud generation sub-step: the point cloud extracted from NeRF represents a denser, optimized spatial point distribution in the scene, including color and depth information;
a multi-view geometry point cloud sub-step: a set of preliminary point clouds from different angles is generated using SfM (structure from motion) or an MVS (multi-view stereo) matching algorithm;
a fusion sub-step: the alignment is refined with the iterative closest point (ICP) algorithm so that the two sets of point clouds are accurately matched in three-dimensional space.
3. The object recognition method based on three-dimensional reconstruction according to claim 1, characterized in that the point cloud classification step further comprises:
a point cloud preprocessing sub-step: the fused point cloud is preprocessed by denoising, downsampling, and point cloud normalization to reduce the computational burden and improve classification accuracy;
a feature extraction sub-step: point cloud classification relies on feature extraction algorithms that turn the geometric information and spatial distribution of each point cloud into a high-dimensional feature vector; the feature extraction algorithms include local curvature, normal vector estimation, and deep-learning-based point cloud convolutional neural networks;
a classification model training sub-step: the deep-learning classification models PointNet and PointCNN process the sparse, irregular point cloud data and classify each point cloud based on global and local features;
an object recognition and segmentation sub-step: the objects to be recognized are classified out of the point cloud; each classified point cloud corresponds to a specific object category in the asset library.
4. The object recognition method based on three-dimensional reconstruction according to claim 3, characterized in that the point cloud classification step further comprises a point cloud homogenization step, divided into the following sub-steps:
a mesh conversion sub-step: the point cloud data are converted into a triangular mesh representation using the Marching Cubes or Poisson Surface Reconstruction mesh generation algorithm; the vertex connectivity of the triangular mesh describes the surface structure of the object more precisely;
a fixed-resolution sampling sub-step: after the mesh representation is generated, the mesh is resampled at a set resolution; a uniform sampling algorithm generates new point cloud data distributed more evenly in space;
a point cloud homogenization sub-step: the spacing between the points of the point cloud data is approximately equal and the points cover the different regions of the object uniformly, avoiding overly concentrated point density.
5. The object recognition method based on three-dimensional reconstruction according to claim 4, characterized in that the point cloud classification step further comprises a PCA-based object classification step:
a PCA feature extraction sub-step: principal component analysis (PCA) is performed on the homogenized point cloud data; by computing the covariance matrix and extracting the first few principal components, the first 10 principal components are selected, which describe most of the geometric features of the object;
a principal component alignment sub-step: the extracted principal components are aligned with those of similar objects in the asset library; aligning the first principal component ensures the two point clouds are compared in the same direction; the alignment involves a rigid-body transformation that brings the objects into geometric alignment for subsequent classification;
a principal component distance calculation sub-step: after alignment, the principal component distance between the current object and each similar object in the asset library is computed; the distance is based on the Euclidean metric and measures the similarity of two objects in the low-dimensional feature space; objects with smaller principal component distances are considered more similar to the current object;
a most-similar-object return sub-step: according to the computed principal component distances, the closest object in the asset library is selected as the classification result most similar to the current object.
6. An object recognition device based on three-dimensional reconstruction, characterized in that the device comprises:
a three-dimensional reconstruction module based on multi-view geometry, using a multi-view-geometry-based three-dimensional reconstruction algorithm whose input is multiple scene images and whose output is a point cloud of the scene;
a textureless region detection module, which sets a threshold using the depth information obtained during the multi-view geometric three-dimensional reconstruction; for the depth information of each photo, if the depth information is below the threshold and the region contains no point cloud, the region is judged to be textureless;
a NeRF-based three-dimensional reconstruction module, which processes the input images with NeRF to generate a corresponding point cloud, realized through the following sub-modules:
an image input sub-module: NeRF relies on multi-view image input; the images come from multiple camera angles and cover different view angles of the target scene to ensure the completeness and accuracy of the reconstruction; for textureless regions, NeRF's neural network can be used to fit color and geometric features;
a neural network modeling sub-module: NeRF uses an MLP (multi-layer perceptron) neural network to fit the radiance field of the scene; the network accepts each camera pose and ray direction as input and outputs the color and density value of each 3D spatial position, allowing NeRF to generate a continuous three-dimensional representation and to reconstruct depth and geometry information even in textureless regions;
a volume rendering sub-module: NeRF renders the density and color values of three-dimensional space back into a two-dimensional image through volume rendering, progressively optimizing its loss function to generate high-quality rendered images matching the input photos;
a point cloud extraction sub-module: after the three-dimensional scene reconstruction is completed, points with higher density are extracted from the NeRF model as point cloud data, the points representing object boundaries or other important geometric features in the scene;
a point cloud conversion sub-module: the scene output by NeRF is converted into a three-dimensional point cloud in which each point is represented by its position and color value; for a textureless region, NeRF fits its geometric features from the spatial information of the input images, thereby generating the corresponding point cloud;
a point cloud fusion module: the point clouds obtained from NeRF and from multi-view geometry are aligned, corrected, and integrated to ensure that every point cloud lies in the same coordinate system; and
a point cloud classification module: objects are identified from the point cloud with a deep-learning classification model; each classified point cloud corresponds to a specific object category in the asset library.
7. A computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, the steps of the object recognition method based on three-dimensional reconstruction according to any one of claims 1 to 5 are implemented.
8. A computer device, characterized in that it comprises:
a memory for storing a computer program; and
a processor for executing the computer program to implement the steps of the object recognition method based on three-dimensional reconstruction according to any one of claims 1 to 5.
CN202411525920.5A 2024-10-30 2024-10-30 A method and device for object recognition based on three-dimensional reconstruction Pending CN119478919A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411525920.5A CN119478919A (en) 2024-10-30 2024-10-30 A method and device for object recognition based on three-dimensional reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411525920.5A CN119478919A (en) 2024-10-30 2024-10-30 A method and device for object recognition based on three-dimensional reconstruction

Publications (1)

Publication Number Publication Date
CN119478919A 2025-02-18

Family

ID=94568954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411525920.5A Pending CN119478919A (en) 2024-10-30 2024-10-30 A method and device for object recognition based on three-dimensional reconstruction

Country Status (1)

Country Link
CN (1) CN119478919A (en)

Similar Documents

Publication Publication Date Title
Boukhayma et al. 3d hand shape and pose from images in the wild
CN117274756B (en) Method and device for fusion of two-dimensional image and point cloud based on multi-dimensional feature registration
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
CN115131245B (en) A point cloud completion method based on attention mechanism
US12380594B2 (en) Pose estimation method and apparatus
CN113538569B (en) Weak texture object pose estimation method and system
CN116583878A (en) Method and system for personalizing 3D head model deformations
Tabib et al. Learning-based hole detection in 3D point cloud towards hole filling
CN117115358B (en) Automatic digital person modeling method and device
CN119206006B (en) Three-dimensional model data processing method, device, equipment, medium and product
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
Niu et al. Overview of image-based 3D reconstruction technology
CN114022542A (en) A method of making 3D database based on 3D reconstruction
CN118397282B (en) Three-dimensional point cloud robustness component segmentation method based on semantic SAM large model
CN120070522A (en) Three-dimensional model reconstruction method based on point cloud data processing
CN117541537B (en) Space-time difference detection method and system based on all-scenic-spot cloud fusion technology
CN117351078A (en) Target size and 6D pose estimation method based on shape prior
CN119991937A (en) A single-view 3D human body reconstruction method based on Gaussian surface elements
CN119741419A (en) Training method and device for diffusion model for Gaussian primitive completion
CN119963766A (en) Method, device, equipment, medium and program product for three-dimensional reconstruction
CN118967913A (en) A neural radiance field rendering method based on straight line constraints
CN119006742A (en) Human body three-dimensional reconstruction method and system based on deep learning
Ortiz-Cayon et al. Automatic 3d car model alignment for mixed image-based rendering
CN119478919A (en) A method and device for object recognition based on three-dimensional reconstruction
CN116704136A (en) Reconstruction method and device of three-dimensional indoor scene, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination