CN110648299A

CN110648299A - Image processing method, image processing apparatus, and computer-readable storage medium

Info

Publication number: CN110648299A
Application number: CN201810670236.4A
Authority: CN
Inventors: 廖可; 张宇鹏; 王炜
Original assignee: Liguang Co
Current assignee: Liguang Co
Priority date: 2018-06-26
Filing date: 2018-06-26
Publication date: 2020-01-03

Abstract

Embodiments of the present invention provide an image processing method, an image processing apparatus, and a computer-readable storage medium. The image processing method includes: acquiring a panoramic image and one or more partial images within the range of the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantically divided regions in the panoramic image; determining one or more focus regions in the one or more partial images according to the panoramic semantic information and its corresponding semantically divided regions, and acquiring detailed semantic information according to the determined focus regions; and obtaining image description information by using the panoramic semantic information and the detailed semantic information.

Description

Image processing method, image processing apparatus, and computer-readable storage medium

Technical Field

The present application relates to the field of image processing, and in particular to an image processing method, an image processing apparatus, and a computer-readable storage medium.

Background Art

A multi-sensor imaging system consists of multiple sensors and/or multiple types of sensors located at the same or different positions. After image or video data has been collected by the multi-sensor imaging system, the multiple pieces of image or video information from the sensors can be processed to output corresponding image processing results.

In the prior art, when the images acquired by a multi-sensor imaging system include a panoramic image and one or more partial images within the range of the panoramic image, the panoramic image and the partial images are generally fused to obtain a panoramic fused image. However, a panoramic fused image obtained by simply fusing the panoramic image and the partial images does not provide all of the application information a user expects, such as related semantic information and description information.

Summary of the Invention

To solve the above technical problem, according to one aspect of the present invention, an image processing method is provided, including: acquiring a panoramic image and one or more partial images within the range of the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantically divided regions in the panoramic image; determining one or more focus regions in the one or more partial images according to the panoramic semantic information and its corresponding semantically divided regions, and acquiring detailed semantic information according to the determined focus regions; and obtaining image description information by using the panoramic semantic information and the detailed semantic information.

According to another aspect of the present invention, an image processing apparatus is provided, including: an acquisition unit that acquires a panoramic image and one or more partial images within the range of the panoramic image; a semantic division unit that acquires panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantically divided regions in the panoramic image; a focus region acquisition unit that determines one or more focus regions in the one or more partial images according to the panoramic semantic information and its corresponding semantically divided regions, and acquires detailed semantic information according to the determined focus regions; and a description unit that obtains image description information by using the panoramic semantic information and the detailed semantic information.

According to another aspect of the present invention, an image processing apparatus is provided, including: a processor; and a memory storing computer program instructions, wherein the computer program instructions, when run by the processor, cause the processor to perform the following steps: acquiring a panoramic image and one or more partial images within the range of the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantically divided regions in the panoramic image; determining one or more focus regions in the one or more partial images according to the panoramic semantic information and its corresponding semantically divided regions, and acquiring detailed semantic information according to the determined focus regions; and obtaining image description information by using the panoramic semantic information and the detailed semantic information.

According to another aspect of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the following steps: acquiring a panoramic image and one or more partial images within the range of the panoramic image; acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantically divided regions in the panoramic image; determining one or more focus regions in the one or more partial images according to the panoramic semantic information and its corresponding semantically divided regions, and acquiring detailed semantic information according to the determined focus regions; and obtaining image description information by using the panoramic semantic information and the detailed semantic information.

With the image processing method, image processing apparatus, and computer-readable storage medium of the present invention, panoramic semantic information and detailed semantic information can be acquired for a panoramic image and for one or more partial images within the range of the panoramic image, respectively, and image description information can be obtained from them. The resulting image description information therefore combines the panoramic semantic information, which describes the scene in the panoramic image, with the detailed semantic information, which describes details in the focus regions of the partial images. This improves the accuracy of the image description, and the result can be applied effectively in fields such as autonomous driving and robot interaction.

Brief Description of the Drawings

The above and other objects, features, and advantages of the present invention will become clearer from the following detailed description of embodiments of the present invention taken in conjunction with the accompanying drawings.

FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present invention;

FIG. 2(a) shows a panoramic image according to an embodiment of the present invention; FIG. 2(b) shows an infrared partial image according to an embodiment of the present invention; FIG. 2(c) shows a panoramic fused image obtained, according to an embodiment of the present invention, by fusing the panoramic image in FIG. 2(a) with the infrared partial image in FIG. 2(b);

FIG. 3(a) shows a panoramic fused image according to an embodiment of the present invention; FIG. 3(b) shows the coordinate-transformed partial fused image obtained by applying a coordinate transformation to the partial fused image in FIG. 3(a); FIG. 3(c) shows a schematic diagram of the positions of clear region 1 and blurred region 2 in the coordinate-transformed partial fused image of FIG. 3(b);

FIG. 4(a) shows a panoramic fused image according to an embodiment of the present invention; FIG. 4(b) shows the coordinate-transformed partial fused image obtained by applying a coordinate transformation to the partial fused image in FIG. 4(a); FIG. 4(c) shows resampling of the coordinate-transformed partial fused image of FIG. 4(b); FIG. 4(d) shows the panoramic image obtained by applying an inverse coordinate transformation to the resampled partial fused image of FIG. 4(c);

FIG. 5 shows a schematic diagram of a panoramic image according to an embodiment of the present invention;

FIG. 6 shows a schematic diagram of the positions of a partial image and a focus region in a panoramic image according to an embodiment of the present invention;

FIG. 7 shows a block diagram of an image processing apparatus according to an embodiment of the present invention;

FIG. 8 shows a block diagram of an image processing apparatus according to an embodiment of the present invention.

Detailed Description

An image processing method, an image processing apparatus, and a computer-readable storage medium according to embodiments of the present invention are described below with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same elements throughout. It should be understood that the embodiments described here are merely illustrative and should not be construed as limiting the scope of the present invention.

An image processing method according to an embodiment of the present invention is described below with reference to FIG. 1. The method can be applied to still images as well as to video frames of a video that changes over time; no limitation is imposed here. FIG. 1 shows a flowchart of the image processing method 100.

As shown in FIG. 1, in step S101, a panoramic image and one or more partial images within the range of the panoramic image are acquired.

In this step, the panoramic image and the one or more partial images may be acquired with a multi-sensor system. The series of images acquired by the multi-sensor system may include a panoramic image acquired by a panoramic sensor of the multi-sensor imaging system, and one or more partial images, within the range of the panoramic image, acquired by one or more local sensors of the system. The panoramic image may be captured by the panoramic sensor using a wide-angle technique covering, for example, a 360-degree scene, and may then be mapped to a two-dimensional image through a latitude-longitude coordinate transformation. Correspondingly, within the scene range captured by the panoramic image, one or more partial images may be acquired by one or more local sensors. A local sensor may be, for example, one or more of: a high-definition sensor, an infrared sensor, a light field sensor, a point cloud sensor, a stereo vision sensor, and a laser sensor. These local sensors yield, for example, one or more of: a high-definition partial image, an infrared partial image, a light field partial image, a point cloud partial image, a stereo vision partial image, and a laser partial image.
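The latitude-longitude mapping mentioned in this step can be illustrated with a minimal Python sketch. It assumes an equirectangular layout, which the source does not name explicitly; the function names, the angle conventions, and the top-left image origin are all assumptions made for illustration.

```python
def lonlat_to_pixel(lon_deg, lat_deg, width, height):
    """Map a viewing direction to pixel coordinates in an equirectangular image.

    Assumed convention (not from the source): longitude in [-180, 180)
    increases to the right, latitude in [-90, 90] increases upward, and the
    image origin is the top-left corner.
    """
    u = (lon_deg + 180.0) / 360.0 * width
    v = (90.0 - lat_deg) / 180.0 * height
    return u, v


def pixel_to_lonlat(u, v, width, height):
    """Inverse of lonlat_to_pixel under the same assumed convention."""
    lon_deg = u / width * 360.0 - 180.0
    lat_deg = 90.0 - v / height * 180.0
    return lon_deg, lat_deg


# The direction straight ahead on the horizon lands at the image center.
center = lonlat_to_pixel(0.0, 0.0, 2000, 1000)
```

Under this convention, horizontal pixel position is linear in longitude and vertical position is linear in latitude, which is why regions near the poles appear stretched in the two-dimensional image.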

After the panoramic image and the one or more partial images within its scene range have been acquired by the multi-sensor system, the one or more partial images may further be fused according to their positions to obtain partial fused images, where the partial fused images correspond one-to-one to the partial images. Finally, the panoramic image may be fused with the partial fused images to obtain a panoramic fused image.

FIG. 2 shows schematic diagrams of a panoramic image, a partial image, and a panoramic fused image according to an embodiment of the present invention. Specifically, FIG. 2(a) is a panoramic image acquired with a panoramic sensor in an embodiment of the present invention; FIG. 2(b) is an infrared partial image acquired with an infrared sensor; and FIG. 2(c) is the panoramic fused image obtained by fusing the panoramic image in FIG. 2(a) with the infrared partial image in FIG. 2(b). As shown in FIG. 2(c), the infrared partial image in FIG. 2(b) may be subjected to fusion processing, and the resulting partial fused image may be fused into the central region of the panoramic image in FIG. 2(a).

In this step, an independent panoramic image and one or more partial images may be acquired separately. Alternatively, a panoramic fused image such as the one shown in FIG. 2(c) may be acquired initially and then processed to obtain a separate panoramic image and partial images for the subsequent steps.

In one example, when the image initially acquired is a panoramic fused image, one or more partial fused images may first be extracted from the panoramic fused image based on their positions in it; the partial fused images are then processed to obtain the partial images and/or the panoramic image. In practice, the position of a partial fused image in the panoramic fused image may optionally be obtained from position information contained in the panoramic fused image, for example from the metadata of the panoramic fused image or from a related description in its image file. Once the position of the partial fused image (or partial image) in the panoramic fused image is known, the partial fused image can be separated from the panoramic fused image. The partial fused image obtained this way is generally one in which the partial image has been distorted to fit the latitude-longitude coordinate system of the panoramic image. Therefore, to obtain an undistorted partial image, in one example a coordinate transformation may be applied to the one or more partial fused images to remove the distortion and obtain the partial images. In another example, a coordinate transformation may first be applied to the partial fused image to remove the distortion; one or more image-related features are then obtained from the coordinate-transformed partial fused image (for example, by searching outward from its center to obtain pixel-level features such as image resolution and/or focus information); finally, based on these features, the blurred region is removed from the coordinate-transformed partial fused image to obtain the partial image required in this step.
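The extraction and blur-removal steps described above can be sketched as follows. This is only an illustration under stated assumptions: the metadata key `partial_box` is hypothetical, images are plain lists of grayscale rows at least two pixels wide, the distortion-removing coordinate transformation is omitted, and mean absolute gradient is used as a stand-in focus measure that the source does not specify.

```python
def crop(image, box):
    """Return the sub-image inside box = (top, left, bottom, right), ends exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def row_sharpness(image):
    """Mean absolute horizontal gradient of each row (a crude focus measure)."""
    return [sum(abs(a - b) for a, b in zip(row, row[1:])) / (len(row) - 1)
            for row in image]


def column_sharpness(image):
    """Mean absolute vertical gradient of each column."""
    transposed = [list(col) for col in zip(*image)]
    return row_sharpness(transposed)


def grow_from_center(scores, threshold):
    """Expand an index range outward from the middle while neighbours stay sharp.

    The middle index itself is assumed to be sharp and is not checked.
    """
    lo = hi = len(scores) // 2
    while lo > 0 and scores[lo - 1] >= threshold:
        lo -= 1
    while hi < len(scores) - 1 and scores[hi + 1] >= threshold:
        hi += 1
    return lo, hi + 1  # half-open range


def clear_region(image, threshold):
    """Bounding box (top, left, bottom, right) of the sharp area around the center."""
    top, bottom = grow_from_center(row_sharpness(image), threshold)
    left, right = grow_from_center(column_sharpness(image), threshold)
    return top, left, bottom, right


# Synthetic panorama: flat background with a textured 4x4 patch in the middle.
panorama = [[5] * 8 for _ in range(8)]
for r in range(2, 6):
    for c in range(2, 6):
        panorama[r][c] = 10 * ((r + c) % 2)

metadata = {"partial_box": (1, 1, 7, 7)}  # hypothetical metadata layout
partial_fused = crop(panorama, metadata["partial_box"])
box = clear_region(partial_fused, threshold=1.0)
partial_image = crop(partial_fused, box)
```

Growing the box outward from the center mirrors the center-first search mentioned in the text; a production system would more likely use a per-pixel focus map than whole-row and whole-column averages.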

FIG. 3 shows schematic diagrams of obtaining a partial image from a panoramic fused image. FIG. 3(a) shows a panoramic fused image according to an embodiment of the present invention, in which the dashed box marks the position of the partial fused image. FIG. 3(b) shows the coordinate-transformed partial fused image obtained by applying a coordinate transformation to the partial fused image in FIG. 3(a) to remove distortion. Features such as image resolution and/or focus information can then be extracted from the image in FIG. 3(b), and the blurred region can be removed from the coordinate-transformed partial fused image to obtain the partial image. The blurred region of a partial fused image may appear, for example, in transition areas between certain lines or colors within the image, or in the edge area of the image. FIG. 3(c) shows a schematic diagram of the positions of clear region 1 and blurred region 2 in the coordinate-transformed partial fused image of FIG. 3(b): region 1, inside the central box, is the clear region, and region 2, between the two nested boxes, is the blurred region. In one example, the extracted image features may be used to process and remove blurred region 2 so that the partial image finally obtained (not shown) is sufficiently sharp.

Optionally, when the image initially acquired is a panoramic fused image, the panoramic image may also be obtained from the panoramic fused image together with the partial fused images extracted according to their positions. Specifically, a coordinate transformation may first be applied to the one or more partial fused images to remove distortion. One or more image-related features are then obtained for the coordinate-transformed partial fused image; for example, pixel-level features such as image resolution and/or focus information may be obtained from the area surrounding the coordinate-transformed partial fused image. Next, because the panoramic image and the partial images were originally captured by different sensors, they may differ in image features such as resolution and focus information, so the panoramic fused image and the partial fused image will differ in the same way after fusion. The acquired image features can therefore be used to resample the coordinate-transformed partial fused image so that it has the same features (image resolution, focus information, and so on) as the surrounding panoramic fused image. Finally, an inverse coordinate transformation (the reverse of the coordinate transformation described above) is applied to the resampled partial fused image. In other words, the resampled partial fused image is distorted again and projected back into the coordinate system of the panoramic fused image, where it replaces the region originally occupied by the partial fused image with a processed partial fused image that has the same image features as the panoramic fused image, yielding the panoramic image. In one example of the present invention, resampling is carried out differently for the different local sensors of the multi-sensor imaging system. For example, for a high-definition sensor the high-definition data may be resampled at the pixel level; for an infrared sensor the infrared data may be processed to obtain a visible-light representation; for a light field sensor the resolution and focus information of the light field data may be averaged and adjusted; for a point cloud sensor the point cloud data may be projected and missing pixels filled in; and for a stereo vision sensor the depth information may be removed from the stereo vision data and the resolution adjusted. These resampling procedures are only examples; in practice any suitable resampling method may be used, and no limitation is imposed here.
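For the high-definition case, the pixel-level resampling described above can be illustrated with a short Python sketch. It assumes the high-definition patch is an integer factor sharper than the surrounding panorama and uses block averaging; both the function name and the choice of block averaging are illustrative assumptions, and the inverse coordinate transformation back into the panorama is not shown.

```python
def block_average(image, factor):
    """Downsample a 2-D list of numbers by averaging factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image size must be a multiple of factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect every pixel of the current block and take the mean.
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# A 4x4 high-resolution patch reduced to the 2x2 resolution of its surroundings.
high_res = [[1, 3, 10, 10],
            [5, 7, 10, 10],
            [0, 0, 2, 2],
            [0, 0, 2, 2]]
low_res = block_average(high_res, 2)
```

The other sensor types listed in the text (infrared, light field, point cloud, stereo vision) would need different processing and are outside the scope of this sketch.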

FIG. 4 shows schematic diagrams of obtaining a panoramic image from a panoramic fused image. FIG. 4(a) shows a panoramic fused image according to an embodiment of the present invention, in which the dashed box marks the position of the partial fused image. FIG. 4(b) shows the coordinate-transformed partial fused image obtained by applying a coordinate transformation to the partial fused image in FIG. 4(a) to remove distortion. Features such as image resolution and/or focus information can then be extracted from the area surrounding the image in FIG. 4(b). FIG. 4(c) shows a schematic diagram of resampling the coordinate-transformed partial fused image of FIG. 4(b) using the extracted image features. FIG. 4(d) shows a schematic diagram of applying an inverse coordinate transformation to the resampled partial fused image of FIG. 4(c) to project it into the panoramic fused image of FIG. 4(a), which finally yields a panoramic image with consistent image features (for example, image resolution and focus information).

In step S102, panoramic semantic information is acquired according to the panoramic image, wherein the panoramic semantic information corresponds to semantically divided regions in the panoramic image.

In this step, the panoramic image may be processed to obtain one or more pieces of panoramic semantic information and their corresponding semantically divided regions. In one example, techniques such as image recognition may be used to obtain the panoramic semantic information and the extent of each corresponding semantically divided region. Optionally, information related to the panoramic image, such as background information and/or scene description information, may further be obtained from the panoramic semantic information.

FIG. 5 shows a schematic diagram of a panoramic image acquired according to an embodiment of the present invention. In the example of FIG. 5, the panoramic semantic information obtained by image recognition may include items such as sky, ground, and people, each corresponding to a different region identified in the image. The background information and/or scene description information derived from it may include, for example: outdoors, a dense crowd, and face-to-face conversation.

In step S103, one or more focus regions may be determined in the one or more partial images according to the panoramic semantic information and its corresponding semantically divided regions, and detailed semantic information may be acquired according to the determined focus regions. Optionally, one or more pieces of focus semantic information related to the one or more partial images may be selected from the panoramic semantic information; then, according to the selected focus semantic information and its corresponding semantically divided regions, corresponding regions of interest are obtained in the one or more partial images by techniques such as neural networks and image information processing. These regions of interest serve as the one or more focus regions, and the corresponding detailed semantic information is acquired from them. This procedure is only an example. In practice, a determined focus region need not lie entirely within a partial image: it may overlap the partial image region only partly, or not at all. Correspondingly, the focus semantic information selected for the focus region, and the detailed semantic information subsequently acquired, may be only partly related to the partial image, or not related to it at all. In that case, this step may determine the one or more focus regions from the panoramic semantic information and its corresponding semantically divided regions alone, and acquire the detailed semantic information according to the determined focus regions.
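The selection of focus regions from semantic regions can be sketched in Python as follows. The source obtains regions of interest with neural networks or other image processing; this sketch substitutes a plain bounding-box intersection as a stand-in. The `SemanticRegion` type, the box convention, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticRegion:
    """One piece of panoramic semantic information and its region (hypothetical type)."""
    label: str
    box: tuple  # (top, left, bottom, right) in panorama pixels, ends exclusive


def intersect(a, b):
    """Overlap of two boxes, or None when they do not overlap."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    if top >= bottom or left >= right:
        return None
    return (top, left, bottom, right)


def candidate_focus_regions(regions, partial_box, labels_of_interest):
    """Keep the selected semantic regions that overlap the partial image."""
    candidates = []
    for region in regions:
        if region.label not in labels_of_interest:
            continue
        overlap = intersect(region.box, partial_box)
        if overlap is not None:
            candidates.append((region.label, overlap))
    return candidates


regions = [
    SemanticRegion("sky", (0, 0, 100, 400)),
    SemanticRegion("person", (120, 150, 260, 220)),
    SemanticRegion("ground", (200, 0, 300, 400)),
]
partial_box = (100, 100, 250, 300)  # where the partial image sits in the panorama
focus = candidate_focus_regions(regions, partial_box, {"person"})
```

The overlap box is the part of the selected semantic region that the higher-quality partial image actually covers. The case noted in the text, where a focus region falls partly or wholly outside the partial image, corresponds here to a truncated overlap or to `None`.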

FIG. 6 shows a schematic diagram of the positions of a partial image and a focus region in a panoramic image according to an embodiment of the present invention. In the example of FIG. 6, the dashed box in the panoramic image marks the region occupied by the partial fused image, that is, the partial image acquired by a local sensor after fusion. The infrared image in the lower part of FIG. 6 is an enlarged view of the focus region determined in the partial image converted from the partial fused image. From the enlarged infrared image of the focus region in FIG. 6, corresponding detailed semantic information can be obtained, for example: the person's mood is happy.

In step S104, image description information is obtained by using the panoramic semantic information and the detailed semantic information. In this step, the panoramic semantic information and the detailed semantic information may be fused and combined with weights to obtain the image description information. Optionally, the panoramic semantic information E, which describes the scene, may be fused with the detailed semantic information S, which describes details, using one of several model structures (for example weighted averaging, Bayesian estimation, a data fusion neural network, or reinforcement learning) to obtain the final image description information.

As described above, the panoramic image and the partial images in embodiments of the present invention may all be still images, or each may be a video frame of a video. When they are video frames, they may be one frame of a panoramic video and one or more frames of one or more partial videos, all captured at the same time $i$. In that case, obtaining image description information by using the panoramic semantic information and the detailed semantic information may include: fusing the panoramic semantic information and the detailed semantic information for each time separately, and processing the results in temporal order to obtain image description information that changes over time. That is, the image description information for a video may evolve gradually over time rather than remain fixed. The panoramic semantic information can then be written as $E_i$ and the detailed semantic information as $S_i$, where $i$ is time. Correspondingly, the time series of panoramic semantic information is $E_{i-2}, E_{i-1}, E_i, E_{i+1}, \ldots$, and the time series of detailed semantic information is $S_{i-2}, S_{i-1}, S_i, S_{i+1}, \ldots$. An example of image description information that changes over time: two people are talking outdoors; they are happy at time $i-1$ and start to argue at time $i$.

In one example, the image description information obtained by weighted averaging can be expressed as:

R = W_{si} S_i + W_{ei} E_i + W_{s(i-1)} S_{i-1} + W_{e(i-1)} E_{i-1} + …

where R is the weighted average of the image semantics (that is, the weighted image description information), W_{si} is the weight of the detailed semantic information, W_{ei} is the weight of the panoramic semantic information, and i is time.
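The weighted sum above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the semantic information S_k and E_k is assumed to be encoded as equal-length numeric vectors (the source does not specify an encoding), and the function name `weighted_fusion` is hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def weighted_fusion(
    detail: Sequence[Vector],      # S_i, S_{i-1}, ... (assumed numeric vectors)
    panoramic: Sequence[Vector],   # E_i, E_{i-1}, ...
    w_detail: Sequence[float],     # W_si, W_s(i-1), ...
    w_panoramic: Sequence[float],  # W_ei, W_e(i-1), ...
) -> List[float]:
    """Return R = sum over time steps k of (W_sk * S_k + W_ek * E_k)."""
    if not detail:
        raise ValueError("at least one time step is required")
    if not (len(detail) == len(panoramic) == len(w_detail) == len(w_panoramic)):
        raise ValueError("all sequences must cover the same time steps")
    dim = len(detail[0])
    result = [0.0] * dim
    for s, e, ws, we in zip(detail, panoramic, w_detail, w_panoramic):
        for j in range(dim):
            result[j] += ws * s[j] + we * e[j]
    return result
```

How the weights are chosen (for example, whether older time steps are down-weighted) is left open by the text, so the sketch takes them as inputs.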

In another example, the image description information based on Bayesian estimation can be expressed as:

P_r = min L(P(P_{si}, P_{ei}))

where P_{si} is the Bayesian estimate of the detailed semantic information, P_{ei} is the Bayesian estimate of the panoramic semantic information, i is time, P(P_{si}, P_{ei}) is the joint distribution function, and P_r is the minimum likelihood estimate of the joint distribution function P, that is, the fused value of the image description information.
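The source does not define the function L or the form of the joint distribution, so the following is only one possible reading, sketched under explicit assumptions: both estimates are discrete probability distributions over the same candidate descriptions, the two estimates are treated as independent so that the joint probability is their product, and L is taken to be the negative log of that product. The function name `bayes_fuse` is hypothetical.

```python
import math
from typing import Dict


def bayes_fuse(p_detail: Dict[str, float], p_panoramic: Dict[str, float]) -> str:
    """Pick the candidate label that minimizes L = -log(P_si * P_ei).

    Assumption: independence of the two estimates, so the joint
    probability of a label is the product of its two probabilities.
    """
    eps = 1e-12  # guards against log(0) for zero-probability labels
    shared = p_detail.keys() & p_panoramic.keys()
    if not shared:
        raise ValueError("no common candidate labels")

    def loss(label: str) -> float:
        return -math.log(p_detail[label] * p_panoramic[label] + eps)

    # sorted() makes tie-breaking deterministic
    return min(sorted(shared), key=loss)
```

Under these assumptions, minimizing L is the same as choosing the label with the highest joint probability.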

In another example, the image description information based on reinforcement learning can be expressed as:

tuple(S, A, R, P) = tuple((S_{si}, S_{ei}), A, (R_{si}, R_{ei}), P)

where tuple(x) is the four-element reinforcement learning system; S, A, and R are the inputs, and P is the output. S is the environment information or state, which can be divided into the detailed semantic information S_{si} and the panoramic semantic information S_{ei}; A is the behavior or action taken in a state; R is the reward of each action in each state, which can be divided into the reward R_{si} contributed by the detailed semantic information and the reward R_{ei} contributed by the panoramic semantic information; and P is the image description information in the current state or its corresponding behavior function.
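As a data-structure illustration only, the four-element system could be held in a small named tuple. The class name `FusionTuple`, its field names, and the helper `total_reward` are hypothetical; in particular, summing the two reward components is an assumption, since the source only states that the reward can be divided into the two parts.

```python
from typing import Any, NamedTuple, Tuple


class FusionTuple(NamedTuple):
    """Four-element system (S, A, R, P) from the text; field names are assumed."""
    state: Tuple[Any, Any]       # (S_si, S_ei): detailed and panoramic semantics
    action: Any                  # A: action taken in this state
    reward: Tuple[float, float]  # (R_si, R_ei): per-source reward components
    output: Any                  # P: image description or behavior function


def total_reward(t: FusionTuple) -> float:
    # Assumption: the overall reward is the plain sum of the two components.
    return t.reward[0] + t.reward[1]
```

A training loop, policy, or value function is outside what the text specifies, so none is sketched here.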

According to the above image processing method of the present invention, panoramic semantic information and detailed semantic information can be obtained, respectively, for a panoramic image and for one or more partial images within the range of the panoramic image, and image description information can be obtained from them. The image description information obtained by this method therefore takes into account both the panoramic semantic information, which describes the scene of the panoramic image, and the detailed semantic information, which describes details in the focus area of the partial image. This improves the accuracy of the image description, and the method can be effectively applied in fields such as autonomous driving and robot interaction.

For example, in the field of robot interaction, the prior art can generally obtain only panoramic semantic information from a panoramic image and can therefore only describe the scene; it cannot perform a targeted detailed semantic analysis of a focus area and react accordingly. By contrast, the above method according to the embodiment of the present invention obtains not only the panoramic semantic information describing the scene but also the detailed semantic information for the focus area, and the focus area can be changed as needed. The description can thus cover both the scene and different focal points within it, which helps a robot present in the scene to communicate and react more precisely and in a more targeted manner.

Hereinafter, an image processing apparatus according to an embodiment of the present invention is described with reference to FIG. 7. FIG. 7 shows a block diagram of an image processing apparatus 700 according to an embodiment of the present invention. The image processing apparatus of this embodiment may be applied to static images or to video frames of a video that changes over time; no limitation is imposed here. As shown in FIG. 7, the image processing apparatus 700 includes an acquisition unit 710, a semantic division unit 720, a focus area acquisition unit 730, and a description unit 740. The apparatus 700 may include other components in addition to these units; since those components are unrelated to the content of the embodiments of the present invention, their illustration and description are omitted here. In addition, because the specific details of the operations performed by the image processing apparatus 700 are the same as those described above with reference to FIGS. 1 to 6, descriptions of the same details are omitted here to avoid repetition.

The acquisition unit 710 of the image processing apparatus 700 in FIG. 7 is configured to acquire a panoramic image and one or more partial images within the range of the panoramic image.

The acquisition unit 710 may acquire the panoramic image and the one or more partial images using a multi-sensor system. The series of images acquired by the multi-sensor system may include a panoramic image acquired by a panoramic sensor of the multi-sensor imaging system, and one or more partial images, within the range of the panoramic image, acquired by one or more local sensors of the multi-sensor imaging system. The panoramic image may be acquired by the panoramic sensor capturing, for example, 360-degree scene image information with a wide-angle technique, and may further be mapped to a two-dimensional image through a transformation of the latitude-longitude coordinate system. Correspondingly, within the scene range captured by the panoramic image, one or more partial images may also be acquired by one or more local sensors. The local sensor may be, for example, one or more of a high-definition sensor, an infrared sensor, a light field sensor, a point cloud sensor, a stereo vision sensor, and a laser sensor. With these local sensors, one or more of the corresponding images can be acquired, for example: a high-definition partial image, an infrared partial image, a light field partial image, a point cloud partial image, a stereo vision partial image, and a laser partial image.
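To make the latitude-longitude mapping concrete, the sketch below converts a viewing direction to pixel coordinates in a two-dimensional panorama. It assumes an equirectangular layout (longitude spans the image width, latitude spans the height), which is a common choice but is not stated in the source; the function name `lonlat_to_pixel` is hypothetical.

```python
from typing import Tuple


def lonlat_to_pixel(
    lon_deg: float, lat_deg: float, width: int, height: int
) -> Tuple[float, float]:
    """Map longitude in [-180, 180] and latitude in [-90, 90] (degrees)
    to (u, v) coordinates in an equirectangular image.

    The origin (0, 0) is the top-left corner; u grows to the right with
    longitude and v grows downward as latitude decreases.
    """
    u = (lon_deg + 180.0) / 360.0 * width
    v = (90.0 - lat_deg) / 180.0 * height
    return u, v
```

For a 2000 by 1000 panorama, the direction straight ahead (longitude 0, latitude 0) lands at the image center.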

In specific operation, the acquisition unit 710 may acquire an independent panoramic image and one or more partial images separately. Alternatively, it may acquire, in an initial stage, a panoramic fused image such as the one shown in FIG. 2(c), and process the panoramic fused image to obtain a separated panoramic image and partial images for use in subsequent steps.

In one example, when the image acquired in the initial stage is a panoramic fused image, one or more partial fused images may first be extracted from the panoramic fused image based on their positions within it; the partial fused images are then processed to obtain the partial image and/or the panoramic image, respectively. In practice, the position of a partial fused image within the panoramic fused image may optionally be obtained from position information contained in the panoramic fused image. For example, the position information may be read from the metadata of the panoramic fused image or from a related description in its picture file. Once the position of the partial fused image (or partial image) within the panoramic fused image is known, the partial fused image can be separated from the panoramic fused image. The partial fused image obtained in this way is generally a partial image that has been distorted to fit the latitude-longitude coordinate system of the panoramic image. Therefore, optionally, to obtain an undistorted partial image, in one example a coordinate transformation may be applied to the one or more partial fused images to remove the distortion and obtain the partial image. In another example, a coordinate transformation may first be applied to the partial fused image to remove the distortion; one or more image-related features are then obtained from the transformed partial fused image (for example, by searching outward from its center for pixel-level features such as image resolution and/or focus information); finally, based on the obtained features, blurred regions are removed from the transformed partial fused image to obtain the desired partial image.

The semantic division unit 720 obtains panoramic semantic information from the panoramic image, where the panoramic semantic information corresponds to semantic division areas in the panoramic image.

The semantic division unit 720 may process the panoramic image to obtain one or more pieces of panoramic semantic information and their corresponding semantic division areas. In one example, techniques such as image recognition may be used to obtain the panoramic semantic information and the extent of the corresponding semantic division areas. Optionally, information related to the panoramic image, such as background information and/or scene description information, may be further obtained from the panoramic semantic information.

The focus area acquisition unit 730 may determine one or more focus areas in the one or more partial images according to the panoramic semantic information and its corresponding semantic division areas, and acquire detailed semantic information from the determined focus areas. Optionally, one or more pieces of focus semantic information related to the one or more partial images may be selected from the panoramic semantic information. According to the selected focus semantic information and its corresponding semantic division areas, corresponding regions of interest are obtained in the one or more partial images, using techniques such as neural networks or image information processing, to serve as the one or more focus areas, and the corresponding detailed semantic information is acquired from the determined focus areas. This specific manner of operation is only an example. In practice, the determined focus area need not lie entirely within the partial image: for example, the focus area may only partially overlap the partial image area, or may not overlap it at all. Correspondingly, the focus semantic information selected for the focus area and the detailed semantic information subsequently acquired may be only partially related to the partial image, or not related to it at all. In that case, the focus area acquisition unit 730 may determine one or more focus areas based only on the panoramic semantic information and its corresponding semantic division areas, and acquire detailed semantic information from the determined focus areas.
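The full, partial, and no-overlap cases described above can be illustrated with a simple geometric check. The sketch assumes that both the semantic division area and the footprint of a partial image are axis-aligned rectangles in panoramic pixel coordinates, which is a simplification not stated in the source; `Box` and `intersect` are hypothetical names. The region-of-interest detection itself (neural network or other image processing) is not modeled.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned rectangle; right and bottom are exclusive bounds."""
    left: float
    top: float
    right: float
    bottom: float


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    left, top = max(a.left, b.left), max(a.top, b.top)
    right, bottom = min(a.right, b.right), min(a.bottom, b.bottom)
    if left >= right or top >= bottom:
        return None  # focus area and partial image do not coincide at all
    return Box(left, top, right, bottom)
```

A `None` result corresponds to the case in which the focus area would have to be determined from the panoramic semantic information alone.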

FIG. 6 is a schematic diagram of a partial image and a focus area obtained for a panoramic image according to an embodiment of the present invention. In the example shown in FIG. 6, the dashed box in the panoramic image marks the region occupied by the partial fused image, that is, the partial image acquired by a local sensor after fusion. The infrared image in the lower part of FIG. 6 is an enlarged view of the focus area determined in the partial image converted from the partial fused image. From the enlarged infrared image of the focus area in FIG. 6, corresponding detailed semantic information can be obtained, for example: the person's emotion is happy.

The description unit 740 obtains image description information from the panoramic semantic information and the detailed semantic information. The description unit 740 may fuse the panoramic semantic information and the detailed semantic information and combine them with weights to obtain the image description information. Optionally, the panoramic semantic information E, which describes the scene, may be fused with the detailed semantic information S, which describes details, and the final image description information may be obtained based on different model structures (for example, weighted averaging, Bayesian estimation, a data-fusion neural network, or reinforcement learning).

As mentioned above, the panoramic image and the partial image according to the embodiment of the present invention may both be static images, or may each be a video frame. When both are video frames, they may be, respectively, one panoramic frame of a panoramic video and one or more partial frames of one or more partial videos, all captured at the same time i. In that case, obtaining the image description information from the panoramic semantic information and the detailed semantic information may include: fusing the panoramic semantic information and the detailed semantic information for each of several different times, and processing the results in time order, to obtain image description information that changes over time. That is, the image description information for a video may evolve gradually over time rather than remain fixed. Here, the panoramic semantic information in the image description information can be denoted E_i and the detailed semantic information S_i, where i is time. Correspondingly, the time series of panoramic semantic information can be E_{i-2}, E_{i-1}, E_i, E_{i+1}, …, and the time series of detailed semantic information can be S_{i-2}, S_{i-1}, S_i, S_{i+1}, …. In one example, image description information that changes over time may read: two people are talking outdoors; they are happy at time i-1 and start arguing at time i.

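For example, as in the method described above, the image description information based on Bayesian estimation can be expressed as:

P_r = min L(P(P_{si}, P_{ei}))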

According to the above image processing apparatus of the present invention, panoramic semantic information and detailed semantic information can be obtained, respectively, for a panoramic image and for one or more partial images within the range of the panoramic image, and image description information can be obtained from them. The resulting image description information therefore takes into account both the panoramic semantic information, which describes the scene of the panoramic image, and the detailed semantic information, which describes details in the focus area of the partial image. This improves the accuracy of the image description, and the apparatus can be effectively applied in fields such as autonomous driving and robot interaction.

Hereinafter, an image processing apparatus according to another embodiment of the present invention is described with reference to FIG. 8. FIG. 8 shows a block diagram of an image processing apparatus 800 according to an embodiment of the present invention. As shown in FIG. 8, the apparatus 800 may be a computer or a server.

As shown in FIG. 8, the image processing apparatus 800 includes one or more processors 810 and a memory 820. The image processing apparatus 800 may also include a multi-sensor imaging system, an output device (not shown), and the like, and these components may be interconnected by a bus system and/or another form of connection mechanism. It should be noted that the components and structure of the image processing apparatus 800 shown in FIG. 8 are exemplary rather than limiting; the image processing apparatus 800 may have other components and structures as required.

The processor 810 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may use computer program instructions stored in the memory 820 to perform desired functions, which may include: acquiring a panoramic image and one or more partial images within the range of the panoramic image; obtaining panoramic semantic information from the panoramic image, where the panoramic semantic information corresponds to semantic division areas in the panoramic image; determining one or more focus areas in the one or more partial images according to the panoramic semantic information and its corresponding semantic division areas, and acquiring detailed semantic information from the determined focus areas; and obtaining image description information from the panoramic semantic information and the detailed semantic information.
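The four functions listed above can be read as a simple pipeline. The sketch below shows only the control flow; every stage is passed in as a callable because the source does not fix how segmentation, focus selection, detail analysis, or fusion is implemented. The function name `describe_image` and all parameter names are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

Region = Tuple[str, Any]  # (panoramic semantic label, semantic division area)


def describe_image(
    panorama: Any,
    partials: Sequence[Any],
    segment: Callable[[Any], List[Region]],
    find_focus: Callable[[List[Region], Sequence[Any]], List[Any]],
    detail_semantics: Callable[[Any], Any],
    fuse: Callable[[List[str], List[Any]], Any],
) -> Any:
    """Run the four steps in order: the inputs are already acquired,
    then segment, find focus areas, and fuse into a description."""
    # Panoramic semantic information and its semantic division areas
    regions = segment(panorama)
    # Focus areas chosen using the panoramic semantics and the partial images
    focus_areas = find_focus(regions, partials)
    # Detailed semantic information for each focus area
    details = [detail_semantics(area) for area in focus_areas]
    # Fusion of panoramic and detailed semantics into the description
    return fuse([label for label, _ in regions], details)
```

Passing the stages as parameters keeps the sketch independent of any particular model or library.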

The memory 820 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 810 may execute the program instructions to implement the functions of the image processing apparatus of the embodiments of the present invention described above and/or other desired functions, and/or to perform the image processing method according to the embodiment of the present invention. Various application programs and various data may also be stored in the computer-readable storage medium.

Next, a computer-readable storage medium according to an embodiment of the present invention is described. Computer program instructions are stored on it, and when executed by a processor they implement the following steps: acquiring a panoramic image and one or more partial images within the range of the panoramic image; obtaining panoramic semantic information from the panoramic image, where the panoramic semantic information corresponds to semantic division areas in the panoramic image; determining one or more focus areas in the one or more partial images according to the panoramic semantic information and its corresponding semantic division areas, and acquiring detailed semantic information from the determined focus areas; and obtaining image description information from the panoramic semantic information and the detailed semantic information.

Of course, the specific embodiments above are examples rather than limitations. Those skilled in the art may, following the concept of the present invention, merge and combine steps and devices from the separately described embodiments to achieve the effects of the present invention. Embodiments formed in this way are also included in the present invention and are not described one by one here.

Note that the merits, advantages, effects, and the like mentioned in the present invention are only examples and not limitations; they should not be considered mandatory for every embodiment of the present invention. In addition, the specific details disclosed above are provided only for illustration and ease of understanding, not for limitation; they do not require that the present invention be implemented with those specific details.

The block diagrams of devices, apparatuses, equipment, and systems in the present invention are only illustrative examples and are not intended to require or imply that connections, arrangements, and configurations must follow the manner shown in the block diagrams. As those skilled in the art will recognize, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended terms that mean "including but not limited to" and are used interchangeably with that phrase. As used herein, the words "or" and "and" mean "and/or" and are used interchangeably with it, unless the context clearly indicates otherwise. As used herein, the phrase "such as" means "such as but not limited to" and is used interchangeably with it.

The step flowcharts in the present invention and the method descriptions above are only illustrative examples and are not intended to require or imply that the steps of the embodiments must be performed in the order given. As those skilled in the art will recognize, the steps in the above embodiments may be performed in any order. Words such as "thereafter", "then", and "next" are not intended to limit the order of the steps; they merely guide the reader through the description of the methods. In addition, any reference to an element in the singular, for example using the articles "a", "an", or "the", is not to be construed as limiting that element to the singular.

In addition, the steps and devices in the embodiments herein are not limited to being carried out in a particular embodiment. In fact, related steps and devices from the various embodiments herein may be combined according to the concept of the present invention to conceive new embodiments, and these new embodiments are also included within the scope of the present invention.

Each operation of the methods described above may be performed by any suitable means capable of performing the corresponding function. The means may include various hardware and/or software components and/or modules, including but not limited to circuits, application-specific integrated circuits (ASICs), or processors.

The illustrated logic blocks, modules, and circuits may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor; alternatively, the processor may be any commercially available processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors cooperating with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in any form of tangible storage medium. Some examples of usable storage media include random access memory (RAM), read-only memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, hard disks, removable disks, and CD-ROMs. A storage medium may be coupled to a processor so that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. A software module may be a single instruction or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media.

The methods disclosed herein include one or more actions for implementing the described method. The methods and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of actions is specified, the order and/or use of specific actions may be modified without departing from the scope of the claims.

The functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions on a tangible computer-readable medium. A storage medium may be any available tangible medium that can be accessed by a computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.

Accordingly, a computer program product may perform the operations presented herein. For example, such a computer program product may be a tangible computer-readable medium having instructions tangibly stored (and/or encoded) on it, the instructions being executable by one or more processors to perform the operations described herein. The computer program product may include packaging material.

Software or instructions may also be transmitted over a transmission medium. For example, software may be transmitted from a website, server, or other remote source using a transmission medium such as coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, or microwave.

Furthermore, modules and/or other suitable means for carrying out the methods and techniques described herein may be downloaded and/or otherwise obtained by a user terminal and/or base station as appropriate. For example, such a device may be coupled to a server to facilitate the transfer of means for carrying out the methods described herein. Alternatively, the various methods described herein may be provided via storage means (e.g., RAM, ROM, or a physical storage medium such as a CD or floppy disk), so that the user terminal and/or base station can obtain the various methods upon coupling to the device or providing the storage means to the device. Furthermore, any other suitable technique for providing the methods and techniques described herein to a device may be utilized.

Other examples and implementations are within the scope and spirit of the invention and the appended claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hardwiring, or any combination of these. Features implementing functions may also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations. Also, as used herein, including in the claims, "or" as used in a list of items prefaced by "at least one of" indicates a disjunctive list, such that, for example, a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.
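As an illustration only, the disjunctive reading of "at least one of A, B, or C" amounts to listing every non-empty combination of the items. The short Python sketch below enumerates them; the function name `at_least_one_of` is a hypothetical helper introduced here and is not part of the source.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of items, smallest combinations first."""
    return [
        "".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


readings = at_least_one_of(["A", "B", "C"])
print(readings)
```

For three items this yields the seven readings named in the paragraph above: the three single items, the three pairs, and the full set.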

Various changes, substitutions, and alterations to the techniques described herein may be made without departing from the teachings as defined by the appended claims. Furthermore, the scope of the claims of the present invention is not limited to the specific aspects of the processes, machines, manufacture, compositions of events, means, methods, and acts described above. Processes, machines, manufacture, compositions of events, means, methods, or acts, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims include within their scope such processes, machines, manufacture, compositions of events, means, methods, or acts.

The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the invention to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims

1. An image processing method comprising:

acquiring a panoramic image and one or more local images within the panoramic image;

acquiring panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image;

determining one or more focus areas in the one or more local images according to the panoramic semantic information and its corresponding semantic division areas, and acquiring detail semantic information according to the determined focus areas;

and obtaining image description information by using the panoramic semantic information and the detail semantic information.
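The four steps of claim 1 can be read as a pipeline. The following Python sketch shows one possible arrangement, assuming each step is supplied as a callable; the names `describe_images`, `segment`, `find_focus`, `analyze`, and `fuse`, the box layout, and the toy stand-ins are all hypothetical and are not taken from the source, which does not prescribe any particular algorithm for these steps.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # assumed layout: (x0, y0, x1, y1)
Semantic = Tuple[str, Box]        # (label, semantic division area)


def describe_images(
    panorama: Any,
    local_images: Sequence[Any],
    segment: Callable[[Any], List[Semantic]],
    find_focus: Callable[[List[Semantic], Any], List[Box]],
    analyze: Callable[[Any, Box], str],
    fuse: Callable[[List[Semantic], List[str]], str],
) -> str:
    # Panoramic semantic information together with its semantic division areas.
    panoramic = segment(panorama)
    # Focus areas in each local image, then detail semantic information per area.
    details: List[str] = []
    for local in local_images:
        for area in find_focus(panoramic, local):
            details.append(analyze(local, area))
    # Image description information from both kinds of semantic information.
    return fuse(panoramic, details)


# Toy stand-ins so the sketch runs; none of these come from the source.
def toy_segment(panorama):
    return [("person", (10, 10, 50, 90)), ("road", (0, 60, 200, 100))]


def toy_find_focus(panoramic, local):
    return [box for label, box in panoramic if label == "person"]


def toy_analyze(local, area):
    return f"face visible in {local}"


def toy_fuse(panoramic, details):
    labels = ", ".join(label for label, _ in panoramic)
    return f"scene with {labels}; details: {'; '.join(details)}"


description = describe_images(
    "pano", ["cam1"], toy_segment, toy_find_focus, toy_analyze, toy_fuse
)
print(description)
```

Passing the steps in as callables keeps the sketch neutral about how segmentation, focus selection, and fusion are actually performed.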

2. The method of claim 1, wherein acquiring the panoramic image and the one or more local images within the panoramic image comprises:

acquiring one or more local fusion images from a panoramic fusion image based on the positions of the one or more local fusion images in the panoramic fusion image, wherein the panoramic fusion image is obtained by fusing the panoramic image and the one or more local images, and the local images, once fused, correspond one-to-one to the local fusion images in the panoramic fusion image;

and processing the one or more local fusion images to obtain the panoramic image and/or the local images.

3. The method of claim 2, wherein processing the one or more local fusion images to obtain the panoramic image and/or the local images comprises:

performing coordinate transformation on the one or more local fusion images to obtain the local images.

4. The method of claim 2, wherein processing the one or more local fusion images to obtain the panoramic image and/or the local images comprises:

performing coordinate transformation on the one or more local fusion images;

resampling the local fusion image after coordinate transformation;

and performing inverse coordinate transformation on the resampled local fusion image to obtain the panoramic image.

5. The method of claim 1, wherein acquiring the panoramic semantic information according to the panoramic image further comprises:

acquiring background information and/or scene description information of the panoramic image according to the panoramic semantic information of the panoramic image and its corresponding semantic division areas.

6. The method of claim 1, wherein determining the one or more focus areas in the one or more local images according to the panoramic semantic information and its corresponding semantic division areas comprises:

selecting, from the panoramic semantic information, one or more pieces of focus semantic information related to the local images, and determining the one or more focus areas in the one or more local images according to the selected focus semantic information and its corresponding semantic division areas.
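One way to picture the selection in claim 6 is as a geometric filter. The sketch below is a minimal illustration under stated assumptions that are not in the source: semantic division areas are axis-aligned boxes in panoramic pixel coordinates, each local image covers a known rectangular footprint of the panorama, and the local image is magnified by an integer `zoom` factor relative to the panorama. The names `intersect`, `focus_areas`, `footprint`, `zoom`, and `wanted` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, assumed layout


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def focus_areas(
    semantics: List[Tuple[str, Box]],
    footprint: Box,
    zoom: int,
    wanted: Set[str],
) -> List[Tuple[str, Box]]:
    """Select wanted labels that overlap the local image and map them to local pixels."""
    areas = []
    for label, box in semantics:
        if label not in wanted:
            continue  # not selected as focus semantic information
        overlap = intersect(box, footprint)
        if overlap is None:
            continue  # region lies outside this local image
        fx, fy = footprint[0], footprint[1]
        local_box = (
            (overlap[0] - fx) * zoom,
            (overlap[1] - fy) * zoom,
            (overlap[2] - fx) * zoom,
            (overlap[3] - fy) * zoom,
        )
        areas.append((label, local_box))
    return areas


semantics = [
    ("person", (10, 10, 50, 90)),
    ("car", (300, 40, 380, 90)),
    ("sky", (0, 0, 400, 30)),
]
areas = focus_areas(semantics, footprint=(0, 0, 100, 100), zoom=4, wanted={"person", "car"})
print(areas)
```

In this toy input the car lies outside the local image's footprint and the sky is not a selected label, so only the person region becomes a focus area.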

7. The method of claim 1, wherein obtaining the image description information by using the panoramic semantic information and the detail semantic information comprises:

fusing the panoramic semantic information and the detail semantic information to obtain the image description information.

8. The method of claim 1, wherein

the panoramic image and the local image are each a video frame captured at the same moment in a video.

9. The method of claim 8, wherein, when the panoramic image and the local image are each a video frame captured at the same moment in a video, obtaining the image description information by using the panoramic semantic information and the detail semantic information comprises:

fusing the panoramic semantic information and the detail semantic information for each of a plurality of different moments, and processing the results in time order to obtain image description information that changes over time.
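The per-moment fusion of claim 9 can be sketched as follows. This is only an illustration, assuming that semantic information is represented as lists of text labels keyed by timestamp and that fusion is a simple de-duplicating concatenation; the source specifies neither, and the name `describe_over_time` is hypothetical.

```python
from typing import Dict, List, Tuple


def describe_over_time(
    panoramic_by_t: Dict[float, List[str]],
    detail_by_t: Dict[float, List[str]],
) -> List[Tuple[float, str]]:
    """Fuse semantics per timestamp, then order the fused descriptions by time."""
    timeline = []
    # Only timestamps present in both inputs have a matching frame pair.
    for t in sorted(panoramic_by_t.keys() & detail_by_t.keys()):
        panoramic = panoramic_by_t[t]
        # Assumed fusion rule: panoramic labels first, then any new detail labels.
        fused = list(panoramic) + [d for d in detail_by_t[t] if d not in panoramic]
        timeline.append((t, ", ".join(fused)))
    return timeline


panoramic_by_t = {1.0: ["street", "person"], 0.0: ["street"]}
detail_by_t = {0.0: [], 1.0: ["person wearing red coat"]}
timeline = describe_over_time(panoramic_by_t, detail_by_t)
print(timeline)
```

Sorting by timestamp after fusing each moment independently gives a description sequence that changes as the scene changes.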

10. The method of any one of claims 1-9, wherein

the panoramic image is acquired by a panoramic sensor in a multi-sensor imaging system;

and the one or more local images are acquired by one or more local sensors in the multi-sensor imaging system.

11. An image processing apparatus comprising:

an acquisition unit that acquires a panoramic image and one or more local images within the panoramic image;

a semantic division unit that acquires panoramic semantic information according to the panoramic image, wherein the panoramic semantic information corresponds to semantic division areas in the panoramic image;

a focus area acquisition unit that determines one or more focus areas in the one or more local images according to the panoramic semantic information and its corresponding semantic division areas, and acquires detail semantic information according to the determined focus areas;

and a description unit that obtains image description information by using the panoramic semantic information and the detail semantic information.

12. An image processing apparatus comprising:

a processor;

and a memory having computer program instructions stored therein,

wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of:

13. A computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the steps of: