CN111476275A

CN111476275A - Image recognition-based target detection method, server and storage medium

Info

Publication number: CN111476275A
Application number: CN202010185440.4A
Authority: CN
Inventors: 付美蓉
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Smart Technology Co Ltd; OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2020-07-31

Abstract

The invention discloses a target detection method based on image recognition, which is applied to a server. The method includes receiving an image to be detected uploaded by a client, extracting a corresponding image feature map, and inputting the image feature map into a target extraction model to output first image data. , judging whether the first image data contains at least two first preset target types and a first preset number of first target frames, and if the conditions are met, judge the positional relationship between the first target frames, from Find the corresponding judgment result in the database, and if the judgment result is the first judgment result, judge whether the first target frame contains the second target frame, and if so, identify and judge whether the first preset data corresponding to the second target frame is Corresponding to the second preset data in the database, and feedback information is generated according to the analysis result and fed back to the client. The present invention can replace the manual judgment of whether the rescue situation of the photo is true, and improve the judgment efficiency.

Description

Image recognition-based target detection method, server and storage medium

技术领域technical field

本发明涉及数据处理技术领域，尤其涉及一种基于图片识别的目标检测方法、服务器及存储介质。The invention relates to the technical field of data processing, and in particular, to a target detection method, server and storage medium based on image recognition.

背景技术Background technique

保险公司对于购买了车险的用户，会赠送对应的救援服务，并委托救援商为用户提供救援服务。当救援商为用户提供救援服务后，为了向保险公司证明救援的真实性，而非假冒行为，因此救援商在完成救援后需要对现场进行拍照取证，所拍摄的照片中需要显示故障车放置在拖车上并上传给服务商，通过服务商人工判断是否真实救援。Insurance companies will provide corresponding rescue services to users who have purchased auto insurance, and entrust rescue providers to provide rescue services for users. After the rescue provider provides rescue services for users, in order to prove the authenticity of the rescue to the insurance company, rather than counterfeiting, the rescue provider needs to take pictures of the scene to collect evidence after completing the rescue. On the trailer and upload it to the service provider, the service provider manually judges whether it is a real rescue.

但是通过肉眼来判断照片中的救援是否属实的方式不仅存在效率低下，同时人工判断存在误差较大的情况，导致判断结果不准确。因此如何提高识别照片真伪的效率成为了亟需解决的技术问题。However, the method of judging whether the rescue in the photo is true by the naked eye is not only inefficient, but also has large errors in manual judgment, resulting in inaccurate judgment results. Therefore, how to improve the efficiency of recognizing the authenticity of photos has become a technical problem that needs to be solved urgently.

发明内容SUMMARY OF THE INVENTION

本发明的主要目的在于提供一种基于图片识别的目标检测方法、服务器及存储介质，旨在如何提高识别照片真伪的效率的问题。The main purpose of the present invention is to provide a target detection method, server and storage medium based on picture recognition, aiming at the problem of how to improve the efficiency of identifying the authenticity of pictures.

为实现上述目的，本发明提供的一种基于图片识别的目标检测方法，应用于服务器，该方法包括：In order to achieve the above purpose, a target detection method based on image recognition provided by the present invention is applied to a server, and the method includes:

接收步骤：接收客户端上传的待检测图像，将所述待检测图像输入预先训练的图像数据特征提取模型，得到与所述待检测图像对应的图像特征图；Receiving step: receiving the image to be detected uploaded by the client, inputting the image to be detected into a pre-trained image data feature extraction model to obtain an image feature map corresponding to the image to be detected;

第一判断步骤：将所述得到的图像特征图输入预先训练的目标提取模型，输出第一图像数据，判断所述第一图像数据中是否包含至少两种第一预设目标类型的第一目标框，且每种所述预设类型的第一目标框的数量为第一预设数量；The first judgment step: inputting the obtained image feature map into a pre-trained target extraction model, outputting first image data, and judging whether the first image data includes at least two first targets of the first preset target type frame, and the number of first target frames of each of the preset types is a first preset number;

第二判断步骤：若所述第一图像数据中包含至少两种第一预设目标类型的第一目标框，且每种所述预设类型的第一目标框的数量为第一预设数量，则分别获取各所述第一目标框的位置坐标，并基于预设计算规则判断各所述第一目标框之间的位置关系；The second judgment step: if the first image data includes at least two first target frames of the first preset target type, and the number of the first target frames of each of the preset types is the first preset quantity , the position coordinates of each of the first target frames are obtained respectively, and the positional relationship between each of the first target frames is determined based on a preset calculation rule;

第三判断步骤：根据所述位置关系，从数据库中预先建立的位置关系与判断结果之间的映射关系表中找出对应的判断结果，其中，所述判断结果包括第一判断结果及第二判断结果，所述第一判断结果表示第一图像数据对应信息的真实性为待定，所述第二判断结果表示第一图像数据对应信息的真实性为不属实；及The third judgment step: according to the positional relationship, find out the corresponding judgment result from the mapping relationship table between the positional relationship and judgment result pre-established in the database, wherein the judgment result includes the first judgment result and the second judgment result. Judgment results, the first judgment result indicates that the authenticity of the information corresponding to the first image data is to be determined, and the second judgment result indicates that the authenticity of the information corresponding to the first image data is not true; and

反馈步骤：若所述判断结果为第一判断结果，则分别判断所述第一目标框中是否包含第二目标框，若包含所述第二目标框，则分别识别所述第二目标框对应的第一预设数据，分析第一预设数据与预先存储在数据库中的第二预设数据是否对应，并根据分析结果生成反馈信息反馈至所述客户端，其中，所述分析结果包括属实及不属实。Feedback step: if the judgment result is the first judgment result, judge whether the first target frame includes a second target frame, and if the second target frame is included, identify the corresponding second target frame the first preset data, analyze whether the first preset data corresponds to the second preset data pre-stored in the database, and generate feedback information based on the analysis results to feed back to the client, wherein the analysis results include true and untrue.

优选地，所述目标提取模型为SSD模型，所述判断所述第一图像数据中是否包含至少两种第一预设目标类型的第一目标框包括：Preferably, the target extraction model is an SSD model, and the judging whether the first image data contains at least two first target frames of the first preset target types includes:

基于所述SSD模型分别为所述图像特征图中的每一个像素点生成对应的默认框，并获取各所述默认框在图像特征图中的位置坐标及对应于不同第一预设目标类型的概率评分，并将各所述默认框的概率评分中的最大值设置为初级置信度；Based on the SSD model, a corresponding default frame is generated for each pixel in the image feature map, and the position coordinates of each of the default frames in the image feature map and corresponding to different first preset target types are obtained. probability score, and the maximum value in the probability score of each of the default boxes is set as the primary confidence;

对所述初级置信度对应的默认框按照概率评分从大到小进行排序，以所述概率评分最大值对应的默认框为起始点，依次获取预设数量的所述默认框作为目标候选框，基于各所述目标候选框的位置坐标进行包围盒回归分析，得到对应于各目标候选框的区域大小；Sorting the default boxes corresponding to the primary confidence level according to the probability score from large to small, taking the default box corresponding to the maximum probability score as a starting point, and sequentially obtaining a preset number of the default boxes as target candidate boxes, Perform bounding box regression analysis based on the position coordinates of each of the target candidate frames to obtain the area size corresponding to each target candidate frame;

对各所述目标候选框的概率评分进行softxmax分类，得到各所述目标候选框对应于不同预设目标类型分类的目标置信度；及Perform softxmax classification on the probability score of each of the target candidate frames to obtain target confidence levels corresponding to different preset target types for each of the target candidate frames; and

基于非极大值抑制算法，获取第三预设数量的iou(M，b)高于预设阈值的目标候选框作为第一目标框，其中，M表示概率评分最大值对应的默认框，b表示图像特征图中除默认框M之外的其他默认框，iou(M，b)表示默认框M与默认框b之间的重叠度。Based on the non-maximum value suppression algorithm, a third preset number of target candidate frames whose iou(M, b) is higher than the preset threshold are obtained as the first target frame, where M represents the default frame corresponding to the maximum probability score, b represents other default boxes except the default box M in the image feature map, and iou(M, b) represents the degree of overlap between the default box M and the default box b.

优选地，所述目标提取模型的训练过程包括：Preferably, the training process of the target extraction model includes:

获取图像特征图样本，基于所述目标提取模型对所述图像特征图样本中的每个像素点分别生成对应的默认框样本，并获取各默认框样本在该图像特征图样本中的坐标位置，以及对应于不同第一预设目标类型的概率评分；Obtaining an image feature map sample, generating a corresponding default frame sample for each pixel in the image feature map sample based on the target extraction model, and obtaining the coordinate position of each default frame sample in the image feature map sample, and probability scores corresponding to different first preset target types;

基于每个默认框样本的所述位置坐标和概率评分，分别计算各默认框样本的softmax分类损失和包围盒回归损失之和；及Calculate the sum of the softmax classification loss and the bounding box regression loss for each default box sample, respectively, based on the location coordinates and probability score for each default box sample; and

对所述softmax分类损失和包围盒回归损失之和按照从大到小进行排序，以所述softmax分类损失和包围盒回归损失之和最小者对应的默认框样本为起始点，依次获取预设数量的所述默认框样本，计算所述预设数量的默认框样本的损失函数，并将计算出的所述预设数量的默认框样本的损失函数，在所述目标提取模型中反向传播，以对所述目标提取模型的各层网络的权重值进行更新，训练得到该目标提取模型。Sort the sum of the softmax classification loss and the bounding box regression loss in descending order, take the default box sample corresponding to the smallest sum of the softmax classification loss and the bounding box regression loss as the starting point, and obtain the preset number in turn the default frame samples of The target extraction model is obtained by training by updating the weight values of each layer network of the target extraction model.

优选地，所述损失函数通过以下公式计算：Preferably, the loss function is calculated by the following formula:

其中，L_conf(x，c)为softmax分类损失，L_loc(x，l，g)为包围盒回归损失，K＝|f_k|*|f_k|*α，|f_k|为最大图像特征图的尺寸，α为权重值，x为默认框，c为默认框的类别信息，l为默认框的位置信息，g为默认框的标定区域结果。Among them, L _conf (x, c) is the softmax classification loss, L _loc (x, l, g) is the bounding box regression loss, K=|f _k |*|f _k |*α, |f _k | is the largest image The size of the feature map, α is the weight value, x is the default box, c is the category information of the default box, l is the location information of the default box, and g is the calibration area result of the default box.

优选地，所述反馈步骤还包括：Preferably, the feedback step further includes:

若所述分析结果为不属实，则将所述待检测图像进行直方图均衡化处理得到第二图像数据，将所述第二图像数据调整至预设角度后重新输入所述接收步骤中的图像特征提取模型。If the analysis result is not true, perform histogram equalization on the image to be detected to obtain second image data, adjust the second image data to a preset angle and then re-input the image in the receiving step Feature extraction model.

为实现上述目的，本发明还进一步提供一种服务器，所述服务器包括存储器和处理器，所述存储器上存储有基于图片识别的目标检测程序，所述基于图片识别的目标检测程序被所述处理器执行时实现如下步骤：In order to achieve the above object, the present invention further provides a server, the server includes a memory and a processor, and the memory stores a target detection program based on picture recognition, and the target detection program based on picture recognition is processed by the The following steps are implemented when the device is executed:

基于所述SSD模型分别为所述图像特征图中每一个像素点生成对应的默认框，并获取各所述默认框在图像特征图中的位置坐标及对应于不同第一预设目标类型的概率评分，并将各所述默认框的概率评分中的最大值设置为初级置信度；Based on the SSD model, a corresponding default frame is generated for each pixel in the image feature map, and the position coordinates of each default frame in the image feature map and the probability corresponding to different first preset target types are obtained. score, and set the maximum value in the probability score of each of the default boxes as the primary confidence;

对所述初级置信度对应的默认框按照概率评分从大到小进行排序，以所述概率评分最大值对应的默认框为起始点，依次获取第二预设数量的所述默认框作为目标候选框，基于各所述目标候选框的位置坐标进行包围盒回归分析，得到对应于各目标候选框的区域大小；Sort the default boxes corresponding to the primary confidence levels in descending order of probability score, take the default box corresponding to the maximum probability score as the starting point, and sequentially obtain a second preset number of the default boxes as target candidates frame, and perform bounding box regression analysis based on the position coordinates of each target candidate frame to obtain the area size corresponding to each target candidate frame;

为实现上述目的，本发明进一步提供一种计算机可读存储介质，所述计算机可读存储介质上存储有基于图片识别的目标检测程序，所述基于图片识别的目标检测程序可被一个或者多个处理器执行，以实现如上所述的基于图片识别的目标检测方法的步骤。In order to achieve the above object, the present invention further provides a computer-readable storage medium on which a target detection program based on picture recognition is stored, and the target detection program based on picture recognition can be stored by one or more The processor executes the steps of the above-mentioned image recognition-based target detection method.

本发明提出的基于图片识别的目标检测方法、服务器及存储介质，通过接收客户端上传的待检测图像，将待检测图像输入图像数据特征提取模型得到图像特征图，将图像特征图输入目标提取模型输出第一图像数据，判断第一图像数据中是否包含至少两种第一预设目标类型的第一目标框，且每种第一目标框的数量为第一预设数量，若满足条件则分别获取各第一目标框的位置坐标，并判断各第一目标框之间的位置关系，根据所述位置关系，从数据库中的映射关系表中找出对应的判断结果，若判断结果为第一判断结果，则分别判断第一目标框中是否包含第二目标框，若包含则分别识别第二目标框对应的第一预设数据，判断第一预设数据是否与数据库中的第二预设数据对应，并根据分析结果生成反馈信息反馈至客户端。本发明能够代替人工通过判断照片判断救援情况是否属实，提高判断效率，同时减少人工判断导致的误差。The target detection method, server and storage medium based on image recognition proposed by the present invention receive the image to be detected uploaded by the client, input the to-be-detected image into an image data feature extraction model to obtain an image feature map, and input the image feature map into the target extraction model Outputting the first image data, judging whether the first image data contains at least two first target frames of the first preset target type, and the number of each first target frame is the first preset quantity, and if the conditions are met, respectively Obtain the position coordinates of each first target frame, and judge the positional relationship between each first target frame, and find out the corresponding judgment result from the mapping relationship table in the database according to the positional relationship, if the judgment result is the first The judgment result is to judge whether the first target frame contains the second target frame, and if so, identify the first preset data corresponding to the second target frame, and judge whether the first preset data is the same as the second preset data in the database. The corresponding data is generated, and feedback information is generated and fed back to the client according to the analysis results. The present invention can replace manual judgment to determine whether the rescue situation is true by judging photos, improve judgment efficiency, and reduce errors caused by manual judgment at the same time.

附图说明Description of drawings

图1为本发明服务器较佳实施例的应用环境图；Fig. 1 is the application environment diagram of the preferred embodiment of the server of the present invention;

图2为图1中基于图片识别的目标检测程序较佳实施例的程序模块示意图；Fig. 2 is the program module schematic diagram of the preferred embodiment of the target detection program based on picture recognition in Fig. 1;

图3为本发明基于图片识别的目标检测方法较佳实施例的流程示意图。FIG. 3 is a schematic flowchart of a preferred embodiment of a target detection method based on image recognition according to the present invention.

本发明目的的实现、功能特点及优点将结合实施例，参附图做进一步说明。The realization, functional characteristics and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

具体实施方式Detailed ways

为了使本发明的目的、技术本实施例及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅用以解释本发明，并不用于限定本发明。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。In order to make the purpose, technology, and advantages of this embodiment of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

需要说明的是，在本发明中涉及“第一”、“第二”等的描述仅用于描述目的，而不能理解为指示或暗示其相对重要性或者隐含指明所指示的技术特征的数量。由此，限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。另外，各个实施例之间的技术本实施例可以相互结合，但是必须是以本领域普通技术人员能够实现为基础，当技术本实施例的结合出现相互矛盾或无法实现时应当认为这种技术本实施例的结合不存在，也不在本发明要求的保护范围之内。It should be noted that the descriptions involving "first", "second", etc. in the present invention are only for the purpose of description, and should not be construed as indicating or implying their relative importance or implying the number of indicated technical features . Thus, a feature delimited with "first", "second" may expressly or implicitly include at least one of that feature. In addition, the technology between the various embodiments can be combined with each other, but it must be based on the realization by those of ordinary skill in the art. When the combination of the technology in this embodiment is contradictory or cannot be realized, it should be considered that this technology is based on Combinations of embodiments do not exist and are not within the scope of protection claimed by the present invention.

本发明提供一种服务器1。The present invention provides a server 1 .

所述服务器1包括，但不仅限于，存储器11、处理器12及网络接口13。The server 1 includes, but is not limited to, a memory 11 , a processor 12 and a network interface 13 .

其中，存储器11至少包括一种类型的可读存储介质，所述可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如，SD或DX存储器等)、磁性存储器、磁盘、光盘等。存储器11在一些实施例中可以是服务器1的内部存储单元，例如该服务器1的硬盘。存储器11在另一些实施例中也可以是服务器1的外部存储设备，例如该服务器1上配备的插接式硬盘，智能存储卡(Smart Media Card，SMC)，安全数字(Secure Digital，SD)卡，闪存卡(Flash Card)等。The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (eg, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, and the like. The memory 11 may be an internal storage unit of the server 1 in some embodiments, such as a hard disk of the server 1 . In other embodiments, the memory 11 may also be an external storage device of the server 1, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card equipped on the server 1. , Flash card (Flash Card) and so on.

进一步地，存储器11还可以既包括服务器1的内部存储单元也包括外部存储设备。存储器11不仅可以用于存储安装于服务器1的应用软件及各类数据，例如基于图片识别的目标检测程序10的代码等，还可以用于暂时地存储已经输出或者将要输出的数据。Further, the memory 11 may also include both an internal storage unit of the server 1 and an external storage device. The memory 11 can be used not only to store application software installed on the server 1 and various data, such as the code of the target detection program 10 based on image recognition, etc., but also to temporarily store data that has been output or will be output.

处理器12在一些实施例中可以是一中央处理器(Central Processing Unit，CPU)、控制器、微控制器、微处理器或其他数据处理芯片，用于运行存储器11中存储的程序代码或处理数据，例如执行基于图片识别的目标检测程序10等。In some embodiments, the processor 12 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor or other data processing chip for executing program codes or processing stored in the memory 11 Data, for example, the target detection program 10 based on image recognition is executed.

网络接口13可选的可以包括标准的有线接口、无线接口(如WI-FI接口)，通常用于在该服务器与其他电子设备之间建立通信连接。Optionally, the network interface 13 may include a standard wired interface and a wireless interface (such as a WI-FI interface), which is usually used to establish a communication connection between the server and other electronic devices.

客户端14可以是桌上型计算机、笔记本、平板电脑、手机等。The client 14 may be a desktop computer, a notebook computer, a tablet computer, a mobile phone, or the like.

网络15可以为互联网、云网络、无线保真(Wi-Fi)网络、个人网(PAN)、局域网(LAN)和/或城域网(MAN)。网络环境中的各种设备可以被配置为根据各种有线和无线通信协议连接到通信网络。这样的有线和无线通信协议的例子可以包括但不限于以下中的至少一个：传输控制协议和互联网协议(TCP/IP)、用户数据报协议(UDP)、超文本传输协议(HTTP)、文件传输协议(FTP)、ZigBee、EDGE、IEEE 802.11、光保真(Li-Fi)、802.16、IEEE 802.11s、IEEE 802.11g、多跳通信、无线接入点(AP)、设备对设备通信、蜂窝通信协议和/或蓝牙(BlueTooth)通信协议或其组合。The network 15 may be the Internet, a cloud network, a wireless fidelity (Wi-Fi) network, a personal network (PAN), a local area network (LAN), and/or a metropolitan area network (MAN). Various devices in a network environment may be configured to connect to communication networks according to various wired and wireless communication protocols. Examples of such wired and wireless communication protocols may include, but are not limited to, at least one of the following: Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), ZigBee, EDGE, IEEE 802.11, Optical Fidelity (Li-Fi), 802.16, IEEE 802.11s, IEEE 802.11g, multi-hop communication, wireless access point (AP), device-to-device communication, cellular communication protocol and/or a Bluetooth (BlueTooth) communication protocol or a combination thereof.

可选地，该服务器1还可以包括用户接口，用户接口可以包括显示器(Display)、输入单元比如键盘(Keyboard)，可选的用户接口还可以包括标准的有线接口、无线接口。可选地，在一些实施例中，显示器可以是LED显示器、液晶显示器、触控式液晶显示器以及OLED(Organic Light-Emitting Diode，有机发光二极管)触摸器等。其中，显示器也可以称为显示屏或显示单元，用于显示在服务器1中处理的信息以及用于显示可视化的用户界面。Optionally, the server 1 may further include a user interface, and the user interface may include a display (Display), an input unit such as a keyboard (Keyboard), and an optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like. The display may also be referred to as a display screen or a display unit for displaying information processed in the server 1 and for displaying a visual user interface.

图1仅示出了具有组件11-15以及基于图片识别的目标检测程序10的服务器1，本领域技术人员可以理解的是，图1示出的结构并不构成对服务器1的限定，可以包括比图示更少或者更多的部件，或者组合某些部件，或者不同的部件布置。FIG. 1 only shows the server 1 having components 11-15 and the target detection program 10 based on image recognition. Those skilled in the art can understand that the structure shown in FIG. 1 does not constitute a limitation on the server 1, and may include Fewer or more components than shown, or some components are combined, or a different arrangement of components.

在本实施例中，图1的基于图片识别的目标检测程序10被处理器12执行时，实现以下步骤：In this embodiment, when the image recognition-based target detection program 10 of FIG. 1 is executed by the processor 12, the following steps are implemented:

接收步骤：接收客户端14上传的待检测图像，将所述待检测图像输入预先训练的图像数据特征提取模型，得到与所述待检测图像对应的图像特征图；Receiving step: receiving the image to be detected uploaded by the client 14, inputting the image to be detected into a pre-trained image data feature extraction model to obtain an image feature map corresponding to the image to be detected;

反馈步骤：若所述判断结果为第一判断结果，则分别判断所述第一目标框中是否包含第二目标框，若包含所述第二目标框，则分别识别所述第二目标框对应的第一预设数据，分析第一预设数据与预先存储在数据库中的第二预设数据是否对应，并根据分析结果生成反馈信息反馈至所述客户端14，其中，所述分析结果包括属实及不属实。Feedback step: if the judgment result is the first judgment result, judge whether the first target frame includes a second target frame, and if the second target frame is included, identify the corresponding second target frame the first preset data, analyze whether the first preset data corresponds to the second preset data pre-stored in the database, and generate feedback information according to the analysis results to feed back to the client 14, wherein the analysis results include true and false.

在另一实施例中，所述反馈步骤还包括：In another embodiment, the feedback step further includes:

关于上述步骤的详细介绍，请参照下述图2关于基于图片识别的目标检测程序10实施例的程序模块示意图及图3关于基于图片识别的目标检测方法实施例的方法流程示意图的说明。For a detailed description of the above steps, please refer to the following descriptions of FIG.

参照图2所示，为图1中基于图片识别的目标检测程序10实施例的程序模块示意图。基于图片识别的目标检测程序10被分割为多个模块，该多个模块存储于存储器11中，并由处理器12执行，以完成本发明。本发明所称的模块是指能够完成特定功能的一系列计算机程序指令段。Referring to FIG. 2 , it is a schematic diagram of program modules of an embodiment of the target detection program 10 based on image recognition in FIG. 1 . The target detection program 10 based on image recognition is divided into a plurality of modules, and the plurality of modules are stored in the memory 11 and executed by the processor 12 to complete the present invention. A module referred to in the present invention refers to a series of computer program instruction segments capable of accomplishing specific functions.

在本实施例中，所述基于图片识别的目标检测程序10包括接收模块110、第一判断模块120、第二判断模块130、第三判断模块140及反馈模块150。In this embodiment, the target detection program 10 based on image recognition includes a receiving module 110 , a first judging module 120 , a second judging module 130 , a third judging module 140 and a feedback module 150 .

接收模块110，用于接收客户端14上传的待检测图像，将所述待检测图像输入预先训练的图像数据特征提取模型，得到与所述待检测图像对应的图像特征图。The receiving module 110 is configured to receive the to-be-detected image uploaded by the client 14, input the to-be-detected image into a pre-trained image data feature extraction model, and obtain an image feature map corresponding to the to-be-detected image.

在本实施例中，服务器1接收客户端14(例如摄像机或其他具有拍摄功能的拍摄终端、或具有拍摄功能及传输图像功能的设备)上传的待检测图像，利用预先训练的图像特征提取模型从待检测图像中提取出图像特征图。在本实施例中，图像特征提取模型由MobileNetV2网络模型训练得到，MobileNetV2网络模型是一种轻量级的卷积神经网络结构模型，MobileNetV2网络模型可以高效地对分辨率不高的图像进行快速识别，且具有运算占用带宽较小的特点，可以搭载于移动设备上使用。MobileNetV2网络模型包括依次连接的53层卷积层、1层池化层以及1层全连接层，其中，53层卷积层包括依次连接的1层输入层、17个瓶颈构建块、1层输出层，每个瓶颈构建块分别包括3层卷积层，53层卷积层的卷积核均为3×3。In this embodiment, the server 1 receives the image to be detected uploaded by the client 14 (for example, a camera or other shooting terminal with shooting function, or a device with shooting function and image transmission function), and uses a pre-trained image feature extraction model to extract images from the image. The image feature map is extracted from the image to be detected. In this embodiment, the image feature extraction model is trained by the MobileNetV2 network model. The MobileNetV2 network model is a lightweight convolutional neural network structure model. The MobileNetV2 network model can efficiently identify images with low resolution. , and has the characteristics of small operation bandwidth, which can be used on mobile devices. The MobileNetV2 network model includes 53 layers of convolutional layers, 1 layer of pooling layers and 1 layer of fully connected layers connected in sequence, among which the 53 layers of convolutional layers include 1 layer of input layer, 17 bottleneck building blocks and 1 layer of output layer connected in sequence Each bottleneck building block includes 3 convolutional layers, and the convolution kernels of the 53 convolutional layers are all 3×3.

在其他实施例中，还可以在训练MobileNetV2网络模型时，预先为所述MobileNetV2网络模型设定损失函数，将训练样本输入到MobileNetV2网络模型中，对输入的训练样本进行前向传播得到实际输出，将预设的目标输出和实际输出代入损失函数中，计算损失函数的损失值，进行反向传播并利用损失值对所述MobileNetV2网络模型的参数进行优化，得到优化后的MobileNetV2网络模型。然后再选取一个训练样本输入到优化后的MobileNetV2网络模型中，参照前述操作，再次对优化后的MobileNetV2网络模型进行训练，直到达到停止训练的条件为止。In other embodiments, when training the MobileNetV2 network model, a loss function can be set for the MobileNetV2 network model in advance, the training samples are input into the MobileNetV2 network model, and the input training samples are forward propagated to obtain the actual output, Substitute the preset target output and the actual output into the loss function, calculate the loss value of the loss function, perform backpropagation and use the loss value to optimize the parameters of the MobileNetV2 network model to obtain the optimized MobileNetV2 network model. Then select a training sample and input it into the optimized MobileNetV2 network model. Referring to the above operation, train the optimized MobileNetV2 network model again until the condition for stopping training is reached.

第一判断模块120，用于将所述得到的图像特征图输入预先训练的目标提取模型，输出第一图像数据，判断所述第一图像数据中是否包含至少两种第一预设目标类型的第一目标框，且每种所述预设类型的第一目标框的数量为第一预设数量。The first judgment module 120 is used for inputting the obtained image feature map into a pre-trained target extraction model, outputting first image data, and judging whether the first image data contains at least two types of first preset targets. A first target frame, and the number of the first target frames of each preset type is a first preset quantity.

在本实施例中，当判断第一图像数据中包含至少两种第一预设目标类型的第一目标框，且每种预设类型的第一目标框的数量为第一预设数量(在本实施例中，第一预设数量为1)时，说明客户端14上传的待检测图像符合要求，则执行后续步骤；否则，说明客户端14上传的待检测图像不符合要求，并生成反馈信息反馈至客户端14。In this embodiment, when it is determined that the first image data includes first target frames of at least two first preset target types, and the number of first target frames of each preset type is the first preset number (in In this embodiment, when the first preset number is 1), it means that the image to be detected uploaded by the client 14 meets the requirements, and the subsequent steps are executed; otherwise, it means that the image to be detected uploaded by the client 14 does not meet the requirements, and a feedback is generated The information is fed back to the client 14 .

其中，所述目标框是基于第三方标记工具(例如RectLabel)绘制的，每个目标框对应一种第一预设目标类型。The target frame is drawn based on a third-party marking tool (eg, RectLabel), and each target frame corresponds to a first preset target type.

以下通过具体例子进一步说明：The following is further explained with specific examples:

例如保险公司对于购买了车险的用户，赠送对应的救援服务，并委托救援商为用户提供救援服务。当救援商为用户提供救援服务后，为了向保险公司证明救援的真实性，而非假冒行为，因此救援商在完成救援后需要对现场进行拍照取证并反馈给服务商。For example, insurance companies provide corresponding rescue services to users who have purchased auto insurance, and entrust rescue providers to provide rescue services for users. When rescue providers provide rescue services for users, in order to prove the authenticity of the rescue to the insurance company, rather than counterfeiting, the rescue providers need to take pictures of the scene to collect evidence and feedback to the service providers after completing the rescue.

因此，通过本方案将救援商现场拍摄的照片输入图像特征提取模型后，得到所述照片的图像特征图，之后将图像特征图输入目标提取模型中得到所述照片的第一图像数据，对第一图像数据行进分析判断，当判断所述第一图像数据中包含至少两种第一预设目标类型(例如故障车、拖车)的第一目标框(至少包含故障车的图框及拖车的图框)，且每种预设类型的第一目标框的数量为第一预设数量时，说明客户端14上传的待检测图像符合要求，则执行后续步骤；否则，说明客户端14上传的待检测图像不符合要求，可能存在造假行为，或者拍摄的图片不符合要求，并生成反馈信息反馈至客户端14。Therefore, through this scheme, after the photos taken by rescuers on site are input into the image feature extraction model, the image feature map of the photos is obtained, and then the image feature maps are input into the target extraction model to obtain the first image data of the photos. An image data is analyzed and judged. When it is judged that the first image data includes at least two first target types of first preset target types (eg, a broken car, a trailer), a first target frame (at least the frame of the broken car and the image of the trailer) are included. frame), and when the number of the first target frames of each preset type is the first preset number, it means that the image to be detected uploaded by the client 14 meets the requirements, and the subsequent steps are executed; otherwise, it is explained that the image to be detected uploaded by the client 14 meets the requirements; It is detected that the image does not meet the requirements, there may be fraudulent behavior, or the captured picture does not meet the requirements, and feedback information is generated and fed back to the client 14 .

之后，将所述图像特征图输入预先训练的目标提取模型得到所述待监测图像对应的第一图像数据。所述目标提取模型为SSD模型。上述步骤中，判断所述第一图像数据中是否包含至少两种第一预设目标类型的第一目标框包括：After that, input the image feature map into a pre-trained target extraction model to obtain first image data corresponding to the image to be monitored. The target extraction model is an SSD model. In the above step, judging whether the first image data contains at least two first target frames of the first preset target type includes:

所述目标提取模型的训练过程包括：The training process of the target extraction model includes:

获取的图像特征图样本，基于所述目标提取模型对所述图像特征图样本中的每个像素点分别生成对应的默认框样本，并获取各默认框样本在该图像特征图样本中的坐标位置，以及对应于不同第一预设目标类型的概率评分；The acquired image feature map samples, based on the target extraction model, generate corresponding default frame samples for each pixel in the image feature map samples respectively, and obtain the coordinate position of each default frame sample in the image feature map sample , and the probability scores corresponding to different first preset target types;

对所述softmax分类损失和包围盒回归损失之和按照从大到小进行排序，以所述softmax分类损失和包围盒回归损失之和最小者对应的默认框样本为起始点，依次获取预设数量的所述默认框样本，计算所述预设数量的默认框样本的损失函数，并将计算出的所述预设数量的默认框样本的损失函数，在所述目标提取模型中反向传播，以对所述目标提取模型的各层网络的权重值进行更新，训练得到该目标提取模型；Sort the sum of the softmax classification loss and the bounding box regression loss in descending order, take the default box sample corresponding to the smallest sum of the softmax classification loss and the bounding box regression loss as the starting point, and obtain the preset number in turn the default frame samples of Update the weight value of each layer network of the target extraction model, and train to obtain the target extraction model;

所述损失函数通过以下公式计算：The loss function is calculated by the following formula:

第二判断模块130，用于若所述第一图像数据中包含至少两种第一预设目标类型的第一目标框，且每种所述预设类型的第一目标框的数量为第一预设数量(在本实施例中，第一预设数量为1)，则分别获取各所述第一目标框的位置坐标，并基于预设计算规则判断各所述第一目标框之间的位置关系。The second judging module 130 is configured to, if the first image data includes at least two first target frames of the first preset target type, and the number of the first target frames of each preset type is the first The preset number (in this embodiment, the first preset number is 1), the position coordinates of each of the first target frames are obtained respectively, and the distance between each of the first target frames is determined based on the preset calculation rule. Positional relationship.

在本实施例中，当识别出第一图像数据中包含至少两种第一预设目标类型的第一目标框时，分别获取目标提取模型输出的第一目标框的四个顶点的位置坐标，并基于预设计算规则判断各第一目标框之间的位置关系，以故障车和拖车为例，所述位置关系可以是故障车在拖车上方，或者故障车在拖车下方，或者故障车在拖车左方，或者故障车在拖车右方。In this embodiment, when identifying the first target frame containing at least two first preset target types in the first image data, the position coordinates of the four vertices of the first target frame output by the target extraction model are obtained respectively, And determine the positional relationship between the first target frames based on the preset calculation rules. Taking the faulty vehicle and the trailer as an example, the positional relationship can be that the faulty vehicle is above the trailer, or the faulty vehicle is below the trailer, or the faulty vehicle is in the trailer Left, or the faulty car is on the right of the trailer.

其中，所述计算规则为：分别取故障车代表的第一目标框和拖车代表的第一目标框各顶点的坐标，即故障车左上坐标(x1，y1)，故障车右上坐标(x2，y2)，故障车左下坐标(x4，y4)，故障车右下坐标(x3，y3)，拖车左上坐标(a1，b1)，拖车右上坐标(a2，b2)，拖车左下坐标(a4，b4)，拖车右下坐标(a3，b3)，利用x1减去a1，x2减去a2，若结果为一正一负，说明故障车处于拖车的中间位置，同时利用y1减去b1，y2减去b2，若结果均为正，说明故障车位于拖车上方，在同时满足上述两个条件时，则说明故障车位于拖车的正上方，能够初步判断救援方上传的照片满足保险商对现场取证的要求。Wherein, the calculation rule is: respectively take the coordinates of each vertex of the first target frame represented by the faulty car and the first target frame represented by the trailer, that is, the upper left coordinate of the faulty vehicle (x1, y1), and the upper right coordinate of the faulty vehicle (x2, y2 ), the lower left coordinate of the faulty vehicle (x4, y4), the lower right coordinate of the faulty vehicle (x3, y3), the upper left coordinate of the trailer (a1, b1), the upper right coordinate of the trailer (a2, b2), the lower left coordinate of the trailer (a4, b4), The lower right coordinates of the trailer (a3, b3), subtract a1 from x1, and subtract a2 from x2. If the result is one positive and one negative, it means that the faulty vehicle is in the middle of the trailer. At the same time, subtract b1 from y1 and subtract b2 from y2. If the results are all positive, it means that the faulty vehicle is located above the trailer. When the above two conditions are met, it means that the faulty vehicle is located directly above the trailer. It can be preliminarily determined that the photos uploaded by the rescuer meet the requirements of the insurer for on-site evidence collection.

第三判断模块140，用于根据所述位置关系，从数据库中预先建立的位置关系与判断结果之间的映射关系表中找出对应的判断结果，其中，所述判断结果包括第一判断结果及第二判断结果，所述第一判断结果表示第一图像数据对应信息的真实性为待定，所述第二判断结果表示第一图像数据对应信息的真实性为不属实。The third judging module 140 is configured to, according to the positional relationship, find out the corresponding judgment result from the mapping relationship table between the positional relationship and judgment result pre-established in the database, wherein the judgment result includes the first judgment result and a second judgment result, the first judgment result indicates that the authenticity of the information corresponding to the first image data is to be determined, and the second judgment result indicates that the authenticity of the information corresponding to the first image data is not true.

为了进一步说明该步骤的具体方案，在本实施例中，继续以上述的例子进行说明。通过从数据库中预先建立的位置关系与判断结果之间的映射关系表中找出对应的判断结果，其中，判断结果包括第一判断结果及第二判断结果。当第一图像数据中显示故障车在拖车的上方时，则说明救援方上传的照片符合要求，但是由于故障车或拖车的真实性可能存在不确定性，即待定，因此向客户端14发出第一判断结果(待定，即故障车及/或故障车的真实身份待定)的反馈信息，反馈信息显示第一图像数据对应信息的真实性为待定。当第一图像数据中未显示故障车置于拖车上方时，则说明救援商上传的照片不符合要求，即向客户端14发出第二判断结果(不属实)的反馈信息。In order to further illustrate the specific solution of this step, in this embodiment, the above-mentioned example is continued to be described. The corresponding judgment result is found from the mapping relationship table between the position relationship and the judgment result pre-established in the database, wherein the judgment result includes the first judgment result and the second judgment result. When the first image data shows that the faulty car is above the trailer, it means that the photos uploaded by the rescuer meet the requirements, but because the authenticity of the faulty car or the trailer may be uncertain, that is, to be determined, the client 14 will be sent the first Feedback information of the judgment result (to be determined, that is, the real identity of the faulty vehicle and/or the faulty vehicle is to be determined), the feedback information indicates that the authenticity of the information corresponding to the first image data is to be determined. When the first image data does not show that the faulty vehicle is placed above the trailer, it means that the photos uploaded by the rescue provider do not meet the requirements, that is, a feedback message of the second judgment result (not true) is sent to the client 14 .

反馈模块150，用于若所述判断结果为第一判断结果，则分别判断所述第一目标框中是否包含第二目标框，若包含所述第二目标框，则分别识别所述第二目标框对应的第一预设数据，分析第一预设数据与预先存储在数据库中的第二预设数据是否对应，并根据分析结果生成反馈信息反馈至所述客户端14，其中，所述分析结果包括属实及不属实。The feedback module 150 is configured to judge whether the first target frame includes a second target frame if the judgment result is the first judgment result, and identify the second target frame if the second target frame is included. For the first preset data corresponding to the target frame, analyze whether the first preset data corresponds to the second preset data pre-stored in the database, and generate feedback information to feed back to the client 14 according to the analysis result, wherein the Analysis results include true and false.

为了避免故障车或拖车的真实性可能存在不确定性的情况发生，例如救援商可能上传假冒照片，照片中的故障车或拖车不是救援商现场拍摄的，因此在本实施例中，在判断出不同第一目标框之间的位置关系之后，还需要对第一目标框进行分析，继续以上述的例子进行说明该步骤的具体方案。若判断结果为第一判断结果，即救援商上传的照片中显示故障车在拖车上方，则分别判断第一目标框中是否包含第二目标框，例如故障车或拖车的车牌。其中，若第一目标框中包含第二目标框，则分别识别第二目标框对应的第一预设数据(例如车牌号)，分析第一预设数据与预先存储在数据库中的第二预设数据(例如车主姓名)是否对应，并根据分析结果生成反馈信息反馈至客户端14。In order to avoid the situation where the authenticity of the faulty car or trailer may be uncertain, for example, the rescue company may upload fake photos, and the faulty car or trailer in the photo was not taken by the rescue company on the spot. Therefore, in this embodiment, after judging the After the positional relationship between the different first target frames, the first target frame still needs to be analyzed, and the specific solution of this step is described with the above example. If the judgment result is the first judgment result, that is, the photo uploaded by the rescue provider shows that the faulty vehicle is above the trailer, then it is determined whether the first target frame contains the second target frame, such as the license plate of the faulty vehicle or the trailer. Wherein, if the first target frame contains the second target frame, the first preset data (for example, the license plate number) corresponding to the second target frame are respectively identified, and the first preset data and the second preset data pre-stored in the database are analyzed. Set whether the data (for example, the name of the car owner) corresponds, and generate feedback information according to the analysis result and feed it back to the client 14 .

在另一实施例中，所述反馈模块150还用于若所述分析结果为不属实，则将所述待检测图像进行直方图均衡化处理得到第二图像数据，将所述第二图像数据调整至预设角度后重新输入所述接收步骤中的图像特征提取模型。In another embodiment, the feedback module 150 is further configured to perform histogram equalization processing on the to-be-detected image to obtain second image data if the analysis result is false, and the second image data After adjusting to the preset angle, re-input the image feature extraction model in the receiving step.

在本实施例中，若出现分析结果为不属实，可能是救援商上传的照片为假冒照片，也可能是救援商上传的照片不符合服务器识别要求，例如照片光线暗或者角度摆放不好导致的。因此，当出现分析结果为不属实时，可以将待检测图像进行直方图均衡化处理得到第二图像数据，并将第二图像数据调整预设角度(例如270°，即对称翻转)后重新输入接收步骤中的图像特征提取模型，重复上述步骤。In this embodiment, if the analysis result is false, it may be that the photos uploaded by the rescue provider are fake photos, or the photos uploaded by the rescue provider do not meet the server identification requirements, for example, the photo is dark or the angle is not well placed. of. Therefore, when the analysis result is not real-time, the to-be-detected image can be processed by histogram equalization to obtain the second image data, and the second image data can be adjusted to a preset angle (for example, 270°, that is, symmetrically flipped) and then input again. Receive the image feature extraction model in the step, and repeat the above steps.

此外，本发明还提供一种基于图片识别的目标检测方法。参照图3所示，为本发明基于图片识别的目标检测方法的实施例的方法流程示意图。服务器1的处理器12执行存储器11中存储的基于图片识别的目标检测程序10时实现基于图片识别的目标检测方法的如下步骤：In addition, the present invention also provides a target detection method based on image recognition. Referring to FIG. 3 , it is a schematic flowchart of an embodiment of a target detection method based on image recognition according to the present invention. When the processor 12 of the server 1 executes the image recognition-based target detection program 10 stored in the memory 11, the following steps are implemented to realize the image recognition-based target detection method:

S110，接收客户端14上传的待检测图像，将所述待检测图像输入预先训练的图像数据特征提取模型，得到与所述待检测图像对应的图像特征图。S110: Receive the to-be-detected image uploaded by the client 14, input the to-be-detected image into a pre-trained image data feature extraction model, and obtain an image feature map corresponding to the to-be-detected image.

S120，将所述得到的图像特征图输入预先训练的目标提取模型，输出第一图像数据，判断所述第一图像数据中是否包含至少两种第一预设目标类型的第一目标框，且每种所述预设类型的第一目标框的数量为第一预设数量。S120: Input the obtained image feature map into a pre-trained target extraction model, output first image data, and determine whether the first image data includes first target frames of at least two first preset target types, and The number of the first target frames of each preset type is the first preset number.

因此，通过本方案将救援商现场拍摄的照片输入图像特征提取模型后，得到所述照片的图像特征图，之后将图像特征图输入目标提取模型中得到所述照片的第一图像数据，对第一图像数据进行分析判断，当判断所述第一图像数据中包含至少两种第一预设目标类型(例如故障车、拖车)的第一目标框(至少包含故障车的图框及拖车的图框)，且每种预设类型的第一目标框的数量为第一预设数量时，说明客户端14上传的待检测图像符合要求，则执行后续步骤；否则，说明客户端14上传的待检测图像不符合要求，可能存在造假行为，或者拍摄的图片不符合要求，并生成反馈信息反馈至客户端14。Therefore, through this scheme, after the photos taken by rescuers on site are input into the image feature extraction model, the image feature map of the photos is obtained, and then the image feature maps are input into the target extraction model to obtain the first image data of the photos. An image data is analyzed and judged, when it is judged that the first image data includes at least two first target types of the first preset target types (such as a faulty car and a trailer) in a first target frame (at least the frame of the faulty car and the image of the trailer). frame), and when the number of the first target frames of each preset type is the first preset number, it means that the image to be detected uploaded by the client 14 meets the requirements, and the subsequent steps are executed; otherwise, it is explained that the image to be detected uploaded by the client 14 meets the requirements; It is detected that the image does not meet the requirements, there may be fraudulent behavior, or the captured picture does not meet the requirements, and feedback information is generated and fed back to the client 14 .

S130，若所述第一图像数据中包含至少两种第一预设目标类型的第一目标框，且每种所述预设类型的第一目标框的数量为第一预设数量(在本实施例中，第一预设数量为1)，则分别获取各所述第一目标框的位置坐标，并基于预设计算规则判断各所述第一目标框之间的位置关系。S130, if the first image data includes at least two first target frames of a first preset target type, and the number of first target frames of each of the preset types is a first preset quantity (in this case In the embodiment, if the first preset number is 1), the position coordinates of each of the first target frames are obtained respectively, and the positional relationship between each of the first target frames is determined based on a preset calculation rule.

S140，根据所述位置关系，从数据库中预先建立的位置关系与判断结果之间的映射关系表中找出对应的判断结果，其中，所述判断结果包括第一判断结果及第二判断结果，所述第一判断结果表示第一图像数据对应信息的真实性为待定，所述第二判断结果表示第一图像数据对应信息的真实性为不属实。S140, according to the positional relationship, find out the corresponding judgment result from the mapping relationship table between the positional relationship and the judgment result pre-established in the database, wherein the judgment result includes a first judgment result and a second judgment result, The first judgment result indicates that the authenticity of the information corresponding to the first image data is to be determined, and the second judgment result indicates that the authenticity of the information corresponding to the first image data is not true.

S150，若所述判断结果为第一判断结果，则分别判断所述第一目标框中是否包含第二目标框，若包含所述第二目标框，则分别识别所述第二目标框对应的第一预设数据，分析第一预设数据与预先存储在数据库中的第二预设数据是否对应，并根据分析结果生成反馈信息反馈至所述客户端14，其中，所述分析结果包括属实及不属实。S150: If the judgment result is the first judgment result, judge whether the first target frame includes a second target frame, and if the second target frame is included, identify the corresponding frame of the second target frame. For the first preset data, analyze whether the first preset data corresponds to the second preset data pre-stored in the database, and generate feedback information according to the analysis result to feed back to the client 14, wherein the analysis result includes true and untrue.

为了避免故障车或拖车的真实性可能存在不确定性的情况发生，例如救援商可能上传假冒照片，照片中的故障车或拖车不是救援商现场拍摄的，因此在本实施例中，在判断出不同第一目标框之间的位置关系之后，还需要对第一目标框进行分析，继续以上述的例子进行说明该步骤的具体方案。若判断结果为第一判断结果，即救援商上传的照片中显示故障车在拖车上方，则分别判断第一目标框中是否包含第二目标框，例如故障车或拖车的车牌。其中，若第一目标框中包含第二目标框，则分别识别第二目标框对应的第一预设数据(例如车牌号)，分析第一预设数据与预先存储在数据库中的第二预设数据(例如车主姓名)是否对应，并根据分析结果生成反馈信息反馈至客户端14。In order to avoid the situation where the authenticity of the faulty car or trailer may be uncertain, for example, the rescue company may upload fake photos, and the faulty car or trailer in the photo was not taken by the rescue company on the spot. Therefore, in this embodiment, after judging the After the positional relationship between the different first target frames, the first target frame still needs to be analyzed, and the specific solution of this step is described with the above example. If the judgment result is the first judgment result, that is, the photo uploaded by the rescue provider shows that the faulty vehicle is above the trailer, then it is determined whether the first target frame contains the second target frame, such as the license plate of the faulty vehicle or the trailer. Wherein, if the first target frame includes a second target frame, the first preset data (for example, the license plate number) corresponding to the second target frame are respectively identified, and the first preset data and the second preset data pre-stored in the database are analyzed. Set whether the data (for example, the name of the car owner) corresponds, and generate feedback information according to the analysis result and feed it back to the client 14 .

在另一实施例中，该方法还包括以下步骤：In another embodiment, the method further includes the following steps:

此外，本发明实施例还提出一种计算机可读存储介质，计算机可读存储介质可以是硬盘、多媒体卡、SD卡、闪存卡、SMC、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器等中的任意一种或者几种的任意组合。计算机可读存储介质中包括基于图片识别的目标检测程序10，本发明之计算机可读存储介质的具体实施方式与上述基于图片识别的目标检测方法以及服务器1的具体实施方式大致相同，在此不再赘述。In addition, an embodiment of the present invention also provides a computer-readable storage medium, where the computer-readable storage medium may be a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory Memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, etc. any one or any combination of several. The computer-readable storage medium includes a target detection program 10 based on image recognition. The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the specific implementation of the above-mentioned image recognition-based target detection method and server 1. Repeat.

需要说明的是，上述本发明实施例序日仅仅为了描述，不代表实施例的优劣。并且本文中的术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、装置、物品或者方法不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、装置、物品或者方法所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、装置、物品或者方法中还存在另外的相同要素。It should be noted that the above-mentioned preambles of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments. And the terms "comprising", "comprising" or any other variation thereof herein are intended to encompass a non-exclusive inclusion such that a process, device, article or method comprising a list of elements includes not only those elements, but also includes no explicit Other elements listed, or those inherent to such a process, apparatus, article, or method are also included. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.

上述本发明实施例序日仅仅为了描述，不代表实施例的优劣。通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本发明的技术本实施例本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在如上所述的一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端设备(可以是手机，计算机，服务器，或者网络设备等)执行本发明各个实施例所述的方法。The above-mentioned preambles of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments. From the description of the above embodiments, those skilled in the art can clearly understand that the method of the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented by hardware, but in many cases the former is better implementation. Based on such understanding, the technical aspects of the present invention can be embodied in the form of software products that are essentially or contribute to the prior art, and the computer software products are stored in a storage medium (such as a ROM) as described above. /RAM, magnetic disk, optical disk), including several instructions to make a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) execute the methods described in the various embodiments of the present invention.

以上仅为本发明的优选实施例，并非因此限制本发明的专利范围，凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本发明的专利保护范围内。The above are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any equivalent structure or equivalent process transformation made by using the contents of the description and drawings of the present invention, or directly or indirectly applied in other related technical fields , are similarly included in the scope of patent protection of the present invention.

Claims

1. A target detection method based on picture recognition is applied to a server, and is characterized by comprising the following steps:

a receiving step: receiving an image to be detected uploaded by a client, inputting the image to be detected into a pre-trained image data feature extraction model, and obtaining an image feature map corresponding to the image to be detected;

a first judgment step: inputting the obtained image feature map into a pre-trained target extraction model, outputting first image data, and judging whether the first image data contains first target frames of at least two first preset target types, wherein the number of the first target frames of each preset type is a first preset number;

a second judgment step: if the first image data comprises first target frames of at least two first preset target types and the number of the first target frames of each preset type is a first preset number, respectively acquiring the position coordinates of the first target frames, and judging the position relation between the first target frames based on a preset calculation rule;

a third judging step: according to the position relation, finding out a corresponding judgment result from a mapping relation table between the position relation and the judgment result which are pre-established in a database, wherein the judgment result comprises a first judgment result and a second judgment result, the first judgment result shows that the authenticity of the information corresponding to the first image data is to be determined, and the second judgment result shows that the authenticity of the information corresponding to the first image data is not true; and

a feedback step: if the judgment result is a first judgment result, respectively judging whether the first target frame comprises a second target frame, if so, respectively identifying first preset data corresponding to the second target frame, analyzing whether the first preset data correspond to second preset data stored in a database in advance, and generating feedback information according to the analysis result to feed back the feedback information to the client, wherein the analysis result comprises an actual fact and a non-actual fact.

2. The method as claimed in claim 1, wherein the object extraction model is an SSD model, and the determining whether the first image data includes a first object frame of at least two first preset object types comprises:

respectively generating corresponding default frames for each pixel point in the image feature map based on the SSD model, acquiring position coordinates of each default frame in the image feature map and probability scores corresponding to different first preset target types, and setting the maximum value in the probability scores of each default frame as a primary confidence coefficient;

sorting the default frames corresponding to the primary confidence degrees from large to small according to probability scores, sequentially obtaining a second preset number of default frames as target candidate frames by taking the default frame corresponding to the maximum value of the probability scores as a starting point, and performing bounding box regression analysis based on the position coordinates of each target candidate frame to obtain the area size corresponding to each target candidate frame;

performing softxmax classification on the probability score of each target candidate frame to obtain target confidence coefficients of each target candidate frame corresponding to different preset target type classifications; and

and acquiring a third preset number of target candidate frames with iou (M, b) higher than a preset threshold as the first target frame based on a non-maximum suppression algorithm, wherein M represents a default frame corresponding to the maximum probability score, b represents other default frames except the default frame M in the image feature map, and iou (M, b) represents the overlapping degree between the default frame M and the default frame b.

3. The image recognition-based target detection method of claim 2, wherein the training process of the target extraction model comprises:

acquiring an image feature map sample, respectively generating corresponding default frame samples for each pixel point in the image feature map sample based on the target extraction model, and acquiring the coordinate position of each default frame sample in the image feature map sample and probability scores corresponding to different first preset target types;

respectively calculating the sum of softmax classification loss and bounding box regression loss of each default frame sample based on the position coordinate and probability score of each default frame sample; and

sequencing the sum of the softmax classification loss and the bounding box regression loss from large to small, sequentially acquiring a preset number of default frame samples by taking the default frame sample corresponding to the minimum softmax classification loss and the bounding box regression loss as a starting point, calculating the loss functions of the default frame samples of the preset number, performing back propagation on the calculated loss functions of the default frame samples of the preset number in the target extraction model, updating the weight values of each layer of network of the target extraction model, and training to obtain the target extraction model.

4. The image recognition-based target detection method of claim 3, wherein the loss function is calculated by the following formula:

wherein, L_conf(x, c) Softmax classification loss, L_loc(x, l, g) is bounding box regression loss, K ═ f_k|*|f_k|*α，|f_kThe method comprises the steps of |, weighting value α, default frame c, default frame position information l and calibration area result g.

5. The target detection method based on picture recognition according to any one of claims 1-4, wherein the feedback step further comprises:

and if the analysis result is not true, performing histogram equalization processing on the image to be detected to obtain second image data, adjusting the second image data to a preset angle, and inputting the second image data into the image feature extraction model in the receiving step again.

6. A server, comprising a memory and a processor, wherein the memory stores a picture recognition-based object detection program, and the picture recognition-based object detection program when executed by the processor implements the steps of:

7. The server according to claim 6, wherein the object extraction model is an SSD model, and the determining whether the first image data includes a first object box of at least two first preset object types includes:

8. The server of claim 7, wherein the training process of the target extraction model comprises:

9. The server of claim 8, wherein the loss function is calculated by the formula:

10. A computer-readable storage medium, wherein a picture recognition-based object detection program is stored on the computer-readable storage medium, and the picture recognition-based object detection program is executable by one or more processors to implement the steps of the picture recognition-based object detection method according to any one of claims 1 to 5.