
CN114004986A - Image processing method, training method, device, equipment and medium for detection model - Google Patents


Info

Publication number
CN114004986A
CN114004986A
Authority
CN
China
Prior art keywords
image
detection model
distance
detection
box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111279421.9A
Other languages
Chinese (zh)
Inventor
蒋旻悦
杨喜鹏
谭啸
孙昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111279421.9A
Publication of CN114004986A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image processing method, a training method for a detection model, and a corresponding apparatus, device, medium and program product; it relates to the field of artificial intelligence, in particular to computer vision and deep learning technology, and can be applied in smart city and intelligent transportation scenarios. The image processing method comprises the following steps: acquiring an image to be processed, wherein the image to be processed comprises at least one target object; and performing a detection operation on the image to be processed to obtain a detection result for each target object in the at least one target object, wherein the detection result for each target object is associated with the distortion information of that target object.

Description

Image processing method, training method, device, equipment and medium for detection model
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning technology, and can be applied in smart city and intelligent transportation scenarios.
Background
In the related art, when an object in an image is detected by using a detection model, the detection accuracy of the detection model is low, resulting in poor detection effect.
Disclosure of Invention
The disclosure provides an image processing method, a training method and device for a detection model, an electronic device, a storage medium and a program product.
According to an aspect of the present disclosure, there is provided an image processing method including: acquiring an image to be processed, wherein the image to be processed comprises at least one target object; and performing detection operation on the image to be processed to obtain a detection result for each target object in the at least one target object, wherein the detection result for each target object is associated with the distortion information of each target object.
According to another aspect of the present disclosure, there is provided a training method of a detection model, including: acquiring an image training sample, wherein the image training sample comprises at least one reference object; training a detection model based on distortion information for each of the at least one reference object.
According to another aspect of the present disclosure, there is provided an image processing apparatus including a first acquisition module and a detection module. The first acquisition module is configured to acquire an image to be processed, wherein the image to be processed comprises at least one target object. The detection module is configured to perform a detection operation on the image to be processed to obtain a detection result for each target object in the at least one target object, wherein the detection result for each target object is associated with the distortion information of each target object.
According to another aspect of the present disclosure, there is provided a training apparatus for a detection model, including a second acquisition module and a first training module. The second acquisition module is configured to acquire an image training sample, wherein the image training sample comprises at least one reference object. The first training module is configured to train a detection model based on distortion information of each of the at least one reference object.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method and/or the training method of the detection model described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the image processing method and/or the training method of the detection model described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the image processing method and/or the training method of the detection model described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an application scenario of an image processing method and a training method of a detection model according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a schematic diagram of an image processing method according to an embodiment of the present disclosure;
fig. 4 schematically shows a schematic diagram of an image processing method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method of training a detection model according to an embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a training apparatus for detection models according to an embodiment of the present disclosure; and
FIG. 8 is a block diagram of an electronic device for implementing the image processing method and/or the training method of the detection model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, such a construction is in general intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together).
An embodiment of the present disclosure provides an image processing method, including: acquiring an image to be processed, wherein the image to be processed comprises at least one target object. Then, a detection operation is performed on the image to be processed, and a detection result for each target object in the at least one target object is obtained, wherein the detection result for each target object is associated with distortion information of each target object.
The embodiment of the present disclosure provides a training method for a detection model, including: an image training sample is acquired, wherein the image training sample comprises at least one reference object. Then, a detection model is trained based on distortion information for each of the at least one reference object.
Fig. 1 schematically illustrates an application scenario of an image processing method and a training method of a detection model according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, an application scenario 100 of the present disclosure includes, for example, a to-be-processed image 110 and a detection model 120.
Exemplarily, the image to be processed 110 comprises for example a plurality of target objects 111, 112, 113, 114. The target objects 111, 112, 113, 114 include, for example, users, vehicles, objects, and the like.
Illustratively, the detection model 120 may include an object detection model, including but not limited to the Faster R-CNN (Faster Region-based Convolutional Neural Network) model.
After the detection model 120 is obtained through training, the image to be processed 110 is input into the detection model 120 for detection, and detection results 131, 132, 133 and 134 are obtained. The detection results 131, 132, 133, 134 include, for example, position information of the respective target objects in the image to be processed 110. The detection results 131, 132, 133, 134 may be represented by prediction boxes, for example, dashed boxes.
The embodiment of the present disclosure provides an image processing method, and an image processing method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 to 4 in conjunction with an application scenario of fig. 1.
Fig. 2 schematically shows a flow chart of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the image processing method 200 of the embodiment of the present disclosure may include, for example, operations S210 to S220.
In operation S210, a to-be-processed image including at least one target object is acquired.
In operation S220, a detection operation is performed on the image to be processed, resulting in a detection result for each of the at least one target object, the detection result for each target object being associated with distortion information of each target object.
Illustratively, the image to be processed of the embodiment of the present disclosure is acquired by, for example, an image acquisition device including a fisheye camera. Because the image to be processed acquired by the fisheye camera has distortion information, the distortion information influences the image detection effect when the image to be processed is detected.
Since different target objects are in different regions in the image to be processed, the distortion information of different target objects is different. For example, the distortion degree of the target object at the edge portion of the image to be processed is large, and the distortion degree of the target object at the middle portion of the image to be processed is small.
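Since the degree of distortion grows with a target object's distance from the image center, that distance can serve as a simple proxy for its distortion information. The following is a minimal sketch, not the patent's exact formulation: the (x, y, w, h) box format, the top-left origin, and the half-diagonal normalization are all assumptions made for illustration.

```python
import math

def center_distance(box, image_size):
    """Distance between a box's center and the image center,
    normalized by the half-diagonal so the result lies in [0, 1].
    Box format (x, y, w, h) with a top-left origin is an assumption."""
    img_w, img_h = image_size
    x, y, w, h = box
    box_cx, box_cy = x + w / 2, y + h / 2
    img_cx, img_cy = img_w / 2, img_h / 2
    dist = math.hypot(box_cx - img_cx, box_cy - img_cy)
    half_diag = math.hypot(img_cx, img_cy)
    return dist / half_diag

# A box near the edge of a 1000x1000 fisheye frame scores higher
# (more distortion expected) than one at the center.
edge_box = (900, 900, 50, 50)
center_box = (475, 475, 50, 50)
print(center_distance(edge_box, (1000, 1000)) >
      center_distance(center_box, (1000, 1000)))   # True
```

Such a score could then drive the distance-dependent weighting described below for training, or simply flag which detections came from heavily distorted regions.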
Illustratively, the detection operation may be performed on the image to be processed by a detection model. When the detection model is trained, the detection capability of the detection model on the object with larger distortion degree can be improved, so that the detection accuracy of the target object with larger distortion degree is improved when the detection model obtained by training detects the image to be processed. In other words, when the trained detection model is used to detect the image to be processed, the detection capabilities of the detection model for target objects with different distortion degrees may be different, so that the detection result for each target object is associated with the distortion information of the target object, i.e., the detection results for target objects with different distortion degrees are all better.
According to the embodiment of the disclosure, when the detection model is used for detecting the image to be processed, the distortion information is considered, so that the obtained detection result is better, the detection capability of the detection model on the image with distortion is improved, and the image detection effect is further improved.
Fig. 3 schematically shows a schematic diagram of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 3, the detection model may be trained using a plurality of image training samples. Taking the image training sample 310 as an example, the image training sample 310 includes a plurality of reference objects, for example, the reference objects include but are not limited to users, vehicles, and objects.
Taking 3 reference objects as an example, the image training sample 310 includes label information, where the label information includes 3 labeling boxes 311, 312, 313 corresponding to the 3 reference objects one by one, and the labeling boxes are used to characterize each reference object in the image training sample 310.
In the process of training the detection model, the detection model is used to predict the positions of the reference objects in the image training sample 310 to obtain prediction boxes, where a prediction box represents the prediction result for a reference object. A prediction box is, for example, indicated by a dashed box. Illustratively, each annotation box corresponds to one prediction box.
For the target objects, the distortion information of each target object is associated with a first distance, which includes, for example, the distance between each target object and the image center of the image to be processed.
Similarly, for the reference objects, the distortion information of each reference object is associated with a third distance, including, for example, the distance between each reference object and the image center of the image training sample 310. For example, the distance between the reference object 311 and the image center of the image training sample 310 is L1, the distance between the reference object 312 and the image center is L2, and the distance between the reference object 313 and the image center is L3. The larger the distance, the closer the reference object is to the edge region of the image training sample 310, i.e., the larger the distortion degree of the reference object.
Illustratively, the model parameters of the detection model are derived based on a loss function of the detection model, the loss function comprising a plurality of loss values.
For example, each reference object corresponds to a loss value associated with a second distance between the annotation box and the prediction box. For example, a loss value a1 is determined based on the second distance between the annotation box 311 and its corresponding prediction box, a loss value a2 is determined based on the second distance between the annotation box 312 and its corresponding prediction box, and a loss value a3 is determined based on the second distance between the annotation box 313 and its corresponding prediction box.
The weights corresponding to the loss values of the loss function are associated with the distortion information of the reference objects in the image training sample. For example, the weight w1 corresponding to the loss value a1 is associated with the distance L1: the larger the distance L1, the larger the weight w1. The weight w2 corresponding to the loss value a2 is associated with the distance L2: the larger the distance L2, the larger the weight w2. The weight w3 corresponding to the loss value a3 is associated with the distance L3: the larger the distance L3, the larger the weight w3.
In one example, the distances L1, L2 and L3 may be used directly as the weights w1, w2 and w3, respectively.

In another example, the distances L1, L2 and L3 may each be normalized to obtain normalized distances d1, d2 and d3, and a function of the normalized distances (given by a formula shown only as an image in the original) is used as the weights w1, w2 and w3, respectively.
After each weight is obtained, the loss value corresponding to the loss function is calculated as Loss = w1·a1 + w2·a2 + w3·a3. Then, the detection model is trained based on this loss value.
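The weighted loss described above can be sketched as follows. Sum-normalizing the raw distances into weights is an assumption: the description permits using the raw distances directly, and its alternative normalization formula is not recoverable from the text.

```python
def weighted_loss(losses, distances):
    """Distance-weighted total loss, Loss = w1*a1 + w2*a2 + w3*a3,
    where each weight grows with the reference object's distance from
    the image center.  Sum-normalizing the distances into weights is
    an assumption made for this sketch."""
    total = sum(distances)
    weights = [d / total for d in distances]  # normalized distances d_i
    return sum(w * a for w, a in zip(weights, losses))

# Three annotation boxes: the farthest one (L3) dominates the loss.
a = [0.2, 0.5, 0.9]        # per-box loss values a1, a2, a3
L = [100.0, 250.0, 650.0]  # distances L1, L2, L3 to the image center
print(round(weighted_loss(a, L), 4))   # 0.73
```

Note how the box farthest from the center (weight 0.65) contributes most of the total, which is exactly the behavior the patent wants: larger distortion, larger training emphasis.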
According to the embodiment of the present disclosure, the larger the distance between a reference object and the image center of the image training sample, the larger the distortion degree of that reference object, and the larger the weight assigned to it. The detection capability of the detection model for reference objects with a large distortion degree is therefore improved during training, which improves the detection effect of the model.
Fig. 4 schematically shows a schematic diagram of an image processing method according to another embodiment of the present disclosure.
As shown in fig. 4, an image training sample 410 and an image training sample 420 are taken as examples. The image training sample 410 includes, for example, an annotation box 411 for a reference object, and the center of the image training sample 410 is, for example, the intersection of the abscissa axis and the ordinate axis. The image training sample 420 includes, for example, an annotation box 421 for the reference object, and the center of the image training sample 420 is, for example, an intersection of an abscissa axis and an ordinate axis.
Illustratively, the labeling box includes a reference vertex and a remaining vertex other than the reference vertex, and the direction information of the labeling box is associated with a relative position between the reference vertex and the remaining vertex. The direction information of the label box includes, for example, upper right, upper left, lower left, and lower right.
If the direction information of the label box is upper right, the label box can be represented as (X, Y, + H, + W). If the direction information of the label box is upper left, the label box can be represented as (X, Y, -H, + W). If the direction information of the label box is lower left, the label box can be represented as (X, Y, -H, -W). If the direction information of the label box is the lower right, the label box can be represented as (X, Y, + H, -W). Wherein X represents the abscissa of the reference vertex, Y represents the ordinate of the reference vertex, H represents the length of the labeled box, and W represents the width of the labeled box.
For example, the annotation box 411 includes 4 vertices: a reference vertex c1, which is the vertex closest to the image center among the 4 vertices, and the remaining 3 vertices. Since the relative position of the reference vertex c1 with respect to the remaining vertices is lower right, the direction information of the annotation box 411 is upper left, and the annotation box 411 can be represented as (Xc1, Yc1, -Hc1, +Wc1), where Xc1 represents the abscissa of the reference vertex of the annotation box 411, Yc1 represents the ordinate of the reference vertex, Hc1 represents the length of the annotation box 411, and Wc1 represents the width of the annotation box 411.
For example, the annotation box 421 includes 4 vertices: a reference vertex c2, which is the vertex closest to the image center among the 4 vertices, and the remaining 3 vertices. Since the relative position of the reference vertex c2 with respect to the remaining vertices is upper left, the direction information of the annotation box 421 is lower right, and the annotation box 421 can be represented as (Xc2, Yc2, +Hc2, -Wc2), where Xc2 represents the abscissa of the reference vertex of the annotation box 421, Yc2 represents the ordinate of the reference vertex, Hc2 represents the length of the annotation box 421, and Wc2 represents the width of the annotation box 421.
Illustratively, the model parameters of the detection model are obtained based on the direction information of the annotation boxes; that is, the direction information is used in the process of training the detection model. If an annotation box were represented only by the coordinates of its reference vertex and its length and width values, the representation would not be accurate enough, which would affect the training precision of the detection model. For example, when the annotation boxes 411 and 421 have the same length and width values and the same reference-vertex coordinates, representing them only by reference-vertex coordinates and length and width values would make their representations identical, so that the annotation boxes 411 and 421 could not be distinguished. Therefore, the direction information is used to attach positive and negative signs to the length and width values of each annotation box, making the representation more accurate, improving the training precision of the detection model, and thereby improving the detection effect of target detection.
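The four-case signed representation can be sketched directly from the cases listed above. The sign-to-direction mapping follows the description literally; everything else (function name, argument layout) is illustrative.

```python
def signed_box(x, y, h, w, direction):
    """(X, Y, ±H, ±W) encoding from the four cases in the description:
    upper right -> (+H, +W), upper left -> (-H, +W),
    lower left  -> (-H, -W), lower right -> (+H, -W)."""
    sign_h = {"upper right": 1, "upper left": -1,
              "lower left": -1, "lower right": 1}[direction]
    sign_w = {"upper right": 1, "upper left": 1,
              "lower left": -1, "lower right": -1}[direction]
    return (x, y, sign_h * h, sign_w * w)

# Two boxes with identical size and identical reference vertices
# remain distinguishable once the direction signs are kept.
print(signed_box(10, 10, 8, 6, "upper left"))    # (10, 10, -8, 6)
print(signed_box(10, 10, 8, 6, "lower right"))   # (10, 10, 8, -6)
```

This is precisely the ambiguity the description raises for boxes 411 and 421: same (X, Y, H, W), different tuples once signs are applied.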
FIG. 5 schematically shows a flow chart of a training method of a detection model according to an embodiment of the present disclosure.
As shown in fig. 5, the training method 500 of the detection model according to the embodiment of the present disclosure may include operations S510 to S520, for example.
In operation S510, an image training sample is acquired, the image training sample including at least one reference object.
In operation S520, a detection model is trained based on distortion information of each of at least one reference object.
According to the embodiment of the disclosure, distortion information of each reference object is different in different image areas in the image training sample. For example, the degree of distortion of the reference object is greater in the edge region of the image training sample, and the degree of distortion of the reference object is smaller in the middle region of the image training sample. When the detection model is trained, the detection model is trained based on the distortion information of different reference objects, so that the detection capability of the detection model on the reference object with larger distortion degree is improved, and the precision of the detection model is improved.
Illustratively, the distortion information for each reference object is associated with a third distance comprising a distance between each reference object and an image center of the image training sample.
Training the detection model based on distortion information for each of the at least one reference object, for example, includes: based on the third distance, a weight for each reference object is determined, and then model parameters of the detection model are adjusted based on the weights. For example, for the third distance of each reference object, the larger the third distance is, the larger the weight corresponding to the reference object is, so that the detection capability of the reference object with the larger weight in the model training process is improved.
Illustratively, adjusting the model parameters of the detection model based on the weights includes, for example: a loss value of a loss function of the detection model is adjusted based on the weight, and then, a model parameter of the detection model is adjusted based on the adjusted loss value.
For example, the loss value is associated with a fourth distance between an annotation box used to annotate the reference object in the image training sample and a prediction box characterizing a prediction result of the reference object.
According to the embodiment of the present disclosure, the larger the distance between a reference object and the image center of the image training sample, the larger the distortion degree of that reference object, and the larger the weight assigned to it. The detection capability of the detection model for reference objects with a large distortion degree is therefore improved during training, thereby improving the detection effect of the model.
Illustratively, for an annotation box in an image training sample, a relative position between a reference vertex in the annotation box and remaining vertices other than the reference vertex is determined. The reference vertex is, for example, a vertex having the smallest distance to the image center among the plurality of vertices of the labeling box.
Then, based on the relative position between the reference vertex and the residual vertex, the direction information of the labeling frame is determined, and based on the direction information of the labeling frame, the detection model is trained. For example, based on the direction information of the label box, the length and width values of the label box are signed to more accurately represent the label box.
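Selecting the reference vertex and deriving the direction information can be sketched as follows. This is a hedged illustration: the patent only states that the reference vertex is the vertex nearest the image center and that direction follows from its position relative to the remaining vertices; the y-axis is assumed to grow upward, and the opposite-corner heuristic is an assumption.

```python
import math

def box_direction(vertices, image_center):
    """Pick the annotation-box vertex nearest the image center as the
    reference vertex, then derive a direction label from where the
    opposite corner sits relative to it (y assumed to grow upward)."""
    cx, cy = image_center
    ref = min(vertices, key=lambda v: math.hypot(v[0] - cx, v[1] - cy))
    opp = max(vertices, key=lambda v: math.hypot(v[0] - ref[0],
                                                 v[1] - ref[1]))
    vert = "upper" if opp[1] > ref[1] else "lower"
    horiz = "right" if opp[0] > ref[0] else "left"
    return ref, f"{vert} {horiz}"

# A box whose nearest-to-center corner is its bottom-right corner
# extends up and to the left: direction "upper left".
ref, direction = box_direction(
    [(-10, 10), (-2, 10), (-10, 2), (-2, 2)], (0, 0))
print(ref, direction)   # (-2, 2) upper left
```

The returned direction label could then be fed to the signing step described above to produce the (X, Y, ±H, ±W) representation.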
According to the embodiment of the present disclosure, the direction information of the annotation box is used to attach positive and negative signs to its length and width values, making the representation of the annotation box more accurate, improving the training precision of the detection model, and improving the detection effect of target detection.
Fig. 6 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the image processing apparatus 600 of the embodiment of the present disclosure includes, for example, a first acquisition module 610 and a detection module 620.
The first acquisition module 610 may be configured to acquire an image to be processed, where the image to be processed includes at least one target object. According to the embodiment of the present disclosure, the first acquisition module 610 may, for example, perform operation S210 described above with reference to fig. 2, which is not described herein again.
The detection module 620 may be configured to perform a detection operation on the image to be processed to obtain a detection result for each of the at least one target object, where the detection result for each target object is associated with distortion information of each target object. According to the embodiment of the present disclosure, the detecting module 620 may perform, for example, the operation S220 described above with reference to fig. 2, which is not described herein again.
According to an embodiment of the present disclosure, the distortion information of each target object is associated with a first distance, which includes a distance between each target object and an image center of the image to be processed.
According to the embodiment of the disclosure, the detection operation is executed by using a detection model, the model parameters of the detection model are obtained based on a loss function of the detection model, and the weight corresponding to the loss value of the loss function is associated with the distortion information of the reference object in the image training sample; the loss value is associated with a second distance between an annotation box and a prediction box, the annotation box is used for annotating a reference object in the image training sample, and the prediction box represents a prediction result of the reference object.
According to the embodiment of the disclosure, the model parameters of the detection model are obtained based on the direction information of the labeling frame, and the labeling frame is used for labeling the reference object in the image training sample; the labeling box comprises a reference vertex and a residual vertex except the reference vertex, and the direction information of the labeling box is associated with the relative position between the reference vertex and the residual vertex.
FIG. 7 schematically illustrates a block diagram of a training apparatus for a detection model according to an embodiment of the present disclosure.
As shown in fig. 7, the training apparatus 700 for a detection model according to the embodiment of the present disclosure includes, for example, a second obtaining module 710 and a first training module 720.
The second acquisition module 710 may be configured to acquire an image training sample, wherein the image training sample includes at least one reference object. According to the embodiment of the present disclosure, the second obtaining module 710 may, for example, perform operation S510 described above with reference to fig. 5, which is not described herein again.
The first training module 720 may be configured to train the detection model based on distortion information for each of the at least one reference object. According to an embodiment of the present disclosure, the first training module 720 may, for example, perform operation S520 described above with reference to fig. 5, which is not described herein again.
According to an embodiment of the present disclosure, the distortion information of each reference object is associated with a third distance, the third distance comprising a distance between each reference object and an image center of the image training sample. The first training module 720 includes a determination submodule and an adjustment submodule. The determination submodule is configured to determine a weight for each reference object based on the third distance, and the adjustment submodule is configured to adjust the model parameters of the detection model based on the weight.
According to an embodiment of the present disclosure, the adjustment submodule includes a first adjusting unit and a second adjusting unit. The first adjusting unit is configured to adjust a loss value of a loss function of the detection model based on the weight, where the loss value is associated with a fourth distance between an annotation box and a prediction box, the annotation box is used to annotate the reference object in the image training sample, and the prediction box characterizes a prediction result of the reference object. The second adjusting unit is configured to adjust the model parameters of the detection model based on the adjusted loss value.
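One way the determination submodule could map the third distance to a weight is sketched below. The linear mapping and the `floor` parameter are illustrative assumptions; the disclosure only states that the weight is determined from the distance between the reference object and the image center.

```python
def distortion_weight(third_distance, max_distance, floor=0.5):
    """Map a reference object's distance from the image center to a loss
    weight. Objects far from the center (more distorted in a wide-angle
    image) receive a smaller weight, bounded below by `floor`, so heavily
    distorted samples influence training less."""
    d = min(max(third_distance, 0.0), max_distance)
    return floor + (1.0 - floor) * (1.0 - d / max_distance)
```

With the defaults, a centered object gets weight 1.0 and an object at the image corner gets weight 0.5; multiplying the per-object loss by this weight realises the adjustment performed by the first adjusting unit.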
According to an embodiment of the present disclosure, the apparatus 700 may further include a first determining module, a second determining module, and a second training module. The first determining module is configured to determine, for an annotation box in the image training sample, the relative positions between a reference vertex in the annotation box and the remaining vertices other than the reference vertex. The second determining module is configured to determine the direction information of the annotation box based on these relative positions. The second training module is configured to train the detection model based on the direction information of the annotation box.
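A sketch of deriving direction information from the relative positions of a reference vertex and its neighbouring vertices follows. Using the angle of the edge leaving the reference vertex is an illustrative encoding; the disclosure only states that the direction information is associated with these relative positions.

```python
import math

def box_direction_degrees(vertices, ref_index=0):
    """Direction of an oriented annotation box, taken as the angle of the
    edge from the reference vertex to the next vertex in order.

    `vertices` lists the four corners as (x, y) tuples."""
    rx, ry = vertices[ref_index]
    nx, ny = vertices[(ref_index + 1) % len(vertices)]
    return math.degrees(math.atan2(ny - ry, nx - rx))
```

An axis-aligned box whose reference vertex is the top-left corner yields 0 degrees; a box rotated a quarter turn yields 90 degrees, which a training loss could then supervise alongside the box coordinates.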
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 is a block diagram of an electronic device for implementing an image processing method and/or a training method of a detection model according to an embodiment of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. The electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to one another by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the respective methods and processes described above, such as the image processing method and/or the training method of the detection model. For example, in some embodiments, the image processing method and/or the training method of the detection model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image processing method and/or the training method of the detection model described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured by any other suitable means (e.g., by means of firmware) to perform the image processing method and/or the training method of the detection model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable image processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. An image processing method comprising:
acquiring an image to be processed, wherein the image to be processed comprises at least one target object; and
performing a detection operation on the image to be processed to obtain a detection result for each target object in the at least one target object,
wherein the detection result for the each target object is associated with distortion information of the each target object.
2. The method of claim 1, wherein the distortion information of the each target object is associated with a first distance, the first distance comprising a distance between the each target object and an image center of the image to be processed.
3. The method of claim 1 or 2, wherein:
the detection operation is executed by using a detection model, the model parameters of the detection model are obtained based on a loss function of the detection model, and the weight corresponding to the loss value of the loss function is associated with the distortion information of the reference object in the image training sample;
the loss value is associated with a second distance between an annotation box used to annotate a reference object in the image training sample and a prediction box characterizing a prediction result of the reference object.
4. The method of any of claims 1-3, wherein:
the model parameters of the detection model are obtained based on the direction information of a labeling frame, and the labeling frame is used for labeling a reference object in an image training sample;
the labeling frame comprises a reference vertex and remaining vertices other than the reference vertex, and the direction information of the labeling frame is associated with relative positions between the reference vertex and the remaining vertices.
5. A training method of a detection model, comprising:
acquiring an image training sample, wherein the image training sample comprises at least one reference object; and
training a detection model based on distortion information for each of the at least one reference object.
6. The method of claim 5, wherein the distortion information for the each reference object is associated with a third distance, the third distance comprising a distance between the each reference object and an image center of the image training sample; the training a detection model based on distortion information of each of the at least one reference object comprises:
determining a weight for the each reference object based on the third distance; and
adjusting model parameters of the detection model based on the weight.
7. The method of claim 6, wherein said adjusting model parameters of the detection model based on the weights comprises:
adjusting a loss value of a loss function of the detection model based on the weight, wherein the loss value is associated with a fourth distance between an annotation box and a prediction box, the annotation box is used for annotating a reference object in an image training sample, and the prediction box is used for representing a prediction result of the reference object; and
adjusting the model parameters of the detection model based on the adjusted loss value.
8. The method of any of claims 5-7, further comprising:
determining, for a labeling box in the image training sample, relative positions between a reference vertex in the labeling box and remaining vertices other than the reference vertex;
determining direction information of the labeling box based on the relative positions between the reference vertex and the remaining vertices; and
training the detection model based on the direction information of the labeling box.
9. An image processing apparatus comprising:
the device comprises a first acquisition module, a second acquisition module and a processing module, wherein the first acquisition module is used for acquiring an image to be processed, and the image to be processed comprises at least one target object; and
a detection module for performing a detection operation on the image to be processed to obtain a detection result for each target object of the at least one target object,
wherein the detection result for the each target object is associated with distortion information of the each target object.
10. The apparatus of claim 9, wherein the distortion information of the each target object is associated with a first distance, the first distance comprising a distance between the each target object and an image center of the image to be processed.
11. The apparatus of claim 9 or 10, wherein:
the detection operation is executed by using a detection model, the model parameters of the detection model are obtained based on a loss function of the detection model, and the weight corresponding to the loss value of the loss function is associated with the distortion information of the reference object in the image training sample;
the loss value is associated with a second distance between an annotation box used to annotate a reference object in the image training sample and a prediction box characterizing a prediction result of the reference object.
12. The apparatus of any one of claims 9-11, wherein:
the model parameters of the detection model are obtained based on the direction information of a labeling frame, and the labeling frame is used for labeling a reference object in an image training sample;
the labeling frame comprises a reference vertex and remaining vertices other than the reference vertex, and the direction information of the labeling frame is associated with relative positions between the reference vertex and the remaining vertices.
13. A training apparatus for a detection model, comprising:
the second acquisition module is used for acquiring an image training sample, wherein the image training sample comprises at least one reference object; and
a first training module to train a detection model based on distortion information for each of the at least one reference object.
14. The apparatus of claim 13, wherein the distortion information of the each reference object is associated with a third distance, the third distance comprising a distance between the each reference object and an image center of the image training sample; the first training module comprises:
a determination sub-module for determining a weight for said each reference object based on said third distance; and
an adjusting sub-module for adjusting the model parameters of the detection model based on the weight.
15. The apparatus of claim 14, wherein the adjustment submodule comprises:
a first adjusting unit, configured to adjust a loss value of a loss function of the detection model based on the weight, where the loss value is associated with a fourth distance between an annotation frame and a prediction frame, the annotation frame is used for annotating a reference object in an image training sample, and the prediction frame represents a prediction result of the reference object; and
a second adjusting unit, configured to adjust the model parameters of the detection model based on the adjusted loss value.
16. The apparatus of any of claims 13-15, further comprising:
a first determining module, configured to determine, for an annotation box in the image training sample, relative positions between a reference vertex in the annotation box and remaining vertices except the reference vertex;
a second determining module, configured to determine direction information of the annotation box based on the relative positions between the reference vertex and the remaining vertices; and
a second training module, configured to train the detection model based on the direction information of the annotation box.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202111279421.9A 2021-10-29 2021-10-29 Image processing method, training method, device, equipment and medium for detection model Pending CN114004986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111279421.9A CN114004986A (en) 2021-10-29 2021-10-29 Image processing method, training method, device, equipment and medium for detection model

Publications (1)

Publication Number Publication Date
CN114004986A (en) 2022-02-01

Family

ID=79925887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111279421.9A Pending CN114004986A (en) 2021-10-29 2021-10-29 Image processing method, training method, device, equipment and medium for detection model

Country Status (1)

Country Link
CN (1) CN114004986A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840883A (en) * 2019-01-10 2019-06-04 深圳前海达闼云端智能科技有限公司 A kind of method, apparatus and calculating equipment of trained object identification neural network
US20200134331A1 (en) * 2018-10-31 2020-04-30 Texas Instruments Incorporated Object detection for distorted images
CN111095291A (en) * 2018-02-27 2020-05-01 辉达公司 Real-time detection of lanes and boundaries by autonomous vehicles
CN111754394A (en) * 2020-06-29 2020-10-09 苏州科达科技股份有限公司 Method and device for detecting object in fisheye image and storage medium
CN112101361A (en) * 2020-11-20 2020-12-18 深圳佑驾创新科技有限公司 Target detection method, device and equipment for fisheye image and storage medium
CN112330601A (en) * 2020-10-15 2021-02-05 浙江大华技术股份有限公司 Parking detection method, device, equipment and medium based on fisheye camera
WO2021051543A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Method for generating face rotation model, apparatus, computer device and storage medium
CN113222997A (en) * 2021-03-31 2021-08-06 上海商汤智能科技有限公司 Neural network generation method, neural network image processing device, electronic device, and medium
CN113379718A (en) * 2021-06-28 2021-09-10 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XISHUAI PENG et al.: "Spatial Focal Loss for Pedestrian Detection in Fisheye Imagery", 2019 IEEE Winter Conference on Applications of Computer Vision, 7 March 2019 (2019-03-07) *
PENG Yan; CHEN Jiahong; LI Xiaomao; LUO Jun; XIE Shaorong; LIU Chang; PU Huayan: "Sea-surface target tracking for unmanned surface vehicles with spatio-temporal context fusion", Scientia Sinica Technologica, no. 12, 23 November 2018 (2018-11-23) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20250829