CN111178126A - Target detection method, target detection device, computer equipment and storage medium (Google Patents)

Info

Publication number: CN111178126A
Application number: CN201911138793.2A
Authority: CN (China)
Prior art keywords: target, detection, detection frame, detected, frame
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 庄月清, 沈磊, 李志远, 李伯勋, 俞刚, 张弛
Original and current assignee: Beijing Megvii Technology Co Ltd
Application filed by Beijing Megvii Technology Co Ltd
Priority to CN201911138793.2A
Publication of CN111178126A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; localisation; normalisation
    • G06V40/172 - Classification, e.g. identification
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a target detection method, a target detection device, computer equipment and a storage medium. The method comprises the following steps: inputting a target image comprising a target to be detected into a first detection model, and determining a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected; inputting the target image into a second detection model, and determining a third detection frame corresponding to the target to be detected; matching the first detection frame and the third detection frame to determine a target detection frame; and determining the associated target detection frame and second detection frame as the detection result of the target to be detected. Embodiments of the invention improve detection accuracy and avoid inaccurate detection-frame positions.

Description

Target detection method, target detection device, computer equipment and storage medium
Technical Field
The present application relates to the field of object detection technologies, and in particular, to an object detection method, an object detection apparatus, a computer device, and a storage medium.
Background
With the development of artificial intelligence technology, Computer Vision (CV) has made significant advances. Among them, Object Detection is one of the main research directions of computer vision.
Object detection is mainly used to determine whether a certain area in an image contains an object to be identified. A typical method determines whether a certain area in an image contains a target to be detected together with a target part of that target and, if so, displays a first detection frame enclosing the target to be detected and a second detection frame enclosing the target part. For example, a pedestrian frame and a face frame are displayed.
However, the position of the first detection frame obtained in this way may be inaccurate.
Disclosure of Invention
In view of the above, it is desirable to provide a target detection method, an apparatus, a computer device and a storage medium, which can improve detection accuracy and avoid inaccurate detection frame positions.
In a first aspect, an embodiment of the present invention provides a target detection method, where the method includes:
inputting a target image comprising a target to be detected into a first detection model, and determining a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected;
inputting the target image into a second detection model, and determining a third detection frame corresponding to the target to be detected;
matching the first detection frame and the third detection frame to determine a target detection frame;
and determining the associated target detection frame and the second detection frame as the detection result of the target to be detected.
In one embodiment, the matching the first detection frame and the third detection frame to determine the target detection frame includes:
determining the intersection ratio between the first detection frame and the third detection frame;
when the intersection ratio is larger than a preset threshold value, determining the third detection frame as a target detection frame;
and when the intersection ratio is not greater than a preset threshold value, determining the first detection frame as a target detection frame.
In one embodiment, the inputting the target image including the target to be detected into the first detection model, and determining the first detection frame corresponding to the target to be detected and the second detection frame corresponding to the target portion of the target to be detected includes:
inputting the target image into a first detection model to obtain a first offset output by the first detection model and corresponding to a target to be detected and a second offset output by the first detection model and corresponding to a target part of the target to be detected;
determining a first detection frame according to the first anchor frame and the first offset;
and determining a second detection frame according to the first anchor frame and the second offset.
In one embodiment, the inputting the target image including the target to be detected into the second detection model and determining the third detection frame corresponding to the target to be detected includes:
inputting the target image into a second detection model to obtain a third offset output by the second detection model and corresponding to the target to be detected;
and determining a third detection frame according to the second anchor frame and the third offset.
In one embodiment, before the target image including the target to be detected is input to the first detection model, the method further includes:
acquiring a first training sample set; the first training sample set comprises a plurality of first sample images and first sample offsets and second sample offsets corresponding to the first sample images;
and training the detection model based on the first training sample set to obtain a first detection model.
In one embodiment, before the target image including the target to be detected is input to the second detection model, the method further includes:
acquiring a second training sample set; the second training sample set comprises a plurality of second sample images and third sample offsets corresponding to the second sample images;
and training the detection model based on the second training sample set to obtain a second detection model.
In one embodiment, the object to be detected is a pedestrian; the target part of the target to be detected is the face of the pedestrian.
In a second aspect, an embodiment of the present invention provides an object detection apparatus, including:
the first detection frame determining module is used for inputting a target image comprising a target to be detected into the first detection model, and determining a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected;
the third detection frame determining module is used for inputting the target image into the second detection model and determining a third detection frame corresponding to the target to be detected;
the target detection frame determining module is used for matching the first detection frame and the third detection frame to determine a target detection frame;
and the detection result output module is used for determining the associated target detection frame and the second detection frame as the detection result of the target to be detected.
In one embodiment, the target detection frame determining module is specifically configured to determine an intersection ratio between the first detection frame and the third detection frame; when the intersection ratio is larger than a preset threshold value, determining the third detection frame as a target detection frame; and when the intersection ratio is not greater than a preset threshold value, determining the first detection frame as a target detection frame.
In one embodiment, the first detection frame determining module is specifically configured to input the target image into the first detection model, and obtain a first offset corresponding to the target to be detected and a second offset corresponding to the target portion of the target to be detected, which are output by the first detection model; determining a first detection frame according to the first anchor frame and the first offset; and determining a second detection frame according to the first anchor frame and the second offset.
In one embodiment, the third detection frame determining module is specifically configured to input the target image into the second detection model, and obtain a third offset output by the second detection model and corresponding to the target to be detected; and determining a third detection frame according to the second anchor frame and the third offset.
In one embodiment, the apparatus further comprises:
the first training sample set acquisition module is used for acquiring a first training sample set; the first training sample set comprises a plurality of first sample images and first sample offsets and second sample offsets corresponding to the first sample images;
and the first detection model training module is used for training the detection model based on the first training sample set to obtain a first detection model.
In one embodiment, the apparatus further comprises:
the second training sample set acquisition module is used for acquiring a second training sample set; the second training sample set comprises a plurality of second sample images and third sample offsets corresponding to the second sample images;
and the second detection model training module is used for training the detection model based on the second training sample set to obtain a second detection model.
In one embodiment, the object to be detected is a pedestrian; the target part of the target to be detected is the face of the pedestrian.
In a third aspect, an embodiment of the present invention provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the method as described above when executing the computer program.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the method as described above.
According to the target detection method, the target detection device, the computer equipment and the storage medium, the target image comprising the target to be detected is input into the first detection model, and the first detection frame corresponding to the target to be detected and the second detection frame corresponding to the target part of the target to be detected are determined; inputting the target image into the second detection model, and determining a third detection frame corresponding to the target to be detected; matching the first detection frame and the third detection frame to determine a target detection frame; and determining the associated target detection frame and the second detection frame as the detection result of the target to be detected. According to the embodiment of the invention, the first detection frame comprising the target to be detected is obtained according to the output result of the first detection model, the third detection frame comprising the target to be detected is obtained according to the output result of the second detection model, and the target detection frame more accurately comprising the target to be detected is selected from the first detection frame and the third detection frame, so that the detection precision is improved, and the problem of inaccurate position of the detection frame can be avoided.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a target detection method;
FIG. 2 is a schematic flow chart diagram of a method for object detection in one embodiment;
FIG. 3 is a flowchart illustrating a step of matching the first detection box and the third detection box to determine a target detection box according to an embodiment;
FIG. 4 is a schematic flow chart of a target detection method in another embodiment;
FIG. 5 is a block diagram of an embodiment of an object detection device;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The target detection method provided by the application can be applied to the application environment shown in fig. 1. The application environment includes a terminal 101 and a server 102, and the terminal 101 and the server 102 communicate with each other through a network. The terminal 101 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 102 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In an embodiment, as shown in fig. 2, an object detection method is provided, and the method may be applied to the server in fig. 1, and may also be applied to the terminal in fig. 1, which is not limited in this embodiment. The following description is given by taking the application of the method to the server in fig. 1 as an example, and includes the following steps:
step 201, inputting a target image including a target to be detected into a first detection model, and determining a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected.
In this embodiment, a first detection model is trained in advance, and when detecting a target, a target image is input into the first detection model, and the first detection model outputs a corresponding result according to the target image. And then, determining a first detection frame comprising the target to be detected and a second detection frame comprising the target part of the target to be detected according to the same anchor frame and the output result of the first detection model, wherein the target to be detected and the target part of the target to be detected have position relevance.
For example, the target to be detected is a pedestrian, the target part of the target to be detected is the face of the pedestrian, and the face and the pedestrian have position relevance. Inputting a target image A into a first detection model, and then acquiring a first detection frame comprising a pedestrian and a second detection frame comprising a human face according to an output result of the first detection model, wherein the first detection frame and the second detection frame have relevance and correspond to the human body position of the same pedestrian and the human face position of the same pedestrian in the target image A.
Step 202, inputting the target image into the second detection model, and determining a third detection frame corresponding to the target to be detected.
In this embodiment, a second detection model is trained in advance, the target image is input into the second detection model, and the second detection model outputs a corresponding result according to the target image. And then, determining a third detection frame comprising the target to be detected according to the other anchor frame and the output result of the second detection model.
For example, the target image a is input into the second detection model, and the third detection frame including the pedestrian is acquired according to the output result of the second detection model.
The sequence of step 201 and step 202 is not limited in detail in the embodiment of the present invention, and may be set according to actual situations. The target image may be a feature map, and may be specifically set based on the first detection model and the second detection model.
And step 203, matching the first detection frame and the third detection frame to determine a target detection frame.
In this embodiment, the first detection frame and the third detection frame are compared, and one detection frame is selected from the first detection frame and the third detection frame as the target detection frame, that is, the third detection frame is used to correct the first detection frame, so that the obtained target detection frame more accurately includes the target to be detected.
For example, the third detection frame is selected as the target detection frame from the first detection frame and the third detection frame including the pedestrian.
And 204, determining the associated target detection frame and the second detection frame as the detection result of the target to be detected.
In this embodiment, the target detection frame and the second detection frame are associated and output. For example, the third detection frame including a pedestrian and the second detection frame including a face are determined as a human body frame and a face frame corresponding to the same pedestrian, and both are associated as a detection result output corresponding to the same pedestrian.
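The association in step 204 amounts to bundling the body frame and the face frame of the same pedestrian under one identity. A minimal sketch of this output step follows; the function and field names are illustrative, not taken from the patent:

```python
def associate(body_box, face_box, pedestrian_id):
    """Bundle the body frame (target detection frame) and the face frame
    (second detection frame) of the same pedestrian into one detection result.
    Boxes are (x1, y1, x2, y2) tuples; the dict keys are illustrative."""
    return {"id": pedestrian_id, "body_box": body_box, "face_box": face_box}

# One pedestrian: the body frame plus the face frame associated with it.
result = associate((10, 20, 90, 220), (30, 25, 60, 60), pedestrian_id=0)
```

A terminal could then draw both boxes of each result on the target image, as described above.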
In the target detection method, a target image comprising a target to be detected is input into a first detection model, and a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected are determined; inputting the target image into the second detection model, and determining a third detection frame corresponding to the target to be detected; matching the first detection frame and the third detection frame to determine a target detection frame; and determining the associated target detection frame and the second detection frame as the detection result of the target to be detected. According to the embodiment of the invention, the first detection frame comprising the target to be detected is obtained according to the output result of the first detection model, the third detection frame comprising the target to be detected is obtained according to the output result of the second detection model, and the target detection frame more accurately comprising the target to be detected is selected from the first detection frame and the third detection frame, so that the detection precision is improved, and the problem of inaccurate position of the detection frame can be avoided.
In another embodiment, as shown in fig. 3, this embodiment relates to an optional process of matching the first detection frame and the third detection frame to determine the target detection frame. On the basis of the embodiment shown in fig. 2, the step 203 may specifically include the following steps:
step 301, determining the intersection ratio between the first detection frame and the third detection frame.
In this embodiment, an intersection of the first detection frame and the third detection frame and a union of the first detection frame and the third detection frame are determined, and a ratio of the intersection to the union is calculated to obtain an intersection-union ratio between the first detection frame and the third detection frame.
For example, the intersection of the first detection frame and the third detection frame is M, the union of the first detection frame and the third detection frame is N, and the intersection ratio IOU is calculated as M/N.
And step 302, when the intersection ratio is greater than a preset threshold value, determining a third detection frame as a target detection frame.
In this embodiment, a threshold for the intersection ratio is preset. If the intersection ratio is greater than the preset threshold, indicating that the third detection frame is more accurate, the third detection frame is determined as the target detection frame. For example, the preset threshold is P, and if M/N is greater than P, the third detection frame is determined as the target detection frame.
And step 303, when the intersection ratio is not greater than a preset threshold value, determining the first detection frame as a target detection frame.
In this embodiment, if the intersection ratio is not greater than the preset threshold, which indicates that the first detection frame is more accurate, the first detection frame is determined as the target detection frame. For example, if M/N is not greater than P, the first detection box is determined to be the target detection box.
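Steps 301 to 303 can be sketched as follows, with boxes given as (x1, y1, x2, y2) corner tuples; the default threshold of 0.5 is an illustrative choice, since the patent only speaks of a preset threshold:

```python
def iou(box_a, box_b):
    """Intersection-over-union (the 'intersection ratio' M/N) of two boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # M
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter  # N
    return inter / union if union > 0 else 0.0

def select_target_box(first_box, third_box, threshold=0.5):
    """Keep the third detection frame when its overlap with the first
    frame exceeds the threshold; otherwise fall back to the first frame."""
    return third_box if iou(first_box, third_box) > threshold else first_box
```

This mirrors the claimed rule: the third frame corrects the first frame only when the two agree closely enough.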
In the step of matching the first detection frame and the third detection frame and determining the target detection frame, determining the intersection ratio between the first detection frame and the third detection frame; when the intersection ratio is larger than a preset threshold value, determining the third detection frame as a target detection frame; and when the intersection ratio is not greater than a preset threshold value, determining the first detection frame as a target detection frame. According to the embodiment of the invention, the target detection frame which more accurately comprises the target to be detected can be selected from the first detection frame and the third detection frame according to the intersection ratio, so that the detection precision is improved, and the problem of inaccurate position of the detection frame is avoided.
In another embodiment, as shown in fig. 4, this embodiment relates to an alternative process of the target detection method. On the basis of the embodiment shown in fig. 2, the method may specifically include the following steps:
step 401, obtaining a first training sample set; the first training sample set comprises a plurality of first sample images and first sample offsets and second sample offsets corresponding to the first sample images; and training the detection model based on the first training sample set to obtain a first detection model.
In this embodiment, the server obtains a plurality of first sample images, and a first sample offset and a second sample offset corresponding to each first sample image, and combines the plurality of first sample images and the first sample offsets and the second sample offsets corresponding to each first sample image into a first training sample set. And when the first detection model is trained, taking the first sample image as input, and taking the first sample offset and the second sample offset as supervision to train the detection model, so as to obtain the first detection model.
For example, the first sample images a1, a2, ..., a100 are used as input to the detection model, and the first sample offsets b1, b2, ..., b100 and the second sample offsets c1, c2, ..., c100 are used as supervision, so that the first detection model is obtained by training.
Step 402, obtaining a second training sample set; the second training sample set comprises a plurality of second sample images and third sample offsets corresponding to the second sample images; and training the detection model based on the second training sample set to obtain a second detection model.
In this embodiment, the server obtains a plurality of second sample images and third sample offsets corresponding to the second sample images, and combines the plurality of second sample images and the third sample offsets corresponding to the second sample images into a second training sample set. And when the second detection model is trained, the second sample image is used as input, the third sample offset is used as supervision, and the detection model is trained to obtain the second detection model.
For example, the second sample images d1, d2, ..., d100 are used as input to the detection model, and the third sample offsets e1, e2, ..., e100 are used as supervision, so that the second detection model is obtained by training.
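Using sample offsets "as supervision" typically means regressing the model's predicted offsets onto the ground-truth offsets. The patent does not name a loss; a common choice in anchor-based detectors, assumed here for illustration, is a smooth-L1 loss per offset component:

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 (Huber-style) loss on one offset component: quadratic for
    small errors, linear for large ones. Commonly used for box regression."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def offset_loss(pred_offsets, sample_offsets):
    """Mean smooth-L1 over the offset components (e.g. dx, dy, dw, dh)
    of one training sample."""
    return sum(smooth_l1(p, t) for p, t in zip(pred_offsets, sample_offsets)) / len(pred_offsets)
```

During training, this loss would be minimized over the first (or second) training sample set to fit the corresponding detection model.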
The detection model may be an RPN (Region Proposal Network), which is not limited in detail in the embodiment of the present invention and may be set according to the actual situation.
Step 403, inputting the target image into a first detection model, wherein the first detection model outputs a first offset corresponding to the target to be detected and a second offset corresponding to the target part of the target to be detected; determining a first detection frame according to the first anchor frame and the first offset; and determining a second detection frame according to the first anchor frame and the second offset.
In this embodiment, after the obtained target image is obtained, the target image is input into the first detection model trained in step 401, and the first detection model outputs a first offset and a second offset, where the first offset corresponds to the target to be detected, the second offset corresponds to the target portion of the target to be detected, and the target to be detected and the target portion of the target to be detected have position relevance.
The first offset is used for indicating a position difference value between the first anchor frame and the target to be detected, and the second offset is used for indicating a position difference value between the first anchor frame and the target part of the target to be detected. In an embodiment, after the first offset and the second offset are obtained, the position of the first anchor frame may be adjusted according to the first offset to obtain a first detection frame, and the position of the first anchor frame may be adjusted according to the second offset to obtain a second detection frame.
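The patent does not fix a parameterization for these offsets. Under the center/size parameterization common in anchor-based detectors (an assumption here, not a statement of the patent's method), adjusting an anchor frame by an offset looks like:

```python
import math

def decode(anchor, offset):
    """Apply a (dx, dy, dw, dh) offset to an anchor box (x1, y1, x2, y2):
    dx, dy shift the center relative to the anchor size, and dw, dh scale
    the width and height exponentially."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + aw / 2, ay1 + ah / 2
    dx, dy, dw, dh = offset
    cx, cy = acx + dx * aw, acy + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

With this decoding, the first detection frame is `decode(first_anchor, first_offset)` and the second detection frame is `decode(first_anchor, second_offset)`, both derived from the same anchor frame.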
In one embodiment, the object to be detected is a pedestrian; the target part of the target to be detected is the face of the pedestrian.
Step 404, inputting the target image into a second detection model to obtain a third offset output by the second detection model and corresponding to the target to be detected; and determining a third detection frame according to the second anchor frame and the third offset.
In this embodiment, the target image is input into the second detection model trained in step 402, and the second detection model outputs the third offset corresponding to the target to be detected. And the third offset is used for indicating a position difference value between the second anchor frame and the target to be detected, so that after the third offset is obtained, the position of the second anchor frame can be adjusted according to the third offset to obtain a third detection frame.
The first anchor frame and the second anchor frame are different anchor frames, and the positions of the first detection frame obtained according to the first anchor frame and the third detection frame obtained according to the second anchor frame may be the same or different.
The present embodiment does not limit the sequence of steps 401 and 402, and the sequence of steps 403 and 404 in detail, and may be set according to actual situations.
And 405, matching the first detection frame and the third detection frame to determine a target detection frame.
In one embodiment, the intersection-to-parallel ratio between the first detection frame and the third detection frame is determined; when the intersection ratio is larger than a preset threshold value, determining the third detection frame as a target detection frame; and when the intersection ratio is not greater than a preset threshold value, determining the first detection frame as a target detection frame.
And step 406, determining the associated target detection frame and the second detection frame as the detection result of the target to be detected.
In this embodiment, the server establishes an association relationship between the target detection frame and the second detection frame, and outputs the associated target detection frame and second detection frame to the terminal, and the terminal displays the target detection frame and second detection frame on the target image.
In the target detection method, a first detection model and a second detection model are trained in advance. The first detection model outputs, according to the target image, a first offset corresponding to the target to be detected and a second offset corresponding to the target part of the target to be detected; the second detection model outputs, according to the target image, a third offset corresponding to the target to be detected. A first detection frame corresponding to the target to be detected is obtained according to a preset first anchor frame and the first offset; a second detection frame corresponding to the target part is obtained according to the first anchor frame and the second offset; a third detection frame corresponding to the target to be detected is obtained according to a preset second anchor frame and the third offset. Because two detection models and two anchor frames are used, the first offset and the third offset differ, so the first detection frame and the third detection frame differ in position, and the third detection frame can be used to correct the first detection frame. Detection accuracy is thereby improved, and inaccurate detection-frame positions are avoided.
It should be understood that although the various steps in the flow charts of fig. 2-4 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-4 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily executed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided an object detection apparatus including:
a first detection frame determining module 501, configured to input a target image including a target to be detected into a first detection model, and determine a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target portion of the target to be detected;
a third detection frame determining module 502, configured to input the target image into the second detection model, and determine a third detection frame corresponding to the target to be detected;
a target detection frame determining module 503, configured to match the first detection frame and the third detection frame, and determine a target detection frame;
and a detection result output module 504, configured to determine the associated target detection frame and the second detection frame as the detection result of the target to be detected.
In one embodiment, the target detection frame determining module 503 is specifically configured to determine an intersection ratio between the first detection frame and the third detection frame; when the intersection ratio is larger than a preset threshold value, determining the third detection frame as a target detection frame; and when the intersection ratio is not greater than a preset threshold value, determining the first detection frame as a target detection frame.
In one embodiment, the first detection frame determining module 501 is specifically configured to input the target image into the first detection model, and obtain a first offset corresponding to the target to be detected and a second offset corresponding to the target portion of the target to be detected, which are output by the first detection model; determining a first detection frame according to the first anchor frame and the first offset; and determining a second detection frame according to the first anchor frame and the second offset.
In one embodiment, the third detection frame determining module 502 is specifically configured to input the target image into the second detection model, so as to obtain a third offset output by the second detection model and corresponding to the target to be detected; and determining a third detection frame according to the second anchor frame and the third offset.
In one embodiment, the apparatus further comprises:
the first training sample set acquisition module is used for acquiring a first training sample set; the first training sample set comprises a plurality of first sample images and first sample offsets and second sample offsets corresponding to the first sample images;
and the first detection model training module is used for training the detection model based on the first training sample set to obtain a first detection model.
In one embodiment, the apparatus further comprises:
the second training sample set acquisition module is used for acquiring a second training sample set; the second training sample set comprises a plurality of second sample images and third sample offsets corresponding to the second sample images;
and the second detection model training module is used for training the detection model based on the second training sample set to obtain a second detection model.
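Both training modules above fit a detection model to regress the annotated sample offsets. The patent does not name a loss function; a common choice for box-offset regression in anchor-based detectors is the smooth-L1 loss, sketched here in plain Python as an illustrative assumption:

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss averaged over offset components: quadratic for small
    errors, linear for large ones (robust to badly mispredicted boxes)."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)
```

During training, such a loss would be computed between the model's predicted offsets and the first/second (or third) sample offsets in the training set, and minimized by gradient descent.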
In one embodiment, the object to be detected is a pedestrian; the target part of the target to be detected is the face of the pedestrian.
For specific limitations of the target detection device, reference may be made to the above limitations of the target detection method, which are not repeated here. The modules in the target detection device can be realized wholly or partially by software, by hardware, or by a combination thereof. The modules may be embedded in hardware in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke them to perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing object detection data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object detection.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
inputting a target image comprising a target to be detected into a first detection model, and determining a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected;
inputting the target image into the second detection model, and determining a third detection frame corresponding to the target to be detected;
matching the first detection frame and the third detection frame to determine a target detection frame;
and determining the associated target detection frame and the second detection frame as the detection result of the target to be detected.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
determining the intersection ratio between the first detection frame and the third detection frame;
when the intersection ratio is larger than a preset threshold value, determining the third detection frame as a target detection frame;
and when the intersection ratio is not greater than a preset threshold value, determining the first detection frame as a target detection frame.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
inputting the target image into a first detection model to obtain a first offset output by the first detection model and corresponding to a target to be detected and a second offset output by the first detection model and corresponding to a target part of the target to be detected;
determining a first detection frame according to the first anchor frame and the first offset;
and determining a second detection frame according to the first anchor frame and the second offset.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
inputting the target image into a second detection model to obtain a third offset output by the second detection model and corresponding to the target to be detected;
and determining a third detection frame according to the second anchor frame and the third offset.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring a first training sample set; the first training sample set comprises a plurality of first sample images and first sample offsets and second sample offsets corresponding to the first sample images;
and training the detection model based on the first training sample set to obtain a first detection model.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring a second training sample set; the second training sample set comprises a plurality of second sample images and third sample offsets corresponding to the second sample images;
and training the detection model based on the second training sample set to obtain a second detection model.
In one embodiment, the object to be detected is a pedestrian; the target part of the target to be detected is the face of the pedestrian.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
inputting a target image comprising a target to be detected into a first detection model, and determining a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected;
inputting the target image into the second detection model, and determining a third detection frame corresponding to the target to be detected;
matching the first detection frame and the third detection frame to determine a target detection frame;
and determining the associated target detection frame and the second detection frame as the detection result of the target to be detected.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining the intersection ratio between the first detection frame and the third detection frame;
when the intersection ratio is larger than a preset threshold value, determining the third detection frame as a target detection frame;
and when the intersection ratio is not greater than a preset threshold value, determining the first detection frame as a target detection frame.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the target image into a first detection model to obtain a first offset output by the first detection model and corresponding to a target to be detected and a second offset output by the first detection model and corresponding to a target part of the target to be detected;
determining a first detection frame according to the first anchor frame and the first offset;
and determining a second detection frame according to the first anchor frame and the second offset.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the target image into a second detection model to obtain a third offset output by the second detection model and corresponding to the target to be detected;
and determining a third detection frame according to the second anchor frame and the third offset.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a first training sample set; the first training sample set comprises a plurality of first sample images and first sample offsets and second sample offsets corresponding to the first sample images;
and training the detection model based on the first training sample set to obtain a first detection model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a second training sample set; the second training sample set comprises a plurality of second sample images and third sample offsets corresponding to the second sample images;
and training the detection model based on the second training sample set to obtain a second detection model.
In one embodiment, the object to be detected is a pedestrian; the target part of the target to be detected is the face of the pedestrian.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of object detection, the method comprising:
inputting a target image comprising a target to be detected into a first detection model, and determining a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected;
inputting the target image into a second detection model, and determining a third detection frame corresponding to the target to be detected;
matching the first detection frame and the third detection frame to determine a target detection frame;
and determining the associated target detection frame and the second detection frame as the detection result of the target to be detected.
2. The method of claim 1, wherein matching the first detection box and the third detection box to determine a target detection box comprises:
determining an intersection ratio between the first detection frame and the third detection frame;
when the intersection ratio is larger than a preset threshold value, determining the third detection frame as the target detection frame;
and when the intersection ratio is not greater than the preset threshold value, determining the first detection frame as the target detection frame.
3. The method according to claim 1, wherein the inputting the target image including the target to be detected into the first detection model, and the determining the first detection frame corresponding to the target to be detected and the second detection frame corresponding to the target portion of the target to be detected comprises:
inputting the target image into the first detection model to obtain a first offset output by the first detection model and corresponding to the target to be detected and a second offset output by the first detection model and corresponding to the target part of the target to be detected;
determining the first detection frame according to the first anchor frame and the first offset;
and determining the second detection frame according to the first anchor frame and the second offset.
4. The method according to claim 1, wherein the inputting the target image into a second detection model and determining a third detection frame corresponding to the target to be detected comprises:
inputting the target image into the second detection model to obtain a third offset output by the second detection model and corresponding to the target to be detected;
and determining the third detection frame according to the second anchor frame and the third offset.
5. The method according to any one of claims 1-4, wherein prior to said inputting the object image comprising the object to be detected into the first detection model, the method further comprises:
acquiring a first training sample set; the first training sample set comprises a plurality of first sample images and a first sample offset and a second sample offset which correspond to the first sample images;
and training a detection model based on the first training sample set to obtain the first detection model.
6. The method according to any one of claims 1-4, wherein before said inputting the object image comprising the object to be detected into the second detection model, the method further comprises:
acquiring a second training sample set; the second training sample set comprises a plurality of second sample images and third sample offsets corresponding to the second sample images;
and training a detection model based on the second training sample set to obtain the second detection model.
7. The method according to any one of claims 1 to 4, wherein the object to be detected is a pedestrian; the target part of the target to be detected is the face of the pedestrian.
8. An object detection apparatus, characterized in that the apparatus comprises:
the first detection frame determining module is used for inputting a target image comprising a target to be detected into a first detection model, and determining a first detection frame corresponding to the target to be detected and a second detection frame corresponding to a target part of the target to be detected;
the third detection frame determining module is used for inputting the target image into the second detection model and determining a third detection frame corresponding to the target to be detected;
the target detection frame determining module is used for matching the first detection frame and the third detection frame to determine a target detection frame;
and the detection result output module is used for determining the associated target detection frame and the second detection frame as the detection result of the target to be detected.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911138793.2A 2019-11-20 2019-11-20 Target detection method, target detection device, computer equipment and storage medium Pending CN111178126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911138793.2A CN111178126A (en) 2019-11-20 2019-11-20 Target detection method, target detection device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911138793.2A CN111178126A (en) 2019-11-20 2019-11-20 Target detection method, target detection device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111178126A true CN111178126A (en) 2020-05-19

Family

ID=70646172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911138793.2A Pending CN111178126A (en) 2019-11-20 2019-11-20 Target detection method, target detection device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111178126A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150363636A1 (en) * 2014-06-12 2015-12-17 Canon Kabushiki Kaisha Image recognition system, image recognition apparatus, image recognition method, and computer program
CN109614990A (en) * 2018-11-20 2019-04-12 成都通甲优博科技有限责任公司 A kind of object detecting device
CN109711274A (en) * 2018-12-05 2019-05-03 斑马网络技术有限公司 Vehicle checking method, device, equipment and storage medium
CN109784293A (en) * 2019-01-24 2019-05-21 苏州科达科技股份有限公司 Multi-class targets method for checking object, device, electronic equipment, storage medium
WO2019101021A1 (en) * 2017-11-23 2019-05-31 腾讯科技(深圳)有限公司 Image recognition method, apparatus, and electronic device
CN110059547A (en) * 2019-03-08 2019-07-26 北京旷视科技有限公司 Object detection method and device
CN110334612A (en) * 2019-06-19 2019-10-15 上海交通大学 Object detection method for power inspection images with self-learning ability
CN110400332A (en) * 2018-04-25 2019-11-01 杭州海康威视数字技术股份有限公司 A kind of target detection tracking method, device and computer equipment
CN110427905A (en) * 2019-08-08 2019-11-08 北京百度网讯科技有限公司 Pedestrian tracting method, device and terminal


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郑树泉, 王倩, 武智霞, 徐侃 (Zheng Shuquan, Wang Qian, Wu Zhixia, Xu Kan): "Industrial Intelligence Technology and Applications" (《工业智能技术与应用》), Shanghai: Shanghai Scientific and Technical Publishers, pages 171-173 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814612A (en) * 2020-06-24 2020-10-23 浙江大华技术股份有限公司 Target face detection method and related device thereof
CN113632097A (en) * 2021-03-17 2021-11-09 商汤国际私人有限公司 Method, device, equipment and storage medium for predicting relevance between objects
US11941838B2 (en) 2021-03-17 2024-03-26 Sensetime International Pte. Ltd. Methods, apparatuses, devices and storage medium for predicting correlation between objects
CN113392720A (en) * 2021-05-24 2021-09-14 浙江大华技术股份有限公司 Human face and human body association method, equipment, electronic device and storage medium
CN117079112A (en) * 2023-08-18 2023-11-17 支付宝(杭州)信息技术有限公司 Target detection methods, devices, electronic equipment and storage media

Similar Documents

Publication Publication Date Title
CN110163344B (en) Neural network training method, device, equipment and storage medium
CN111178126A (en) Target detection method, target detection device, computer equipment and storage medium
CN112163110B (en) Image classification method and device, electronic equipment and computer-readable storage medium
CN109472213B (en) Palm print recognition method and device, computer equipment and storage medium
CN110751149B (en) Target object labeling method, device, computer equipment and storage medium
CN109934262B (en) Image difference judgment method, device, computer equipment and storage medium
CN108830900B (en) Method and device for processing jitter of key point
CN110796161A (en) Recognition model training method, recognition device, recognition equipment and recognition medium for eye ground characteristics
CN110334702B (en) Data transmission method and device based on configuration platform and computer equipment
CN110824587B (en) Image prediction method, image prediction device, computer equipment and storage medium
CN113705685B (en) Disease feature recognition model training, disease feature recognition method, device and equipment
WO2020252911A1 (en) Facial recognition method for missing individual, apparatus, computer device and storage medium
CN114998856A (en) 3D target detection method, device, equipment and medium of multi-camera image
CN111191800B (en) Equipment model checking method and device, computer equipment and storage medium
CN113192175A (en) Model training method and device, computer equipment and readable storage medium
CN109993067B (en) Face key point extraction method and device, computer equipment and storage medium
CN112241705A (en) Target detection model training method and target detection method based on classification regression
KR20220093187A (en) Positioning method and apparatus, electronic device, computer readable storage medium
CN111124898A (en) Question-answering system testing method and device, computer equipment and storage medium
CN108200087B (en) Web intrusion detection method and device, computer equipment and storage medium
CN110163151B (en) Training method and device of face model, computer equipment and storage medium
CN109711381B (en) Target identification method and device of remote sensing image and computer equipment
CN112115860A (en) Face key point positioning method and device, computer equipment and storage medium
CN110163183B (en) Evaluation method, apparatus, computer equipment and storage medium for target detection algorithm
CN110838138A (en) Repetitive texture detection method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200519