CN112734747B - Target detection method and device, electronic equipment and storage medium


Info

Publication number
CN112734747B
CN112734747B
Authority
CN
China
Prior art keywords
target
object detection
target object
image
template image
Prior art date
Legal status
Active
Application number
CN202110081738.5A
Other languages
Chinese (zh)
Other versions
CN112734747A (en)
Inventor
周红花 (Zhou Honghua)
Current Assignee
Tencent Technology (Shenzhen) Co., Ltd.
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202110081738.5A
Publication of CN112734747A
Application granted
Publication of CN112734747B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a target detection method, a target detection device, an electronic device and a storage medium. The method comprises the steps of determining at least one target object to be detected in an image to be detected, and obtaining a target template image of each target object; determining the image complexity of each target template image based on the pixel value differences of the pixel points in each target template image; determining a detection threshold corresponding to each target object according to the image complexity; identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determining a primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold; and determining a target object detection area of the target object from the primary object detection areas. The application can adapt to detection scenarios with multiple types of targets and improves the accuracy of target detection.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a target detection method, a target detection device, an electronic device, and a storage medium.
Background
With the development of computer technology, artificial intelligence is increasingly used, and performing target detection through machine learning is becoming a mainstream research direction in target detection. The task of target detection is to find objects of interest in an image and determine their class and position, for example detecting faces, vehicles or buildings in an image.
In the prior art, target detection can generally be performed through a deep neural network model, but such a model requires a large amount of labeled data: it is trained on image samples annotated with the coordinates of target bounding boxes.
In addition, a template matching method can be adopted, in which the target template image is compared against the screenshot at each sliding-window position of the image to be detected, their similarity is calculated, and windows whose similarity is higher than a specified threshold are taken as detection results. However, this method cannot adapt to different types of detection targets, and its detection accuracy is low in detection scenarios with multiple target types.
Disclosure of Invention
The embodiment of the application provides a target detection method, a target detection device, electronic equipment and a storage medium, which can adapt to detection scenes with various targets and improve the accuracy of target detection.
The embodiment of the application provides a target detection method, which comprises the following steps:
determining at least one target object to be detected in the image to be detected, and acquiring a target template image corresponding to each target object;
determining the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image;
determining a detection threshold corresponding to each target object according to the image complexity of each target template image;
identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object;
for each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object;
And determining at least one target object detection area of the target object from the primary object detection areas of the target object, to obtain at least one target object detection area of each target object.
Accordingly, an embodiment of the present application provides an object detection apparatus, including:
The determining unit is used for determining at least one target object to be detected in the image to be detected and acquiring a target template image corresponding to each target object;
a complexity determining unit, configured to determine an image complexity of each target template image based on a pixel value difference between pixel points in each target template image;
The threshold determining unit is used for determining a detection threshold corresponding to each target object according to the image complexity of each target template image;
The identification unit is used for identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object;
A preliminary selection determining unit, configured to determine, for each target object, at least one preliminary selection object detection area of the target object from the candidate object detection areas according to a similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object;
And the target determining unit is used for determining at least one target object detection area of the target object from the primary object detection areas of the target object, to obtain at least one target object detection area of each target object.
Optionally, in some embodiments of the present application, the identifying unit may include a scaling subunit and an identifying subunit, as follows:
The scaling subunit is used for scaling the target template image corresponding to each target object under different scales to obtain target template images corresponding to each target object under multiple scales;
And the identification subunit is used for identifying the image to be detected based on the target template images under a plurality of scales to obtain a plurality of candidate object detection areas of each target object.
Optionally, in some embodiments of the present application, the target determining unit may include a dividing subunit, a calculating subunit, and a fifth determining subunit, as follows:
The dividing subunit is configured to divide the primary object detection area and the target template image into grids according to each target object, so as to obtain a plurality of sub-object detection grid areas of the primary object detection area and a plurality of sub-template grid areas of the target template image;
A calculating subunit, configured to calculate a sub-region similarity between a target sub-object detection grid region of the primary object detection region and a target sub-template grid region of the target template image, where a position of the target sub-object detection grid region corresponds to a position of the target sub-template grid region;
and a fifth determining subunit, configured to determine at least one target object detection area of the target object from the initially selected object detection areas of the target object based on the sub-area similarity, so as to obtain at least one target object detection area of each target object.
Optionally, in some embodiments of the present application, the calculating subunit may specifically be configured to calculate, for each color channel, a first pixel mean value of a target sub-object detection grid area of the primary object detection area; calculating a second pixel mean value of a target sub-template grid region of the target template image under each color channel; and calculating the sub-region similarity between the target sub-object detection grid region of the initially selected object detection region and the target sub-template grid region of the target template image based on the first pixel mean value and the second pixel mean value.
Optionally, in some embodiments of the present application, the complexity determining unit may include a first determining subunit and a second determining subunit, as follows:
the first determining subunit is configured to determine at least two types of difference parameters of each target template image based on a difference in pixel value between pixel points in each target template image;
and the second determination subunit is used for determining the image complexity of each target template image based on the difference parameters.
Optionally, in some embodiments of the application, the variance parameters include a lateral variance parameter and a longitudinal variance parameter; the second determining subunit may specifically be configured to fuse the transverse difference parameter and the longitudinal difference parameter to obtain an image complexity of each target template image.
Optionally, in some embodiments of the present application, the threshold determining unit may be specifically configured to determine the detection threshold corresponding to each target object based on an image complexity of each target template image and a preset mapping relation set, where the preset mapping relation set includes a mapping relation between a preset image complexity and a preset detection threshold.
Optionally, in some embodiments of the present application, the preset mapping relation set includes a first sub-mapping relation set and a second sub-mapping relation set; the first sub-mapping relation set comprises an inverse mapping relation between preset image complexity and a preset detection threshold; the second sub-mapping relation set comprises a fixed mapping relation between preset image complexity and a preset detection threshold;
the threshold determining unit may include a third determining subunit and a fourth determining subunit, as follows:
The third determining subunit is configured to determine, when the image complexity of the target template image is less than a preset complexity, a detection threshold of a target object corresponding to the target template image based on the image complexity of the target template image and the first sub-mapping relation set;
and the fourth determining subunit is used for determining a detection threshold value of the target object corresponding to the target template image based on the image complexity of the target template image and the second sub-mapping relation set when the image complexity of the target template image is not less than the preset complexity.
Optionally, in some embodiments of the present application, the target determining unit may include a first selecting subunit, a second selecting subunit, and a sixth determining subunit, as follows:
the first selecting subunit is configured to select, for each target object, a primary object detection area with the highest similarity with the target template image in the primary object detection areas as a candidate target object detection area of the target object;
A second selecting subunit, configured to select, from the primary object detection areas, a candidate target object detection area corresponding to the target object based on a distance between the candidate target object detection area and the primary object detection area, where the distance characterizes a degree of overlapping between the candidate target object detection area and the primary object detection area;
and a sixth determining subunit, configured to determine at least one target object detection area of the target object according to the candidate target object detection area.
Optionally, in some embodiments of the present application, the second selecting subunit may specifically be configured to use, as the reference object detection area, a primary object detection area having a distance from the candidate target object detection area greater than a preset distance threshold; and selecting a reference object detection region with highest similarity with the target template image from the reference object detection regions as a candidate target object detection region corresponding to the target object.
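For illustration only, the selection logic of the first and second selecting subunits can be sketched as a greedy loop; treating the distance between two areas as the Euclidean distance between their centers is an assumption, since the text above only states that the distance characterizes their degree of overlap:

    from math import hypot

    def select_target_areas(regions, dist_threshold):
        # regions: list of (x, y, w, h, similarity) primary object detection areas
        remaining = sorted(regions, key=lambda r: r[4], reverse=True)
        selected = []
        while remaining:
            best = remaining[0]            # highest similarity becomes a candidate
            selected.append(best)
            bx, by = best[0] + best[2] / 2, best[1] + best[3] / 2
            # keep only areas far enough from the candidate as reference areas
            remaining = [r for r in remaining[1:]
                         if hypot(r[0] + r[2] / 2 - bx,
                                  r[1] + r[3] / 2 - by) > dist_threshold]
        return selected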
The electronic device provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions to execute the steps in the target detection method provided by the embodiment of the application.
In addition, the embodiment of the application further provides a storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the target detection method provided by the embodiment of the application.
The embodiment of the application provides a target detection method, a target detection device, an electronic device and a storage medium, which can determine at least one target object to be detected in an image to be detected and acquire a target template image corresponding to each target object; determine the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image; determine a detection threshold corresponding to each target object according to the image complexity of each target template image; identify the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determine at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object; and determine at least one target object detection area of the target object from the primary object detection areas of the target object, to obtain at least one target object detection area of each target object. The embodiment of the application does not require a large amount of training, which saves manpower and material resources; moreover, the detection threshold of each target object is determined based on image complexity rather than being fixed, so the method can adapt to detection scenarios with multiple types of targets and can improve the accuracy of target detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1a is a schematic view of a scenario of a target detection method according to an embodiment of the present application;
FIG. 1b is a flowchart of a target detection method according to an embodiment of the present application;
FIG. 1c is an explanatory diagram of a target detection method provided by an embodiment of the present application;
FIG. 1d is another illustration of a target detection method provided by an embodiment of the present application;
FIG. 1e is another illustration of a target detection method provided by an embodiment of the present application;
FIG. 2a is another flow chart of a target detection method according to an embodiment of the present application;
FIG. 2b is another flowchart of a target detection method according to an embodiment of the present application;
FIG. 2c is another flow chart of a target detection method according to an embodiment of the present application;
FIG. 3a is a schematic diagram of a target detection apparatus according to an embodiment of the present application;
FIG. 3b is a schematic diagram of another structure of the object detection device according to the embodiment of the present application;
FIG. 3c is a schematic diagram of another structure of the object detection device according to the embodiment of the present application;
FIG. 3d is a schematic diagram of another structure of the object detection device according to the embodiment of the present application;
FIG. 3e is a schematic diagram of another structure of the object detection device according to the embodiment of the present application;
FIG. 3f is a schematic diagram of another structure of the object detection device according to the embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides a target detection method, a target detection device, electronic equipment and a storage medium. The object detection device may be integrated in an electronic device, which may be a terminal or a server.
It will be appreciated that the target detection method of this embodiment may be executed on the terminal, may be executed on the server, or may be executed by both the terminal and the server. The above examples should not be construed as limiting the application.
As shown in fig. 1a, target detection performed jointly by a terminal and a server is taken as an example. The object detection system provided by the embodiment of the application comprises a terminal 10, a server 11, and the like; the terminal 10 and the server 11 are connected via a network, for example a wired or wireless network connection, wherein the object detection apparatus may be integrated in the server.
Wherein, the server 11 can be used for: determining at least one target object to be detected in the image to be detected, and acquiring a target template image corresponding to each target object; determining the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image; determining a detection threshold corresponding to each target object according to the image complexity of each target template image; identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determining a primary object detection region of the target object from the candidate object detection regions according to the similarity between the candidate object detection regions and the target template image and the detection threshold corresponding to the target object; and determining a target object detection area of the target object from the initial selection object detection areas of the target objects to obtain a target object detection area of each target object. The server 11 may be a single server, or may be a server cluster or cloud server composed of a plurality of servers.
The terminal 10 may acquire an image to be detected, determine at least one target object to be detected in the image to be detected, and send related information such as the image to be detected and the target object to be detected to the server 11, so that the server 11 identifies the image to be detected based on a target template image of the target object, to obtain a target object detection area of the target object in the image to be detected. The server 11 may also transmit the detection result obtained by the recognition to the terminal 10, that is, the target object detection area of the target object to the terminal 10, and the terminal 10 may receive the target object detection area of the target object transmitted by the server 11. The terminal 10 may include a mobile phone, a smart tv, a tablet computer, a notebook computer, or a personal computer (PC, personal Computer), among others. A client may also be provided on the terminal 10, which may be an application client or a browser client, etc.
The above-described step of target detection by the server 11 may be performed by the terminal 10.
The embodiment of the application provides a target detection method, which relates to a computer vision technology in the field of artificial intelligence. The embodiment of the application can adapt to detection scenes with various types of targets and improve the accuracy of target detection.
Wherein artificial intelligence (AI, artificial Intelligence) is the theory, method, technique, and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision. The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Computer Vision technology (CV) is a science that studies how to make a machine "see"; more specifically, it replaces human eyes with cameras and computers to identify, track and measure targets, and performs further graphic processing so that the image becomes more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, object detection and localization, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and synchronous localization and map construction, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
Detailed descriptions are given below. The order in which the following embodiments are described is not intended to limit the preferred order of the embodiments.
This embodiment will be described from the perspective of an object detection apparatus, which may be integrated in an electronic device such as a server or a terminal.
The target detection method of the embodiment of the application can be applied to various target detection scenarios, and the types of target objects to be detected are not limited; for example, the target objects can be small icons, text targets, large icons, complex targets and the like. For example, for game target detection tasks involving many target types across multiple game pictures, the target detection method provided by this embodiment determines the detection threshold of each target object based on image complexity; since the detection threshold is not fixed, the method can adapt to detection scenarios with multiple types of targets and can improve the accuracy of target detection.
As shown in fig. 1b, the specific flow of the target detection method may be as follows:
101. At least one target object to be detected in the image to be detected is determined, and a target template image corresponding to each target object is acquired.
The image to be detected may include at least one target object to be detected; that is, it is an image in which the specific locations of target objects need to be identified. The image type of the image to be detected is not limited.
The target object may be various types of targets to be identified, for example, may be various types of small icons, simple targets, characters, buttons, large icons, complex targets, and the like. The target template image may be used to identify a target object in the image to be detected, which may be regarded as a standard image containing the target object. The standard image is an image corresponding to a predetermined target object.
102. The image complexity of each target template image is determined based on pixel value differences between pixel points in each target template image.
Wherein the image complexity characterizes the complexity of texture colors, etc. of the target template image. The pixel value may be an RGB (red green blue) value or a gray value, which is not limited in this embodiment.
Optionally, in this embodiment, the step of determining the image complexity of each target template image based on the pixel value difference between the pixel points in each target template image may include:
Determining at least two types of difference parameters of each target template image based on pixel value differences among pixel points in each target template image;
and determining the image complexity of each target template image based on the difference parameters.
The difference parameters may include a lateral difference parameter, a longitudinal difference parameter, an oblique difference parameter, and the like of the pixel point, which is not limited in this embodiment. Specifically, each difference parameter can be weighted and fused to obtain the image complexity of the target template image. Thus, based on the difference parameters, the image complexity can be acquired more quickly and accurately.
The transverse difference parameter can represent the differences between columns of pixel points of the target template image, from left to right or from right to left; the longitudinal difference parameter can represent the differences between rows of pixel points, from top to bottom or from bottom to top; the oblique difference parameter can represent the differences between pixel points along directions that are neither horizontal nor vertical in the target template image.
Optionally, in this embodiment, the difference parameters include a transverse difference parameter and a longitudinal difference parameter; the step of determining the image complexity of each target template image based on the difference parameters may comprise:
and fusing the transverse difference parameters and the longitudinal difference parameters to obtain the image complexity of each target template image.
The transverse difference parameter may be a transverse gradient difference value matrix, and the longitudinal difference parameter may be a longitudinal gradient difference value matrix. Specifically, the transverse gradient difference value matrix may be obtained by performing a difference operation on pixel values of pixels in two adjacent columns of the image matrix of the target template image, and the longitudinal gradient difference value matrix may be obtained by performing a difference operation on pixel values of pixels in two adjacent rows of the image matrix of the target template image.
The fusion manner of the transverse difference parameter and the longitudinal difference parameter may be various, which is not limited in this embodiment. For example, a sum-of-variance operation or a mean square-error operation may be performed on the lateral difference parameter and the longitudinal difference parameter, and the operation result may be used as the image complexity of the target template image.
Specifically, suppose the target template image is an image of N rows and M columns, so that its image matrix is an N × M matrix; the longitudinal gradient difference value matrix can be recorded as A1, and the transverse gradient difference value matrix can be recorded as A2. The calculation of A1, A2 and the image complexity proceeds as follows:
1) As shown in fig. 1c, the longitudinal gradient difference value matrix can be obtained by computing the differences between pixel points of adjacent rows of the image matrix and then taking the absolute value of the results. Specifically, the filter 1 may be used to perform a convolution operation on the image matrix of the target template image, which is equivalent to the difference operation between every two adjacent rows of the image matrix, so as to obtain a matrix of N-1 rows and M columns; an absolute value is then taken for each element of the matrix (abs is the absolute-value function), so as to obtain a non-negative matrix of N-1 rows and M columns, which is the longitudinal gradient difference value matrix A1.
2) As shown in fig. 1c, the transverse gradient difference value matrix can be obtained by computing the differences between pixel points of adjacent columns of the image matrix and then taking the absolute value of the results. Specifically, the filter 2 may be used to perform a convolution operation on the image matrix of the target template image, which is equivalent to the difference operation between every two adjacent columns of the image matrix, so as to obtain a matrix of N rows and M-1 columns; an absolute value is then taken for each element of the matrix, so as to obtain a non-negative matrix of N rows and M-1 columns, which is the transverse gradient difference value matrix A2.
Wherein, the filter 1 may be the 2 × 1 column kernel [-1, 1]^T (that is, -1 stacked above 1), and the filter 2 may be the 1 × 2 row kernel [-1 1].
3) After the longitudinal gradient difference value matrix A1 and the transverse gradient difference value matrix A2 are obtained, the image complexity may be calculated based on A1 and A2. In a specific embodiment, the mean of the values of each of A1 and A2 may be calculated to obtain AM1 and AM2; that is, all the values of A1 and of A2 are summed respectively and divided by the product of the number of rows and columns of each matrix. The calculation of AM1 and AM2 is shown in formulas (1) and (2), respectively:

AM1 = ( Σ_{i=1}^{N-1} Σ_{j=1}^{M} a_{i,j} ) / ( (N-1) × M )    (1)

AM2 = ( Σ_{l=1}^{N} Σ_{k=1}^{M-1} a_{l,k} ) / ( N × (M-1) )    (2)

wherein a_{i,j} represents the value in the i-th row and j-th column of matrix A1, and a_{l,k} represents the value in the l-th row and k-th column of matrix A2. The image complexity, recorded as c, may then be obtained by fusing AM1 and AM2, for example by taking the square root of the sum of their squares, as shown in equation (3):

c = sqrt( AM1^2 + AM2^2 )    (3)
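As an illustrative sketch only (not the patented implementation), the above computation can be written with NumPy; the function name is illustrative, and the root-of-sum-of-squares fusion in the last line follows the example reading of equation (3) given above:

    import numpy as np

    def image_complexity(template):
        # single-channel N x M image matrix
        img = template.astype(np.float64)
        # A1: absolute differences of adjacent rows, as with filter 1
        a1 = np.abs(img[1:, :] - img[:-1, :])   # (N-1) x M
        # A2: absolute differences of adjacent columns, as with filter 2
        a2 = np.abs(img[:, 1:] - img[:, :-1])   # N x (M-1)
        am1 = a1.mean()   # formula (1)
        am2 = a2.mean()   # formula (2)
        # formula (3), under the root-of-sum-of-squares reading (an assumption)
        return float(np.sqrt(am1 ** 2 + am2 ** 2))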
Optionally, the pixel value in this embodiment may be a luminance value, and the image complexity of the target template image may be determined based on the luminance differences across the entire target template image. However, for a target object whose picture is very dark, the image complexity calculated in this way is too low, and the local complexity of the picture cannot be identified.
103. And determining a detection threshold corresponding to each target object according to the image complexity of each target template image.
Optionally, in this embodiment, the step of determining the detection threshold corresponding to each target object according to the image complexity of each target template image may include:
and determining a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set, wherein the preset mapping relation set comprises a mapping relation between the preset image complexity and the preset detection threshold.
The preset mapping relation set includes a mapping relation between a preset image complexity and a preset detection threshold. It can be understood that the preset mapping relation set may be regarded as a function in which the preset image complexity is the independent variable and the preset detection threshold is the dependent variable. In a specific embodiment, with the image complexity denoted as c and the detection threshold denoted as a, the preset mapping relation set may be written as a = f(c).
The mapping relationship may include an inversely proportional mapping relationship, which may be linear or nonlinear; this embodiment does not limit this. An inversely proportional mapping relationship specifically means that as the preset image complexity changes, the preset detection threshold changes in the opposite direction; for example, as the preset image complexity increases, the preset detection threshold decreases. Setting the mapping relation to be inversely proportional can adapt to more diversified detection scenarios: in some scenarios, the image complexity of the objects to be detected may differ greatly, with some textures simple and some complex. In this embodiment, the preset detection threshold is larger for a target object with simple texture and smaller for a target object with complex texture. A target object with simple texture contains fewer features, so non-target objects are easily detected by mistake, and a higher detection threshold improves the accuracy of target detection; a target object with complex texture contains more features and is easy to distinguish from non-target objects, so the requirement on the preset detection threshold is relatively low.
Alternatively, in some embodiments, the mapping relationship between the preset image complexity and the preset detection threshold in the preset mapping relationship set may be set according to practical situations, which is not limited in this embodiment, for example, the preset detection threshold may be obtained by performing reciprocal operation or polynomial operation on the preset image complexity.
Optionally, in this embodiment, the preset mapping relation set includes a first sub-mapping relation set and a second sub-mapping relation set; the first sub-mapping relation set comprises an inverse mapping relation between preset image complexity and a preset detection threshold; the second sub-mapping relation set comprises a fixed mapping relation between preset image complexity and a preset detection threshold;
The step of determining a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set may include:
when the image complexity of the target template image is smaller than the preset complexity, determining a detection threshold of a target object corresponding to the target template image based on the image complexity of the target template image and a first sub-mapping relation set;
and when the image complexity of the target template image is not less than the preset complexity, determining a detection threshold of a target object corresponding to the target template image based on the image complexity of the target template image and a second sub-mapping relation set.
The fixed mapping relationship in the second sub-mapping relationship set specifically means that when the complexity of the preset image changes, the preset detection threshold is a fixed value and does not change along with the change of the complexity of the preset image.
The preset complexity may be set according to practical situations, which is not limited in this embodiment. In a specific scenario, once the image complexity is not less than the preset complexity, a further increase in image complexity should hardly affect the detection threshold, since a detection threshold that is too low would cause non-target objects to be detected; therefore, when the image complexity reaches a certain degree (for example, not less than the preset complexity), the detection threshold can be set to a fixed value and is no longer reduced as the image complexity increases.
In a specific embodiment, the preset complexity may be set to 15. The preset image complexity interval of the first sub-mapping relation set is then 0 to 15; this interval may adopt a linear threshold method, in which the detection threshold is inversely related to the image complexity. The preset image complexity interval of the second sub-mapping relation set is 15 to infinity, and the detection threshold for this interval may be set to a fixed value, such as 0.68. Specifically, the relationship between the preset image complexity and the preset detection threshold in the preset mapping relation set may take the piecewise form of equation (4), where c is the preset image complexity and a is the preset detection threshold:

a = f(c) = a0 − k·c for 0 ≤ c < 15;  a = 0.68 for c ≥ 15    (4)

where a0 and k are positive constants of the linear segment (continuity at c = 15 implies a0 − 15k = 0.68).
Referring to fig. 1d, a graph of image complexity versus detection threshold is shown, with the image complexity value on the horizontal axis and the detection threshold on the vertical axis. It can be seen from the figure that the detection threshold and the image complexity are in a linear relationship when the image complexity value is smaller than 15, and the detection threshold is a constant when the image complexity is 15 or larger.
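A minimal sketch of this piecewise mapping follows; the intercept A_MAX is a hypothetical value, since only the 0.68 plateau and the complexity limit of 15 are fixed above:

    # A_MAX is an assumed threshold at complexity 0, chosen so that the
    # linear segment meets the 0.68 plateau continuously at c = 15
    A_MAX = 0.95
    C_LIMIT = 15.0
    A_FIXED = 0.68

    def detection_threshold(c):
        if c < C_LIMIT:
            # linear segment: threshold decreases as complexity rises
            return A_MAX - (A_MAX - A_FIXED) * (c / C_LIMIT)
        return A_FIXED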
104. And identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object.
The target template image can be slid over the image to be detected to obtain a plurality of candidate object detection areas of the target object corresponding to the target template image. Sliding the target template image over the image to be detected means traversing the image to be detected and marking on it a plurality of candidate object detection areas of the same size as the target template image. In some embodiments, the target template image may be scaled to obtain multi-scale target template images; for the same target object, the candidate object detection areas obtained from target template images of different scales then also differ in scale.
Optionally, in this embodiment, the step of identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object may include:
Scaling the target template image corresponding to each target object under different scales to obtain target template images corresponding to each target object under multiple scales;
and identifying the image to be detected based on target template images under a plurality of scales to obtain a plurality of candidate object detection areas of each target object.
The scale of the scaling may be set according to practical situations, which is not limited in this embodiment. For example, scaling may be performed in the following proportions:
0.8, 0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.15, 1.2.
The length and width of the target template image are each scaled by the above ratios, and the image to be detected is then identified by sliding the scaled target template images of the multiple scales over it. Scaling the target template image by the multi-layer pyramid scale method can overcome differences in scale between the target object in the image to be detected and the target template image, improving the accuracy of target detection.
The multi-layer pyramid scale method scales an image by a series of ratios, thereby obtaining a sequence of images of different scales; the scaling process generally uses linear interpolation or similar methods.
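The multi-scale template preparation can be sketched as follows, assuming OpenCV is available; bilinear interpolation (INTER_LINEAR) stands in for the linear interpolation mentioned above:

    import cv2

    SCALES = [0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2]

    def scaled_templates(template):
        # scale both length and width of the template by each ratio
        h, w = template.shape[:2]
        return [cv2.resize(template,
                           (max(1, round(w * s)), max(1, round(h * s))),
                           interpolation=cv2.INTER_LINEAR)
                for s in SCALES]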
105. For each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object.
In this embodiment, the step of determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object may include:
calculating the similarity between the candidate object detection area and the target template image;
And taking the candidate object detection area with the similarity larger than the detection threshold value as a primary object detection area of the target object.
In some embodiments, the similarity between the candidate object detection area and the target template image may be calculated by image contour detection. Image contour detection may first preprocess the images (i.e., the candidate object detection area and the target template image), for example performing smoothing filtering with a two-dimensional Gaussian template to remove image noise; edge detection is then performed on the smoothed images to obtain edge response images, generally relying on gradient feature information such as brightness and color that can distinguish an object from the background; finally, contours are precisely located on the edge response images to obtain the contour information of the candidate object detection area and of the target template image, and the similarity is calculated based on these two pieces of contour information.
Alternatively, a normalized correlation coefficient method may be employed to calculate the similarity between the candidate object detection area and the target template image. The specific similarity calculation formula is shown in formula (5):

R(x, y) = Σ_{x',y'} ( T'(x', y') · I'(x + x', y + y') ) / sqrt( Σ_{x',y'} T'(x', y')^2 · Σ_{x',y'} I'(x + x', y + y')^2 )    (5)

wherein T'(x', y') represents the value of each pixel point in the target template image relative to the template's mean (i.e., the mean of the pixel values of all pixel points in the target template image), with x' and y' denoting positions of pixel points within the target template image; I'(x + x', y + y') represents the value of each pixel point in the candidate object detection area relative to that area's mean (i.e., the mean of the pixel values of all pixel points in the candidate object detection area), with x + x' and y + y' denoting positions of pixel points in the image to be detected; x and y can be regarded as the position, in the image to be detected, of the reference point of the candidate object detection area (specifically, a vertex of the candidate object detection area); and R(x, y) represents the correlation coefficient between the candidate object detection area and the target template image.
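For illustration, OpenCV's matchTemplate with the TM_CCOEFF_NORMED method computes the same normalized correlation coefficient as formula (5), so the sliding-window search and thresholding of step 105 can be sketched as follows (the function and variable names are illustrative):

    import cv2
    import numpy as np

    def primary_regions(image, template, threshold):
        # R(x, y) of formula (5) for every sliding-window position
        r = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        th, tw = template.shape[:2]
        ys, xs = np.where(r > threshold)
        # each (x, y) is the top-left vertex of a primary object detection
        # area of the same size as the template
        return [(int(x), int(y), tw, th, float(r[y, x]))
                for x, y in zip(xs, ys)]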
In other embodiments, feature extraction may be performed on the image of the candidate detection area to obtain feature information of the candidate detection area; extracting features of the target template image to obtain feature information of the target template image; and calculating the feature similarity between the feature information of the candidate object detection area and the feature information of the target template image, and taking the feature similarity as the similarity between the candidate object detection area and the target template image.
The feature extraction process may include convolution operations and pooling operations; specifically, feature extraction may be performed on the candidate object detection area and the target template image through a neural network. The type of the neural network is not limited, and may be, for example, Inception, EfficientNet, VGGNet (Visual Geometry Group Network), ResNet (Residual Network), or DenseNet (Dense Convolutional Network), but it should be understood that the neural network of this embodiment is not limited to the types listed above.
The feature information of the candidate object detection area is specifically a feature vector of the candidate object detection area, and the feature information of the target template image is specifically a feature vector of the target template image; the vector distance of the two can be calculated, and the size of the vector distance characterizes the size of the feature similarity. The larger the vector distance is, the smaller the feature similarity is, and the smaller the similarity is between the candidate object detection area and the target template image; the smaller the vector distance is, the larger the feature similarity is, and the larger the similarity is between the candidate object detection area and the target template image.
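A minimal sketch of turning the vector distance into a feature similarity follows, assuming any feature extractor that yields one-dimensional embeddings; the 1 / (1 + distance) conversion is an illustrative choice, not specified here:

    import numpy as np

    def feature_similarity(feat_region, feat_template):
        # smaller vector distance -> larger feature similarity
        dist = np.linalg.norm(np.asarray(feat_region, dtype=np.float64)
                              - np.asarray(feat_template, dtype=np.float64))
        return 1.0 / (1.0 + dist)   # illustrative conversion (an assumption)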
Alternatively, for a target template image containing text, the similarity between the candidate object detection area and the target template image may be determined by an OCR (Optical Character Recognition) text detection and recognition method, so as to match the text contents.
It will be appreciated that for each target object, if the candidate object detection area is identified by a multi-scale target template image, then the candidate object detection area also includes a plurality of scales; in performing the similarity calculation between the candidate detection area and the target template image, the dimensions of the two should be identical, that is, the candidate detection area should perform the similarity calculation with the target template image of the same scale.
106. And determining at least one target object detection area of the target object from the primary object detection areas of the target object, to obtain at least one target object detection area of each target object.
The primary object detection areas obtained by filtering with the detection threshold may still include areas that are only locally similar to the target template image, so the primary object detection areas need to be further screened. In some embodiments, they may be further filtered using grid color filtering. The grid color filtering method divides the target template image and the primary object detection area into a plurality of sub-grid areas, calculates, under each color channel, the sub-area similarity between each sub-grid area in the target template image (namely, the sub-template grid area) and the sub-grid area at the corresponding position in the primary object detection area (namely, the sub-object detection grid area), and determines the target object detection area of the target object from the primary object detection areas based on the sub-area similarities. Screening the primary object detection areas by comparing the corresponding sub-grid areas of the target template image and of the primary object detection area compares the two comprehensively, and eliminates primary object detection areas that are only locally similar to the target template image. Here, locally similar means similar only in part rather than as a whole.
Optionally, in this embodiment, the step of determining at least one target object detection area of the target object from the initial object detection areas of the target object to obtain at least one target object detection area of each target object may include:
for each target object, respectively carrying out grid division on the primary object detection area and the target template image to obtain a plurality of sub-object detection grid areas of the primary object detection area and a plurality of sub-template grid areas of the target template image;
calculating the sub-region similarity between a target sub-object detection grid region of the primary object detection region and a target sub-template grid region of the target template image, wherein the position of the target sub-object detection grid region corresponds to the position of the target sub-template grid region;
And determining at least one target object detection area of the target object from the initial target object detection areas of the target object based on the sub-area similarity, so as to obtain at least one target object detection area of each target object.
The grid division manner may be set according to the actual situation, which is not limited in this embodiment. However, the grid division of the target template image and of the primary object detection area should be consistent: the numbers of sub-object detection grid areas and sub-template grid areas obtained by division are the same, and their positions correspond to each other. In the above embodiments, "a plurality" refers to two or more.
In a specific embodiment, the target template image and the primary object detection area may each be divided into thirds along the rows and the columns, giving 3 × 3 small blocks; the four blocks at the upper left corner (a 2 × 2 group) are then combined into one sub-grid area, and likewise for the lower left, upper right and lower right corners, as shown in fig. 1e. This yields four sub-template grid areas (upper left, upper right, lower left, lower right) of the target template image and four sub-object detection grid areas (upper left, upper right, lower left, lower right) of the primary object detection area.
The positions of the target sub-object detection grid area and the positions of the target sub-template grid area correspond to each other, and specifically may be that the sub-object detection grid area at the upper left corner and the sub-template grid area at the upper left corner perform sub-region similarity calculation, the sub-object detection grid area at the lower right corner and the sub-template grid area at the lower right corner perform sub-region similarity calculation, and the like. The target sub-object detection grid area refers to a certain sub-object detection grid area of the plurality of sub-object detection grid areas, and the target sub-template grid area refers to a certain sub-template grid area of the plurality of sub-template grid areas.
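A sketch of this division, assuming a NumPy image array; each of the four sub-grid regions covers two thirds of the rows and two thirds of the columns, so adjacent regions overlap in the middle:

    def four_subgrids(img):
        # cut rows and columns into thirds; each sub-grid region is the
        # 2 x 2 corner group of the resulting 3 x 3 blocks
        h, w = img.shape[:2]
        h3, w3 = h // 3, w // 3
        return {
            "top_left":     img[:2 * h3, :2 * w3],
            "top_right":    img[:2 * h3, w - 2 * w3:],
            "bottom_left":  img[h - 2 * h3:, :2 * w3],
            "bottom_right": img[h - 2 * h3:, w - 2 * w3:],
        }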
Optionally, in some embodiments, calculating the sub-region similarity between the target sub-object detection grid region and the target sub-template grid region may specifically extract feature information of the target sub-object detection grid region, extract feature information of the target sub-template grid region, and calculate the similarity between feature information of the target sub-object detection grid region and feature information of the target sub-template grid region, where the similarity may be used as the sub-region similarity between the target sub-object detection grid region and the target sub-template grid region. Wherein the characteristic information may be extracted through a neural network.
Optionally, in some embodiments, the step of "calculating a sub-region similarity between the target sub-object detection grid region of the preliminary object detection region and the target sub-template grid region of the target template image" may include:
Calculating a first pixel mean value of a target sub-object detection grid area of the primary object detection area under each color channel;
calculating a second pixel mean value of a target sub-template grid region of the target template image under each color channel;
and calculating the sub-region similarity between the target sub-object detection grid region of the initially selected object detection region and the target sub-template grid region of the target template image based on the first pixel mean value and the second pixel mean value.
The step of calculating the first pixel mean value of the target sub-object detection grid area of the primary object detection area under each color channel may specifically be calculating the mean of the pixel values of all pixel points in the target sub-object detection grid area under the red channel, the blue channel and the green channel respectively; this mean is the first pixel mean value. Optionally, in some embodiments, the first pixel mean may also be the mean of the pixel values of all pixel points in the target sub-object detection grid area over all color channels, without distinguishing the color channels.
Similarly, the step of calculating the second pixel average value of the target sub-template grid region of the target template image under each color channel may specifically be calculating the average value of pixel values of all pixel points in the target sub-template grid region under the red channel, the blue channel and the green channel respectively, where the average value is the second pixel average value. Optionally, in some embodiments, the second pixel average may also be a pixel value average of all the pixels in the target sub-template grid area, that is, the pixel value average of the pixels under all the color channels, where the color channels are not distinguished.
The step of calculating the sub-region similarity between the target sub-object detection grid region of the initially selected object detection region and the target sub-template grid region of the target template image based on the first pixel mean value and the second pixel mean value may specifically include:
for each color channel, calculating the sub-region similarity between the target sub-object detection grid region of the initially selected object detection region and the target sub-template grid region of the target template image based on the first pixel mean value and the second pixel mean value under the color channel.
The color channels may include a red channel, a blue channel and a green channel. In this embodiment, the pixel mean value of each sub-object detection grid region under the three color channels (i.e., the first pixel mean value) and the pixel mean value of each sub-template grid region under the three color channels (i.e., the second pixel mean value) may be calculated, and the sub-region similarity may then be calculated between the first pixel mean value of each sub-object detection grid region and the second pixel mean value of the sub-template grid region at the corresponding position.
The sub-region similarity may include the sub-region similarity under the blue channel, under the red channel, under the green channel, under all color channels, and the like. The sub-region similarity under the blue channel is calculated from the first pixel mean value of the target sub-object detection grid region under the blue channel and the second pixel mean value of the target sub-template grid region under the blue channel; the other channels are calculated in the same way.
Specifically, the pixel mean value under each color channel may be calculated as follows:

pixel mean value under the red channel: light_r = (Σ_{p=1..P} Σ_{q=1..Q} R_{p,q}) / (P*Q)

pixel mean value under the blue channel: light_b = (Σ_{p=1..P} Σ_{q=1..Q} B_{p,q}) / (P*Q)

pixel mean value under the green channel: light_g = (Σ_{p=1..P} Σ_{q=1..Q} G_{p,q}) / (P*Q)

pixel mean value under all color channels: average_mean = (light_r + light_b + light_g) / 3

Wherein P and Q are the numbers of rows and columns of the sub-grid region, p is an integer with 0 < p ≤ P, q is an integer with 0 < q ≤ Q, R_{p,q} denotes the pixel value of pixel point (p, q) under the red channel, B_{p,q} denotes the pixel value of pixel point (p, q) under the blue channel, and G_{p,q} denotes the pixel value of pixel point (p, q) under the green channel.
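By way of illustration, the per-channel mean computation above can be sketched in a few lines of Python; the H×W×3 RGB array layout is an assumption of this sketch, not something fixed by the method.

```python
import numpy as np

def channel_means(region: np.ndarray):
    """Pixel means of one P x Q sub-grid region.

    Assumes `region` is a P x Q x 3 array in RGB channel order.
    """
    light_r = region[:, :, 0].mean()  # mean over all P*Q pixels, red channel
    light_g = region[:, :, 1].mean()  # green channel
    light_b = region[:, :, 2].mean()  # blue channel
    average_mean = (light_r + light_b + light_g) / 3  # all color channels
    return light_r, light_b, light_g, average_mean
```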
The sub-region similarity can be calculated as shown in equation (6), where light_t denotes the second pixel mean value of the sub-template grid region in the target template image under all/a certain color channel, and light_c denotes the first pixel mean value of the sub-object detection grid region in the primary object detection region under all/a certain color channel. light_t and light_c must refer to the same color channel; that is, if light_t is the second pixel mean value of the sub-template grid region under the blue channel, light_c must be the first pixel mean value of the sub-object detection grid region under the blue channel. Using the same color channel for light_t and light_c ensures the accuracy of the sub-region similarity calculation.
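Since the body of equation (6) is not reproduced in this text, the sketch below substitutes one common choice of similarity between two pixel means, the smaller-to-larger ratio; this concrete form is an assumption for illustration only.

```python
def subregion_similarity(light_t: float, light_c: float) -> float:
    """Similarity of a template mean and a detection mean for the SAME
    color channel. The min/max ratio used here is an illustrative
    stand-in for equation (6): it equals 1.0 for identical means and
    decreases toward 0.0 as the means diverge."""
    if light_t == light_c:
        return 1.0
    return min(light_t, light_c) / max(light_t, light_c)
```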
In the step of determining at least one target object detection area of the target object from the initially selected object detection areas based on the sub-region similarity, the way of selecting the target object detection area from the sub-region similarity is not limited in this embodiment. Specifically, the preset condition for a target object detection area may be that, under each color channel, the sub-region similarity between every sub-object detection grid area of the initially selected object detection area and the sub-template grid area at the corresponding position of the target template image is higher than a preset threshold, where the preset threshold may be set according to the actual situation, which is not limited in this embodiment. It is understood that the preset condition for selection as a target object detection area may also be set according to the actual situation.
For example, the preset condition of the target object detection area may be that r_simility, g_simility, b_simility and average_simility are all higher than the preset threshold; when any one sub-object detection grid area in the primary object detection area does not meet the preset condition, the primary object detection area cannot be used as a target object detection area.
Wherein r_simility is the sub-region similarity of the target sub-object detection grid region and the target sub-template grid region under the red channel, g_simility is the sub-region similarity under the green channel, b_simility is the sub-region similarity under the blue channel, and average_simility is the sub-region similarity of the target sub-object detection grid region and the target sub-template grid region under all color channels.
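A minimal sketch of the strong-filtering condition described above, assuming the preset condition requires every listed similarity to exceed the preset threshold:

```python
def passes_strong_filter(r_simility: float, g_simility: float,
                         b_simility: float, average_simility: float,
                         threshold: float) -> bool:
    """A primary object detection area is kept only if every sub-grid
    pair passes this check for all color channels; one failing
    sub-object detection grid area disqualifies the whole area."""
    return all(s > threshold
               for s in (r_simility, g_simility, b_simility, average_simility))
```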
In a specific scenario, the numbers of rows and columns of the target template image are K and L respectively, so the numbers of rows and columns of the primary object detection area are also K and L. According to the mesh division method of the above embodiment, the target template image and the primary object detection area are each divided into four sub-mesh areas: upper left, upper right, lower left and lower right (the target template image yields the sub-template mesh areas, and the primary object detection area yields the sub-object detection mesh areas). The sub-mesh areas at the four positions are denoted S1, S2, S3 and S4, where S1 is the sub-mesh area at the upper left corner, S2 at the upper right corner, S3 at the lower left corner, and S4 at the lower right corner. As shown in fig. 1e, they may be represented by formulas (7), (8), (9) and (10):
S1=(0≤row<K*2/3,0≤col<L*2/3) (7)
S2=(0≤row<K*2/3,L*1/3≤col≤L) (8)
S3=(K*1/3<row≤K,0≤col<L*2/3) (9)
S4=(K*1/3<row≤K,L*1/3≤col≤L) (10)
Wherein row is the row index range of the sub-grid area and col is the column index range of the sub-grid area. For example, the rows of S1 are intercepted from row 0 to row K*2/3 of the target template image or the primary object detection area, that is, 0 ≤ row < K*2/3.
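The 2/3 four-grid division of formulas (7)-(10) can be sketched with numpy slicing as below; integer slicing only approximates the open/closed bounds of the formulas.

```python
import numpy as np

def four_grid_2_3(region: np.ndarray):
    """Split a K x L region into four overlapping 2/3-size sub-grids
    S1..S4 (upper left, upper right, lower left, lower right), as in
    formulas (7)-(10). Works for both the target template image and
    the primary object detection area."""
    K, L = region.shape[:2]
    s1 = region[0:K * 2 // 3, 0:L * 2 // 3]  # (7) upper left
    s2 = region[0:K * 2 // 3, L // 3:L]      # (8) upper right
    s3 = region[K // 3:K,     0:L * 2 // 3]  # (9) lower left
    s4 = region[K // 3:K,     L // 3:L]      # (10) lower right
    return s1, s2, s3, s4
```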
In this embodiment, the 2/3 four-grid filtering method of the foregoing embodiment may be used to filter the primary object detection areas and screen out those with local color differences; this division has a high fault tolerance, so a strong filtering method may be used in result discrimination (e.g., as in the foregoing embodiment, no sub-object detection mesh area is allowed to fail the preset condition). Alternatively, a finer grid division may be used, such as dividing the target template image and the primary object detection area into 8 x 8 small spaces (finer than the 3*3 lattice underlying the 2/3 four-grid filtering method of the above embodiment); the fault tolerance of such a division is lower, so a weak filtering method may be used in result discrimination (e.g., allowing a certain percentage of the sub-object detection grid areas to fail the preset condition).
Optionally, in this embodiment, the step of determining at least one target object detection area of the target object from the initial object detection areas of the target object may include:
For each target object, selecting a primary object detection area with highest similarity with the target template image from the primary object detection areas as a candidate target object detection area of the target object;
Selecting a candidate target object detection area corresponding to the target object from the initial selection object detection areas based on the distance between the candidate target object detection area and the initial selection object detection area, wherein the distance represents the overlapping degree of the candidate target object detection area and the initial selection object detection area;
And determining at least one target object detection area of the target object according to the candidate target object detection area.
The distance may specifically be the intersection ratio (IOU, Intersection over Union) between the candidate target object detection area and the initially selected object detection area, or may be the distance between a position reference point of the candidate target object detection area and a position reference point of the initially selected object detection area. It should be noted that the position reference points of the two areas should be selected in the same way; for example, the position reference point may be the upper-left corner vertex of the candidate target object detection area and, correspondingly, the upper-left corner vertex of the initially selected object detection area.
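For reference, a small sketch of the intersection-ratio (IOU) computation for two axis-aligned boxes, using the upper-left corner vertex plus width/height as the box representation (an assumption of the sketch):

```python
def iou(box_a, box_b) -> float:
    """Intersection over Union of two boxes given as (x, y, w, h),
    where (x, y) is the upper-left corner position reference point."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```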
The step of determining at least one target object detection area of the target object according to the candidate target object detection areas may specifically be to use all candidate target object detection areas as target object detection areas of the target object, or to further screen the candidate target object detection areas to obtain the target object detection areas of the target object; this embodiment does not limit the further screening manner.
Optionally, in this embodiment, the step of selecting, from the preliminary selection object detection areas, the candidate target object detection area corresponding to the target object based on the distance between the candidate target object detection area and the preliminary selection object detection area may include:
Taking a primary object detection region with the distance from the candidate target object detection region being greater than a preset distance threshold as a reference object detection region;
and selecting a reference object detection region with highest similarity with the target template image from the reference object detection regions as a candidate target object detection region corresponding to the target object.
The similarity is the similarity calculated in step 105. The preset distance threshold may be set according to actual situations, which is not limited in this embodiment.
The step of selecting, from the reference object detection areas, a reference object detection area having the highest similarity with the target template image as a candidate target object detection area corresponding to the target object may include:
Selecting a reference object detection region with highest similarity with the target template image from the reference object detection regions as a candidate target object detection region corresponding to the target object, and taking each reference object detection region as a new initial selection object detection region;
And returning to the step of taking the initially selected object detection area with the distance from the candidate target object detection area being larger than a preset distance threshold as the reference object detection area until the number of the candidate target object detection areas meets the preset number.
In this embodiment, the candidate object detection areas with low similarity may be filtered through a dynamic threshold to obtain the primary object detection areas, and the primary object detection areas may be further screened through a greedy non-maximum suppression (NMS, Non-Maximum Suppression) method, so as to improve the accuracy of target detection. Non-maximum suppression is a process of finding local maxima.
The greedy non-maximum suppression method can be used to remove detection areas with a high degree of overlap or detection areas that are close to each other. Specifically, the greedy method may construct a two-dimensional matrix of the same size as the image matrix of the image to be detected, where each element in the two-dimensional matrix may represent the candidate detection area at the corresponding position. The position of a candidate detection area in the two-dimensional matrix is determined by its position reference point in the image to be detected; for example, if the position reference point is the upper-left corner vertex of the candidate detection area (pixel point A in the image to be detected), then the element corresponding to pixel point A in the two-dimensional matrix represents that candidate detection area. It should be noted that the position reference point of every candidate detection area should be selected in the same way.
In an embodiment, for each scale of the target template image of each target object, if the position reference point is the upper-left corner vertex of the candidate object detection area, each point (element) in the two-dimensional matrix represents the similarity between the rectangular frame starting from this point, with the same length and width as the target template image (i.e., the candidate object detection area), and the corresponding target template image; this two-dimensional matrix is a similarity map. That is, a candidate detection area may be represented by a 1*3 array, in which one value represents the length of the candidate detection area, one value represents its width, and one value represents the similarity between the candidate detection area and its corresponding target template image.
Specifically, in the case where the primary object detection area is obtained by multi-scale target template image recognition, the screening process may be as follows:
1) Determining the number num of target object detection areas to be acquired and a preset distance threshold;
2) For the target template image of each scale, generate a three-channel similarity map, where the three channels are the similarity between the candidate object detection area and the target template image, the length of the candidate object detection area, and the width of the candidate object detection area. Each position in a similarity map represents the similarity, rectangular length and rectangular width of the rectangular frame anchored at that position. Compare the similarity maps across the scales: for the elements at the same position in the similarity maps of all scales, remove all rectangular frames whose similarity is not the maximum and keep only the element (rectangular frame) with the maximum similarity, obtaining a single three-channel target similarity map. In some embodiments, the screening processes 1) to 4) are equivalent to further screening the primary object detection areas by a non-maximum suppression method, where the values in the similarity map corresponding to candidate object detection areas that are not primary object detection areas may be set to zero;
3) Find the maximum value in the target similarity map and take the detection frame with the highest matching degree as an election frame (i.e., a candidate target object detection area); add the row and column coordinates of the election frame to the election frame list; then delete the rectangular frames whose distance to this election frame is smaller than the preset distance threshold, i.e., set their corresponding values in the target similarity map to zero, or delete the rectangular frames that intersect this election frame;
4) If the number of election frames is less than num and the target similarity map still contains non-zero values, return to step 3); otherwise end the operation. The election frame list then contains the position information of the selected candidate target object detection areas.
Through the greedy non-maximum suppression method, the rectangular frame with the largest global matching degree can be obtained cyclically, the rectangular frames intersecting it can be deleted, and finally zero or more locally optimal solutions are obtained.
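A compact sketch of steps 3) and 4) over the target similarity map; the Euclidean distance between upper-left reference points and the array-based bookkeeping are assumptions of this sketch.

```python
import numpy as np

def greedy_nms(sim_map: np.ndarray, heights: np.ndarray,
               widths: np.ndarray, num: int, dist_threshold: float):
    """Greedy non-maximum suppression over a similarity map.
    sim_map[i, j] is the similarity of the candidate whose upper-left
    vertex is pixel (i, j); heights/widths hold its rectangle size."""
    sim = sim_map.copy()
    rows, cols = np.mgrid[0:sim.shape[0], 0:sim.shape[1]]
    elected = []
    while len(elected) < num and sim.max() > 0:
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        elected.append((i, j, heights[i, j], widths[i, j]))
        # Zero out every candidate whose reference point lies closer
        # than the preset distance threshold to the elected frame.
        too_close = (rows - i) ** 2 + (cols - j) ** 2 < dist_threshold ** 2
        sim[too_close] = 0
    return elected
```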
In a specific embodiment, the 2/3 grid method of the target detection method of the present application may be compared with other methods such as the 1/2 grid method and the overall comparison method (i.e., without grid division). Specifically, the comparison experiment may use the f1 index at an IOU (intersection ratio) threshold of 0.75; the experimental result data are shown in the following table:
Wherein a check mark indicates that the corresponding method is used. f1 is used as a comprehensive evaluation index; the higher the f1 value, the better the target detection effect. f1 is calculated as shown in formulas (11), (12) and (13):

P = TP / (TP + FP)    (11)

R = TP / (TP + FN)    (12)

f1 = 2 * P * R / (P + R)    (13)
Wherein P represents the precision rate, R represents the recall rate, TP represents the true positive, FP represents the false positive, and FN represents the false negative. Specifically, true and false refer to whether the label box is a target object to be detected, and positive and negative indicate whether the label box is detected by the target detection algorithm, for example, true positive indicates that the target detection algorithm detects the target object to be detected.
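For completeness, formulas (11)-(13) in code form (assuming non-empty detection and ground-truth sets, so the denominators are non-zero):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Precision (11), recall (12) and f1 (13) from the counts of
    true positives, false positives and false negatives."""
    p = tp / (tp + fp)   # precision, formula (11)
    r = tp / (tp + fn)   # recall, formula (12)
    return 2 * p * r / (p + r)  # f1, formula (13)
```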
The "multi-scale matching" column of the table header refers to the multi-layer pyramid scale method. "Complexity filtering" refers to the method of setting the detection threshold based on image complexity. "Mesh filtering" refers to the mesh color filtering method of the embodiment in step 106 above. "Remove repetition" refers to removing primary object detection regions with a high degree of overlap.
Specifically, "feature matching" of the header refers to a feature matching method, wherein the feature matching method is to extract multi-level pyramid feature points of a target template image and an image to be detected respectively, then match the feature points of the target template image and the image to be detected, and take a region with the matching degree higher than a specified threshold value as a detection result.
Specifically, the "template matching" of the header refers to a template matching method, wherein the template matching method is a method for comparing pictures of sliding windows, the similarity of the screenshot of each position sliding window of a target template image and a picture to be detected is calculated, and a region with the similarity higher than a specified threshold is used as a detection result.
As can be seen from the table, in terms of target detection effect, the template matching method is superior to feature matching (comparison of group A and group B); the 2/3 mesh filtering method is superior to the 1/2 mesh method (comparison of group D and group E); linear filtering is superior to binary filtering (comparison of group E and group F); and the parameter-optimized linear parameter threshold is superior to plain linear filtering (comparison of group G and group H). Binary filtering refers to a binary threshold method, i.e., a fixed detection threshold that does not change with image complexity.
The target detection method of the application obtains better detection effect under the condition of not needing any training data. In the experiment of comprehensive matching of various targets, the effect of the scheme of the application is far higher than that of a method using template matching and a method using feature matching, and the use of resources and training cost are far lower than that of a detection method using deep learning.
The target detection method provided by the application can dynamically adjust the detection threshold based on the image complexity of the target object, cover target objects of different image-complexity types, and adapt to various types of target detection scenes, thereby improving the accuracy of target detection; each detected target object detection area contains the target object. In addition, a plurality of candidate object detection areas of the target object to be detected can be obtained based on target template images at different scales, and at least one target object detection area is determined from the candidate object detection areas. In some embodiments, the image to be detected contains target objects at multiple scales, and the accuracy of target detection can be improved based on the multi-scale target template images; for example, if the target object to be detected is a goldfish and the image to be detected contains goldfish of different sizes, target object detection areas at different scales can be detected by the multi-scale method, where each target object detection area contains a target object (i.e., a goldfish) at the corresponding scale.
As can be seen from the above, the electronic device in this embodiment may determine at least one target object to be detected in the image to be detected, and obtain a target template image corresponding to each target object; determine the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image; determine a detection threshold corresponding to each target object according to the image complexity of each target template image; identify the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determine a primary object detection region of the target object from the candidate object detection regions according to the similarity between the candidate object detection regions and the target template image and the detection threshold corresponding to the target object; and determine a target object detection area of the target object from the primary object detection areas of the target objects to obtain a target object detection area of each target object. The embodiment of the application does not need a large amount of training, saving manpower and material resources; the detection threshold of each target object is determined based on image complexity rather than being fixed, so the method can adapt to detection scenes with various types of targets and improve the accuracy of target detection.
The method according to the previous embodiment will be described in further detail below with the specific integration of the object detection device in the server.
The embodiment of the application provides a target detection method, as shown in fig. 2a, the specific flow of the target detection method may be as follows:
201. The server determines at least one target object to be detected in the images to be detected, and acquires a target template image corresponding to each target object.
The target object may be various types of targets to be identified, for example, may be various types of small icons, simple targets, characters, buttons, large icons, complex targets, and the like. The target template image may be used to identify a target object in the image to be detected, which may be regarded as a standard image containing the target object.
202. The server determines the image complexity of each target template image based on the pixel value differences between the pixels in each target template image.
Wherein the image complexity characterizes the complexity of texture colors, etc. of the target template image. The pixel value may be an RGB (red green blue) value or a gray value, which is not limited in this embodiment.
Optionally, in this embodiment, the step of determining the image complexity of each target template image based on the pixel value difference between the pixel points in each target template image may include:
Determining at least two types of difference parameters of each target template image based on pixel value differences among pixel points in each target template image;
and determining the image complexity of each target template image based on the difference parameters.
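One plausible realization of these two sub-steps is sketched below; taking the lateral and longitudinal difference parameters as mean absolute neighbor differences and fusing them by averaging are assumptions of this sketch, since the exact formulas are not reproduced in this passage.

```python
import numpy as np

def image_complexity(template: np.ndarray) -> float:
    """Illustrative complexity measure: mean absolute difference of
    horizontally adjacent pixels (lateral difference parameter) and of
    vertically adjacent pixels (longitudinal difference parameter),
    fused by averaging."""
    gray = template.mean(axis=2) if template.ndim == 3 else template.astype(float)
    lateral = np.abs(np.diff(gray, axis=1)).mean()
    longitudinal = np.abs(np.diff(gray, axis=0)).mean()
    return (lateral + longitudinal) / 2
```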
203. The server determines a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set, where the preset mapping relation set includes mapping relations between preset image complexities and preset detection thresholds, including an inversely proportional mapping relation.
The inversely proportional mapping relation may be linear or nonlinear, which is not limited in this embodiment. An inversely proportional mapping relation means that as the preset image complexity changes, the preset detection threshold changes in the opposite direction; for example, as the preset image complexity increases, the preset detection threshold decreases.
Optionally, in this embodiment, the preset mapping relation set includes a first sub-mapping relation set and a second sub-mapping relation set; the first sub-mapping relation set comprises an inverse mapping relation between preset image complexity and a preset detection threshold; the second sub-mapping relation set comprises a fixed mapping relation between preset image complexity and a preset detection threshold;
The step of determining a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set may include:
when the image complexity of the target template image is smaller than the preset complexity, determining a detection threshold of a target object corresponding to the target template image based on the image complexity of the target template image and a first sub-mapping relation set;
and when the image complexity of the target template image is not less than the preset complexity, determining a detection threshold of a target object corresponding to the target template image based on the image complexity of the target template image and a second sub-mapping relation set.
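A minimal sketch of this piecewise mapping; the linear form and every parameter value are illustrative assumptions, since the embodiment allows both linear and nonlinear inverse mappings.

```python
def detection_threshold(complexity: float,
                        preset_complexity: float = 100.0,
                        base_threshold: float = 0.95,
                        slope: float = 0.002,
                        floor_threshold: float = 0.75) -> float:
    """Below the preset complexity the threshold falls linearly as the
    complexity grows (first sub-mapping relation set); at or above it
    the threshold is fixed (second sub-mapping relation set)."""
    if complexity < preset_complexity:
        return base_threshold - slope * complexity
    return floor_threshold
```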
204. And the server identifies the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object.
The target template image can slide on the image to be detected, and a plurality of candidate object detection areas of the target object corresponding to the target template image are obtained. The target template image slides on the image to be detected, namely the image to be detected is traversed, and a plurality of candidate object detection areas with the same size as the target template image are marked on the image to be detected.
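The sliding-window traversal can be realized, for example, with OpenCV's matchTemplate, which scores every window of the template's size; the normalized cross-correlation measure below is one concrete choice of similarity, not one prescribed by the method.

```python
import cv2

def candidate_similarity_map(image, template):
    """Slide `template` over `image` and return a map where entry
    (i, j) is the similarity of the window whose upper-left corner
    is (i, j), together with the window height and width."""
    sim_map = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    h, w = template.shape[:2]
    return sim_map, h, w
```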
205. The server determines a primary object detection region of the target object from the candidate object detection regions according to the similarity between the candidate object detection regions and the target template image and the detection threshold corresponding to the target object.
In this embodiment, the step of determining the initial object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object may include:
calculating the similarity between the candidate object detection area and the target template image;
And taking the candidate object detection area with the similarity larger than the detection threshold value as a primary object detection area of the target object.
206. And the server selects a primary selected object detection area with highest similarity with the target template image from the primary selected object detection areas as a candidate target object detection area of the target object aiming at each target object.
207. The server selects a candidate target object detection area corresponding to the target object from the initial selection object detection areas based on the distance between the candidate target object detection area and the initial selection object detection area, wherein the distance represents the overlapping degree of the candidate target object detection area and the initial selection object detection area; and determining at least one target object detection area of the target object according to the candidate target object detection area.
The distance may specifically be the intersection ratio (IOU, Intersection over Union) between the candidate target object detection area and the initially selected object detection area, or may be the distance between a position reference point of the candidate target object detection area and a position reference point of the initially selected object detection area. It should be noted that the position reference points of the two areas should be selected in the same way; for example, the position reference point may be the upper-left corner vertex of the candidate target object detection area and, correspondingly, the upper-left corner vertex of the initially selected object detection area.
Optionally, in this embodiment, the step of selecting, from the preliminary selection object detection areas, the candidate target object detection area corresponding to the target object based on the distance between the candidate target object detection area and the preliminary selection object detection area may include:
Taking a primary object detection region with the distance from the candidate target object detection region being greater than a preset distance threshold as a reference object detection region;
and selecting a reference object detection region with highest similarity with the target template image from the reference object detection regions as a candidate target object detection region corresponding to the target object.
In a specific embodiment, the position of a target object in a game screen needs to be detected; the detection threshold of the target object can first be determined by the target detection method provided in this embodiment, and target detection is then performed. As shown in fig. 2b, the image complexity of the target template image corresponding to the target object may be determined first, and the detection threshold may then be determined based on the image complexity, where different image complexities correspond to different detection thresholds. The image to be detected is identified based on the target template image at multiple scales to obtain multi-scale candidate object detection areas, which are screened based on the detection threshold to obtain the primary object detection areas; these are further screened by the greedy non-maximum suppression method and the grid color filtering method to obtain the target object detection areas, which constitute the final detection result as shown in fig. 2c.
The application can solve the problem of lacking data for deep learning training by using a shallow pixel-texture matching technique between the target template image and the picture to be detected, solve the problem of having multiple types of target objects by using the complexity calculation result of the target template image as a coefficient of the matching threshold, and solve the problem of target objects at different scales across scenes on various device models by using a spatial multi-scale matching method.
As can be seen from the above, in this embodiment, the server may determine at least one target object to be detected in the image to be detected and obtain a target template image corresponding to each target object; determine the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image; determine a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set, where the preset mapping relation set includes mapping relations between preset image complexities and preset detection thresholds, including an inversely proportional mapping relation; identify the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determine a primary object detection region of the target object from the candidate object detection regions according to the similarity between the candidate object detection regions and the target template image and the detection threshold corresponding to the target object; for each target object, select the primary object detection area with the highest similarity to the target template image from the primary object detection areas as a candidate target object detection area of the target object; select the candidate target object detection areas corresponding to the target object from the primary object detection areas based on the distance between the candidate target object detection area and the primary object detection areas, where the distance characterizes their degree of overlap; and determine at least one target object detection area of the target object according to the candidate target object detection areas. The embodiment of the application does not need a large amount of training, saving manpower and material resources; the detection threshold of each target object is determined based on image complexity rather than being fixed, so the method can adapt to detection scenes with various types of targets and improve the accuracy of target detection.
In order to better implement the above method, the embodiment of the present application further provides an object detection device, as shown in fig. 3a, which may include a determining unit 301, a complexity determining unit 302, a threshold determining unit 303, an identifying unit 304, a preliminary selection determining unit 305, and an object determining unit 306, as follows:
(1) A determination unit 301;
The determining unit 301 is configured to determine at least one target object to be detected in the image to be detected, and acquire a target template image corresponding to each target object.
(2) A complexity determination unit 302;
The complexity determining unit 302 is configured to determine an image complexity of each target template image based on a difference in pixel value between pixel points in each target template image.
Alternatively, in some embodiments of the present application, the complexity determining unit 302 may include a first determining subunit 3021 and a second determining subunit 3022, see fig. 3b, as follows:
The first determining subunit 3021 is configured to determine at least two types of difference parameters of each target template image based on the difference of pixel values between the pixel points in each target template image;
a second determining subunit 3022, configured to determine an image complexity of each target template image based on the difference parameter.
Optionally, in some embodiments of the application, the variance parameters include a lateral variance parameter and a longitudinal variance parameter; the second determining subunit 3022 may specifically be configured to fuse the lateral difference parameter and the longitudinal difference parameter to obtain an image complexity of each target template image.
(3) A threshold value determining unit 303;
The threshold determining unit 303 is configured to determine a detection threshold corresponding to each target object according to the image complexity of each target template image.
Optionally, in some embodiments of the present application, the threshold determining unit 303 may be specifically configured to determine the detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set, where the preset mapping relation set includes a mapping relation between the preset image complexity and the preset detection threshold.
Optionally, in some embodiments of the present application, the preset mapping relation set includes a first sub-mapping relation set and a second sub-mapping relation set; the first sub-mapping relation set comprises an inverse mapping relation between preset image complexity and a preset detection threshold; the second sub-mapping relation set comprises a fixed mapping relation between preset image complexity and a preset detection threshold;
The threshold determining unit 303 may comprise a third determining subunit 3031 and a fourth determining subunit 3032, see fig. 3c, as follows:
The third determining subunit 3031 is configured to determine, when the image complexity of the target template image is less than a preset complexity, a detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the first sub-mapping relation set;
A fourth determining subunit 3032, configured to determine, when the image complexity of the target template image is not less than a preset complexity, a detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the second sub-mapping relation set.
(4) An identification unit 304;
and the identifying unit 304 is configured to identify the image to be detected according to the target template image, so as to obtain a plurality of candidate object detection areas of each target object.
Alternatively, in some embodiments of the present application, the identifying unit 304 may include a scaling subunit 3041 and an identifying subunit 3042, see fig. 3d, as follows:
The scaling subunit 3041 is configured to scale the target template image corresponding to each target object under different scales, so as to obtain target template images corresponding to each target object under multiple scales;
The identifying subunit 3042 is configured to identify the image to be detected based on the target template images under multiple scales, so as to obtain multiple candidate object detection areas of each target object.
(5) A preliminary selection determination unit 305;
And a preliminary selection determining unit 305, configured to determine, for each target object, at least one preliminary selection object detection area of the target object from the candidate object detection areas according to a similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object.
(6) A target determination unit 306;
And a target determining unit 306, configured to determine at least one target object detection area of the target object from the initial selection object detection areas of the target objects, so as to obtain at least one target object detection area of each target object.
Alternatively, in some embodiments of the present application, the target determining unit 306 may include a dividing subunit 3061, a calculating subunit 3062, and a fifth determining subunit 3063, see fig. 3e, as follows:
The dividing subunit 3061 is configured to divide the primary object detection area and the target template image into a plurality of sub-object detection grid areas of the primary object detection area and a plurality of sub-template grid areas of the target template image according to each target object;
a computing subunit 3062, configured to compute a sub-region similarity between a target sub-object detection grid region of the initially selected object detection region and a target sub-template grid region of the target template image, where a position of the target sub-object detection grid region corresponds to a position of the target sub-template grid region;
and a fifth determining subunit 3063, configured to determine, from among the primary object detection areas of the target objects, at least one target object detection area of the target object based on the sub-area similarity, to obtain at least one target object detection area of each target object.
Alternatively, in some embodiments of the present application, the calculating subunit 3062 may specifically be configured to calculate the first pixel mean value of the target sub-object detection grid area of the preliminary object detection area under each color channel; calculating a second pixel mean value of a target sub-template grid region of the target template image under each color channel; and calculating the sub-region similarity between the target sub-object detection grid region of the initially selected object detection region and the target sub-template grid region of the target template image based on the first pixel mean value and the second pixel mean value.
Alternatively, in some embodiments of the present application, the target determining unit 306 may include a first selecting subunit 3064, a second selecting subunit 3065, and a sixth determining subunit 3066, see fig. 3f, as follows:
the first selecting subunit 3064 is configured to select, for each target object, a first selected object detection area with the highest similarity with the target template image from the first selected object detection areas as a candidate target object detection area of the target object;
a second selecting subunit 3065, configured to select, from the primary object detection areas, a candidate target object detection area corresponding to the target object based on a distance between the candidate target object detection area and the primary object detection area, where the distance characterizes a degree of overlapping between the candidate target object detection area and the primary object detection area;
A sixth determining subunit 3066 is configured to determine at least one target object detection region of the target object according to the candidate target object detection region.
Optionally, in some embodiments of the present application, the second selecting subunit 3065 may specifically be configured to use, as the reference object detection area, a primary object detection area having a distance from the candidate target object detection area greater than a preset distance threshold; and selecting a reference object detection region with highest similarity with the target template image from the reference object detection regions as a candidate target object detection region corresponding to the target object.
As can be seen from the above, in this embodiment, the determining unit 301 determines at least one target object to be detected in the image to be detected and obtains a target template image corresponding to each target object; the complexity determining unit 302 determines the image complexity of each target template image based on the pixel value differences between the pixel points in each target template image; the threshold determining unit 303 determines a detection threshold corresponding to each target object according to the image complexity of each target template image; the identification unit 304 identifies the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; the preliminary selection determining unit 305 determines, for each target object, at least one preliminary selection object detection region of the target object from the candidate object detection regions according to the similarity between the candidate object detection regions and the target template image and the detection threshold corresponding to the target object; and the target determination unit 306 determines at least one target object detection area of the target object from the initially selected object detection areas of the target object, obtaining at least one target object detection area of each target object. The embodiment of the application does not need a large amount of training, saving manpower and material resources; the detection threshold of each target object is determined based on image complexity rather than being fixed, so the method can adapt to detection scenes with various types of targets and improve the accuracy of target detection.
The embodiment of the application also provides an electronic device, as shown in fig. 4, which shows a schematic structural diagram of the electronic device according to the embodiment of the application, specifically:
The electronic device may include a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. Those skilled in the art will appreciate that the electronic device structure shown in fig. 4 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
The processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 402, and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by executing the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, preferably the power supply 403 may be logically connected to the processor 401 by a power management system, so that functions of managing charging, discharging, and power consumption are performed by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 404, which input unit 404 may be used for receiving input digital or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, if the electronic device is a terminal, it may further include a display unit and the like, which are not described herein. In particular, in this embodiment, the processor 401 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
Determining at least one target object to be detected in the image to be detected, and acquiring a target template image corresponding to each target object; determining the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image; determining a detection threshold corresponding to each target object according to the image complexity of each target template image; identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object; and determining at least one target object detection area of the target object from the initial selection object detection areas of the target objects to obtain at least one target object detection area of each target object.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
As can be seen from the above, this embodiment may determine at least one target object to be detected in the image to be detected and obtain a target template image corresponding to each target object; determine the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image; determine a detection threshold corresponding to each target object according to the image complexity of each target template image; identify the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determine a primary object detection region of the target object from the candidate object detection regions according to the similarity between the candidate object detection regions and the target template image and the detection threshold corresponding to the target object; and determine a target object detection area of the target object from the primary object detection areas of the target objects to obtain a target object detection area of each target object. The embodiment of the application does not need a large amount of training, saving manpower and material resources; the detection threshold of each target object is determined based on image complexity rather than being fixed, so the method can adapt to detection scenes with various types of targets and improve the accuracy of target detection.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the object detection methods provided by the embodiment of the present application. For example, the instructions may perform the steps of:
Determining at least one target object to be detected in the image to be detected, and acquiring a target template image corresponding to each target object; determining the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image; determining a detection threshold corresponding to each target object according to the image complexity of each target template image; identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object; and determining at least one target object detection area of the target object from the initial selection object detection areas of the target objects to obtain at least one target object detection area of each target object.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disk, and the like.
The instructions stored in the storage medium may perform steps in any one of the target detection methods provided in the embodiments of the present application, so that the beneficial effects that any one of the target detection methods provided in the embodiments of the present application can be achieved, which are detailed in the previous embodiments and are not described herein.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in various alternative implementations of the object detection aspect described above.
The above description of the target detection method, the device, the electronic equipment and the storage medium provided by the embodiment of the present application applies specific examples to illustrate the principle and the implementation of the present application, and the description of the above embodiments is only used to help understand the method and the core idea of the present application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, the present description should not be construed as limiting the present application.

Claims (13)

1. A method of detecting an object, comprising:
determining at least one target object to be detected in the image to be detected, and acquiring a target template image corresponding to each target object;
determining the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image;
determining a detection threshold corresponding to each target object according to the image complexity of each target template image;
identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object;
for each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object;
And determining at least one target object detection area of the target object from the initial selection object detection areas of the target objects to obtain at least one target object detection area of each target object.
2. The method according to claim 1, wherein the identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object comprises:
scaling the target template image corresponding to each target object at different scales to obtain target template images corresponding to each target object at multiple scales;
and identifying the image to be detected based on the target template images at the multiple scales to obtain a plurality of candidate object detection areas of each target object.
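An illustrative sketch of the multi-scale matching in claim 2, reusing primary_regions from the earlier sketch; the scale factors and the interpolation method are assumptions, since the claim only requires matching templates at multiple scales and pooling the resulting candidates:

    def multiscale_primary_regions(image_bgr, template_bgr, threshold,
                                   scales=(0.75, 1.0, 1.25)):
        pooled = []
        for s in scales:
            scaled = cv2.resize(template_bgr, None, fx=s, fy=s,
                                interpolation=cv2.INTER_LINEAR)
            # A scaled template larger than the search image cannot match.
            if (scaled.shape[0] > image_bgr.shape[0]
                    or scaled.shape[1] > image_bgr.shape[1]):
                continue
            pooled += primary_regions(image_bgr, scaled, threshold)
        return pooled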
3. The method according to claim 1, wherein the determining at least one target object detection area of the target object from the primary object detection areas of the target object to obtain at least one target object detection area of each target object comprises:
for each target object, dividing the primary object detection area and the target template image into grids, respectively, to obtain a plurality of sub-object detection grid areas of the primary object detection area and a plurality of sub-template grid areas of the target template image;
calculating a sub-region similarity between a target sub-object detection grid area of the primary object detection area and a target sub-template grid area of the target template image, wherein the position of the target sub-object detection grid area corresponds to the position of the target sub-template grid area;
and determining at least one target object detection area of the target object from the primary object detection areas of the target object based on the sub-region similarity, to obtain at least one target object detection area of each target object.
4. The method according to claim 3, wherein the calculating the sub-region similarity between the target sub-object detection grid area of the primary object detection area and the target sub-template grid area of the target template image comprises:
calculating a first pixel mean value of the target sub-object detection grid area of the primary object detection area under each color channel;
calculating a second pixel mean value of the target sub-template grid area of the target template image under each color channel;
and calculating the sub-region similarity between the target sub-object detection grid area of the primary object detection area and the target sub-template grid area of the target template image based on the first pixel mean value and the second pixel mean value.
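Continuing the earlier sketch, the grid comparison of claims 3 and 4 might look as follows; the 4x4 grid, the resizing of the detected region to the template size, and the conversion of cell-mean differences into a similarity score are all assumptions:

    def grid_means(region_bgr, grid=(4, 4)):
        # Per-channel pixel mean of each grid cell (the first/second pixel
        # mean values in the sense of claim 4); assumes the region is at
        # least as large as the grid.
        h, w = region_bgr.shape[:2]
        gh, gw = grid
        means = np.zeros((gh, gw, region_bgr.shape[2]), dtype=np.float32)
        for i in range(gh):
            for j in range(gw):
                cell = region_bgr[i * h // gh:(i + 1) * h // gh,
                                  j * w // gw:(j + 1) * w // gw]
                means[i, j] = cell.reshape(-1, region_bgr.shape[2]).mean(axis=0)
        return means

    def subregion_similarity(region_bgr, template_bgr, grid=(4, 4)):
        # Resize so that grid positions in the region correspond to grid
        # positions in the template, then compare position-matched cell means.
        region_bgr = cv2.resize(region_bgr,
                                (template_bgr.shape[1], template_bgr.shape[0]))
        diff = np.abs(grid_means(region_bgr, grid) - grid_means(template_bgr, grid))
        return 1.0 - diff.mean() / 255.0  # 1.0 means identical cell means (8-bit images)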
5. The method according to claim 1, wherein the determining the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image comprises:
determining at least two types of difference parameters of each target template image based on the pixel value differences among the pixel points in each target template image;
and determining the image complexity of each target template image based on the difference parameters.
6. The method according to claim 5, wherein the difference parameters comprise a lateral difference parameter and a longitudinal difference parameter, and the determining the image complexity of each target template image based on the difference parameters comprises:
fusing the lateral difference parameter and the longitudinal difference parameter to obtain the image complexity of each target template image.
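As an illustration of claims 5 and 6 (again a sketch under assumptions): two types of difference parameter per direction, here the mean and the standard deviation of neighbouring-pixel differences, fused with assumed weights into a single complexity score:

    def difference_parameters(gray):
        # Lateral = column-to-column differences; longitudinal = row-to-row.
        lateral = np.abs(np.diff(gray, axis=1))
        longitudinal = np.abs(np.diff(gray, axis=0))
        return lateral.mean(), lateral.std(), longitudinal.mean(), longitudinal.std()

    def fused_complexity(template_bgr, w_mean=1.0, w_std=0.5):
        gray = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
        lat_m, lat_s, lon_m, lon_s = difference_parameters(gray)
        # Fuse the lateral and longitudinal parameters (weights are assumptions).
        return w_mean * (lat_m + lon_m) + w_std * (lat_s + lon_s)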
7. The method according to claim 1, wherein the determining the detection threshold corresponding to each target object according to the image complexity of each target template image comprises:
determining the detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set, wherein the preset mapping relation set comprises a mapping relation between a preset image complexity and a preset detection threshold.
8. The method according to claim 7, wherein the preset mapping relation set comprises a first sub-mapping relation set and a second sub-mapping relation set, the first sub-mapping relation set comprising an inverse mapping relation between preset image complexity and a preset detection threshold, and the second sub-mapping relation set comprising a fixed mapping relation between preset image complexity and a preset detection threshold; and
the determining the detection threshold corresponding to each target object based on the image complexity of each target template image and the preset mapping relation set comprises:
when the image complexity of the target template image is smaller than a preset complexity, determining the detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the first sub-mapping relation set;
and when the image complexity of the target template image is not smaller than the preset complexity, determining the detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the second sub-mapping relation set.
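Claims 7 and 8 describe a piecewise mapping: an inverse mapping below a preset complexity and a fixed mapping at or above it. The detection_threshold function in the first sketch already follows this shape; spelled out explicitly, with all preset values assumed:

    PRESET_COMPLEXITY = 20.0  # assumed preset complexity
    FIXED_THRESHOLD = 0.75    # assumed threshold of the fixed (second) mapping
    MAX_THRESHOLD = 0.95      # assumed threshold for a perfectly flat template

    def threshold_from_mapping_sets(complexity):
        if complexity < PRESET_COMPLEXITY:
            # First sub-mapping relation set: the threshold falls as complexity rises.
            frac = complexity / PRESET_COMPLEXITY
            return MAX_THRESHOLD - (MAX_THRESHOLD - FIXED_THRESHOLD) * frac
        # Second sub-mapping relation set: fixed mapping.
        return FIXED_THRESHOLD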
9. The method according to claim 1, wherein the determining at least one target object detection area of the target object from the primary object detection areas of the target object comprises:
for each target object, selecting, from the primary object detection areas, the primary object detection area with the highest similarity to the target template image as a candidate target object detection area of the target object;
selecting a candidate target object detection area corresponding to the target object from the primary object detection areas based on the distance between the candidate target object detection area and the primary object detection areas, wherein the distance represents the degree of overlap between the candidate target object detection area and a primary object detection area;
and determining at least one target object detection area of the target object according to the candidate target object detection areas.
10. The method according to claim 9, wherein the selecting the candidate target object detection area corresponding to the target object from the primary object detection areas based on the distance between the candidate target object detection area and the primary object detection areas comprises:
taking a primary object detection area whose distance from the candidate target object detection area is greater than a preset distance threshold as a reference object detection area;
and selecting, from the reference object detection areas, the reference object detection area with the highest similarity to the target template image as the candidate target object detection area corresponding to the target object.
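The selection in claims 9 and 10 can be sketched as a greedy procedure, assuming that distance means the Euclidean distance between region centres (so a large distance implies little overlap) and assuming a preset distance threshold of 32 pixels; both assumptions are illustrative:

    def select_target_regions(primaries, min_distance=32.0):
        # primaries: (x, y, w, h, similarity) tuples from the sketches above.
        remaining = sorted(primaries, key=lambda r: r[4], reverse=True)
        selected = []
        while remaining:
            best = remaining.pop(0)  # highest similarity to the template
            selected.append(best)
            bx, by = best[0] + best[2] / 2.0, best[1] + best[3] / 2.0
            # Keep only reference regions farther than the preset distance,
            # i.e. regions that do not overlap the region just selected.
            remaining = [r for r in remaining
                         if ((r[0] + r[2] / 2.0 - bx) ** 2
                             + (r[1] + r[3] / 2.0 - by) ** 2) ** 0.5
                         > min_distance]
        return selected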
11. A target detection apparatus, comprising:
a determining unit, configured to determine at least one target object to be detected in an image to be detected and acquire a target template image corresponding to each target object;
a complexity determining unit, configured to determine the image complexity of each target template image based on the pixel value differences among the pixel points in each target template image;
a threshold determining unit, configured to determine a detection threshold corresponding to each target object according to the image complexity of each target template image;
an identification unit, configured to identify the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object;
a primary selection determining unit, configured to determine, for each target object, at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object;
and a target determining unit, configured to determine at least one target object detection area of the target object from the primary object detection areas of the target object, to obtain at least one target object detection area of each target object.
12. An electronic device, comprising a memory and a processor, wherein the memory stores an application program, and the processor is configured to run the application program in the memory to perform the operations in the target detection method according to any one of claims 1 to 10.
13. A storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the steps of the target detection method according to any one of claims 1 to 10.
CN202110081738.5A 2021-01-21 2021-01-21 Target detection method and device, electronic equipment and storage medium Active CN112734747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110081738.5A CN112734747B (en) 2021-01-21 2021-01-21 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112734747A (en) 2021-04-30
CN112734747B (en) 2024-06-25

Family

ID=75594609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110081738.5A Active CN112734747B (en) 2021-01-21 2021-01-21 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112734747B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326856B (en) * 2021-08-03 2021-12-03 电子科技大学 Self-adaptive two-stage feature point matching method based on matching difficulty
CN115228092B (en) * 2022-09-22 2022-12-23 腾讯科技(深圳)有限公司 Game battle force evaluation method, device and computer readable storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110148147A (en) * 2018-11-07 2019-08-20 腾讯大地通途(北京)科技有限公司 Image detecting method, device, storage medium and electronic device
CN110796157A (en) * 2019-08-29 2020-02-14 腾讯科技(深圳)有限公司 Image difference identification method and device and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8977035B2 (en) * 2012-06-13 2015-03-10 Applied Materials Israel, Ltd. System, method and computer program product for detection of defects within inspection images


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40042646)

GR01 Patent grant