CN118097121B - Target recognition and counting method and device based on image segmentation and deep learning - Google Patents


Info

Publication number
CN118097121B
Authority
CN
China
Prior art keywords: area, sub, overlapping, image, region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410465612.1A
Other languages
Chinese (zh)
Other versions
CN118097121A (en)
Inventor
钟洪萍
阮永蔚
韦云声
郑建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shuangyuan Technology Co ltd
Original Assignee
Zhejiang Shuangyuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Shuangyuan Technology Co ltd filed Critical Zhejiang Shuangyuan Technology Co ltd
Priority to CN202410465612.1A priority Critical patent/CN118097121B/en
Publication of CN118097121A publication Critical patent/CN118097121A/en
Application granted granted Critical
Publication of CN118097121B publication Critical patent/CN118097121B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a target recognition and counting method and device based on image segmentation and deep learning. The method comprises: collecting an original image to be recognized and acquiring its size; determining the size of the overlapping area according to the maximum size and the maximum offset of the target recognition frame; performing overlapping segmentation of the original image according to the maximum image size that the deep learning model can process, the size of the original image and the size of the overlapping area, to obtain at least two subgraphs, an overlapping area existing between every two adjacent subgraphs; performing target recognition on the at least two subgraphs based on deep learning to obtain target recognition frames; and traversing the subgraphs, counting targets according to the number of target recognition frames, and performing de-duplication according to the geometric center positions of the target recognition frames located in the overlapping area. The method can improve the accuracy of target recognition and counting.

Description

Target recognition and counting method and device based on image segmentation and deep learning
Technical Field
The invention relates to the technical field of image processing, in particular to a target identification counting method and device based on image segmentation and deep learning.
Background
The recognition and counting of targets is an important application in the field of computer vision, involving technologies such as target detection, image processing and pattern recognition. In recent years, owing to the rapid development of its theory and algorithms and to advances in hardware, deep learning has been widely applied to target recognition and counting. Compared with traditional image processing techniques, deep learning methods can better handle target recognition in complex scenes, especially under target occlusion, varying illumination conditions, background interference and the like. A deep-learning-based target recognition and counting method automatically learns and extracts the complex features of targets in images, trains a deep learning model with a large amount of labeled data, and then uses the model to achieve high-precision recognition and counting of targets.
However, many deep learning models, particularly convolutional neural networks, require a fixed input size, and when the acquired image is too large it must be divided into pieces of that fixed size before the targets are recognized and counted. In this case a single target may be split across two, three or even four sub-images; for the model, each split part of the target is still treated as one target and is labeled and counted, so the split target is counted repeatedly. As shown in fig. 1a to 1c, fig. 1a is the left sub-image after segmentation, fig. 1b is the right sub-image, and fig. 1c is the image obtained by merging the two; it can be seen that the target at the segmentation edge is recognized in both the left and right sub-images, causing a repeated count. Besides repeated counting when all the split parts are recognized, deep learning recognizes incomplete targets with low accuracy, so a split target may also fail to be recognized at all, leading to missed counts and further affecting the counting result. Therefore, to solve the problem of repeated counting of targets at the boundary caused by image segmentation, an effective de-duplication method is needed to ensure the accuracy of counting.
For example, patent CN113313692B proposes an automatic identification and counting method for young banana plants based on aerial visible-light images, which includes a de-duplication algorithm. Its principle is to set different thresholds for the candidate maps of young banana plants and, based on these thresholds, to screen out and remove candidates that appear at sub-image edges, are too close to each other or are too small in area, and then complete the count. The method is efficient, but the thresholds must be reset whenever the target size changes with the imaging scene, and it cannot be used when the targets differ greatly in size and shape.
Patent CN117115080A proposes a region-segmentation-based deep learning method for counting shrimp larvae, which includes a de-duplication algorithm. Its principle is to stitch the images after segmentation and shrimp-larvae recognition, calculate the overlap rate of the overlapping boundaries of two bounding boxes to judge whether two detection regions are connected, select the match with the largest overlap rate to determine the de-duplication region, and wrap the two detection frames completely with a minimum circumscribed rectangle. The method can count accurately, but it involves stitching and matching of overlapping regions, which is complex; in addition, if the recognition frames of the same target in the two overlapping regions are offset from each other, the matching result may be inaccurate.
Patent CN117292281A proposes a method for detecting open-field vegetables based on unmanned aerial vehicle images. Its principle is to use a pre-trained edge generator to perform edge completion and de-duplication on the melon-and-vegetable segmentation result atlas to obtain an edge prediction image, and to determine the yield of melons and vegetables in the planting area to be detected based on that image. The method can improve the accuracy and efficiency of yield estimation, but it requires training an edge generator and an edge discriminator, completing edges, mapping the segmented images back to the original image, and finding and removing duplicated targets; the process is complex and time-consuming.
It can be seen that the target recognition and counting methods in the prior art suffer from low accuracy, complex processing and limited applicability.
Disclosure of Invention
The invention provides a target identification counting method and device based on image segmentation and deep learning, which can improve the accuracy of target identification counting.
An object recognition counting method based on image segmentation and deep learning comprises the following steps:
collecting an original image to be identified, and acquiring the size of the original image;
determining the size of an overlapping area according to the maximum size and the maximum offset of the target identification frame;
According to the maximum image size which can be processed by the deep learning, the size of the original image and the size of the overlapping area, carrying out overlapping segmentation on the original image to obtain at least two subgraphs, wherein an overlapping area exists between every two adjacent subgraphs;
Performing target recognition on the at least two sub-graphs based on deep learning to obtain a target recognition frame;
Traversing the subgraph, counting targets according to the number of the target identification frames, and performing de-duplication processing according to the geometric center position of the target identification frames positioned in the overlapping area.
An object recognition counting device based on image segmentation and deep learning applied to the method comprises:
The acquisition module is used for acquiring an original image to be identified and acquiring the size of the original image;
the size determining module is used for determining the size of the overlapped area according to the maximum size and the maximum offset of the target identification frame;
The segmentation module is used for carrying out overlapped segmentation on the original image according to the maximum image size which can be processed by the deep learning, the size of the original image and the size of the overlapped area to obtain at least two subgraphs, wherein an overlapped area exists between every two adjacent subgraphs;
the target recognition module is used for carrying out target recognition on the at least two sub-graphs based on deep learning to obtain a target recognition frame;
and the de-duplication counting module is used for traversing the subgraph, counting targets according to the number of the target identification frames, and performing de-duplication processing according to the geometric center position of the target identification frames positioned in the overlapping area.
The target recognition counting method and device based on image segmentation and deep learning provided by the invention at least comprise the following beneficial effects:
(1) When counting targets, the problem that targets at the segmentation boundary are repeatedly recognized because of the characteristics of deep learning is taken into account; the de-duplication algorithm guarantees that each target is counted only once and considers the offset of the recognition frames of the same target across two or more images, improving the accuracy of target recognition and counting;
(2) The target counting and de-duplication method requires neither image stitching nor extra training; whether a target has been counted repeatedly can be judged from the center position of its recognition frame, which is convenient to operate, greatly increases the de-duplication speed, and makes accurate counting efficient;
(3) By setting the overlapping area, the targets that are counted are guaranteed to be un-segmented targets, avoiding missed counts caused by the low accuracy of deep learning on incomplete targets and further improving the stability of target recognition and counting;
(4) The de-duplication algorithm is simple in principle, easy to reproduce, applicable to counting and de-duplication of targets of any shape and size, and highly universal.
Drawings
Fig. 1a to 1c are schematic diagrams of repeated counting of objects in a segmented image according to the prior art.
Fig. 2 is a flowchart of an embodiment of a target recognition counting method based on image segmentation and deep learning provided by the invention.
Fig. 3 is a schematic diagram of an embodiment of an overlapping area of an original image divided in a lateral direction in the object recognition counting method based on image division and deep learning according to the present invention.
Fig. 4a to fig. 4c are schematic diagrams illustrating an embodiment of performing deduplication based on an overlapping region in the object recognition counting method based on image segmentation and deep learning according to the present invention.
Fig. 5 and fig. 6 are schematic diagrams of the recognition-frame offset and of the counting error it causes in overlap-based de-duplication, in the target recognition and counting method based on image segmentation and deep learning provided by the invention.
Fig. 7 is a schematic diagram of an embodiment of the overlapping area of an original image divided both in the lateral direction and in the longitudinal direction in the target recognition and counting method based on image segmentation and deep learning according to the present invention.
Fig. 8 is a schematic structural diagram of an embodiment of an object recognition counting device based on image segmentation and deep learning according to the present invention.
Detailed Description
In order to better understand the above technical solutions, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
Referring to fig. 2, in some embodiments, there is provided an object recognition counting method based on image segmentation and deep learning, comprising:
S1, acquiring an original image to be identified, and acquiring the size of the original image;
S2, determining the size of an overlapping area according to the maximum size and the maximum offset of the target identification frame;
S3, performing overlapped segmentation on the original image according to the maximum image size which can be processed by deep learning, the size of the original image and the size of the overlapping area, to obtain at least two subgraphs, wherein an overlapping area exists between every two adjacent subgraphs;
S4, carrying out target recognition on the at least two sub-graphs based on deep learning to obtain a target recognition frame;
S5, traversing the subgraph, counting targets according to the number of the target recognition frames, and performing de-duplication processing according to the geometric center position of the target recognition frames in the overlapping area.
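The following minimal sketch illustrates the overall flow of steps S1 to S5; the routines split_with_overlap, detect and deduplicate_and_count are hypothetical placeholders supplied by the caller (they are not named in this disclosure), and the image is assumed to be a NumPy-style array:

```python
def count_targets(image, split_with_overlap, detect, deduplicate_and_count):
    """Sketch of S1-S5: the three callables are assumed, caller-supplied routines."""
    height, width = image.shape[:2]     # S1: size of the original image (S2, the overlap
                                        # sizing, is done beforehand and baked into split_with_overlap)
    subgraphs = split_with_overlap(image)                              # S3: overlapping segmentation
    frames = {name: detect(sub) for name, sub in subgraphs.items()}    # S4: recognition frames per subgraph
    return deduplicate_and_count(frames)                               # S5: traverse, de-duplicate, count
```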
Specifically, in step S1, the size of the original image may be expressed in terms of the number of pixel rows and the number of pixel columns of the original image, and for example, the size of the original image may be expressed as M rows and N columns.
Further, in step S2, the maximum size of the target recognition frame includes a maximum width and a maximum height; the maximum width and maximum height may be expressed in terms of the number of pixel rows and the number of pixel columns of the target recognition frame.
The maximum size of the target recognition frame is determined manually before image segmentation according to the actual application scene. For example, if in all images the target height does not exceed 10 pixels and the target width does not exceed 8 pixels, then the height and width of the target recognition frame do not exceed 10 and 8 pixels respectively. In practice the maximum height Hmax and maximum width Wmax need not be found exactly; it is enough to guarantee that the heights of all target recognition frames are smaller than Hmax and their widths smaller than Wmax. Under this condition, the maximum width and maximum height should be chosen as small as possible, so that the area of the overlapping region is smaller, fewer segmentations are needed, and efficiency is higher.
The maximum offset is likewise determined manually, based on prior experience. For example, if, when the same target appears in two different images and is recognized by the neural network, the recognition frames in the two images are offset from each other by no more than 3 pixels (taking the target as the reference frame), the maximum offset can be taken as 3. In practice the actual offsets of all target recognition frames need not be measured and maximized; it is enough that the offsets of all recognition frames are smaller than this value. Under this condition, the maximum offset should be as small as possible, so that the area of the overlapping region is smaller, fewer segmentations are needed, and efficiency is higher.
The overlapping area is divided into a left overlapping area, a left offset fault-tolerant area, a right overlapping area and a right offset fault-tolerant area according to the position relation; and/or, an upper overlap region, an upper offset fault tolerance region, a lower overlap region, a lower offset fault tolerance region.
Specifically, referring to fig. 3, if the original image is divided only in the transverse direction, any two adjacent sub-images obtained by division are defined as a left sub-image a and a right sub-image B according to the relative positional relationship, and the overlapping area of the left sub-image a and the right sub-image B is divided into a left overlapping area, a left offset fault-tolerant area, a right overlapping area and a right offset fault-tolerant area which are sequentially connected from left to right according to the positional relationship.
Similarly, if the original image is divided only longitudinally, any two adjacent subgraphs obtained by division are defined as an upper subgraph and a lower subgraph according to the relative position relationship, and the overlapping area of the upper subgraph and the lower subgraph is divided into an upper overlapping area, an upper offset fault-tolerant area, a lower overlapping area and a lower offset fault-tolerant area which are connected in sequence from top to bottom according to the position relationship.
If the original image is divided transversely and longitudinally, the obtained subgraphs are in a matrix form, and the overlapping area in each subgraph comprises a left overlapping area, a left offset fault-tolerant area, a right overlapping area, a right offset fault-tolerant area, an upper overlapping area, an upper offset fault-tolerant area, a lower overlapping area and a lower offset fault-tolerant area. Namely, a left overlapping region, a left offset fault-tolerant region, a right overlapping region and a right offset fault-tolerant region are included between the left sub-graph and the right sub-graph, and an upper overlapping region, an upper offset fault-tolerant region, a lower overlapping region and a lower offset fault-tolerant region are included between the upper sub-graph and the lower sub-graph.
Referring to fig. 4a to 4c, the principle of de-duplication based on the overlapping area of two sub-graphs in this embodiment is as follows:
It must be ensured that the width of the overlapping area of two adjacent sub-images is larger than the maximum width of the target recognition frame. The overlapping area of the left and right sub-images is then divided along the width direction into a left overlapping region and a right overlapping region. For a target that has been recognized and marked on a sub-image, if the center of its recognition frame lies in the right overlapping region of the left sub-image or in the left overlapping region of the right sub-image, the target is not counted into the total. In this way each target is counted only once, and every counted recognition frame corresponds to a complete target (because for a split, incomplete target the center of its recognition frame can only appear in the right overlapping region of the left sub-image or the left overlapping region of the right sub-image), so in principle targets can be de-duplicated effectively. Fig. 4a shows the targets of the two sub-images, fig. 4b shows the target recognition frames after deep learning recognition, and fig. 4c shows the counting result, where √ indicates a counted frame and × indicates a frame that is not counted.
In practice, however, the position of the target recognition frame is not perfectly determined when a target is marked. For the same target, the relative position between its recognition frame and the target may differ when it appears in different sub-images; that is, the recognition frames may be offset, as shown in fig. 5.
When the recognition frames are offset, the theoretical overlap-based de-duplication described above no longer applies. If the center of a target lies near the boundary between the left and right overlapping regions, the offset may place the center of its recognition frame in the left overlapping region of one sub-image and in the right overlapping region of the other, so that the target is counted in both sub-images or in neither, causing a counting error, as shown in fig. 6.
Therefore, taking the offset into account, this embodiment proposes that the overlapping area should also include offset fault-tolerant regions: a left and a right offset fault-tolerant region for the left and right sub-images, an upper and a lower offset fault-tolerant region for the upper and lower sub-images, and left, right, upper and lower offset fault-tolerant regions for sub-images obtained by both longitudinal and transverse segmentation.
The width and the height of the left overlapping area and the right overlapping area are equal, the width and the height of the left offset fault-tolerant area and the right offset fault-tolerant area are equal, the width and the height of the upper overlapping area and the lower overlapping area are equal, and the width and the height of the upper offset fault-tolerant area and the lower offset fault-tolerant area are equal;
the width of the left overlapping area is larger than half of the maximum width, the height of the upper overlapping area is larger than half of the maximum height, the width of the left offset fault-tolerant area is larger than the maximum offset, and the height of the upper offset fault-tolerant area is larger than the maximum offset.
Specifically, on the premise that the width of the left overlapping region is larger than half of the maximum width, the height of the upper overlapping region is larger than half of the maximum height, and the width of the left offset fault-tolerant region and the height of the upper offset fault-tolerant region are both larger than the maximum offset, these four quantities should be made as small as possible, so that the area of the overlapping region is smaller, fewer segmentations are needed, and efficiency is higher. They must also be small enough that the overlap fits within a sub-image, i.e.: left overlap width + left offset fault-tolerance width + right overlap width < sub-image width / 2, and upper overlap height + upper offset fault-tolerance height + lower overlap height < sub-image height / 2.
For example:
the width of the left overlap region = half of the maximum width + 1 pixel,
the height of the upper overlap region = half of the maximum height + 1 pixel,
which ensures that the overlapping area is as small as possible.
Further, in step S3, the original image is divided into A rows and B columns of sub-images:
B = ceil((N + 2*X) / n);
A = ceil((M + 2*Y) / m);
where ceil() denotes rounding up, N is the number of pixel columns of the original image, M is the number of pixel rows of the original image, n is the number of pixel columns of the maximum image size that the deep learning model can process, m is the number of pixel rows of that maximum image size, X is the sum of the widths of the left overlap region, the left offset fault-tolerant region, the right overlap region and the right offset fault-tolerant region, and Y is the sum of the heights of the upper overlap region, the upper offset fault-tolerant region, the lower overlap region and the lower offset fault-tolerant region.
In some embodiments, the original image is segmented only in the transverse direction, and any two adjacent sub-images obtained by the segmentation comprise, according to their relative positions, a left sub-image and a right sub-image.
In some embodiments, the original image is divided into two adjacent sub-images which are divided longitudinally, and the two sub-images comprise an upper sub-image and a lower sub-image according to the relative position relation.
In some embodiments, the overlapping segmentation of the original image includes both transverse and longitudinal segmentation, and any four mutually adjacent sub-images comprise, according to their relative positions, an upper left sub-image, a lower left sub-image, an upper right sub-image and a lower right sub-image.
Further, in step S4, the sub-images are input into a deep learning model obtained by training in advance, and the model outputs the target recognition frames.
Further, in step S5, the subgraph is traversed, and the target count is performed according to the number of the target recognition frames, where the deduplication process is performed based on the overlapping area, and specifically includes the following three application scenarios.
In the case where the original image is segmented only transversely into adjacent sub-images, the two sub-images comprise a left sub-image and a right sub-image according to their relative positions, and performing de-duplication according to the geometric center position of the target recognition frame located in the overlapping area comprises the following steps (a minimal code sketch of this decision logic is given after the list):
s51, if the geometric center of the target identification frame is located in a left overlapping area or a left offset fault-tolerant area in the left sub-graph, the corresponding target identification frame is counted into the total number of targets;
S52, if the geometric center of the target identification frame is located in the right overlapping area or the right offset fault-tolerant area in the left sub-graph, the corresponding target identification frame does not count the total number of targets;
s53, if the geometric center of the target identification frame is located in the left overlapping area in the right sub-graph, the corresponding target identification frame does not count the total number of targets;
S54, if the geometric center of the target identification frame is located in the right overlapping area in the right sub-graph, the corresponding target identification frame is counted into the total number of targets;
S55, if the geometric center of the target identification frame is located in the left offset fault-tolerant area or the right offset fault-tolerant area in the right sub-image, detecting the position of the geometric center of the target identification frame in the left sub-image, if the geometric center of the target identification frame is located in the left offset fault-tolerant area in the left sub-image, counting the target identification frame into the total number of targets, and if the geometric center of the target identification frame is located in the right offset fault-tolerant area in the left sub-image, counting the target identification frame out of the total number of targets.
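A minimal sketch of this decision logic, mirroring steps S51 to S55 as stated above; the band names and the "interior" case for frames outside the overlapping area are illustrative labels, not terms from the disclosure:

```python
def count_in_left_right_pair(subimage, band, band_in_left=None):
    """Decide whether one recognition frame is added to the target total.

    subimage:     "left" or "right" -- the sub-image in which the frame was detected.
    band:         band of the overlapping area containing the frame's geometric center
                  ("left_overlap", "left_fault_tolerant", "right_fault_tolerant",
                   "right_overlap"), or "interior" if the center lies outside the overlap.
    band_in_left: for S55, the band the same center falls into in the left sub-image.
    """
    if band == "interior":                 # not in the overlapping area: always counted
        return True
    if subimage == "left":
        # S51 / S52: count only the left overlap and left fault-tolerant bands
        return band in ("left_overlap", "left_fault_tolerant")
    # right sub-image
    if band == "left_overlap":
        return False                       # S53
    if band == "right_overlap":
        return True                        # S54
    # S55: center lies in a fault-tolerant band; decide from its position in the left sub-image
    return band_in_left == "left_fault_tolerant"
```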
Further, the original image is segmented with overlap in the longitudinal direction, and the sub-images obtained by the segmentation comprise an upper sub-image and a lower sub-image according to their relative positions;
performing de-duplication processing according to the geometric center position of the target identification frame in the overlapping area, including:
S5A, if the geometric center of the target identification frame is located in an upper overlapping area or an upper offset fault-tolerant area in the upper subgraph, counting the corresponding target identification frame into the total number of targets;
S5B, if the geometric center of the target identification frame is located in a lower overlapping area or a lower offset fault-tolerant area in the upper subgraph, the corresponding target identification frame does not count the total number of targets;
S5C, if the geometric center of the target identification frame is located in the upper overlapping area in the lower subgraph, the corresponding target identification frame does not count the total number of targets;
S5D, if the geometric center of the target identification frame is located in the lower overlapping area in the lower subgraph, the corresponding target identification frame is counted into the total number of targets;
S5E, if the geometric center of the target identification frame is located in the upper offset fault-tolerant area or the lower offset fault-tolerant area in the lower sub-image, detecting the position of the geometric center of the target identification frame in the upper sub-image, if the geometric center of the target identification frame is located in the upper offset fault-tolerant area in the upper sub-image, counting the target identification frame into the total number of targets, and if the geometric center of the target identification frame is located in the lower offset fault-tolerant area in the upper sub-image, counting the target identification frame out of the total number of targets.
Further, referring to fig. 7, the overlapping segmentation of the original image includes both a lateral division and a longitudinal division, and the sub-images obtained by the division include an upper left sub-image A, an upper right sub-image B, a lower left sub-image C and a lower right sub-image D according to their relative positional relationship;
In each of the upper left sub-image A, the upper right sub-image B, the lower left sub-image C and the lower right sub-image D, the overlapping portion of the left overlapping region and the upper overlapping region forms a first region 1, the overlapping portion of the left offset fault-tolerant region and the upper overlapping region forms a second region 2, the overlapping portion of the right offset fault-tolerant region and the upper overlapping region forms a third region 3, the overlapping portion of the right overlapping region and the upper overlapping region forms a fourth region 4, the overlapping portion of the left overlapping region and the upper offset fault-tolerant region forms a fifth region 5, the overlapping portion of the left offset fault-tolerant region and the upper offset fault-tolerant region forms a sixth region 6, the overlapping portion of the right offset fault-tolerant region and the upper offset fault-tolerant region forms a seventh region 7, the overlapping portion of the right overlapping region and the upper offset fault-tolerant region forms an eighth region 8, the overlapping portion of the left overlapping region and the lower offset fault-tolerant region forms a ninth region 9, the overlapping portion of the left offset fault-tolerant region and the lower offset fault-tolerant region forms a tenth region 10, the overlapping portion of the right offset fault-tolerant region and the lower offset fault-tolerant region forms an eleventh region 11, the overlapping portion of the right overlapping region and the lower offset fault-tolerant region forms a twelfth region 12, the overlapping portion of the left overlapping region and the lower overlapping region forms a thirteenth region 13, the overlapping portion of the left offset fault-tolerant region and the lower overlapping region forms a fourteenth region 14, the overlapping portion of the right offset fault-tolerant region and the lower overlapping region forms a fifteenth region 15, and the overlapping portion of the right overlapping region and the lower overlapping region forms a sixteenth region 16.
Further, performing a deduplication process according to a geometric center position of the target recognition frame located in the overlapping region, including:
If the geometric center of the target identification frame is positioned in the left overlapping area, the left offset fault-tolerant area, the upper overlapping area or the upper offset fault-tolerant area in the left upper sub-graph, the corresponding target identification frame is counted into the total number of targets, and if the geometric center of the target identification frame is positioned in the right overlapping area, the right offset fault-tolerant area, the lower overlapping area or the lower offset fault-tolerant area in the left upper sub-graph, the corresponding target identification frame is not counted into the total number of targets;
And if the geometric center of the target identification frame is positioned in the right overlapping area, the right offset fault-tolerant area, the lower overlapping area or the lower offset fault-tolerant area in the left lower subgraph, the corresponding target identification frame is not counted into the target total number.
Further, performing a deduplication process according to a geometric center position of the target recognition frame located in the overlapping region, including:
If the geometric center of the target recognition frame is located in the first area 1, the second area 2, the fifth area 5 or the sixth area 6 in the upper left sub-graph, the corresponding target recognition frame is counted into the total number of targets, and if the geometric center of the target recognition frame is located in the third area 3, the fourth area 4, the seventh area 7, the eighth area 8, the ninth area 9, the tenth area 10, the eleventh area 11, the twelfth area 12, the thirteenth area 13, the fourteenth area 14, the fifteenth area 15 or the sixteenth area 16 in the upper left sub-graph, the corresponding target recognition frame is counted out of the total number of targets;
If the geometric center of the target recognition frame is located in the fourth region 4 or the eighth region 8 of the upper right sub-image, the corresponding frame is counted into the total number of targets. If the geometric center is located in the second region 2, the third region 3, the sixth region 6 or the seventh region 7 of the upper right sub-image, the position of the geometric center of the corresponding frame in the upper left sub-image is determined: if that frame has already been counted in the upper left sub-image, it is not counted again; if it has not been counted in the upper left sub-image, it is counted into the total number of targets. If the geometric center of the frame is located in the first region 1, the fifth region 5, the ninth region 9, the tenth region 10, the eleventh region 11, the twelfth region 12, the thirteenth region 13, the fourteenth region 14, the fifteenth region 15 or the sixteenth region 16 of the upper right sub-image, the corresponding frame is not counted into the total number of targets.
Further, performing a deduplication process according to a geometric center position of the target recognition frame located in the overlapping region, including:
If the geometric center of the target recognition frame is located in the thirteenth region 13 or the fourteenth region 14 of the lower left sub-image, the corresponding frame is counted into the total number of targets. If the geometric center is located in the fifth region 5, the sixth region 6, the ninth region 9 or the tenth region 10 of the lower left sub-image, the position of the geometric center of the corresponding frame in the upper left and upper right sub-images is determined: if the frame has already been counted in the upper left or upper right sub-image, it is not counted again; if it has not been counted in either, it is counted into the total number of targets. If the geometric center of the frame is located in the first region 1, the second region 2, the third region 3, the fourth region 4, the seventh region 7, the eighth region 8, the eleventh region 11, the twelfth region 12, the fifteenth region 15 or the sixteenth region 16 of the lower left sub-image, the corresponding frame is not counted into the total number of targets;
If the geometric center of the target recognition frame is located in the sixteenth region 16 of the lower right sub-image, the corresponding frame is counted into the total number of targets. If the geometric center is located in the sixth region 6, the seventh region 7, the eighth region 8, the tenth region 10, the eleventh region 11, the twelfth region 12, the fourteenth region 14 or the fifteenth region 15 of the lower right sub-image, the position of the geometric center of the corresponding frame in the upper left, upper right and lower left sub-images is determined: if the frame has already been counted in the upper left, upper right or lower left sub-image, it is not counted again; if it has not been counted in any of them, it is counted into the total number of targets. If the geometric center of the frame is located in the first region 1, the second region 2, the third region 3, the fourth region 4, the fifth region 5, the ninth region 9 or the thirteenth region 13 of the lower right sub-image, the corresponding frame is not counted into the total number of targets.
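The per-position rules of the four-sub-image case can be written down directly from the region numbers listed above (with the apparent typo "sixth area 4" read as region 6); a minimal sketch, where the position labels are illustrative:

```python
# Region-number rule sets per sub-image position, as listed above:
# (regions counted unconditionally, regions that require checking earlier sub-images).
RULES = {
    "upper_left":  ({1, 2, 5, 6},  set()),
    "upper_right": ({4, 8},        {2, 3, 6, 7}),
    "lower_left":  ({13, 14},      {5, 6, 9, 10}),
    "lower_right": ({16},          {6, 7, 8, 10, 11, 12, 14, 15}),
}

def decide(position, region):
    """Return 'count', 'check' (look up earlier sub-images) or 'skip' for a frame
    whose geometric center lies in the given numbered region of the given sub-image."""
    count_set, check_set = RULES[position]
    if region in count_set:
        return "count"
    if region in check_set:
        return "check"
    return "skip"
```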
Specific methods of operation are provided below.
First, the overlap area size is calculated:
It is known that the size of the original image is M rows and N columns, the maximum width of a target identification frame is N_obj_max columns, the maximum height of a target identification frame is M_obj_max rows, the maximum offset of the identification frames of the same target between different sub-graphs (up-down or left-right) is delta (unit: pixels), and the image size required by the deep learning model is m rows and n columns.
Set the width x_l of the left overlap region, which must satisfy x_l > N_obj_max/2, and the left offset fault-tolerance width x_ld, which must satisfy x_ld > delta. The right overlap width is generally set to x_r = x_l and the right offset fault-tolerance width to x_rd = x_ld, so the total lateral overlap width is X = 2*x_l + 2*x_ld, with X > N_obj_max + 2*delta.
Set the height y_u of the upper overlap region, which must satisfy y_u > M_obj_max/2, and the upper offset fault-tolerance height y_ud, which must satisfy y_ud > delta. The lower overlap height is generally set to y_b = y_u and the lower offset fault-tolerance height to y_bd = y_ud, so the total longitudinal overlap height is Y = 2*y_u + 2*y_ud, with Y > M_obj_max + 2*delta.
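A minimal sketch of the overlap sizing above; the "+ 1" margins are one possible choice that satisfies the strict inequalities and keeps the overlapping area as small as possible:

```python
def overlap_sizes(N_obj_max, M_obj_max, delta):
    """Smallest overlap dimensions satisfying x_l > N_obj_max/2, y_u > M_obj_max/2,
    x_ld > delta and y_ud > delta; the right/lower regions mirror the left/upper ones."""
    x_l = N_obj_max // 2 + 1     # left (= right) overlap width
    x_ld = delta + 1             # left (= right) offset fault-tolerance width
    y_u = M_obj_max // 2 + 1     # upper (= lower) overlap height
    y_ud = delta + 1             # upper (= lower) offset fault-tolerance height
    X = 2 * x_l + 2 * x_ld       # total lateral overlap width,      X > N_obj_max + 2*delta
    Y = 2 * y_u + 2 * y_ud       # total longitudinal overlap height, Y > M_obj_max + 2*delta
    return x_l, x_ld, y_u, y_ud, X, Y
```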
Further, the original image is segmented:
Given that the size of the original image is M rows and N columns and that the maximum image size that the deep learning model can process is m rows and n columns, the original image can be divided into A rows and B columns of sub-graphs, where:
B=ceil((N+2*X)/n)
A=ceil((M+2*Y)/m)
Where ceil () represents a round-up.
The original image is uniformly divided into A rows and B columns, and the sub-graphs are named:
img0_0, img0_1, ..., img0_B-1
img1_0, img1_1, ..., img1_B-1
...
imgA-1_0, imgA-1_1, ..., imgA-1_B-1
img0_0, img0_1, ..., img0_B-2, img1_0, img1_1, ..., img1_B-2, ..., imgA-2_0, imgA-2_1, ..., imgA-2_B-2 have size ceil(M/A) rows × ceil(N/B) columns; img0_B-1, img1_B-1, ..., imgA-2_B-1 have size ceil(M/A) rows × N-ceil(N/B)*(B-1) columns; and imgA-1_0, imgA-1_1, ..., imgA-1_B-1 have size M-ceil(M/A)*(A-1) rows × ceil(N/B) columns.
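A minimal sketch reproducing the grid sizing and naming convention above (M, N are the original image size, m, n the deep-learning input size, and X, Y the overlap sums):

```python
import math

def plan_subgraphs(M, N, m, n, X, Y):
    """Number of sub-graph rows/columns and the nominal sub-graph sizes described above."""
    B = math.ceil((N + 2 * X) / n)            # number of sub-graph columns
    A = math.ceil((M + 2 * Y) / m)            # number of sub-graph rows
    h, w = math.ceil(M / A), math.ceil(N / B)
    sizes = {}
    for a in range(A):
        for b in range(B):
            rows = h if a < A - 1 else M - h * (A - 1)   # last row of sub-graphs is shorter
            cols = w if b < B - 1 else N - w * (B - 1)   # last column of sub-graphs is narrower
            sizes[f"img{a}_{b}"] = (rows, cols)
    return A, B, sizes
```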
Further, after the deep learning object identification, object counting is performed, and the counting flow is as follows:
After the target recognition frames have been marked on all sub-graphs by the trained model, counting of the targets begins. The target number NUM is initialized to 0, and a "counted target array" Arr is created, which is initially empty.
All sub-graphs are then traversed and the number of targets is counted.
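The counting state and the duplicate-matching test used throughout the flow below can be sketched as follows; the relative size tolerance size_tol is an assumed value, since the description only requires the widths and heights to be "close":

```python
NUM = 0        # running target count
Arr = []       # "counted target array": entries of (sub-graph name, (row, col), (width, height))

def matches_counted_target(expected_row, expected_col, w, h, subgraph_name, delta, size_tol=0.2):
    """True if Arr already holds a frame of similar size in subgraph_name whose center
    lies within +/- delta of the expected (row, col) position."""
    for name, (r, c), (wo, ho) in Arr:
        if name != subgraph_name:
            continue
        if (abs(r - expected_row) <= delta and abs(c - expected_col) <= delta
                and abs(wo - w) <= size_tol * w and abs(ho - h) <= size_tol * h):
            return True
    return False
```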
If the sub-graph is img0_0, its size is ceil(M/A) rows × ceil(N/B) columns. Let the sub-graph row numbers be 0, 1, 2, ..., ceil(M/A)-1 and the column numbers 0, 1, 2, ..., ceil(N/B)-1, and traverse all target identification frames on the sub-graph, assuming the geometric center of the i-th frame has row number x_i and column number y_i; then:
Case 1: if x_i <= ceil(M/A)-1-x_l-x_ld and y_i <= ceil(N/B)-1-y_u-y_ud, the width w_i and height h_i of the frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (img0_0, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 2: otherwise, the frame is not counted, NUM = NUM.
If the sub-graph is img0_b (b=1, 2, ..., B-2), its size is ceil(M/A) rows × ceil(N/B) columns. Let the sub-graph row numbers be 0, 1, 2, ..., ceil(M/A)-1 and the column numbers 0, 1, 2, ..., ceil(N/B)-1, and traverse all target identification frames on the sub-graph, denoting the i-th frame OBJ_i, whose center row number is x_i and column number y_i; then:
Case 1: if x_i <= ceil(M/A)-1-x_l-x_ld and y_u+2*y_ud-1 <= y_i <= ceil(N/B)-1-y_u-y_ud, the width w_i and height h_i of the frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (img0_b, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 2: if x_i <= ceil(M/A)-1-x_l-x_ld and y_u-1 <= y_i < y_u+2*y_ud-1, the width w_i and height h_i of the frame are calculated first; then the row number interval [x_i-delta, x_i+delta] and the column number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the frame's geometric center in img0_b-1 are calculated.
The entries whose 0-th column is "img0_b-1" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i-delta <= x_o <= x_i+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the width w_i and height h_i of the OBJ_i frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (img0_b, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 3: otherwise, the frame is not counted, NUM = NUM.
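Building on the Arr/NUM sketch above, Case 2 for img0_b can be sketched as follows; the expected column in img0_b-1 is y_i + ceil(N/B) - 1 - Y, exactly as stated in the description, and the row is unchanged:

```python
import math

def case2_img0_b(x_i, y_i, w_i, h_i, b, N, B, Y, delta):
    """Count the frame only if no matching frame was already counted in img0_b-1."""
    global NUM
    expected_row = x_i
    expected_col = y_i + math.ceil(N / B) - 1 - Y        # position of the same target in img0_b-1
    if matches_counted_target(expected_row, expected_col, w_i, h_i, f"img0_{b-1}", delta):
        return 0                                         # already counted in the left neighbour
    Arr.append((f"img0_{b}", (x_i, y_i), (w_i, h_i)))    # store (sub-graph, (row, col), (width, height))
    NUM += 1
    return 1
```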
If the sub-graph is img0_B-1, its size is ceil(M/A) rows × N-ceil(N/B)*(B-1) columns. Let the sub-graph row numbers be 0, 1, 2, ..., ceil(M/A)-1 and the column numbers 0, 1, 2, ..., N-ceil(N/B)*(B-1)-1, and traverse all target identification frames on the sub-graph, denoting the i-th frame OBJ_i, whose center row number is x_i and column number y_i; then:
Case 1: if x_i <= ceil(M/A)-1-x_l-x_ld and y_i >= y_u+2*y_ud-1, the width w_i and height h_i of the frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (img0_B-1, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 2: if x_i <= ceil(M/A)-1-x_l-x_ld and y_u-1 <= y_i < y_u+2*y_ud-1, the width w_i and height h_i of the frame are calculated first; then the row number interval [x_i-delta, x_i+delta] and the column number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the frame's geometric center in img0_B-2 are calculated.
The entries whose 0-th column is "img0_B-2" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i-delta <= x_o <= x_i+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the width w_i and height h_i of the OBJ_i frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (img0_B-1, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 3: otherwise, the frame is not counted, NUM = NUM.
If the sub-graph is imga_0 (a=1, 2, ..., A-2), its size is ceil(M/A) rows × ceil(N/B) columns. Let the sub-graph row numbers be 0, 1, 2, ..., ceil(M/A)-1 and the column numbers 0, 1, 2, ..., ceil(N/B)-1, and traverse all target identification frames on the sub-graph, denoting the i-th frame OBJ_i, whose center row number is x_i and column number y_i; then:
Case 1: if x_l+2*x_ld-1 <= x_i <= ceil(M/A)-1-x_l-x_ld and y_i <= ceil(N/B)-1-y_u-y_ud, the width w_i and height h_i of the frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (imga_0, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1;
Case 2: if x_l-1 <= x_i < x_l+2*x_ld-1 and y_i <= ceil(N/B)-1-y_u-y_ud, the width w_i and height h_i of the frame are calculated first; then the row number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column number interval [y_i-delta, y_i+delta] of the frame's geometric center in imga-1_0 are calculated.
The entries whose 0-th column is "imga-1_0" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i-delta <= y_o <= y_i+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the width w_i and height h_i of the OBJ_i frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (imga_0, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 3: otherwise, the frame is not counted, NUM = NUM.
If the sub-graph is imga_b (a=1, 2, ..., A-2; b=1, 2, ..., B-2), its size is ceil(M/A) rows × ceil(N/B) columns. Let the sub-graph row numbers be 0, 1, 2, ..., ceil(M/A)-1 and the column numbers 0, 1, 2, ..., ceil(N/B)-1, and traverse all target identification frames on the sub-graph, denoting the i-th frame OBJ_i, whose center row number is x_i and column number y_i; then:
Case 1: if x_l+2*x_ld-1 <= x_i <= ceil(M/A)-1-x_l-x_ld and y_u+2*y_ud-1 <= y_i <= ceil(N/B)-1-y_u-y_ud, the width w_i and height h_i of the frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (imga_b, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1;
Case 2: if x_l-1 <= x_i < x_l+2*x_ld-1 and y_u+2*y_ud-1 <= y_i <= ceil(N/B), or x_l+2*x_ld-1 <= x_i <= ceil(M/A)-1-x_l-x_ld and y_u-1 <= y_i < y_u+2*y_ud-1, the width w_i and height h_i of the frame are calculated first; then the row number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the frame's geometric center in imga-1_b-1 are calculated.
The entries whose 0-th column is "imga-1_b-1" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the row number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column number interval [y_i-delta, y_i+delta] of the frame's geometric center in imga-1_b are calculated;
The entries whose 0-th column is "imga-1_b" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i-delta <= y_o <= y_i+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the row number interval [x_i-delta, x_i+delta] and the column number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the frame's geometric center in imga_b-1 are calculated.
The entries whose 0-th column is "imga_b-1" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i-delta <= x_o <= x_i+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the width w_i and height h_i of the OBJ_i frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (imga_b, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 3: otherwise, the frame is not counted, NUM = NUM.
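For the interior sub-graph imga_b, Case 2 checks up to three previously processed neighbours in the order used above (diagonal, upper, left); a minimal sketch of the expected center positions, using the row and column offsets exactly as stated in the description:

```python
import math

def neighbours_to_check(a, b, x_i, y_i, M, N, A, B, X, Y):
    """Neighbouring sub-graphs of img{a}_{b} and the expected center position of the
    same target in each, in the order they are checked above."""
    row_up = x_i + math.ceil(M / A) - 1 - X      # row when moving one sub-graph up, as stated above
    col_left = y_i + math.ceil(N / B) - 1 - Y    # column when moving one sub-graph left, as stated above
    return [
        (f"img{a-1}_{b-1}", (row_up, col_left)),   # diagonal neighbour, checked first
        (f"img{a-1}_{b}",   (row_up, y_i)),        # upper neighbour, checked second
        (f"img{a}_{b-1}",   (x_i, col_left)),      # left neighbour, checked last
    ]
```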
If the sub-graph is imga_B-1 (a=1, 2, ..., A-2), its size is ceil(M/A) rows × N-ceil(N/B)*(B-1) columns. Let the sub-graph row numbers be 0, 1, 2, ..., ceil(M/A)-1 and the column numbers 0, 1, 2, ..., N-ceil(N/B)*(B-1)-1, and traverse all target identification frames on the sub-graph, denoting the i-th frame OBJ_i, whose center row number is x_i and column number y_i; then:
Case 1: if x_l+2*x_ld-1 <= x_i <= ceil(M/A)-1-x_l-x_ld and y_i >= y_u+2*y_ud-1, the width w_i and height h_i of the frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (imga_B-1, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1;
Case 2: if x_l-1 <= x_i < x_l+2*x_ld-1 and y_i >= y_u+2*y_ud-1, or x_l+2*x_ld-1 <= x_i <= ceil(M/A)-1-x_l-x_ld and y_u-1 <= y_i < y_u+2*y_ud-1, the width w_i and height h_i of the frame are calculated first; then the row number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the frame's geometric center in imga-1_B-2 are calculated.
The entries whose 0-th column is "imga-1_B-2" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the row number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column number interval [y_i-delta, y_i+delta] of the frame's geometric center in imga-1_B-1 are calculated;
The entries whose 0-th column is "imga-1_B-1" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i-delta <= y_o <= y_i+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the row number interval [x_i-delta, x_i+delta] and the column number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the frame's geometric center in imga_B-2 are calculated.
The entries whose 0-th column is "imga_B-2" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i-delta <= x_o <= x_i+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the width w_i and height h_i of the OBJ_i frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (imga_B-1, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 3: otherwise, the frame is not counted, NUM = NUM.
If the sub-graph is imgA-1_0, its size is M-ceil(M/A)*(A-1) rows × ceil(N/B) columns. Let the sub-graph row numbers be 0, 1, 2, ..., M-ceil(M/A)*(A-1)-1 and the column numbers 0, 1, 2, ..., ceil(N/B)-1, and traverse all target identification frames on the sub-graph, denoting the i-th frame OBJ_i, whose center row number is x_i and column number y_i; then:
Case 1: if x_i >= x_l+2*x_ld-1 and y_i <= ceil(N/B)-1-y_u-y_ud, the width w_i and height h_i of the frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (imgA-1_0, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1;
Case 2: if x_l-1 <= x_i < x_l+2*x_ld-1 and y_i <= ceil(N/B)-1-y_u-y_ud, the width w_i and height h_i of the frame are calculated first; then the row number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column number interval [y_i-delta, y_i+delta] of the frame's geometric center in imgA-2_0 are calculated.
The entries whose 0-th column is "imgA-2_0" in the Arr array are traversed; if the center row number x_o of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i-delta <= y_o <= y_i+delta, and the width and height of OBJ_o are close to w_i and h_i, OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted, and NUM = NUM;
Otherwise, the width w_i and height h_i of the OBJ_i frame are calculated, the three pieces of information of the frame (the sub-graph it lies in, (row number, column number), (width, height)) are stored in the Arr array, i.e. Arr[NUM] = (imgA-1_0, (x_i, y_i), (w_i, h_i)), and the frame is counted into the total, i.e. NUM = NUM + 1.
Case 3: otherwise, the frame is not counted, NUM = NUM.
If the sub-graph is img_A-1_b (b = 1, 2, ..., B-2), its size is M-ceil(M/A)*(A-1) rows by ceil(N/B) columns. Let the sub-graph row numbers be 0, 1, 2, ..., M-ceil(M/A)*(A-1)-1 and the column numbers be 0, 1, 2, ..., ceil(N/B)-1, and traverse all target identification frames on the sub-graph; call the i-th identification frame OBJ_i, with the row number of the target's centre point being x_i and the column number y_i. Then:
Case 1: if x_i >= x_l+2*x_ld-1 and y_u+2*y_ud-1 <= y_i <= ceil(N/B)-1-y_u-y_ud, calculate the width w_i and height h_i of the identification frame, store the three pieces of information (sub-graph where the identification frame is located, (row number, column number), (width, height)) into the Arr array, that is, Arr[NUM] = (img_A-1_b, (x_i, y_i), (w_i, h_i)), and count the identification frame into the total, that is, NUM = NUM + 1;
Case 2: if x_l-1 <= x_i < x_l+2*x_ld-1 and y_u+2*y_ud-1 <= y_i <= ceil(N/B)-1-y_u-y_ud, or x_i >= x_l+2*x_ld-1 and y_u-1 <= y_i < y_u+2*y_ud-1, first calculate the width w_i and height h_i of the identification frame, then calculate the row-number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column-number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the identification-frame centre of the target in img_A-2_b-1.
Traverse the entries of the Arr array whose sub-graph field (column 0) is img_A-2_b-1; if the row number x_o of the identification-frame centre of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, then OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted into the total, and NUM = NUM;
Otherwise, calculate the row-number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column-number interval [y_i-delta, y_i+delta] of the identification-frame centre of the target in img_A-2_b;
Traverse the entries of the Arr array whose sub-graph field (column 0) is img_A-2_b; if the row number x_o of the identification-frame centre of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i-delta <= y_o <= y_i+delta, and the width and height of OBJ_o are close to w_i and h_i, then OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted into the total, and NUM = NUM;
Otherwise, calculate the row-number interval [x_i-delta, x_i+delta] and the column-number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the identification-frame centre of the target in img_A-1_b-1.
Traverse the entries of the Arr array whose sub-graph field (column 0) is img_A-1_b-1; if the row number x_o of the identification-frame centre of some target OBJ_o satisfies x_i-delta <= x_o <= x_i+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, then OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted into the total, and NUM = NUM;
Otherwise, calculate the width w_i and height h_i of the identification frame of OBJ_i, store the three pieces of information into the Arr array, that is, Arr[NUM] = (img_A-1_b, (x_i, y_i), (w_i, h_i)), and count the identification frame into the total, that is, NUM = NUM + 1.
Case 3: otherwise, the identification frame is not counted into the total, NUM = NUM.
If the sub-graph is img_A-1_B-1, its size is M-ceil(M/A)*(A-1) rows by N-ceil(N/B)*(B-1) columns. Let the sub-graph row numbers be 0, 1, 2, ..., M-ceil(M/A)*(A-1)-1 and the column numbers be 0, 1, 2, ..., N-ceil(N/B)*(B-1)-1, and traverse all target identification frames on the sub-graph; call the i-th identification frame OBJ_i, with the row number of the target's centre point being x_i and the column number y_i. Then:
Case 1: if x_i >= x_l+2*x_ld-1 and y_i >= y_u+2*y_ud-1, calculate the width w_i and height h_i of the identification frame, store the three pieces of information (sub-graph where the identification frame is located, (row number, column number), (width, height)) into the Arr array, that is, Arr[NUM] = (img_A-1_B-1, (x_i, y_i), (w_i, h_i)), and count the identification frame into the total, that is, NUM = NUM + 1;
Case 2: if x_l-1 <= x_i < x_l+2*x_ld-1 and y_i >= y_u+2*y_ud-1, or x_i >= x_l+2*x_ld-1 and y_u-1 <= y_i < y_u+2*y_ud-1, first calculate the width w_i and height h_i of the identification frame, then calculate the row-number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column-number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the identification-frame centre of the target in img_A-2_B-2.
Traverse the entries of the Arr array whose sub-graph field (column 0) is img_A-2_B-2; if the row number x_o of the identification-frame centre of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, then OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted into the total, and NUM = NUM;
Otherwise, calculate the row-number interval [x_i+ceil(M/A)-1-X-delta, x_i+ceil(M/A)-1-X+delta] and the column-number interval [y_i-delta, y_i+delta] of the identification-frame centre of the target in img_A-2_B-1;
Traverse the entries of the Arr array whose sub-graph field (column 0) is img_A-2_B-1; if the row number x_o of the identification-frame centre of some target OBJ_o satisfies x_i+ceil(M/A)-1-X-delta <= x_o <= x_i+ceil(M/A)-1-X+delta, its column number y_o satisfies y_i-delta <= y_o <= y_i+delta, and the width and height of OBJ_o are close to w_i and h_i, then OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted into the total, and NUM = NUM;
Otherwise, calculate the row-number interval [x_i-delta, x_i+delta] and the column-number interval [y_i+ceil(N/B)-1-Y-delta, y_i+ceil(N/B)-1-Y+delta] of the identification-frame centre of the target in img_A-1_B-2.
Traverse the entries of the Arr array whose sub-graph field (column 0) is img_A-1_B-2; if the row number x_o of the identification-frame centre of some target OBJ_o satisfies x_i-delta <= x_o <= x_i+delta, its column number y_o satisfies y_i+ceil(N/B)-1-Y-delta <= y_o <= y_i+ceil(N/B)-1-Y+delta, and the width and height of OBJ_o are close to w_i and h_i, then OBJ_o is regarded as the same target as OBJ_i, OBJ_i is not counted into the total, and NUM = NUM;
Otherwise, calculate the width w_i and height h_i of the identification frame of OBJ_i, store the three pieces of information into the Arr array, that is, Arr[NUM] = (img_A-1_B-1, (x_i, y_i), (w_i, h_i)), and count the identification frame into the total, that is, NUM = NUM + 1.
Case 3: otherwise, the identification frame is not counted into the total, NUM = NUM.
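Taken together, the per-sub-graph cases above reduce to a single traversal pattern: classify each identification frame by the zone its geometric centre falls in, skip frames in the trailing overlap bands, check the neighbouring sub-graphs for frames in the fault-tolerant bands, and count everything else. A rough sketch of that outer loop follows (an illustrative reconstruction under assumptions, reusing find_duplicate from the sketch above; detect, zone_of and neighbours_to_check are placeholder callables, not interfaces defined by the patent):

```python
def count_targets(subgraphs, detect, zone_of, neighbours_to_check, delta):
    """Illustrative outer loop of the de-duplicated count.

    subgraphs                      : list of (name, image), visited in row-major order
    detect(image)                  : deep-learning detector, returns [(x, y, w, h), ...]
    zone_of(name, x, y)            : 'count', 'check' or 'skip' for a centre at (row x, column y)
    neighbours_to_check(name, x, y): [(neighbour_name, mapped_x, mapped_y), ...]
    """
    arr, num = [], 0                             # the Arr array and the running total NUM
    for name, image in subgraphs:
        for x, y, w, h in detect(image):
            zone = zone_of(name, x, y)
            if zone == "skip":                   # Case 3: centre in a trailing overlap band
                continue
            if zone == "check":                  # Case 2: centre in a fault-tolerant band
                already_counted = any(
                    find_duplicate(arr, n, xm, ym, delta, w, h)
                    for n, xm, ym in neighbours_to_check(name, x, y)
                )
                if already_counted:
                    continue                     # NUM = NUM, frame not recorded again
            arr.append((name, (x, y), (w, h)))   # Case 1, or Case 2 with no earlier match
            num += 1                             # NUM = NUM + 1
    return num, arr
```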
Referring to fig. 8, in some embodiments, there is also provided an image segmentation and deep learning-based object recognition counting apparatus applied to the above method, including:
The acquisition module is used for acquiring an original image to be identified and acquiring the size of the original image;
the size determining module is used for determining the size of the overlapped area according to the maximum size and the maximum offset of the target identification frame;
The segmentation module is used for carrying out overlapped segmentation on the original image according to the maximum image size which can be processed by the deep learning, the size of the original image and the size of the overlapped area to obtain at least two subgraphs, wherein an overlapped area exists between every two adjacent subgraphs;
the target recognition module is used for carrying out target recognition on the at least two sub-graphs based on deep learning to obtain a target recognition frame;
and the de-duplication counting module is used for traversing the subgraph, counting targets according to the number of the target identification frames, and performing de-duplication processing according to the geometric center position of the target identification frames positioned in the overlapping area.
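Read as software, the apparatus is essentially the five modules above chained into one pipeline. The sketch below shows one possible wiring; the function names, signatures and types are assumptions for illustration, not the structure mandated by this embodiment:

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (centre row, centre column, width, height) of an identification frame

def run_counting_pipeline(
    acquire: Callable[[], Any],                                        # acquisition module
    determine_overlap: Callable[[Tuple[int, int], int], Any],          # size-determining module
    split_with_overlap: Callable[[Any, Tuple[int, int], Any], List[Tuple[str, Any]]],   # segmentation module
    detect: Callable[[Any], List[Box]],                                # target recognition module
    dedup_count: Callable[[List[Tuple[str, List[Box]]], Any], int],    # de-duplication counting module
    max_box_size: Tuple[int, int], max_offset: int, max_input_size: Tuple[int, int],
) -> int:
    image = acquire()                                                  # collect the original image
    overlap = determine_overlap(max_box_size, max_offset)              # overlap sized from max frame size and max offset
    subgraphs = split_with_overlap(image, max_input_size, overlap)     # adjacent sub-graphs share an overlap region
    detections = [(name, detect(sub)) for name, sub in subgraphs]      # run the detector on every sub-graph
    return dedup_count(detections, overlap)                            # count with centre-based de-duplication
```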
The method and apparatus for target recognition and counting based on image segmentation and deep learning provided by this embodiment have at least the following beneficial effects:
(1) When counting targets, the method accounts for the repeated recognition of targets lying on a segmentation boundary that is inherent to sub-graph-based deep learning. The de-duplication algorithm guarantees that each target is counted only once and allows for the offset of the identification frame of the same target across two or more sub-graphs, which improves the accuracy of target recognition and counting;
(2) The counting and de-duplication method requires neither image stitching nor additional training; whether a target has already been counted is judged from the centre position of its identification frame, which is simple to apply, greatly speeds up de-duplication, and makes accurate counting efficient;
(3) By setting the overlap region, every target to be counted is guaranteed to appear un-segmented in at least one sub-graph, which avoids missed counts caused by the low accuracy of deep learning on incomplete targets and further improves the stability of target recognition and counting;
(4) The de-duplication algorithm is simple in principle, easy to reproduce, applicable to de-duplicated counting of targets of any shape and size, and highly general.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. The object recognition counting method based on image segmentation and deep learning is characterized by comprising the following steps of:
collecting an original image to be identified, and acquiring the size of the original image;
determining the size of an overlapping area according to the maximum size and the maximum offset of the target identification frame;
According to the maximum image size which can be processed by the deep learning, the size of the original image and the size of the overlapping area, carrying out overlapping segmentation on the original image to obtain at least two subgraphs, wherein an overlapping area exists between every two adjacent subgraphs;
Performing target recognition on the at least two sub-graphs based on deep learning to obtain a target recognition frame;
traversing the subgraph, counting targets according to the number of the target identification frames, and performing de-duplication processing according to the geometric center position of the target identification frames positioned in the overlapping area;
The maximum size of the target recognition frame comprises a maximum width and a maximum height;
The overlapping area is divided, according to positional relation, into a left overlapping area, a left offset fault-tolerant area, a right overlapping area and a right offset fault-tolerant area; and/or an upper overlapping area, an upper offset fault-tolerant area, a lower overlapping area and a lower offset fault-tolerant area;
The width and the height of the left overlapping area and the right overlapping area are equal, the width and the height of the left offset fault-tolerant area and the right offset fault-tolerant area are equal, the width and the height of the upper overlapping area and the lower overlapping area are equal, and the width and the height of the upper offset fault-tolerant area and the lower offset fault-tolerant area are equal;
the width of the left overlapping area is larger than half of the maximum width, the height of the upper overlapping area is larger than half of the maximum height, the width of the left offset fault-tolerant area is larger than the maximum offset, and the height of the upper offset fault-tolerant area is larger than the maximum offset;
the original image is partitioned into A rows by B columns of sub-graphs:
B=ceil((N+2*X)/n) ;
A=ceil((M+2*Y)/m);
Where ceil() represents rounding up, N represents the number of pixel columns of the original image, M represents the number of pixel rows of the original image, n represents the number of pixel columns of the maximum image size that can be handled by the deep learning, m represents the number of pixel rows of the maximum image size that can be handled by the deep learning, X represents the sum of the widths of the left overlapping area, the left offset fault-tolerant area, the right overlapping area and the right offset fault-tolerant area, and Y represents the sum of the heights of the upper overlapping area, the upper offset fault-tolerant area, the lower overlapping area and the lower offset fault-tolerant area.
2. The method according to claim 1, wherein the original image is divided into two parts overlapping each other in a lateral direction, and the divided sub-images include a left sub-image and a right sub-image in accordance with their relative positional relationship;
performing de-duplication processing according to the geometric center position of the target identification frame in the overlapping area, including:
if the geometric center of the target identification frame is positioned in the left overlapping area or the left offset fault-tolerant area in the left sub-graph, the corresponding target identification frame is counted into the total number of targets;
if the geometric center of the target identification frame is positioned in the right overlapping area or the right offset fault-tolerant area in the left sub-graph, the corresponding target identification frame does not count the total number of targets;
if the geometric center of the target recognition frame is positioned in the left overlapping area in the right sub-graph, the corresponding target recognition frame does not count the total number of targets;
If the geometric center of the target identification frame is positioned in the right overlapping area in the right sub-graph, counting the corresponding target identification frame into the total number of targets;
if the geometric center of the target identification frame is located in the left offset fault-tolerant area or the right offset fault-tolerant area in the right sub-image, detecting the position of the geometric center of the target identification frame in the left sub-image, if the geometric center of the target identification frame is located in the left offset fault-tolerant area in the left sub-image, counting the target identification frame into the total number of targets, and if the geometric center of the target identification frame is located in the right offset fault-tolerant area in the left sub-image, counting the target identification frame out of the total number of targets.
3. The method according to claim 1, wherein the original image is divided into segments in the longitudinal direction with overlapping, and the sub-images obtained by the division include an upper sub-image and a lower sub-image according to their relative positional relationship;
performing de-duplication processing according to the geometric center position of the target identification frame in the overlapping area, including:
If the geometric center of the target identification frame is positioned in an upper overlapping area or an upper offset fault-tolerant area in the upper subgraph, the corresponding target identification frame is counted into the total number of targets;
if the geometric center of the target identification frame is positioned in a lower overlapping area or a lower offset fault-tolerant area in the upper subgraph, the corresponding target identification frame does not count the total number of targets;
if the geometric center of the target recognition frame is positioned in the upper overlapping area in the lower subgraph, the corresponding target recognition frame does not count the total number of targets;
if the geometric center of the target recognition frame is positioned in the lower overlapping area in the lower subgraph, the corresponding target recognition frame is counted into the total number of targets;
If the geometric center of the target identification frame is located in the upper offset fault-tolerant area or the lower offset fault-tolerant area in the lower sub-image, detecting the position of the geometric center of the target identification frame in the upper sub-image, if the geometric center of the target identification frame is located in the upper offset fault-tolerant area in the upper sub-image, counting the target identification frame into the total number of targets, and if the geometric center of the target identification frame is located in the lower offset fault-tolerant area in the upper sub-image, counting the target identification frame out of the total number of targets.
4. The method according to claim 1, wherein the division of the original image with overlap includes a lateral division and a longitudinal division, and the divided sub-images include an upper left sub-image, an upper right sub-image, a lower left sub-image, and a lower right sub-image according to their relative positional relationship;
The overlapping parts of the left overlapping region and the upper overlapping region in the left upper sub-graph, the right upper sub-graph, the left lower sub-graph and the right lower sub-graph form a first region, the overlapping parts of the left offset fault-tolerant region and the upper overlapping region form a second region, the overlapping parts of the right offset fault-tolerant region and the upper overlapping region form a third region, the overlapping parts of the right overlapping region and the upper overlapping region form a fourth region, the overlapping parts of the left overlapping region and the upper offset fault-tolerant region form a fifth region, the overlapping parts of the left offset fault-tolerant region and the upper offset fault-tolerant region form a sixth region, the overlapping parts of the right offset fault-tolerant region and the upper offset fault-tolerant region form a seventh region, the overlapping parts of the right overlapping region and the upper offset fault-tolerant region form an eighth region, the overlapping portions of the left overlapping region and the lower offset fault-tolerant region form a ninth region, the overlapping portions of the right overlapping region and the lower offset fault-tolerant region form a tenth region, the overlapping portions of the left offset fault-tolerant region and the lower offset fault-tolerant region form an eleventh region, the overlapping portions of the right offset fault-tolerant region and the lower offset fault-tolerant region form a twelfth region, the overlapping portions of the left overlapping region and the lower overlapping region form a thirteenth region, the overlapping portions of the right overlapping region and the lower overlapping region form a fourteenth region, the overlapping portions of the left offset fault-tolerant region and the lower overlapping region form a fifteenth region, and the overlapping portions of the right offset fault-tolerant region and the lower overlapping region form a sixteenth region.
5. The method of claim 4, wherein performing the deduplication process based on the geometric center position of the object recognition frame located in the overlapping region comprises:
If the geometric center of the target identification frame is positioned in the left overlapping area, the left offset fault-tolerant area, the upper overlapping area or the upper offset fault-tolerant area in the left upper sub-graph, the corresponding target identification frame is counted into the total number of targets, and if the geometric center of the target identification frame is positioned in the right overlapping area, the right offset fault-tolerant area, the lower overlapping area or the lower offset fault-tolerant area in the left upper sub-graph, the corresponding target identification frame is not counted into the total number of targets;
And if the geometric center of the target identification frame is positioned in the right overlapping area, the right offset fault-tolerant area, the lower overlapping area or the lower offset fault-tolerant area in the left lower subgraph, the corresponding target identification frame is not counted into the target total number.
6. The method of claim 5, wherein performing the deduplication process based on the geometric center position of the object recognition frame located in the overlapping region comprises:
If the geometric center of the target recognition frame is positioned in a first area, a second area, a fifth area or a sixth area in the upper left sub-graph, the corresponding target recognition frame is counted into the total number of targets, and if the geometric center of the target recognition frame is positioned in a third area, a fourth area, a seventh area, an eighth area, a ninth area, a tenth area, an eleventh area, a twelfth area, a thirteenth area, a fourteenth area, a fifteenth area or a sixteenth area in the upper left sub-graph, the corresponding target recognition frame is not counted into the total number of targets;
If the geometric center of the target recognition frame is positioned in the fourth area or the eighth area of the upper right sub-graph, the corresponding target recognition frame is counted into the total number of targets; if the geometric center of the target recognition frame is positioned in the second area, the third area, the sixth area or the seventh area of the upper right sub-image, positioning the geometric center of the corresponding target recognition frame at the position of the upper left sub-image, if the target recognition frame is counted in the upper left sub-image, not counting the targets any more, and if the target recognition frame is not counted in the upper left sub-image, counting the targets any more; if the geometric center of the object recognition frame is located in the first area, the fifth area, the seventh area, the ninth area, the tenth area, the eleventh area, the twelfth area, the thirteenth area, the fourteenth area, the fifteenth area or the sixteenth area of the upper right sub-graph, the corresponding object recognition frame does not count the total number of objects.
7. The method of claim 4, wherein performing the deduplication process based on the geometric center position of the object recognition frame located in the overlapping region comprises:
If the geometric center of the target recognition frame is positioned in a thirteenth area or a fourteenth area in the left lower sub-image, the corresponding target recognition frame is counted into the total number of targets, if the geometric center of the target recognition frame is positioned in a fifth area, a sixth area, a ninth area or a tenth area in the left lower sub-image, the geometric center of the corresponding target recognition frame is positioned at the positions of the left upper sub-image and the right upper sub-image, if the target recognition frame is counted into the total number of targets in the left upper sub-image or the right upper sub-image, the target recognition frame is not counted into the total number of targets in the left upper sub-image and the right upper sub-image, the target recognition frame is counted into the total number of targets in the left lower sub-image, and if the geometric center of the target recognition frame is positioned in the first area, the second area, the third area, the fourth area, the seventh area, the eighth area, the eleventh area, the twelfth area, the fifteenth area or the sixteenth area, the corresponding target recognition frame is not counted into the total number of targets;
If the geometric center of the target recognition frame is positioned in the sixteenth area of the right lower sub-graph, counting the corresponding target recognition frame into the total number of targets; if the geometric center of the target recognition frame is positioned in the sixth area, the seventh area, the eighth area, the tenth area, the eleventh area, the twelfth area, the fourteenth area or the fifteenth area of the lower right sub-image, positioning the geometric center of the corresponding target recognition frame at the positions of the upper left sub-image, the upper right sub-image and the lower left sub-image, if the target recognition frame is counted in the upper left sub-image, the upper right sub-image or the lower left sub-image, not counting the targets any more, and if the target recognition frame is not counted in the upper left sub-image, the upper right sub-image and the lower left sub-image, counting the targets any more; if the geometric center of the target recognition frame is located in the first area, the second area, the third area, the fourth area, the fifth area, the ninth area or the thirteenth area of the right lower sub-graph, the corresponding target recognition frame does not count the total number of targets.
8. An image segmentation and deep learning based object recognition counting apparatus applied to the method of any one of claims 1 to 7, comprising:
The acquisition module is used for acquiring an original image to be identified and acquiring the size of the original image;
the size determining module is used for determining the size of the overlapped area according to the maximum size and the maximum offset of the target identification frame;
The segmentation module is used for carrying out overlapped segmentation on the original image according to the maximum image size which can be processed by the deep learning, the size of the original image and the size of the overlapped area to obtain at least two subgraphs, wherein an overlapped area exists between every two adjacent subgraphs;
the target recognition module is used for carrying out target recognition on the at least two sub-graphs based on deep learning to obtain a target recognition frame;
and the de-duplication counting module is used for traversing the subgraph, counting targets according to the number of the target identification frames, and performing de-duplication processing according to the geometric center position of the target identification frames positioned in the overlapping area.
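As a purely illustrative arithmetic check of the partitioning formulas in claim 1 (the image and model sizes below are arbitrary example values, not values taken from the patent):

```python
import math

M, N = 4000, 6000     # example: original image with 4000 pixel rows and 6000 pixel columns
m, n = 1024, 1024     # example: largest image the deep-learning model can process
X, Y = 120, 100       # example: total horizontal / vertical overlap (overlap plus offset fault-tolerant bands)

A = math.ceil((M + 2 * Y) / m)   # number of sub-graph rows    -> ceil(4200 / 1024) = 5
B = math.ceil((N + 2 * X) / n)   # number of sub-graph columns -> ceil(6240 / 1024) = 7
print(A, B)                      # 5 7
```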
GR01 Patent grant