CN109948497A - Object detection method, apparatus, and electronic device - Google Patents

Object detection method, apparatus, and electronic device Download PDF

Info

Publication number
CN109948497A
Authority
CN
China
Prior art keywords
frame
selection
selection frame
group
visible
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910186133.5A
Other languages
Chinese (zh)
Other versions
CN109948497B (en)
Inventor
李作新
俞刚
袁野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201910186133.5A priority Critical patent/CN109948497B/en
Publication of CN109948497A publication Critical patent/CN109948497A/en
Priority to PCT/CN2019/126435 priority patent/WO2020181872A1/en
Application granted granted Critical
Publication of CN109948497B publication Critical patent/CN109948497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an object detection method, an apparatus, and an electronic device, relating to the technical field of image recognition. The method comprises: obtaining a to-be-processed image containing one or more detection objects; performing object detection on the image to obtain at least one pre-selection box, where a pre-selection box is a visible box and/or a full box — a full box is a bounding box enclosing an entire detection object, and a visible box is a bounding box enclosing the visible region of a detection object in the image; determining, by an association modeling model, the group to which each of the pre-selection boxes belongs, thereby obtaining at least one pre-selection box group, where pre-selection boxes in the same group belong to the same detection object; de-duplicating each pre-selection box group to obtain de-duplicated pre-selection box groups; and determining the target detection box of each detection object based on the de-duplicated pre-selection box groups. The present invention effectively avoids missed detection of detection objects.

Description

Object detection method, apparatus, and electronic device
Technical field
The present invention relates to the technical field of image processing, and in particular to an object detection method, an apparatus, and an electronic device.
Background technique
Object detection is one of the classical problems in computer vision. Its task is to mark the position of each object in an image with a bounding box and to give the object's class. From the traditional framework of hand-designed features plus a shallow classifier to end-to-end detection frameworks based on deep learning, object detection has steadily matured. However, when many objects of similar appearance occur densely in a scene and occlude one another, existing object detection algorithms consider detection only at the class level, so the prior art cannot perform accurate object detection under occlusion. When objects occlude each other, prior-art methods often cannot effectively distinguish an occluded object from the occluding object, which causes the occluded object to be missed.
Summary of the invention
In view of this, an object of the present invention is to provide an object detection method, an apparatus, and an electronic device that alleviate the technical problem in the prior art that similar objects are prone to missed detection when object detection is performed in dense occlusion scenes.
In a first aspect, an embodiment of the present invention provides an object detection method, comprising: obtaining a to-be-processed image containing one or more detection objects; performing object detection on the to-be-processed image to obtain at least one pre-selection box, where a pre-selection box is a visible box and/or a full box, the full box being a bounding box enclosing an entire detection object and the visible box being a bounding box enclosing the visible region of each detection object in the to-be-processed image; determining, by an association modeling model, the group to which each of the at least one pre-selection box belongs, thereby obtaining at least one pre-selection box group, where pre-selection boxes in the same group belong to the same detection object; de-duplicating each pre-selection box group to obtain de-duplicated pre-selection box groups; and determining the target detection box of each detection object based on the de-duplicated pre-selection box groups.
Further, determining, by the association modeling model, the group to which each of the at least one pre-selection box belongs, thereby obtaining at least one pre-selection box group, comprises: obtaining an attribute feature vector for each of the at least one pre-selection box through an instance-attribute feature projection network of the association modeling model; and determining, through a clustering module of the association modeling model and based on the attribute feature vector of each pre-selection box, the group to which each pre-selection box belongs, thereby obtaining the at least one pre-selection box group.
Further, the instance-attribute feature projection network is trained with an Lpull loss function and an Lpush loss function, where the Lpull loss pulls together the attribute feature vectors of pre-selection boxes belonging to the same detection object, and the Lpush loss pushes apart the attribute feature vectors of pre-selection boxes belonging to different detection objects.
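The patent names the Lpull and Lpush losses but does not give their closed form here. As a hedged sketch, the snippet below follows the common associative-embedding formulation — pull each box's embedding toward the mean embedding of its object, and push the means of different objects apart with a hinge margin. The scalar embeddings, the `margin` value, and the function name are illustrative assumptions, not the patent's definitions:

```python
from collections import defaultdict


def pull_push_losses(embeddings, group_ids, margin=1.0):
    """Sketch of Lpull/Lpush over scalar per-box embeddings.

    embeddings: one attribute-feature value per pre-selection box
    group_ids:  ground-truth object id for each box
    """
    groups = defaultdict(list)
    for e, g in zip(embeddings, group_ids):
        groups[g].append(e)
    # Mean embedding per object.
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Lpull: squared distance of each embedding to its object's mean.
    l_pull = sum((e - means[g]) ** 2
                 for e, g in zip(embeddings, group_ids)) / len(embeddings)
    # Lpush: hinge penalty when two object means are closer than `margin`.
    ids = list(means)
    l_push, pairs = 0.0, 0
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            l_push += max(0.0, margin - abs(means[ids[i]] - means[ids[j]]))
            pairs += 1
    return l_pull, (l_push / pairs if pairs else 0.0)
```

Training with both terms drives boxes of the same object toward one embedding value and separates the embeddings of different objects, which is what makes the later distance-threshold grouping workable.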
Further, determining, through the clustering module of the association modeling model and based on the attribute feature vector of each pre-selection box, the group to which each pre-selection box belongs, thereby obtaining the at least one pre-selection box group, comprises: computing the vector distance between every two attribute feature vectors to obtain a plurality of vector distance values; adding any two pre-selection boxes whose vector distance value is below a preset threshold to the same group, and letting each pre-selection box not added to any group form a group by itself; and performing cluster analysis on the resulting groups with a clustering algorithm to obtain the at least one pre-selection box group.
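The threshold-then-group step above can be sketched as a union-find pass over pairwise feature distances: boxes joined by a below-threshold distance end up in the same group, and boxes joined to nothing become singleton groups. The Euclidean distance metric, the threshold value, and the data layout are assumptions for illustration; the patent's clustering algorithm is not specified at this level of detail:

```python
import math
from collections import defaultdict


def group_boxes(features, threshold):
    """Group pre-selection boxes by attribute-feature distance (sketch)."""
    n = len(features)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge any two boxes whose feature distance is below the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(features[i], features[j]) < threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    # Return groups of box indices in a deterministic order.
    return sorted(sorted(v) for v in groups.values())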
Further, each pre-selection box group comprises a visible box group and a full box group. De-duplicating each pre-selection box group to obtain de-duplicated pre-selection boxes comprises: de-duplicating the visible box group within each of the at least one pre-selection box group to obtain a de-duplicated visible box group. Determining the target detection box of each detection object based on the de-duplicated pre-selection box groups comprises: determining the target detection box of each detection object based on the de-duplicated visible box group and the full box group.
Further, de-duplicating the visible box group within the at least one pre-selection box group to obtain the de-duplicated visible box group comprises: applying a non-maximum suppression algorithm to the visible box group within the at least one pre-selection box group to obtain the de-duplicated visible box group.
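A minimal version of the non-maximum suppression applied within each visible box group, assuming axis-aligned `(x1, y1, x2, y2)` boxes and a standard IoU criterion. The IoU threshold value is illustrative; the patent does not fix it:

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns kept box indices."""
    def iou(a, b):
        # Intersection-over-union of two (x1, y1, x2, y2) boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        # Drop boxes that overlap the kept box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

Because NMS here runs per group rather than over all boxes at once, a heavily overlapped visible box of an occluded object is never suppressed by a box of the occluding object — that is the point of grouping first.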
Further, determining the target detection box of each detection object based on the de-duplicated visible box group and the full box group comprises: performing local-feature alignment on each visible box in the de-duplicated visible box group, and performing local-feature alignment on each full box in the full box group; inputting the aligned visible boxes and the aligned full boxes into an object detection model for detection, thereby obtaining the position coordinates and class probability value of each aligned visible box, and the position coordinates and class probability value of each aligned full box; and determining the target detection box of each detection object based on target position coordinates and target class probability values, where the target position coordinates comprise the position coordinates of the aligned visible boxes and/or the aligned full boxes, and the target class probability values comprise the class probability values of the aligned visible boxes and/or the aligned full boxes.
Further, determining the target detection box of each detection object based on the target position coordinates and the target class probability values comprises: using each target class probability value as the weight of the corresponding target position coordinates; and computing, for each detection object, the weighted average of its target position coordinates according to these weights, thereby obtaining the target detection box of that detection object. The target detection box comprises a target visible box and/or a target full box.
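The probability-weighted averaging described above can be sketched as follows. Using each box's class probability as the weight of its four coordinates is taken directly from the text; the tuple layout and function name are assumptions:

```python
def fuse_boxes(boxes, probs):
    """Fuse one object's boxes into a single target detection box.

    boxes: list of (x1, y1, x2, y2) coordinates for one detection object
    probs: class probability value of each box, used as its weight
    """
    total = sum(probs)
    # Weighted average of each of the four coordinates.
    return tuple(
        sum(b[k] * p for b, p in zip(boxes, probs)) / total
        for k in range(4)
    )
```

A high-confidence box thus dominates the fused coordinates, while low-confidence boxes only nudge them.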
Further, performing object detection on the to-be-processed image to obtain at least one pre-selection box comprises: inputting the to-be-processed image into a feature pyramid network for processing to obtain a feature pyramid; and processing the feature pyramid with a region proposal network (RPN) model to obtain the at least one pre-selection box, where each of the at least one pre-selection box carries an attribute tag, the attribute tag being used to determine the type of each pre-selection box, the type including full box and visible box.
In a second aspect, an embodiment of the present invention further provides an object detection apparatus, comprising: an image obtaining unit for obtaining a to-be-processed image containing one or more detection objects; a pre-selection box obtaining unit for performing object detection on the to-be-processed image to obtain at least one pre-selection box, where a pre-selection box is a visible box and/or a full box, the full box being a bounding box enclosing an entire detection object and the visible box being a bounding box enclosing the visible region of each detection object in the to-be-processed image; a grouping unit for determining, by an association modeling model, the group to which each of the at least one pre-selection box belongs, thereby obtaining at least one pre-selection box group, where pre-selection boxes in the same group belong to the same detection object; a de-duplication unit for de-duplicating each pre-selection box group to obtain de-duplicated pre-selection box groups; and a determining unit for determining the target detection box of each detection object based on the de-duplicated pre-selection box groups.
In a third aspect, an embodiment of the present invention further provides an electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, where the processor, when executing the computer program, implements the steps of the method described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to execute the method described above.
In the embodiments of the present invention, first, a to-be-processed image containing one or more detection objects is obtained; then, object detection is performed on the to-be-processed image to obtain at least one pre-selection box; next, the group to which each of the at least one pre-selection box belongs is determined, yielding at least one pre-selection box group. Because pre-selection boxes in the same group belong to the same detection object, the grouping separates pre-selection boxes that belong to different detection objects and prevents the pre-selection boxes of an occluded object from being removed during de-duplication as redundant boxes of the occluding object. This alleviates the prior-art problem that similar objects are prone to missed detection in dense occlusion scenes, achieves detection of one or more detection objects in the to-be-processed image, and effectively avoids missed detection of detection objects.
Meanwhile at least one pre-selection frame group is determined by relevance modeler model, relevance modeler model is by neural network It realizes, after at least one pre-selection frame is input to relevance modeler model, makes full use of the characteristic information of image in pre-selection frame, pre- It selects the location information of frame to be grouped pre-selection frame, the pre-selection frame of different test objects can be effectively distinguished, especially for close Collection object blocks in scene, can be to position neighbour in the complete higher situation of frame registration of occlusion objects and occluded object Closely, size is similar, but the pre-selection frame for belonging to different test objects is accurately grouped.
Other features and advantages of the disclosure will be set forth in the following description; alternatively, some features and advantages may be inferred or unambiguously determined from the description, or may be learned by practicing the above techniques of the disclosure.
To make the above objects, features, and advantages of the disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Detailed description of the invention
To describe the technical solutions of the specific embodiments of the present invention or the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Apparently, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an electronic device according to an embodiment of the present invention;
Fig. 2 is a flowchart of an object detection method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of visible boxes and full boxes of densely occluded similar objects according to an embodiment of the present invention;
Fig. 4 is a diagram of the correspondence between pre-selection boxes and detection objects according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an object detection apparatus according to an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described below clearly and completely with reference to the drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment one:
First, an exemplary electronic device 100 for implementing the object detection method of an embodiment of the present invention is described with reference to Fig. 1.
As shown in Fig. 1, the electronic device 100 comprises one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, and a camera 110, which are interconnected by a bus system 112 and/or another form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are merely exemplary rather than limiting; the electronic device may have other components and structures as needed.
The processor 102 may be implemented in at least one hardware form of a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), or an application-specific integrated circuit (ASIC). The processor 102 may be a central processing unit (CPU) or another form of processing unit having data-processing capability and/or instruction-execution capability, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functions (implemented by the processor) of the embodiments of the present invention described below and/or other desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 can export various information (for example, image or sound) to external (for example, user), and It and may include one or more of display, loudspeaker etc..
The camera 110 is used to obtain the to-be-processed image; after the to-be-processed image captured by the camera is processed by the object detection method, the target detection box of each detection object is obtained. For example, the camera may capture an image desired by the user (such as a photo or a video), the image is then processed by the object detection method to obtain the target detection box of each detection object, and the camera may also store the captured image in the storage device 104 for use by other components.
Illustratively, the exemplary electronic device for implementing the object detection method according to an embodiment of the present invention may be implemented on a mobile terminal such as a smart phone or a tablet computer.
Embodiment two:
According to an embodiment of the present invention, an embodiment of an object detection method is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from that given here.
Fig. 2 is a flowchart of an object detection method according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step S202: obtain a to-be-processed image containing one or more detection objects.
In the embodiment of the present invention, the to-be-processed image may contain detection objects of a plurality of classes, for example, humans and non-humans, where non-humans include dynamic objects and static objects: a dynamic object may be an animal-class object, and a static object may be any other object at rest besides humans and animals.
Each to-be-processed image may contain objects of a plurality of classes, and there may be one or more objects of each class — for example, two people and three dogs in an image. The objects in the to-be-processed image may be displayed independently of one another, or some objects may be occluded by other objects and therefore not fully displayed.
It should be noted that a detection object may be an object of one or more classes in the to-be-processed image on which the object detection steps are to be performed. The user may determine the classes of the detection objects according to actual needs; this embodiment imposes no specific limitation.
It should further be noted that in this embodiment the to-be-processed image may be an image captured by the camera of the electronic device in Embodiment One, or an image stored in advance in the memory of the electronic device; this embodiment imposes no specific limitation.
Step S204: perform object detection on the to-be-processed image to obtain at least one pre-selection box, where a pre-selection box is a visible box and/or a full box, the full box being a bounding box enclosing an entire detection object and the visible box being a bounding box enclosing the visible region of each detection object in the to-be-processed image.
In the embodiment of the present invention, after the to-be-processed image is obtained, object detection may be performed on it through a pre-selection box detection network. The process of performing object detection on the to-be-processed image may be: performing object detection on the objects that are not occluded in the to-be-processed image, thereby outputting full boxes; and performing object detection on the objects that are occluded in the to-be-processed image, thereby outputting both full boxes and visible boxes.
Multiple visible boxes or multiple full boxes may be generated for the same detection object, and different visible boxes or different full boxes may be scaled by different ratios relative to the to-be-processed image.
Step S206: determine, by an association modeling model, the group to which each of the at least one pre-selection box belongs, thereby obtaining at least one pre-selection box group; pre-selection boxes in the same group belong to the same detection object.
In the embodiment of the present invention, after object detection, multiple pre-selection boxes are generated for the different detection objects, where the pre-selection boxes include visible boxes and/or full boxes. In general, the pre-selection boxes in the detection result are redundant and must be de-duplicated. To prevent a pre-selection box of an occluded object from being removed during de-duplication as a redundant box of the occluding object, the group to which each pre-selection box belongs must be determined; one pre-selection box group is obtained per group, giving at least one pre-selection box group, and the grouping thus separates the pre-selection boxes belonging to different detection objects. The association modeling model is a model that can capture the association relationships of its input data and can be implemented by a neural network; after the at least one pre-selection box is input into it, the association modeling model can effectively group the pre-selection boxes according to the image features inside each box combined with the box's position information.
By grouping the at least one pre-selection box in the above manner, the pre-selection boxes belonging to the same detection object can be formed into one pre-selection box group. Since the pre-selection box group of one detection object may contain both visible boxes and full boxes, that group may in turn contain both a visible box group and a full box group.
It should be noted that Fig. 4 shows a diagram of the correspondence between pre-selection boxes and detection objects. In the figure, the detection objects include an occluding object P and an occluded object Q that is occluded by P, and the pre-selection boxes include boxes No. 7 through No. 12. Boxes No. 7, No. 8, and No. 9 belong to the occluding object P, while boxes No. 10, No. 11, and No. 12 belong to the occluded object Q. Boxes No. 7, No. 8, and No. 9 form one pre-selection box group, and boxes No. 10, No. 11, and No. 12 form another pre-selection box group.
After the group to which each of boxes No. 7 through No. 12 belongs is determined and the pre-selection box groups are obtained, de-duplication may be performed on the pre-selection boxes within each group separately. This prevents the boxes of different objects from being confused when pre-selection boxes of different objects overlap heavily; in particular, it prevents a pre-selection box of the occluded object Q (such as box No. 10) from being removed during de-duplication as a redundant box of the occluding object P, which greatly reduces the probability of missing the occluded object.
Step S208: de-duplicate each pre-selection box group to obtain de-duplicated pre-selection box groups.
In the embodiment of the present invention, once the object to which each pre-selection box belongs has been determined, the pre-selection box group of each detection object is de-duplicated separately. By de-duplicating within groups, the pre-selection boxes of different detection objects are not confused with one another; specifically, a pre-selection box of an occluded object is not discarded during de-duplication as a redundant box of the occluding object, which in turn avoids missed detection of the occluded object.
Step S210: determine the target detection box of each detection object based on the de-duplicated pre-selection box groups.
In the embodiment of the present invention, after the de-duplicated pre-selection box groups are obtained, the target detection box of each detection object can be determined based on them. If a detection object is not occluded in the to-be-processed image, its target detection box includes a target full box; if a detection object is occluded in the to-be-processed image, its target detection box includes both a target full box and a target visible box. The target full box can be used to obtain the position of the detection object, as well as the image features of detection objects that are not occluded; the target visible box can be used to obtain the image features of an occluded object. Since the embodiment of the present invention can obtain both kinds of target detection boxes, more comprehensive and more accurate detection object information can be obtained for subsequent image processing such as recognition and verification.
In the embodiment of the present invention, the above steps S202 to S210 may be executed by the processor of the electronic device in Embodiment One.
It should be noted that any processor capable of executing the above steps S202 to S210 may be applied in the embodiments of the present invention; no specific limitation is imposed on this.
In the embodiments of the present invention, first, a to-be-processed image containing one or more detection objects is obtained; then, object detection is performed on the to-be-processed image to obtain at least one pre-selection box; next, the group to which each of the at least one pre-selection box belongs is determined, yielding at least one pre-selection box group. Because pre-selection boxes in the same group belong to the same detection object, the grouping separates pre-selection boxes that belong to different detection objects and prevents the pre-selection boxes of an occluded object from being removed during de-duplication as redundant boxes of the occluding object. This alleviates the prior-art problem that similar objects are prone to missed detection in dense occlusion scenes, achieves detection of one or more detection objects in the to-be-processed image, and effectively avoids missed detection of detection objects.
In addition, in dense occlusion scenes, the complete frames of the occluding object and the occluded object overlap heavily, so information such as position and size of the complete frames alone cannot effectively distinguish the complete frames of different detection objects; the grouping effect is poor, and effective deduplication of complete frames therefore cannot be performed. In the embodiment of the present invention, the relevance modeling model is implemented by a neural network. After the at least one pre-selection frame is input into the relevance modeling model, the model groups the pre-selection frames by effectively using both the image feature information within each pre-selection frame and its location information, so that pre-selection frames of different detection objects can be effectively distinguished. Especially in dense occlusion scenes, where the complete frames of the occluding and occluded objects overlap heavily, pre-selection frames that are close in position and similar in size but belong to different detection objects can still be accurately grouped.
The embodiments of the present invention are described in detail below in conjunction with specific embodiments.
As can be seen from the above description, in the present embodiment, an image to be processed containing one or more detection objects is obtained first. After that, object detection can be performed on the image to be processed to obtain at least one pre-selection frame.
In an optional embodiment, step S204, performing object detection on the image to be processed to obtain at least one pre-selection frame, includes the following steps:

Step S2041: input the image to be processed into a feature pyramid network for processing to obtain a feature pyramid;

Step S2042: process the feature pyramid with a region proposal network (RPN) model to obtain the at least one pre-selection frame, wherein each of the at least one pre-selection frame carries an attribute tag, the attribute tag being used to determine the type of each pre-selection frame, and the type including complete frame and visible frame.
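As a minimal illustration of the tagged pre-selection frames produced by step S2042 (the field names and sample values here are hypothetical, not taken from the patent), each proposal can carry its box coordinates, a confidence score, and the attribute tag:

```python
from dataclasses import dataclass

# Attribute tag values: the patent suggests e.g. "1" for a visible
# frame and "2" for a complete frame (any machine-readable codes work).
VISIBLE, COMPLETE = 1, 2

@dataclass
class PreselectionFrame:
    x1: float; y1: float; x2: float; y2: float  # box corners in image coords
    score: float                                # RPN confidence score
    tag: int                                    # VISIBLE or COMPLETE

boxes = [
    PreselectionFrame(10, 10, 60, 120, 0.92, COMPLETE),
    PreselectionFrame(10, 10, 35, 120, 0.88, VISIBLE),
]
print([b.tag for b in boxes])  # → [2, 1]
```

Downstream steps can then route each frame into the visible frame group or the complete frame group of its detection object according to `tag`.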
As can be seen from the above description, in embodiments of the present invention, the feature pyramid network is used to generate the feature pyramid. A basic network model such as VGG-16 (Visual Geometry Group), ResNet, or FPN (Feature Pyramid Networks) can be selected as the feature pyramid network. In the present embodiment, the image to be processed can be input into the feature pyramid network for processing to obtain the feature pyramid.
Before the feature pyramid is processed with the region proposal network (RPN) model, the RPN model needs to be trained on a preset training set; in the present embodiment, the basic network model (for example, FPN) and the RPN model can be trained together. The preset training set includes multiple training samples, and each training sample includes a training image and its corresponding image tag. The image tag is used to mark the type of each pre-selection frame in the training image, the type being complete frame or visible frame. The present invention can train the RPN model with the multiple training samples, so that the RPN model can recognize and label the pre-selection frame types in an image.
After the basic network model and the RPN model are trained with the above preset training set, the feature pyramid can be processed with the trained RPN model to obtain at least one pre-selection frame and the attribute tag of each pre-selection frame, the attribute tag characterizing whether the pre-selection frame is a visible frame or a complete frame.
Specifically, the attribute tag can be expressed as "1" or "2"; for example, "1" indicates that the pre-selection frame is a visible frame, and "2" indicates that the pre-selection frame is a complete frame. Besides "1" and "2", other machine-recognizable data can also be selected as attribute tags, which is not specifically limited in the present embodiment.
In the present embodiment, by processing the feature pyramid with the RPN model, more accurate pre-selection frame detection results can be obtained.
After the more accurate pre-selection frame detection results are obtained, the grouping to which each of the at least one pre-selection frame belongs can be determined, obtaining at least one pre-selection frame group.
In an optional embodiment, step S206, determining by the relevance modeling model the grouping to which each of the at least one pre-selection frame belongs to obtain at least one pre-selection frame group, includes the following steps:

Step S11: obtain, through the instance attribute feature projection network of the relevance modeling model, the attribute feature vector of each of the at least one pre-selection frame;

Step S12: determine, through the clustering module of the relevance modeling model and based on the attribute feature vector of each pre-selection frame, the grouping to which each of the at least one pre-selection frame belongs, obtaining the at least one pre-selection frame group.
In embodiments of the present invention, the relevance modeling model can be an associative embedding model. The instance attribute feature projection network in the relevance modeling model can also be called an embedding encoding network. The at least one pre-selection frame is input into the embedding encoding network of the relevance modeling model, which regresses a corresponding attribute feature vector for each pre-selection frame, one vector per frame. Then, based on the attribute feature vectors, the clustering module assigns the pre-selection frames of the same detection object to the same grouping, with different groupings corresponding to different detection objects.
Before the associative embedding model is used to determine the grouping of each pre-selection frame, the embedding encoding network in the associative embedding model also needs to be trained, so as to determine which kind of attribute feature vector the embedding encoding network outputs. During training, the constraint conditions on the attribute feature vectors are distances between attribute feature vectors, which can be Euclidean distances, cosine distances, etc. A first constraint condition pulls together the attribute feature vectors of pre-selection frames belonging to the same detection object, so that frames of the same detection object are assigned to the same grouping through their attribute feature vectors; a second constraint condition pushes apart the attribute feature vectors of pre-selection frames belonging to different detection objects, so that frames of different detection objects are assigned to different groupings through their attribute feature vectors. Specifically, the first constraint condition can be an Lpull loss function, and the second constraint condition can be an Lpush loss function. The embedding encoding network can first be trained to pull distances in with the Lpull loss function and then trained to push distances apart with the Lpush loss function; alternatively, the embedding encoding network can be trained with the Lpull and Lpush loss functions simultaneously.
It should be noted that the above Lpull loss function can take a form such as L_pull = (1/M) Σ_m Σ_{e_k, e_j ∈ C_m} ||e_k − e_j||², where M is the number of attribute feature vectors, e_k and e_j denote arbitrary attribute feature vectors, and C_m denotes the attribute feature vectors corresponding to detection object m; the above Lpush loss function can take a form such as L_push = (1/M) Σ_{e_k, e_j in different objects} max(0, Δ − ||e_k − e_j||), where M is the number of attribute feature vectors, e_k and e_j denote arbitrary attribute feature vectors, and Δ denotes a preset distance value.
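The pull/push training objective described above can be sketched in plain Python. This is a simplified, unbatched version under the assumed loss forms (squared-distance pull term, margin-based push term); the function name and input layout are illustrative, and real training would compute these over batched network outputs:

```python
import math

def pull_push_losses(embeddings, delta=1.0):
    """Associative-embedding style pull/push losses (sketch).

    embeddings: dict mapping object id -> list of embedding vectors,
    one vector per pre-selection frame of that detection object.
    """
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    vecs = [(oid, e) for oid, es in embeddings.items() for e in es]
    M = len(vecs)
    # Pull: pairs from the same detection object should be close.
    pull = sum(dist(a, b) ** 2
               for i, (oi, a) in enumerate(vecs)
               for oj, b in vecs[i + 1:]
               if oi == oj) / M
    # Push: pairs from different objects should be at least delta apart.
    push = sum(max(0.0, delta - dist(a, b))
               for i, (oi, a) in enumerate(vecs)
               for oj, b in vecs[i + 1:]
               if oi != oj) / M
    return pull, push

pull, push = pull_push_losses({"A": [[0.0, 0.0], [0.1, 0.0]], "B": [[1.5, 0.0]]})
print(round(pull, 6), push)  # → 0.003333 0.0
```

Minimizing `pull` draws same-object embeddings together; minimizing `push` drives different-object embeddings apart until their distance exceeds the preset margin `delta`.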
After the embedding encoding network training is completed and the pre-selection frames have been obtained through the RPN model, the embedding encoding network is used to obtain the attribute feature vector of each pre-selection frame, i.e., its embedding value. The embedding value can be an N-dimensional vector; that is, an N-dimensional vector is obtained for each pre-selection frame, which can be denoted, for example, as e = (e_1, e_2, ..., e_N).
In embodiments of the present invention, the purpose of obtaining the attribute feature vectors is to distinguish different object instances (instances, i.e., detection objects) among the pre-selection frames. The feature vector needs instance-level discriminative ability, capable of distinguishing each individual detection object, rather than merely class-level discriminative ability (distinguishing the types of detection objects). The choice of feature extraction network is therefore subject to certain requirements, and the attribute feature vectors (embedding values) obtained by the instance attribute feature projection network have good instance-level discriminative ability.
In addition, the generation of the attribute feature vectors (embedding encoding) is obtained by direct optimization according to the actual association relationships of the pre-selection frames, using the grouping-aware associative embedding model. Since the optimization is carried out directly for the pre-selection frame grouping task, a more direct and better performance improvement is available.
Further, the instance attribute feature projection network is implemented by a neural network and can be merged with the detection network of the pre-selection frames (such as the feature pyramid network and the RPN), the two sharing the basic features of the network, which reduces the amount of computation. Moreover, the instance attribute feature projection network can be trained jointly with the detection network of the pre-selection frames, realizing joint training of the two overall networks without introducing additional external information; the training process is thus fairly simple.
Further, after the above N-dimensional vectors are obtained, whether two different pre-selection frames belong to the same grouping can be judged by comparing the Euclidean distance between their N-dimensional vectors, thereby determining whether the two different pre-selection frames belong to the same detection object.
The Euclidean distance between two N-dimensional vectors can be judged by setting a preset threshold. For example, for a preset threshold x, if the Euclidean distance between the N-dimensional vectors of two different pre-selection frames is less than x, the distance between the two pre-selection frames is considered small, and the two frames are considered to belong to the same grouping.
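The threshold test described here can be sketched as follows (the vectors and threshold value are illustrative):

```python
import math

def same_group(e1, e2, threshold):
    """Return True if two embedding vectors are within the preset
    Euclidean-distance threshold (sketch; the threshold is tuned
    empirically, as the embodiment notes)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))
    return d < threshold

print(same_group([0.2, 0.1], [0.25, 0.12], 0.3))  # → True
print(same_group([0.2, 0.1], [1.9, 1.4], 0.3))    # → False
```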
The grouping of each of the other pre-selection frames is determined in the same manner as described above, which is not described one by one herein.

Through the above processing, the detection object to which each pre-selection frame belongs can be accurately determined, thereby further reducing the probability of missed detection of detection objects.
Optionally, determining, through the clustering module of the relevance modeling model and based on the attribute feature vector of each pre-selection frame, the grouping to which each of the at least one pre-selection frame belongs to obtain the at least one pre-selection frame group can be realized by the following steps:

Step S1: calculate the vector distance value between any two of the attribute feature vectors to obtain multiple vector distance values;

Step S2: add the two pre-selection frames corresponding to each of the multiple vector distance values that is less than the preset threshold to the same grouping, and treat each other pre-selection frame not added to any grouping as an individual grouping;

Step S3: perform cluster analysis on the obtained at least one grouping by a clustering algorithm to obtain the at least one pre-selection frame group.
In embodiments of the present invention, the attribute feature vectors of all pre-selection frames are regressed by the above embedding encoding network, and the vector distance value between every two attribute feature vectors is calculated separately; the vector distance value can be calculated by a distance calculation method such as the Euclidean distance.
Afterwards, each vector distance value is compared with the preset threshold, where the size of the preset threshold can be determined according to actual needs or empirically; the present embodiment does not specifically limit this. If a vector distance value is less than the preset threshold, it can be determined to be a target vector distance value, and the two pre-selection frames corresponding to that target vector distance value are considered to correspond to the same detection object; therefore, the two pre-selection frames corresponding to the target vector distance value are added to the same grouping. A pre-selection frame whose attribute feature vector is at a distance not less than the preset threshold from all other attribute feature vectors forms a grouping by itself. In this way, at least one grouping can be obtained.
It should be noted that if the two pairs of pre-selection frames corresponding to two different target vector distance values share a common pre-selection frame, i.e., the two different target vector distance values correspond to three different pre-selection frames in total, these three different pre-selection frames can be added to the same grouping.
After the at least one grouping is obtained, cluster analysis is performed on the obtained at least one grouping by a clustering algorithm.

It should be noted that the clustering algorithm can be a common algorithm, for example, the K-means clustering algorithm or the mean-shift clustering algorithm, etc.
For example, suppose the image to be processed contains pre-selection frames f1 to f8 and four detection objects A, B, C, and D. The attribute feature vectors (embedding values) of frames f1 to f8 are each regressed by the embedding encoding algorithm. The vector distance value between every two attribute feature vectors is calculated, and the target vector distance values filtered from the multiple vector distance values as being less than the preset threshold are s1 to s4, where s1 is the distance between frames f1 and f2, s2 between f2 and f3, s3 between f4 and f5, and s4 between f5 and f8. According to this information, frames f1 and f2 corresponding to s1 are added to the same grouping, and frames f2 and f3 corresponding to s2 are added to the same grouping. Since f1 and f2 are in the same grouping and f2 and f3 are also in the same grouping, f1, f2, and f3 are all in one grouping; similarly, f4, f5, and f8 are in one grouping. Since the vector distance values between the attribute feature vectors of f6 and f7 and any other feature vector are not less than the preset threshold, f6 forms a grouping by itself, as does f7. The grouping result thus contains four groupings: one containing f1, f2, and f3; one containing f4, f5, and f8; one containing f6; and one containing f7. Performing cluster analysis on the four obtained groupings then yields four pre-selection frame groups.
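The transitive merging illustrated by the f1-f8 example (frames linked through a shared frame end up in one grouping) can be sketched with a small union-find structure; the indices and pairs below mirror that example, with f1-f8 mapped to 0-7:

```python
def group_frames(n_frames, close_pairs):
    """Group pre-selection frames by transitively merging pairs whose
    embedding distance fell below the preset threshold (union-find sketch)."""
    parent = list(range(n_frames))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in close_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n_frames):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Pairs s1..s4 from the example: (f1,f2), (f2,f3), (f4,f5), (f5,f8).
print(group_frames(8, [(0, 1), (1, 2), (3, 4), (4, 7)]))
# → [[0, 1, 2], [3, 4, 7], [5], [6]]
```

The result reproduces the four groupings of the example: {f1, f2, f3}, {f4, f5, f8}, {f6}, and {f7}.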
In another example there are tri- test objects of No. f1-f4 pre-selection frame and A, B and C in image to be processed, No. f1-f4 is preselected Frame uses embedding encoding algorithm to return out its attribute feature vector respectively, i.e. embedding value is respectively A1, a2, a3, a4, if the Euclidean distance between vector a1 and vector a4 is less than preset threshold, then it is assumed that vector a1 and vector a4 Belong to the same test object in A, B or C;If vector a1 and vector a2, vector a1 and vector a3, vector a2 and vector a3 Between vector distance value be not less than preset threshold, then it is assumed that vector a1, a2 and a3 three is not admitted to same between any two A test object, if the vector distance value for also meeting between vector a2 and vector a4, vector a3 and vector a4 be all not less than it is default Threshold value, it may be determined that vector a2 belongs to some test object in A, B or C;Vector a3 belongs in A, B or C the inspection for being different from a2 Object is surveyed, also different from the test object of vector a1 and the corresponding test object of vector a4.The group result obtained may are as follows: A1 and vector a4 belongs to A, and vector a2 belongs to B, and vector a3 belongs to C.
After the grouping to which each of the at least one pre-selection frame belongs is determined and the at least one pre-selection frame group is obtained, deduplication processing can be performed on each pre-selection frame group to obtain the deduplicated pre-selection frame groups, and the target detection frame of each detection object is determined based on the deduplicated pre-selection frame groups.
As can be seen from the above description, each pre-selection frame group may include a visible frame group and a complete frame group. Based on this, step S208, performing deduplication processing on each pre-selection frame group to obtain the deduplicated pre-selection frames, includes: performing deduplication processing on the visible frame group in the at least one pre-selection frame group to obtain the deduplicated visible frame group, where the deduplicated visible frame group may include one visible frame or a group of visible frames.

Step S210, determining the target detection frame of each detection object based on the deduplicated pre-selection frame groups, includes: determining the target detection frame of each detection object based on the deduplicated visible frame group and the complete frame group.
Specifically, in the present embodiment, first, the image to be processed containing one or more detection objects is obtained; then, object detection is performed on the image to be processed to obtain at least one pre-selection frame; afterwards, the grouping to which each of the at least one pre-selection frame belongs is determined, obtaining at least one pre-selection frame group; next, deduplication processing is performed on the visible frame group in the at least one pre-selection frame group to obtain the deduplicated visible frame group; finally, the target detection frame of each detection object is determined based on the deduplicated visible frame group and the complete frame group.
As can be seen from the above description, in embodiments of the present invention, the detection objects to be identified may be densely present in the image to be processed, causing high overlap among the complete frames of the detection objects. In order to reduce the complexity of deduplication, deduplication processing can be performed only on the visible frame group in each pre-selection frame group. Afterwards, the target detection frame of each detection object can be determined from the deduplicated visible frame group and the non-deduplicated complete frame group.
Specifically, in the present embodiment, the deduplicated visible frame group and the non-deduplicated complete frame group can be input into an R-CNN model for object detection, thereby obtaining the target detection frame of each detection object.

It should be noted that, in embodiments of the present invention, when the deduplicated visible frame group and the non-deduplicated complete frame group serve as the input of the R-CNN model and object detection is re-performed, for an occluded object, only the visible frame group or only the complete frame group may be used as the input of the R-CNN model to improve detection efficiency; alternatively, the visible frame group and the complete frame group may be used together as the input of the R-CNN model to improve detection precision. The present embodiment does not specifically limit this.
Optionally, in the present embodiment, the step of performing deduplication processing on the visible frame group in the at least one pre-selection frame group to obtain the deduplicated visible frame group includes: performing deduplication processing on the visible frame group in the at least one pre-selection frame group using a non-maximum suppression algorithm to obtain the deduplicated visible frame group.
In embodiments of the present invention, redundant pre-selection frames are removed from the pre-selection frame groups using a non-maximum suppression (NMS) algorithm; by setting the threshold in the NMS algorithm, deduplication processing is performed on the visible frame group in each pre-selection frame group. After the pre-selection frame group of each detection object is obtained, since the overlap among the complete frames in a complete frame group is high, no deduplication is performed on the complete frames. Therefore, only the visible frame group is deduplicated with the NMS algorithm, obtaining the deduplicated visible frame group. That is, in the present embodiment, after the pre-selection frame group of a detection object is obtained, if the group includes a visible frame group and a complete frame group, deduplication processing can be performed on the visible frame group of the detection object.
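The visible-frame deduplication step can be sketched with a standard NMS routine; the boxes, scores, and IoU threshold below are illustrative:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Standard NMS: keep the highest-scoring box, drop boxes whose IoU
    with it exceeds the threshold, repeat. Applied here to visible frames."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

visible = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(visible, [0.9, 0.8, 0.95]))  # → [2, 0]
```

The second box is suppressed because its IoU with the first exceeds the threshold; as the description notes, applying the same routine to the heavily overlapping complete frames would risk deleting the occluded object's frame, which is why NMS is restricted to the visible frame group.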
It should be noted that FIG. 3 shows a schematic diagram of visible frames and complete frames of densely occluded similar objects. In FIG. 3, frame 1 and frame 3 on the left are the complete frames of the occluding object P and the occluded object Q, respectively. In typical human detection in densely occluded crowds, applying NMS deduplication only to all pre-selection frames of the same category cannot well distinguish and recognize instances (different detection objects); the intersection-over-union between frame 1 and frame 3 is usually higher than the preset NMS threshold, which causes two problems: if the threshold is too high, repeated pre-selection frames cannot be effectively deduplicated; if the threshold is too low, frame 3 of the occluded object Q behind is easily deleted, causing missed detection of the occluded object Q.
The same problem exists between frame 5 and frame 6 on the right. The dashed frame 2 is the visible frame of the occluded object Q, covering the visible part of occluded object Q; it can be seen that the overlap between frame 2 and frame 1 (the complete frame of the occluding object P) is significantly less than the overlap between frame 3 and frame 1. Therefore, frame 2 can distinguish the occluding object P from the occluded object Q. By binding frame 2 as a visible frame together with frame 3 as a complete frame into one pre-selection frame group, the deduplication process is prevented from removing frame 3 as a redundancy of the occluding object P.
With the deduplicated visible frame group and the complete frame group, the calculation process can be simplified and the calculation speed and accuracy of the R-CNN model improved, so that a more accurate target detection frame is obtained.
Optionally, in the present embodiment, the step of determining the target detection frame of each detection object based on the deduplicated visible frame group and the complete frame group includes:

Step S21: perform local feature alignment processing on each visible frame in the deduplicated visible frame group, and perform local feature alignment processing on each complete frame in the complete frame group;

Step S22: input the feature-aligned visible frames and the feature-aligned complete frames into an object detection model for detection processing to obtain the position coordinates and class probability values of the feature-aligned visible frames, and to obtain the position coordinates and class probability values of the feature-aligned complete frames;

Step S23: determine the target detection frame of each detection object based on target position coordinates and target class probability values, where the target position coordinates include the position coordinates of the feature-aligned visible frames and/or the position coordinates of the feature-aligned complete frames, and the target class probability values include the class probability values of the feature-aligned visible frames and/or the class probability values of the feature-aligned complete frames.
In embodiments of the present invention, first, local feature alignment processing is performed on each visible frame in the visible frame group and each complete frame in the complete frame group. The purpose of local feature alignment processing is to adjust each visible frame in the visible frame group and each complete frame in the complete frame group to the same size.
Optionally, the above object detection model can be an R-CNN model. After local feature alignment processing is performed on the deduplicated visible frame group and on the complete frames in the complete frame group, the target detection frame of the corresponding detection object can be determined using the aligned visible frames and the aligned complete frames.

Optionally, the aligned visible frames and/or the aligned complete frames can be used as the input of the object detection model (such as an R-CNN model); after the detection processing of the object detection model, the coordinate position and class probability value of each visible frame, and the coordinate position and class probability value of each complete frame, are obtained respectively.
Since the detection object to which each visible frame or complete frame belongs has been determined, the visible frames and complete frames of each detection object can be fused separately according to their target position coordinates and target class probability values; the fused visible frame or fused complete frame is the target detection frame of the corresponding detection object. For a detection object that is not occluded, the target detection frame is its final complete frame, the final complete frame being the detection frame obtained by fusing one or more complete frames; for an occluded detection object, the target detection frame consists of its final complete frame and final visible frame, the final visible frame being the detection frame obtained by fusing one or more visible frames. That is, for an occluded detection object, its complete frames and visible frames are fused separately, obtaining the final complete frame and the final visible frame.
It should be noted that only the feature-aligned visible frames may be used as the input of the object detection model, or only the feature-aligned complete frames, or the feature-aligned visible frames and feature-aligned complete frames together; the present embodiment does not specifically limit this.
Optionally, in the present embodiment, step S23, determining the target detection frame of each detection object based on the target position coordinates and the target class probability values, includes the following steps:

Step S231: use the target class probability values as the weights of the corresponding target position coordinates;

Step S232: calculate, for each detection object, the weighted average of the target position coordinates according to the target class probability values to obtain the target detection frame of the detection object; the target detection frame includes a final visible frame and/or a final complete frame.
In embodiments of the present invention, the target position coordinates of a visible frame indicate the corresponding position information of the visible frame in the image to be processed, and its target class probability value is an assessment of the detection processing result of that visible frame. The target position coordinates of a complete frame indicate the corresponding position information of the complete frame in the image to be processed, and its target class probability value is an assessment of the detection processing result of that complete frame. The higher the target class probability value, the better the detection processing result of the visible or complete frame, so a higher weight is assigned to it: the target class probability value can be used as the weight to compute a weighted average of the target position coordinates, obtaining the target detection frame of the object. A target detection frame obtained by the weighted-average method fuses the comprehensive detection assessment results of the individual visible or complete frames, and its position is therefore closer to the actual position of the detection object.
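The weighted-average fusion of step S232 can be sketched as follows; the two boxes and probability values are illustrative:

```python
def fuse_boxes(boxes, probs):
    """Fuse a detection object's frames into one target detection frame by
    taking the class-probability-weighted average of each coordinate
    (a sketch of the weighted-average step described above)."""
    total = sum(probs)
    return tuple(
        sum(p * box[k] for p, box in zip(probs, boxes)) / total
        for k in range(4)
    )

# Two complete frames of the same object with class probabilities 0.75 / 0.25:
# the fused box lies closer to the higher-confidence frame.
print(fuse_boxes([(10, 10, 50, 90), (14, 10, 54, 90)], [0.75, 0.25]))
# → (11.0, 10.0, 51.0, 90.0)
```

For an occluded detection object, this fusion would be applied once to its visible frames and once to its complete frames, yielding the final visible frame and the final complete frame separately.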
It should be noted that the target detection frame is the final accurate visible frame or accurate complete frame of the detection object, where the accurate visible frame is the smallest enclosing box that accurately describes the maximum visible region of the occluded detection object.
Optionally, in the present embodiment, if the feature pyramid includes multiple feature maps, performing local feature alignment processing on each visible frame in the deduplicated visible frame group includes the following steps:

Step S31: select a first target feature map in the feature pyramid;

Step S32: perform feature cropping on the first target feature map in the feature pyramid based on each visible frame in the deduplicated visible frame group to obtain a first cropping result, and perform local feature alignment processing on the first cropping result.
In embodiments of the present invention, the first target feature map refers to the feature map in the feature pyramid corresponding to the visible frames in the visible frame group. The feature pyramid contains feature maps of different scales, which are obtained by scaling the image to be processed at different ratios through the pyramid network.

After the first target feature map corresponding to a visible frame is determined, the visible frame can be scaled according to the ratio of the first target feature map relative to the image to be processed, and the position of the scaled visible frame in the first target feature map is determined. The features and position information at that position in the corresponding first target feature map are then obtained as the first cropping result. Local feature alignment processing is performed on the first cropping result, and the aligned first cropping result is input into the object detection model for object detection.
It should be noted that the ROI Align module that can use in Mask RCNN will be seen that the corresponding feature of frame is cut Out, it recycles RCNN model to cut result to first and carries out further local feature registration process.
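The scaling-and-cropping step described above can be sketched as follows. This is a simplified stand-in for ROI Align, assuming the feature map is stored as a numpy array and using a nearest-integer crop in place of ROI Align's bilinear sampling; the 1/16 scale is merely illustrative of a typical pyramid level.

```python
import numpy as np

def crop_box_features(feature_map, box, scale):
    """Scale a visible frame from image coordinates into the coordinate
    system of its target feature map, then cut out the corresponding
    features (the "first cropping result").

    feature_map: (H, W, C) array from one pyramid level.
    box: (x1, y1, x2, y2) in image-to-be-processed coordinates.
    scale: ratio of the feature map size to the image size (e.g. 1/16).
    Note: real ROI Align samples bilinearly at sub-pixel positions;
    here we simply round to an integer grid to keep the sketch short.
    """
    x1, y1, x2, y2 = [int(round(v * scale)) for v in box]
    return feature_map[y1:y2, x1:x2, :]

fmap = np.arange(8 * 8 * 2, dtype=float).reshape(8, 8, 2)  # toy 8x8 map
patch = crop_box_features(fmap, (16.0, 32.0, 64.0, 96.0), scale=1 / 16)
# patch covers rows 2..5 and columns 1..3 of the feature map
```

The cropped patch (together with its position) is what would then be passed to the local feature alignment step before detection.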
Optionally, in this embodiment, if the feature pyramid includes multiple feature maps, performing local feature alignment on each complete frame in the complete frame group includes the following steps:
Step S41: selecting a second target feature map from the feature pyramid;
Step S42: performing feature cropping on the second target feature map in the feature pyramid based on each complete frame in the complete frame group to obtain a second cropping result;
Step S43: performing local feature alignment on the second cropping result.
In embodiments of the present invention, the second target feature map is the feature map in the feature pyramid that corresponds to a complete frame in the complete frame group. The feature pyramid contains feature maps of different scales, obtained by scaling the image to be processed by different ratios. After the second target feature map corresponding to a complete frame is determined, the complete frame is scaled according to the ratio of the second target feature map to the image to be processed, the position of the scaled complete frame in the second target feature map is determined, and the features at that position in the second target feature map and their position information are extracted as the second cropping result. Before the second cropping result is input into the object detection model, local feature alignment is performed on it.
It should be noted that the ROI Align module in Mask R-CNN can be used to crop out the features corresponding to a complete frame, and an R-CNN model can then be used to perform further local feature alignment on the second cropping result.
In embodiments of the present invention, compared with existing object detection algorithms that only consider category-level detection, the method provided by the embodiments can distinguish and recognize detected objects well, especially when similar objects appear densely among multiple objects and occlusion occurs. Visible frames and complete frames are used as regression targets in the RPN stage; meanwhile, the generated pre-selection frames are distinguished by hidden variables (embedding values) according to their corresponding detected objects, so that not only are pre-selection frames of objects of different categories distinguished, but pre-selection frames of different detected objects are also distinguished. R-CNN is then used to regress the deduplication result again, and the regression results of the different detected objects are fused into frames to obtain the final detection result, thereby realizing recognition of occluded objects in dense occlusion scenarios and avoiding missed detection of occluded objects.
Embodiment three:
An embodiment of the present invention further provides an object detection apparatus, which is mainly configured to execute the object detection method provided by the foregoing content of the embodiments of the present invention. The object detection apparatus provided by the embodiment of the present invention is described in detail below.
Fig. 5 is a schematic diagram of an object detection apparatus according to an embodiment of the present invention. As shown in Fig. 5, the object detection apparatus mainly includes an image acquisition unit 10, a pre-selection frame acquisition unit 20, a grouping unit 30, a deduplication unit 40, and a determination unit 50, wherein:
the image acquisition unit 10 is configured to acquire an image to be processed containing one or more detected objects;
the pre-selection frame acquisition unit 20 is configured to perform object detection on the image to be processed to obtain at least one pre-selection frame, wherein the pre-selection frame includes a visible frame and/or a complete frame, the complete frame is an enclosing box of an entire detected object, and the visible frame is an enclosing box of the visible region of each detected object in the image to be processed;
the grouping unit 30 is configured to determine, through a relevance modeling model, the group to which each pre-selection frame in the at least one pre-selection frame belongs, to obtain at least one pre-selection frame group, wherein pre-selection frames in the same pre-selection frame group belong to the same detected object;
the deduplication unit 40 is configured to perform deduplication on each pre-selection frame group to obtain deduplicated pre-selection frame groups; and
the determination unit 50 is configured to determine the target detection frame of each detected object based on the deduplicated pre-selection frame groups.
In embodiments of the present invention, an image to be processed containing one or more detected objects is first acquired; object detection is then performed on the image to be processed to obtain at least one pre-selection frame; next, the group to which each pre-selection frame in the at least one pre-selection frame belongs is determined, obtaining at least one pre-selection frame group; redundant pre-selection frames are removed by performing deduplication on the pre-selection frame groups, obtaining deduplicated pre-selection frame groups; and the target detection frame of each detected object is determined based on the deduplicated pre-selection frame groups, thereby realizing detection of the one or more detected objects in the image to be processed and effectively avoiding missed detection of detected objects.
Optionally, each pre-selection frame group includes a visible frame group and a complete frame group. The deduplication unit 40 is further configured to perform deduplication on the visible frame group in the at least one pre-selection frame group to obtain a deduplicated visible frame group; and determining the target detection frame of each detected object based on the deduplicated pre-selection frame group includes: determining the target detection frame of each detected object based on the deduplicated visible frame group and the complete frame group.
Optionally, the pre-selection frame acquisition unit 20 is further configured to: input the image to be processed into a feature pyramid network for processing to obtain a feature pyramid; and process the feature pyramid using a region proposal network (RPN) model to obtain the at least one pre-selection frame, wherein each pre-selection frame in the at least one pre-selection frame carries an attribute tag, the attribute tag is used to determine the type of each pre-selection frame, and the types include complete frame and visible frame.
Optionally, the grouping unit 30 determining, through the relevance modeling model, the group to which each pre-selection frame in the at least one pre-selection frame belongs to obtain at least one pre-selection frame group includes: obtaining the attribute feature vector of each pre-selection frame in the at least one pre-selection frame through an instance attribute feature projection network of the relevance modeling model; and determining, through a clustering module of the relevance modeling model and based on the attribute feature vector of each pre-selection frame, the group to which each pre-selection frame in the at least one pre-selection frame belongs, to obtain the at least one pre-selection frame group.
Optionally, the instance attribute feature projection network is obtained by training with an Lpull loss function and an Lpush loss function, wherein the Lpull loss function pulls closer the distances between the attribute feature vectors of pre-selection frames belonging to the same detected object, and the Lpush loss function pushes apart the distances between the attribute feature vectors of pre-selection frames belonging to different detected objects.
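The patent does not give the exact form of the Lpull and Lpush losses. The sketch below assumes the squared-distance-to-group-mean formulation common in associative-embedding work such as CornerNet (cited in the search report), with scalar embedding values and an illustrative margin of 1.0; it is an assumption-laden illustration, not the patented loss.

```python
def pull_push_losses(embeddings, margin=1.0):
    """Compute pull/push losses over per-object embedding groups.

    embeddings: dict mapping a detected-object id to the list of
    embedding values (scalars here, for brevity) of that object's
    pre-selection frames.
    Lpull draws embeddings of the same object toward their group mean;
    Lpush pushes the means of different objects at least `margin` apart.
    """
    means = {k: sum(v) / len(v) for k, v in embeddings.items()}
    # Pull: squared distance of each embedding to its group mean.
    l_pull = sum(
        (e - means[k]) ** 2 for k, v in embeddings.items() for e in v
    ) / sum(len(v) for v in embeddings.values())
    # Push: hinge penalty when two group means are closer than margin.
    keys = list(means)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    l_push = sum(
        max(0.0, margin - abs(means[a] - means[b])) for a, b in pairs
    ) / max(1, len(pairs))
    return l_pull, l_push

# Two objects whose frame embeddings are tight within a group and
# well separated between groups: small pull loss, zero push loss.
l_pull, l_push = pull_push_losses({1: [0.1, 0.3], 2: [2.0, 2.2]})
```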
Optionally, the grouping unit 30 calculates, through the clustering module of the relevance modeling model, the vector distance value between any two of the attribute feature vectors to obtain multiple vector distance values; adds two pre-selection frames whose vector distance value is less than a preset threshold to the same group, with each pre-selection frame not added to any group forming a group of its own; and performs cluster analysis on the resulting groups through a clustering algorithm to obtain the at least one pre-selection frame group.
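The pairwise-threshold grouping performed by the clustering module can be sketched with a union-find, so that chains of pre-selection frames whose attribute feature vectors lie within the preset threshold merge into one group while isolated frames each form their own group. The Euclidean distance and the threshold value used here are illustrative assumptions, not specifics from the patent.

```python
def group_by_distance(vectors, threshold):
    """Group pre-selection frames whose attribute feature vectors are
    closer than `threshold`; ungrouped frames each form their own group.
    Union-find lets chains of close pairs merge into one group."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    for i in range(n):
        for j in range(i + 1, n):
            if dist(vectors[i], vectors[j]) < threshold:
                parent[find(i)] = find(j)  # same detected object

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Frames 0 and 1 are close (same object); frame 2 stands alone.
groups = group_by_distance([(0.0, 0.0), (0.2, 0.1), (3.0, 3.0)], 0.5)
```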
Optionally, the deduplication unit 40 is further configured to perform deduplication on the visible frame group in the at least one pre-selection frame group using a non-maximum suppression algorithm to obtain the deduplicated visible frame group.
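Greedy non-maximum suppression over a visible frame group can be sketched as follows. This is the standard textbook formulation, not code from the patent, and the IoU threshold of 0.5 is illustrative.

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Non-maximum suppression over a visible frame group: keep the
    highest-scoring frame and discard frames overlapping it too much."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        # Keep this frame only if it does not overlap any already-kept
        # frame by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the near-duplicate of box 0 is suppressed
```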
Optionally, the determination unit 50 is further configured to: perform local feature alignment on each visible frame in the deduplicated visible frame group, and perform local feature alignment on each complete frame in the complete frame group; input the feature-aligned visible frames and the feature-aligned complete frames into the object detection model for detection processing, to obtain the position coordinates and classification probability values of the feature-aligned visible frames and the position coordinates and classification probability values of the feature-aligned complete frames; and determine the target detection frame of each detected object based on target position coordinates and target classification probability values, wherein the target position coordinates include the position coordinates of the feature-aligned visible frames and/or the position coordinates of the feature-aligned complete frames, and the target classification probability values include the classification probability values of the feature-aligned visible frames and/or the classification probability values of the feature-aligned complete frames.
Optionally, the determination unit 50 is further configured to: use the target classification probability value as the weight of the corresponding target position coordinates; and calculate a weighted average of the target position coordinates of each detected object according to the target classification probability values to obtain the target detection frame of the detected object, the target detection frame including a final visible frame and/or a final complete frame.
Optionally, the feature pyramid includes multiple feature maps, and the determination unit 50 is further configured to: select a first target feature map from the feature pyramid; perform feature cropping on the first target feature map in the feature pyramid based on each visible frame in the deduplicated visible frame group to obtain a first cropping result; and perform local feature alignment on the first cropping result.
Optionally, the feature pyramid includes multiple feature maps, and the determination unit 50 is further configured to perform local feature alignment on each complete frame in the complete frame group by: selecting a second target feature map from the feature pyramid; performing feature cropping on the second target feature map in the feature pyramid based on each complete frame in the complete frame group to obtain a second cropping result; and performing local feature alignment on the second cropping result.
The apparatus provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, where the apparatus embodiment is not described in detail, reference may be made to the corresponding content in the foregoing method embodiments.
In addition, in the description of the embodiments of the present invention, unless otherwise expressly specified and limited, the terms "mounted", "connected", and "coupled" should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; and it may be a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific circumstances.
In the description of the present invention, it should be noted that orientation or position relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientation or position relationships shown in the drawings, and are merely intended to facilitate and simplify the description of the present invention, rather than indicating or implying that the referenced devices or elements must have a particular orientation or be constructed and operated in a particular orientation; therefore, they shall not be construed as limiting the present invention. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance.
A computer program product of the object detection method provided by the embodiments of the present invention includes a computer-readable storage medium storing processor-executable non-volatile program code, and the instructions included in the program code can be used to execute the method described in the foregoing method embodiments. For specific implementations, reference may be made to the method embodiments, which are not repeated here.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely exemplary; for example, the division of the units is only a logical functional division, and there may be other division manners in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present invention, intended to illustrate rather than limit the technical solutions of the present invention, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; and such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. An object detection method, comprising:
acquiring an image to be processed containing one or more detected objects;
performing object detection on the image to be processed to obtain at least one pre-selection frame, wherein the pre-selection frame comprises a visible frame and/or a complete frame, the complete frame is an enclosing box of an entire detected object, and the visible frame is an enclosing box of the visible region of each detected object in the image to be processed;
determining, through a relevance modeling model, a group to which each pre-selection frame in the at least one pre-selection frame belongs, to obtain at least one pre-selection frame group, wherein pre-selection frames in the same pre-selection frame group belong to the same detected object;
performing deduplication on each pre-selection frame group to obtain deduplicated pre-selection frame groups; and
determining a target detection frame of each detected object based on the deduplicated pre-selection frame groups.
2. The method according to claim 1, wherein determining, through the relevance modeling model, the group to which each pre-selection frame in the at least one pre-selection frame belongs to obtain at least one pre-selection frame group comprises:
obtaining an attribute feature vector of each pre-selection frame in the at least one pre-selection frame through an instance attribute feature projection network of the relevance modeling model; and
determining, through a clustering module of the relevance modeling model and based on the attribute feature vector of each pre-selection frame, the group to which each pre-selection frame in the at least one pre-selection frame belongs, to obtain the at least one pre-selection frame group.
3. The method according to claim 2, wherein the instance attribute feature projection network is obtained by training with an Lpull loss function and an Lpush loss function;
wherein the Lpull loss function pulls closer the distances between the attribute feature vectors of pre-selection frames belonging to the same detected object, and the Lpush loss function pushes apart the distances between the attribute feature vectors of pre-selection frames belonging to different detected objects.
4. The method according to claim 2, wherein determining, through the clustering module of the relevance modeling model and based on the attribute feature vector of each pre-selection frame, the group to which each pre-selection frame in the at least one pre-selection frame belongs to obtain the at least one pre-selection frame group comprises:
calculating a vector distance value between any two of the attribute feature vectors to obtain multiple vector distance values;
adding two pre-selection frames whose vector distance value among the multiple vector distance values is less than a preset threshold to the same group, with each pre-selection frame not added to any group forming a group of its own; and
performing cluster analysis on the resulting groups through a clustering algorithm to obtain the at least one pre-selection frame group.
5. The method according to claim 1, wherein each pre-selection frame group comprises a visible frame group and a complete frame group;
performing deduplication on each pre-selection frame group to obtain the deduplicated pre-selection frame groups comprises: performing deduplication on the visible frame group in the at least one pre-selection frame group to obtain a deduplicated visible frame group; and
determining the target detection frame of each detected object based on the deduplicated pre-selection frame groups comprises: determining the target detection frame of each detected object based on the deduplicated visible frame group and the complete frame group.
6. The method according to claim 5, wherein performing deduplication on the visible frame group in the at least one pre-selection frame group to obtain the deduplicated visible frame group comprises:
performing deduplication on the visible frame group in the at least one pre-selection frame group using a non-maximum suppression algorithm to obtain the deduplicated visible frame group.
7. The method according to claim 6, wherein determining the target detection frame of each detected object based on the deduplicated visible frame group and the complete frame group comprises:
performing local feature alignment on each visible frame in the deduplicated visible frame group, and performing local feature alignment on each complete frame in the complete frame group;
inputting the feature-aligned visible frames and the feature-aligned complete frames into an object detection model for detection processing, to obtain position coordinates and classification probability values of the feature-aligned visible frames and position coordinates and classification probability values of the feature-aligned complete frames; and
determining the target detection frame of each detected object based on target position coordinates and target classification probability values, wherein the target position coordinates comprise the position coordinates of the feature-aligned visible frames and/or the position coordinates of the feature-aligned complete frames, and the target classification probability values comprise the classification probability values of the feature-aligned visible frames and/or the classification probability values of the feature-aligned complete frames.
8. The method according to claim 7, wherein determining the target detection frame of each detected object based on the target position coordinates and the target classification probability values comprises:
using the target classification probability value as a weight of the corresponding target position coordinates; and
calculating a weighted average of the target position coordinates of each detected object according to the target classification probability values to obtain the target detection frame of the detected object, the target detection frame comprising a target visible frame and/or a target complete frame.
9. The method according to claim 1, wherein performing object detection on the image to be processed to obtain the at least one pre-selection frame comprises:
inputting the image to be processed into a feature pyramid network for processing to obtain a feature pyramid; and
processing the feature pyramid using a region proposal network (RPN) model to obtain the at least one pre-selection frame, wherein each pre-selection frame in the at least one pre-selection frame carries an attribute tag, the attribute tag is used to determine a type of each pre-selection frame, and the types comprise complete frame and visible frame.
10. An object detection apparatus, comprising:
an image acquisition unit, configured to acquire an image to be processed containing one or more detected objects;
a pre-selection frame acquisition unit, configured to perform object detection on the image to be processed to obtain at least one pre-selection frame, wherein the pre-selection frame comprises a visible frame and/or a complete frame, the complete frame is an enclosing box of an entire detected object, and the visible frame is an enclosing box of the visible region of each detected object in the image to be processed;
a grouping unit, configured to determine, through a relevance modeling model, a group to which each pre-selection frame in the at least one pre-selection frame belongs, to obtain at least one pre-selection frame group, wherein pre-selection frames in the same pre-selection frame group belong to the same detected object;
a deduplication unit, configured to perform deduplication on each pre-selection frame group to obtain deduplicated pre-selection frame groups; and
a determination unit, configured to determine a target detection frame of each detected object based on the deduplicated pre-selection frame groups.
11. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.
12. A computer-readable medium storing processor-executable non-volatile program code, wherein the program code causes a processor to execute the method according to any one of claims 1 to 9.
CN201910186133.5A 2019-03-12 2019-03-12 Object detection method and device and electronic equipment Active CN109948497B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910186133.5A CN109948497B (en) 2019-03-12 2019-03-12 Object detection method and device and electronic equipment
PCT/CN2019/126435 WO2020181872A1 (en) 2019-03-12 2019-12-18 Object detection method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910186133.5A CN109948497B (en) 2019-03-12 2019-03-12 Object detection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109948497A true CN109948497A (en) 2019-06-28
CN109948497B CN109948497B (en) 2022-01-28

Family

ID=67009787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910186133.5A Active CN109948497B (en) 2019-03-12 2019-03-12 Object detection method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN109948497B (en)
WO (1) WO2020181872A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532897A * 2019-08-07 2019-12-03 北京科技大学 Method and apparatus for component image recognition
CN110827261A (en) * 2019-11-05 2020-02-21 泰康保险集团股份有限公司 Image quality detection method and device, storage medium and electronic equipment
CN111178128A (en) * 2019-11-22 2020-05-19 北京迈格威科技有限公司 Image recognition method and device, computer equipment and storage medium
CN111582177A (en) * 2020-05-09 2020-08-25 北京爱笔科技有限公司 Image detection method and related device
WO2020181872A1 (en) * 2019-03-12 2020-09-17 北京旷视科技有限公司 Object detection method and apparatus, and electronic device
CN113761245A (en) * 2021-05-11 2021-12-07 腾讯科技(深圳)有限公司 Image recognition method and device, electronic equipment and computer readable storage medium
WO2022095854A1 (en) * 2020-11-04 2022-05-12 深圳Tcl新技术有限公司 Image recognition method, apparatus, and device, and computer-readable storage medium
CN117237697A * 2023-08-01 2023-12-15 北京邮电大学 Few-shot image detection method, system, medium and device
CN117372919A (en) * 2023-09-22 2024-01-09 北京市燃气集团有限责任公司 Third party construction threat detection method and device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699881A (en) * 2020-12-31 2021-04-23 北京一起教育科技有限责任公司 Image identification method and device and electronic equipment
CN113469174A (en) * 2021-04-12 2021-10-01 北京迈格威科技有限公司 Dense object detection method, apparatus, device and storage medium
CN113743333B (en) * 2021-09-08 2024-03-01 苏州大学应用技术学院 Strawberry maturity recognition method and device
CN113987667B (en) * 2021-12-29 2022-05-03 深圳小库科技有限公司 Building layout grade determining method and device, electronic equipment and storage medium
CN115731517B * 2022-11-22 2024-02-20 南京邮电大学 Crowd detection method based on crown-RetinaNet network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103597514A (en) * 2011-06-10 2014-02-19 松下电器产业株式会社 Object detection frame display device and object detection frame display method
US20160371833A1 (en) * 2015-06-17 2016-12-22 Xerox Corporation Determining a respiratory pattern from a video of a subject
CN106529527A (en) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Object detection method and device, data processing deice, and electronic equipment
US20180089505A1 (en) * 2016-09-23 2018-03-29 Samsung Electronics Co., Ltd. System and method for deep network fusion for fast and robust object detection
CN108399388A * 2018-02-28 2018-08-14 福州大学 A medium-to-high-density crowd counting method
CN109190458A * 2018-07-20 2019-01-11 华南理工大学 A deep-learning-based low-position human head detection method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2341231A (en) * 1998-09-05 2000-03-08 Sharp Kk Face detection in an image
CN106557778B (en) * 2016-06-17 2020-02-07 北京市商汤科技开发有限公司 General object detection method and device, data processing device and terminal equipment
CN108960266B (en) * 2017-05-22 2022-02-08 阿里巴巴集团控股有限公司 Image target detection method and device
CN109948497B (en) * 2019-03-12 2022-01-28 北京旷视科技有限公司 Object detection method and device and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
George Papandreou et al.: "PersonLab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model", Proceedings of the European Conference on Computer Vision (ECCV) *
Hei Law et al.: "CornerNet: Detecting objects as paired keypoints", ECCV 2018: Computer Vision *
覃剑 et al.: "Pedestrian candidate box generation based on regional composite probability" (基于区域复合概率的行人候选框生成), Acta Electronica Sinica (《电子学报》) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181872A1 (en) * 2019-03-12 2020-09-17 北京旷视科技有限公司 Object detection method and apparatus, and electronic device
CN110532897A (en) * 2019-08-07 2019-12-03 北京科技大学 Component image recognition method and device
CN110827261A (en) * 2019-11-05 2020-02-21 泰康保险集团股份有限公司 Image quality detection method and device, storage medium and electronic equipment
CN111178128A (en) * 2019-11-22 2020-05-19 北京迈格威科技有限公司 Image recognition method and device, computer equipment and storage medium
CN111178128B (en) * 2019-11-22 2024-03-19 北京迈格威科技有限公司 Image recognition method, device, computer equipment and storage medium
CN111582177A (en) * 2020-05-09 2020-08-25 北京爱笔科技有限公司 Image detection method and related device
WO2022095854A1 (en) * 2020-11-04 2022-05-12 深圳Tcl新技术有限公司 Image recognition method, apparatus, and device, and computer-readable storage medium
CN113761245A (en) * 2021-05-11 2021-12-07 腾讯科技(深圳)有限公司 Image recognition method and device, electronic equipment and computer readable storage medium
CN113761245B (en) * 2021-05-11 2023-10-13 腾讯科技(深圳)有限公司 Image recognition method, device, electronic equipment and computer readable storage medium
CN117237697A (en) * 2023-08-01 2023-12-15 北京邮电大学 Small sample image detection method, system, medium and equipment
CN117237697B (en) * 2023-08-01 2024-05-17 北京邮电大学 Small sample image detection method, system, medium and equipment
CN117372919A (en) * 2023-09-22 2024-01-09 北京市燃气集团有限责任公司 Third party construction threat detection method and device

Also Published As

Publication number Publication date
WO2020181872A1 (en) 2020-09-17
CN109948497B (en) 2022-01-28

Similar Documents

Publication Publication Date Title
CN109948497A (en) Object detection method and device, and electronic device
Hinami et al. Joint detection and recounting of abnormal events by learning deep generic knowledge
Hariharan et al. Object instance segmentation and fine-grained localization using hypercolumns
Tran et al. Video event detection: From subvolume localization to spatiotemporal path search
CN108520229A (en) Image detection method, device, electronic device and computer-readable medium
CN109727264A (en) Image generation method, neural network training method, device and electronic device
CN109145766B (en) Model training method and device, recognition method, electronic device and storage medium
CN108416250A (en) People counting method and device
CN109447169A (en) Image processing method, model training method, device and electronic system
CN110147743A (en) Real-time online pedestrian analysis and counting system and method for complex scenes
CN109657533A (en) Pedestrian re-identification method and related product
US20100278391A1 (en) Apparatus for behavior analysis and method thereof
CN109978918A (en) Trajectory tracking method, device and storage medium
KR101930940B1 (en) Apparatus and method for analyzing image
Pehlivan et al. A new pose-based representation for recognizing actions from multiple cameras
CN105096300B (en) Object detection method and device
Chen et al. TriViews: A general framework to use 3D depth data effectively for action recognition
CN108960192 (en) Action recognition method, neural network generation method, device and electronic device
CN110263712A (en) Coarse-to-fine pedestrian detection method based on region proposals
CN111881731A (en) Behavior recognition method, system, device and medium based on human skeleton
CN110084175A (en) Object detection method, object detection device and electronic device
CN109670517A (en) Object detection method, device, electronic device and object detection model
CN113673607A (en) Method and device for training an image annotation model and annotating images
CN112270381A (en) People flow detection method based on deep learning
CN109522970A (en) Image classification method, apparatus and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant