CN110175975A - Object detection method and apparatus, computer-readable storage medium, and computer device - Google Patents

Object detection method and apparatus, computer-readable storage medium, and computer device

Info

Publication number
CN110175975A
Authority
CN
China
Prior art keywords
network
image
sub
feature
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811536563.7A
Other languages
Chinese (zh)
Inventor
李峰
邱日明
赵世杰
易阳
左小祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201811536563.7A
Publication of CN110175975A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to an object detection method and apparatus, a computer-readable storage medium, and a computer device. The method includes: obtaining an image to be detected; inputting the image to be detected into a target object detection model, the target object detection model performing feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks; inputting, by the target object detection model, the object state feature information into a second feature extraction network set to obtain position feature information corresponding to a target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks; and outputting a position region corresponding to the target detection object according to the position feature information.

Description

Object detection method and apparatus, computer-readable storage medium, and computer device
Technical field
This application relates to the field of computer technology, and in particular to an object detection method and apparatus, a computer-readable storage medium, and a computer device.
Background
With the rapid development of computer technology, an object detection model is commonly used to detect the region where a target object is located in an image, for example, to detect the specific location of a gesture in an image. However, in the prior art, small target objects in an image, such as small objects in a high-resolution image, are difficult to detect with a general-purpose object detection model, which tends to lower the detection accuracy of the object detection model.
Summary of the invention
On this basis, in view of the above technical problems, it is necessary to provide an object detection method and apparatus, a computer-readable storage medium, and a computer device that can guarantee a balance between target object detection accuracy and model computation complexity.
An object detection method, comprising:
obtaining an image to be detected;
inputting the image to be detected into a target object detection model, the target object detection model performing feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks;
inputting, by the target object detection model, the object state feature information into a second feature extraction network set to obtain position feature information corresponding to a target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks; and
outputting a position region corresponding to the target detection object according to the position feature information.
An object detection apparatus, the apparatus comprising:
an image-to-be-detected acquisition module, configured to obtain an image to be detected;
a target object detection model detection module, configured to input the image to be detected into a target object detection model, the target object detection model performing feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks;
the target object detection model detection module being further configured to input, by the target object detection model, the object state feature information into a second feature extraction network set to obtain position feature information corresponding to a target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks; and
a position region output module, configured to output a position region corresponding to the target detection object according to the position feature information.
A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the program:
obtaining an image to be detected;
inputting the image to be detected into a target object detection model, the target object detection model performing feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks;
inputting, by the target object detection model, the object state feature information into a second feature extraction network set to obtain position feature information corresponding to a target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks; and
outputting a position region corresponding to the target detection object according to the position feature information.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, causing the processor to perform the following steps:
obtaining an image to be detected;
inputting the image to be detected into a target object detection model, the target object detection model performing feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks;
inputting, by the target object detection model, the object state feature information into a second feature extraction network set to obtain position feature information corresponding to a target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks; and
outputting a position region corresponding to the target detection object according to the position feature information.
According to the above object detection method and apparatus, computer-readable storage medium, and computer device, the image to be detected is input into the trained target object detection model. The target object detection model obtains object state feature information through the first feature extraction network set and inputs the object state feature information into the second feature extraction network set. The second feature extraction network set first determines the target detection object in the image to be detected according to the object state feature information and then extracts the position feature information corresponding to the target detection object; finally, the position region corresponding to the target detection object is output according to the position feature information. The first feature extraction network set includes parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks, and the second feature extraction network set includes parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks. By optimizing the network structure of the target object detection model in this way, the model only computes feature information related to the target detection object in the image to be detected, which reduces the computation of the target object detection model and guarantees its detection accuracy while improving its detection efficiency.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the object detection method in one embodiment;
Fig. 2 is a schematic flow diagram of the object detection method in one embodiment;
Fig. 3 is a schematic flow diagram of the step of extracting object state feature information by the first feature extraction network set in one embodiment;
Fig. 3A is a schematic structural diagram of the first feature extraction network set in one embodiment;
Fig. 4 is a schematic flow diagram of the step of extracting position feature information by the second feature extraction network set in one embodiment;
Fig. 4A is a schematic structural diagram of the second feature extraction network set in one embodiment;
Fig. 5 is a schematic flow diagram of the step of determining the position region of the target detection object in one embodiment;
Fig. 5A is a schematic interface diagram of the detection result of the object detection method in one embodiment;
Fig. 6 is a schematic flow diagram of the training step of the target object detection model in one embodiment;
Fig. 7 is a schematic flow diagram of the training step of the target object detection model in another embodiment;
Fig. 8 is a schematic flow diagram of the step of obtaining the training image set in one embodiment;
Fig. 9 is a structural block diagram of the object detection apparatus in one embodiment;
Fig. 10 is a structural block diagram of the target object detection model detection module in one embodiment;
Fig. 11 is a structural block diagram of the target object detection model detection module in another embodiment;
Fig. 12 is a structural block diagram of the position region output module in one embodiment;
Fig. 13 is a structural block diagram of the object detection apparatus in another embodiment;
Fig. 14 is a structural block diagram of the computer device in one embodiment.
Detailed description of the embodiments
In order to make the objects, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain this application and are not intended to limit it.
Fig. 1 is a diagram of the application environment of the object detection method in one embodiment. Referring to Fig. 1, the object detection method is applied to an object detection system. The object detection system includes a terminal 110 and a server 120, which are connected through a network. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers.
Specifically, the terminal 110 obtains an image to be detected and sends it to the server 120. The server 120 inputs the image to be detected into a target object detection model. The target object detection model performs feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks. The target object detection model then inputs the object state feature information into a second feature extraction network set to obtain position feature information corresponding to the target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks, and outputs the position region corresponding to the target detection object according to the position feature information. Finally, the server 120 may send the position region to the terminal 110, and the terminal 110 may display the position region where the target detection object is located in the image to be detected.
As shown in Fig. 2, in one embodiment, an object detection method is provided. This embodiment is mainly illustrated by applying the method to the terminal 110 or the server 120 in Fig. 1 above. Referring to Fig. 2, the object detection method specifically includes the following steps:
Step 202: obtain an image to be detected.
The image to be detected includes but is not limited to a picture, a photograph, a video, and the like. It may be a photograph taken by the camera of a terminal, a picture obtained by a screenshot on the terminal, or an image uploaded through an application program capable of uploading images. If the execution subject is the server 120, the terminal may send these images to be detected to the server 120 for subsequent processing. The image to be detected here may be, but is not limited to, an image containing small target objects to be detected; for example, the image to be detected may be a high-resolution image containing many small target objects. Since small target objects tend to be inconspicuous in a high-resolution image, the target object detection model is needed to detect the position regions where the small target objects are located in the high-resolution image to be detected.
Step 204: input the image to be detected into a target object detection model; the target object detection model performs feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, where the first feature extraction network set includes parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks.
The target object detection model is a network model for detecting the position region where the target object in the image to be detected is located; it has been trained and can be used directly for detection. The target object detection model includes but is not limited to a first feature extraction network set and a second feature extraction network set. The first feature extraction network set is used to extract object state feature information, and the second feature extraction network set is used to extract object position feature information. The first feature extraction network set is connected to the second feature extraction network set, and the output of the first feature extraction network set serves as the input of the second feature extraction network set. The object state feature information here is feature information related to the state of the target detection object; the target detection object can be determined according to the object state feature information, which may be information such as color and contour size.
The first feature extraction network set in the target object detection model includes but is not limited to parallel object state feature extraction sub-networks and a first fusion sub-network. The parallel object state feature extraction sub-networks use different feature extraction schemes: their convolution kernels differ in size, while their strides are the same.
The parallel arrangement of the object state feature extraction sub-networks can be customized. For example, a first object state feature extraction sub-network may be connected to both a second object state feature extraction sub-network and a third object state feature extraction sub-network, i.e., the second and third object state feature extraction sub-networks are connected in parallel, and the output of the first object state feature extraction sub-network serves as the input of both. The number of object state feature extraction sub-networks can be configured according to actual needs; setting the number of object state feature extraction sub-networks in the first feature extraction network set to 3 gives the best results.
The connection between the parallel object state feature extraction sub-networks and the first fusion sub-network in the first feature extraction network set can also be customized. For example, the outputs of the parallel object state feature extraction sub-networks may serve as the input of the first fusion sub-network, and the first fusion sub-network concatenates the outputs of all parallel object state feature extraction sub-networks to obtain the object state feature information. For instance, if the object state feature extraction sub-networks include a first, a second, and a third object state feature extraction sub-network, the output of the first serves as the input of the second and the third, and the outputs of the second and the third serve as the input of the first fusion sub-network.
Each of the parallel object state feature extraction sub-networks in the first feature extraction network set includes a down-sampling process. The down-sampling in the first feature extraction network set reduces the image so that it fits the size of the display area and its complexity is lower.
Step 206: the target object detection model inputs the object state feature information into a second feature extraction network set to obtain position feature information corresponding to the target detection object in the image to be detected, where the second feature extraction network set includes parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks.
The second feature extraction network set here is used to extract object position feature information, which is feature information related to the position of the target detection object; the region where the target detection object is located can be determined according to the object position feature information, which may be the region size, the center point position of the target detection object, and so on.
Since the target object detection model includes but is not limited to the first feature extraction network set and the second feature extraction network set, which are connected so that the output of the first serves as the input of the second, the object state feature information output by the first feature extraction network set is input into the second feature extraction network set. The second feature extraction network set can determine the target detection object in the image to be detected according to the object state feature information and then extract the corresponding position feature information from the target detection object. For example, if the object state feature information output by the first feature extraction network set is the contour size and the color of a hand, this information is input into the second feature extraction network set, which first determines that the target detection object is a hand according to the contour size and color of the hand, and then extracts the position feature information related to the hand, such as the center point position of the hand and the position region where the hand is located.
The second feature extraction network set includes but is not limited to parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks. The parallel position feature extraction sub-networks use different feature extraction schemes: their convolution kernels differ in size, while their strides are the same.
The parallel arrangement of the position feature extraction sub-networks can be customized. For example, a first position feature extraction sub-network may be connected to a second, a third, and a fourth position feature extraction sub-network, i.e., the second, third, and fourth position feature extraction sub-networks are connected in parallel, and the output of the first position feature extraction sub-network serves as the input of the second, third, and fourth. The number of position feature extraction sub-networks can be configured according to actual needs; setting the number of position feature extraction sub-networks in the second feature extraction network set to 4 gives the best results.
The connection between the parallel position feature extraction sub-networks and the second fusion sub-network in the second feature extraction network set can also be customized. For example, the outputs of the parallel position feature extraction sub-networks may serve as the input of the second fusion sub-network, and the second fusion sub-network concatenates the outputs of all parallel position feature extraction sub-networks to obtain the position feature information. For instance, if the position feature extraction sub-networks include a first, a second, a third, and a fourth position feature extraction sub-network, the output of the first serves as the input of the second, the third, and the fourth, and their outputs serve as the input of the second fusion sub-network.
Each of the parallel position feature extraction sub-networks in the second feature extraction network set includes a down-sampling process. The down-sampling in the second feature extraction network set reduces the image so that it fits the size of the display area and its complexity is lower.
Step 208: output the position region corresponding to the target detection object according to the position feature information.
The position region refers to the specific region where the target detection object is located in the image to be detected. Specifically, after the target object detection model outputs the position feature information corresponding to the target detection object, since the position feature information is location information related to the target detection object, the specific position region where the target detection object is located can be determined according to the position feature information, and the determined position region corresponding to the target detection object is then output. The manner of determining the position region corresponding to the target detection object according to the position feature information can be customized: for example, first determine the center point position of the target detection object, then determine the corresponding position region range according to the contour size of the target detection object, and finally draw the position region of the target detection object according to the center point position and the position region range.
In the above object detection method, the image to be detected is input into the trained target object detection model. The target object detection model obtains object state feature information through the first feature extraction network set and inputs the object state feature information into the second feature extraction network set. The second feature extraction network set first determines the target detection object in the image to be detected according to the object state feature information and then extracts the position feature information corresponding to the target detection object; finally, the position region corresponding to the target detection object is output according to the position feature information. The first feature extraction network set includes parallel object state feature extraction sub-networks and a first fusion sub-network connected to each of the parallel object state feature extraction sub-networks, and the second feature extraction network set includes parallel position feature extraction sub-networks and a second fusion sub-network connected to each of the parallel position feature extraction sub-networks. By optimizing the network structure of the target object detection model, the model only computes feature information related to the target detection object in the image to be detected, which reduces the computation of the target object detection model and guarantees its detection accuracy while improving its detection efficiency.
In one embodiment, as shown in Fig. 3, performing feature extraction on the image to be detected through the first feature extraction network set by the target object detection model to obtain object state feature information includes:
Step 302: input the image to be detected into a first shared convolution sub-network to extract a first shared image feature.
As shown in Fig. 3A, which is a schematic structural diagram of the first feature extraction network set in one embodiment, the first feature extraction network set shown in Fig. 3A includes a first shared convolution sub-network, a first convolution sub-network, a first pooling sub-network, and a first fusion sub-network. The first shared convolution sub-network serves as the input of the first feature extraction network set, and its output serves as the input of the first convolution sub-network and the first pooling sub-network; that is, the first convolution sub-network and the first pooling sub-network are connected in parallel to the first shared convolution sub-network. The outputs of the first convolution sub-network and the first pooling sub-network serve as the input of the first fusion sub-network, which is the output of the first feature extraction network set and is used to output the object state feature information. The first shared convolution sub-network, the first convolution sub-network, and the first pooling sub-network all include a down-sampling process; their feature extraction schemes differ, their convolution kernels differ in size, and their strides are the same. The convolution kernel sizes and strides of the first shared convolution sub-network, the first convolution sub-network, and the first pooling sub-network can all be configured according to actual needs; setting their kernel sizes to 3*3, 3*3, and 2*2 respectively, with all strides set to 2, gives the best results for extracting object state feature information.
The first shared convolution sub-network serves as the input network of the first feature extraction network set. The image to be detected is input into the first shared convolution sub-network, which performs convolution feature extraction on the image to be detected to obtain the first shared image feature. The first shared convolution sub-network is used to extract the first shared image feature from the image to be detected: since the image to be detected also contains objects unrelated to the target detection object, an initial convolution feature extraction is first performed on the image to be detected through the first shared convolution sub-network to obtain the first shared image feature. The first shared image feature is the input image of the first convolution sub-network and the first pooling sub-network, i.e., an image feature shared by the first convolution sub-network and the first pooling sub-network.
Step 304: input the first shared image feature into the parallel first convolution sub-network and first pooling sub-network; the first shared convolution sub-network, the first convolution sub-network, and the first pooling sub-network all include a down-sampling process.
Step 306: the first convolution sub-network outputs an image convolution feature, and the first pooling sub-network outputs image detail information.
As shown in Fig. 3A, the image to be detected is input into the first shared convolution sub-network, which performs an initial convolution feature extraction on it to obtain the first shared image feature. The first shared image feature is then input into the parallel first convolution sub-network and first pooling sub-network. The first convolution sub-network and the first pooling sub-network use different feature extraction schemes and convolution kernel sizes but the same stride; arranging them in parallel makes it possible to extract state feature information of more different aspects. To reduce the complexity of the image and better extract the object state feature information from it, a down-sampling process is included in each of the first shared convolution sub-network, the first convolution sub-network, and the first pooling sub-network.
Specifically, after the first shared image feature output by the first shared convolution sub-network is input into the parallel first convolution sub-network and first pooling sub-network, the first convolution sub-network performs the corresponding convolution feature extraction on the first shared image feature and outputs the corresponding image convolution feature. The image convolution feature is extracted from the first shared image feature by the first convolution sub-network through its corresponding feature extraction scheme. Similarly, the first pooling sub-network performs the corresponding feature extraction on the first shared image feature and outputs the corresponding image detail information, which is extracted from the first shared image feature by the first pooling sub-network through its corresponding feature extraction scheme. The first convolution sub-network and the first pooling sub-network each have their own feature extraction scheme and can extract different features from the first shared image feature.
Step 308: fuse the image convolution feature and the image detail information through the first fusion sub-network to obtain the object state feature information.
As shown in Fig. 3A, after the first convolution sub-network and the first pooling sub-network respectively output the image convolution feature and the image detail information, these are input into the first fusion sub-network, which fuses the image convolution feature and the image detail information to obtain the object state feature information. The fusion performed by the first fusion sub-network can be customized: for example, the image convolution feature and the image detail information may be arranged into the object state feature information in a custom format, i.e., the object state feature information includes the image convolution feature and the image detail information. Alternatively, the first fusion sub-network may merge the image convolution feature with the corresponding image detail information to obtain the object state feature information.
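To make the structure of Fig. 3A concrete, the following is a minimal sketch of the first feature extraction network set, written in PyTorch-style Python (a framework the application itself does not prescribe), using the kernel sizes and strides suggested above (3*3, 3*3, and 2*2, all with stride 2); the channel counts and the channel-wise concatenation used as the first fusion sub-network are illustrative assumptions.

import torch
import torch.nn as nn

class FirstFeatureExtractionSet(nn.Module):
    """Sketch of Fig. 3A: shared convolution -> parallel convolution / pooling branches -> fusion."""

    def __init__(self, in_channels=3, shared_channels=16, conv_channels=16):
        super().__init__()
        # First shared convolution sub-network: 3x3 kernel, stride 2 (down-sampling).
        self.shared_conv = nn.Conv2d(in_channels, shared_channels, kernel_size=3, stride=2, padding=1)
        # First convolution sub-network: 3x3 kernel, stride 2.
        self.conv_branch = nn.Conv2d(shared_channels, conv_channels, kernel_size=3, stride=2, padding=1)
        # First pooling sub-network: 2x2 kernel, stride 2 (keeps image detail).
        self.pool_branch = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, image):
        shared = torch.relu(self.shared_conv(image))      # first shared image feature
        conv_feat = torch.relu(self.conv_branch(shared))  # image convolution feature
        detail = self.pool_branch(shared)                 # image detail information
        # First fusion sub-network: channel-wise concatenation of the two branches.
        return torch.cat([conv_feat, detail], dim=1)      # object state feature information

In this sketch the pooling branch preserves fine detail from the shared feature map while the convolution branch learns new state features, and the concatenated result stands in for the object state feature information described above.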
In one embodiment, as shown in Fig. 4, inputting the object state feature information into the second feature extraction network set by the target object detection model to obtain the position feature information corresponding to the target detection object in the image to be detected includes:
Step 402: input the object state feature information into a second shared convolution sub-network to extract a second shared image feature.
As shown in Fig. 4A, which is a schematic structural diagram of the second feature extraction network set in one embodiment, the second feature extraction network set shown in Fig. 4A includes a second shared convolution sub-network, a second convolution sub-network, a third convolution sub-network, a second pooling sub-network, and a second fusion sub-network. The second shared convolution sub-network serves as the input of the second feature extraction network set, and its output serves as the input of the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network; that is, the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network are connected in parallel to the second shared convolution sub-network. The outputs of the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network serve as the input of the second fusion sub-network, which is the output of the second feature extraction network set and is used to output the position feature information. The second shared convolution sub-network, the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network all include a down-sampling process; their feature extraction schemes differ, their convolution kernels differ in size, and their strides are the same. Their convolution kernel sizes and strides can all be configured according to actual needs; setting the kernel sizes of the second shared convolution sub-network, the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network to 1*1, 3*3, 5*5, and 2*2 respectively, with all strides set to 1, gives the best results for extracting position feature information.
Specifically, since the first feature extraction network set is connected to the second feature extraction network set, the output of the first feature extraction network set serves as the input of the second feature extraction network set, and the second shared convolution sub-network serves as the input network of the second feature extraction network set, the object state feature information output by the first feature extraction network set is input into the second shared convolution sub-network. The second shared convolution sub-network performs convolution feature extraction on the object state feature information according to its feature extraction scheme and convolution kernel size to obtain the second shared image feature. Similarly, the second shared image feature is the input image of the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network, i.e., an image feature shared by the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network.
Step 404: input the second shared image feature into the parallel second convolution sub-network, third convolution sub-network, and second pooling sub-network; the feature extraction scales of the second convolution sub-network and the third convolution sub-network differ, and the second shared convolution sub-network, the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network all include a down-sampling process.
Step 406: the second convolution sub-network outputs a first position convolution feature, the third convolution sub-network outputs a second position convolution feature, and the second pooling sub-network outputs position detail information.
As shown in Fig. 4A, the object state feature information is input into the second shared convolution sub-network, which performs shared convolution feature extraction on it to obtain the second shared image feature. The second shared image feature is then input into the parallel second convolution sub-network, third convolution sub-network, and second pooling sub-network, whose feature extraction scales differ: a difference in feature extraction scale may mean different feature extraction schemes and different convolution kernel sizes with the same stride. Arranging the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network in parallel makes it possible to extract position information of more different aspects. To reduce the complexity of the image and better extract the position feature information from it, a down-sampling process is included in each of the second shared convolution sub-network, the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network.
Specifically, after the second shared image feature is input into the parallel second convolution sub-network, third convolution sub-network, and second pooling sub-network, the second convolution sub-network performs the corresponding convolution feature extraction on the second shared image feature and outputs the corresponding first position convolution feature, which is extracted from the second shared image feature by the second convolution sub-network according to its feature extraction scheme and convolution kernel size. Similarly, the third convolution sub-network performs the corresponding convolution feature extraction on the second shared image feature and outputs the corresponding second position convolution feature, which is extracted from the second shared image feature by the third convolution sub-network according to its feature extraction scheme and kernel size. The second pooling sub-network performs the corresponding feature extraction on the second shared image feature and outputs the corresponding position detail information, which is the information related to position details in the second shared image feature, obtained by the second pooling sub-network through convolutional computation according to its feature extraction scheme and kernel size.
Step 408: fuse the first position convolution feature, the second position convolution feature, and the position detail information through the second fusion sub-network to obtain the position feature information.
As shown in Fig. 4A, after the second convolution sub-network, the third convolution sub-network, and the second pooling sub-network respectively output the first position convolution feature, the second position convolution feature, and the position detail information, these are input into the second fusion sub-network, which fuses them to obtain the position feature information. Similarly, the fusion performed by the second fusion sub-network can be customized: for example, the first position convolution feature, the second position convolution feature, and the position detail information may be arranged into the position feature information in a custom format, i.e., the position feature information includes the first position convolution feature, the second position convolution feature, and the position detail information. Alternatively, the second fusion sub-network may merge them according to the correspondence among the first position convolution feature, the second position convolution feature, and the position detail information to obtain the position feature information.
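As with the first stage, a minimal PyTorch-style sketch of the Fig. 4A structure is given below, using the kernel sizes suggested in this embodiment (1*1, 3*3, 5*5, and 2*2) with stride 1; the channel counts, the extra padding used to align the pooling branch with the convolution branches, and the channel-wise concatenation used as the second fusion sub-network are illustrative assumptions rather than details fixed by the application.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondFeatureExtractionSet(nn.Module):
    """Sketch of Fig. 4A: shared 1x1 convolution -> parallel 3x3 / 5x5 convolutions and 2x2 pooling -> fusion."""

    def __init__(self, in_channels=32, shared_channels=32, branch_channels=16):
        super().__init__()
        # Second shared convolution sub-network: 1x1 kernel, stride 1.
        self.shared_conv = nn.Conv2d(in_channels, shared_channels, kernel_size=1, stride=1)
        # Second convolution sub-network: 3x3 kernel, stride 1.
        self.conv3 = nn.Conv2d(shared_channels, branch_channels, kernel_size=3, stride=1, padding=1)
        # Third convolution sub-network: 5x5 kernel, stride 1.
        self.conv5 = nn.Conv2d(shared_channels, branch_channels, kernel_size=5, stride=1, padding=2)
        # Second pooling sub-network: 2x2 kernel, stride 1.
        self.pool = nn.MaxPool2d(kernel_size=2, stride=1)

    def forward(self, object_state_features):
        shared = torch.relu(self.shared_conv(object_state_features))  # second shared image feature
        pos1 = torch.relu(self.conv3(shared))                         # first position convolution feature
        pos2 = torch.relu(self.conv5(shared))                         # second position convolution feature
        # Pad by one row and column so the 2x2, stride-1 pooling output matches the convolution branches.
        detail = self.pool(F.pad(shared, (0, 1, 0, 1)))               # position detail information
        # Second fusion sub-network: channel-wise concatenation of all three branches.
        return torch.cat([pos1, pos2, detail], dim=1)                 # position feature information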
In one embodiment, as shown in Fig. 5, outputting the position region corresponding to the target detection object according to the position feature information includes:
Step 502: determine the contour of the target detection object in the image to be detected according to the position feature information.
Since the position feature information is obtained by the second feature extraction network set of the target object detection model performing feature extraction on the object state feature information, the position feature information includes but is not limited to the object state feature information. Specifically, the contour of the target detection object in the image to be detected can be determined according to the object state feature information contained in the position feature information. For example, if the object state feature information is the color of a hand and the contour of a hand, the position feature information output by the second feature extraction network set includes the color of the hand and the contour of the hand, from which it can be determined that the target detection object in the image to be detected is a hand.
Step 504: determine the region range of the target detection object according to the contour of the target detection object.
Since different target detection objects have contours of different sizes, their corresponding region ranges differ. The region range of the target detection object can be determined according to its contour, and the region range of the target detection object must contain the target detection object. The region range of the target detection object may be determined on the principle that the larger the contour of the target detection object, the larger its region range.
Step 506: determine the center point position of the target detection object according to the position feature information.
The center point position here is the center point where the target detection object is located, which may be the center point of the region where the target detection object is located. Specifically, since the position feature information is information related to the position of the target detection object, such as the center point position where the target detection object is located and the region where the target detection object is located, the center point position of the target detection object can be determined according to the position feature information.
Step 508: draw the position region and the center point of the target detection object according to the region range and the center point position.
Specifically, after the center point position and the corresponding region range of the target detection object are determined, in order to finally display the detected position region and center point of the target detection object in the image to be detected, the corresponding position region and center point are drawn according to the region range and the center point position of the target detection object, where the center point position of the target detection object lies within the position region of the target detection object. For example, as shown in Fig. 5A, which is a schematic interface diagram of the detection result of the object detection method in one embodiment, the target detection object shown in Fig. 5A is a person's hand; therefore, the position region and center point where the hand is located are finally drawn according to the region range and the center point position. The region framed by the solid box in Fig. 5A is the position region where the hand is located, and the center dot in Fig. 5A is the center point where the hand is located.
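A plausible way to turn the center point position and region range of steps 502 to 508 into the drawn result of Fig. 5A is sketched below; the use of Pillow for drawing and the interpretation of the region range as a width and height centered on the center point are illustrative assumptions rather than details fixed by the application.

from PIL import Image, ImageDraw

def draw_detection(image_path, center_x, center_y, region_w, region_h, out_path):
    """Draw the position region (solid box) and center point of the target detection object."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    # Position region derived from the center point position and the region range.
    left, top = center_x - region_w / 2, center_y - region_h / 2
    right, bottom = center_x + region_w / 2, center_y + region_h / 2
    draw.rectangle([left, top, right, bottom], outline=(255, 0, 0), width=3)
    # Center point of the target detection object, marked as a small filled dot.
    draw.ellipse([center_x - 4, center_y - 4, center_x + 4, center_y + 4], fill=(255, 0, 0))
    image.save(out_path)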
In one embodiment, as shown in Fig. 6, the training step of the target object detection model includes:
Step 602: obtain a training image set, where each training image in the training image set includes a labeled region position corresponding to a training target object.
The training image set is the training data set used to train the initial object detection model. The training image set includes multiple training images, each of which contains a training target object, and each training target object is labeled with a corresponding labeled region position. The labeled region position is the standard region position marking where the training target object in the training image is located; in other words, the labeled region position is the reference standard for the output result of the initial detection model. The labeled region position is equivalent to the label of the training image and is subsequently used, together with the output result of the initial object detection model, as the basis for adjusting the model parameters of the initial object detection model. It should be noted that the training image set here has been cleaned; the cleaning of the training image set is described in a subsequent embodiment and is not repeated here. A simple representation of such a training image set is sketched after this paragraph.
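For illustration only, a training image set of this kind could be represented as a simple PyTorch dataset in which each training image is paired with the labeled region position of its training target object; the (x, y, w, h) label format anticipates the labeled data described later with reference to Fig. 7, and everything else here is an assumption.

import torch
from torch.utils.data import Dataset

class TrainingImageSet(Dataset):
    """Each training image is paired with the labeled region position of its training target object."""

    def __init__(self, images, labeled_region_positions):
        # images: list of image tensors; labeled_region_positions: list of (x, y, w, h) tuples
        assert len(images) == len(labeled_region_positions)
        self.images = images
        self.labels = labeled_region_positions

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        labeled_region = torch.tensor(self.labels[index], dtype=torch.float32)
        return image, labeled_region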
Step 604, the training image that training image is concentrated is input to initial object detection model, initial object detects mould Type by fisrt feature extract collection of network to training image carry out feature extraction obtain train Obj State characteristic information, first Feature extraction collection of network includes parallel Obj State feature extraction sub-network and mentions with each parallel Obj State feature Take the first fusion sub-network that sub-network connects.
Wherein, initial object detection model here refers to that primary object detection model carried out the object detection of initialization Model can specifically illustrate the initialization procedure to primary object detection model, herein not in subsequent embodiment It repeats.Specifically, training image training image concentrated is as the input data of initial object detection model, initial object The fisrt feature of detection model extracts collection of network and obtains training Obj State characteristic information to training image progress feature extraction, It specifically can be, fisrt feature extracts at least two parallel Obj State feature extraction sub-networks in collection of network to training Image carries out convolution feature extraction, obtains corresponding Obj State convolution characteristic information, then will by the first fusion sub-network The Obj State convolution characteristic information of each parallel Obj State feature extraction sub-network output is spliced, to be instructed Practice the corresponding trained Obj State characteristic information of image.Wherein, training Obj State characteristic information refer in training image with instruction Practice the related status information of target object, can determine the training objective pair in training image according to training Obj State characteristic information As training Obj State characteristic information can be the color of training objective object, profile size of training objective object etc..
In one embodiment, the network structure that fisrt feature in initial object detection model extracts collection of network can be with It is the network structure that fisrt feature as shown in Figure 3A extracts collection of network.If the fisrt feature of initial object detection model is extracted Collection of network is that the fisrt feature shown in Fig. 3 A extracts collection of network, then it includes the first shared volume that fisrt feature, which extracts collection of network, Product sub-network, the first convolution sub-network, first pond beggar's network and the first fusion sub-network, wherein the first shared convolution sub-network Extract the input of collection of network as fisrt feature, the output of the first shared convolution sub-network is as the first convolution sub-network and the The input of one pond beggar's network, i.e. the first convolution sub-network and first pond beggar's network be parallel with the first shared convolution sub-network Connection.Input of the output of first convolution sub-network and first pond beggar's network as the first fusion sub-network, the first fusion Sub-network is the output that fisrt feature extracts collection of network, for exporting trained Obj State characteristic information.
Step 606, training Obj State characteristic information is input to second feature and extracts network by initial object detection model Set, obtains the corresponding trained position feature information of training objective object in training image, and second feature extracts collection of network Sub-network is extracted including parallel position feature and extracts the second fusant that sub-network is connect with each parallel position feature Network.
Wherein, the fisrt feature in initial object detection model extracts collection of network and second feature is extracted collection of network and connected It connects, the input of collection of network is extracted in the output that fisrt feature extracts collection of network as second feature.Therefore, initial object detects The training Obj State characteristic information that fisrt feature extracts collection of network output is input to second feature and extracts network collection by model It closes, second feature extracts collection of network and obtains training position feature letter to training Obj State characteristic information progress feature extraction Breath, specifically can be, and second feature extracts at least two parallel position features in collection of network and extracts sub-network to training Obj State characteristic information carries out convolution feature extraction, obtains corresponding trained object's position convolution characteristic information, then passes through Second fusion sub-network by each parallel position feature extract the training object's position convolution characteristic information of sub-network output into Row splicing, to obtain the corresponding trained position feature information of training objective object in training image.
In one embodiment, the network structure of the second feature extraction network set in the initial object detection model may be the structure shown in FIG. 4A. In that case, the second feature extraction network set includes a second shared convolution sub-network, a second convolution sub-network, a third convolution sub-network, a second pooling sub-network and a second fusion sub-network. The second shared convolution sub-network serves as the input of the second feature extraction network set, and its output feeds the second convolution sub-network, the third convolution sub-network and the second pooling sub-network; that is, these three sub-networks are connected in parallel after the second shared convolution sub-network. Their outputs serve as the inputs of the second fusion sub-network, and the second fusion sub-network serves as the output of the second feature extraction network set, outputting the training position feature information.
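A corresponding sketch of the second feature extraction network set follows, again with assumed channel counts and kernel sizes; the two convolution branches differ only in their feature extraction grid (kernel) size, as described above.

```python
import torch
import torch.nn as nn

class SecondFeatureExtractionSet(nn.Module):
    """Sketch of the second feature extraction network set: a shared convolution
    sub-network feeding two convolution branches with different feature
    extraction grid sizes and a pooling branch, then channel concatenation."""
    def __init__(self, in_channels=64, mid_channels=64):
        super().__init__()
        self.shared_conv = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # second convolution sub-network: smaller feature extraction grid (3x3)
        self.conv_small = nn.Conv2d(mid_channels, mid_channels, kernel_size=3, stride=2, padding=1)
        # third convolution sub-network: larger feature extraction grid (5x5)
        self.conv_large = nn.Conv2d(mid_channels, mid_channels, kernel_size=5, stride=2, padding=2)
        # second pooling sub-network: keeps the position detail information
        self.pool_branch = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        shared = self.shared_conv(x)           # second shared image feature
        first_pos = self.conv_small(shared)    # first position convolution feature
        second_pos = self.conv_large(shared)   # second position convolution feature
        pos_detail = self.pool_branch(shared)  # position detail information
        # second fusion sub-network: concatenate the three branches
        return torch.cat([first_pos, second_pos, pos_detail], dim=1)
```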
Step 608: the model parameters of the initial object detection model are adjusted according to the training position feature information and the labeled region positions, until the training position feature information output by the initial object detection model satisfies the convergence condition, yielding the trained target object detection model.
Since every training image in the training image set carries the labeled region position of its training target object, and that labeled region position is the ground-truth annotation of the training target object, the training position feature information output by the initial object detection model is compared with the corresponding labeled region position, and the model parameters of the initial object detection model are adjusted until the training position feature information output by the model satisfies the convergence condition, yielding the trained target object detection model. The convergence condition can be customized; for example, it may be considered satisfied when the loss function computed from the training position feature information of the training images and the corresponding labeled region positions reaches a minimum. The smaller this loss function is, the closer the training position feature information is to the labeled region position, i.e. the more accurate the output of the initial object detection model.
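A minimal training-loop sketch of this adjustment step is given below, assuming a model that maps an image batch directly to predicted (x, y, w, h) boxes; the smooth L1 loss, SGD optimizer and convergence tolerance are illustrative choices, not prescribed by this embodiment.

```python
import torch
import torch.nn.functional as F

def train_detector(model, data_loader, epochs=10, lr=1e-3, tol=1e-4):
    """Adjust model parameters until the loss between the predicted position
    feature information and the labeled region positions stops improving."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    prev_loss = float("inf")
    for _ in range(epochs):
        epoch_loss = 0.0
        for images, labeled_boxes in data_loader:  # labeled_boxes: (x, y, w, h) per image
            pred_boxes = model(images)             # training position feature information
            loss = F.smooth_l1_loss(pred_boxes, labeled_boxes)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= max(len(data_loader), 1)
        if abs(prev_loss - epoch_loss) < tol:      # convergence condition reached
            break
        prev_loss = epoch_loss
    return model
```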
In one embodiment, as shown in FIG. 7, the training step of the target object detection model shown in FIG. 6 further includes:
Step 702: obtain the labeled data corresponding to the labeled region position of the training target object in the training image.
Here, the initial object detection model is an object detection model that has been initialized, so this embodiment mainly describes how the object detection model is initialized. Specifically, since the training image carries the labeled region position of the training target object, the labeled data corresponding to that labeled region position is obtained; in particular, the labeled region position of the training target object in the training image can be converted into coordinate data in a custom format, and this coordinate data is used as the labeled data. The coordinate data in the custom format may be (x, y, w, h), where x is the horizontal coordinate, y the vertical coordinate, w the width and h the height. For example, the labeled region position in a certain training image may be converted into the labeled data (3, 5, 7, 9), where 3 and 5 are the horizontal and vertical coordinates and 7 and 9 are the width and height.
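As a small illustration, assuming the labeled region position is originally stored as corner coordinates, the conversion to the custom (x, y, w, h) format could look like this:

```python
def region_to_labeled_data(x_min, y_min, x_max, y_max):
    """Convert a labeled region position given by its corners into the
    custom (x, y, w, h) labeled-data format described above."""
    return (x_min, y_min, x_max - x_min, y_max - y_min)

# e.g. a region with corners (3, 5) and (10, 14) becomes (3, 5, 7, 9)
print(region_to_labeled_data(3, 5, 10, 14))  # -> (3, 5, 7, 9)
```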
Step 704: calculate the initialization value corresponding to the object detection model from the labeled data.
After the labeled data corresponding to the labeled region positions of the training target objects in the training images has been obtained, the initialization value corresponding to the object detection model can be calculated from the labeled data of each training image. The way this initialization value is calculated can be customized: it may be an average of the labeled data of all training images, a clustering of the labeled data of all training images, a weighted average of the labeled data of all training images, and so on.
In one embodiment, after the labeled data corresponding to the labeled region positions of the training target objects in the training images has been obtained, the labeled data of all training images is averaged, and the resulting average is used as the initialization value corresponding to the object detection model.
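A sketch of this averaging, with NumPy as an assumed dependency; the two example boxes are chosen so that their average reproduces the initialization value (5, 18, 35, 12) used in the example below.

```python
import numpy as np

def initialization_value(labeled_data):
    """Average the (x, y, w, h) labeled data of all training images to obtain
    the initialization value of the object detection model."""
    boxes = np.asarray(labeled_data, dtype=np.float32)  # shape (N, 4)
    return boxes.mean(axis=0)

print(initialization_value([(3, 5, 7, 9), (7, 31, 63, 15)]))  # -> [ 5. 18. 35. 12.]
```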
Step 706: initialize the model parameters of the object detection model according to the initialization value, yielding the initial object detection model.
Specifically, after the initialization value corresponding to the object detection model has been calculated from the labeled data, the model parameters of the object detection model are initialized with this value to obtain the initial object detection model. The target object detection model obtained by training this initialized model can then detect the target position region starting from the initialization value; that is, it can look for the detection within a range around the initialization value first, which improves the detection efficiency for the target position region. For example, if the initialization value is (5, 18, 35, 12), then when detecting the position region of the target detection object, the trained target object detection model searches preferentially within the range of this initialization value (5, 18, 35, 12). Because the position region of the target detection object is more likely to fall near the initialization value, starting the search from the range (5, 18, 35, 12) can reduce the detection time of the target object detection model and improve detection efficiency.
In one embodiment, as shown in FIG. 8, obtaining the training image set, where the training images in the training image set carry the labeled region positions of the training target objects, includes:
Step 802: obtain a sample image set, the sample images in the sample image set carrying corresponding sample object regions.
Specifically, the sample image set is the set of collected sample images, and each sample image contains the sample object region corresponding to its sample object; the sample object region is simply the marked region where the sample object is located in the sample image. The sample images whose sample object regions have been marked make up the sample image set.
Step 804: obtain a preset screening rule, the preset screening rule including the matching relationship between sample objects of different region ranges and sample image processing rules.
Here, the preset screening rule is used to screen the sample image set. It can be customized, for example according to business requirements, or according to the sample object regions in the sample image set. The preset screening rule includes the matching relationship between sample objects of different region ranges and sample image processing rules, where a sample image processing rule is the rule used to process sample objects of the corresponding region range, and the matching relationship between region ranges and sample image processing rules is one-to-one. For example, a preset screening rule determined from business requirements may include a first-grade sample image processing rule, a second-grade sample image processing rule, a third-grade sample image processing rule and a fourth-grade sample image processing rule, whose corresponding sample object region ranges are: smaller than 80*80 for the first grade, 80*80 to 150*150 for the second grade, 150*150 to 300*300 for the third grade, and 300*300 or larger for the fourth grade.
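The example grades above could be encoded as a small lookup, sketched below; using the longer side of the sample object region as the comparison key is an assumption, as are the rule names.

```python
# Region-range grades mapped one-to-one to sample image processing rules.
PRESET_SCREENING_RULE = [
    # (min side, max side, processing rule)
    (0,   80,           "filter_out"),  # first grade:  object too small
    (80,  150,          "zoom_in"),     # second grade: enlarge to enrich samples
    (150, 300,          "keep"),        # third grade:  use directly
    (300, float("inf"), "filter_out"),  # fourth grade: object too large / mislabeled
]

def match_processing_rule(region_w, region_h):
    """Return the sample image processing rule matching a sample object region."""
    side = max(region_w, region_h)
    for low, high, rule in PRESET_SCREENING_RULE:
        if low <= side < high:
            return rule
    return "filter_out"
```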
Step 806: according to the range of the sample object region in each sample image and the matching relationship, obtain the target sample data processing rule corresponding to each sample image, and perform sample image processing on the corresponding sample image according to its target sample data processing rule to obtain training data; the training data includes the labeled region positions of the training target objects.
Specifically, after the preset screening rule has been obtained, the target sample data processing rule corresponding to each sample image is determined from the range of its sample object region and the preset screening rule, and the sample image is processed according to that rule to obtain training data. Sample image processing includes, but is not limited to, filtering out the sample image, enlarging it, or shrinking it. Enlarging or shrinking a sample image means scaling its size by a preset multiple; when the sample image is scaled, the sample object region in it is scaled accordingly, and the scaled sample object region becomes the corresponding labeled region position.
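A minimal scaling sketch, assuming OpenCV-style resizing and an (x, y, w, h) region: when the image is enlarged by the preset multiple, the region is enlarged by the same multiple.

```python
import cv2

def zoom_sample(image, region, multiple):
    """Scale a sample image by a preset multiple and scale its sample object
    region (x, y, w, h) accordingly; the scaled region is the labeled region
    position of the resulting training image."""
    h, w = image.shape[:2]
    resized = cv2.resize(image, (int(w * multiple), int(h * multiple)))
    x, y, bw, bh = region
    scaled_region = (x * multiple, y * multiple, bw * multiple, bh * multiple)
    return resized, scaled_region
```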
In one embodiment, the preset screening rule includes a first-grade sample image processing rule, a second-grade sample image processing rule, a third-grade sample image processing rule and a fourth-grade sample image processing rule. Obtaining the target sample data processing rule of each sample image from the range of its sample object region and the matching relationship, and processing the sample image accordingly to obtain training data that includes the labeled region positions of the training target objects, comprises: when the sample object region in a sample image is smaller than the first area range corresponding to the first-grade sample image processing rule, and/or when the sample object region is larger than the fourth area range corresponding to the fourth-grade sample image processing rule, the sample image is filtered out; when the sample object region lies within the second area range corresponding to the second-grade sample image processing rule, a preset multiple is obtained, part of these sample images are enlarged by the preset multiple, and the enlarged sample images are added to the training image set; when the sample object region lies within the third area range corresponding to the third-grade sample image processing rule, the sample image is added to the training image set directly.
Here, the preset screening rule determined from business requirements includes the first-grade, second-grade, third-grade and fourth-grade sample image processing rules, each of which has a corresponding region range, so the matching sample image processing rule can be determined from the sample object region of each sample image.
In one embodiment, when the sample object region in a sample image is smaller than the first area range corresponding to the first-grade sample image processing rule, the sample object in that image is too small: such a small object rarely appears or does not need attention in practical application scenarios, and learning from too-small sample objects consumes the capacity and resources of the initial object detection model, so the corresponding sample image is filtered out. For example, if the first-grade sample image processing rule is to filter out the sample image and its corresponding first area range is 80*80, then a sample image whose sample object region is smaller than 80*80 is filtered out.
In one embodiment, when the sample object region in a sample image lies within the second area range corresponding to the second-grade sample image processing rule, the sample object is on the small side compared with a normal object; objects in this range are not very likely to occur in practical application scenarios, but the sample images falling within the second area range can still be used to enhance the diversity of the sample images. Specifically, a preset number of these sample images can be selected at random, a preset multiple obtained, the selected sample images enlarged by that multiple, and the enlarged sample images added to the training image set. When a sample image is enlarged, the sample object region in it is enlarged accordingly, and the enlarged sample object region is the labeled region position in the training image. For example, the second area range may be 80*80 to 150*150.
In one embodiment, when the sample object region in a sample image lies within the third area range corresponding to the third-grade sample image processing rule, the sample image belongs to the large majority, and objects falling within the third area range are relatively likely in practical application scenarios, so such sample images can be added to the training image set directly. Before they are added, simple image processing such as brightness, contrast or saturation adjustment can be applied, but no processing is applied to the size of sample images in this range. For example, the third area range may be 150*150 to 300*300.
In one embodiment, when the sample object region in a sample image is larger than the fourth area range corresponding to the fourth-grade sample image processing rule, the sample object is too large; objects falling within the fourth area range are very unlikely in practical application scenarios, and such sample images may result from mislabeled sample objects or other causes, so the sample images corresponding to the fourth area range are filtered out. For example, if the fourth-grade sample image processing rule is to filter out the sample image and its corresponding fourth area range is 300*300, then a sample image whose sample object region is larger than 300*300 is filtered out.
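Putting the four grades together, a screening sketch using the helper functions from the sketches above might look like the following; the preset multiple and the fraction of second-grade images that are enlarged are illustrative values.

```python
import random

def build_training_set(sample_set, preset_multiple=2.0, sample_ratio=0.3):
    """Apply the four grades to a sample image set.
    Each element of sample_set is (image, (x, y, w, h) sample object region)."""
    training_set = []
    second_grade = []
    for image, region in sample_set:
        rule = match_processing_rule(region[2], region[3])
        if rule == "filter_out":          # first / fourth grade: discard
            continue
        if rule == "zoom_in":             # second grade: enlarge later
            second_grade.append((image, region))
        else:                             # third grade: keep directly
            training_set.append((image, region))
    # enlarge a randomly chosen part of the second-grade images by the preset multiple
    chosen = random.sample(second_grade, int(len(second_grade) * sample_ratio))
    for image, region in chosen:
        training_set.append(zoom_sample(image, region, preset_multiple))
    return training_set
```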
In a specific embodiment, an object detection method is provided, which specifically includes the following steps:
1. Obtain the labeled data corresponding to the labeled region position of the training target object in the training image.
2. Calculate the initialization value corresponding to the object detection model from the labeled data.
3. Initialize the model parameters of the object detection model according to the initialization value to obtain the initial object detection model.
4. Obtain a sample image set, the sample images in the sample image set carrying corresponding sample object regions.
5. Obtain a preset screening rule, the preset screening rule including the matching relationship between sample objects of different region ranges and sample image processing rules.
6. According to the range of the sample object region in each sample image and the matching relationship, obtain the target sample data processing rule corresponding to each sample image, and perform sample image processing on the corresponding sample image according to its target sample data processing rule to obtain training data; the training data includes the labeled region positions of the training target objects.
7. Input the training images of the training image set into the initial object detection model. The initial object detection model performs feature extraction on each training image through the first feature extraction network set to obtain training object state feature information; the first feature extraction network set includes parallel object state feature extraction sub-networks and a first fusion sub-network connected to each parallel object state feature extraction sub-network.
8. The initial object detection model inputs the training object state feature information into the second feature extraction network set to obtain the training position feature information corresponding to the training target object in the training image; the second feature extraction network set includes parallel position feature extraction sub-networks and a second fusion sub-network connected to each parallel position feature extraction sub-network.
9. Adjust the model parameters of the initial object detection model according to the training position feature information and the labeled region positions, until the training position feature information output by the initial object detection model satisfies the convergence condition, yielding the trained target object detection model.
10. Obtain an image to be detected.
11. Input the image to be detected into the target object detection model. The target object detection model performs feature extraction on the image to be detected through the first feature extraction network set to obtain object state feature information; the first feature extraction network set includes parallel object state feature extraction sub-networks and a first fusion sub-network connected to each parallel object state feature extraction sub-network.
11-1. Input the image to be detected into the first shared convolution sub-network to extract the first shared image feature.
11-2. Input the first shared image feature into the parallel first convolution sub-network and first pooling sub-network; the first shared convolution sub-network, the first convolution sub-network and the first pooling sub-network all include down-sampling.
11-3. The first convolution sub-network outputs the image convolution feature, and the first pooling sub-network outputs the image detail information.
11-4. Concatenate the image convolution feature and the image detail information through the first fusion sub-network to obtain the object state feature information.
12. The target object detection model inputs the object state feature information into the second feature extraction network set to obtain the position feature information corresponding to the target detection object in the image to be detected; the second feature extraction network set includes parallel position feature extraction sub-networks and a second fusion sub-network connected to each parallel position feature extraction sub-network.
12-1. Input the object state feature information into the second shared convolution sub-network to extract the second shared image feature.
12-2. Input the second shared image feature into the parallel second convolution sub-network, third convolution sub-network and second pooling sub-network; the feature extraction grid sizes of the second convolution sub-network and the third convolution sub-network are different, and the second shared convolution sub-network, the second convolution sub-network, the third convolution sub-network and the second pooling sub-network all include down-sampling.
12-3. The second convolution sub-network outputs the first position convolution feature, the third convolution sub-network outputs the second position convolution feature, and the second pooling sub-network outputs the position detail information.
12-4. Concatenate the first position convolution feature, the second position convolution feature and the position detail information through the second fusion sub-network to obtain the position feature information.
13. Output the position region corresponding to the target detection object according to the position feature information.
13-1. Determine the contour of the target detection object in the image to be detected according to the position feature information.
13-2. Determine the region range of the target detection object according to its contour.
13-3. Determine the center point position of the target detection object according to the position feature information.
13-4. Draw the position region and center point of the target detection object according to the region range and the center point position.
Taking small-target detection (targets no smaller than 80*80) in a high-resolution (1080p) scene as an example, with a hand as the small target: the image to be detected is input into the trained target object detection model, which performs feature extraction on it through the first feature extraction network set to obtain object state feature information, such as the color of the hand and the contour of the hand. The target object detection model then inputs the object state feature information into the second feature extraction network set, which determines from the object state feature information that the target detection object is a hand, performs further feature extraction on the hand region, and obtains the position feature information, namely the position region where the hand is located and the center point position of the hand. Finally, the target object detection model draws the position region corresponding to the hand according to the position feature information. If there are multiple hands in the image to be detected, the target object detection model draws the position region of each hand in the image to be detected.
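As an end-to-end illustration of this hand-detection scenario, the sketch below chains the two feature extraction network sets sketched earlier with an assumed regression head that maps the fused position features to a single (x, y, w, h) position region; the input resolution, channel sizes and the head itself are illustrative assumptions, not the model of this embodiment.

```python
import torch
import torch.nn as nn

class TargetObjectDetector(nn.Module):
    """Inference sketch: FirstFeatureExtractionSet and SecondFeatureExtractionSet
    (from the earlier sketches) followed by an assumed box-regression head."""
    def __init__(self):
        super().__init__()
        self.first_set = FirstFeatureExtractionSet(in_channels=3, mid_channels=32)
        self.second_set = SecondFeatureExtractionSet(in_channels=64, mid_channels=64)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(192, 4))

    def forward(self, image):
        state_feat = self.first_set(image)            # object state feature information
        position_feat = self.second_set(state_feat)   # position feature information
        return self.head(position_feat)               # position region (x, y, w, h)

# e.g. a frame containing a hand, resized to a square input
frame = torch.randn(1, 3, 512, 512)
box = TargetObjectDetector()(frame)
x, y, w, h = box[0].tolist()
center = (x + w / 2, y + h / 2)  # center point of the hand region
```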
It should be understood that although the steps in the above flow charts are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the above flow charts may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 9, an object detection apparatus 900 is provided, which includes:
Image-to-be-detected obtaining module 902, configured to obtain an image to be detected.
Target object detection model detection module 904, configured to input the image to be detected into the target object detection model; the target object detection model performs feature extraction on the image to be detected through the first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each parallel object state feature extraction sub-network.
The target object detection model detection module 904 is further configured to have the target object detection model input the object state feature information into the second feature extraction network set to obtain the position feature information corresponding to the target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each parallel position feature extraction sub-network.
Position region output module 906, configured to output the position region corresponding to the target detection object according to the position feature information.
In one embodiment, as shown in FIG. 10, the target object detection model detection module 904 includes:
First shared convolution sub-network processing unit 904a, configured to input the image to be detected into the first shared convolution sub-network and extract the first shared image feature.
First shared image feature processing unit 904b, configured to input the first shared image feature into the parallel first convolution sub-network and first pooling sub-network; the first shared convolution sub-network, the first convolution sub-network and the first pooling sub-network all include down-sampling.
Image output unit 904c, configured such that the first convolution sub-network outputs the image convolution feature and the first pooling sub-network outputs the image detail information.
Object state feature information concatenation unit 904d, configured to concatenate the image convolution feature and the image detail information through the first fusion sub-network to obtain the object state feature information.
In one embodiment, as shown in FIG. 11, the target object detection model detection module 904 further includes:
Second shared convolution sub-network processing unit 904e, configured to input the object state feature information into the second shared convolution sub-network and extract the second shared image feature.
Second shared image feature processing unit 904f, configured to input the second shared image feature into the parallel second convolution sub-network, third convolution sub-network and second pooling sub-network; the feature extraction grid sizes of the second convolution sub-network and the third convolution sub-network are different, and the second shared convolution sub-network, the second convolution sub-network, the third convolution sub-network and the second pooling sub-network all include down-sampling.
Position detail information output unit 904g, configured such that the second convolution sub-network outputs the first position convolution feature, the third convolution sub-network outputs the second position convolution feature, and the second pooling sub-network outputs the position detail information.
Position feature information concatenation unit 904h, configured to concatenate the first position convolution feature, the second position convolution feature and the position detail information through the second fusion sub-network to obtain the position feature information.
In one embodiment, as shown in FIG. 12, the position region output module 906 includes:
Contour determination unit 906a, configured to determine the contour of the target detection object in the image to be detected according to the position feature information.
Region range determination unit 906b, configured to determine the region range of the target detection object according to the contour of the target detection object.
Center point position determination unit 906c, configured to determine the center point position of the target detection object according to the position feature information.
Position region drawing unit 906d, configured to draw the position region and center point of the target detection object according to the region range and the center point position.
In one embodiment, as shown in FIG. 13, the object detection apparatus 900 further includes:
Training image set obtaining module 1302, configured to obtain a training image set, the training images in the training image set carrying the labeled region positions of the training target objects.
Initial object detection model processing module 1304, configured to input the training images of the training image set into the initial object detection model; the initial object detection model performs feature extraction on the training images through the first feature extraction network set to obtain training object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each parallel object state feature extraction sub-network.
The initial object detection model processing module 1304 is further configured to have the initial object detection model input the training object state feature information into the second feature extraction network set to obtain the training position feature information corresponding to the training target object in the training image, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each parallel position feature extraction sub-network.
Initial object detection model training module 1306, configured to adjust the model parameters of the initial object detection model according to the training position feature information and the labeled region positions, until the training position feature information output by the initial object detection model satisfies the convergence condition, yielding the trained target object detection model.
In one embodiment, the object detection apparatus 900 is further configured to obtain the labeled data corresponding to the labeled region position of the training target object in the training image; calculate the initialization value corresponding to the object detection model from the labeled data; and initialize the model parameters of the object detection model according to the initialization value to obtain the initial object detection model.
In one embodiment, the object detection apparatus 900 is further configured to obtain a sample image set whose sample images carry corresponding sample object regions; obtain a preset screening rule that includes the matching relationship between sample objects of different region ranges and sample image processing rules; and, according to the range of the sample object region in each sample image and the matching relationship, obtain the target sample data processing rule corresponding to each sample image and perform sample image processing on the corresponding sample image according to that rule to obtain training data, the training data including the labeled region positions of the training target objects.
In one embodiment, the object detection apparatus 900 is further configured to filter out a sample image when its sample object region is smaller than the first area range corresponding to the first-grade sample image processing rule and/or larger than the fourth area range corresponding to the fourth-grade sample image processing rule; when the sample object region lies within the second area range corresponding to the second-grade sample image processing rule, to obtain a preset multiple, enlarge part of these sample images by the preset multiple, and form the training image set from the enlarged sample images; and, when the sample object region lies within the third area range corresponding to the third-grade sample image processing rule, to form the training image set from the sample images directly.
FIG. 14 shows the internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 or the server 120 in FIG. 1. As shown in FIG. 14, the computer device includes a processor, a memory, a network interface, an input apparatus and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the object detection method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to execute the object detection method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input apparatus may be a touch layer covering the display screen, a key, trackball or touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse. It should be noted that if the computer device is the server 120, it does not include a display screen.
Those skilled in the art will understand that the structure shown in FIG. 14 is merely a block diagram of part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the object detection apparatus provided by the present application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 14. The memory of the computer device may store the program modules making up the object detection apparatus, for example the image-to-be-detected obtaining module, the target object detection model detection module and the position region output module shown in FIG. 9. The computer program composed of these program modules causes the processor to execute the steps of the object detection method of the embodiments of the present application described in this specification.
For example, the computer device shown in FIG. 14 may obtain the image to be detected through the image-to-be-detected obtaining module of the object detection apparatus shown in FIG. 9. The target object detection model detection module inputs the image to be detected into the target object detection model, which performs feature extraction on the image to be detected through the first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each parallel object state feature extraction sub-network. The target object detection model detection module also has the target object detection model input the object state feature information into the second feature extraction network set to obtain the position feature information corresponding to the target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each parallel position feature extraction sub-network. The position region output module outputs the position region corresponding to the target detection object according to the position feature information.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the steps of the above object detection method. The steps of the object detection method here may be the steps of the object detection method of any of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to execute the steps of the above object detection method. The steps of the object detection method here may be the steps of the object detection method of any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes of the above embodiment methods can be implemented by instructing the relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium; when executed, the program may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should all be considered to be within the scope of this specification.
The above embodiments only express several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent of the present application. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application patent shall be subject to the appended claims.

Claims (15)

1. An object detection method, comprising:
obtaining an image to be detected;
inputting the image to be detected into a target object detection model, the target object detection model performing feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each parallel object state feature extraction sub-network;
the target object detection model inputting the object state feature information into a second feature extraction network set to obtain position feature information corresponding to a target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each parallel position feature extraction sub-network; and
outputting a position region corresponding to the target detection object according to the position feature information.
2. The method according to claim 1, wherein the target object detection model performing feature extraction on the image to be detected through the first feature extraction network set to obtain the object state feature information comprises:
inputting the image to be detected into a first shared convolution sub-network to extract a first shared image feature;
inputting the first shared image feature into a parallel first convolution sub-network and first pooling sub-network, the first shared convolution sub-network, the first convolution sub-network and the first pooling sub-network each including down-sampling;
the first convolution sub-network outputting an image convolution feature and the first pooling sub-network outputting image detail information; and
concatenating the image convolution feature and the image detail information through the first fusion sub-network to obtain the object state feature information.
3. The method according to claim 1, wherein the target object detection model inputting the object state feature information into the second feature extraction network set to obtain the position feature information corresponding to the target detection object in the image to be detected comprises:
inputting the object state feature information into a second shared convolution sub-network to extract a second shared image feature;
inputting the second shared image feature into a parallel second convolution sub-network, third convolution sub-network and second pooling sub-network, the feature extraction grid sizes of the second convolution sub-network and the third convolution sub-network being different, and the second shared convolution sub-network, the second convolution sub-network, the third convolution sub-network and the second pooling sub-network each including down-sampling;
the second convolution sub-network outputting a first position convolution feature, the third convolution sub-network outputting a second position convolution feature, and the second pooling sub-network outputting position detail information; and
concatenating the first position convolution feature, the second position convolution feature and the position detail information through the second fusion sub-network to obtain the position feature information.
4. The method according to claim 1, wherein outputting the position region corresponding to the target detection object according to the position feature information comprises:
determining a contour of the target detection object in the image to be detected according to the position feature information;
determining a region range of the target detection object according to the contour of the target detection object;
determining a center point position of the target detection object according to the position feature information; and
drawing the position region and center point of the target detection object according to the region range and the center point position.
5. The method according to claim 1, wherein the training step of the target object detection model comprises:
obtaining a training image set, training images in the training image set carrying labeled region positions of training target objects;
inputting the training images of the training image set into an initial object detection model, the initial object detection model performing feature extraction on the training images through the first feature extraction network set to obtain training object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and the first fusion sub-network connected to each parallel object state feature extraction sub-network;
the initial object detection model inputting the training object state feature information into the second feature extraction network set to obtain training position feature information corresponding to the training target object in the training image, the second feature extraction network set including parallel position feature extraction sub-networks and the second fusion sub-network connected to each parallel position feature extraction sub-network; and
adjusting model parameters of the initial object detection model according to the training position feature information and the labeled region positions, until the training position feature information output by the initial object detection model satisfies a convergence condition, yielding the trained target object detection model.
6. The method according to claim 5, further comprising:
obtaining labeled data corresponding to the labeled region position of the training target object in the training image;
calculating an initialization value corresponding to the object detection model from the labeled data; and
initializing the model parameters of the object detection model according to the initialization value to obtain the initial object detection model.
7. The method according to claim 5, wherein obtaining the training image set, the training images in the training image set carrying the labeled region positions of the training target objects, comprises:
obtaining a sample image set, sample images in the sample image set carrying corresponding sample object regions;
obtaining a preset screening rule, the preset screening rule including a matching relationship between sample objects of different region ranges and sample image processing rules; and
obtaining, according to the range of the sample object region in each sample image and the matching relationship, a target sample data processing rule corresponding to each sample image, and performing sample image processing on the corresponding sample image according to the target sample data processing rule to obtain training data, the training data including the labeled region positions of the training target objects.
8. The method according to claim 7, wherein the preset screening rule includes a first-grade sample image processing rule, a second-grade sample image processing rule, a third-grade sample image processing rule and a fourth-grade sample image processing rule, and obtaining, according to the range of the sample object region in each sample image and the matching relationship, the target sample data processing rule corresponding to each sample image and performing sample image processing on the corresponding sample image according to the target sample data processing rule to obtain the training data, the training data including the labeled region positions of the training target objects, comprises:
when the sample object region in the sample image is smaller than a first area range corresponding to the first-grade sample image processing rule, and/or
when the sample object region in the sample image is larger than a fourth area range corresponding to the fourth-grade sample image processing rule, filtering out the sample image;
when the sample object region in the sample image lies within a second area range corresponding to the second-grade sample image processing rule, obtaining a preset multiple, enlarging part of the sample images according to the preset multiple to obtain enlarged sample images, and forming the training image set from the enlarged sample images; and
when the sample object region in the sample image lies within a third area range corresponding to the third-grade sample image processing rule, forming the training image set from the sample image directly.
9. An object detection apparatus, comprising:
an image-to-be-detected obtaining module, configured to obtain an image to be detected;
a target object detection model detection module, configured to input the image to be detected into a target object detection model, the target object detection model performing feature extraction on the image to be detected through a first feature extraction network set to obtain object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and a first fusion sub-network connected to each parallel object state feature extraction sub-network;
the target object detection model detection module being further configured to have the target object detection model input the object state feature information into a second feature extraction network set to obtain position feature information corresponding to a target detection object in the image to be detected, the second feature extraction network set including parallel position feature extraction sub-networks and a second fusion sub-network connected to each parallel position feature extraction sub-network; and
a position region output module, configured to output a position region corresponding to the target detection object according to the position feature information.
10. The apparatus according to claim 9, wherein the target object detection model detection module comprises:
a first shared convolution sub-network processing unit, configured to input the image to be detected into a first shared convolution sub-network and extract a first shared image feature;
a first shared image feature processing unit, configured to input the first shared image feature into a parallel first convolution sub-network and first pooling sub-network, the first shared convolution sub-network, the first convolution sub-network and the first pooling sub-network each including down-sampling;
an image output unit, configured such that the first convolution sub-network outputs an image convolution feature and the first pooling sub-network outputs image detail information; and
an object state feature information concatenation unit, configured to concatenate the image convolution feature and the image detail information through the first fusion sub-network to obtain the object state feature information.
11. The apparatus according to claim 9, wherein the target object detection model detection module comprises:
a second shared convolution sub-network processing unit, configured to input the object state feature information into a second shared convolution sub-network and extract a second shared image feature;
a second shared image feature processing unit, configured to input the second shared image feature into a parallel second convolution sub-network, third convolution sub-network and second pooling sub-network, the feature extraction grid sizes of the second convolution sub-network and the third convolution sub-network being different, and the second shared convolution sub-network, the second convolution sub-network, the third convolution sub-network and the second pooling sub-network each including down-sampling;
a position detail information output unit, configured such that the second convolution sub-network outputs a first position convolution feature, the third convolution sub-network outputs a second position convolution feature, and the second pooling sub-network outputs position detail information; and
a position feature information concatenation unit, configured to concatenate the first position convolution feature, the second position convolution feature and the position detail information through the second fusion sub-network to obtain the position feature information.
12. The apparatus according to claim 9, wherein the position region output module comprises:
a contour determination unit, configured to determine a contour of the target detection object in the image to be detected according to the position feature information;
a region range determination unit, configured to determine a region range of the target detection object according to the contour of the target detection object;
a center point position determination unit, configured to determine a center point position of the target detection object according to the position feature information; and
a position region drawing unit, configured to draw the position region and center point of the target detection object according to the region range and the center point position.
13. The apparatus according to claim 9, further comprising:
a training image set obtaining module, configured to obtain a training image set, training images in the training image set carrying labeled region positions of training target objects;
an initial object detection model processing module, configured to input the training images of the training image set into an initial object detection model, the initial object detection model performing feature extraction on the training images through the first feature extraction network set to obtain training object state feature information, the first feature extraction network set including parallel object state feature extraction sub-networks and the first fusion sub-network connected to each parallel object state feature extraction sub-network;
the initial object detection model processing module being further configured to have the initial object detection model input the training object state feature information into the second feature extraction network set to obtain training position feature information corresponding to the training target object in the training image, the second feature extraction network set including parallel position feature extraction sub-networks and the second fusion sub-network connected to each parallel position feature extraction sub-network; and
an initial object detection model training module, configured to adjust model parameters of the initial object detection model according to the training position feature information and the labeled region positions, until the training position feature information output by the initial object detection model satisfies a convergence condition, yielding the trained target object detection model.
14. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the steps of the method according to any one of claims 1 to 8.
15. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the steps of the method according to any one of claims 1 to 8.
CN201811536563.7A 2018-12-14 2018-12-14 Method for checking object, device, computer readable storage medium and computer equipment Pending CN110175975A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811536563.7A CN110175975A (en) 2018-12-14 2018-12-14 Method for checking object, device, computer readable storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN110175975A (en) 2019-08-27

Family

ID=67688990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811536563.7A Pending CN110175975A (en) 2018-12-14 2018-12-14 Method for checking object, device, computer readable storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN110175975A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529511A (en) * 2016-12-13 2017-03-22 北京旷视科技有限公司 Image structuring method and device
US20180253603A1 (en) * 2017-03-06 2018-09-06 Canon Kabushiki Kaisha Information processing apparatus, information processing method, and storage medium
CN108038473A (en) * 2017-12-28 2018-05-15 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN108062536A (en) * 2017-12-29 2018-05-22 纳恩博(北京)科技有限公司 Detection method and device, and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, XIN: "Research and Application of Key Technologies for Salient Object Detection Algorithms" (《显著性对象检测算法的若干关键技术研究及应用》), China Masters' Theses Full-text Database (Engineering Science and Technology II), 15 October 2018 (2018-10-15) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027490A (en) * 2019-12-12 2020-04-17 腾讯科技(深圳)有限公司 Face attribute recognition method and device and storage medium
CN111027490B (en) * 2019-12-12 2023-05-30 腾讯科技(深圳)有限公司 Face attribute identification method and device and storage medium
CN113688259A (en) * 2020-05-19 2021-11-23 阿波罗智联(北京)科技有限公司 Navigation target labeling method and device, electronic equipment and computer readable medium
CN113688259B (en) * 2020-05-19 2024-06-07 阿波罗智联(北京)科技有限公司 Labeling method and device for navigation target, electronic equipment and computer readable medium
CN112668675A (en) * 2021-03-22 2021-04-16 腾讯科技(深圳)有限公司 Image processing method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110309876A (en) Object detection method, device, computer readable storage medium and computer equipment
CN104599258B (en) A kind of image split-joint method based on anisotropic character descriptor
CN109308678B (en) Method, device and equipment for repositioning by using panoramic image
CN103559476B (en) Fingerprint matching method and device thereof
CN107424160A (en) The system and method that image center line is searched by vision system
CN110175975A (en) Method for checking object, device, computer readable storage medium and computer equipment
Li et al. A multi-scale cucumber disease detection method in natural scenes based on YOLOv5
CN109376631A (en) A kind of winding detection method and device neural network based
CN108830185B (en) Behavior identification and positioning method based on multi-task joint learning
CN108304821A (en) Image-recognizing method and device, image acquiring method and equipment, computer equipment and non-volatile computer readable storage medium storing program for executing
CN108182449A (en) A kind of hyperspectral image classification method
CN113920309B (en) Image detection method, image detection device, medical image processing equipment and storage medium
Li et al. Exploring the relationship between center and neighborhoods: Central vector oriented self-similarity network for hyperspectral image classification
CN110232387A (en) A kind of heterologous image matching method based on KAZE-HOG algorithm
Yuan et al. Translation, scale and rotation: cross-modal alignment meets RGB-infrared vehicle detection
WO2022152009A1 (en) Target detection method and apparatus, and device and storage medium
CN109685830A (en) Method for tracking target, device and equipment and computer storage medium
Liu et al. DCCAM‐MRNet: Mixed Residual Connection Network with Dilated Convolution and Coordinate Attention Mechanism for Tomato Disease Identification
Tan et al. In-field rice panicles detection and growth stages recognition based on RiceRes2Net
CN104992433B (en) The method and device of multi-spectral image registration based on line match
Zhang Innovation of English teaching model based on machine learning neural network and image super resolution
CN113221731B (en) Multi-scale remote sensing image target detection method and system
Awan et al. Deep feature based cross-slide registration
CN104700359A (en) Super-resolution reconstruction method of image sequence in different polar axis directions of image plane
Li et al. CascNet: No-reference saliency quality assessment with cascaded applicability sorting and comparing network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination