CN110349135A

CN110349135A - Object detection method and device

Info

Publication number: CN110349135A
Application number: CN201910568841.5A
Authority: CN
Inventors: 高巍; 张一凡; 田继锋
Original assignee: Goertek Inc
Current assignee: Goertek Inc
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2019-10-18

Abstract

The invention discloses an object detection method and device. The method comprises: performing feature extraction on a target image containing a target to be detected, to obtain feature vector data reflecting features of the target to be detected; setting a number of paths and a number of network levels, and constructing an aggregated transformation network in an instance segmentation framework, wherein the aggregated transformation network has two or more paths and data is transmitted between corresponding paths of different levels; inputting the feature vector data into the aggregated transformation network, dividing the feature vector data into different levels, and dividing the data of each level into feature groups, one for each path; processing, in parallel in the aggregated transformation network, the feature groups of each level on each path, to obtain multiple processing results; and fusing the multiple processing results to obtain a feature image of the target image.

Description

Object detection method and device

Technical field

The present invention relates to image object detection and segmentation technology, and in particular to an object detection method and an object detection device.

Background art

In fields such as manufacturing, the products produced usually need to be inspected for defects. At present, artificial intelligence is generally used in place of traditional manual visual inspection for product defects. The basic approach is: collect or acquire a certain amount of relevant raw image data, feed it into the instance segmentation framework Mask R-CNN, obtain a feature image of the product to be inspected through the convolutional feature extraction network (backbone) of the instance segmentation framework, and then process this feature image to determine whether the product is qualified.

However, existing convolutional feature extraction backbones all use a deep residual network structure to process the original image containing the product to be inspected and obtain the feature image of the target object. Because the feature information in the original image is complex and abundant, processing the original image with the existing deep residual network structure is slow and inefficient.

Summary of the invention

The object of the present invention is to provide a new technical solution for object detection.

According to a first aspect of the present invention, an object detection method is provided, comprising:

performing feature extraction on a target image containing a target to be detected, to obtain feature vector data reflecting features of the target to be detected;

setting a number of paths and a number of network levels, and constructing an aggregated transformation network in an instance segmentation framework, wherein the aggregated transformation network has two or more paths, and data is transmitted between corresponding paths of different levels;

inputting the feature vector data into the aggregated transformation network, dividing the feature vector data into different levels, and dividing the data of each level into feature groups, one for each path;

processing in parallel, in the aggregated transformation network, the feature groups of each level on each path, to obtain multiple processing results; and

fusing the multiple processing results to obtain a feature image of the target image.

Optionally, setting the number of paths and the number of network levels and constructing the aggregated transformation network in the instance segmentation framework comprises:

setting two or more paths according to the number of paths; and

setting, according to the number of network levels, each path to include one input layer, one or more transition layers, and one output layer.

Optionally, processing in parallel, in the aggregated transformation network, the feature groups of each level on each path to obtain multiple processing results comprises:

performing, by the input layer on each path, dimension reduction on the input feature vector data to obtain a reduced-dimension feature group;

inputting the reduced-dimension feature group into the transition layer on the corresponding path, and performing, by the transition layer, feature extraction on it to obtain an intermediate feature group; and

inputting the intermediate feature group into the output layer on the corresponding path, and performing, by the output layer, dimension raising on it to obtain a raised-dimension feature group as the processing result of that path.

Optionally, the instance segmentation framework includes a region proposal network (RPN), and after the feature image of the target image is obtained, the method further comprises:

inputting the feature image into the region proposal network, which slides a preset fixed window over the feature image to identify regions of the feature image containing defects, takes the identified defect regions as initial regions of interest, and generates a feature image marked with each initial region of interest.

Optionally, sliding the preset fixed window over the feature image to identify regions of the feature image containing defects comprises:

providing, for a fixed window of size N × N, at least one N × 1 filter and one 1 × N filter, where N is the size of the fixed window;

performing dimension reduction on the feature image by passing it sequentially through the N × 1 filter and the 1 × N filter; and

marking the regions containing defects in the feature image after dimension reduction.

Optionally, the instance segmentation framework further includes a fully connected (FC) layer used for localization, and the method further comprises:

obtaining the image data matrix processed in the fully connected layer;

constructing a processing matrix from the image data matrix; and

extracting the weight matrix and bias matrix corresponding to the processing matrix, decomposing the image data matrix, and performing the computation of the fully connected layer with the decomposed matrices.

According to a second aspect of the present invention, an object detection device is further provided, comprising:

an extraction module, configured to perform feature extraction on a target image containing a target to be detected, to obtain feature vector data reflecting features of the target to be detected;

a network construction module, configured to set a number of paths and a number of network levels and to construct an aggregated transformation network in an instance segmentation framework, wherein the aggregated transformation network has two or more paths, and data is transmitted between corresponding paths of different levels;

an input module, configured to input the feature vector data into the aggregated transformation network, divide the feature vector data into different levels, and divide the data of each level into feature groups, one for each path;

a parallel processing module, configured to process in parallel, in the aggregated transformation network, the feature groups of each level on each path, to obtain multiple processing results; and

a fusion module, configured to fuse the multiple processing results to obtain a feature image of the target image.

Optionally, the instance segmentation framework includes a region proposal network (RPN), and the device further includes a region extraction module, the region extraction module being configured to:

Optionally, the region extraction module is specifically configured to:

perform dimension reduction on the feature image by passing it sequentially through the N × 1 filter and the 1 × N filter; and

Optionally, the instance segmentation framework includes a fully connected (FC) layer used for localization, and the device further includes a matrix processing module, the matrix processing module being configured to:

obtain the image data matrix processed in the fully connected layer;

construct a processing matrix from the image data matrix;

One beneficial effect of the present invention is that the method and device according to its embodiments improve the convolutional feature extraction network of a conventional instance segmentation framework by constructing an aggregated transformation network in its convolutional feature extraction backbone. The aggregated transformation network has a multi-level, multi-group structure and can therefore process the input feature vector data reflecting features of the target to be detected in parallel. In other words, the idea of hierarchical parallel processing is used to optimize the existing convolutional feature extraction network, so that its processing efficiency is further improved without increasing its complexity.

Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings required in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can derive other relevant drawings from these drawings without creative effort.

Fig. 1 is a schematic diagram of the hardware structure of an object detection system according to an embodiment of the present invention;

Fig. 2 is a flowchart of an object detection method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of an instance segmentation framework according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a conventional residual network structure according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of an aggregation operation according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of an aggregated transformation network according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a convolution structure according to an embodiment of the present invention;

Fig. 8 is a flowchart of an object detection method according to an example of the present invention;

Fig. 9 is a block diagram of an object detection device according to an embodiment of the present invention.

Detailed description of embodiments

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present invention.

The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention or its application or uses.

Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but, where appropriate, such techniques, methods, and devices should be regarded as part of the specification.

In all examples shown and discussed herein, any specific value should be construed as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.

Fig. 1 is a block diagram of the hardware configuration of an object detection system 100 according to an embodiment of the present invention.

As shown in Fig. 1, the object detection system 100 includes an image acquisition device 1000 and an object detection device 2000.

The image acquisition device 1000 is configured to acquire a target image and provide it to the object detection device 2000.

The image acquisition device 1000 may be any imaging device capable of photographing the target to be detected, such as a camera.

The object detection device 2000 may be any electronic device, such as a PC, a laptop, or a server.

In this embodiment, referring to Fig. 1, the object detection device 2000 may include a processor 2100, a memory 2200, an interface device 2300, a communication device 2400, a display device 2500, an input device 2600, a speaker 2700, a microphone 2800, and so on.

The processor 2100 may be a mobile processor. The memory 2200 includes, for example, ROM (read-only memory), RAM (random-access memory), and nonvolatile memory such as a hard disk. The interface device 2300 includes, for example, a USB interface and a headphone jack. The communication device 2400 can perform wired or wireless communication. It may include a short-range communication device, for example any device that performs short-range wireless communication based on protocols such as Hilink, WiFi (IEEE 802.11), Mesh, Bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, and LiFi; it may also include a long-range communication device, for example any device that performs WLAN, GPRS, or 2G/3G/4G/5G telecommunication. The display device 2500 is, for example, a liquid crystal display or a touch screen, and is used to display the target image acquired by the image acquisition device. The input device 2600 may include, for example, a touch screen and a keyboard. The user can input and output voice information through the speaker 2700 and the microphone 2800.

In this embodiment, the memory 2200 of the object detection device 2000 is configured to store instructions that control the processor 2100 to perform at least the object detection method according to any embodiment of the present invention. A skilled person can design the instructions according to the concepts disclosed herein. How instructions control a processor is well known in the art and is therefore not described in detail here.

Although multiple components of the object detection device 2000 are shown in Fig. 1, the present invention may involve only some of them; for example, the object detection device 2000 may involve only the memory 2200, the processor 2100, and the display device 2500.

In this embodiment, the image acquisition device 1000 acquires the target image and provides it to the object detection device 2000, which then implements the method according to any embodiment of the present invention based on the target image.

It should be understood that although Fig. 1 shows only one image acquisition device 1000 and one object detection device 2000, this does not limit their respective numbers: the object detection system 100 may include multiple image acquisition devices 1000 and/or object detection devices 2000.

An embodiment of the present invention further provides an object detection method. As shown in Fig. 2, the object detection method of this embodiment includes the following steps S2100 to S2500:

Step S2100: perform feature extraction on the target image containing the target to be detected, to obtain feature vector data reflecting features of the target to be detected.

The target image is an image acquired by photographing the target to be detected. A product that needs to be judged as qualified or unqualified can serve as the target to be detected. For example, if it is necessary to determine whether a component is qualified, the component can serve as the target to be detected, and an image of the component is acquired as the target image.

In one example, the target image may be a digital image composed of many pixels. The digital image may be displayed in different color modes, such as, but not limited to, the RGB (Red, Green, Blue) color mode, the CMYK (Cyan, Magenta, Yellow, Black) color mode, the HSB (Hue, Saturation, Brightness) color mode, and bitmap mode.

In another example, the target image may be an analog image.

In this embodiment, feature extraction may be performed on the target image containing the target to be detected to obtain feature vector data reflecting features of the target. The feature vector data may be 256-dimensional, or of any other dimension; this is not limited here.

After feature extraction has been performed on the target image in step S2100 to obtain the feature vector data reflecting features of the target to be detected, the method proceeds to step S2200.

Step S2200: set the number of paths and the number of network levels, and construct an aggregated transformation network in the instance segmentation framework.

The aggregated transformation network has two or more paths, and data is transmitted between corresponding paths of different levels. The number of paths and the number of network levels can be set according to the specific application scenario and simulation experiments.

At present, the convolutional feature extraction network in an instance segmentation framework mostly uses a deep residual network structure to extract semantic features from the feature vector data obtained in step S2100. The input target image is thereby abstracted into feature maps, in which useful regions are activated and irrelevant or ambiguous feature information is suppressed.

However, as shown in Fig. 3 and Fig. 4, existing deep residual networks mostly use a "ResNet50+FPN" structure. Here, "256-d in" denotes the feature vector data input to the deep residual network structure, and "256-d out" denotes the feature vector data output after processing by it. The first layer reduces the dimension of the input feature vector data with a "256, 1×1, 64" structure, i.e., from 256 to 64 dimensions; the second layer extracts local features with a "64, 3×3, 64" structure; and the third layer expands the output dimension with a "64, 1×1, 256" structure, i.e., from 64 back to 256 dimensions. Every "×" operation above comes from the inner product operation of convolution. As can be seen from Fig. 3, the existing deep residual network uses a single-input, single-output structure: the 256-d input is processed along a single path, which makes processing slow. In this embodiment, an aggregated transformation network is therefore constructed for the convolutional feature extraction network in the instance segmentation framework, so that the input feature vector data is processed by multiple parallel paths and processing speed is improved.
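To make the cost of the single-path bottleneck above concrete, the following Python sketch counts the convolution weights of one 256-d bottleneck block. It is a minimal illustration under stated assumptions: the helper names are hypothetical, and biases and batch-normalization parameters are ignored.

```python
def conv_weights(c_in: int, c_out: int, k_h: int, k_w: int) -> int:
    """Number of weights in a 2-D convolution layer (bias ignored)."""
    return c_in * c_out * k_h * k_w


def bottleneck_weights(c_io: int = 256, width: int = 64) -> int:
    """Weights of a ResNet-style bottleneck: 1x1 reduce, 3x3, 1x1 expand."""
    reduce = conv_weights(c_io, width, 1, 1)   # 256 -> 64 dimensions
    local = conv_weights(width, width, 3, 3)   # 64 -> 64, local feature extraction
    expand = conv_weights(width, c_io, 1, 1)   # 64 -> 256 dimensions
    return reduce + local + expand


print(bottleneck_weights())  # 69632
```

Most of the weights sit in the 3×3 layer, which is the part the multi-path design later splits across narrow paths.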

The aggregated transformation network provided in this embodiment changes the inner product operation shown in Fig. 4: through a form of aggregated transformation, the inner product is recast as a combination of splitting, transforming, and aggregating, as shown in formula 1:

$$\sum_{i=1}^{D} w_i x_i \qquad (1)$$

where i takes each natural number from 1 to D, D denotes the number of input dimensions, $x = [x_1, x_2, \ldots, x_D]$ denotes the input vector, and $w_i$ denotes the filter weight for the i-th input element. The "splitting" operation splits the input vector x into low-dimensional embeddings, i.e., the one-dimensional subspaces $x_i$; the "transforming" operation transforms each low-dimensional representation, i.e., $w_i x_i$; and the "aggregating" operation aggregates the transformations by summing all the embeddings. This transformation can be represented by the topology diagram shown in Fig. 5.
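The split-transform-aggregate reading of formula 1 can be checked numerically. The Python sketch below computes the same inner product twice, once directly and once as the three named steps; the function names are illustrative assumptions, not part of the original disclosure.

```python
from typing import Sequence


def inner_product(w: Sequence[float], x: Sequence[float]) -> float:
    """Formula 1 computed directly: the sum of w_i * x_i for i = 1..D."""
    return sum(wi * xi for wi, xi in zip(w, x))


def split_transform_aggregate(w: Sequence[float], x: Sequence[float]) -> float:
    """The same inner product written as the three steps described in the text."""
    if len(w) != len(x):
        raise ValueError("w and x must both have D elements")
    # Splitting: one one-dimensional embedding x_i per input element.
    embeddings = [[xi] for xi in x]
    # Transforming: each low-dimensional embedding is scaled by its weight w_i.
    transformed = [wi * emb[0] for wi, emb in zip(w, embeddings)]
    # Aggregating: sum all transformed embeddings.
    return sum(transformed)


w, x = [1, 2, 3], [4, 5, 6]
print(inner_product(w, x), split_transform_aggregate(w, x))  # 32 32
```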

Based on the above analysis, the element-wise transformation $w_i x_i$ can be replaced by a more general function $\mathcal{T}_i(x)$, as in formula 2:

$$\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x) \qquad (2)$$

where $\mathcal{T}_i(x)$ can be an arbitrary function that projects x into an embedding space (usually a low-dimensional one) and transforms it, and C is the size of the set of transformations to be aggregated. C plays a role similar to D in formula 1, and its value determines the number of complex transformations. In the design of the transformation functions, all $\mathcal{T}_i$ share the same topology.

Following the design idea of the deep residual network, as in formula 3, and replacing the inner-product expression F in formula 3 with the aggregation expression F of formula 2, gives the overall aggregated transformation of formula 4. Formula 4 provides the theoretical basis for replacing the inner-product operation "×" with the aggregated transformation.

$$y = \mathcal{F}(x, \{w_i\}) + x \qquad (3)$$

$$y = x + \sum_{i=1}^{C} \mathcal{T}_i(x) \qquad (4)$$
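A toy version of formula 4 may help: every branch $\mathcal{T}_i$ has the same topology (project down, ReLU, project back up) and differs only in its weights, and the aggregated output is added to the identity shortcut. This is a minimal Python sketch; the ReLU, the dense projections, and the toy weights are assumptions chosen for illustration, not details taken from the patent.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]
Branch = Callable[[Sequence[float]], Vector]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def make_branch(down: Matrix, up: Matrix) -> Branch:
    """One T_i with a fixed topology: project to a low-dimensional embedding,
    apply ReLU, project back. Branches differ only in their weights."""
    def branch(x: Sequence[float]) -> Vector:
        z = [max(0.0, t) for t in matvec(down, x)]
        return matvec(up, z)
    return branch


def aggregated_residual(x: Sequence[float], branches: Sequence[Branch]) -> Vector:
    """Formula 4: y = x + sum over i = 1..C of T_i(x)."""
    total = [0.0] * len(x)
    for t in branches:
        total = [acc + out for acc, out in zip(total, t(x))]
    return [xi + ti for xi, ti in zip(x, total)]


# C = 2 branches, D = 2 input dimensions, embedding dimension 1 (toy values).
t1 = make_branch(down=[[1.0, 0.0]], up=[[1.0], [0.0]])
t2 = make_branch(down=[[0.0, 1.0]], up=[[0.0], [1.0]])
print(aggregated_residual([3.0, 4.0], [t1, t2]))  # [6.0, 8.0]
```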

Illustratively, the number of paths is M, where M is a natural number greater than 1, and the number of network levels is 3. An aggregated transformation network as shown in Fig. 6 can then be constructed in the instance segmentation framework according to the number of paths M and the number of network levels 3.

After the number of paths and the number of network levels have been set and the aggregated transformation network has been constructed in the instance segmentation framework in step S2200, the method proceeds to step S2300.

Step S2300: input the feature vector data into the aggregated transformation network, divide the feature vector data into different levels, and divide the data of each level into feature groups, one for each path.

As shown in Fig. 6, taking as an example 256-dimensional feature vector data input to the aggregated transformation network, M paths, and 3 network levels, the 256-dimensional feature vector data can be divided into M feature groups, each containing 256/M dimensions of the feature vector data.
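The division into M feature groups can be sketched as follows. This Python sketch assumes contiguous, equal-sized groups (Fig. 6 may order the dimensions differently) and uses M = 32 purely as an example value; the function name is hypothetical.

```python
from typing import List, Sequence


def split_into_groups(features: Sequence[float], num_paths: int) -> List[List[float]]:
    """Divide one feature vector into num_paths equal, contiguous feature groups."""
    if num_paths < 2:
        raise ValueError("the aggregated transformation network needs two or more paths")
    if len(features) % num_paths:
        raise ValueError("the feature dimension must be divisible by the number of paths")
    size = len(features) // num_paths
    return [list(features[i * size:(i + 1) * size]) for i in range(num_paths)]


groups = split_into_groups([float(v) for v in range(256)], num_paths=32)
print(len(groups), len(groups[0]))  # 32 8
```

With 256 dimensions and M = 32, each group holds 256/M = 8 dimensions; an M that does not divide 256 is rejected rather than padded.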

After the feature vector data has been input into the aggregated transformation network, divided into different levels, and the data of each level divided into feature groups for the paths in step S2300, the method proceeds to step S2400.

Step S2400: in the aggregated transformation network, process the feature groups of each level on each path in parallel to obtain multiple processing results.

Illustratively, in the aggregated transformation network shown in Fig. 6, the feature groups of each level on the M paths are processed in parallel, yielding processing results for the M paths.

After the feature groups of each level on each path have been processed in parallel in the aggregated transformation network and multiple processing results have been obtained in step S2400, the method proceeds to step S2500.

Step S2500: fuse the multiple processing results to obtain the feature image of the target image.

In this embodiment, the multiple processing results may, for example, be superimposed to obtain the feature image of the target image. As shown in Fig. 6, the feature vector data output by the output layer on each path (with the dimension shown in Fig. 6) can be superimposed to obtain the feature image of the target image.
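Steps S2400 and S2500 together can be sketched as "run every path on its own feature group, then superimpose the results element-wise". In the Python sketch below, `toy_path` is a hypothetical stand-in for one path (any function returning a fixed-length vector would do), and a thread pool stands in for the parallel hardware; these are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, List, Sequence


def toy_path(group: Sequence[float], out_dim: int = 4) -> List[float]:
    """Hypothetical stand-in for one path: maps a feature group to an
    out_dim-dimensional result so that all paths can be superimposed."""
    s = sum(group)
    return [s * (k + 1) for k in range(out_dim)]


def fuse(results: Sequence[Sequence[float]]) -> List[float]:
    """Superimpose (element-wise sum) the per-path results."""
    return [sum(values) for values in zip(*results)]


def process_in_parallel(groups: Sequence[Sequence[float]],
                        path_fn: Callable[[Sequence[float]], List[float]]) -> List[float]:
    # Each feature group is handled by its own path; map() keeps the path order.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        results = list(pool.map(path_fn, groups))
    return fuse(results)


print(process_in_parallel([[1.0, 2.0], [3.0, 4.0]], partial(toy_path, out_dim=4)))
# [10.0, 20.0, 30.0, 40.0]
```

Python threads only illustrate the data flow here; in a real network the parallelism would more likely come from the GPU, for example by expressing the paths as a single grouped convolution.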

In the method according to this embodiment, the convolutional feature extraction network of a conventional instance segmentation framework is improved by constructing an aggregated transformation network in its convolutional feature extraction backbone. The aggregated transformation network has a multi-level, multi-group structure and can therefore process the input feature vector data reflecting features of the target to be detected in parallel. In other words, the idea of hierarchical parallel processing is used to optimize the existing convolutional feature extraction network, so that its processing efficiency is further improved without increasing its complexity.

An embodiment of the present invention further provides an object detection method. In this embodiment, setting the number of paths and the number of network levels and constructing the aggregated transformation network in the instance segmentation framework in step S2200 may further include the following steps S2210 to S2220:

Step S2210: set two or more paths according to the number of paths.

As shown in Fig. 6, the number of paths is M, so M corresponding paths can be set according to the number of paths M.

Step S2220: according to the number of network levels, set each path to include one input layer, one or more transition layers, and one output layer.

As shown in Fig. 6, the number of network levels is 3, so each of the M paths can be set to include one input layer, one transition layer, and one output layer. Each path may of course also include, for example, two transition layers or another number of them; this is not limited here.
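Steps S2210 and S2220 amount to laying out M identical paths with one input layer, (levels − 2) transition layers, and one output layer each. The Python sketch below records that layout as plain data. The channel defaults (256 in and out, width 4) are assumptions: 4 echoes the four-dimensional data mentioned in step S2430, and the 1×1/3×3/1×1 kernels follow the filters described in steps S2410 to S2430.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    role: str    # "input", "transition" or "output"
    kernel: int  # square kernel size: 1 for 1x1, 3 for 3x3
    c_in: int
    c_out: int


def build_paths(num_paths: int, num_levels: int,
                c_in: int = 256, width: int = 4, c_out: int = 256) -> List[List[Layer]]:
    """Lay out num_paths identical paths with num_levels layers each:
    one input layer, (num_levels - 2) transition layers, one output layer."""
    if num_paths < 2:
        raise ValueError("need two or more paths")
    if num_levels < 3:
        raise ValueError("need an input layer, at least one transition layer and an output layer")
    paths = []
    for _ in range(num_paths):
        layers = [Layer("input", 1, c_in, width)]
        layers += [Layer("transition", 3, width, width) for _ in range(num_levels - 2)]
        layers.append(Layer("output", 1, width, c_out))
        paths.append(layers)
    return paths


paths = build_paths(num_paths=32, num_levels=3)
print(len(paths), [layer.role for layer in paths[0]])
# 32 ['input', 'transition', 'output']
```

Keeping every path identical mirrors the requirement that all $\mathcal{T}_i$ share one topology; only the learned weights would differ.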

This embodiment sets two or more paths based on the number of paths and, according to the number of network levels, sets each path to include one input layer, one or more transition layers, and one output layer. This makes the aggregated transformation network efficient to construct and further allows the corresponding feature vector data to be processed by multiple parallel paths, improving processing speed.

An embodiment of the present invention further provides an object detection method. In this embodiment, processing in parallel, in step S2400, the feature groups of each level on each path in the aggregated transformation network to obtain multiple processing results may further include the following steps S2410 to S2430:

Step S2410: perform, by the input layer on each path, dimension reduction on the input feature vector data to obtain a reduced-dimension feature group.

As shown in Fig. 6, there are M paths in total and the total input feature vector data is 256-dimensional. The input layer on each path receives its share of the feature vector data (with the dimension shown in Fig. 6) and reduces its dimension with the 1×1 filter set in that input layer, yielding lower-dimensional feature vector data.

Step S2420: input the reduced-dimension feature group into the transition layer on the corresponding path, and perform, by the transition layer, feature extraction on it to obtain an intermediate feature group.

As shown in Fig. 6, the reduced-dimension feature vector data output by the input layer on each path is input into the transition layer on that path, and the 3×3 filter set in the transition layer extracts features from it, yielding the intermediate feature vector data.

Step S2430: input the intermediate feature group into the output layer on the corresponding path, and perform, by the output layer, dimension raising on it to obtain a raised-dimension feature group as the processing result of that path.

As shown in Fig. 6, the feature vector data output by the transition layer on each path is input into the output layer on that path, and the 1×1 filter set in the output layer raises the dimension of this 4-dimensional input feature vector data, yielding the higher-dimensional feature vector data shown in Fig. 6.
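One way to check the claim that the multi-path structure does not increase complexity is to count weights. The Python sketch below assumes a ResNeXt-style reading of Fig. 6 in which each of M = 32 paths maps the full 256-d input to width 4, applies a 3×3 filter, and maps back to 256 dimensions; M, the width, and the full-input reading are all assumptions. If each path instead receives only 256/M dimensions, the first term shrinks further.

```python
def conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def path_weights(c_in: int, width: int, c_out: int) -> int:
    """One path: 1x1 input layer, 3x3 transition layer, 1x1 output layer."""
    return (conv_weights(c_in, width, 1)
            + conv_weights(width, width, 3)
            + conv_weights(width, c_out, 1))


def aggregated_weights(num_paths: int, c_in: int = 256, width: int = 4,
                       c_out: int = 256) -> int:
    return num_paths * path_weights(c_in, width, c_out)


single_path = path_weights(256, 64, 256)   # the single-path bottleneck of Fig. 4
multi_path = aggregated_weights(32)        # 32 paths of width 4 (assumed values)
print(single_path, multi_path)             # 69632 70144
```

Under these assumptions both structures hold roughly 70 thousand weights, so the extra paths add parallel capacity at almost the same parameter budget.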

According to this embodiment, the corresponding feature vector data can be processed by multiple parallel paths, further improving processing speed.

An embodiment of the present invention further provides an object detection method. As shown in Fig. 3, the instance segmentation framework further includes a region proposal network (RPN). After the feature image of the target image is obtained, the object detection method of the present invention further includes:

In this embodiment, the region proposal network can slide a fixed window with a receptive field of a particular size over the feature image generated in step S2500; the size of the fixed window may be 3. Regions of the feature image that may contain features of the target to be detected are identified, generating a certain number of initial regions of interest. These initial regions of interest are the regions of the feature image that the region proposal network considers likely to contain defects.
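The sliding-window idea can be illustrated with a toy scorer. In the Python sketch below, a window of size 3 (the value given in the text) slides over a 2-D feature map with stride 1 and no padding, and each position is scored by its mean activation. This scoring is a hypothetical stand-in: a real RPN computes learned objectness scores and box offsets with convolutional heads over anchors, which this sketch does not attempt.

```python
from typing import List, Tuple


def sliding_window_proposals(feature_map: List[List[float]], window: int = 3,
                             top_k: int = 2) -> List[Tuple[int, int, int]]:
    """Slide a window x window box (stride 1, no padding) over a 2-D feature
    map, score each position by its mean activation (a toy stand-in for a
    learned objectness score), and return the top_k boxes as
    (top_row, left_col, size) tuples."""
    h, w = len(feature_map), len(feature_map[0])
    scored = []
    for r in range(h - window + 1):
        for c in range(w - window + 1):
            total = sum(feature_map[r + i][c + j]
                        for i in range(window) for j in range(window))
            scored.append((total / (window * window), r, c))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable for ties
    return [(r, c, window) for _, r, c in scored[:top_k]]


fm = [[1.0 if 2 <= r <= 4 and 2 <= c <= 4 else 0.0 for c in range(6)] for r in range(6)]
print(sliding_window_proposals(fm, window=3, top_k=1))  # [(2, 2, 3)]
```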

In this embodiment, sliding the preset fixed window over the feature image to identify regions of the feature image containing defects may further include the following steps S2610 to S2630:

Step S2610: for a fixed window of size N × N, provide at least one N × 1 filter and one 1 × N filter.

N is the size of the fixed window and can be set according to the specific application scenario and requirements; for example, N may be 3.

As shown in Fig. 7, one N × 1 filter and one 1 × N filter are provided for the N × N fixed window. For example, if N is 3, the 3 × 3 filter in the region proposal network RPN of Fig. 3 can be replaced with a 3 × 1 filter and a 1 × 3 filter.

Step S2620: perform dimension reduction on the feature image by passing it sequentially through the N × 1 filter and the 1 × N filter.

As shown in Fig. 7, the feature image is input into the region proposal network and passes sequentially through the N × 1 filter and the 1 × N filter, which perform dimension reduction on it.
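The two-step filtering can be checked against a single N × N filter. The Python sketch below applies an N × 1 filter and then a 1 × N filter (N = 3, with Sobel-like toy weights chosen as an assumption) and compares the result with one pass of their outer product. The two agree exactly, which also shows the limit of the technique: only rank-one (separable) N × N kernels are reproduced exactly, so for a general kernel the two-part structure is a cheaper parameterization rather than an identical one.

```python
from typing import List

Grid = List[List[int]]


def correlate_valid(image: Grid, kernel: Grid) -> Grid:
    """2-D cross-correlation (as used by CNN layers) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


n_by_1 = [[1], [2], [1]]      # N x 1 filter, N = 3 (toy weights)
one_by_n = [[1, 0, -1]]       # 1 x N filter (toy weights)
n_by_n = [[a[0] * b for b in one_by_n[0]] for a in n_by_1]  # their outer product

image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]
two_step = correlate_valid(correlate_valid(image, n_by_1), one_by_n)
one_step = correlate_valid(image, n_by_n)
print(two_step == one_step)  # True
```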

Dimension reduction can also be performed on the feature image by connecting the two 1 × 1 filters shown in Fig. 3 after the N × 1 filter and the 1 × N filter, respectively.

Step S2630: mark the regions containing defects in the feature image after dimension reduction.

In step S2630, the regions containing defects can be marked in the feature image after dimension reduction and used as the initial regions of interest.

Through steps S2610 to S2630, the existing N × N filter is replaced with a two-part structure consisting of an N × 1 filter and a 1 × N filter, which reduces the number of model parameters from N² to 2N weights per input-output channel pair.
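The parameter saving can be quantified with a short count. The Python sketch below compares one N × N convolution with an N × 1 convolution followed by a 1 × N convolution; the 256-channel width is an assumption borrowed from the 256-d features discussed earlier, since the text does not state the RPN's channel count.

```python
def square_filter_weights(c_in: int, c_out: int, n: int) -> int:
    """Weights of one N x N convolution from c_in to c_out channels."""
    return c_in * c_out * n * n


def factorized_filter_weights(c_in: int, c_mid: int, c_out: int, n: int) -> int:
    """Weights of an N x 1 convolution (c_in -> c_mid) followed by a
    1 x N convolution (c_mid -> c_out)."""
    return c_in * c_mid * n + c_mid * c_out * n


print(square_filter_weights(256, 256, 3))           # 589824
print(factorized_filter_weights(256, 256, 256, 3))  # 393216
```

For N = 3 the factorized form keeps two thirds of the weights, and the saving grows with N because the cost scales with 2N instead of N².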

It further include used for positioning complete the embodiment of the invention also provides a kind of object detection method, in example segmentation framework Articulamentum FC, the further comprising the steps of S2710~S2730 of object detection method of the present invention:

Step S2710: obtain the matrix of the image data processed in the fully connected layer.

As shown in Fig. 3, the fully connected layer used for positioning may be the second "FC layer" (box reg) in the "coordinates" sub-branch of the "three branches" structure shown in Fig. 3. Here, the image data output by the first "FC layer" of the "three branches" structure may serve as the input image data of the second "FC layer" in the "coordinates" sub-branch.

Step S2720: construct a processing matrix from the matrix of the image data.

Step S2730: extract the weight matrix and bias matrix corresponding to the processing matrix, decompose the matrix of the image data, and perform the computation in the fully connected layer using the decomposed matrices.

In step S2730, the weight matrix and bias matrix corresponding to the processing matrix may be extracted by singular value decomposition. It can be understood that this matrix decomposition may be applied only to an individual FC branch, for example only to the second "FC layer" (box reg).

Through steps S2710 to S2730, a complex matrix is decomposed, which significantly reduces computational complexity while maintaining accuracy and thereby indirectly improves the efficiency of coordinate positioning.
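The document names singular value decomposition but does not spell out the factorization. A common way to apply it to a fully connected layer (used, for example, in Fast R-CNN) is truncated SVD: keep the top t singular values of the weight matrix W (u × v) and replace the layer with two smaller ones, which cuts the parameter count from uv to t(u + v). The numpy sketch below assumes that reading; the rank and all matrix sizes are hypothetical.

```python
import numpy as np


def factorize_fc(weight: np.ndarray, rank: int):
    """Truncated SVD of an FC weight matrix W (out x in): W ~= U_t @ (S_t V_t^T)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    first = s[:rank, None] * vt[:rank]  # (rank x in): first, smaller FC layer, no bias
    second = u[:, :rank]                # (out x rank): second FC layer, keeps the original bias
    return first, second


def fc_forward(x, first, second, bias):
    """Apply the two factorized layers in place of y = W x + b."""
    return second @ (first @ x) + bias


rng = np.random.default_rng(0)
# Hypothetical box-regression weights with an effective rank of 3, for illustration only.
W = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 20))
b = rng.standard_normal(8)
x = rng.standard_normal(20)

first, second = factorize_fc(W, rank=3)
print(np.allclose(fc_forward(x, first, second, b), W @ x + b))  # True
print(W.size, first.size + second.size)                          # 160 84
```

Trained weights are rarely exactly low-rank, so in practice the truncation is an approximation whose accuracy depends on the chosen t.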

Fig. 8 shows an example object detection method. Referring to Fig. 3 and Fig. 8, in this example the object detection method may include the following steps:

Step S8100: input a target image containing the target to be detected, and perform feature extraction on it to obtain feature vector data reflecting the features of the target to be detected.

Step S8200: set the number of paths and the number of network levels, and construct the cluster translation network within the convolutional feature extraction network (backbone).

In step S8200, the "Resnet50+FPN" structure of the convolutional feature extraction network backbone shown in Fig. 3 may be optimized with the cluster translation network.

Step S8300: input the feature vector data into the cluster translation network, divide it into different levels, and assign the data of each level to the paths as separate feature groups.

Step S8400: in the cluster translation network, process the feature groups of each level on each path in parallel to obtain multiple processing results, and fuse the processing results to obtain the feature map of the target image.

As shown in Fig. 3, steps S8200 to S8400 yield the feature map (Feature Maps) of the target image.
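The document does not give the internals of the cluster translation network. Its claims describe each path as an input layer that reduces dimensionality, one or more transition layers that extract features, and an output layer that restores dimensionality, with the path outputs fused afterwards; this resembles the multi-path "aggregated transformation" block of ResNeXt. The numpy sketch below assumes that reading: 1 × 1, 3 × 3 and 1 × 1 layers per path, fusion by summation and a residual connection. All sizes and weights are hypothetical, and the paths are written as a loop for clarity even though, being independent, they can be evaluated in parallel (for example as a grouped convolution).

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def conv3x3(x, w):
    """'Same'-padded 3x3 convolution. x: (H, W, C_in), w: (3, 3, C_in, C_out)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero padding on the spatial axes
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out


def multi_path_block(x, paths):
    """Run every path on the same input and fuse the results by summation.

    Each path is (w_in, w_mid, w_out): input layer (1x1, C -> d, dimension reduction),
    transition layer (3x3, d -> d, feature extraction) and output layer
    (1x1, d -> C, dimension raising).
    """
    fused = np.zeros_like(x)
    for w_in, w_mid, w_out in paths:
        h = relu(x @ w_in)
        h = relu(conv3x3(h, w_mid))
        fused += h @ w_out
    return relu(x + fused)  # residual connection, as in ResNet-style backbones


rng = np.random.default_rng(0)
C, d, n_paths = 16, 4, 4
x = rng.standard_normal((8, 8, C))
paths = [
    (0.1 * rng.standard_normal((C, d)),
     0.1 * rng.standard_normal((3, 3, d, d)),
     0.1 * rng.standard_normal((d, C)))
    for _ in range(n_paths)
]
y = multi_path_block(x, paths)
print(y.shape)  # (8, 8, 16)
```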

Step S8500: input the feature map into the region attention network RPN. The region attention network slides the preset fixed window over the feature map to identify the regions of the feature map containing defects, takes the identified regions as initial regions of interest, and generates a feature map marked with each initial region of interest.

As shown in Fig. 3 and Fig. 8, the 3 × 3 conv of the region attention network RPN in Fig. 3 may be replaced with one 3 × 1 conv and one 1 × 3 conv, each followed by a 1 × 1 conv that performs dimension reduction on the feature map, so as to extract candidate regions that may contain defects as the initial regions of interest.

Step S8600: input the feature map marked with each initial region of interest into the region-of-interest alignment network ROI Align, which pools each initial region of interest according to a preset pooling window and generates a feature map marked with the corresponding regions of interest.

As shown in Fig. 3, the ROI Align network performs an alignment operation on each initial region of interest and outputs a feature map of fixed size 7 × 7.
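ROI Align differs from ROI pooling in that it never rounds region coordinates to the feature-map grid; instead it samples the map by bilinear interpolation. The sketch below assumes a single-channel map and one sampling point at the centre of each output bin (Mask R-CNN implementations usually average several points per bin); the toy map and box are hypothetical.

```python
import numpy as np


def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2-D feature map at a continuous point (y, x)."""
    h, w = fmap.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * fmap[y0, x0] + (1 - dy) * dx * fmap[y0, x1]
            + dy * (1 - dx) * fmap[y1, x0] + dy * dx * fmap[y1, x1])


def roi_align(fmap, box, out_size=7):
    """Pool one region of interest to out_size x out_size without rounding its coordinates.

    box = (x1, y1, x2, y2) in feature-map coordinates; one sample at each bin centre.
    """
    x1, y1, x2, y2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = bilinear(fmap, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
    return out


fmap = np.fromfunction(lambda r, c: 2.0 * r + c, (20, 20))  # toy single-channel map
pooled = roi_align(fmap, box=(3.1, 2.3, 9.9, 12.8))
print(pooled.shape)  # (7, 7)
```

Because the toy map is linear in its coordinates, bilinear sampling reproduces it exactly, so each output value equals 2·y + x at its bin centre and can be checked by hand.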

Step S8700: input the feature map marked with the corresponding regions of interest into the fully connected layers FC used for positioning and classification to obtain the class and coordinate information of the defects, and obtain the mask information of the defects through the fully convolutional network (Fully Convolution Nets).

As shown in Fig. 3, the location information of a defect is obtained through the first "FC layer" of the "three branches" structure and the second "FC layer" (bbox reg) of the "coordinates" sub-branch; the class information of the defect is obtained through the first "FC layer" of the "three branches" structure and the second "FC layer" (softmax) of the "Category" sub-branch; and the mask information of the defect is obtained through the fully convolutional network Fully Convolution Nets (Mask).
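The three outputs of Fig. 3 can be sketched as follows: a shared first FC layer feeds a softmax classifier and a box-coordinate regressor, while the mask branch works on the spatial ROI feature. This is a simplified, hypothetical stand-in rather than the patent's network: Mask R-CNN's actual mask head uses a stack of convolutions and a deconvolution on a larger ROI, and box outputs are usually offsets relative to the proposal rather than absolute coordinates. All layer sizes and weights are invented for illustration.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def three_branch_head(roi_feat, p):
    """roi_feat: (7, 7, C) aligned ROI feature; p: dict of weight matrices."""
    shared = np.maximum(p["fc1"] @ roi_feat.reshape(-1), 0.0)  # first "FC layer"
    class_probs = softmax(p["cls"] @ shared)  # "Category" sub-branch (softmax)
    box_coords = p["reg"] @ shared            # "coordinates" sub-branch (bbox reg)
    mask = sigmoid(roi_feat @ p["mask"])      # per-pixel, per-class mask probabilities
    return class_probs, box_coords, mask


rng = np.random.default_rng(0)
C, hidden, n_classes = 8, 32, 3
p = {
    "fc1": 0.05 * rng.standard_normal((hidden, 7 * 7 * C)),
    "cls": 0.1 * rng.standard_normal((n_classes, hidden)),
    "reg": 0.1 * rng.standard_normal((4 * n_classes, hidden)),  # 4 box values per class
    "mask": 0.1 * rng.standard_normal((C, n_classes)),          # 1x1 conv stand-in for the FCN
}
probs, box, mask = three_branch_head(rng.standard_normal((7, 7, C)), p)
print(probs.shape, box.shape, mask.shape)  # (3,) (12,) (7, 7, 3)
```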

According to this example, the method not only detects the specific location of defects on the target to be detected but also segments the defect contours, separating defects of different types and different instances one by one. Moreover, because this example optimizes the convolutional feature extraction network of the instance segmentation framework and replaces the filter in the region attention network, it can further improve detection accuracy while reducing training time and model inference time.

Fig. 8 is a functional block diagram of an object detecting device 8000 according to an embodiment of the present invention.

As shown in Fig. 8, the object detecting device 8000 of this embodiment may include an extraction module 8100, a network construction module 8200, an input module 8300, a parallel processing module 8400, and a fusion module 8500.

The extraction module 8100 is configured to perform feature extraction on the target image containing the target to be detected, to obtain feature vector data reflecting the features of the target to be detected.

The network construction module 8200 is configured to set the number of paths and the number of network levels and to construct the cluster translation network within the instance segmentation framework, where the cluster translation network has two or more paths and data is transmitted between corresponding paths at different levels.

The input module 8300 is configured to input the feature vector data into the cluster translation network, divide it into different levels, and assign the data of each level to the paths as separate feature groups.

The parallel processing module 8400 is configured to process, in the cluster translation network, the feature groups of each level on each path in parallel to obtain multiple processing results.

The fusion module 8500 is configured to fuse the multiple processing results to obtain the feature map of the target image.
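The division of labour among modules 8100 to 8500 can be illustrated with a small, hypothetical Python wiring. The extraction function and per-path functions are placeholders, the feature vector is split into equal contiguous groups (one per path), and fusion is by concatenation; none of these choices is specified in the document. A thread pool stands in for the parallel processing module.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


class ObjectDetectingDevice:
    """Hypothetical wiring of modules 8100-8500; not the patent's implementation."""

    def __init__(self, extract: Callable[[object], Vector],
                 paths: Sequence[Callable[[Vector], Vector]]):
        self.extract = extract    # extraction module 8100
        self.paths = list(paths)  # paths set up by network construction module 8200

    def split(self, features: Vector) -> List[Vector]:
        # Input module 8300: one equal-sized feature group per path.
        k = len(self.paths)
        if len(features) % k:
            raise ValueError("feature length must be divisible by the number of paths")
        size = len(features) // k
        return [features[i * size:(i + 1) * size] for i in range(k)]

    def process(self, groups: List[Vector]) -> List[Vector]:
        # Parallel processing module 8400: each path handles its own group concurrently;
        # pool.map returns results in path order.
        with ThreadPoolExecutor(max_workers=len(self.paths)) as pool:
            return list(pool.map(lambda pair: pair[0](pair[1]), zip(self.paths, groups)))

    @staticmethod
    def fuse(results: List[Vector]) -> Vector:
        # Fusion module 8500: concatenate the per-path results (assumed fusion rule).
        return [v for r in results for v in r]

    def __call__(self, image: object) -> Vector:
        return self.fuse(self.process(self.split(self.extract(image))))


device = ObjectDetectingDevice(
    extract=lambda image: [float(v) for v in image],  # placeholder feature extractor
    paths=[lambda g: [2 * v for v in g], lambda g: [v + 1 for v in g]],
)
print(device([1, 2, 3, 4]))  # [2.0, 4.0, 4.0, 5.0]
```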

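In one embodiment, the network construction module 8200 is specifically configured to:

set two or more paths according to the number of paths; and

set, according to the number of network levels, each path to include one input layer, one or more transition layers, and one output layer.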

In one embodiment, the parallel processing module 8400 is specifically configured to:

perform, through the input layer on each path, dimension reduction on the input feature vector data to obtain dimension-reduced feature groups;

input the dimension-reduced feature groups into the transition layers of the respective paths, which perform feature extraction on them to obtain intermediate feature groups; and

input the intermediate feature groups into the output layers of the respective paths, which perform dimension raising on them to obtain dimension-raised feature groups as the processing results of the respective paths.

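In one embodiment, the instance segmentation framework includes the region attention network RPN, and the device 8000 further includes a region extraction module (not shown), which is configured to:

input the feature map into the region attention network, slide the preset fixed window over the feature map through the region attention network to identify the regions of the feature map containing defects, take the identified regions as initial regions of interest, and generate a feature map marked with each initial region of interest.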

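In one embodiment, the region extraction module (not shown) is specifically configured to:

provide, for a fixed window of size N × N, at least one N × 1 filter and one 1 × N filter, where N is the size of the fixed window;

pass the feature map sequentially through the N × 1 filter and the 1 × N filter to perform dimension reduction on the feature map; and

mark the regions containing defects in the dimension-reduced feature map.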

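In one embodiment, the instance segmentation framework includes the fully connected layer FC used for positioning, and the device further includes a matrix processing module (not shown), which is configured to:

obtain the matrix of the image data processed in the fully connected layer;

construct a processing matrix from the matrix of the image data; and

extract the weight matrix and bias matrix corresponding to the processing matrix, decompose the matrix of the image data, and perform the computation in the fully connected layer using the decomposed matrices.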

For the specific implementation of each module in the device embodiments of the present invention, reference may be made to the corresponding content of the method embodiments; details are not repeated here.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.

A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. It may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein may be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within that computing/processing device.

Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++ and conventional procedural programming languages such as the "C" language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) is personalized using state information of the computer-readable program instructions, and this circuitry can execute the computer-readable program instructions to implement aspects of the present invention.

Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on it to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or acts, or by combinations of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.

Embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims

1. An object detection method, comprising:

performing feature extraction on a target image containing a target to be detected to obtain feature vector data reflecting features of the target to be detected;

setting a number of paths and a number of network levels, and constructing a cluster translation network in an instance segmentation framework, wherein the cluster translation network has two or more paths, and data is transmitted between corresponding paths at different levels;

inputting the feature vector data into the cluster translation network, dividing the feature vector data into different levels, and assigning the data of each level to the respective paths as feature groups;

processing, in the cluster translation network, the feature groups of each level on each path in parallel to obtain multiple processing results; and

fusing the multiple processing results to obtain a feature map of the target image.

2. The method according to claim 1, wherein setting the number of paths and the number of network levels and constructing the cluster translation network in the instance segmentation framework comprises:

setting two or more paths according to the number of paths; and

setting, according to the number of network levels, each path to include one input layer, one or more transition layers, and one output layer.

3. The method according to claim 2, wherein processing, in the cluster translation network, the feature groups of each level on each path in parallel to obtain multiple processing results comprises:

performing, by the input layer on each path, dimension reduction on the input feature vector data to obtain dimension-reduced feature groups;

inputting the dimension-reduced feature groups into the transition layers on the respective paths, and performing, by the transition layers, feature extraction on the dimension-reduced feature groups to obtain intermediate feature groups; and

inputting the intermediate feature groups into the output layers on the respective paths, and performing, by the output layers, dimension raising on the intermediate feature groups to obtain dimension-raised feature groups as the processing results of the respective paths.

4. The method according to claim 1, wherein the instance segmentation framework includes a region attention network RPN, and after the feature map of the target image is obtained, the method further comprises:

inputting the feature map into the region attention network; sliding, by the region attention network, a preset fixed window over the feature map to identify regions of the feature map containing defects; and taking the identified regions as initial regions of interest and generating a feature map marked with each initial region of interest.

5. The method according to claim 4, wherein sliding the preset fixed window over the feature map to identify the regions of the feature map containing defects comprises:

providing, for a fixed window of size N × N, at least one N × 1 filter and one 1 × N filter, wherein N is the size of the fixed window;

passing the feature map sequentially through the N × 1 filter and the 1 × N filter to perform dimension reduction on the feature map; and

marking the regions containing defects in the dimension-reduced feature map.

6. The method according to claim 1, wherein the instance segmentation framework further includes a fully connected layer FC used for positioning, and the method further comprises:

obtaining a matrix of image data processed in the fully connected layer;

constructing a processing matrix from the matrix of the image data; and

extracting a weight matrix and a bias matrix corresponding to the processing matrix, decomposing the matrix of the image data, and performing computation in the fully connected layer using the decomposed matrices.

7. An object detecting device, comprising:

an extraction module, configured to perform feature extraction on a target image containing a target to be detected to obtain feature vector data reflecting features of the target to be detected;

a network construction module, configured to set a number of paths and a number of network levels and to construct a cluster translation network in an instance segmentation framework, wherein the cluster translation network has two or more paths, and data is transmitted between corresponding paths at different levels;

an input module, configured to input the feature vector data into the cluster translation network, divide the feature vector data into different levels, and assign the data of each level to the respective paths as feature groups;

a parallel processing module, configured to process, in the cluster translation network, the feature groups of each level on each path in parallel to obtain multiple processing results; and

a fusion module, configured to fuse the multiple processing results to obtain a feature map of the target image.

8. The device according to claim 7, wherein the instance segmentation framework includes a region attention network RPN, and the device further includes a region extraction module configured to:

9. The device according to claim 8, wherein the region extraction module is specifically configured to:

pass the feature map sequentially through the N × 1 filter and the 1 × N filter to perform dimension reduction on the feature map; and

10. The device according to claim 7, wherein the instance segmentation framework includes a fully connected layer FC used for positioning, and the device further includes a matrix processing module configured to:

obtain a matrix of image data processed in the fully connected layer;

construct a processing matrix from the matrix of the image data; and