CN108805203A - Image processing and object re-identification method, apparatus, device and storage medium - Google Patents

Image processing and object re-identification method, apparatus, device and storage medium

Info

Publication number
CN108805203A
CN108805203A (application CN201810595450.8A)
Authority
CN
China
Prior art keywords
feature
video image
objects
image
background
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810595450.8A
Other languages
Chinese (zh)
Inventor
刘皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd and Tencent Cloud Computing Beijing Co Ltd
Priority to CN201810595450.8A
Publication of CN108805203A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image processing method, apparatus, computer device and storage medium. The method includes: obtaining a video image; extracting each source feature of an object in the video image; fusing the source features through a feature fusion network to obtain a fused feature, wherein in the feature fusion network the corresponding outputs of the source features at a preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; and determining an object feature according to the fused feature, so that object re-identification performed with the object feature determined from the fused feature achieves a higher accuracy. The present application also provides another image processing method, apparatus, computer device and storage medium that achieve the same effect, as well as an object re-identification method, apparatus, computer device and storage medium that use the image processing method and apparatus.

Description

Image processing and object re-identification method, apparatus, device and storage medium
Technical field
The present application relates to the field of computer technology, and in particular to an image processing method, apparatus, computer device and storage medium, and to an object re-identification method, apparatus, computer device and storage medium.
Background
The purpose of object re-identification is to determine whether objects captured by different cameras with non-overlapping fields of view are the same object. It is therefore the basis of cross-view object retrieval and tracking, and is of great significance for intelligent visual surveillance. Because of problems such as the low resolution of existing camera footage, lighting and viewing-angle differences between cameras, and occlusion of objects, the same object may look very different under different cameras, which makes object re-identification highly challenging.
At present, most re-identification methods extract object appearance features from RGB video frames, extract object motion features offline, and then simply concatenate the features. However, simply concatenating motion features and appearance features cannot effectively express the correlation between the two, so the accuracy of object re-identification is relatively low.
Summary of the invention
In view of the above technical problems, it is necessary to provide an image processing method, apparatus, computer device and storage medium with higher accuracy, and an object re-identification method, apparatus, computer device and storage medium.
An image processing method, the method comprising:
obtaining a video image;
extracting each source feature of an object in the video image;
fusing the source features through a feature fusion network to obtain a fused feature, wherein in the feature fusion network the corresponding outputs of the source features at a preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; and
determining an object feature according to the fused feature.
An image processing apparatus, the apparatus comprising:
an image obtaining module, configured to obtain a video image;
a feature extraction module, configured to extract each source feature of an object in the video image;
a feature fusion module, configured to fuse the source features through a feature fusion network to obtain a fused feature, wherein in the feature fusion network the corresponding outputs of the source features at a preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; and
a feature determination module, configured to determine an object feature according to the fused feature.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the above image processing method.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above image processing method.
According to the above image processing and object re-identification method, apparatus, computer device and storage medium, a video image is obtained; each source feature of an object in the video image is extracted; the source features are input into a feature fusion network to be fused, so that a fused feature is obtained; in the feature fusion network, the corresponding outputs of the source features at a preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; and an object feature is determined according to the fused feature. Because the source features are input into the feature fusion network for fusion, and their corresponding outputs at the preset intermediate layer are superposed before being input into the next layer, the fusion is not a simple concatenation of the source features; instead, starting from the layer following the preset intermediate layer, the corresponding outputs already carry the correlation between the source features, so that object re-identification performed with the object feature determined from the fused feature achieves a higher accuracy.
An image processing method, comprising:
obtaining a video image;
removing the background from the video image to obtain a background-removed image;
extracting features of the appearance of the object in the background-removed image to obtain a foreground appearance feature of the object in the video image; and
determining an object feature according to the foreground appearance feature.
An image processing apparatus, the apparatus comprising:
an image obtaining module, configured to obtain a video image;
a background removal module, configured to remove the background from the video image to obtain a background-removed image;
a foreground feature extraction module, configured to extract features of the appearance of the object in the background-removed image to obtain a foreground appearance feature of the object in the video image; and
a feature determination module, configured to determine an object feature according to the foreground appearance feature.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the above image processing method.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above image processing method.
According to the above image processing method, apparatus, computer device and storage medium, a video image is obtained; the background is removed from the video image to obtain a background-removed image; features of the appearance of the object in the background-removed image are extracted to obtain a foreground appearance feature of the object in the video image; and an object feature is determined according to the foreground appearance feature. Because the background is removed from the video image and the foreground appearance feature is extracted from the background-removed image, the foreground appearance feature expresses the characteristics of the object more accurately than an ordinary appearance feature, so that object re-identification performed with the object feature determined from the foreground appearance feature achieves a higher accuracy.
An object re-identification method, comprising:
obtaining video data collected by each video capture device;
determining, according to the above feature construction method, the object feature of each object in each video image of the video data; and
obtaining a re-identification result according to the object features.
An object re-identification apparatus, comprising:
a video obtaining module, configured to obtain video data collected by each video capture device;
an object feature determination module, configured to determine, by means of the above object feature construction apparatus, the object feature of each object in each video image of the video data; and
a result determination module, configured to obtain a re-identification result according to the object features.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the above object re-identification method.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above object re-identification method.
According to the above object re-identification method, apparatus, computer device and storage medium, because the object feature of each object in each video image of the video data is determined by the above object feature construction apparatus, a more accurate re-identification result can be obtained.
Description of the drawings
Fig. 1 is a diagram of the application environment of the image processing and object re-identification methods in one embodiment;
Fig. 2 is a schematic flowchart of the image processing method in one embodiment;
Fig. 3 is an example of video images, optical-flow images and background removal templates in one embodiment;
Fig. 4 is a schematic flowchart of the image processing method in a specific embodiment;
Fig. 5 is a schematic flowchart of the image processing method in another embodiment;
Fig. 6 is a schematic flowchart of the object re-identification method in one embodiment;
Fig. 7 is a schematic flowchart of the object re-identification method in a specific embodiment;
Fig. 8 is a structural block diagram of the image processing apparatus in one embodiment;
Fig. 9 is a structural block diagram of the image processing apparatus in another embodiment;
Fig. 10 is a structural block diagram of the object re-identification apparatus in one embodiment;
Fig. 11 is a diagram of the internal structure of a computer device in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present application and are not intended to limit it.
The image processing method provided in the present application can be applied in the process of object re-identification. When the object feature constructed by the image processing method is applied to object re-identification, the application environment is as shown in Fig. 1, in which a terminal 102 communicates with a server 104 through a network. The image processing method may run on the server 104. The terminal 102 sends an object re-identification request to the server 104, and the server 104 performs object re-identification according to the received request. In the process of object re-identification, the object feature is determined by the image processing method, and the re-identification result is then determined according to the object feature. The server 104 may send the re-identification result to the terminal 102 through the network. The terminal 102 may be, but is not limited to, a personal computer, a laptop, a smartphone, a tablet computer or a portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, an image processing method is provided, including the following steps:
S202: obtain a video image.
The video image may be a frame of image in video data collected by different video capture devices. The video image contains an object whose object feature is to be determined. The object may be a thing, a plant, a person, an animal, and so on. A thing may be an object with a certain form, such as a computer, a code-scanning device or a balloon. A plant may be a flower, grass, a tree, and so on. A person may be a pedestrian, a person at work, a person riding a vehicle, a person driving, or a person on a preset vehicle. An animal may be any of various animals, such as a cat, dog, pig, bird or fish.
S204: extract each source feature of the object in the video image.
A source feature is a feature of the object that can be extracted independently from the video image, for example a size feature, a distance feature, a shape feature or a color feature of the object. When the object is a pedestrian, the source features may also include a motion optical-flow feature, an appearance feature, and so on, where the appearance feature includes a global appearance feature and/or a foreground appearance feature. The motion optical-flow feature may refer to the apparent motion information of the image brightness pattern, expressed as vectors. The motion optical-flow feature can express the changes of the object in the image and can therefore be used to determine the motion of the object; it highlights the local motion characteristics of the object. The appearance feature can express appearance information of the object, which may include body proportions, clothing color, and whether wearable items are worn, where a wearable item may be an accessory such as a watch, a bag, glasses or a hat.
S206: fuse the source features through a feature fusion network to obtain a fused feature. In the feature fusion network, the corresponding outputs of the source features at a preset intermediate layer are superposed and then input into the layer following the preset intermediate layer.
The feature fusion network is a neural network used for feature fusion, and may be a convolutional neural network (CNN), a recurrent neural network (RNN) or a deep neural network (DNN). Since convolutional neural networks are better suited to images, in a preferred embodiment the feature fusion network is a convolutional neural network, so that a better fusion effect can be obtained and the determined object feature achieves a higher accuracy when used for object re-identification.
In this embodiment, feature fusion means that the corresponding outputs of the source features at a preset intermediate layer of the neural network are superposed and then input into the layer following the preset intermediate layer, so that from the layer after the preset intermediate layer onwards the source features carry correlation with one another; the fused feature is then output by the neural network. In other words, the source features are input into the feature fusion network, which superposes the corresponding outputs of the source features at the preset intermediate layer, inputs the superposed result into the layer following the preset intermediate layer, and finally outputs the fused feature.
The preset intermediate layer may be a preset hidden layer, a hidden layer being a network layer between the input layer and the output layer. The number of preset intermediate layers may be 1, or a natural number not less than 2, and the preset intermediate layer may be any hidden layer. In a preferred embodiment, the preset intermediate layer is a middle-to-high layer, that is, a hidden layer whose index exceeds half of the number of hidden layers, so that the corresponding outputs of the source features carry correlation from the layer after the middle-to-high layer onwards; for example, the 5th layer, or the 5th, 6th and 7th layers, of a neural network with 7 hidden layers.
The corresponding outputs of the source features at the preset intermediate layer of the feature fusion network may be superposed along the feature channels. The number of channels of a feature equals the number of convolution kernels of the convolutional layer in the feature extraction convolutional neural network. After the superposition, the width and height of the feature are unchanged and the number of channels increases N-fold, where N is the number of source features. The width and height of a feature may be the width and height of the corresponding feature map.
In a specific embodiment, the feature fusion network is a convolutional neural network, and the corresponding outputs of the source features at the 5th layer of the convolutional neural network may be concatenated along the channel dimension, thereby superposing the 5th-layer outputs and finally fusing the features. In a specific embodiment, the fused feature can be expressed as: F_fused = concat(S1_i, S2_i, S3_i), i = 5, 6, 7, with F_fused ∈ R^(W×H×C), where F_fused denotes the fused feature, i denotes the preset intermediate layer, W, H and C denote the width, height and number of channels of the fused feature respectively, and R denotes the real numbers.
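As an illustrative sketch only (not part of the original disclosure), the channel-wise superposition at an intermediate layer can be written in PyTorch roughly as follows; the branch modules, the split point and the tensor sizes are assumptions made for illustration, not the disclosed network.

```python
import torch
import torch.nn as nn

class FeatureFusionNet(nn.Module):
    """Sketch: fuse three source-feature branches by channel-wise
    concatenation at a preset intermediate layer (here, after 'stem')."""
    def __init__(self, in_channels=3, mid_channels=64):
        super().__init__()
        # one small stem per source feature (e.g. appearance, foreground, optical flow)
        def stem():
            return nn.Sequential(
                nn.Conv2d(in_channels, mid_channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.ReLU())
        self.stems = nn.ModuleList([stem() for _ in range(3)])
        # layers following the preset intermediate layer operate on the
        # superposed (3 * mid_channels) feature map
        self.head = nn.Sequential(
            nn.Conv2d(3 * mid_channels, mid_channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, s1, s2, s3):
        outs = [stem(x) for stem, x in zip(self.stems, (s1, s2, s3))]
        # width/height unchanged, channel count multiplied by N (= 3)
        fused = torch.cat(outs, dim=1)   # F_fused = concat(S1_i, S2_i, S3_i)
        return self.head(fused)          # layers after the intermediate layer
```

In such a sketch the correlation between source features is learned by the layers after the concatenation point, rather than by simply appending feature vectors at the output.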
S208: determine an object feature according to the fused feature.
The fused feature obtained after fusion may be used directly as the object feature, or the fused feature may be further processed to obtain the object feature.
According to the above image processing method, a video image is obtained; each source feature of an object in the video image is extracted; the source features are input into the feature fusion network to be fused, so that a fused feature is obtained; in the feature fusion network, the corresponding outputs of the source features at the preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; and an object feature is determined according to the fused feature. Because the source features are input into the feature fusion network for fusion, the fusion is not a simple concatenation of the source features; instead, starting from the layer following the preset intermediate layer, the corresponding outputs already carry the correlation between the source features, so that object re-identification performed with the object feature determined from the fused feature achieves a higher accuracy.
In one embodiment, extracting each source feature of the object in the video image includes: extracting a motion optical-flow feature of the object in the video image; and extracting an appearance feature of the object in the video image.
In the technical solution of this embodiment, the source features include at least the motion optical-flow feature and the appearance feature. The motion optical-flow feature may be extracted directly with a motion optical-flow extraction algorithm, or may be extracted by a neural network. Similarly, the appearance feature may be extracted directly with an appearance feature extraction algorithm, or may be extracted by a neural network. The appearance feature may be a global appearance feature or a foreground appearance feature. The global appearance feature is the appearance feature of the object extracted directly from the image without background removal, that is, extracted directly from the target image; the foreground appearance feature is the appearance feature of the object extracted after the background has been removed from the image.
Because the source features include at least the motion optical-flow feature and the appearance feature, this embodiment is particularly suitable for pedestrian re-identification, that is, the object is a pedestrian. In this case the accuracy of object re-identification performed with the object feature determined from the fused feature can be improved.
It should be understood that in one embodiment the appearance feature includes the global appearance feature, and extracting the appearance feature of the object in the video image includes extracting the global appearance feature of the object in the video image. In another embodiment, the appearance feature includes the foreground appearance feature, and extracting the appearance feature of the object in the video image includes extracting the foreground appearance feature of the object in the video image. Because the background has been removed for the foreground appearance feature, a more accurate feature of the appearance of the object can be obtained, which further improves the accuracy of object re-identification performed with the object feature determined from the fused feature.
In one embodiment, extracting the motion optical-flow feature of the object in the video image includes extracting the motion optical-flow feature of the object in the video image through a first convolutional neural network, and extracting the appearance feature of the object in the video image includes extracting the appearance feature of the object in the video image through a second convolutional neural network.
In this embodiment, the first convolutional neural network and the second convolutional neural network are two separate neural networks, and both are convolutional neural networks. Compared with extracting multiple source features through one shared neural network, this gives higher accuracy, and compared with networks that are not convolutional, this also gives higher accuracy, so the accuracy of object re-identification performed with the object feature determined from the fused feature can be improved.
It should be understood that in other embodiments the multiple source features may also be extracted through one shared network, which still has higher accuracy than source features not extracted by a neural network.
In one embodiment, the appearance feature includes the foreground appearance feature, and extracting the appearance feature of the object in the video image includes: removing the background from the video image to obtain a background-removed image; and extracting features of the appearance of the object in the background-removed image to obtain the foreground appearance feature of the object in the video image.
In this embodiment, the source features include the motion optical-flow feature and the foreground appearance feature. The background is removed from the video image to obtain a background-removed image, and features of the appearance of the object in the background-removed image are then extracted to obtain the foreground appearance feature of the object in the video image. Compared with the embodiment that combines the motion optical-flow feature with the global appearance feature, the foreground appearance feature has had the background removed, so a more accurate feature of the appearance of the object can be obtained, which further improves the accuracy of object re-identification performed with the object feature determined from the fused feature.
In one embodiment, the appearance feature further includes the global appearance feature, and extracting the appearance feature of the object in the video image further includes extracting the global appearance feature of the object in the video image. Compared with the embodiment in which the source features include the motion optical-flow feature and the foreground appearance feature, in this embodiment the source features further include the global appearance feature, which further improves the accuracy of object re-identification performed with the object feature determined from the fused feature.
In one embodiment, removing the background from the video image to obtain the background-removed image includes: estimating the pose of the object in the video image to obtain an object pose; determining a background removal template according to the object pose; and performing a masking operation on the video image with the background removal template to obtain the background-removed image.
The pose of the object may be estimated by a pose estimation network to obtain the object pose, and the pose estimation network may be any of various neural networks. In a preferred embodiment, the pose estimation network is a stacked hourglass network, which is better suited to pedestrian pose estimation and has better performance and efficiency.
After the object pose is obtained, the object pose may be used directly as the background removal template. Alternatively, the object pose may be dilated and the dilation result used as the background removal template, which avoids removing too much content and makes the background-removed image obtained after the masking operation more accurate, thereby further improving the accuracy of object re-identification performed with the object feature determined from the fused feature.
Performing the masking operation on the video image with the background removal template means using the background removal template as a mask and multiplying the mask with the video image, so that the regions of the video image outside the object pose are masked out. Performing the masking operation on the video image with the background removal template to obtain the background-removed image can be expressed by the formula I_masked = I ⊙ M, where I is the video image, M is the background removal template, ⊙ denotes the masking operation, and I_masked denotes the result of the masking operation on the video image, that is, the background-removed image.
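A minimal sketch of the masking operation I_masked = I ⊙ M follows, assuming the template is a binary map with the same spatial size as the frame; the function name and shapes are illustrative only.

```python
import numpy as np

def apply_background_mask(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """I_masked = I ⊙ M: element-wise masking of a video frame.

    image:    H x W x 3 frame
    template: H x W binary background-removal template (1 = keep, 0 = remove)
    """
    mask = template.astype(image.dtype)[..., None]  # broadcast over the color channels
    return image * mask                             # pixels outside the object pose become 0
```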
In one embodiment, extracting the motion optical-flow feature of the object in the video image includes: extracting an optical-flow image of the video image; and performing optical-flow feature extraction on the optical-flow image to obtain the motion optical-flow feature of the object in the video image.
The optical-flow image of the video image may be extracted with an optical-flow extraction algorithm, for example the edge-preserving EpicFlow algorithm. The optical-flow image may also be extracted by an optical-flow network: optical-flow images extracted by an optical-flow algorithm are used as supervision to pre-train the optical-flow network, and a trained network model is obtained after pre-training. The optical-flow network may be any of various neural networks, for example FlowNet, a network consisting of a series of convolutional and deconvolutional layers that can effectively approximate the generation of optical flow.
Optical-flow feature extraction may be performed on the optical-flow image by a convolutional neural network to obtain the motion optical-flow feature of the object in the video image, which improves the accuracy of the motion optical-flow feature.
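For a rough illustration of the optical-flow image step, the sketch below uses OpenCV's Farnebäck dense flow as an off-the-shelf stand-in for the EpicFlow or FlowNet extraction described above; the parameter values are assumptions, not the disclosed settings.

```python
import cv2
import numpy as np

def optical_flow_image(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    """Dense optical flow between two consecutive frames (returns an H x W x 2 field)."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Farnebäck dense flow as a substitute for EpicFlow / FlowNet in this sketch
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow  # this flow field is then fed to a CNN to extract the motion optical-flow feature
```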
Fig. 3 gives examples of optical-flow images and background removal templates corresponding to some video images: the first row in Fig. 3 shows the video images, the second row the optical-flow images, and the third row the corresponding background removal templates, with the columns in one-to-one correspondence.
In one embodiment, determining the object feature according to the fused feature includes: performing a masking operation on the fused feature with a preset attention weight sequence to obtain a local attention feature, where the preset attention weight sequence corresponds to the object to which the fused feature corresponds; and determining the local attention feature as the object feature.
Performing the masking operation on the fused feature with the preset attention weight sequence means multiplying the preset attention weight sequence, used as a mask, with the fused feature, so as to locally attend to the region of interest in the fused feature. The attention weight sequence is a sequence of weights for locally attending to an object; each weight value in the attention weight sequence indicates the degree of local attention paid to the object, and the weight values in the attention weight sequence sum to a preset value, which may be 1. In a preferred embodiment, the masking operation with the preset attention weight sequence is performed on the fused feature by an attention network to obtain the local attention feature. The attention network may include two temporal convolutional layers and one fully connected layer, and attends to the object locally along the vertical direction; the length of the attention weight sequence is equal to the height of the fused feature. The object feature output by the attention network can be expressed as F_final = F_fused ⊙ w_H, where F_fused denotes the fused feature, F_final denotes the object feature, and w_H denotes the attention weight sequence. The effect of the attention network is to comprehensively extract the features of the object at a higher level; for example, if a pedestrian wears red clothes on the upper body and swings the arms with a larger amplitude, the weight values of the attention weight sequence are larger at the positions of the feature map corresponding to the arms.
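A rough PyTorch sketch of such a height-wise attention mask, F_final = F_fused ⊙ w_H, is given below; the use of Conv1d for the "temporal" convolutions, the width pooling and the layer sizes are assumptions for illustration, not the disclosed architecture.

```python
import torch
import torch.nn as nn

class HeightAttention(nn.Module):
    """Produces one weight per row of the fused feature map and masks it."""
    def __init__(self, channels: int, height: int):
        super().__init__()
        # two 1-D convolutions over the height dimension plus one fully connected layer
        self.convs = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1), nn.ReLU())
        self.fc = nn.Linear(channels * height, height)

    def forward(self, f_fused: torch.Tensor) -> torch.Tensor:
        # f_fused: B x C x H x W
        b, c, h, w = f_fused.shape
        pooled = f_fused.mean(dim=3)             # B x C x H (average over width)
        x = self.convs(pooled).flatten(1)        # B x (C * H)
        w_h = torch.softmax(self.fc(x), dim=1)   # attention weights sum to 1
        return f_fused * w_h.view(b, 1, h, 1)    # F_final = F_fused ⊙ w_H
```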
According to the solution of this embodiment, because an attention mechanism is added, the accuracy of object re-identification performed with the object feature can be further improved.
In a specific embodiment, as shown in Fig. 4, the image processing method includes: obtaining a video image; estimating the pose of the object in the video image to obtain an object pose; determining a background removal template according to the object pose; performing a masking operation on the video image with the background removal template to obtain a background-removed image; extracting, through a first convolutional neural network, features of the appearance of the object in the background-removed image to obtain a foreground appearance feature S1 of the object in the video image; extracting an optical-flow image of the video image; performing optical-flow feature extraction on the optical-flow image through a second convolutional neural network to obtain a motion optical-flow feature S2 of the object in the video image; extracting a global appearance feature S3 of the object in the video image through a third convolutional neural network; fusing S1, S2 and S3 through the feature fusion network to obtain a fused feature, where in the feature fusion network the corresponding outputs of the source features at the preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; performing a masking operation on the fused feature with a preset attention weight sequence to obtain a local attention feature, where the preset attention weight sequence corresponds to the object to which the fused feature corresponds; and determining the local attention feature as the object feature.
In one embodiment, as shown in Fig. 5, an image processing method is provided, including the following steps:
S502: obtain a video image.
The video image may be a frame of image in video data collected by different video capture devices. The video image contains an object whose object feature is to be determined. The object may be a thing, a plant, a person, an animal, and so on. A thing may be an object with a certain form, such as a computer, a code-scanning device or a balloon. A plant may be a flower, grass, a tree, and so on. A person may be a pedestrian, a person at work, a person riding a vehicle, a person driving, or a person on a preset vehicle. An animal may be any of various animals, such as a cat, dog, pig, bird or fish.
S504: remove the background from the video image to obtain a background-removed image.
Background removal means filtering out the image data other than the object in the video image, and the resulting background-removed image is the image after the background has been removed. The filtering may include deleting the corresponding data or setting it to 0.
S506: extract features of the appearance of the object in the background-removed image to obtain a foreground appearance feature of the object in the video image.
The features of the appearance of the object in the background-removed image may be extracted with an appearance feature extraction algorithm, or by a neural network, to obtain the foreground appearance feature of the object in the video image. The neural network may be a convolutional neural network (CNN), a recurrent neural network (RNN) or a deep neural network (DNN). Since convolutional neural networks are better suited to images, in a preferred embodiment a convolutional neural network is used, so that a better extraction effect can be obtained and the determined object feature achieves a higher accuracy when used for object re-identification.
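For illustration only, foreground appearance feature extraction from the background-removed image could look like the following sketch, where a generic small CNN stands in for whatever appearance network is actually used; the layer sizes and feature dimension are assumptions.

```python
import torch
import torch.nn as nn

appearance_cnn = nn.Sequential(                 # stand-in foreground appearance extractor
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> 64-dim foreground appearance feature

def foreground_appearance_feature(background_removed: torch.Tensor) -> torch.Tensor:
    """background_removed: B x 3 x H x W frame with non-object pixels zeroed out."""
    return appearance_cnn(background_removed)
```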
S508: determine the object feature according to the foreground appearance feature.
The foreground appearance feature may be used directly as the object feature, or the foreground appearance feature may be further processed to obtain the object feature, where the further processing may include adding other source features, local attention, and so on.
According to the above image processing method, a video image is obtained; the background is removed from the video image to obtain a background-removed image; features of the appearance of the object in the background-removed image are extracted to obtain a foreground appearance feature of the object in the video image; and an object feature is determined according to the foreground appearance feature. Because the background is removed from the video image to obtain the background-removed image and the foreground appearance feature is then extracted from it, the foreground appearance feature expresses the characteristics of the object more accurately than an ordinary appearance feature, so that object re-identification performed with the object feature determined from the foreground appearance feature achieves a higher accuracy.
In one embodiment, removing the background from the video image to obtain the background-removed image includes: estimating the pose of the object in the video image to obtain an object pose; determining a background removal template according to the object pose; and performing a masking operation on the video image with the background removal template to obtain the background-removed image.
The pose of the object may be estimated by a pose estimation network to obtain the object pose, and the pose estimation network may be any of various neural networks. In a preferred embodiment, the pose estimation network is a stacked hourglass network, which is better suited to pedestrian pose estimation and has better performance and efficiency.
After the object pose is obtained, the object pose may be used directly as the background removal template. Alternatively, the object pose may be dilated and the dilation result used as the background removal template, which avoids removing too much content and makes the background-removed image obtained after the masking operation more accurate, thereby further improving the accuracy of object re-identification performed with the object feature determined from the fused feature.
Performing the masking operation on the video image with the background removal template to obtain the background-removed image can be expressed by the formula I_masked = I ⊙ M, where I is the video image, M is the background removal template, ⊙ denotes the masking operation, and I_masked denotes the result of the masking operation on the video image, that is, the background-removed image.
In one embodiment, estimating the pose of the object in the video image to obtain the object pose includes: estimating the pose of the object in the video image to obtain object joint points; and connecting the object joint points with lines of a preset width to obtain the object pose.
The object joint points may be obtained by a pose estimation algorithm or by a pose estimation network, which is used to estimate the pedestrian pose and may be any of various neural networks. In a preferred embodiment, the pose estimation network is a Stacked Hourglass Network (SHN), which improves the accuracy of the joint points and thereby further improves the accuracy of object re-identification performed with the object feature.
In one embodiment, determining the background removal template according to the object pose includes dilating the object pose to obtain the background removal template. Dilation means extending each boundary pixel of the object pose outwards by a preset width, which ensures that the background removal template does not remove content belonging to the object pose. The preset width may be measured in pixels, as a ratio, and so on. In this way, removing too much content is avoided, the background-removed image obtained after the masking operation is more accurate, and the accuracy of object re-identification performed with the object feature is further improved. A small sketch of this step follows.
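The sketch below turns estimated joint points into a background removal template by drawing fixed-width limb lines and dilating the result; the joint list, limb pairs, line width and kernel size are assumptions made only for illustration.

```python
import cv2
import numpy as np

def pose_to_template(joints, limbs, height, width, line_width=9, dilate_px=15):
    """Build a binary background-removal template M from estimated joints.

    joints: list of (x, y) integer pixel coordinates of the object joint points
    limbs:  list of (i, j) index pairs of joints to connect with preset-width lines
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for a, b in limbs:                                   # connect joints with preset-width lines
        cv2.line(mask, joints[a], joints[b], color=1, thickness=line_width)
    kernel = np.ones((dilate_px, dilate_px), dtype=np.uint8)
    return cv2.dilate(mask, kernel)                      # dilate so the object is not cut off
```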
In one embodiment, determining the object feature according to the foreground appearance feature includes: extracting a motion optical-flow feature of the object in the video image; and determining the object feature according to the foreground appearance feature and the motion optical-flow feature.
The optical-flow image of the video image may be extracted with an optical-flow extraction algorithm, for example the edge-preserving EpicFlow algorithm. The optical-flow image may also be extracted by an optical-flow network: optical-flow images extracted by an optical-flow algorithm are used as supervision to pre-train the optical-flow network, and a trained network model is obtained after pre-training. The optical-flow network may be any of various neural networks, for example FlowNet, a network consisting of a series of convolutional and deconvolutional layers that can effectively approximate the generation of optical flow.
Optical-flow feature extraction may be performed on the optical-flow image by a convolutional neural network to obtain the motion optical-flow feature of the object in the video image, which improves the accuracy of the motion optical-flow feature.
In one embodiment, determining the object feature according to the foreground appearance feature and the motion optical-flow feature includes: fusing the foreground appearance feature and the motion optical-flow feature through a feature fusion network to obtain a fused feature, where in the feature fusion network the corresponding outputs of the source features at a preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; and determining the object feature according to the fused feature.
The feature fusion network is a neural network used for feature fusion, and may be a convolutional neural network (CNN), a recurrent neural network (RNN) or a deep neural network (DNN). Since convolutional neural networks are better suited to images, in a preferred embodiment the feature fusion network is a convolutional neural network, so that a better fusion effect can be obtained and the determined object feature achieves a higher accuracy when used for object re-identification.
In this embodiment, feature fusion means that the corresponding outputs of the source features at a preset intermediate layer of the neural network are superposed and then input into the layer following the preset intermediate layer, so that from the layer after the preset intermediate layer onwards the source features carry correlation with one another; the fused feature is then output by the neural network. In other words, the source features are input into the feature fusion network, which superposes the corresponding outputs of the source features at the preset intermediate layer, inputs the superposed result into the layer following the preset intermediate layer, and finally outputs the fused feature.
The preset intermediate layer may be a preset hidden layer, a hidden layer being a network layer between the input layer and the output layer. The number of preset intermediate layers may be 1, or there may be several, and the preset intermediate layer may be any hidden layer. In a preferred embodiment, the preset intermediate layer is a middle-to-high layer, that is, a hidden layer whose index exceeds half of the number of hidden layers, so that the corresponding outputs of the source features carry correlation from the layer after the middle-to-high layer onwards; for example, the 5th layer, or the 5th, 6th and 7th layers, of a neural network with 7 hidden layers.
The corresponding outputs of the source features at the preset intermediate layer of the feature fusion network may be superposed along the feature channels. The number of channels of a feature equals the number of convolution kernels of the convolutional layer in the feature extraction convolutional neural network. After the superposition, the width and height of the feature are unchanged and the number of channels increases N-fold, where N is the number of source features. The width and height of a feature may be the width and height of the corresponding feature map.
In a specific embodiment, the feature fusion network is a convolutional neural network, and the corresponding outputs of the source features at the 5th layer of the convolutional neural network may be concatenated along the channel dimension, thereby superposing the 5th-layer outputs and finally fusing the features. In a specific embodiment, the fused feature can be expressed as: F_fused = concat(S1_i, S2_i, S3_i), i = 5, 6, 7, with F_fused ∈ R^(W×H×C), where i denotes the preset intermediate layer, W, H and C denote the width, height and number of channels of the fused feature respectively, and R denotes the real numbers.
Because the fusion is not a simple concatenation of the source features, but rather the corresponding outputs start to carry correlation from the layer following the preset intermediate layer of the feature fusion network, object re-identification performed with the object feature determined from the fused feature achieves a higher accuracy.
In one embodiment, determining the object feature according to the foreground appearance feature includes: extracting a motion optical-flow feature of the object in the video image; extracting a global appearance feature of the object in the video image; and determining the object feature according to the foreground appearance feature, the motion optical-flow feature and the global appearance feature.
The object feature may be determined according to the foreground appearance feature, the motion optical-flow feature and the global appearance feature by concatenating the three features and determining the object feature according to the concatenation result, or by fusing the three features to obtain a fused feature and determining the object feature according to the fused feature. According to the solution of this embodiment, the source features include the foreground appearance feature, the motion optical-flow feature and the global appearance feature, which improves the accuracy of object re-identification performed with the object feature.
In a specific embodiment, the motion optical-flow feature of the object in the video image is extracted through a first convolutional neural network, and the global appearance feature of the object in the video image is extracted through a second convolutional neural network. In this embodiment, the first convolutional neural network and the second convolutional neural network are two separate neural networks, and both are convolutional neural networks. Compared with extracting different features through one shared neural network, this gives higher accuracy, and compared with networks that are not convolutional, this also gives higher accuracy, so the accuracy of object re-identification performed with the object feature can be improved.
Further, the features of the appearance of the object in the background-removed image may be extracted through a third convolutional neural network to obtain the foreground appearance feature of the object in the video image, which further improves the accuracy of object re-identification performed with the object feature.
In one embodiment, determining the object feature according to the foreground appearance feature includes: performing a masking operation on the foreground appearance feature with a preset attention weight sequence to obtain a local attention feature, where the preset attention weight sequence corresponds to the object to which the foreground appearance feature corresponds; and determining the local attention feature as the object feature.
The attention weight sequence is a sequence of weights for locally attending to an object; each weight value in the attention weight sequence indicates the degree of local attention paid to the object, and the weight values in the attention weight sequence sum to a preset value, which may be 1. In a preferred embodiment, the masking operation with the preset attention weight sequence is performed on the foreground appearance feature by an attention network to obtain the local attention feature. The attention network may include two temporal convolutional layers and one fully connected layer, and attends to the object locally along the vertical direction; the length of the attention weight sequence is equal to the height of the foreground appearance feature. The object feature output by the attention network can be expressed as F_final = F_fused ⊙ w_H, where F_fused denotes the fused feature, F_final denotes the object feature, and w_H denotes the attention weight sequence. The effect of the attention network is to comprehensively extract the features of the object at a higher level; for example, if a pedestrian wears red clothes on the upper body and swings the arms with a larger amplitude, the weight values of the attention weight sequence are larger at the positions of the feature map corresponding to the arms.
According to the solution of this embodiment, because an attention mechanism is added, the accuracy of object re-identification performed with the object feature can be further improved.
In one embodiment, as shown in Fig. 6, an object re-identification method is provided, including the following steps:
S602: obtain video data collected by each video capture device.
S604: determine, according to the above image processing method, the object feature of each object in each video image of the video data.
S606: obtain a re-identification result according to the object features.
In this embodiment, the above object feature construction method can be embedded into the whole object re-identification process to form an end-to-end object re-identification method, whose input is the video data collected by each video capture device and whose output is the re-identification result. Because this object re-identification method determines the object feature of each object in each video image of the video data by the above object feature construction method, a more accurate re-identification result can be obtained.
In a preferred embodiment, the object is a pedestrian, that is, the object re-identification is pedestrian re-identification. Pedestrian re-identification (person re-identification) is a technique that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence.
As shown in Fig. 7, in a specific embodiment, the above object re-identification method is realized with a convolutional neural network. The convolutional neural network first obtains the video data collected by each video capture device; after the convolutional neural network has determined the object features, the object features are integrated by a recurrent neural network (RNN); the integrated results are then temporally pooled to obtain a sequence feature; and finally the convolutional neural network is trained with a classification loss and a contrastive loss.
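The following is an illustrative sketch of such an end-to-end tail (frame features, RNN integration, temporal pooling, then classification plus contrastive losses); the per-frame feature dimension, hidden sizes, identity count and loss form are assumptions, not the disclosed model.

```python
import torch
import torch.nn as nn

class ReIdSequenceModel(nn.Module):
    def __init__(self, frame_feat_dim=256, hidden_dim=128, num_identities=300):
        super().__init__()
        self.rnn = nn.GRU(frame_feat_dim, hidden_dim, batch_first=True)  # integrate per-frame object features
        self.classifier = nn.Linear(hidden_dim, num_identities)          # head for the classification loss

    def forward(self, frame_feats):          # frame_feats: B x T x frame_feat_dim
        out, _ = self.rnn(frame_feats)
        seq_feat = out.mean(dim=1)            # temporal pooling -> sequence feature
        return seq_feat, self.classifier(seq_feat)

def contrastive_loss(f1, f2, same_identity, margin=2.0):
    """Pull sequences of the same identity together, push different ones apart."""
    d = torch.norm(f1 - f2, dim=1)
    return torch.where(same_identity, d.pow(2),
                       torch.clamp(margin - d, min=0).pow(2)).mean()

# total training loss (sketch): nn.CrossEntropyLoss()(logits, labels) + contrastive_loss(...)
```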
This embodiment, with the object being a pedestrian, is compared against other pedestrian re-identification methods; the results are shown in Table 1. As Table 1 shows, on the standard pedestrian re-identification data set iLIDS-VID, the recognition performance is better than that of current mainstream deep-learning-based pedestrian re-identification methods, with a higher recognition rate. Here RX indicates that the target object is ranked within the top X positions; X is taken as 1, 5, 10 and 20 in the table.
Table 1 Comparison results
Here, Method 1 is a method based on a Siamese network; Method 2 is DVR (Discriminative Video Ranking); Method 3 is TDL (Top-push Distance Learning); Method 4 is TAPR (Temporal Aligned Pooling Representation); Method 5 is AFDA (Adaptive Fisher Discriminant Analysis); Method 6 is RFA-Net (Recurrent Feature Aggregation Network).
It should be understood that although the steps in the flowcharts of Figs. 2, 5 and 6 are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2, 5 and 6 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 8, an image processing apparatus is provided, including:
an image obtaining module 802, configured to obtain a video image;
a feature extraction module 804, configured to extract each source feature of an object in the video image;
a feature fusion module 806, configured to fuse the source features through a feature fusion network to obtain a fused feature, where in the feature fusion network the corresponding outputs of the source features at a preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; and
a feature determination module 808, configured to determine an object feature according to the fused feature.
The above image processing apparatus obtains a video image; extracts each source feature of an object in the video image; inputs the source features into the feature fusion network for fusion to obtain a fused feature, where in the feature fusion network the corresponding outputs of the source features at the preset intermediate layer are superposed and then input into the layer following the preset intermediate layer; and determines an object feature according to the fused feature. Because the fusion is not a simple concatenation of the source features, but rather the corresponding outputs start to carry correlation from the layer following the preset intermediate layer of the feature fusion network, object re-identification performed with the object feature determined from the fused feature achieves a higher accuracy.
In one of the embodiments, the feature extraction module includes:
an optical-flow feature extraction unit, configured to extract a motion optical-flow feature of the object in the video image;
an appearance feature extraction unit, configured to extract an appearance feature of the object in the video image.
In one of the embodiments, the optical-flow feature extraction unit is configured to extract the motion optical-flow feature of the object in the video image through a first convolutional neural network;
the appearance feature extraction unit is configured to extract the appearance feature of the object in the video image through a second convolutional neural network.
In one of the embodiments, the appearance feature includes a foreground appearance feature; the feature extraction module further includes a background removal unit;
the background removal unit is configured to perform background removal on the video image to obtain a background-removed image;
the appearance feature extraction unit is configured to extract a feature of the appearance of the object in the background-removed image to obtain the foreground appearance feature of the object in the video image.
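A minimal sketch of this step follows, assuming the background-removed image is produced by multiplying the frame with a binary foreground mask and that a standard ResNet-18 backbone stands in for the second convolutional neural network; neither assumption is specified by the patent.

```python
import torch
import torchvision.models as models

backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()  # keep the 512-d pooled feature as the appearance descriptor

def foreground_appearance_feature(frame: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """frame: (3, H, W) float tensor; mask: (1, H, W) binary (0/1) float tensor."""
    background_removed = frame * mask                  # zero out background pixels
    return backbone(background_removed.unsqueeze(0))   # (1, 512) foreground appearance feature
```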
In one of the embodiments, the feature extraction module further includes a posture assessment unit and a template determination unit;
the posture assessment unit is configured to assess the posture of the object in the video image to obtain an object posture;
the template determination unit is configured to determine a background removal template according to the object posture;
the background removal unit is configured to perform a mask operation on the video image through the background removal template to obtain the background-removed image.
In one of the embodiments, the template determination unit is configured to perform dilation processing on the object posture to obtain the background removal template.
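The following is a minimal sketch of turning an object posture (a binary skeleton mask) into a background removal template by dilation and then applying the mask operation; the kernel size and iteration count are illustrative assumptions, not values from the patent.

```python
import cv2
import numpy as np

def background_removal_template(pose_mask: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    """pose_mask: (H, W) uint8 binary skeleton mask; returns the dilated template."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    return cv2.dilate(pose_mask, kernel, iterations=2)  # dilated posture -> background removal template

def apply_mask(frame_bgr: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Mask operation: keep only pixels covered by the template."""
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=template)  # background-removed image
```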
In one of the embodiments, the appearance feature includes a global appearance feature;
the appearance feature extraction unit is configured to, or is further configured to, extract the global appearance feature of the object in the video image.
In one of the embodiments, the feature extraction module further includes an optical-flow image extraction unit and the optical-flow feature extraction unit;
the optical-flow image extraction unit is configured to extract an optical-flow image of the video image;
the optical-flow feature extraction unit is configured to perform optical-flow feature extraction on the optical-flow image to obtain the motion optical-flow feature of the object in the video image.
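A minimal sketch of optical-flow image extraction follows, using OpenCV's Farneback method between two consecutive frames; Farneback is an assumption, since the patent does not name a particular flow algorithm. The resulting two-channel flow field can then be fed to a convolutional network to obtain the motion optical-flow feature.

```python
import cv2
import numpy as np

def optical_flow_image(prev_bgr: np.ndarray, curr_bgr: np.ndarray) -> np.ndarray:
    """Compute a dense optical-flow image between two consecutive BGR frames."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow.astype(np.float32)  # (H, W, 2): horizontal and vertical displacements
```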
In one of the embodiments, the apparatus further includes a local attention module;
the local attention module is configured to perform a mask operation on the fusion feature using a preset attention weight sequence to obtain a local attention feature, the preset attention weight sequence corresponding to the object to which the fusion feature corresponds;
the feature determination module is configured to determine the local attention feature as the object feature.
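A minimal sketch of the mask operation performed by the local attention module: the fusion feature is multiplied element-wise by a preset attention weight sequence of the same length. The example weight values are illustrative assumptions.

```python
import torch

def local_attention_feature(fusion_feature: torch.Tensor,
                            attention_weights: torch.Tensor) -> torch.Tensor:
    """fusion_feature, attention_weights: (D,) tensors of equal length."""
    return fusion_feature * attention_weights  # masked (local attention) feature

fusion = torch.randn(128)
weights = torch.sigmoid(torch.randn(128))      # preset per-dimension weights in [0, 1] (assumed)
object_feature = local_attention_feature(fusion, weights)
```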
In one embodiment, as shown in Fig. 9, an image processing apparatus is provided, including:
an image collection module 902, configured to obtain a video image;
a background removal module 904, configured to perform background removal on the video image to obtain a background-removed image;
a foreground feature extraction module 906, configured to extract a feature of the appearance of the object in the background-removed image to obtain the foreground appearance feature of the object in the video image;
a feature determination module 908, configured to determine an object feature according to the foreground appearance feature.
The above image processing apparatus obtains a video image; performs background removal on the video image to obtain a background-removed image; extracts a feature of the appearance of the object in the background-removed image to obtain the foreground appearance feature of the object in the video image; and determines the object feature according to the foreground appearance feature. Because background removal is performed on the video image to obtain a background-removed image, and the foreground appearance feature is then obtained by extracting the feature of the appearance of the object in that image, the foreground appearance feature expresses the feature of the object more accurately than an ordinary appearance feature, so that an object feature determined from the foreground appearance feature achieves a higher accuracy rate when used for object re-identification.
In one of the embodiments, the apparatus further includes a posture evaluation module and a template determination module;
the posture evaluation module is configured to assess the posture of the object in the video image to obtain an object posture;
the template determination module is configured to determine a background removal template according to the object posture;
the background removal module is configured to perform a mask operation on the video image through the background removal template to obtain the background-removed image.
In one of the embodiments, the template determination module is configured to perform dilation processing on the object posture to obtain the background removal template.
In one of the embodiments, the posture evaluation module includes:
a joint determination unit, configured to assess the posture of the object in the video image to obtain object joint points;
a posture determination unit, configured to connect the object joint points using lines of a predetermined width to obtain the object posture.
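A minimal sketch of the posture determination unit follows: estimated joint points are connected with lines of a predetermined width to form a binary object posture mask. The joint pairs ("limbs"), coordinates and line width are illustrative assumptions; the joint points themselves would come from a pose estimator.

```python
import cv2
import numpy as np

def posture_mask(joints: dict, limbs: list, shape: tuple, line_width: int = 10) -> np.ndarray:
    """joints: {name: (x, y)}; limbs: [(name_a, name_b), ...]; shape: (H, W)."""
    mask = np.zeros(shape, dtype=np.uint8)
    for a, b in limbs:
        if a in joints and b in joints:
            cv2.line(mask, joints[a], joints[b], color=255, thickness=line_width)
    return mask

joints = {"head": (60, 20), "neck": (60, 40), "hip": (60, 100),
          "left_foot": (40, 170), "right_foot": (80, 170)}
limbs = [("head", "neck"), ("neck", "hip"), ("hip", "left_foot"), ("hip", "right_foot")]
pose = posture_mask(joints, limbs, shape=(200, 120))  # binary object posture mask
```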
In one of the embodiments, the apparatus further includes an optical-flow feature extraction module;
the optical-flow feature extraction module is configured to extract the motion optical-flow feature of the object in the video image;
the feature determination module is configured to determine the object feature according to the foreground appearance feature and the motion optical-flow feature.
In one of the embodiments, the apparatus further includes a feature fusion module;
the feature fusion module is configured to fuse the foreground appearance feature and the motion optical-flow feature through a feature fusion network to obtain a fusion feature; in the feature fusion network, each source feature is input to the next layer of the intermediate preset layer after the corresponding outputs of the intermediate preset layer are superimposed;
the feature determination module is configured to determine the object feature according to the fusion feature.
In one of the embodiments, the apparatus further includes an optical-flow feature extraction module and a global feature extraction module;
the optical-flow feature extraction module is configured to extract the motion optical-flow feature of the object in the video image;
the global feature extraction module is configured to extract the global appearance feature of the object in the video image;
the feature determination module is configured to determine the object feature according to the foreground appearance feature, the motion optical-flow feature and the global appearance feature.
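A minimal sketch of determining the object feature from these three features by simple concatenation; concatenation is only one straightforward choice and an assumption here, since the fusion-network variant described above is equally applicable.

```python
import torch

def combined_object_feature(foreground: torch.Tensor,
                            flow: torch.Tensor,
                            global_appearance: torch.Tensor) -> torch.Tensor:
    """Concatenate the three per-object feature vectors into a single descriptor."""
    return torch.cat([foreground, flow, global_appearance], dim=-1)

feature = combined_object_feature(torch.randn(512), torch.randn(128), torch.randn(512))  # (1152,)
```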
In one of the embodiments, the apparatus further includes a local attention module;
the local attention module is configured to perform a mask operation on the foreground appearance feature using a preset attention weight sequence to obtain a local attention feature, the preset attention weight sequence corresponding to the object to which the foreground appearance feature corresponds;
the feature determination module is configured to determine the local attention feature as the object feature.
In one embodiment, as shown in Fig. 10, an object re-identification apparatus is provided, including:
a video acquisition module 1002, configured to obtain video data acquired by each video capture device;
an object feature determination module 1004, configured to determine, with the above image processing apparatus, the object feature of each object in each video image of the video data;
a result determination module 1006, configured to obtain a re-identification result according to each object feature.
Because the object re-identification apparatus determines the object feature of each object in each video image of the video data using the above image processing apparatus, a more accurate re-identification result can be obtained.
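A minimal sketch of how a re-identification result might be obtained from the object features: a query feature is compared with gallery features from other video capture devices by cosine similarity and the gallery entries are ranked. The similarity metric and ranking scheme are assumptions; the patent only states that the result is obtained according to each object feature.

```python
import torch
import torch.nn.functional as F

def reidentify(query: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """query: (D,) feature; gallery: (N, D) features from other cameras; returns ranked indices."""
    sims = F.cosine_similarity(query.unsqueeze(0), gallery, dim=1)  # (N,) similarities
    return torch.argsort(sims, descending=True)                     # ranked gallery indices

query = torch.randn(128)
gallery = torch.randn(50, 128)
ranking = reidentify(query, gallery)  # ranking[0] is the most likely match for the same object
```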
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 11. The computer device includes a processor, a memory and a network interface connected through a system bus. The processor of the computer device is configured to provide computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is configured to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements an image processing method or an object re-identification method.
It will be understood by those skilled in the art that the structure shown in Fig. 11 is only a block diagram of the part of the structure relevant to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the steps of the above image processing method or object re-identification method are implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps of the above image processing method or object re-identification method are implemented.
A person of ordinary skill in the art will appreciate that all or part of the procedures in the methods of the above embodiments may be completed by instructing relevant hardware through a computer program. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, the computer program may include the procedures of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as the combinations of these technical features involve no contradiction, they should all be regarded as falling within the scope of this specification.
The above embodiments only express several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for a person of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, and these all belong to the protection scope of the present application. Therefore, the protection scope of the present application patent shall be subject to the appended claims.

Claims (22)

1. An image processing method, the method including:
obtaining a video image;
extracting each source feature of an object in the video image;
fusing the source features through a feature fusion network to obtain a fusion feature; in the feature fusion network, each source feature is input to the next layer of an intermediate preset layer after the corresponding outputs of the intermediate preset layer are superimposed;
determining an object feature according to the fusion feature.
2. The method according to claim 1, characterized in that the extracting each source feature of the object in the video image includes:
extracting a motion optical-flow feature of the object in the video image;
extracting an appearance feature of the object in the video image.
3. The method according to claim 2, characterized in that:
the extracting the motion optical-flow feature of the object in the video image includes:
extracting the motion optical-flow feature of the object in the video image through a first convolutional neural network;
the extracting the appearance feature of the object in the video image includes:
extracting the appearance feature of the object in the video image through a second convolutional neural network.
4. The method according to claim 2, characterized in that the appearance feature includes a foreground appearance feature; the extracting the appearance feature of the object in the video image includes:
performing background removal on the video image to obtain a background-removed image;
extracting a feature of the appearance of the object in the background-removed image to obtain the foreground appearance feature of the object in the video image.
5. The method according to claim 4, characterized in that the performing background removal on the video image to obtain the background-removed image includes:
assessing the posture of the object in the video image to obtain an object posture;
determining a background removal template according to the object posture;
performing a mask operation on the video image through the background removal template to obtain the background-removed image.
6. The method according to claim 2 or 4, characterized in that the appearance feature includes a global appearance feature; the extracting the appearance feature of the object in the video image includes, or further includes:
extracting the global appearance feature of the object in the video image.
7. The method according to claim 2, characterized in that the extracting the motion optical-flow feature of the object in the video image includes:
extracting an optical-flow image of the video image;
performing optical-flow feature extraction on the optical-flow image to obtain the motion optical-flow feature of the object in the video image.
8. The method according to claim 1, characterized in that the step of determining the object feature according to the fusion feature includes:
performing a mask operation on the fusion feature using a preset attention weight sequence to obtain a local attention feature, the preset attention weight sequence corresponding to the object to which the fusion feature corresponds;
determining the local attention feature as the object feature.
9. An image processing method, including:
obtaining a video image;
performing background removal on the video image to obtain a background-removed image;
extracting a feature of the appearance of the object in the background-removed image to obtain a foreground appearance feature of the object in the video image;
determining an object feature according to the foreground appearance feature.
10. The method according to claim 9, characterized in that the step of performing background removal on the video image to obtain the background-removed image includes:
assessing the posture of the object in the video image to obtain an object posture;
determining a background removal template according to the object posture;
performing a mask operation on the video image through the background removal template to obtain the background-removed image.
11. The method according to claim 10, characterized in that the step of determining the background removal template according to the object posture includes:
performing dilation processing on the object posture to obtain the background removal template.
12. The method according to claim 10, characterized in that the step of assessing the posture of the object in the video image to obtain the object posture includes:
assessing the posture of the object in the video image to obtain object joint points;
connecting the object joint points using lines of a predetermined width to obtain the object posture.
13. The method according to claim 9, characterized in that the step of determining the object feature according to the foreground appearance feature includes:
extracting a motion optical-flow feature of the object in the video image;
determining the object feature according to the foreground appearance feature and the motion optical-flow feature.
14. The method according to claim 13, characterized in that the step of determining the object feature according to the foreground appearance feature and the motion optical-flow feature includes:
fusing the foreground appearance feature and the motion optical-flow feature through a feature fusion network to obtain a fusion feature; in the feature fusion network, each source feature is input to the next layer of an intermediate preset layer after the corresponding outputs of the intermediate preset layer are superimposed;
determining the object feature according to the fusion feature.
15. The method according to claim 9, characterized in that the step of determining the object feature according to the foreground appearance feature includes:
extracting a motion optical-flow feature of the object in the video image;
extracting a global appearance feature of the object in the video image;
determining the object feature according to the foreground appearance feature, the motion optical-flow feature and the global appearance feature.
16. The method according to claim 9, characterized in that the step of determining the object feature according to the foreground appearance feature includes:
performing a mask operation on the foreground appearance feature using a preset attention weight sequence to obtain a local attention feature, the preset attention weight sequence corresponding to the object to which the foreground appearance feature corresponds;
determining the local attention feature as the object feature.
17. An object re-identification method, characterized by including:
obtaining video data acquired by each video capture device;
determining, according to the method of any one of claims 1 to 16, the object feature of each object in each video image of the video data;
obtaining a re-identification result according to each object feature.
18. An image processing apparatus, the apparatus including:
an image collection module, configured to obtain a video image;
a feature extraction module, configured to extract each source feature of an object in the video image;
a feature fusion module, configured to fuse the source features through a feature fusion network to obtain a fusion feature; in the feature fusion network, each source feature is input to the next layer of an intermediate preset layer after the corresponding outputs of the intermediate preset layer are superimposed;
a feature determination module, configured to determine an object feature according to the fusion feature.
19. An image processing apparatus, the apparatus including:
an image collection module, configured to obtain a video image;
a background removal module, configured to perform background removal on the video image to obtain a background-removed image;
a foreground feature extraction module, configured to extract a feature of the appearance of the object in the background-removed image to obtain a foreground appearance feature of the object in the video image;
a feature determination module, configured to determine an object feature according to the foreground appearance feature.
20. An object re-identification apparatus, characterized by including:
a video acquisition module, configured to obtain video data acquired by each video capture device;
an object feature determination module, configured to determine, with the apparatus according to claim 18 or 19, the object feature of each object in each video image of the video data;
a result determination module, configured to obtain a re-identification result according to each object feature.
21. A computer device, including a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 17.
22. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 17.
CN201810595450.8A 2018-06-11 2018-06-11 Image procossing and object recognition methods, device, equipment and storage medium again Pending CN108805203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810595450.8A CN108805203A (en) 2018-06-11 2018-06-11 Image procossing and object recognition methods, device, equipment and storage medium again

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810595450.8A CN108805203A (en) 2018-06-11 2018-06-11 Image procossing and object recognition methods, device, equipment and storage medium again

Publications (1)

Publication Number Publication Date
CN108805203A true CN108805203A (en) 2018-11-13

Family

ID=64088429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810595450.8A Pending CN108805203A (en) 2018-06-11 2018-06-11 Image procossing and object recognition methods, device, equipment and storage medium again

Country Status (1)

Country Link
CN (1) CN108805203A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104956397A (en) * 2012-12-06 2015-09-30 西门子产品生命周期管理软件公司 Automatic spatial context based multi-object segmentation in 3D images
CN105187785A (en) * 2015-08-31 2015-12-23 桂林电子科技大学 Cross-checkpost pedestrian identification system and method based on dynamic obvious feature selection
CN105354548A (en) * 2015-10-30 2016-02-24 武汉大学 Surveillance video pedestrian re-recognition method based on ImageNet retrieval
CN107154052A (en) * 2016-03-03 2017-09-12 株式会社理光 The method and device of Obj State estimation
CN107154051A (en) * 2016-03-03 2017-09-12 株式会社理光 Background wipes out method and device
CN107240063A (en) * 2017-07-04 2017-10-10 武汉大学 A kind of autonomous landing method of rotor wing unmanned aerial vehicle towards mobile platform
CN107563313A (en) * 2017-08-18 2018-01-09 北京航空航天大学 Multiple target pedestrian detection and tracking based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHUNFENG SONG ET AL.: "Mask-guided Contrastive Attention Model for Person Re-Identification", IEEE *
GUODONG DING ET AL.: "Let Features Decide for Themselves: Feature Mask Network for Person Re-identification", arXiv:1711.07155v1 [cs.CV], 20 Nov 2017 *
HAO LIU ET AL.: "Video-based Person Re-identification with Accumulative Motion Context", arXiv:1701.00193v2 [cs.CV], 13 Jun 2017 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI709086B (en) * 2018-12-07 2020-11-01 開曼群島商創新先進技術有限公司 Neural network system and method for analyzing relationship network graph
CN109711316B (en) * 2018-12-21 2022-10-21 广东工业大学 Pedestrian re-identification method, device, equipment and storage medium
CN109711316A (en) * 2018-12-21 2019-05-03 广东工业大学 A kind of pedestrian recognition methods, device, equipment and storage medium again
CN109800710A (en) * 2019-01-18 2019-05-24 北京交通大学 Pedestrian's weight identifying system and method
CN109800710B (en) * 2019-01-18 2021-04-06 北京交通大学 Pedestrian re-identification system and method
CN109886951A (en) * 2019-02-22 2019-06-14 北京旷视科技有限公司 Method for processing video frequency, device and electronic equipment
CN109977798A (en) * 2019-03-06 2019-07-05 中山大学 The exposure mask pond model training identified again for pedestrian and pedestrian's recognition methods again
CN109977798B (en) * 2019-03-06 2021-06-04 中山大学 Mask pooling model training and pedestrian re-identification method for pedestrian re-identification
US20210366127A1 (en) * 2019-05-07 2021-11-25 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, computer-readable storage medium
US11869194B2 (en) 2019-05-07 2024-01-09 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, computer-readable storage medium
WO2020224424A1 (en) * 2019-05-07 2020-11-12 腾讯科技(深圳)有限公司 Image processing method and apparatus, computer readable storage medium, and computer device
CN110533688A (en) * 2019-07-30 2019-12-03 平安科技(深圳)有限公司 Follow-on method for tracking target, device and computer readable storage medium
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 A kind of small target deteection of Enhanced feature study and recognition methods
CN110503112B (en) * 2019-08-27 2023-02-03 电子科技大学 Small target detection and identification method for enhancing feature learning
CN110544221A (en) * 2019-09-05 2019-12-06 迪爱斯信息技术股份有限公司 Training method and device, rain removing method, terminal device and storage medium
CN110544221B (en) * 2019-09-05 2022-03-29 迪爱斯信息技术股份有限公司 Training method and device, rain removing method, terminal device and storage medium
CN111160458A (en) * 2019-12-29 2020-05-15 浪潮电子信息产业股份有限公司 Image processing system and convolution neural network thereof
CN111160458B (en) * 2019-12-29 2022-04-22 浪潮电子信息产业股份有限公司 Image processing system and convolution neural network thereof
CN111784735A (en) * 2020-04-15 2020-10-16 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium
CN111626301B (en) * 2020-05-07 2023-09-26 京东科技信息技术有限公司 Image screening method and device, electronic equipment and storage medium
CN111626301A (en) * 2020-05-07 2020-09-04 北京海益同展信息科技有限公司 Image screening method and device, electronic equipment and storage medium
CN113470827A (en) * 2021-06-30 2021-10-01 上海商汤智能科技有限公司 Classification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108805203A (en) Image procossing and object recognition methods, device, equipment and storage medium again
WO2020186942A1 (en) Target detection method, system and device, storage medium and computer device
Li et al. Illumination-aware faster R-CNN for robust multispectral pedestrian detection
Zhang et al. Synthetic data generation for end-to-end thermal infrared tracking
Xiong et al. Material based object tracking in hyperspectral videos
Zhao et al. Semantic segmentation with attention mechanism for remote sensing images
Brust et al. Towards automated visual monitoring of individual gorillas in the wild
CN109871781B (en) Dynamic gesture recognition method and system based on multi-mode 3D convolutional neural network
CN107818313B (en) Vivo identification method, device and storage medium
Wu et al. Recent advances in video-based human action recognition using deep learning: A review
Zhang et al. Non-rigid object tracking via deep multi-scale spatial-temporal discriminative saliency maps
Junos et al. Automatic detection of oil palm fruits from UAV images using an improved YOLO model
US8345984B2 (en) 3D convolutional neural networks for automatic human action recognition
CN108345818B (en) Face living body detection method and device
CN110245662A (en) Detection model training method, device, computer equipment and storage medium
CN105488468B (en) A kind of localization method and device of target area
Jiao et al. Adaptive feature fusion pyramid network for multi-classes agricultural pest detection
CN108256426A (en) A kind of facial expression recognizing method based on convolutional neural networks
CN110046652A (en) Face method for evaluating quality, device, terminal and readable medium
CN109815873A (en) Merchandise display method, apparatus, equipment and medium based on image recognition
CN108256479A (en) Face tracking method and device
CN110473185A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN111199556A (en) Indoor pedestrian detection and tracking method based on camera
CN107977650A (en) Method for detecting human face and device
WO2021138749A1 (en) System and method for identity preservative representation of persons and objects using spatial and appearance attributes

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20181113)