CN110580503A - AI-based double-spectrum target automatic identification method - Google Patents

AI-based double-spectrum target automatic identification method

Info

Publication number
CN110580503A
CN110580503A (application number CN201910780121.5A)
Authority
CN
China
Prior art keywords
image
target
cnn
identified
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910780121.5A
Other languages
Chinese (zh)
Inventor
甘欣辉
宋亮
姚连喜
万韬
郭贺
蒋晓峰
刘鹏
杨苏文
程智林
彭硕玲
王祥
杨宏昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu And Special Equipment Co Ltd
Original Assignee
Jiangsu And Special Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu And Special Equipment Co Ltd filed Critical Jiangsu And Special Equipment Co Ltd
Priority to CN201910780121.5A priority Critical patent/CN110580503A/en
Publication of CN110580503A publication Critical patent/CN110580503A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2115 Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an AI-based double-spectrum target automatic identification method, which comprises the following steps: Step 1, construct a target database and train a target-database image classification model with a deep learning network. Step 2, collect images to be identified through target acquisition equipment; the images to be identified comprise images containing the target and images not containing the target. Step 3, perform CNN feature extraction on the images to be identified. Step 4, extract low-dimensional local CNN features from the acquired images and convert them into binary codes. Step 5, reduce the number of candidate target pictures using the binary codes. Step 6, retrieve the CNN features. Step 7, generate new data by linear combination of the feature vectors and compute the probability of each category from the new data with a softmax function. Step 8, select the category with the maximum probability as the category of the target.

Description

AI-based double-spectrum target automatic identification method
Technical Field
The invention belongs to the field of machine vision, and particularly relates to an AI-based double-spectrum target automatic identification method.
Background
Target recognition is an emerging application direction in the field of computer vision, one of the research hotspots of recent years, and it has developed rapidly. Early target recognition mainly performed object detection with the Scale-Invariant Feature Transform (SIFT) and pedestrian detection with the Histogram of Oriented Gradients (HOG). These methods feed hand-crafted features into a classifier for recognition; they are essentially manually designed feature engineering. For a specific target to be recognized, it is often unclear which features are most useful for the recognition task, so domain expertise is required, and even with such expertise, manual feature-selection experiments sometimes cannot be avoided. Manual feature selection also means the data size of the recognition task cannot be too large, and such models are designed for one specific task: completing another task requires redesigning the feature engineering, so the generalization performance of the models is poor. These factors have all inhibited further development of target recognition technology.
In recent years, deep learning has become the most popular topic in computer vision. In the mid-1990s, the rise of statistical learning and the support vector machine exposed the shortcomings of neural networks, namely poor theoretical interpretability and overly long training times, and neural network research entered a low tide. However, deep learning is better at processing large-scale data than traditional machine learning algorithms such as the Support Vector Machine (SVM) and logistic regression. With the arrival of the big-data era and the rapid growth of computing performance, training more complex network structures with more hidden layers has become possible. The rapid growth of deep learning has produced breakthroughs in many fields, such as speech recognition, image recognition, and natural language processing. Applying deep learning to target recognition removes the need for time-consuming and labor-intensive manual feature extraction and lets models exploit large-scale data, thereby improving their generalization performance. Deep learning has made significant progress in the study of target recognition.
Disclosure of Invention
Target recognition has always been a hot direction in the field of image processing. Its flow comprises four parts: image acquisition, preprocessing, feature extraction, and recognition and classification, of which feature extraction is the most central. In traditional image recognition, features are basically extracted manually, which is time-consuming and labor-intensive, and the effectiveness of the features cannot be guaranteed; extracting features with deep learning avoids this problem.
The invention provides an AI-based double-spectrum target recognition method and system applied to visible-light imaging and infrared imaging, which automatically recognizes targets in both. Unlike traditional target classification, detection, and tracking, it does not require a new algorithm for each different scene or object: after the computer system extracts, cognizes, perceives, learns, and self-trains on objects in the real world, it achieves the capability of accurate recognition.
In order to solve the technical problems, the invention adopts the following technical scheme.
An aim of the invention is to provide a deep-learning-based target identification method and system that automatically identifies image targets in visible-light/infrared imaging by training a classification model on a target database. When using the CNN to extract image features, an additional CNN feature of smaller dimension is extracted and converted into a binary code; comparison of these codes reduces the number of candidate target images, and the CNN features are then retrieved within the reduced image range. This solves the problem that existing CNN features must be of high dimension for retrieval, which makes retrieval slow, and thereby indirectly improves identification speed. The method comprises the following steps.
Step 1, constructing a target database; training a target database image classification model by a deep learning network;
Step 2, collecting an image to be identified through target acquisition equipment, wherein the image to be identified comprises: an image containing the target and an image not containing the target;
The target acquisition equipment comprises a visible-light camera and an infrared camera;
Step 3, performing CNN feature extraction on the image to be identified to obtain a CNN feature vector. CNN feature extraction enables automatic feature extraction: the CNN extracts feature points automatically, trained with gradient back-propagation. The parameters of the convolution kernels are initialized randomly, and the values of the kernels are then adjusted adaptively by an optimization algorithm based on gradient back-propagation so as to minimize the error between the predicted value and the true value of the model. The convolution-kernel parameters obtained in this way are not necessarily intuitive, but they extract features effectively, minimizing the error between the model's predicted value and the true value.
The input of each neuron is connected to a local receptive field of the previous layer, from which local features are extracted. Once a feature is extracted, its positional relationship to the remaining features is determined. A CNN generally consists of convolutional layers, pooling layers, and fully connected layers. Each layer is composed of several two-dimensional planes, and each plane of several independent neurons; after several convolutional and pooling layers, the result is finally output through a fully connected layer.
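The convolution-and-pooling pipeline described above can be sketched in pure Python. This is only an illustrative toy, not the patent's implementation: real systems would use a deep learning framework with learned kernels, and the function names here are ours.

```python
def conv2d_valid(image, kernel):
    """Valid 2-D convolution (no padding, stride 1) of one plane."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            s = 0.0
            for di in range(kh):          # slide the local receptive field
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = s
    return out

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 over a feature map."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]
```

In a trained CNN, `kernel` would be a learned parameter adjusted by gradient back-propagation; here it is supplied by hand purely to show the data flow from local receptive field to pooled feature map.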
Step 4, extracting low-dimensional local CNN features from the images in the target database and from the images to be identified acquired by the target acquisition equipment, respectively, and converting the extracted local CNN features into binary codes;
The acquired images refer first to the database images of step 1, from which local CNN feature vectors of the template images are obtained, and then to the newly acquired images of step 2, on which local CNN feature extraction is likewise performed.
Small dimension: local CNN feature extraction must be chosen according to the specific local features, and different features require different numbers of feature values. The local images include target colors, target composition, and images of the target at different angles. Since the invention is an AI-based double-spectrum target automatic identification method, different targets cannot be distinguished by color alone; contours are used to achieve automatic target identification. The invention uses 1 x 1 templates: a template that is too large leads to a large computational load, cannot meet real-time requirements well, and degrades the effect.
Low-dimensional CNN feature extraction: the lower-dimensional CNN feature is obtained by appending a convolutional layer after ResNet (a residual neural network), with kernel size 1 x 1 and 256 channels, compressing the feature to 256 dimensions. Each of the 256 dimensions, which ranges from 0 to 1, is then thresholded: if the value is greater than 0.5, a 1 is taken; if it is less than or equal to 0.5, a 0 is taken. This finally yields the binary value of the 256-dimensional feature vector of the local region.
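The thresholding rule just described can be sketched as follows. This is a minimal illustration; the function name `binarize` is ours, and the 256-dimensional input would in practice come from the 1 x 1 convolutional branch after ResNet.

```python
def binarize(features, threshold=0.5):
    """Convert a real-valued feature vector (components in [0, 1])
    into a binary code: 1 if a component exceeds the threshold,
    otherwise 0, following the rule in step 4."""
    return [1 if v > threshold else 0 for v in features]
```

Applied to a 256-dimensional local CNN feature vector, this yields the 256-bit binary code used for the Hamming-distance comparison in step 5.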
Step 5, reducing the number of images to be identified by using the binary codes;
Binary coding has the following advantages: it is simple to implement, since a computer is composed of logic circuits that generally have only two states, switch on and switch off, which map exactly to '1' and '0'. Binary data also offers strong interference resistance and high reliability: because each bit has only two states, high and low, it can still be distinguished reliably under a certain amount of disturbance.
The Hamming distance is computed between the binary value of an image in the target database and the binary value of an image to be recognized acquired by the target acquisition equipment. If the Hamming distance is less than the preset threshold of 51 (more than 80 percent of the feature bits match), the target image to be recognized and the template image may belong to the same class. Otherwise, the image to be recognized is not an image of a known class; if the Hamming distance is greater than 51, the image to be recognized is deleted, effectively reducing the number of images to be recognized.
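The Hamming-distance filter of step 5 can be sketched directly. The function names are ours; note that over 256 bits, a distance below 51 indeed means more than 80% of the bits agree (205/256 is roughly 80%).

```python
def hamming(code_a, code_b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def filter_candidates(template_code, candidates, threshold=51):
    """Keep only candidate images whose binary code lies within the
    Hamming-distance threshold of the template code, as in step 5."""
    return [c for c in candidates if hamming(template_code, c) < threshold]
```

Only the surviving candidates proceed to the more expensive CNN-feature retrieval of step 6, which is how the coarse binary codes speed up recognition.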
Step 6, retrieving the image to be identified to obtain a CNN characteristic vector of the image area containing the target;
Purpose of retrieval: retrieval measures the degree of similarity between two images by extracting the underlying features of the images and comparing, computationally, the distances between these features and the search objects.
How to retrieve: the invention retrieves the CNN features mainly with a pre-trained CNN model and a fine-tuned CNN model. The pre-trained CNN model performs coarse extraction of feature vectors. Because the image set of the target task and the pre-training image set differ greatly in both the number of categories and the image style, directly using the pre-trained CNN model to extract visual features rarely achieves optimal performance on the retrieval task for the image set to be identified. Fine-tuning the parameters of the pre-trained CNN model on images of the target image set yields good results.
Retrieval condition: the target image area to be identified is larger than the sub-block area images used during training.
Step 7, generating new data by linearly combining the CNN characteristic vectors retrieved in the step 6, and calculating the probability of each category by the new data through a softmax function;
Step 8, selecting the value with the maximum probability as the category of the target.
Step 1 comprises: downloading target images from websites, the images comprising images of targets at different angles and with different colors and sizes; the downloaded target images form the target database.
Step 4 comprises: performing the following operations on the images in the target database and on the images to be identified acquired by the target acquisition equipment, respectively. For the different target areas, a suitable local area is selected on the image to obtain a local image; the local images comprise the target color, the target composition, and imaging of the target at different angles. CNN feature extraction is performed on the local image: the lower-dimensional CNN feature is obtained by appending a convolutional layer after ResNet (a residual neural network), with kernel size 1 x 1 and 256 channels compressing the feature to 256 dimensions. Each of the 256 dimensions, ranging from 0 to 1, is thresholded: if the value is greater than 0.5, a 1 is taken; if it is less than or equal to 0.5, a 0 is taken. This finally yields the binary value of the 256-dimensional feature vector of the local area, and thus the binary values of the images in the target database and of the images to be identified acquired by the target acquisition equipment.
Step 5 comprises: computing the Hamming distance between the binary value of an image in the target database and the binary value of an image to be recognized acquired by the target acquisition equipment; if the Hamming distance is less than the preset threshold of 51, the image to be recognized and the database image belong to the same class, and the image to be recognized is retained for accurate recognition in the later stage.
Step 6 comprises: retrieving the image area containing the target in the image to be identified from step 2, where the retrieved image area must be larger than the areas used in the target-database image classification model trained by the deep learning network in step 1, finally obtaining the CNN (convolutional neural network) feature vector of the image area containing the target.
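Retrieval over the reduced candidate set can be sketched as a nearest-neighbor search over CNN feature vectors. The patent does not specify the distance metric, so Euclidean distance is an assumption here, and the function names are ours.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, database, top_k=1):
    """Rank database feature vectors by distance to the query vector
    and return the indices of the top_k closest entries, a simple
    stand-in for the CNN-feature retrieval of step 6."""
    ranked = sorted(range(len(database)),
                    key=lambda i: l2_distance(query, database[i]))
    return ranked[:top_k]
```

In the method, `database` would hold the fine-tuned-CNN feature vectors of the candidates that survived the Hamming-distance filter of step 5.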
Step 7 comprises: taking a group of 256 preset constants, multiplying them element-wise by the CNN feature vector obtained in step 6, and summing the products to obtain the new data.
In step 7, the softmax function is:

p(i) = \frac{e^{a_i}}{\sum_{k=1}^{K} e^{a_k}}

where p(i) represents the probability that the input belongs to the i-th category, a_i represents the value of the i-th input feature, a_k represents the value of the k-th input feature, and K represents the dimension of the input feature vector.
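Steps 7 and 8 can be sketched as follows. The max-shift inside the softmax is a standard numerical-stability measure we have added; it does not change the result, and the function names are ours.

```python
import math

def linear_combine(constants, features):
    """Step 7: multiply the preset constants element-wise with the
    retrieved CNN feature vector and sum the products (one such
    combination would be formed per candidate category)."""
    return sum(c * f for c, f in zip(constants, features))

def softmax(values):
    """p(i) = exp(a_i) / sum_k exp(a_k), computed with the usual
    max-shift so that large inputs do not overflow exp()."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]
```

Step 8 is then simply taking the index of the largest probability, e.g. `p.index(max(p))` over the softmax output.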
The invention also provides a deep-learning-based target recognition system, comprising an imaging system, a preprocessing system, an information processing system, and a display system:
An imaging system: images the target and stores it as a picture or a video;
A preprocessing system: mainly preprocesses the image, e.g. noise reduction, contrast enhancement, and brightness enhancement;
An information processing system: performs the corresponding processing on the input images, classifies the processed images with the database classification model, and outputs the classification result;
A display system: presents the classification results to the user.
Beneficial effects:
(1) The AI-based double-spectrum target automatic identification can solve the problem of long retrieval time.
(2) The invention can perform target recognition on visible-light images and on infrared images, respectively.
(3) Unlike traditional target classification, detection, and tracking, the method does not require a new algorithm for each different scene or object: after the computer system extracts, cognizes, perceives, learns, and trains on objects in the real world, it achieves the capability of accurate identification.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a diagram of the steps of the AI-based double-spectrum target automatic identification method of the present invention.
FIG. 2 is a schematic structural diagram of the AI-based double-spectrum target recognition system according to the present invention.
FIG. 3 is a block diagram of an object imaging and recognition system of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in FIG. 1, the AI-based double-spectrum target automatic identification method of the invention comprises the following steps:
1. Constructing the target database: 10000 images of targets to be identified are downloaded from corresponding websites; the images comprise images of targets at different angles and with different colors and sizes;
2. Training a target database image classification model by a deep learning network;
3. The target acquisition equipment acquires 1000 images containing/not containing targets to be identified;
4. Performing CNN feature extraction on the images to be identified; the feature value is a 512-dimensional vector;
5. Extracting low-dimensional local CNN features from the acquired images and converting them into binary codes convenient for comparison; the local CNN feature vectors are 256-dimensional, and the local binary codes of the 1000 acquired images are obtained according to the preset conversion rule;
6. With the CNN features and binary codes of the local images extracted, the number of target pictures is reduced using the codes: the Hamming distance between the binary value of each target image to be identified and the binary value of the template image is computed, and if the Hamming distance is smaller than the preset threshold of 51 (more than 80 percent of the feature bits match), the target image and the template image may belong to the same class. Otherwise, the target image is not an image of a known class and is deleted. This effectively reduces the number of target images; finally, 600 identification images are obtained.
7. The CNN features are retrieved within a reduced scope.
8. New data are generated by linear combination of the feature vectors of the 600 remaining identification images, and the probability of each category is computed from the new data with the softmax function.
The softmax function can be expressed as:

p(i) = \frac{e^{a_i}}{\sum_{k=1}^{K} e^{a_k}}
9. Selecting the value with the maximum probability as the category of the target.
10. Feature extraction uses a ResNet network structure to obtain a multidimensional feature vector of the picture features.
The local images used for CNN feature extraction comprise target color, target composition, and imaging of the target at different angles. The lower-dimensional CNN feature is formed by appending a convolutional layer after ResNet, with kernel size 1 x 1 and 256 channels compressing the feature to 256 dimensions. Each of the 256 dimensions, ranging from 0 to 1, is thresholded: if the value is greater than 0.5, a 1 is taken; if it is less than or equal to 0.5, a 0 is taken.
As shown in FIG. 2 and FIG. 3, the invention also provides an AI-based target recognition system, comprising an imaging system, a preprocessing system, an information processing system, and a display system:
An imaging system: images the target and stores it as a picture or a video;
A preprocessing system: mainly preprocesses the image, e.g. noise reduction, contrast enhancement, and brightness enhancement;
An information processing system: performs the corresponding processing on the input images, classifies the processed images with the database classification model, and outputs the classification result.
(1) The information processing system mainly comprises a CNN feature extraction module, a local feature extraction module, a binary conversion module, a target database, a model training module, and a comparison and retrieval module.
(2) The CNN feature extraction module performs CNN feature extraction on the image to be retrieved; the local feature extraction module extracts features of the local images to be retrieved; the binary conversion module performs binary conversion on the lower-dimensional local features; the target database provides a large number of target picture samples for model training; the model training module trains the target-database picture classification model through a deep neural network; the comparison module compares the lower-dimensional binary codes against the training model; the retrieval module retrieves the characteristic picture to be identified from the target pictures remaining after comparison.
A display system: presents the classification results to the user.
The present invention provides an AI-based double-spectrum target automatic identification method, and there are many specific methods and approaches for implementing this technical solution; the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as within the protection scope of the invention. All components not specified in this embodiment can be realized by the prior art.

Claims (7)

1. An AI-based dual-spectrum target automatic identification method is characterized by comprising the following steps:
Step 1, constructing a target database; training a target database image classification model by a deep learning network;
Step 2, collecting an image to be identified through target collection equipment, wherein the image to be identified comprises: an image containing the object and an image not containing the object;
Step 3, performing CNN feature extraction on the image to be identified to obtain a CNN feature vector;
Step 4, extracting low-dimensional local CNN features from the images in the target database and from the images to be identified acquired by the target acquisition equipment, respectively, and converting the extracted local CNN features into binary codes;
Step 5, reducing the number of images to be identified by using the binary codes;
Step 6, retrieving the image to be identified to obtain a CNN feature vector of the image area containing the target;
Step 7, generating new data by linearly combining the CNN feature vectors retrieved in step 6, and computing the probability of each category from the new data through a softmax function;
Step 8, selecting the value with the maximum probability as the category of the target.
2. The method of claim 1, wherein step 1 comprises: downloading target images from websites, the images comprising images of targets at different angles and with different colors and sizes, the downloaded target images forming the target database.
3. The method of claim 2, wherein step 4 comprises: performing the following operations on the images in the target database and on the images to be identified acquired by the target acquisition equipment, respectively: for the different target areas, selecting a suitable local area on the image to obtain a local image, the local image comprising the target color, the target composition, and imaging of the target at different angles, and performing CNN feature extraction on the local image, wherein the lower-dimensional CNN feature is obtained by appending a convolutional layer after ResNet, with kernel size 1 x 1 and 256 channels compressing the feature to 256 dimensions; each of the 256 dimensions, ranging from 0 to 1, is thresholded: if the value is greater than 0.5, a 1 is taken; if it is less than or equal to 0.5, a 0 is taken; the binary value of the 256-dimensional feature vector of the local area is finally obtained, thereby obtaining the binary values of the images in the target database and of the images to be identified acquired by the target acquisition equipment.
4. The method of claim 3, wherein step 5 comprises: computing the Hamming distance between the binary value of an image in the target database and the binary value of an image to be recognized acquired by the target acquisition equipment; if the Hamming distance is less than the preset threshold of 51, the image to be recognized and the database image belong to the same class, and the image to be recognized is retained.
5. The method of claim 4, wherein step 6 comprises: retrieving the image area containing the target in the image to be identified from step 2, where the retrieved image area must be larger than the areas used in the target-database image classification model trained by the deep learning network in step 1, finally obtaining the CNN (convolutional neural network) feature vector of the image area containing the target.
6. The method of claim 5, wherein step 7 comprises: taking a group of 256 preset constants, multiplying them element-wise by the CNN feature vector obtained in step 6, summing the products to obtain new data, and computing the probability of each category from the new data through the softmax function.
7. The method according to claim 6, characterized in that in step 7 the softmax function is:

p(i) = \frac{e^{a_i}}{\sum_{k=1}^{K} e^{a_k}}

where p(i) represents the probability that the input belongs to the i-th category, a_i represents the value of the i-th input feature, a_k represents the value of the k-th input feature, and K represents the dimension of the input feature vector.
CN201910780121.5A 2019-08-22 2019-08-22 AI-based double-spectrum target automatic identification method Pending CN110580503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910780121.5A CN110580503A (en) 2019-08-22 2019-08-22 AI-based double-spectrum target automatic identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910780121.5A CN110580503A (en) 2019-08-22 2019-08-22 AI-based double-spectrum target automatic identification method

Publications (1)

Publication Number Publication Date
CN110580503A true CN110580503A (en) 2019-12-17

Family

ID=68811774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910780121.5A Pending CN110580503A (en) 2019-08-22 2019-08-22 AI-based double-spectrum target automatic identification method

Country Status (1)

Country Link
CN (1) CN110580503A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348016A (en) * 2020-11-09 2021-02-09 青岛大东电子有限公司 Smart picture LOGO identification method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682233A (en) * 2017-01-16 2017-05-17 华侨大学 Method for Hash image retrieval based on deep learning and local feature fusion
CN106909924A (en) * 2017-02-18 2017-06-30 北京工业大学 A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
CN107589758A (en) * 2017-08-30 2018-01-16 武汉大学 A kind of intelligent field unmanned plane rescue method and system based on double source video analysis
CN107609601A (en) * 2017-09-28 2018-01-19 北京计算机技术及应用研究所 A kind of ship seakeeping method based on multilayer convolutional neural networks
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
US20180210896A1 (en) * 2015-07-22 2018-07-26 Hangzhou Hikvision Digital Technology Co., Ltd. Method and device for searching a target in an image
CN108537120A (en) * 2018-03-06 2018-09-14 安徽电科恒钛智能科技有限公司 A kind of face identification method and system based on deep learning
CN109308483A (en) * 2018-07-11 2019-02-05 南京航空航天大学 Double source image characteristics extraction and fusion identification method based on convolutional neural networks

Similar Documents

Publication Publication Date Title
CN106126581B (en) Cartographical sketching image search method based on deep learning
CN107256246B (en) printed fabric image retrieval method based on convolutional neural network
CN111177446B (en) Method for searching footprint image
Lin et al. RSCM: Region selection and concurrency model for multi-class weather recognition
CN108595636A (en) The image search method of cartographical sketching based on depth cross-module state correlation study
Wang et al. Multiscale deep alternative neural network for large-scale video classification
Zhang et al. Automatic discrimination of text and non-text natural images
CN111931953A (en) Multi-scale characteristic depth forest identification method for waste mobile phones
Yu et al. Pedestrian detection based on improved Faster RCNN algorithm
Zhang et al. A small target detection method based on deep learning with considerate feature and effectively expanded sample size
Lin et al. Object detection algorithm based AdaBoost residual correction Fast R-CNN on network
CN111782852A (en) High-level semantic image retrieval method based on deep learning
CN110580503A (en) AI-based double-spectrum target automatic identification method
Li et al. Incremental learning of infrared vehicle detection method based on SSD
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
CN116244464A (en) Hand-drawing image real-time retrieval method based on multi-mode data fusion
CN115661754A (en) Pedestrian re-identification method based on dimension fusion attention
CN113192108B (en) Man-in-loop training method and related device for vision tracking model
CN115063831A (en) High-performance pedestrian retrieval and re-identification method and device
CN112488116A (en) Scene understanding semantic generation method based on multi-mode embedding
Yu et al. Automatic image captioning system using integration of N-cut and color-based segmentation method
Wang et al. Image target recognition based on improved convolutional neural network
Yue et al. Study on the deep neural network of intelligent image detection and the improvement of elastic momentum on image recognition
CN110210546A (en) A kind of books automatic clustering method based on image procossing
Geetha et al. A review on human activity recognition system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191217