CN111144392A - Neural network-based extremely-low-power-consumption optical target detection method and device - Google Patents

Neural network-based extremely-low-power-consumption optical target detection method and device

Info

Publication number
CN111144392A
Authority
CN
China
Prior art keywords
target
layer
neural network
optical
light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911148056.0A
Other languages
Chinese (zh)
Inventor
罗子江
马原东
徐斌
王继红
杨晨
郭祥
杨秀璋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou University of Finance and Economics
Original Assignee
Guizhou University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou University of Finance and Economics filed Critical Guizhou University of Finance and Economics
Priority to CN201911148056.0A priority Critical patent/CN111144392A/en
Publication of CN111144392A publication Critical patent/CN111144392A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143 Sensing or illuminating at different wavelengths
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B1/00 Optical elements characterised by the material of which they are made; Optical coatings for optical elements
    • G02B1/02 Optical elements characterised by the material of which they are made; Optical coatings for optical elements made of crystals, e.g. rock-salt, semi-conductors
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B5/00 Optical elements other than lenses
    • G02B5/02 Diffusing elements; Afocal elements
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B5/00 Optical elements other than lenses
    • G02B5/04 Prisms
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B5/00 Optical elements other than lenses
    • G02B5/20 Filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection


Abstract

The invention relates to the field of target detection, and in particular to a neural-network-based optical target detection method with extremely low power consumption, comprising the following steps: constructing a convolutional neural network and extracting features of a target image; normalizing the extracted target image features to obtain a target sample; searching for convolution kernel weights and selecting, through training, the convolutional neural network framework with the highest average recognition accuracy; building a learnable optical prism device, applying it to a standard target detection database, and learning the convolution kernel weight of each layer; searching for the threshold of the excitation layer of the framework and selecting, through training and testing, the excitation function giving the highest target-position accuracy; and determining the dispersion coefficients of the optical prisms and the thresholds of the filter films, customizing a scattering mirror, forming a target light spot through the scattering mirror, and finally imaging through a camera. Because the target frame is calibrated by multiple layers of optical prisms that mimic a convolutional neural network, the method is efficient, real-time, and accurate, and consumes no battery energy.

Description

Neural network-based extremely-low-power-consumption optical target detection method and device
Technical Field
The invention relates to the technical field of target detection, in particular to an optical target detection method and device with extremely low power consumption based on a neural network.
Background
Object detection is a widely used technology: it acquires specified image information in a particular way and marks the position of the located object on a picture. Target detection systems take target recognition technology as their core and rank among the high-precision technologies of the current international scientific field. Breakthroughs of neural networks in image recognition have driven the rapid development of target detection as an image application, given it greater stability, and promoted its wide use in more and more fields such as entertainment and security. At the same time, higher requirements are placed on target extraction: target detection must be more real-time and efficient while reducing power consumption to a minimum. The invention therefore provides a neural-network-based optical target detection method and device with extremely low power consumption.
Disclosure of Invention
The invention provides a neural-network-based optical target detection method and device with extremely low power consumption, aiming to solve the problem that the prior art lacks a comprehensive, automatic, accurate, and extremely-low-power method for extracting and recognizing target image features.
The technical scheme adopted by the invention is as follows:
a method for detecting an optical target with extremely low power consumption based on a neural network comprises the following steps:
a. constructing a convolutional neural network, and sending each image in the target database into the constructed convolutional neural network to extract the characteristics of the target image;
b. performing normalization processing on the extracted target image characteristics, performing affine projection on the target image characteristics to obtain a projection matrix, and training the projection matrix through a loss function to obtain a target sample of each image;
c. searching for convolution kernel weights, and selecting through training the convolutional neural network framework with the highest average recognition accuracy, the structure of which is: light input layer - convolution layer - excitation function - fully connected layer - light output layer;
d. establishing a learnable optical prism device, applying the learnable optical prism device to a standard target detection database, converting the characteristics of a target sample to be detected into light with target information through the optical prism device, injecting the light into a convolutional neural network frame for training, and learning to obtain the weight of each layer of convolutional kernel; searching a threshold value of an excitation layer of a convolutional neural network framework, and selecting an excitation function with the highest target position precision through training and testing;
e. setting dispersion coefficients of optical prisms in an optical prism frame and threshold values of filter films according to convolution kernel values and excitation functions of all layers, customizing a scattering mirror according to the dispersion coefficients of the optical prisms of all layers, respectively carrying out dispersion and filtration through the optical prisms and the filter films, forming target light spots through the scattering mirror, and finally imaging through a camera.
In step d, after the convolution kernel weight of each layer is obtained, the target detection accuracy is verified; if the accuracy is greater than or equal to a set threshold, the model parameters trained in step d are stored; if it is less than the set threshold, step c is executed again.
An accurate convolution kernel weight is obtained by combining forward propagation and back propagation:
Forward propagation: the target image enters the convolutional neural network framework from the light input layer, is weighted with the corresponding convolution kernel weights (the bias term being 0), and then passes through an excitation layer; the final result is the output of that layer. If the actual output of the output layer equals the expected output, learning ends and the convolution kernel weights are stored; if the output differs from the expected output, the process switches to error back propagation.
Back propagation: the difference between the actual output and the expected value is computed and transmitted back along the original convolution layer channels to the light input layer; during this back transmission the error is distributed to each unit of each layer, yielding an error signal for each layer that serves as the basis for correcting the weight of each unit. The convolution kernel weights and thresholds of each layer are adjusted continuously until the error signal is minimized.
When the high-dimensional input is mapped back during back propagation, the convolution kernel is flipped by 180 degrees according to the deconvolution principle and the target information is extracted step by step; the angle of the optical prism is then set according to the 180-degree-flipped convolution kernel weight, giving the dispersion coefficient of each prism layer.
After light with target information enters the optical prism, performing light dispersion operation with different weights on each layer, and acquiring corresponding target information light by each layer according to different dispersion coefficients; the filter film screens the light scattered by the optical prism and filters out light outside a threshold value; and the scattering mirror forms a target light spot at the target position according to the dispersed light containing the target information.
After the target light spot is formed, the light spot is mapped back to the original picture through a special scattering mirror according to the dispersion coefficient, the target position is calibrated, and a target frame is formed.
The target database may be the CASIA-WebFace database, or original face pictures collected by the image acquisition unit of an electronic terminal device and obtained through the terminal's development interface.
An optical target detection device with extremely low power consumption based on a neural network comprises an optical prism device that can be installed at the front lens of electronic equipment, the optical prism device comprising multiple optical prism layers, multiple filter films, and a scattering mirror with a specific scattering rate. The prisms are stacked layer by layer; the number of filter films is determined by the excitation function, each film being embedded behind an optical prism; and the scattering mirror is embedded behind the last optical prism layer.
The optical prism is made of rigid optical glass or alkali metal halide crystal material capable of changing dispersion coefficient, the filter film is made of glass crystal or silicon carbide material by using ion amplitude dielectric film process technology, and the scattering mirror is made of coated silicon wafer.
And a spectroscope is connected behind the optical prism device.
Compared with the prior art, the invention has the following beneficial effects. Target detection is performed through operations such as light dispersion and filtering, and the target light spot is taken directly as the output. The device learns autonomously from training sample data and shares weights across the pixels of each layer, which reduces the complexity of the framework; the filter film operation enhances the robustness of the light spot, so that target light can be received well under different lighting conditions. The invention calibrates the target frame with a multilayer optical prism structure resembling a convolutional neural network; the prism structure is compact and convenient to apply on small data sets, offers efficient real-time performance and accuracy, consumes no battery energy, and can greatly extend the shooting time and service life of cameras and other electronic equipment.
Drawings
FIG. 1 is a schematic diagram of an optical prism apparatus according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the convolutional neural network according to an embodiment of the present invention;
FIG. 3 is a block diagram of an optical face detection system based on convolution process according to an embodiment of the present invention;
FIG. 4 is a flowchart of the method for detecting a face frame by a low-power-consumption optical camera according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings.
It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to fig. 1-4, a method for detecting an optical target with extremely low power consumption based on a neural network includes the following steps:
a. constructing a convolutional neural network, and sending each image in the target database into the constructed convolutional neural network to extract the characteristics of the target image;
b. performing normalization processing on the extracted target image characteristics, performing affine projection on the target image characteristics to obtain a projection matrix, and training the projection matrix through a loss function to obtain a target sample of each image;
c. searching for convolution kernel weights, and selecting through training the convolutional neural network framework with the highest average recognition accuracy, the structure of which is: light input layer - convolution layer - excitation function - fully connected layer - light output layer;
d. establishing a learnable optical prism device, applying the learnable optical prism device to a standard target detection database, converting the characteristics of a target sample to be detected into light with target information through the optical prism device, injecting the light into a convolutional neural network frame for training, and learning to obtain the weight of each layer of convolutional kernel; searching a threshold value of an excitation layer of a convolutional neural network framework, and selecting an excitation function with the highest target position precision through training and testing;
e. setting the dispersion coefficient of the optical prisms in the optical prism frame and the thresholds of the filter films according to the convolution kernel weights and excitation function of each layer, and customizing a scattering mirror according to the dispersion coefficient of each prism layer. Dispersion and filtering are carried out by the optical prisms and filter films respectively, and a target light spot is formed by the scattering mirror; the target light spot is thus formed before the light reaches the camera sensor, and imaging is finally performed by the camera, guaranteeing the efficiency and real-time performance of the camera.
In step d, after the convolution kernel weight of each layer is obtained, the target detection accuracy is verified; if the accuracy is greater than or equal to a set threshold, the model parameters trained in step d are stored; if it is less than the set threshold, step c is executed again.
A convolutional neural network is a feedforward neural network with convolution operations and a deep structure. It has feature-learning capability, can classify translated input information according to its hierarchical structure through weight changes, and determines neuron output through an excitation function threshold. Research shows that target detection accuracy can be obtained through a trained convolutional neural network, and that during light transmission a prism has an effect similar to a convolution layer: the target light spot can be extracted through the dispersion, filtering, and similar operations of an optical prism, consuming no energy other than the light itself. On this basis, the invention performs target detection through light dispersion and filtering, takes the target light spot directly as the output, and learns autonomously from training sample data. The weights of each layer of pixels are shared, reducing the complexity of the framework, and the filter film operation enhances the robustness of the light spot so that target light can be received well under different lighting conditions. The method consumes no power other than the light information itself, although the imaging sensitivity of the camera needs to be improved accordingly. The invention calibrates the target frame with a multilayer optical prism structure resembling a convolutional neural network; the prism structure is compact and convenient to apply on small data sets, offers efficient real-time performance and accuracy, consumes no battery energy, and can greatly extend the shooting time and service life of cameras and other electronic equipment.
The target database comprises a large number of original target images, which may be taken directly from the CASIA-WebFace database or be original face pictures collected by the image acquisition unit of an electronic terminal device and obtained through its development interface.
Taking the example of adopting the CASIA-Webface face database to detect the face target, the process comprises the following steps:
the method comprises the following steps: dividing face images in a CASIA-Webface database into three types, namely training samples, test samples and verification samples, sending a training set in the face database into a constructed convolutional neural network to extract characteristics, and reading data to be trained;
step two: carrying out normalization processing on the face image with the training data read in the first step;
the normalization processing is carried out on the incident face image light, the reason is that the neural network is trained and predicted respectively by statistics of samples in events, and the face position is obtained by statistics and training of images in a training set. Normalization is a statistical probability distribution between 0 and 1, and when the input signals of all samples are positive values, the weights corresponding to the convolution kernels can only be increased or decreased at the same time, which can result in a slow learning speed. To avoid this, the neural network learning speed is increased, and the input signal can be normalized so that the input signal mean of all samples is close to 0 or smaller than its mean square error, thus increasing the normalization step.
Step three: constructing a convolutional neural network frame structure, wherein the structure is as follows: light input layer-convolution layer-excitation function-total connection layer-light output layer;
step four: the normalized training samples from step two are put into the framework constructed in step three for training, the training process comprising forward propagation and back propagation. The normalized verification samples from step two are then put into the framework for verification: if the accuracy on the verification samples is greater than or equal to the set threshold, step five is executed; if it is less than the set threshold, step three is executed again (i.e., the convolutional neural network framework structure is reselected);
step five: storing the model parameters trained in the step four, and detecting the test sample subjected to the normalization processing in the step two by using the trained model parameters to obtain a model detection result;
step six: determining the prism frame structure according to the convolutional neural network frame structure and the convolution kernel weights, the structure being: light input layer - optical prism layer - scattering mirror layer - light output layer, where each optical prism layer consists of an optical prism followed by a filter film.
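The six steps above can be sketched as a hypothetical end-to-end pipeline. The patent publishes no code, so every function name below is an illustrative placeholder, and in particular the linear mapping from learned kernel weights to per-layer prism dispersion coefficients is an assumption:

```python
import numpy as np

def extract_features(images, network):
    """Step one (sketch): run each database image through the CNN."""
    return [network(img) for img in images]

def normalize(features):
    """Step two (sketch): scale extracted features into [0, 1]."""
    feats = np.asarray(features, dtype=float)
    return (feats - feats.min()) / (feats.max() - feats.min())

def weights_to_dispersion(kernel_weights, scale=1.0):
    """Step six (sketch): map each layer's learned kernel weights to a
    prism dispersion coefficient; the linear mapping is an assumption."""
    return [scale * float(np.abs(w).mean()) for w in kernel_weights]

# Illustrative usage with dummy data in place of CASIA-WebFace images
identity_net = lambda img: img.mean()            # stand-in for the trained CNN
images = [np.full((4, 4), v) for v in (0.2, 0.5, 0.8)]
feats = normalize(extract_features(images, identity_net))
coeffs = weights_to_dispersion([np.ones((3, 3)), 2 * np.ones((3, 3))])
```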
The foregoing embodiment is further described below with reference to a specific calculation formula, taking face detection based on a convolutional neural network as an example:
(1) selecting a database of face images and pre-processing the face images
The CASIA-WebFace face database is selected; it contains more than 500,000 face pictures of 10,575 subjects in total, with the number of face images per subject ranging from dozens to hundreds, making the database well suited to training a face detection network. Image preprocessing first performs facial feature point detection on the original pictures of different sizes in the database, detects the positions of the face images by classification, and adjusts the convolution kernel values according to the face-image positions.
(2) Construction of optical prism device
The optical prism device comprises at least 9 optical prism layers, each optical prism layer being followed by 1 filter film. After the light rays enter the input layer, the face information is extracted and corrected through the optical prism layer, and finally the face information is transmitted into the output layer through the scattering mirror. The following details the building principle of the prism device:
Optical prism layer operation, expressed by the formula:

x_i^{t+1} = f( Σ_{j ∈ M_j} x_j^t * w_{ij}^{t+1} + b_j^t )

wherein x_j^t denotes the j-th light-ray point of layer t, M_j denotes the set of input rays selected by the user, b_j^t denotes the offset value corresponding to the j-th ray (taken as 0 in the present invention), w_{ij}^{t+1} denotes the weight of the j-th light beam of layer t with respect to the i-th prism pixel point of layer t+1, and * denotes a convolution-like operation. The learnable optical prism device is arranged as light input layer - optical prism layer - scattering mirror layer - light output layer. Because the activation function ReLU has an unsaturated gradient and is fast to compute, a characteristic similar to ReLU is selected for the filter film, as shown in fig. 2. The current output is expressed as:
x_e = f(u_e)
u_e = W_e · x_{e-1} + b_e

wherein x_e denotes the output of the current layer, u_e denotes the input to the filter film (the weighted result computed at the current layer), f(·) denotes the filter activation function, W_e is the weight of the current layer, and b_e is the bias (taken as 0 in the filter film).
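The two formulas above can be checked numerically. The weight matrix and input vectors below are arbitrary illustrative values (not from the patent), with b_e fixed at 0 as stated, and the filter film modeled as a ReLU-like threshold:

```python
import numpy as np

def filter_film(u, threshold=0.0):
    """ReLU-like filter film: pass values above the threshold, block the rest."""
    return np.where(u > threshold, u, 0.0)

def prism_layer(x_prev, W, b=0.0):
    """x_e = f(u_e) with u_e = W_e · x_{e-1} + b_e (b_e = 0 in the filter film)."""
    u = W @ x_prev + b
    return filter_film(u)

W = np.array([[1.0, -2.0],
              [0.5,  0.5]])        # illustrative layer weights
x0 = np.array([3.0, 1.0])          # illustrative input rays
x1 = prism_layer(x0, W)            # u = [1.0, 2.0]; both pass the film
```

A component of u that falls at or below the threshold is blocked entirely, mirroring how the filter film discards light outside its passband.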
Other steps and parameters are the same as those in the previous embodiment.
Then, taking the original face image database as an example for face target detection, the process includes the following steps:
(1) each image in the face database is sent to the three constructed convolutional neural networks to extract features;
(2) respectively normalizing the output features, performing affine projection on the output features to obtain a projection matrix, and training the projection matrix through a loss function to obtain the face position of each image;
(3) searching for a convolution kernel weight value, and selecting a convolution neural network frame with the highest average recognition accuracy through training;
(4) applying the selected learnable optical prism frame to a standard human face detection database, injecting characteristic light of a human face image to be detected into a trained frame, searching for a threshold value of an excitation layer, and selecting and forming an excitation function with the highest human face position precision through a training test;
(5) setting the dispersion coefficient of the prism and the threshold value of the filter membrane according to the convolution kernel weight values and the excitation function of each layer; and customizing the scattering mirror according to the dispersion coefficient of each layer of prism device to form a face light spot.
The three convolutional neural network frameworks constructed in step (1) are, specifically: framework A, framework B, and framework C. Framework A comprises 9 modules (i.e., 9 prism layers), each module passing first through an optical prism layer and then through a corrective filter film; framework B has one more prism layer than framework A; framework C has two more prism layers than framework A.
The predicted recognition accuracy of the three convolutional neural network frameworks is given in the following table:
[Table: predicted recognition accuracies of frameworks A, B, and C; provided only as an image in the original publication.]
The convolutional neural network framework with the highest average recognition accuracy is selected after the training and testing of step (3) as follows: the three frameworks are trained on the training set, and their recognition accuracy is then measured on the test set, which is disjoint from the training set.
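The selection rule just described reduces to an argmax over mean test accuracy. Since the patent's accuracy table is only available as an image, the numbers below are invented placeholders used purely to exercise the selection logic:

```python
def best_framework(results):
    """Pick the framework whose test accuracies have the highest mean."""
    return max(results, key=lambda name: sum(results[name]) / len(results[name]))

# Hypothetical per-run recognition accuracies (placeholder values).
results = {
    "A (9 prism layers)":  [0.91, 0.93, 0.92],
    "B (10 prism layers)": [0.94, 0.95, 0.93],
    "C (11 prism layers)": [0.92, 0.94, 0.93],
}
chosen = best_framework(results)
```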
To obtain accurate convolution kernel values, forward propagation and back propagation are combined. Forward propagation: the target image enters the convolutional neural network framework from the light input layer, is weighted with the corresponding convolution kernel weights (the bias term being 0), and then passes through an excitation layer; the final result is the output of that layer. If the actual output of the output layer equals the expected output, learning ends and the convolution kernel weights are stored; if the output differs from the expected output, the process switches to error back propagation. Back propagation: the difference between the actual output and the expected value is computed and transmitted back along the original convolution layer channels to the light input layer; during this back transmission the error is distributed to each unit of each layer, yielding a per-layer error signal that serves as the basis for correcting the weight of each unit. The convolution kernel values and thresholds of each layer are adjusted continuously until the error signal is minimized.
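The forward/back propagation loop described here can be sketched numerically. This is a generic single-layer illustration with invented data, not the patent's actual network: the bias term is fixed at 0 and a ReLU-like excitation layer is assumed, and the loop stops once the output error falls within an allowed range:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((32, 4))                   # samples entering the light input layer
w_true = np.array([0.5, 1.0, 2.0, 0.25])  # kernel producing the expected output
y_true = x @ w_true                       # expected outputs (all positive here)

w = np.zeros(4)                           # convolution kernel weight, bias term 0
iters = 0
for iters in range(1, 5001):
    u = x @ w                             # forward: weighting with the kernel
    out = np.maximum(u, 0.0)              # excitation layer
    err = out - y_true                    # actual output vs expected output
    if np.abs(err).max() < 1e-4:          # error within the allowed range
        break
    grad = x.T @ (err * (u >= 0))         # error signal propagated back
    w -= 0.5 * grad / len(x)              # correct the kernel weight
```

Each pass repeats the weighting, excitation, error computation, and weight correction; with this well-conditioned toy data the kernel recovers `w_true` long before the iteration cap.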
The process of continuously adjusting the weight and the threshold is the learning and training process of the convolutional neural network, and the adjustment of the weight and the threshold is repeatedly carried out through forward propagation and backward propagation until the preset learning and training times are reached, so that the output error is reduced to an allowable range.
When the high-dimensional input is mapped back during back propagation, the convolution kernel is flipped by 180 degrees according to the deconvolution principle and the target information is extracted step by step; the angle of the optical prism is then set according to the 180-degree-flipped convolution kernel weight, giving the dispersion coefficient of each prism layer.
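The 180-degree flip mentioned here is a generic property of convolution (not code from the patent): true mathematical convolution equals sliding-window correlation performed with the kernel rotated by 180 degrees, which is why back propagation through a convolution layer uses the flipped kernel. A quick numerical check:

```python
import numpy as np

def correlate2d_valid(img, k):
    """Plain sliding-window cross-correlation (what CNN 'convolution' computes)."""
    kh, kw = k.shape
    H = img.shape[0] - kh + 1
    W = img.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

kernel = np.arange(9, dtype=float).reshape(3, 3)
flipped = np.flip(kernel)                 # rotate the kernel by 180 degrees
img = np.arange(25, dtype=float).reshape(5, 5)

# Mathematical convolution = correlation with the 180-degree-flipped kernel
conv = correlate2d_valid(img, flipped)
```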
After light with target information enters the optical prism, light dispersion operations with different weights are performed at each layer, and each layer acquires the corresponding target-information light according to its dispersion coefficient. The filter film screens the light dispersed by the optical prism and filters out light outside the threshold, and the scattering mirror forms a target light spot at the target position from the dispersed light containing the target information. Once the target light spot is formed, it is mapped back onto the original picture by a special scattering mirror according to the dispersion coefficient, calibrating the target position and forming the target frame. For example, the collected face light spot is converted by the scattering mirror to form a face frame, replacing the original face-frame capture mode and thereby reducing the power consumption of the camera battery.
An extremely-low-power-consumption optical target detection device based on a neural network comprises an optical prism device mountable at the position of the front lens of an electronic device. The optical prism device comprises multilayer optical prisms, multilayer filter films, and a scattering mirror with a specific scattering rate; the prisms are stacked layer by layer; the number of filter films is determined by the excitation function, and they are embedded behind the optical prisms; the scattering mirror is embedded behind the last optical prism layer. The number of prism layers and the number of blocks per layer are set according to the training conditions. After passing through the prism layers, the light is dispersed into rays with different weights (shown as lines of different thicknesses in the figure); the rays containing the target light are dispersed and superposed layer by layer until the target light information is finally extracted, and the scattering mirror then forms the target light spot at the set scattering rate. To simplify the prism arrangement, the prisms in each layer may be uniform in size.
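The device structure described above can be modelled, purely for illustration, as a small data structure; the field names and sample values below are assumptions, not specifications from the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PrismLayer:
    dispersion_coefficients: List[float]   # one coefficient per prism block

@dataclass
class FilterFilm:
    threshold: float                       # light outside this is filtered out

@dataclass
class OpticalPrismDevice:
    prisms: List[PrismLayer]               # stacked layer by layer
    filters: List[FilterFilm]              # embedded behind the prisms
    scattering_rate: float                 # specific rate of the scattering mirror

    def num_layers(self) -> int:
        return len(self.prisms)

# sample device: two prism layers of four uniform blocks each, one filter film
device = OpticalPrismDevice(
    prisms=[PrismLayer([0.9] * 4), PrismLayer([0.8] * 4)],
    filters=[FilterFilm(threshold=1.0)],
    scattering_rate=0.5,
)
```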
In optics, a prism disperses and refracts light, and the filter film filters light according to a set threshold, which is analogous to the forward propagation of a convolutional neural network: light containing the target information enters the photographic device, passes through the multilayer prism assembly, the target-information light is obtained through the specific dispersion coefficients of the prisms, and the scattering mirror then forms the target light spot. The dispersion rate of each prism is determined by the convolution kernel weights of the corresponding layer of the deconvolution neural network; the filter film plays the role of the convolutional neural network's excitation function; and the scattering mirror maps the extracted target light spot back onto the original image, calibrating the position of the target in the original image.
On the basis of the previous embodiment, the optical prisms are made of rigid optical glass or of an alkali metal halide crystal material whose dispersion coefficient can be changed; such materials offer an easily adjustable dispersion coefficient, high uniformity, freedom from cracks, and a small temperature coefficient. The filter film is made of glass crystal or silicon carbide using an ion-amplitude dielectric film process; a filter film made in this way embodies a specific threshold and filters out light outside that threshold, and the material may be replaced by any similar material capable of filtering specific light. The scattering mirror is made of a coated silicon wafer, such as an oxidized silicon wafer, a low-stress SiN-layer silicon wafer, a gold-plated silicon wafer, or a platinum-plated metal-film silicon wafer.
On the basis of the previous embodiment, a spectroscope (beam splitter) is connected behind the optical prism device to ensure that the target light spot does not affect the imaging of the camera.
The principle of the neural-network-based extremely-low-power-consumption optical target detection method and device resembles the convolution process of deep learning: an optical prism device is constructed and trained at scale on a target data set; the optical prism layers in the device act like the convolution operation and the filter plate like the excitation function; the target is extracted through dispersion and filtering of the light, scattered by the scattering mirror, and the target light spot is taken directly as the output. This improves the real-time performance and efficiency of the camera's target detection without consuming any power beyond the light information itself.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, and such variants still fall within the scope of protection of the invention.

Claims (10)

1. A method for detecting an optical target with extremely low power consumption based on a neural network, characterized by comprising the following steps:
a. constructing a convolutional neural network, and sending each image in the target database into the constructed convolutional neural network to extract the characteristics of the target image;
b. performing normalization processing on the extracted target image characteristics, performing affine projection on the target image characteristics to obtain a projection matrix, and training the projection matrix through a loss function to obtain a target sample of each image;
c. searching for convolution kernel weights and selecting, through training, the convolutional neural network framework with the highest average recognition accuracy, the structure of which comprises: light input layer - convolution layer - excitation function - fully connected layer - light output layer;
d. establishing a learnable optical prism device, applying the learnable optical prism device to a standard target detection database, converting the characteristics of a target sample to be detected into light with target information through the optical prism device, injecting the light into a convolutional neural network frame for training, and learning to obtain the weight of each layer of convolutional kernel; searching a threshold value of an excitation layer of a convolutional neural network framework, and selecting an excitation function with the highest target position precision through training and testing;
e. setting the dispersion coefficients of the optical prisms in the optical prism device and the thresholds of the filter films according to the convolution kernel values and excitation functions of each layer; customizing the scattering mirror according to the dispersion coefficients of each prism layer; performing dispersion and filtering through the optical prisms and filter films respectively; forming the target light spot through the scattering mirror; and finally imaging through the camera.
2. The method of claim 1 for extremely low power optical target detection based on neural networks, characterized in that: after the convolution kernel weights of each layer are obtained in step d, the target detection precision is verified; if it is greater than or equal to a set threshold, the model parameters trained in step d are stored; if it is less than the set threshold, step c is executed again.
3. The method of claim 1 for extremely low power optical target detection based on neural networks, characterized in that: accurate convolution kernel weights are obtained by combining forward propagation with back propagation:
forward propagation: the target image enters the convolutional neural network framework through the light input layer, is weighted by the corresponding convolution kernel with the bias term fixed at 0, and then passes through the excitation layer; the result so obtained is the output of that layer; if the actual output of the output layer matches the expected output, learning is finished and the convolution kernel weights are stored; if the output of the output layer differs from the expected output, the method switches to the error back propagation process;
back propagation: the difference between the actual output and the expected value is computed and passed back along the original convolution layer channels to the light input layer; during the backward pass the error is distributed to every unit of every layer, yielding each layer's error signal, which serves as the basis for correcting the weight of each unit; the convolution kernel weights and thresholds of each layer are continuously adjusted until the error signal is minimized.
4. The method of claim 3, characterized in that: when the high-dimensional input is mapped back during back propagation, the convolution kernel is turned 180 degrees according to the deconvolution principle and the target information is extracted step by step; the angle of the optical prism is set according to the turned convolution kernel weights, giving the dispersion coefficient of each prism layer.
5. The method of claim 1 for extremely low power optical target detection based on neural networks, characterized in that: after light carrying the target information enters the optical prisms, each layer performs a light dispersion operation with its own weights and acquires the corresponding target-information light according to its dispersion coefficient; the filter film screens the light dispersed by the optical prisms and filters out light outside the threshold; and the scattering mirror forms a target light spot at the target position from the dispersed light containing the target information.
6. The method of claim 1 for extremely low power optical target detection based on neural networks, characterized in that: after the target light spot is formed, the light spot is mapped back to the original picture through a special scattering mirror according to the dispersion coefficient, the target position is calibrated, and a target frame is formed.
7. The method of claim 1 for extremely low power optical target detection based on neural networks, characterized in that: the target database can adopt the CASIA-Webface database, or original face pictures acquired by the image acquisition unit of the electronic terminal device through the terminal's development interface.
8. Apparatus for the neural network based very low power optical target detection method of any of claims 1-7, characterized in that: it comprises an optical prism device mountable at the position of the front lens of an electronic device; the optical prism device comprises multilayer optical prisms, multilayer filter films, and a scattering mirror with a specific scattering rate; the prisms are stacked layer by layer; the number of filter films is determined by the excitation function, and the filter films are embedded behind the optical prisms; and the scattering mirror is embedded behind the last optical prism layer.
9. The neural network-based very low power optical target detection device of claim 8, wherein: the optical prism is made of rigid optical glass or alkali metal halide crystal material capable of changing dispersion coefficient, the filter film is made of glass crystal or silicon carbide material by using ion amplitude dielectric film process technology, and the scattering mirror is made of coated silicon wafer.
10. The neural network-based very low power optical target detection device of claim 8, wherein: and a spectroscope is connected behind the optical prism device.
CN201911148056.0A 2019-11-21 2019-11-21 Neural network-based extremely-low-power-consumption optical target detection method and device Pending CN111144392A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911148056.0A CN111144392A (en) 2019-11-21 2019-11-21 Neural network-based extremely-low-power-consumption optical target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911148056.0A CN111144392A (en) 2019-11-21 2019-11-21 Neural network-based extremely-low-power-consumption optical target detection method and device

Publications (1)

Publication Number Publication Date
CN111144392A true CN111144392A (en) 2020-05-12

Family

ID=70517253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911148056.0A Pending CN111144392A (en) 2019-11-21 2019-11-21 Neural network-based extremely-low-power-consumption optical target detection method and device

Country Status (1)

Country Link
CN (1) CN111144392A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648532A (en) * 2022-05-23 2022-06-21 河南银金达新材料股份有限公司 Polyester film mechanical property detection device based on optical recognition
CN116208849A (en) * 2023-05-05 2023-06-02 中国科学技术大学 Ultra-low power consumption internet of things image acquisition and transmission system and method
CN116208849B (en) * 2023-05-05 2023-07-18 中国科学技术大学 Ultra-low power consumption internet of things image acquisition and transmission system and method

Similar Documents

Publication Publication Date Title
US20230027389A1 (en) Distance determination method, apparatus and system
CN109580656B (en) Mobile phone light guide plate defect detection method and system based on dynamic weight combination classifier
US10388099B2 (en) Paper currency fold recognition apparatus and method
CN106778659B (en) License plate recognition method and device
CN111080669B (en) Image reflection separation method and device
CN110736748A (en) Immunohistochemical nuclear plasma staining section diagnosis method and system
CN111144392A (en) Neural network-based extremely-low-power-consumption optical target detection method and device
CN109919908A (en) The method and apparatus of light-emitting diode chip for backlight unit defects detection
US9531940B2 (en) Apparatus and method for liquid crystal lens imaging
CN115861210B (en) Transformer substation equipment abnormality detection method and system based on twin network
CN105136292A (en) Aberration compensation method based on AOTF multispectral imaging system
CN115170550A (en) Deep learning-based battery defect detection method and system
CN105554354A (en) High-definition camera
CN116797977A (en) Method and device for identifying dynamic target of inspection robot and measuring temperature and storage medium
CN111160100A (en) Lightweight depth model aerial photography vehicle detection method based on sample generation
CN117146739B (en) Angle measurement verification method and system for optical sighting telescope
CN113624458A (en) Film uniformity detection system based on double-path all-projection light
CN114596244A (en) Infrared image identification method and system based on visual processing and multi-feature fusion
CN108491852B (en) Blind pixel detection method for area array infrared focal plane
CN215865742U (en) Film uniformity detection system based on double-path all-projection light
CN115035364A (en) Pointer instrument reading method based on deep neural network
CN115546716A (en) Binocular vision-based method for positioning fire source around power transmission line
EP2517172B1 (en) Filter setup learning for binary sensor
CN116109543A (en) Method and device for quickly identifying and reading data and computer readable storage medium
CN112816408B (en) Flaw detection method for optical lens

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination