CN115984124A - Method and device for de-noising and super-resolution of neuromorphic pulse signals - Google Patents

Method and device for de-noising and super-resolution of neuromorphic pulse signals

Info

Publication number
CN115984124A
Authority
CN
China
Prior art keywords
pulse
resolution
denoising
super
data
Prior art date
Legal status
Pending
Application number
CN202211543963.7A
Other languages
Chinese (zh)
Inventor
施柏鑫
段沛奇
马逸
周鑫渝
施新宇
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN202211543963.7A
Publication of CN115984124A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Closed-Circuit Television Systems (AREA)

Abstract

The invention discloses a method and a device for denoising and super-resolution of neuromorphic pulse signals. Real pulse data pairs at different resolutions are obtained by displaying the same videos at different resolutions on a screen and shooting the screen with a pulse camera, so that a real captured data set can serve as the training set, which solves the problem that a pulse signal simulator cannot generate accurate event data. Meanwhile, a deep learning method based on a 3D-UNet network model learns an end-to-end mapping for pulse-signal denoising and super-resolution reconstruction; with only a pulse sequence as input, the denoising and super-resolution of events can be carried out effectively, avoiding the dependence of existing methods on video frames and IMU information, removing the optical-flow estimation step, saving a large amount of running time, and greatly increasing the processing speed.

Description

Method and device for de-noising and super-resolution of neuromorphic pulse signals
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method and a device for denoising and super-resolution of neuromorphic pulse signals.
Background
With the development of computer technology, computing power has steadily increased, machine learning and deep learning have advanced rapidly, and computer vision techniques have gradually been applied in many scenarios: face detection, image enhancement and night photography on mobile-phone cameras, pedestrian detection and road recognition in autonomous driving, face recognition for mobile payment and station identity checks, and simultaneous localization and mapping for robots. As the era of big data and artificial intelligence arrives, more and more applications require the support of computer vision, and massive amounts of video and image data urgently need to be processed, which makes low-level vision tasks increasingly important. The irreplaceability of low-level image processing and its significance for higher-level semantic tasks have therefore drawn wide attention. Imaging with low noise, low blur, high spatial resolution, high temporal resolution and high dynamic range is an essential goal of computational photography, and its development is extremely important for other computer vision technologies.
After decades of development, the conventional digital camera has entered nearly every aspect of daily life. Yet with the recent rise of artificial intelligence, it struggles to solve the vision problems arising in applications such as autonomous driving, unmanned aerial vehicle control and intelligent robotics. The reason is that these emerging applications demand the capture of high-speed motion, whereas the fixed-frame-rate sampling of a conventional digital camera can only produce blurred images or videos when facing high-speed motion. In recent years, neuromorphic pulse sensors that imitate the imaging principle of the biological retina have entered many visual-analysis applications thanks to their high dynamic range and high temporal resolution. However, their high noise and low resolution limit the application of pulse cameras in industrial vision.
Compared with a conventional digital camera, a pulse camera abandons the concepts of frames and exposure: each pixel independently senses and integrates the light intensity, and when the integral exceeds a threshold a pulse is fired and transmitted in binary form, where 0 means the pixel emits no pulse at that moment and 1 means it does. The pulses generated continuously as the light intensity changes form a pulse stream. Unlike a conventional 2D picture or video sequence, the triggered pulse sequence takes the form of a three-dimensional spatio-temporal point cloud. Due to the special imaging principle of the pulse camera and the limits of current sensor manufacturing processes, present pulse cameras suffer from high noise and low spatial resolution, which restricts their use in industrial vision. When a pulse camera is used for target tracking, object detection and similar tasks, features can be distorted or missing and the results degrade severely. When it is used for high-frame-rate image generation, image deblurring or high-dynamic-range recovery, detail textures may be lost and the visual experience suffers.
For the denoising of pulse signals there is currently no dedicated method. For the event camera, which together with the pulse camera belongs to the family of neuromorphic cameras, signal denoising is mainly addressed in three ways. Method 1) removes noise events based on the spatio-temporal correlation of event signals within local spatio-temporal blocks (e.g., Super Resolve Dynamic Scene from Continuous Spike Streams). Method 2) uses video frames and camera-motion information recorded synchronously by the DVS to predict the probability that an event in a local spatio-temporal block is noise, thereby labeling training samples, learning a neural-network event-noise classifier, and then removing noise from the event signal (e.g., Event Probability Mask (EPM) and Event Denoising Convolutional Neural Network (EDnCNN) for Neuromorphic Cameras). Method 3) builds a hybrid system of an event camera and a conventional camera and relates the image signal to the event signal through spatio-temporal gradients, so that the low noise and high resolution of the image signal improve the quality of the event signal via guided filtering (Joint Filtering of Intensity Images and Neuromorphic Events for High-Resolution Noise-Robust Imaging). Each of the three methods has its own problems: methods 1) and 2) cannot handle event signals from scenes with complex camera motion, cannot perform super-resolution of events, and can only remove events marked as noise rather than recover events that were never triggered; the performance of method 3) depends on the quality of the image signal and requires computing the optical flow of local spatio-temporal blocks, so its processing speed is low.
For the super-resolution of pulse signals there are currently two main approaches. Method 4) builds, from the spatio-temporal correlation between image intensity and the pulse signal, a super-resolution algorithm guided by the motion optical flow that maps a low-resolution pulse signal to a high-resolution image. Method 5) builds a data set with a pulse-signal simulator and learns, with a deep network, a mapping from low-resolution pulse signals to high-resolution images. However, method 4) is extremely slow because it involves optical-flow estimation and pixel-by-pixel estimation, and method 5) lacks compatibility with real pulse signals because existing pulse-signal simulators can hardly reproduce the noise and high temporal precision of real pulse signals.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a neuromorphic pulse signal denoising and super-resolution method based on real sample collection and deep learning. A large real data set for network training is obtained by synchronously shooting the same scene at different resolutions with a pulse camera, which solves the problem that pulse-signal simulators cannot generate accurate event data. Meanwhile, a 3D-UNet network model learns an end-to-end mapping for pulse-signal denoising and super-resolution reconstruction, avoiding the dependence of existing methods on video frames and IMU information, removing the optical-flow estimation step, and saving a large amount of running time.
In order to achieve the above purpose, the invention provides the following technical scheme:
In one aspect, the invention provides a neuromorphic pulse signal denoising and super-resolution method comprising the following steps:
S1, training data acquisition: the same scene is shot synchronously at different spatial resolutions with a pulse camera to obtain a real training data set; a display screen synchronously shows motion videos at different resolutions, pulse data at the different resolutions are then cropped from the data captured by the pulse camera, and a complete RGB frame + multi-resolution pulse data set is finally formed;
S2, pulse data conversion: the event information is processed with an Encoder-Decoder built from a 3D convolutional neural network;
S3, pulse denoising and spatial upsampling: a denoising model of the event signal is learned by a convolutional neural network trained with an L2-norm objective, the optimal solution of the denoising model is obtained, and the denoised and upsampled reconstruction is output as a 3D tensor;
S4, pulse signal redistribution: the pulses of the reconstructed 3D-tensor output are redistributed by assigning timestamps at equal intervals, restoring a high-resolution pulse signal.
Further, each set of captured pulse data in step S1 contains the information combination {RGB frame, S1A, S1B, S2, timestamp sequence}; the combination {RGB frame, S1A, S2, timestamp sequence} is used to train the 2-fold super-resolution network, and the combination {RGB frame, S1A, S1B, timestamp sequence} is used to train the denoising network.
Further, the raw pulse data of step S1 contain 40,000 H×W binary (0-1) matrices per second, recording with a time precision of 25 μs whether each pixel fires a pulse, where 0 means no pulse and 1 means a pulse.
Further, in step S2, before the Encoder-Decoder processing, an image-reconstruction preprocessing based on pulse intervals is applied to the raw pulse signal.
Further, the preprocessing is as follows: for each pixel, the time interval between the two adjacent pulses before and after each moment represents the reciprocal of the light intensity at that moment, forming a preliminary reconstructed image at each moment; that is, 40,000 preliminary frames are reconstructed per second and serve as the input of the subsequent network.
Further, the optimal solution of the denoising model in step S3 is represented as:
$$\hat{\omega} = \arg\min_{\omega} \sum_{i} \left\| f_{\omega}\left(S_{i}^{\mathrm{in}}\right) - S_{i}^{\mathrm{out}} \right\|_{2}^{2}$$
where S^in and S^out are the noise-contaminated input and output training data, f_ω is the convolutional network, and ω is the denoising model being solved for.
Further, step S3 uses a 3D UNet structure to carry out the denoising and super-resolution tasks simultaneously; in the 2-fold super-resolution network, a 3D deconvolution layer is added at each level of the 3D UNet for cross-level feature fusion, so as to enlarge the resolution.
Further, during training, 24,000 LR-HR pulse pairs are generated from the collected data as the training set; the batch size is set to 8 and 100 epochs are trained; the optimizer is ADAM, and the loss function combines the Charbonnier loss and the TV loss with a weight ratio of 1:0.005; the initial learning rate is 0.001 and is decayed by a factor of 0.5 every 50 epochs.
In another aspect, the invention further provides a neuromorphic pulse signal denoising and super-resolution device, comprising a display screen, a pulse camera, and the following modules to implement any one of the above methods:
a training data acquisition module: used for shooting the same scene synchronously at different spatial resolutions with the pulse camera to obtain a real training data set, synchronously displaying motion videos at different resolutions on the display screen, cropping pulse data at the different resolutions from the data captured by the pulse camera, and finally forming a complete RGB frame + multi-resolution pulse data set;
a pulse data conversion module: used for processing the event information with an Encoder-Decoder built from a 3D convolutional neural network;
a pulse denoising and spatial upsampling module: used for learning a denoising model of the event signal, obtaining its optimal solution, and outputting the denoised and upsampled reconstruction as a 3D tensor;
a pulse signal redistribution module: used for redistributing the pulses of the reconstructed 3D-tensor output by assigning timestamps at equal intervals, restoring a high-resolution pulse signal.
In yet another aspect, the present invention further provides a device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein:
the memory is used for storing a computer program;
the processor is configured to implement any one of the methods described above when executing the program stored in the memory.
Compared with the prior art, the invention has the beneficial effects that:
according to the neural morphological pulse signal denoising and super-resolution method and device, the same videos with different resolutions are arranged in the display screen, the display screen is shot by the pulse camera, so that real pulse data pairs with different resolutions are obtained, the real shot data set is used as the training set, the problem that a trained network is incompatible with real data due to the fact that the difference between simulation data and the real data is too large is avoided, and the problem that an event data cannot be accurately generated by a pulse signal simulator is solved. Meanwhile, a deep learning method is used, the 3D-UNet network model is used for learning an end-to-end mapping model for pulse signal denoising and super-resolution reconstruction, and under the condition that only a pulse sequence is input, denoising and super-resolution tasks of events can be effectively realized, the existing method is prevented from depending on video frames and IMU information, the process of solving optical flow information is omitted, a large amount of running time is saved, and the processing speed is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed for the embodiments are briefly described below. It is apparent that the drawings in the following description are merely some embodiments of the invention, and other drawings can be derived from them by those skilled in the art.
Fig. 1 is a flowchart of a method for denoising and super-resolving a neuromorphic-pulse signal according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a training data acquisition device according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a shooting system according to an embodiment of the present invention.
FIG. 4 shows three viewing windows in a display according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present solution, the method of the present invention is described in detail below with reference to the accompanying drawings.
The neuromorphic pulse signal denoising and super-resolution method disclosed by the invention comprises training data acquisition, pulse data conversion, pulse denoising and spatial upsampling, and pulse signal redistribution, as shown in Fig. 1, and specifically comprises the following steps:
s1, training data acquisition
Because existing simulators cannot accurately model the distribution of event data, the invention proposes to shoot the same scene synchronously at different spatial resolutions with a pulse camera (such as a Vidar camera) to obtain a real low-resolution/high-resolution training data set: motion videos at different resolutions are displayed synchronously on a screen, pulse data at the different resolutions are then cropped from the data captured by the pulse camera, and a complete RGB frame + multi-resolution pulse data set is finally formed. Each set of captured pulse data contains the information combination {RGB frame, S1A, S1B, S2, timestamp sequence}; the combination {RGB frame, S1A, S2, timestamp sequence} is used to train the 2-fold super-resolution network, and the combination {RGB frame, S1A, S1B, timestamp sequence} is used to train the denoising network.
S2, pulse data conversion
Because the invention processes the event information with an Encoder-Decoder built from a 3D convolutional neural network, the raw pulse signal is first preprocessed so that the input carries more spatial image information. The raw pulse data contain 40,000 H×W binary matrices per second, recording with a time precision of 25 μs whether each pixel fires a pulse (0 means no pulse, 1 means a pulse). Using the TFI image-reconstruction method, the time interval between the two adjacent pulses at each pixel represents the reciprocal of the light intensity at that moment (as shown in (a) of Fig. 4), forming a preliminary reconstructed image at each moment; that is, 40,000 preliminary frames are reconstructed per second and serve as the input of the subsequent network. The same preprocessing is also applied to the high-resolution pulse sequence used as ground truth.
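As an illustration of the interval-based (TFI) preprocessing just described, the sketch below turns a binary spike tensor into per-moment intensity estimates by taking the reciprocal of the interval between the bounding pulses at each pixel. The (T, H, W) array layout, the function name tfi_reconstruct and the boundary handling are assumptions made for illustration, not the patent's exact implementation.

```python
import numpy as np

def tfi_reconstruct(spikes: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """spikes: (T, H, W) binary array; 1 means the pixel fired at that 25-microsecond step.
    Returns a (T, H, W) float array approximating light intensity as the reciprocal of the
    interval between the previous pulse (at or before t) and the next pulse (after t)."""
    T, H, W = spikes.shape
    prev_idx = np.full((T, H, W), -1, dtype=np.int64)
    next_idx = np.full((T, H, W), T, dtype=np.int64)

    last = np.full((H, W), -1, dtype=np.int64)
    for t in range(T):                          # most recent pulse at or before t
        last = np.where(spikes[t] > 0, t, last)
        prev_idx[t] = last

    nxt = np.full((H, W), T, dtype=np.int64)
    for t in range(T - 1, -1, -1):              # nearest pulse strictly after t
        next_idx[t] = nxt
        nxt = np.where(spikes[t] > 0, t, nxt)

    interval = (next_idx - prev_idx).astype(np.float64)
    valid = (prev_idx >= 0) & (next_idx < T)    # both bounding pulses observed
    return np.where(valid, 1.0 / np.maximum(interval, eps), 0.0)
```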
S3, pulse denoising and spatial upsampling
Although both the low-resolution and the high-resolution data collected in step S1 are contaminated by noise, the noise of the pulse signal largely follows a Gaussian distribution. The method therefore trains a convolutional neural network with an L2-norm objective to learn a denoising model of the event signal, obtains the optimal solution of the denoising model, and outputs the denoised and upsampled reconstruction as a 3D tensor. The optimal solution is given by:
$$\hat{\omega} = \arg\min_{\omega} \sum_{i} \left\| f_{\omega}\left(S_{i}^{\mathrm{in}}\right) - S_{i}^{\mathrm{out}} \right\|_{2}^{2}$$
where S^in and S^out are the noise-contaminated input and output training data, f_ω is the convolutional network, and ω is the denoising model being solved for.
Therefore, the denoising network can be trained using only real noisy data. As shown in Fig. 4, the invention uses a 3D UNet structure to carry out the denoising and super-resolution tasks simultaneously.
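The following minimal sketch shows one optimization step under the L2 objective above, where both the input and the target are noisy observations of the same scene; the model, optimizer and tensor shapes are placeholders assumed for illustration, not the patent's code.

```python
import torch
import torch.nn.functional as F

def l2_training_step(model, noisy_input, noisy_target, optimizer):
    """Both tensors are noisy observations of the same scene; minimizing the L2 distance
    between the prediction and the noisy target drives the network toward the (unobserved)
    clean signal when the noise is zero-mean."""
    optimizer.zero_grad()
    pred = model(noisy_input)              # e.g. (N, C, T, H, W) 3D tensor
    loss = F.mse_loss(pred, noisy_target)  # L2-norm objective from the formula above
    loss.backward()
    optimizer.step()
    return loss.item()
```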
S4, pulse signal redistribution
Since the denoised and upsampled events produced by the network are output as a 3D tensor, the numbers in the tensor must be converted back into the pulse representation, so that the method ultimately takes events as input and produces events as output. Because the preliminary reconstruction in the preprocessing is based on pulse intervals, the value of each pixel of the output image represents the interval between the pulses before and after that time point on the time axis; based on this, the binary pulse signal can be recovered from the output. Specifically, the invention redistributes the pulses of the reconstructed 3D-tensor output by assigning timestamps at equal intervals, restoring the high-resolution pulse signal.
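A possible reading of this redistribution step is sketched below: each (rounded) output value is treated as an estimated pulse interval in time steps, and pulses are re-emitted at equally spaced timestamps. The array layout and the choice to start each pixel's pulse train at t = 0 are assumptions, not the patent's exact procedure.

```python
import numpy as np

def redistribute_pulses(intervals: np.ndarray) -> np.ndarray:
    """intervals: (T, H, W) array whose rounded values are read as estimated pulse intervals
    (in 25-microsecond steps). Returns a (T, H, W) binary pulse stream with pulses re-emitted
    at equally spaced timestamps."""
    T, H, W = intervals.shape
    pulses = np.zeros((T, H, W), dtype=np.uint8)
    for y in range(H):
        for x in range(W):
            t = 0.0
            while t < T:
                step = max(float(round(intervals[int(t), y, x])), 1.0)
                pulses[int(t), y, x] = 1    # emit a pulse, then advance by the estimated interval
                t += step
    return pulses
```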
The invention trains the neural network with the real data at different resolutions captured in step S1; the specific training process is as follows:
(1) Shooting training data set
a) A publicly available high-speed slow-motion video data set is downloaded from the Internet.
b) New videos are recomposed: each video frame contains several local windows showing the same time-synchronized content at different spatial resolutions, corresponding to 1× and 2× resolution (4× and 8× could be added later with a pulse camera of higher resolution).
c) The data acquisition device shown in Fig. 2 is built; it comprises a display and a neuromorphic pulse camera (or an event camera), and may further include a level, a sighting device and the like, as shown in Fig. 3, to ensure that the camera's viewing direction faces and is parallel to the display.
d) Shooting data: when shooting starts, all indoor light sources except the display screen are turned off to reduce the influence of the external environment on the data. The processed videos are then captured.
e) Processing the pulse data: local regions of each set of captured pulse data are cropped in turn to form independent pulse data at the different resolutions; a complete RGB frame + multi-resolution pulse data set is finally formed, and each set contains the information combination {RGB frame, S1A, S1B, S2, timestamp sequence}. In the subsequent network training, the combination {RGB frame, S1A, S2, timestamp sequence} is used to train the 2-fold super-resolution network, and the combination {RGB frame, S1A, S1B, timestamp sequence} is used to train the denoising network.
(2) Training of neural networks
a) Event information preprocessing: when training the denoising and upsampling network, both the LR (low-resolution) and the HR (high-resolution) events are merged into 32-channel event tensors to enable supervised training. Each pixel in each channel sums the events within the corresponding time interval. Different channel numbers were also tried, and 32 channels gave the best performance.
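One way to build such a 32-channel event tensor is sketched below: the time axis is split into 32 equal bins and the pulses of each pixel are summed within each bin. The exact binning boundaries used by the patent are not specified, so the details here are assumptions.

```python
import torch

def events_to_channels(spikes: torch.Tensor, num_channels: int = 32) -> torch.Tensor:
    """spikes: (T, H, W) binary tensor of pulses. Returns (num_channels, H, W) where each
    channel counts the pulses falling inside its time interval."""
    T, H, W = spikes.shape
    # pad T so it divides evenly into the requested number of channels
    pad = (-T) % num_channels
    if pad:
        spikes = torch.cat([spikes, spikes.new_zeros(pad, H, W)], dim=0)
    return spikes.reshape(num_channels, -1, H, W).sum(dim=1).float()
```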
b) The main module of the whole network is a 3D UNet. In the 2-fold super-resolution network, a 3D deconvolution layer is added to the skip connection of each 3D-UNet level for cross-level feature fusion, so as to enlarge the resolution; in the denoising network this extra deconvolution is not needed. The input and output pulse signals are preprocessed by the interval-based preliminary image reconstruction; the output tensor is rounded to integer values to obtain the super-resolved and denoised reconstructed image, and the pulses are then redistributed by assigning timestamps at equal intervals, restoring the high-resolution pulse signal.
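The sketch below is a deliberately small 3D encoder-decoder in the spirit of the described network, with a 3D deconvolution on the skip/decoder path and an extra spatial-only ConvTranspose3d head for the 2-fold super-resolution variant; channel counts, depth and layer choices are illustrative assumptions rather than the patent's architecture.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(cout, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class Tiny3DUNetSR(nn.Module):
    def __init__(self, in_ch=32, base=16, upscale=True):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.down = nn.Conv3d(base, base * 2, kernel_size=3, stride=(1, 2, 2), padding=1)
        self.enc2 = conv_block(base * 2, base * 2)
        # 3D deconvolution that restores spatial size before fusing with the skip features
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=(1, 2, 2), stride=(1, 2, 2))
        self.dec1 = conv_block(base * 2, base)
        # extra 3D deconvolution for the 2x super-resolution variant (spatial axes only)
        self.sr = (nn.ConvTranspose3d(base, base, kernel_size=(1, 2, 2), stride=(1, 2, 2))
                   if upscale else nn.Identity())
        self.out = nn.Conv3d(base, in_ch, kernel_size=3, padding=1)

    def forward(self, x):                      # x: (N, 32, T, H, W), H and W even
        e1 = self.enc1(x)
        e2 = self.enc2(self.down(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.out(self.sr(d1))           # (N, 32, T, 2H, 2W) when upscale=True
```

For the denoising-only network described above, the extra upsampling head can simply be dropped (upscale=False), leaving the input and output at the same spatial resolution.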
c) During training, 24,000 LR-HR pulse pairs are generated from the collected data as the training set. The batch size is set to 8 and 100 epochs are trained. The optimizer is ADAM, and the loss function combines the Charbonnier loss and the TV loss with a weight ratio of 1:0.005; the initial learning rate is 0.001 and is decayed by a factor of 0.5 every 50 epochs. Training takes about 12 hours in total with PyTorch 1.6 on an NVIDIA 2080Ti GPU.
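A sketch of the stated training configuration (Charbonnier loss plus TV loss with a 1:0.005 weight, ADAM, initial learning rate 0.001 halved every 50 epochs) is given below; the Charbonnier epsilon and the commented model/optimizer wiring are assumptions made for illustration.

```python
import torch

def charbonnier(pred, target, eps=1e-3):
    # smooth L1-like penalty; eps is an assumed value, not specified in the text
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def tv_loss(x):                                # x: (N, C, T, H, W)
    dh = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    dw = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    return dh + dw

def total_loss(pred, target, tv_weight=0.005):
    # weight ratio 1:0.005 between Charbonnier loss and TV loss, as stated above
    return charbonnier(pred, target) + tv_weight * tv_loss(pred)

# optimizer and schedule (ADAM, initial lr 0.001, halved every 50 epochs):
# model = Tiny3DUNetSR()                       # e.g., the sketch above
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
```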
d) In testing, only the real captured pulse sequence needs to be fed in for denoising and super-resolution.
Corresponding to the method provided by the embodiments of the invention, the invention further provides a neuromorphic pulse signal denoising and super-resolution device comprising a display screen, a pulse camera, and the following modules to implement the method of any one of the above embodiments:
a training data acquisition module: used for shooting the same scene synchronously at different spatial resolutions with the pulse camera to obtain a real training data set, synchronously displaying motion videos at different resolutions on the display screen, cropping pulse data at the different resolutions from the data captured by the pulse camera, and finally forming a complete RGB frame + multi-resolution pulse data set;
a pulse data conversion module: used for processing the event information with an Encoder-Decoder built from a 3D convolutional neural network;
a pulse denoising and spatial upsampling module: used for learning a denoising model of the event signal, obtaining its optimal solution, and outputting the denoised and upsampled reconstruction as a 3D tensor;
a pulse signal redistribution module: used for redistributing the pulses of the reconstructed 3D-tensor output by assigning timestamps at equal intervals, restoring a high-resolution pulse signal.
When the method or the device is applied, the following steps can be adopted:
a) A publicly available high-speed slow-motion video data set is downloaded from the Internet; it contains 45 color video sequences, the frame rate is adjusted to 30 fps, and the spatial resolution of each frame is 1280 × 720.
b) As illustrated in Fig. 4, 45 new videos are composed, each with the frame rate adjusted to 360 fps and a resolution of 1280 × 720. Each frame contains three local windows showing the same time-synchronized content at different spatial resolutions: the largest window has a resolution of 720 × 720 and the two smallest windows have a resolution of 360 × 360. So that there is enough time to play the video and start the camera when shooting, the first and last frames of each video are held for two seconds.
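One possible way to compose such frames is sketched below, placing a 720 × 720 window and two 360 × 360 windows of the same content on a 1280 × 720 canvas; the window coordinates and the use of OpenCV are assumptions for illustration, not necessarily the layout actually used.

```python
import cv2
import numpy as np

def compose_frame(src_frame: np.ndarray) -> np.ndarray:
    """src_frame: any BGR frame; returns a 1280x720 frame containing three windows that
    show the same time-synchronized content at different spatial resolutions."""
    canvas = np.zeros((720, 1280, 3), dtype=np.uint8)
    big = cv2.resize(src_frame, (720, 720))
    small = cv2.resize(src_frame, (360, 360))
    canvas[0:720, 0:720] = big            # 1x window (720 x 720)
    canvas[0:360, 720:1080] = small       # first 360 x 360 window
    canvas[360:720, 720:1080] = small     # second 360 x 360 window
    return canvas
```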
c) The data acquisition device of the system shown in Fig. 2 is set up. In this embodiment the display screen is an ASUS PG259QNR with a resolution of 1920 × 1080 and a refresh rate of 360 Hz. A VidarOne event camera with a resolution of 400 × 250 and an F/1.4 lens is placed horizontally about 180 cm directly in front of the display screen, so that the shooting angle of view faces and is parallel to the display.
d) Registration of the camera view and the display area: as shown in Fig. 3, the center of the display is marked with a cross target and a sighting point is placed directly in front of the camera (fixed on a breadboard); a three-point-one-line alignment ensures that the camera plane and the display plane are parallel and that the line connecting their centers is perpendicular to both planes. Levels placed on the camera and the display constrain their horizontal rotation angles to be the same, so that the camera's field of view and the display area are precisely registered. The registered views of the event camera are shown in Fig. 4 and correspond to the three windows on the display: the largest window corresponds to a resolution of 240 × 240 on the pulse camera, and the two smallest windows correspond to 120 × 120.
e) When shooting starts, all indoor light sources except the display screen are turned off to reduce the influence of the external environment on the data. The 45 processed videos are then captured in sequence.
f) Processing pulse data: the pulse data are registered in time with the color video, using the markers reserved during shooting to align the start and end time points. Local regions of each set of captured event data are then cropped in turn according to the spatial coordinates in the figure to form independent event data at different resolutions, corresponding to data S1A, S1B and S2 respectively. A complete 45-group RGB frame + multi-resolution pulse data set is finally formed, and each group contains the information combination {RGB frame, S1A, S1B, S2, timestamp sequence}. In the subsequent network training, the combination {S1A, S2, timestamp sequence} is used to train the 2-fold super-resolution network, and the combination {S1A, S1B, timestamp sequence} is used to train the denoising network.
In conclusion, compared with the prior art, the method and device use a real captured data set as the training set, avoiding the incompatibility between the trained network and real data that arises when simulated data differ too much from real data: the same videos at different resolutions are shown on a display screen and the screen is shot with a pulse camera, yielding real pulse data pairs at different resolutions. Meanwhile, with a deep learning method and only a pulse sequence as input, the denoising and super-resolution of events are achieved effectively and the processing speed is greatly improved.
Corresponding to the method provided by the embodiment of the present invention, an embodiment of the present invention further provides an electronic device, including: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
the processor is configured to implement the method flow provided by the embodiment of the present invention when executing the program stored in the memory.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown, but this does not mean there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In a further embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the methods provided by the above-mentioned embodiments of the present invention.
In a further embodiment provided by the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of any of the methods provided by the embodiments of the present invention described above.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It should be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrases "comprising a," "...," or "comprising" does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are described with relative simplicity as they are substantially similar to method embodiments, where relevant only as described in portions of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A neuromorphic pulse signal denoising and super-resolution method, characterized by comprising the following steps:
S1, training data acquisition: the same scene is shot synchronously at different spatial resolutions with a pulse camera to obtain a real training data set; a display screen synchronously shows motion videos at different resolutions, pulse data at the different resolutions are then cropped from the data captured by the pulse camera, and a complete RGB frame + multi-resolution pulse data set is finally formed;
S2, pulse data conversion: the event information is processed with an Encoder-Decoder built from a 3D convolutional neural network;
S3, pulse denoising and spatial upsampling: a denoising model of the event signal is learned by a convolutional neural network trained with an L2-norm objective, the optimal solution of the denoising model is obtained, and the denoised and upsampled reconstruction is output as a 3D tensor;
S4, pulse signal redistribution: the pulses of the reconstructed 3D-tensor output are redistributed by assigning timestamps at equal intervals, restoring a high-resolution pulse signal.
2. The neuromorphic pulse signal denoising and super-resolution method according to claim 1, characterized in that each set of captured pulse data in step S1 contains the information combination {RGB frame, S1A, S1B, S2, timestamp sequence}; the combination {RGB frame, S1A, S2, timestamp sequence} is used to train the 2-fold super-resolution network, and the combination {RGB frame, S1A, S1B, timestamp sequence} is used to train the denoising network.
3. The neuromorphic pulse signal denoising and super-resolution method according to claim 1, characterized in that the raw pulse data of step S1 contain 40,000 H×W binary (0-1) matrices per second, recording with a time precision of 25 μs whether each pixel fires a pulse, where 0 means no pulse and 1 means a pulse.
4. The neuromorphic pulse signal denoising and super-resolution method according to claim 3, characterized in that in step S2, before the Encoder-Decoder processing, an image-reconstruction preprocessing based on pulse intervals is applied to the raw pulse signal.
5. The neuromorphic pulse signal denoising and super-resolution method according to claim 4, characterized in that the preprocessing is as follows: for each pixel, the time interval between the two adjacent pulses before and after each moment represents the reciprocal of the light intensity at that moment, forming a preliminary reconstructed image at each moment; that is, 40,000 preliminary frames are reconstructed per second and serve as the input of the subsequent network.
6. The neuromorphic-pulse-signal denoising and super-resolution method according to claim 1, wherein the optimal solution of the denoising model in step S3 is represented as:
$$\hat{\omega} = \arg\min_{\omega} \sum_{i} \left\| f_{\omega}\left(S_{i}^{\mathrm{in}}\right) - S_{i}^{\mathrm{out}} \right\|_{2}^{2}$$
where S^in and S^out are the noise-contaminated input and output training data, f_ω is the convolutional network, and ω is the denoising model being solved for.
7. The neuromorphic pulse signal denoising and super-resolution method according to claim 1, characterized in that step S3 uses a 3D UNet structure to carry out the denoising and super-resolution tasks simultaneously, and in the 2-fold super-resolution network a 3D deconvolution layer is added at each level of the 3D UNet for cross-level feature fusion, so as to enlarge the resolution.
8. The neuromorphic pulse signal denoising and super-resolution method according to claim 1, characterized in that during training, 24,000 LR-HR pulse pairs are generated from the collected data as the training set; the batch size is set to 8 and 100 epochs are trained; the optimizer is ADAM, and the loss function combines the Charbonnier loss and the TV loss with a weight ratio of 1:0.005; the initial learning rate is 0.001 and is decayed by a factor of 0.5 every 50 epochs.
9. A neuromorphic pulse signal denoising and super-resolution apparatus, characterized by comprising a display screen, a pulse camera, and the following modules to implement the method of any one of claims 1-8:
a training data acquisition module: used for shooting the same scene synchronously at different spatial resolutions with the pulse camera to obtain a real training data set, synchronously displaying motion videos at different resolutions on the display screen, cropping pulse data at the different resolutions from the data captured by the pulse camera, and finally forming a complete RGB frame + multi-resolution pulse data set;
a pulse data conversion module: used for processing the event information with an Encoder-Decoder built from a 3D convolutional neural network;
a pulse denoising and spatial upsampling module: used for learning a denoising model of the event signal, obtaining its optimal solution, and outputting the denoised and upsampled reconstruction as a 3D tensor;
a pulse signal redistribution module: used for redistributing the pulses of the reconstructed 3D-tensor output by assigning timestamps at equal intervals, restoring a high-resolution pulse signal.
10. A device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the method of any one of claims 1-8 when executing the program stored in the memory.
CN202211543963.7A 2022-11-29 2022-11-29 Method and device for de-noising and super-resolution of neuromorphic pulse signals Pending CN115984124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211543963.7A CN115984124A (en) 2022-11-29 2022-11-29 Method and device for de-noising and super-resolution of neuromorphic pulse signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211543963.7A CN115984124A (en) 2022-11-29 2022-11-29 Method and device for de-noising and super-resolution of neuromorphic pulse signals

Publications (1)

Publication Number Publication Date
CN115984124A true CN115984124A (en) 2023-04-18

Family

ID=85971353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211543963.7A Pending CN115984124A (en) 2022-11-29 2022-11-29 Method and device for de-noising and super-resolution of neuromorphic pulse signals

Country Status (1)

Country Link
CN (1) CN115984124A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116389912A (en) * 2023-04-24 2023-07-04 北京大学 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969092A (en) * 2019-11-05 2020-04-07 国网河北省电力有限公司电力科学研究院 Pulse signal denoising method and device and terminal equipment
CN112699956A (en) * 2021-01-08 2021-04-23 西安交通大学 Neural morphology visual target classification method based on improved impulse neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969092A (en) * 2019-11-05 2020-04-07 国网河北省电力有限公司电力科学研究院 Pulse signal denoising method and device and terminal equipment
CN112699956A (en) * 2021-01-08 2021-04-23 西安交通大学 Neural morphology visual target classification method based on improved impulse neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PEIQI DUAN et al.: "EventZoom: Learning to Denoise and Super Resolve Neuromorphic Events", CVPR, pages 12824-12833 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116389912A (en) * 2023-04-24 2023-07-04 北京大学 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
CN116389912B (en) * 2023-04-24 2023-10-10 北京大学 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Similar Documents

Publication Publication Date Title
US9041834B2 (en) Systems and methods for reducing noise in video streams
Zheng et al. Deep learning for event-based vision: A comprehensive survey and benchmarks
An et al. TR-MISR: Multiimage super-resolution based on feature fusion with transformers
CN115226406A (en) Image generation device, image generation method, recording medium generation method, learning model generation device, learning model generation method, learning model, data processing device, data processing method, estimation method, electronic device, generation method, program, and non-transitory computer-readable medium
Peng et al. LVE-S2D: Low-light video enhancement from static to dynamic
CN112987026A (en) Event field synthetic aperture imaging algorithm based on hybrid neural network
CN112771843A (en) Information processing method, device and imaging system
CN114245007A (en) High frame rate video synthesis method, device, equipment and storage medium
CN111798395B (en) Event camera image reconstruction method and system based on TV constraint
CN113238472A (en) High-resolution light field display method and device based on frequency domain displacement
CN112651911A (en) High dynamic range imaging generation method based on polarization image
Wang et al. Joint framework for single image reconstruction and super-resolution with an event camera
CN115984124A (en) Method and device for de-noising and super-resolution of neuromorphic pulse signals
CN115375581A (en) Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization
Shen et al. Spatial temporal video enhancement using alternating exposures
Tang et al. Structure-embedded ghosting artifact suppression network for high dynamic range image reconstruction
AU2020408599B2 (en) Light field reconstruction method and system using depth sampling
CN116389912B (en) Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
CN110827375B (en) Infrared image true color coloring method and system based on low-light-level image
CN116385283A (en) Image deblurring method and system based on event camera
Cui et al. Multi-stream attentive generative adversarial network for dynamic scene deblurring
CN112819742B (en) Event field synthetic aperture imaging method based on convolutional neural network
Pang et al. Video super-resolution using a hierarchical recurrent multireceptive-field integration network
CN112308972A (en) Large-scale cable tunnel environment model reconstruction method
Li et al. High-speed large-scale imaging using frame decomposition from intrinsic multiplexing of motion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination