CN115953438A - Optical flow estimation method and device, chip and electronic equipment - Google Patents

Optical flow estimation method and device, chip and electronic equipment

Info

Publication number
CN115953438A
CN115953438A (application CN202310256478.XA)
Authority
CN
China
Prior art keywords
optical flow
neural network
feature vector
event
frame group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310256478.XA
Other languages
Chinese (zh)
Inventor
余淮
陈匡义
杨文�
余磊
王帆
乔宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Shizhi Technology Co ltd
Shenzhen Shizhi Technology Co ltd
Original Assignee
Chengdu Shizhi Technology Co ltd
Shenzhen Shizhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Shizhi Technology Co ltd, Shenzhen Shizhi Technology Co ltd filed Critical Chengdu Shizhi Technology Co ltd
Priority to CN202310256478.XA priority Critical patent/CN115953438A/en
Publication of CN115953438A publication Critical patent/CN115953438A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an optical flow estimation method and device, a chip and electronic equipment. To achieve optical flow estimation, the invention provides a method comprising: framing the spike event stream and dividing it into a pre-event frame group and a post-event frame group; feeding the pre-event frame group and the post-event frame group sequentially into a first neural network to extract a first feature vector and a second feature vector, respectively; feeding the post-event frame group into a second neural network to extract a third feature vector; obtaining a correlation pyramid from the first feature vector and the second feature vector; and obtaining the updated current estimated optical flow from the third feature vector, the correlation pyramid and the current estimated optical flow. This solves the technical problem of optical flow estimation; the scheme is better suited to asynchronous event streams and estimates optical flow more efficiently and accurately. The invention is applicable to the fields of event cameras, brain-inspired computing and computer vision.

Description

Optical flow estimation method and device, chip and electronic equipment
Technical Field
The present invention relates to image processing methods, and more particularly, to an optical flow estimation method and apparatus based on an event camera, a chip and an electronic device.
Background
Optical flow estimation is one of the basic tasks in computer vision; its purpose is to provide information about the motion of objects in image space. Optical flow estimation is often regarded as a core module of visual odometry, which plays a major role in robotics, virtual reality and autonomous driving.
Most optical flow estimation methods in the prior art are designed for images acquired by conventional frame-based cameras, which record the illumination intensity of the environment over a given time interval. While these approaches have achieved many successes, they also face challenges such as motion blur under high-speed motion, inability to capture information in low-light conditions, over- or under-saturation in high-dynamic-range environments, as well as limits on computational load, power consumption, cost and real-time performance.
Prior art 1: hui T W, tang X, loy C. A light optical flow CNN-visualization data identification and visualization [ J ]. IEEE transactions on pattern analysis and machine interaction, 2020, 43 (8): 2555-2569
From the Flownet proposed by the ICCV-2015 to the LiteFlownet2 proposed by the prior art 1, the frame rate that the scheme can handle has been developed from 5 FPS to 25 FPS, which, although the progress is great, also indicates that the traditional scheme has been difficult to make a qualitative breakthrough.
Unlike a conventional frame-based camera, an event camera acquires brightness-change information in the environment asynchronously, and therefore offers high temporal resolution and high dynamic range, which can largely overcome the above problems. However, unlike the images captured by a conventional frame camera, the event stream captured by an event camera is sparse, irregular and asynchronous; it cannot be used directly as the input of a convolutional neural network, so mature network frameworks cannot easily be applied.
Existing methods typically preprocess the event stream by converting it into image frames (i.e., frame compression) and then extract motion information with a U-Net-like network structure. Although these methods have achieved some success, they still suffer from at least the following two problems:
1) Processing event frames with a convolutional neural network ignores the asynchronous and sparse characteristics of the event stream, which wastes computing resources.
2) Compared with the designs used in methods based on traditional cameras, the U-Net structure is too simple, which limits further improvement of the optical flow estimation accuracy.
For these reasons, there is a need in the art for a technical solution that adapts better to the asynchronous characteristics of event data, extracts motion information from event data more effectively, and implements event-camera-based optical flow estimation with higher accuracy.
Disclosure of Invention
In order to solve or alleviate some or all of the technical problems, the invention is realized by the following technical scheme:
a method of optical flow estimation, comprising: framing the pulse event stream, and dividing the pulse event stream into a front event frame group and a rear event frame group; sequentially sending the pre-event frame group and the post-event frame group into a first neural network, and respectively extracting a first feature vector and a second feature vector; sending the post-event frame group into a second neural network, and extracting a third feature vector; obtaining a correlation quantity pyramid according to the first feature vector and the second feature vector; and obtaining the updated current estimated optical flow according to the third feature vector, the relevant quantity pyramid and the current estimated optical flow.
In a certain class of embodiments, the first neural network and the second neural network are structurally identical; or/and the first neural network comprises a pulse neural network module and an artificial neural network module; and/or the second neural network comprises a pulse neural network module and an artificial neural network module.
In some kind of embodiment, an updated current estimated optical flow is obtained by using RNN according to the third eigenvector and the correlation quantity pyramid and the current estimated optical flow.
In some class of embodiments, a first correlation matrix $C_{ijkl}$ is computed from the first feature vector and the second feature vector, i.e.

$$C_{ijkl} = \sum_{h} F^{f}_{ijh} \, F^{l}_{klh}$$

where $i, j$ denote the horizontal and vertical coordinates of the first feature vector $F_f$, $k, l$ denote the horizontal and vertical coordinates of the second feature vector $F_l$, and $h$ indexes the channels of the feature vectors.
In a certain type of embodiment, the first correlation matrix is pooled with different kernel sizes, and the correlation pyramid is obtained from the pooled results together with the first correlation matrix.
In some class of embodiments, the second correlation matrix is obtained by indexing into the correlation pyramid with a search operator, using the current estimated optical flow.
In a certain class of embodiments, the RNN is a gated recurrent unit; the second correlation matrix and the current estimated optical flow are taken as inputs of the gated recurrent unit, and the updated current estimated optical flow is obtained at least from the output of the gated recurrent unit.
In some class of embodiments, the first neural network or/and the second neural network comprises a strided convolutional layer and an IAF neuron group, wherein the IAF neuron group follows the strided convolutional layer.
In some embodiments, the first neural network or/and the second neural network comprises a plurality of sequentially connected pairs of strided convolutional layers and IAF neuron groups, wherein within each pair the IAF neuron group follows the strided convolutional layer, and the strided convolutional layer of the next pair follows the IAF neuron group of the previous pair.
In some embodiments, the first neural network or/and the second neural network comprises at least 2 sequentially connected pairs of strided convolutional layers and IAF neuron groups, the IAF neuron groups of the pairs have different sizes, and the IAF neuron group of the last pair is the smallest.
In some embodiments, a further strided convolutional layer follows the IAF neuron group of the last pair; the outputs of this strided convolutional layer are accumulated by an output accumulator.
In some class of embodiments, the output of the output accumulator is processed by a residual block.
In a certain class of embodiments, the residual block comprises 2 convolutional layers; or/and, after a pre-event frame group or a post-event frame group has been processed, the output accumulator is cleared.
In some type of embodiment, 2 sequentially connected residual blocks process the output of the output accumulator.
In a certain type of embodiment, the result of the residual block processing is the first feature vector, the second feature vector or the third feature vector.
In some class of embodiments, an IAF neuron in the IAF neuron group outputs a 1 when its updated neuron state exceeds a threshold, and otherwise outputs a 0.
In some class of embodiments, the pre-event frame group and the post-event frame group contain the same number of event frames.
In certain embodiments, each event frame in the pre-event frame group and the post-event frame group is obtained by framing the spike event stream at equal time intervals.
In certain embodiments, each event frame includes elements that respectively represent the number of ON-polarity events and the number of OFF-polarity events.
In a certain class of embodiments, the updated current estimated optical flow is obtained through several iterations so as to output the final estimated optical flow.
According to another aspect, the invention discloses an optical flow estimation device, comprising: a framing module for framing the spike event stream and dividing it into a pre-event frame group and a post-event frame group; a first neural network module for extracting a first feature vector and a second feature vector from the sequentially input pre-event frame group and post-event frame group, respectively; a second neural network module for extracting a third feature vector from the input post-event frame group; a correlation extraction module for obtaining a correlation pyramid from the first feature vector and the second feature vector; and an optical flow estimation module for obtaining an updated current estimated optical flow from the third feature vector, the correlation pyramid and the current estimated optical flow.
In some embodiments, the first neural network and the second neural network are structurally identical; or/and the first neural network comprises a spiking neural network module and an artificial neural network module; or/and the second neural network comprises a spiking neural network module and an artificial neural network module.
In some class of embodiments, the updated current estimated optical flow is obtained using an RNN from the third feature vector, the correlation pyramid and the current estimated optical flow.
In some embodiments, a first correlation matrix $C_{ijkl}$ is computed from the first feature vector and the second feature vector, i.e.

$$C_{ijkl} = \sum_{h} F^{f}_{ijh} \, F^{l}_{klh}$$

where $i, j$ denote the horizontal and vertical coordinates of the first feature vector $F_f$, $k, l$ denote the horizontal and vertical coordinates of the second feature vector $F_l$, and $h$ indexes the channels of the feature vectors.
In a certain type of embodiment, the first correlation matrix is pooled with different kernel sizes, and the correlation pyramid is obtained from the pooled results together with the first correlation matrix.
In some class of embodiments, the second correlation matrix is obtained by indexing into the correlation pyramid with a search operator, using the current estimated optical flow.
In a certain class of embodiments, the RNN is a gated recurrent unit; the second correlation matrix and the current estimated optical flow are taken as inputs of the gated recurrent unit, and the updated current estimated optical flow is obtained at least from the output of the gated recurrent unit.
In some class of embodiments, the first neural network or/and the second neural network comprises a strided convolutional layer and an IAF neuron group, wherein the IAF neuron group follows the strided convolutional layer.
In some embodiments, the first neural network or/and the second neural network comprises a plurality of sequentially connected pairs of strided convolutional layers and IAF neuron groups, wherein within each pair the IAF neuron group follows the strided convolutional layer, and the strided convolutional layer of the next pair follows the IAF neuron group of the previous pair.
In a certain class of embodiments, the first neural network or/and the second neural network comprises at least 2 sequentially connected pairs of strided convolutional layers and IAF neuron groups, the IAF neuron groups of the pairs have different sizes, and the IAF neuron group of the last pair is the smallest.
In some embodiments, a further strided convolutional layer follows the IAF neuron group of the last pair; the outputs of this strided convolutional layer are accumulated by an output accumulator.
In some class of embodiments, the output of the output accumulator is processed by a residual block.
In a certain class of embodiments, the residual block comprises 2 convolutional layers; or/and, after a pre-event frame group or a post-event frame group has been processed, the output accumulator is cleared.
In some type of embodiment, 2 sequentially connected residual blocks process the output of the output accumulator.
In some embodiment, the result of the residual block processing is the first feature vector, the second feature vector or the third feature vector.
In some class of embodiments, an IAF neuron in the IAF neuron group outputs a 1 when its updated neuron state exceeds a threshold, and otherwise outputs a 0.
In some class of embodiments, the pre-event frame group and the post-event frame group contain the same number of event frames.
In certain embodiments, each event frame in the pre-event frame group and the post-event frame group is obtained by framing the spike event stream at equal time intervals.
In certain embodiments, each event frame includes elements that respectively represent the number of ON-polarity events and the number of OFF-polarity events.
In a certain class of embodiments, the updated current estimated optical flow is obtained through several iterations so as to output the final estimated optical flow.
According to another aspect, the invention discloses a chip comprising any one of the optical flow estimation devices described above.
According to another aspect, the invention discloses an electronic device provided with the above chip, which uses the chip to estimate optical flow in its environment.
Some or all embodiments of the invention have the following beneficial technical effects:
1) they adapt better to the asynchronous characteristics of event data;
2) they extract motion information from event data more effectively;
3) they realize event-camera-based optical flow estimation with higher accuracy.
In summary, the invention uses a spiking neural network to extract high-level features from the events, introduces correlation volumes to measure the similarity between the input event data, and uses a GRU to iteratively refine the predicted optical flow.
Further advantages will be described in the preferred embodiments.
The technical solutions/features disclosed above are summarized in this section and further detailed in the detailed description, so their scopes may not be exactly the same. The technical features disclosed in this section, together with the technical features disclosed in the subsequent detailed description and in parts of the drawings not explicitly described in the specification, disclose further technical solutions in reasonable mutual combination.
The technical solution formed by combining the technical features disclosed at any position of the invention is used to support the generalization of the technical solutions, the amendment of the patent document, and the disclosure of further technical solutions.
Drawings
FIG. 1 is a schematic diagram of framing an event stream;
FIG. 2 is a block diagram of a convolutional neural network used in the present invention;
FIG. 3 is a block diagram of the SNN module of the present invention;
FIG. 4 is an optical flow estimation result.
Detailed Description
Since the various alternatives cannot be described exhaustively, the following clearly and completely describes the main points of the technical solutions in the embodiments of the invention with reference to the drawings. It is to be understood that the invention is not limited to the details disclosed herein, which may vary widely from one implementation to another.
Unless defined otherwise, a "/" at any position in this disclosure means a logical "or". The ordinal numbers "first", "second", etc. at any position of the invention are used merely as distinguishing labels in the description and do not imply an absolute order in time or space, nor do they require that a term prefixed with such an ordinal be interpreted differently from the same term appearing elsewhere.
The invention may be described in terms of various elements combined into various embodiments, which may in turn be combined into various methods and articles of manufacture. In this invention, even if a point is described only when introducing a method (or product) scheme, this means that the corresponding product (or method) scheme explicitly includes the same technical feature.
When a step, module or feature is described as being present or included at any position of the invention, this does not imply that its presence is exclusive; other embodiments can equally be realized from the technical solutions disclosed in the invention together with other technical means. The embodiments disclosed herein are generally given for the purpose of disclosing preferred embodiments, but this does not imply that an embodiment opposite to a preferred embodiment is excluded from the invention; as long as such an opposite embodiment solves at least some technical problem of the invention, it is intended to be covered. Based on the points described in the embodiments, those skilled in the art can apply replacement, deletion, addition, combination and reordering to some features to obtain technical solutions that still follow the concept of the invention; such solutions, which do not depart from the technical idea of the invention, are also within its scope of protection.
The various parameters listed in the following detailed description are only preferred examples; those skilled in the art will appreciate that these parameters can be replaced by other values, so these examples do not limit other possible solutions.
As shown in Fig. 1, a process for handling an event stream in some class of embodiments of the invention is illustrated. The spike event stream captured by the event camera over a time interval (referred to as the event stream) is divided into two groups, a former group containing the earlier events (former events) and a latter group containing the later events (later events), and each group is further divided into, for example, 5 (preferred) sub-streams of equal duration. In the following, 5 sub-streams are taken as the example, but the invention is not limited thereto.
Then, according to event polarity, each small sub-stream is converted into a 2-channel image frame / event frame, the 2 channels respectively recording the number of positive-polarity (ON) and negative-polarity (OFF) events at each pixel position. The event frames (each of size 2×H×W) are stacked to obtain a pre-event frame group I_f and a post-event frame group I_l, each of size 2×H×W×5, where H and W denote the height and width of an event frame, respectively. The pre-event frame group I_f and the post-event frame group I_l are thus the sets of event frames obtained from the two groups of event streams by frame-compression conversion.
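As a concrete illustration of this framing step, the following minimal NumPy sketch converts an event stream into the two 2×H×W×5 frame groups described above. All function and variable names are our own illustrative assumptions (not from the patent), and the split of the stream at its temporal midpoint is likewise an assumption.

```python
import numpy as np

def events_to_frame_groups(t, x, y, p, H, W, n_bins=5):
    """Split an event stream into a pre- and a post-event frame group.

    t, x, y, p: 1-D arrays of timestamps, pixel coordinates and polarities (1 = ON, 0/-1 = OFF).
    Returns two arrays of shape (2, H, W, n_bins) counting ON/OFF events per pixel and time bin.
    """
    t_mid = t[0] + (t[-1] - t[0]) / 2.0
    groups = []
    for lo, hi in [(t[0], t_mid), (t_mid, t[-1])]:              # former / later half of the stream
        frames = np.zeros((2, H, W, n_bins), dtype=np.float32)
        sel = (t >= lo) & (t <= hi)
        edges = np.linspace(lo, hi, n_bins + 1)                 # equal-duration sub-streams
        bins = np.clip(np.searchsorted(edges, t[sel], side="right") - 1, 0, n_bins - 1)
        pol = (p[sel] <= 0).astype(np.int64)                    # channel 0: ON events, channel 1: OFF events
        np.add.at(frames, (pol, y[sel], x[sel], bins), 1.0)     # count events per pixel / polarity / bin
        groups.append(frames)
    I_f, I_l = groups                                           # pre- and post-event frame groups (2×H×W×5)
    return I_f, I_l
```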
Fig. 2 shows a flow chart of the optical flow estimation of the invention; the flow of the method can be executed by an FPGA, an application-specific integrated circuit (ASIC, such as a neuromorphic chip), a GPU device, and the like.
The pre-event frame group I_f and the post-event frame group I_l are input sequentially into a first neural network D_a, yielding a first feature vector F_f and a second feature vector F_l; at the same time, the post-event frame group I_l is input into a second neural network D_b, yielding a third feature vector.
Each feature vector is the output obtained after an event frame group of size 2×H×W×5 has been fed into the spiking neural network, and it is then passed to a subsequent artificial neural network (ANN) for further processing. This is analogous to the usual practice of extracting features from images with a neural network, except that here the features are extracted from the input spike sequence.
Preferably, the first neural network D_a or/and the second neural network D_b comprise a spiking neural network module (SNN module) and an artificial neural network module (ANN module), and can therefore be regarded as hybrid neural networks. Preferably, the spiking neurons in the SNN module here are further optimized and differ from the firing mode of conventional spiking neurons, as will be described later.
Preferably, the two neural networks D_a and D_b have exactly the same structure, consisting of an SNN module and 2 residual blocks (as an example). The SNN module consists of three (as an example) strided convolutional layers, and a group of IF (also called IAF, integrate-and-fire) spiking neurons is coupled after some of the strided convolutional layers.
In another class of embodiments, the structures of the two neural networks D_a and D_b are different, e.g. some (subtle) differences between the two are allowed.
The feature vectors F_f, F_l and the third feature vector, obtained by feature extraction from the image frames / event frames, have dimensions H×W×n, representing the height, width and number of channels of the feature vector (the latter determined by the convolutional layers passed through).
The first and second neural networks, which may also be called encoders, are used to extract the feature vectors. The feature vectors F_f and F_l output by the first neural network are used to obtain the correlation pyramid; the correlation pyramid and the third feature vector are used to obtain the optical flow.
As to the feature-extraction step, for example, the 5 pairs of ON and OFF event frames contained in the pre-event frame group I_f are sent into the first neural network in temporal order for processing. For example, the first-generated ON event frame and OFF event frame (2×H×W, referred to as the first event frame) are processed by the network together, followed by the second-generated ON and OFF event frames (2×H×W, the second event frame).
The input of the first neural network thus consists of 5 event frames of size 2×H×W, which are sent into D_a. Preferably, each event frame passes in turn through a strided convolutional layer, IAF neurons, another strided convolutional layer and further IAF neurons, and the resulting output is accumulated in the output accumulator; the final accumulated result is the output obtained after the event frame group has been fed into the SNN module. After the pre-event frame group I_f has been processed, the post-event frame group I_l is processed by the first neural network in a similar way. The second neural network extracts features from the post-event frame group I_l similarly, which is not repeated here.
Referring to Fig. 3, specific details of the SNN module are shown. For example, for the first-generated ON event frame and OFF event frame, after each strided convolution based on the convolution kernels, the result of the strided convolution is used as the input of the IAF neurons. For example, after the first strided convolution, the convolution results are received by the neurons of the first group (e.g., H/2×W/2) of IAF neurons, respectively.
A second strided convolution is then performed, and its convolution results are received by the neurons of a second group (illustratively, H/4×W/4) of IAF neurons, respectively.
Then a third strided convolution is performed, and its result is input into the output accumulator. For example, with 256 convolution kernels, the result of the third strided convolution has size 256×H/8×W/8. The output accumulator accumulates the third-strided-convolution results of the 5 (ON and OFF) event frames. In other words, the third-strided-convolution result obtained after the first (ON and OFF) event frame has been processed by the SNN module of Fig. 3 is retained (any earlier result having been cleared); the result obtained for the second (ON and OFF) event frame is then added to the retained result, and so on, until the result corresponding to the fifth (ON and OFF) event frame has been accumulated. The third-strided-convolution results of these five event frames are thus accumulated in the output accumulator, but the accumulated result does not affect the strided convolutions of the next five event frames (i.e., the next group); in other words, there is also a step of clearing the accumulator.
When the convolution kernel is shifted right/down by more than 1 position per step (i.e., a convolution with stride greater than 1), the convolution is called a strided convolution; using strided convolutions is a way of controlling the size of the output feature map.
Preferably, the SNN module of the first neural network D_a consists of three strided convolutional layers, the first two of which are each followed by a group of IAF neurons. The invention is not limited in the number of strided convolutional layers (convolution operations) or in whether other processing steps are included.
Illustratively, the output of the final (third) strided convolution, of size 256×H/8×W/8, is input into the output accumulator. The purpose of using an output accumulator is that the incoming (spike) events have been divided into 5 segments which are fed into the neural network in turn, and the output accumulator adds up the output of each pass. The accumulator thus accumulates the network outputs of the 5 event frames fed into the network, and the accumulated result is sent to the subsequent network for further processing.
For the 5 event frames input in succession, the output accumulator is not cleared during the accumulation. When the next event frame group (a new I_f or I_l) arrives, the output accumulator is cleared.
As to the aforementioned IAF neuron groups, the initial state (membrane voltage) of an IAF neuron is usually 0; when an input (such as an ordinary spike event) arrives, the state is updated and compared with the firing threshold, and if the threshold is reached a spike (AER event) is emitted. Optionally, after an IAF neuron receives an input, it determines whether the result exceeds a given threshold, outputting the number 1 if so and 0 otherwise; alternatively, the IAF neuron may express spike firing by emitting a conventional AER event. The input event stream has been divided into 5 segments during framing, and the segments pass through the spiking neural network / SNN module one after another to simulate the spiking process; the point is that a feature representation that can be fed into a convolutional neural network (as an example) is finally obtained.
The IAF neuron is the basic unit of the spiking neural network, and the output accumulator is used to accumulate the outputs of the IAF neurons, because the input event frame group is divided into 5 frames of size 2×H×W that are input into the IAF neurons for processing; this, too, is a simulation of spiking.
Illustratively, a residual block comprises 2 convolutional layers. The residual block may be the most classical residual block structure: each residual block comprises 2 convolutional layers and a skip connection directly from the input to the output, used to add the input and output of the residual block; this part belongs to the ANN. Some non-residual modules may also be used for feature extraction, and the invention is not limited in this respect.
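The following PyTorch-style sketch illustrates one possible organisation of the encoder described above (three strided convolutions, IAF neuron groups after the first two, an output accumulator over the 5 event frames, followed by 2 residual blocks). The IAF neuron here is a simplified non-leaky integrate-and-fire model emitting 0/1, and all class names, channel counts and the threshold-subtraction rule are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class IAF(nn.Module):
    """Simplified integrate-and-fire neuron group: accumulate state, emit 1 when it crosses the threshold."""
    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold
        self.v = None                                  # membrane state, kept across the 5 frames of one group

    def reset(self):
        self.v = None

    def forward(self, x):
        self.v = x if self.v is None else self.v + x
        spikes = (self.v >= self.threshold).float()    # output 1 where the threshold is exceeded, else 0
        self.v = self.v - spikes * self.threshold      # assumed reset-by-subtraction on firing
        return spikes

class ResidualBlock(nn.Module):
    """Classical residual block: 2 convolutions plus a skip connection from input to output."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class SNNEncoder(nn.Module):
    """Encoder D_a / D_b: SNN module (3 strided convs, IAF after the first two) + accumulator + 2 residual blocks."""
    def __init__(self, out_ch=256):
        super().__init__()
        self.conv1, self.iaf1 = nn.Conv2d(2, 64, 3, stride=2, padding=1), IAF()
        self.conv2, self.iaf2 = nn.Conv2d(64, 128, 3, stride=2, padding=1), IAF()
        self.conv3 = nn.Conv2d(128, out_ch, 3, stride=2, padding=1)
        self.res = nn.Sequential(ResidualBlock(out_ch), ResidualBlock(out_ch))

    def forward(self, frame_group):                    # frame_group: (B, 5, 2, H, W)
        self.iaf1.reset(); self.iaf2.reset()
        acc = 0.0                                      # output accumulator, cleared for each new frame group
        for i in range(frame_group.shape[1]):
            s = self.iaf1(self.conv1(frame_group[:, i]))
            s = self.iaf2(self.conv2(s))
            acc = acc + self.conv3(s)                  # accumulate the third strided-conv output over the 5 frames
        return self.res(acc)                           # feature tensor of size (B, out_ch, H/8, W/8)
```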
With continued reference to Fig. 2, a first correlation matrix $C_{ijkl}$ is computed from the first feature vector F_f and the second feature vector F_l obtained above; the specific calculation is

$$C_{ijkl} = \sum_{h} F^{f}_{ijh} \, F^{l}_{klh}$$

where $i, j$ denote the horizontal and vertical coordinates of the feature vector F_f, $k, l$ denote the horizontal and vertical coordinates of the feature vector F_l, and $h$ is the index of the feature-vector channel dimension.
The first correlation matrix thus computed has size H×W×H×W. It is then pooled over its last two dimensions with kernel sizes of 2, 4 and 8, finally yielding a correlation pyramid C = {C_1, C_2, C_3, C_4} whose layers have sizes H×W×H×W, H×W×H/2×W/2, H×W×H/4×W/4 and H×W×H/8×W/8, in that order. The size of each layer of the correlation pyramid is not limited to this example, as long as the pyramid shrinks from bottom to top.
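A possible PyTorch sketch of the correlation computation and pyramid construction described above is given below; feature maps are assumed to be laid out as (H, W, D), and repeated 2× average pooling is used to obtain the same layer sizes as pooling the first matrix with kernels 2, 4 and 8. Names are illustrative.

```python
import torch
import torch.nn.functional as F

def correlation_pyramid(feat_f, feat_l, num_levels=4):
    """feat_f, feat_l: feature tensors of shape (H, W, D).
    Returns [C1, C2, C3, C4]; C1 has shape (H, W, H, W), and each further level is
    average-pooled over the last two dimensions (equivalent to kernels 2, 4, 8 on C1)."""
    H, W, D = feat_f.shape
    # C_ijkl = sum_h F^f_ijh * F^l_klh
    corr = torch.einsum('ijh,klh->ijkl', feat_f, feat_l)
    pyramid = [corr]
    c = corr.reshape(H * W, 1, H, W)                    # treat the last two dims as a spatial map
    for _ in range(num_levels - 1):
        c = F.avg_pool2d(c, kernel_size=2, stride=2)    # pool only the last two dimensions
        pyramid.append(c.reshape(H, W, c.shape[-2], c.shape[-1]))
    return pyramid
```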
Next, an initial optical flow S_0 is constructed; it may be an all-zero matrix of size H×W×2 (the "0" in Fig. 2). A maximum number of iterations N is set, and k denotes the current iteration, with initial value 1.
Using the current estimated optical flow S_{k-1}, the correlation pyramid C is indexed. Specifically, for each pixel p = (x, y) of the event frame group I_f, its corresponding pixel p' = (x + u_{k-1}(x, y), y + v_{k-1}(x, y)) on the post-event frame group I_l is found according to the optical flow S_{k-1}, where u_{k-1}(x, y) and v_{k-1}(x, y) are the components of S_{k-1} along the u and v axes (the coordinate axes of the image, distinct from the x, y coordinate values). A search space around p' is then defined; with the search radius r set to 3 (this parameter value is used as the example below), the corresponding entries are extracted from every level of the correlation pyramid C (recall C = {C_1, C_2, C_3, C_4}), yielding a second matrix of size 4×H×W×3×3, whose size is then adjusted to H×W×36 to obtain the second correlation matrix.
Taking the first level C_1 of the correlation pyramid as an example: its size is H×W×H×W, and it represents the result of multiplying the feature vectors of the two input event frame groups. For any pixel of the first event frame group, the corresponding position on the second event frame group can be determined from the optical flow, and the 3×3 block around that position can be selected, so that each pixel of the first frame corresponds to one 3×3 block; for the first level C_1 this yields a matrix of size H×W×3×3, and the same is done for C_2, C_3 and C_4 to obtain matrices of the same size, which are combined into the final second matrix of size 4×H×W×3×3.
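The indexing (search) operation could be sketched as follows; a bilinear `grid_sample` lookup with a 3×3 window is one plausible realisation consistent with the example above, and the function/parameter names are our own assumptions.

```python
import torch
import torch.nn.functional as F

def lookup(pyramid, flow, window=3):
    """pyramid: list of tensors (H, W, Hc, Wc); flow: (H, W, 2) current estimate S_{k-1}.
    Returns the second correlation matrix of shape (H, W, len(pyramid) * window * window)."""
    H, W, _ = flow.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    centers = torch.stack([xs + flow[..., 0], ys + flow[..., 1]], dim=-1).float()   # p' = p + S_{k-1}(p)
    r = (window - 1) // 2
    dy, dx = torch.meshgrid(torch.arange(-r, r + 1), torch.arange(-r, r + 1), indexing='ij')
    offsets = torch.stack([dx, dy], dim=-1).reshape(-1, 2).float()                  # window offsets around p'
    out = []
    for lvl, corr in enumerate(pyramid):
        Hc, Wc = corr.shape[-2:]
        pts = centers.reshape(-1, 1, 2) / (2 ** lvl) + offsets                      # scale p' to this pyramid level
        grid = torch.stack([2 * pts[..., 0] / (Wc - 1) - 1,                         # normalise to [-1, 1]
                            2 * pts[..., 1] / (Hc - 1) - 1], dim=-1)
        vol = corr.reshape(H * W, 1, Hc, Wc)
        sampled = F.grid_sample(vol, grid.unsqueeze(1), align_corners=True)         # (H*W, 1, 1, window*window)
        out.append(sampled.reshape(H, W, -1))
    return torch.cat(out, dim=-1)                                                   # (H, W, 36) for 4 levels, window=3
```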
The second correlation matrix of size H×W×36 and the optical flow S_{k-1} of size H×W×2 are concatenated along the third (channel) dimension to obtain a new matrix, which is then fed, together with the third feature vector, into an RNN (preferably a gated recurrent unit, GRU); from its output the updated current estimated optical flow S_k is obtained.
The gated recurrent unit (GRU) is a technique known to those skilled in the art. The GRU module uses a gating mechanism to control the input, memory and other information when making a prediction at the current time step; it comprises an update gate and a reset gate. The reset gate determines how the new input information is combined with the previous memory, while the update gate defines how much of the previous memory is carried over to the current time step. These two gating vectors decide which information can ultimately be used as the output of the gated recurrent unit. They are able to preserve information over long sequences, without it being cleared over time or discarded as irrelevant to the prediction. On this basis, obtaining the optical flow S_k from the third feature vector, the second correlation matrix and the GRU is within the ordinary skill of those in the art.
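One way the GRU-based update described here could look in code is sketched below; a convolutional GRU variant is assumed (as is common in iterative-refinement optical flow networks), and the layer sizes and the small flow head are our own illustrative choices, not the patent's.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU: update gate z, reset gate r, candidate state q."""
    def __init__(self, hidden_ch, input_ch):
        super().__init__()
        self.convz = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)
        self.convr = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)
        self.convq = nn.Conv2d(hidden_ch + input_ch, hidden_ch, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))                       # update gate: how much old memory to keep
        r = torch.sigmoid(self.convr(hx))                       # reset gate: how to mix new input with old memory
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q

class FlowUpdater(nn.Module):
    def __init__(self, hidden_ch=128, corr_ch=36, ctx_ch=256):
        super().__init__()
        self.gru = ConvGRUCell(hidden_ch, corr_ch + 2 + ctx_ch)
        self.flow_head = nn.Sequential(nn.Conv2d(hidden_ch, 128, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(128, 2, 3, padding=1))

    def forward(self, h, corr2, flow, context):
        """corr2: second correlation matrix (B,36,H,W); flow: S_{k-1} (B,2,H,W); context: third feature vector."""
        x = torch.cat([corr2, flow, context], dim=1)            # splice along the channel dimension
        h = self.gru(h, x)
        return h, flow + self.flow_head(h)                      # S_k = S_{k-1} + predicted residual
```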
In the training phase, illustratively, the optical flow estimation loss function defined by the invention is

$$\mathcal{L}(S_k) = \sum_{x,y} \rho\big( R_f(x + u_k(x,y),\, y + v_k(x,y)) - R_l(x,y) \big)$$

where S_k is the optical flow estimated in the current iteration round; R_f and R_l are the grayscale images corresponding to the beginning and end of the input event stream (fixed-frame-rate images captured synchronously by the DAVIS event camera while the event data were collected), i.e. ordinary camera images taken at the start and end times of this segment of the event stream; u_k(x, y) and v_k(x, y) are the components of the optical flow S_k in the u and v directions; and ρ is the Charbonnier loss function ρ(x) = (x² + ε²)^r, with r a constant such as 0.45, which is commonly used to suppress outliers in optical flow estimation tasks.
In other words, based on the predicted optical flow, the first (previous) grayscale image R_f is warped, and the photometric difference between the warped image and the second (subsequent) grayscale image R_l is computed; this is an unsupervised training mode based on the photometric consistency assumption.
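A sketch of this unsupervised photometric loss follows; the warping is implemented with `grid_sample`, and the exact warping direction and the ε value are assumptions consistent with the description above rather than values given by the patent.

```python
import torch
import torch.nn.functional as F

def charbonnier(x, eps=1e-3, r=0.45):
    """Charbonnier penalty rho(x) = (x^2 + eps^2)^r, used to suppress outliers."""
    return (x * x + eps * eps) ** r

def photometric_loss(flow, R_f, R_l):
    """flow: S_k of shape (B, 2, H, W); R_f, R_l: grayscale frames (B, 1, H, W)."""
    B, _, H, W = flow.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack([xs, ys], dim=0).float().to(flow.device)          # pixel grid p = (x, y)
    coords = base.unsqueeze(0) + flow                                    # p + (u_k, v_k)
    grid = torch.stack([2 * coords[:, 0] / (W - 1) - 1,                  # normalised sampling grid
                        2 * coords[:, 1] / (H - 1) - 1], dim=-1)
    warped_R_f = F.grid_sample(R_f, grid, align_corners=True)            # R_f warped according to the flow
    return charbonnier(warped_R_f - R_l).mean()
```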
If the current iteration number k is less than or equal to N, the procedure returns to the step of indexing the correlation pyramid C with the optical flow S_{k-1}; otherwise, the loss values computed over the iterations are combined in a weighted sum, e.g.

$$\mathcal{L}_{total} = \sum_{k=1}^{N} w_k \, \mathcal{L}(S_k)$$

with weights w_k, and the network is trained by the back-propagation algorithm.
With continued reference to Fig. 2, after the correlation pyramid C has been obtained, the second correlation matrix is extracted from it by the search operator L and used as one of the inputs to the GRU.
The third feature vector is also input into the GRU as another of its inputs. The output of the GRU is added to the optical flow S_{k-1} to obtain the optical flow S_k.
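Putting the pieces together, the iterative refinement could look roughly as follows; the per-iteration weight (0.8 here), the number of iterations, the hidden-state initialisation and the batch-size-1 simplification are illustrative assumptions, and `correlation_pyramid`, `lookup`, `FlowUpdater` and `photometric_loss` refer to the sketches above.

```python
import torch

def estimate_flow(I_f, I_l, encoder_a, encoder_b, updater, N=8, R_f=None, R_l=None):
    """Iteratively refine the optical flow (batch size 1 assumed for simplicity).
    I_f, I_l: event frame groups as tensors of shape (1, 5, 2, H, W)."""
    F_f, F_l = encoder_a(I_f), encoder_a(I_l)              # first and second feature vectors
    context = encoder_b(I_l)                                # third feature vector
    pyramid = correlation_pyramid(F_f[0].permute(1, 2, 0), F_l[0].permute(1, 2, 0))
    B, _, H, W = context.shape
    flow = torch.zeros(B, 2, H, W)                          # S_0: all-zero initial flow
    h = torch.zeros(B, 128, H, W)                           # GRU hidden state
    loss = 0.0
    for k in range(1, N + 1):
        corr2 = lookup(pyramid, flow[0].permute(1, 2, 0)).permute(2, 0, 1).unsqueeze(0)
        h, flow = updater(h, corr2, flow, context)          # S_k from S_{k-1}, correlation and context
        if R_f is not None:
            loss = loss + (0.8 ** (N - k)) * photometric_loss(flow, R_f, R_l)   # weighted sum over iterations
    return flow, loss
```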
Furthermore, the invention also provides an optical flow estimation device, comprising: a framing module for framing the spike event stream and dividing it into a pre-event frame group and a post-event frame group; a first neural network module for extracting a first feature vector and a second feature vector from the sequentially input pre-event frame group and post-event frame group, respectively; a second neural network module for extracting a third feature vector from the input post-event frame group; a correlation extraction module for obtaining a correlation pyramid from the first feature vector and the second feature vector; and an optical flow estimation module for obtaining an updated current estimated optical flow (e.g., S_k) from the third feature vector, the correlation pyramid and the current estimated optical flow (e.g., S_{k-1}).
Alternatively, the second feature vector may be obtained by a copy of the first neural network module.
The modules described above may be relatively independent, or one module may be subordinate to another module as its sub-module; such adjustments and changes are routine for those skilled in the art, and the invention is not limited to a specific form.
The above modules are configured to perform the corresponding sub-steps of any of the optical flow estimation methods described above, which are not repeated here.
For example, when implemented as an application-specific integrated circuit, the chip may include the SNN, the ANN and the GRU; the iteration number N may also be configured in the chip, a larger iteration number generally meaning higher accuracy. The correlation pyramid and the extraction of the various correlation matrices can also be performed in the chip.
In addition, the invention also discloses a chip comprising any one of the above optical flow estimation devices. The chip may be configured in an electronic device to perform optical flow estimation tasks.
In the experiments, an Adam optimizer is adopted with the learning rate set to 0.00004, and the MVSEC dataset is selected for the experiments. The MVSEC dataset is designed for developing novel 3D perception algorithms for event-based cameras; it contains two categories, indoor and outdoor, each containing 3 sequences. The method selects outdoor_day1 as the training set and indoor_flying1 as the test set.
The experimental results are shown in Fig. 4, which contains 3 examples, one per row; from left to right the columns show the DPS picture, the visualized event frame, the ground-truth optical flow, the ground-truth optical flow after applying a non-zero-pixel mask, the predicted optical flow, and the predicted optical flow after applying a non-zero-pixel mask. The accuracy of the predicted optical flow, measured by the Average Endpoint Error (AEE), is 0.78. This prediction accuracy shows that the method can effectively predict the corresponding optical flow from event data.
The optical flow estimation apparatus running the optical flow estimation method of the invention can be implemented as an application-specific integrated circuit (or a module thereof), an FPGA, or a GPU platform (such as a Jetson Orin series module), which are collectively referred to as chips in the invention; the invention is not limited to a specific hardware implementation.
While the invention has been described with reference to particular features and embodiments thereof, various modifications, combinations and substitutions may be made without departing from the invention. The scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification; the methods and means described herein may also be practiced in association with, dependent on, or interoperating with one or more other products or methods.
Therefore, the specification and drawings should be regarded simply as a description of some embodiments of the technical solutions defined by the appended claims, and the appended claims should be interpreted according to the principle of the broadest reasonable interpretation, intended to cover, as far as possible, all modifications, variations, combinations or equivalents within the scope of the disclosure, while avoiding unreasonable interpretations.
To achieve better technical results or for certain applications, a person skilled in the art may make further improvements to the technical solutions based on the invention. However, even if such a partial improvement/design is inventive or advanced, as long as it relies on the technical features defined in the claims it remains covered by the technical idea of the invention, and the resulting technical solution also falls within the protection scope of the invention.
Several technical features mentioned in the appended claims may be replaced by alternative technical features, or the order of certain technical processes and the organization of materials may be rearranged. Those skilled in the art will readily appreciate that such modifications, changes and substitutions can be made without departing from the scope of the invention, as the technical problems are still substantially solved by essentially the same means.
The method steps or modules described in connection with the embodiments disclosed herein may be embodied in hardware, software, or a combination of both; to clearly describe the interchangeability of hardware and software, the steps and components of the embodiments have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the invention as claimed.

Claims (10)

1. An optical flow estimation method, characterized by:
framing the spike event stream and dividing it into a pre-event frame group and a post-event frame group;
feeding the pre-event frame group and the post-event frame group sequentially into a first neural network and extracting a first feature vector and a second feature vector, respectively;
feeding the post-event frame group into a second neural network and extracting a third feature vector;
obtaining a correlation pyramid from the first feature vector and the second feature vector;
and obtaining an updated current estimated optical flow from the third feature vector, the correlation pyramid and the current estimated optical flow.
2. The optical flow estimation method according to claim 1, characterized in that:
the first neural network and the second neural network are structurally identical; or/and,
the first neural network comprises a spiking neural network module and an artificial neural network module; or/and
the second neural network comprises a spiking neural network module and an artificial neural network module.
3. The optical flow estimation method according to claim 1 or 2, characterized in that:
the updated current estimated optical flow is obtained using an RNN from the third feature vector, the correlation pyramid and the current estimated optical flow.
4. The optical flow estimation method according to claim 3, characterized in that:
a first correlation matrix $C_{ijkl}$ is computed from the first feature vector and the second feature vector, i.e.

$$C_{ijkl} = \sum_{h} F^{f}_{ijh} \, F^{l}_{klh}$$

where $i, j$ denote the horizontal and vertical coordinates of the first feature vector $F_f$, $k, l$ denote the horizontal and vertical coordinates of the second feature vector $F_l$, and $h$ indexes the channels of the feature vectors.
5. An optical flow estimation device, characterized by comprising:
a framing module for framing the spike event stream and dividing it into a pre-event frame group and a post-event frame group;
a first neural network module for extracting a first feature vector and a second feature vector from the sequentially input pre-event frame group and post-event frame group, respectively;
a second neural network module for extracting a third feature vector from the input post-event frame group;
a correlation extraction module for obtaining a correlation pyramid from the first feature vector and the second feature vector;
and an optical flow estimation module for obtaining an updated current estimated optical flow from the third feature vector, the correlation pyramid and the current estimated optical flow.
6. The optical flow estimation device according to claim 5, characterized in that:
the first neural network and the second neural network are structurally identical; or/and,
the first neural network comprises a spiking neural network module and an artificial neural network module; or/and
the second neural network comprises a spiking neural network module and an artificial neural network module.
7. The optical flow estimation device according to claim 5 or 6, characterized in that:
the updated current estimated optical flow is obtained using an RNN from the third feature vector, the correlation pyramid and the current estimated optical flow.
8. The optical flow estimation device according to claim 7, characterized in that:
a first correlation matrix $C_{ijkl}$ is computed from the first feature vector and the second feature vector, i.e.

$$C_{ijkl} = \sum_{h} F^{f}_{ijh} \, F^{l}_{klh}$$

where $i, j$ denote the horizontal and vertical coordinates of the first feature vector $F_f$, $k, l$ denote the horizontal and vertical coordinates of the second feature vector $F_l$, and $h$ indexes the channels of the feature vectors.
9. A chip, characterized by:
the chip comprises the optical flow estimation device of any one of claims 5 to 8.
10. An electronic device, characterized in that:
the electronic device is provided with the chip of claim 9 and uses the chip to estimate optical flow in the environment.
CN202310256478.XA 2023-03-16 2023-03-16 Optical flow estimation method and device, chip and electronic equipment Pending CN115953438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310256478.XA CN115953438A (en) 2023-03-16 2023-03-16 Optical flow estimation method and device, chip and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310256478.XA CN115953438A (en) 2023-03-16 2023-03-16 Optical flow estimation method and device, chip and electronic equipment

Publications (1)

Publication Number Publication Date
CN115953438A true CN115953438A (en) 2023-04-11

Family

ID=87290964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310256478.XA Pending CN115953438A (en) 2023-03-16 2023-03-16 Optical flow estimation method and device, chip and electronic equipment

Country Status (1)

Country Link
CN (1) CN115953438A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188533A (en) * 2023-04-23 2023-05-30 深圳时识科技有限公司 Feature point tracking method and device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017096384A1 (en) * 2015-12-04 2017-06-08 Texas Instruments Incorporated Quasi-parametric optical flow estimation
CN115601403A (en) * 2022-09-15 2023-01-13 首都师范大学(Cn) Event camera optical flow estimation method and device based on self-attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017096384A1 (en) * 2015-12-04 2017-06-08 Texas Instruments Incorporated Quasi-parametric optical flow estimation
CN115601403A (en) * 2022-09-15 2023-01-13 首都师范大学(Cn) Event camera optical flow estimation method and device based on self-attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Chankyu Lee et al.: "Spike-FlowNet: Event-Based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks", Computer Vision - ECCV 2020
Yisa Zhang et al.: "Event-Based Optical Flow Estimation with Spatio-Temporal Backpropagation Trained Spiking Neural Network", MDPI
Zachary Teed et al.: "RAFT: Recurrent All-Pairs Field Transforms for Optical Flow", Computer Vision - ECCV 2020

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188533A (en) * 2023-04-23 2023-05-30 深圳时识科技有限公司 Feature point tracking method and device and electronic equipment
CN116188533B (en) * 2023-04-23 2023-08-08 深圳时识科技有限公司 Feature point tracking method and device and electronic equipment

Similar Documents

Publication Publication Date Title
Sekikawa et al. Eventnet: Asynchronous recursive event processing
Finn et al. Unsupervised learning for physical interaction through video prediction
CN112597883B (en) Human skeleton action recognition method based on generalized graph convolution and reinforcement learning
JP4633841B2 (en) Estimating 3D road layout from video sequences by tracking pedestrians
CN110390249A (en) The device and method for extracting the multidate information about scene using convolutional neural networks
CN108492319A (en) Moving target detecting method based on the full convolutional neural networks of depth
EP2352128B1 (en) Mobile body detection method and mobile body detection apparatus
CN112149459A (en) Video salient object detection model and system based on cross attention mechanism
CN107239727A (en) Gesture identification method and system
CN110232330A (en) A kind of recognition methods again of the pedestrian based on video detection
Ranjan et al. Learning human optical flow
CN115601403A (en) Event camera optical flow estimation method and device based on self-attention mechanism
CN111798370A (en) Manifold constraint-based event camera image reconstruction method and system
CN116012950A (en) Skeleton action recognition method based on multi-heart space-time attention pattern convolution network
CN115953438A (en) Optical flow estimation method and device, chip and electronic equipment
Zhang et al. Modeling long-and short-term temporal context for video object detection
CN115661505A (en) Semantic perception image shadow detection method
CN115565130A (en) Unattended system and monitoring method based on optical flow
CN116403152A (en) Crowd density estimation method based on spatial context learning network
Gehrig et al. Dense continuous-time optical flow from events and frames
CN111753670A (en) Human face overdividing method based on iterative cooperation of attention restoration and key point detection
CN111339934A (en) Human head detection method integrating image preprocessing and deep learning target detection
CN113191301A (en) Video dense crowd counting method and system integrating time sequence and spatial information
CN117197632A (en) Transformer-based electron microscope pollen image target detection method
Qi et al. Infrared moving targets detection based on optical flow estimation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination