CN113435576A - Double-speed space-time graph convolution neural network architecture and data processing method - Google Patents


Info

Publication number: CN113435576A
Application number: CN202110703698.3A
Authority: CN (China)
Prior art keywords: space, speed, time, graph, low
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: Zhang Xiongwei (张雄伟), Fang Zheng (方正), Cao Tieyong (曹铁勇), Zheng Yunfei (郑云飞), Sun Meng (孙蒙), Wang Yang (王杨), Yang Jibin (杨吉斌), Zhao Fei (赵斐)
Current assignee: Army Engineering University of PLA (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Army Engineering University of PLA
Application filed by Army Engineering University of PLA
Priority: CN202110703698.3A
Publication: CN113435576A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a double-speed space-time graph convolutional neural network architecture and a data processing method in the field of graph data processing. The architecture comprises: a space-time graph feature encoding module, consisting of a high-speed and a low-speed space-time graph feature encoding module, which respectively receive and encode high-speed and low-speed space-time graph data and pass the resulting output tensors to the space-time graph convolution module; and a space-time graph convolution module, consisting of a high-speed and a low-speed space-time graph convolution module, which respectively perform graph convolution on the output tensors of the high-speed and low-speed space-time graph data and pass the resulting output feature tensors to the fusion module. The method can be applied to tasks involving space-time graph data. Compared with the prior art, it effectively extracts both long-term and short-term information from the space-time graph, and the different node connection relations in the two paths allow the model to learn more discriminative features, yielding better graph-node or graph classification performance.

Description

Double-speed space-time graph convolution neural network architecture and data processing method
Technical Field
The invention relates to a double-speed space-time graph convolution neural network architecture and a data processing method, and belongs to the technical field of graph data processing.
Background
Convolutional neural networks have achieved excellent results in the image, video, natural language, and speech fields in recent years. However, most convolutional neural networks can only process Euclidean-space data (such as images, text, and speech), because data in these fields have translational invariance. Translational invariance allows a globally shared convolution kernel to be defined over the input data space, and hence a convolutional neural network. Graph data are non-Euclidean, and traditional convolutional neural networks handle them poorly, so graph convolutional neural networks were proposed and have become the main model for modeling graph data.
The application scenarios targeted by graph data modeling are very wide, which also makes the tasks it handles diverse. Downstream tasks can be divided into node-level and graph-level tasks. Node-level tasks include node classification, link prediction, and the like, such as article classification in a citation network or inferring users' preferences for commodities in a recommendation system. Graph-level tasks include graph generation, graph classification, and the like, such as drug network generation, protein classification in protein networks, and skeleton action classification.
The challenges faced in constructing a graph convolutional neural network stem mainly from the following aspects:
(1) Graph data are non-Euclidean. Convolution and pooling in traditional convolutional neural networks rely on translational invariance, so designing convolution and pooling operations for graph data is a key difficulty.
(2) The diversity of graph data. Many real-life applications can be naturally represented as graphs, so graph data take varied forms, such as directed connections between users in a social network or heterogeneous connections between authors and citations in a citation network; different graph convolutional neural networks need to be designed for different graphs to model them well.
(3) The large scale of graph data. In the big-data era, a graph in a practical application may contain nodes on the order of millions or even tens of millions, such as the user-commodity network in a recommendation system or the user network in a social network; constructing a graph convolutional neural network on such a large-scale graph within an acceptable budget of time and space is very challenging.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a double-speed space-time graph convolutional neural network architecture that effectively extracts long-term and short-term information from the space-time graph in tasks involving space-time graph data; the different node connection relations in the two paths allow the model to learn more discriminative features, yielding better graph-node or graph classification performance.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
in a first aspect, the present invention provides a dual-speed space-time graph convolutional neural network architecture, including:
the space-time graph feature encoding module, comprising a high-speed space-time graph feature encoding module and a low-speed space-time graph feature encoding module, respectively used for receiving and encoding high-speed and low-speed space-time graph data and outputting the encoded output tensor to the space-time graph convolution module;
the spatiotemporal graph convolution module comprises a high-speed spatiotemporal graph convolution module and a low-speed spatiotemporal graph convolution module, and is respectively used for performing graph convolution operation on the output tensors of the high-speed spatiotemporal graph data and the low-speed spatiotemporal graph data and outputting the output characteristic tensors obtained by the operation to the fusion module;
and the fusion module is used for performing fusion operation on the output characteristic tensors of the high-speed and low-speed space-time diagram data and outputting a result.
Preferably, the space-time graph feature encoding module includes:
the graph node feature encoder is used for encoding a graph node feature tensor in the spatiotemporal graph data;
the graph node space-time sequence number encoder comprises a space sequence number encoder and a time sequence number encoder which are respectively used for encoding a graph node space sequence number tensor and a graph node time sequence number tensor of the space-time graph data.
Preferably, the graph node feature encoder is formed by sequentially combining a 1 × 1 convolution layer, a BN layer and an activation function layer, the spatial sequence number encoder and the time sequence number encoder are formed by sequentially combining a 1 × 1 convolution layer and an activation function layer, and the number of output channels of the convolution layers of the spatial sequence number encoder, the time sequence number encoder and the graph node feature encoder is the same.
Preferably, the space-time graph convolution module includes a space-time graph convolution formed by two branches: one branch multiplies the output tensor of the space-time graph feature encoding module by an adjacency matrix and then passes it through a 1 × 1 convolution layer, a BN layer and an activation function layer; the other branch passes the output tensor of the space-time graph feature encoding module directly through the 1 × 1 convolution layer, the BN layer and the activation function layer.
Preferably, the fusion module comprises:
the high-speed and low-speed pooling layer is used for pooling output feature tensors of the high-speed and low-speed space-time diagrams to the same output scale and outputting the output feature tensors to the high-speed and low-speed feature parallel layer;
the high-speed and low-speed feature parallel layer is used for parallel output feature tensors after being pooled and outputting the output feature tensors to space-time convolution;
and the space-time convolution layer comprises K × K convolution, a BN layer and an activation function layer and is used for calculating the output characteristic tensor of the high-speed and low-speed characteristic parallel layers after parallel arrangement.
Preferably, the fusion module further comprises:
the time-space pooling layer is used for pooling the output characteristic tensor processed by the time-space convolution layer and outputting the output characteristic tensor to the full connection layer;
and the full connection layer is used for receiving the output characteristic tensor of the time-space pooling layer and outputting a classification result.
In a second aspect, the invention provides a double-speed space-time diagram data processing method, which comprises the following steps:
receiving high-speed and low-speed space-time diagram data and respectively carrying out coding operation to obtain an output tensor;
respectively carrying out graph convolution operation on the output tensors of the high-speed and low-speed space-time diagram data to obtain output characteristic tensors;
and performing a fusion operation on the output feature tensors of the high-speed and low-speed space-time graph data and outputting the result.
Preferably, the encoding operation includes:
carrying out graph node feature coding operation and graph node space-time sequence number coding operation on high-speed and low-speed space-time graph data, and adding feature tensors obtained after the graph node feature coding operation and the graph node space-time sequence number coding operation;
the graph node space-time sequence number coding operation comprises a space sequence number coding operation and a time sequence number coding operation.
Compared with the prior art, the invention has the following beneficial effects:
compared with the prior art, the invention can simultaneously process high-speed (high frame rate) and low-speed (low frame rate) space-time graph data, can effectively extract long-time and short-time information in the space-time graph, and can enable a model to learn more discriminative characteristics by different connection relations of nodes in different paths, thereby obtaining better graph node or graph classification performance.
Drawings
Fig. 1 is an overall structural diagram of a dual-speed space-time graph convolutional neural network architecture according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a high-low speed space-time diagram feature encoding module according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a high-low speed space-time graph convolution module according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a fusion module according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The first embodiment is as follows:
a double-speed space-time graph convolution neural network architecture comprises a high-speed space-time graph feature coding module and a low-speed space-time graph feature coding module; high and low speed space-time graph convolution module and fusion module. Wherein:
the input of the high-speed and low-speed space-time graph feature coding module is a graph node feature tensor and a graph node space-time sequence number tensor, and the space-time sequence number feature is obtained by using unique thermal coding. The encoder comprises a graph node characteristic encoder and a graph node space-time sequence number encoder, wherein: the graph node characteristic encoder is formed by sequentially combining a 1 × 1 convolutional layer, a BN layer (Batch Normalization) and an activation function layer (ReLU function is selected generally according to tasks), and the number of output channels of the convolutional layer is determined according to the tasks; the graph node space-time sequence number encoder comprises a space sequence number encoder and a time sequence number encoder, and the independent thermal codes acting on the graph node space-time sequence numbers are formed by sequentially combining 1 multiplied by 1 convolutional layers and activation function layers, and the number of output channels of the convolutional layers needs to be consistent with the number of output channels of the graph node characteristic encoder. 1 to 2 convolutional layers can be set in the graph node characteristic encoder and the graph node space-time sequence number encoder aiming at different space-time graph data. And adding the feature tensors obtained after the two types of different data are coded to obtain the output tensor of the space-time image feature coding module. The high-speed and low-speed space-time image feature coding modules have the same structure.
The inputs of the high-speed and low-speed space-time graph convolution modules are the output feature tensors of the high-speed and low-speed space-time graph feature encoding modules and the corresponding graph adjacency matrices. Each module comprises 1 to 3 space-time graph convolutions, the number chosen per task, and each space-time graph convolution consists of two branches: one is the graph convolution branch, which first multiplies the input features by the adjacency matrix and then passes them through a 1 × 1 convolution layer, a BN layer and a ReLU layer; the other branch passes directly through a 1 × 1 convolution layer, a BN layer and a ReLU layer. The number of output channels of the graph convolution layer is likewise determined by the task. The two branches are finally added to obtain the output of the space-time graph convolution. The high-speed and low-speed space-time graph convolution modules have the same structure.
The input of the fusion module is the output feature tensors of the high-speed and low-speed space-time graph convolution modules. The fusion module comprises a high-speed and low-speed pooling layer, a high-speed and low-speed feature parallel layer, a space-time convolution layer (K × K convolution, a BN layer and an activation function layer, with the number of output channels determined by the task), a space-time pooling layer and a fully connected layer. The high-speed and low-speed pooling layers pool the feature tensors of the high-speed and low-speed space-time graphs to the same scale (chosen by the designer per task), after which the two feature paths are arranged in parallel (concatenated along the channel dimension) in the feature parallel layer. The result is then processed by the space-time convolution layer, enters the space-time pooling layer, and passes through the fully connected layer to obtain the output. Depending on the task, the final output can be taken either from the space-time convolution layer or from the fully connected layer.
The present invention is described in further detail below with reference to the attached drawing figures.
First, fig. 1 is a schematic diagram of the overall double-speed space-time graph convolutional neural network structure disclosed in the present invention, comprising the high-speed and low-speed space-time graph feature encoding modules, the high-speed and low-speed space-time graph convolution modules, and the fusion module.
The input space-time graph data is assumed to be a tensor of size Bs × C × S × T, where Bs is the number of input space-time graphs, C is the feature dimension of each node in the space-time graph, S is the number of graph nodes at each time point, and T is the number of time points in the space-time graph.
In addition, assume the adjacency matrix of the graph at each time point in the space-time graph is A_k (k indexes the adjacency matrices; several different adjacency matrices can be defined or computed for the same graph). A_k is an S × S matrix representing the connection relations of the S nodes at each time point.
To obtain the high-speed and low-speed space-time graphs, a high-speed sampling interval H and a low-speed sampling interval L are set, so that the high-speed space-time graph data is of size Bs × C × S × T/H and the low-speed space-time graph data is of size Bs × C × S × T/L.
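The dual-rate sampling described above can be sketched as follows. This is an illustrative reconstruction rather than code from the patent; the function name and the interval values are assumptions.

```python
import numpy as np

def split_dual_rate(x, high_interval, low_interval):
    """Subsample a Bs x C x S x T space-time tensor along the time axis
    to produce the high-speed and low-speed input streams."""
    high = x[:, :, :, ::high_interval]  # Bs x C x S x T/H
    low = x[:, :, :, ::low_interval]    # Bs x C x S x T/L
    return high, low

# Example: a batch of 2 space-time graphs, 3-dim node features,
# 5 nodes per frame, 24 frames; H = 2, L = 8.
x = np.random.randn(2, 3, 5, 24)
high, low = split_dual_rate(x, high_interval=2, low_interval=8)
```

The same tensor feeds both paths; only the temporal stride differs, which is what lets the two paths see short-term and long-term dynamics respectively.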
(1) The high-speed and low-speed space-time image feature coding module:
the high-speed space-time diagram feature coding module and the low-speed space-time diagram feature coding module are used for coding space-time diagram data, have the same structure, and consist of a diagram node feature coder and a diagram node space-time sequence number coder as shown in fig. 2.
The graph node feature encoder encodes the features of all nodes in the graph. Taking the low-speed space-time graph encoding module as an example, its input is the low-speed space-time graph data; assuming the number of output channels of the graph node feature encoder is C1, its output is E1 = Bs × C1 × S × T/L.
The graph node space-time sequence number encoder encodes the sequence numbers of the space-time graphs, with the number of output channels consistent with that of the graph node feature encoder; the sequence numbers are represented by one-hot features. For example, the low-speed space-time graph data contains T/L graphs of S nodes each, so each node obtains an S-dimensional spatial one-hot feature, i.e. a tensor of size Bs × S × S × T/L, which after the graph node spatial sequence number encoder becomes E2 = Bs × C1 × S × T/L (E denotes an output tensor). In time, T/L-dimensional temporal one-hot codes are obtained, i.e. a tensor of size Bs × T/L × S × T/L, which after the graph node temporal sequence number encoder becomes E3 = Bs × C1 × S × T/L. The output of the low-speed space-time graph encoding module is finally E = E1 + E2 + E3.
The structure and the operation process of the high-speed space-time diagram coding module are consistent with those of the low-speed space-time diagram coding module, and the difference is that the input data is high-speed space-time diagram data. The high-speed and low-speed space-time graph feature coding module is used for carrying out operation of coding and adding the graph node features and the node sequence number features, so that long-time and short-time information in the space-time graph can be more fully utilized, and more discriminative feature representation can be learned.
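The encoding step E = E1 + E2 + E3 can be sketched as follows. This is an illustrative NumPy reconstruction, not the patent's implementation: a 1 × 1 convolution is modeled as a linear map over the channel axis, the BN layer of the feature encoder is omitted for brevity, and all function and variable names are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def encode_spacetime_graph(x, w_feat, w_s, w_t):
    """E = E1 + E2 + E3 as in the feature encoding module.

    x      : Bs x C x S x T node-feature tensor.
    w_feat : C1 x C weights of the 1x1 feature convolution.
    w_s    : C1 x S weights of the spatial sequence number encoder.
    w_t    : C1 x T weights of the temporal sequence number encoder.
    """
    bs, c, s, t = x.shape
    # E1: graph node feature encoder (1x1 conv + ReLU; BN omitted).
    e1 = relu(np.einsum('oc,bcst->bost', w_feat, x))
    # One-hot sequence number features: spatial (S channels, identical
    # at every time step) and temporal (T channels, identical at every node).
    onehot_s = np.broadcast_to(np.eye(s)[:, :, None], (s, s, t))
    onehot_t = np.broadcast_to(np.eye(t)[:, None, :], (t, s, t))
    # E2, E3: sequence number encoders (1x1 conv + ReLU), shared over the batch.
    e2 = relu(np.einsum('oc,cst->ost', w_s, onehot_s))
    e3 = relu(np.einsum('oc,cst->ost', w_t, onehot_t))
    return e1 + e2[None] + e3[None]

# Example: Bs=2, C=3, S=4, T=6 encoded to C1=8 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 6))
E = encode_spacetime_graph(x,
                           rng.standard_normal((8, 3)),
                           rng.standard_normal((8, 4)),
                           rng.standard_normal((8, 6)))
```

All three encoder outputs share the C1 channel dimension, which is what makes the elementwise addition E1 + E2 + E3 well defined.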
(2) High and low speed space-time graph convolution module:
After encoding by the high-speed and low-speed space-time graph feature encoding modules, the output tensors enter the high-speed and low-speed space-time graph convolution modules for the graph convolution operation. Fig. 3 shows the space-time graph convolution module; the high-speed and low-speed modules have the same structure, and the low-speed module is taken as the example. The graph convolution input tensor is E = Bs × C1 × S × T/L and the adjacency matrix is A_k; the output tensor of the space-time graph convolution at time t (1 ≤ t ≤ T/L) is obtained by:
E_out(t) = ReLU(BN(W1 E(t) A_k)) + ReLU(BN(W2 E(t)))
where E(t) is the C1 × S feature matrix at time t and W1, W2 are the 1 × 1 convolution layer parameters of the two branches. Graph convolution is the mainstream method for processing graph data; it combines the graph structure with the node features to learn better feature representations and thereby improve the performance of downstream tasks. The space-time graph convolution module of the invention can contain 1 to 3 space-time graph convolution operations, determined by the task requirements. Meanwhile, a residual path, i.e. the path containing only a convolution layer in fig. 3, is added to improve the network's ability to learn deeper features.
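The two-branch space-time graph convolution described above can be sketched as follows. This is an illustrative NumPy reconstruction with assumed names; the BN layers are omitted for brevity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def st_graph_conv(e, a, w1, w2):
    """Two-branch space-time graph convolution (BN omitted).

    e  : Bs x C1 x S x T input feature tensor.
    a  : S x S adjacency matrix A_k.
    w1 : C2 x C1 weights of the graph-convolution branch.
    w2 : C2 x C1 weights of the residual branch.
    Branch 1 multiplies features by the adjacency matrix, then applies
    a 1x1 convolution and ReLU; branch 2 applies them directly; the two
    branch outputs are added.
    """
    # A_k E(t): aggregate neighbour features at every time step.
    agg = np.einsum('sv,bcvt->bcst', a, e)
    branch1 = relu(np.einsum('oc,bcst->bost', w1, agg))
    branch2 = relu(np.einsum('oc,bcst->bost', w2, e))
    return branch1 + branch2

# Example: Bs=2, C1=8, S=4, T=6, C2=16, random 0/1 adjacency.
rng = np.random.default_rng(1)
e = rng.standard_normal((2, 8, 4, 6))
a = (rng.random((4, 4)) < 0.5).astype(float)
out = st_graph_conv(e, a,
                    rng.standard_normal((16, 8)),
                    rng.standard_normal((16, 8)))
```

The residual branch keeps an adjacency-free path through the layer, which is what lets stacked space-time graph convolutions learn deeper features without the aggregation washing out node identity.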
(3) The fusion module structure:
The fusion module proposed in this patent is shown in fig. 4. After graph convolution by the high-speed and low-speed space-time graph convolution modules, the high-speed and low-speed features are sent together to the fusion module to obtain the final output. Assume the input high-speed feature tensor is E_high = Bs × C3 × S × T/H and the low-speed feature tensor is E_low = Bs × C4 × S × T/L. The two tensors first pass through two different pooling layers, whose goal is to give the high-speed and low-speed feature tensors the same temporal and spatial dimensions. Suppose the spatial and temporal dimensions are to be pooled to S2 and T2; the pooling layer parameters are designed from this output scale. After pooling, the high-speed feature tensor becomes E_high = Bs × C3 × S2 × T2 and the low-speed feature tensor becomes E_low = Bs × C4 × S2 × T2. The space-time scales of the two tensors are now consistent, so they can be arranged in parallel (concatenated along the channel dimension), giving E_parallel = Bs × (C3 + C4) × S2 × T2. The parallel tensor is input to the subsequent space-time convolution with kernel k × k. For graph-node classification or prediction problems, the subsequent space-time pooling and fully connected layers are not needed; for classification of a whole space-time graph, space-time pooling reduces the feature tensor scale to 1 × 1 and the classification result is output through a fully connected layer, whose parameters are determined by the specific task.
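The pooling-and-parallel step of the fusion module can be sketched as follows. This is an illustrative NumPy reconstruction with assumed names; average pooling is used, and the pooled sizes are assumed to divide the input sizes evenly.

```python
import numpy as np

def pool_to(x, s2, t2):
    """Average-pool a Bs x C x S x T tensor down to spatial size s2 and
    temporal size t2 (assumes S % s2 == 0 and T % t2 == 0)."""
    bs, c, s, t = x.shape
    return x.reshape(bs, c, s2, s // s2, t2, t // t2).mean(axis=(3, 5))

def fuse(e_high, e_low, s2, t2):
    """Pool both streams to a common scale, then concatenate channels."""
    return np.concatenate([pool_to(e_high, s2, t2),
                           pool_to(e_low, s2, t2)], axis=1)

# Example: C3 = C4 = 16, S = 4, T/H = 12, T/L = 3, pooled to S2=2, T2=3.
rng = np.random.default_rng(2)
e_high = rng.standard_normal((2, 16, 4, 12))  # Bs x C3 x S x T/H
e_low = rng.standard_normal((2, 16, 4, 3))    # Bs x C4 x S x T/L
fused = fuse(e_high, e_low, s2=2, t2=3)       # Bs x (C3+C4) x S2 x T2
```

Pooling to a shared (S2, T2) scale is what makes the channel concatenation well defined even though the two streams have different temporal lengths.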
Example two:
a double-speed space-time diagram data processing method is applied to the double-speed space-time diagram convolution neural network architecture and comprises the following steps:
receiving high-speed and low-speed space-time diagram data and respectively carrying out coding operation to obtain an output tensor, wherein the coding operation comprises the following steps: carrying out graph node feature coding operation and graph node space-time sequence number coding operation on high-speed and low-speed space-time graph data, and adding feature tensors obtained after the graph node feature coding operation and the graph node space-time sequence number coding operation; the space-time sequence number coding operation of the graph nodes comprises a space sequence number coding operation and a time sequence number coding operation;
respectively carrying out graph convolution operation on the output tensors of the high-speed and low-speed space-time diagram data to obtain output characteristic tensors;
and performing output characteristic tensor line fusion operation on the high-speed and low-speed space-time diagram data and outputting a result.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (8)

1. A double-speed space-time graph convolutional neural network architecture, comprising:
a space-time graph feature encoding module, comprising a high-speed space-time graph feature encoding module and a low-speed space-time graph feature encoding module, respectively used for receiving and encoding high-speed and low-speed space-time graph data and outputting the encoded output tensor to the space-time graph convolution module;
the spatiotemporal graph convolution module comprises a high-speed spatiotemporal graph convolution module and a low-speed spatiotemporal graph convolution module, and is respectively used for performing graph convolution operation on the output tensors of the high-speed spatiotemporal graph data and the low-speed spatiotemporal graph data and outputting the output characteristic tensors obtained by the operation to the fusion module;
and the fusion module is used for performing fusion operation on the output characteristic tensors of the high-speed and low-speed space-time diagram data and outputting a result.
2. The architecture of claim 1, wherein the space-time graph feature encoding module comprises:
the graph node feature encoder is used for encoding a graph node feature tensor in the spatiotemporal graph data;
the graph node space-time sequence number encoder comprises a space sequence number encoder and a time sequence number encoder which are respectively used for encoding a graph node space sequence number tensor and a graph node time sequence number tensor of the space-time graph data.
3. The architecture of claim 2, wherein the graph node feature encoder is formed by sequentially combining a 1 x 1 convolutional layer, a BN layer and an activation function layer, the spatial sequence number encoder and the time sequence number encoder are formed by sequentially combining a 1 x 1 convolutional layer and an activation function layer, and the number of output channels of the convolutional layers of the spatial sequence number encoder, the time sequence number encoder and the graph node feature encoder is the same.
4. The architecture of claim 1, wherein the space-time graph convolution module comprises a space-time graph convolution formed by two branches: one branch multiplies the output tensor of the space-time graph feature encoding module by an adjacency matrix and then passes it through a 1 × 1 convolution layer, a BN layer and an activation function layer; the other branch passes the output tensor of the space-time graph feature encoding module directly through the 1 × 1 convolution layer, the BN layer and the activation function layer.
5. The architecture of claim 1, wherein the fusion module comprises:
the high-speed and low-speed pooling layers, which are used for pooling the output feature tensors of the high-speed and low-speed space-time graphs to the same output scale and outputting them to the high-speed and low-speed feature parallel layer;
the high-speed and low-speed feature parallel layer, which is used for arranging the pooled output feature tensors in parallel and outputting the result to the space-time convolution layer;
and the space-time convolution layer, which comprises a K × K convolution, a BN layer and an activation function layer and is used for operating on the paralleled output feature tensor of the high-speed and low-speed feature parallel layer.
6. The architecture of claim 5, wherein the fusion module further comprises:
the space-time pooling layer, which is used for pooling the output feature tensor processed by the space-time convolution layer and outputting it to the fully connected layer;
and the fully connected layer, which is used for receiving the output feature tensor of the space-time pooling layer and outputting a classification result.
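The fusion path of claims 5 and 6 can be sketched end to end in numpy. Pooling the two branches to a common scale, arranging them in parallel (here: channel concatenation), convolving, space-time pooling, and the fully connected classifier follow the claims; the frame counts, the average-pooling choice, and the 1 x 1 stand-in for the K × K convolution are assumptions:

```python
import numpy as np

def pool_time(x, out_t):
    # average-pool the time axis of x (C, T, V) down to out_t frames
    C, T, V = x.shape
    return x.reshape(C, out_t, T // out_t, V).mean(axis=2)

def fuse_and_classify(fast, slow, w_conv, w_fc):
    # 1) high/low-speed pooling: bring the fast branch to the slow
    #    branch's temporal scale (claim 5)
    fast = pool_time(fast, slow.shape[1])
    # 2) feature parallel layer: stack the two tensors along channels
    both = np.concatenate([fast, slow], axis=0)          # (2C, T', V)
    # 3) space-time convolution layer; a 1 x 1 channel mix with ReLU
    #    stands in for the K x K convolution to keep the sketch short
    mixed = np.maximum(np.einsum('oc,ctv->otv', w_conv, both), 0.0)
    # 4) space-time pooling: global average over time and nodes (claim 6)
    vec = mixed.mean(axis=(1, 2))
    # 5) fully connected layer -> one score per class (claim 6)
    return vec @ w_fc

rng = np.random.default_rng(2)
C, V, n_classes = 8, 5, 3
fast = rng.standard_normal((C, 8, V))   # high-speed branch: 8 frames
slow = rng.standard_normal((C, 4, V))   # low-speed branch: 4 frames
scores = fuse_and_classify(fast, slow,
                           rng.standard_normal((C, 2 * C)),
                           rng.standard_normal((C, n_classes)))
print(scores.shape)  # (3,)
```

Pooling before concatenation is what lets branches sampled at different frame rates share one convolution.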
7. A double-speed space-time graph data processing method, applied to the double-speed space-time graph convolutional neural network architecture and comprising the following steps:
receiving high-speed and low-speed space-time graph data and encoding each to obtain output tensors;
performing graph convolution on the output tensors of the high-speed and low-speed space-time graph data respectively to obtain output feature tensors;
and fusing the output feature tensors of the high-speed and low-speed space-time graph data and outputting the result.
8. The method of claim 7, wherein the encoding operation comprises:
performing a graph node feature encoding operation and a graph node space-time sequence number encoding operation on the high-speed and low-speed space-time graph data, and adding the feature tensors obtained from the two operations;
wherein the graph node space-time sequence number encoding operation comprises a spatial sequence number encoding operation and a temporal sequence number encoding operation.
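Claim 8's combination step reduces to an element-wise addition of same-shaped encodings, much like a learned positional encoding. The shapes below are illustrative; in the architecture the three tensors would come from the encoders of claim 3, whose matching output channel counts make the addition well defined:

```python
import numpy as np

rng = np.random.default_rng(3)
C, T, V = 8, 4, 5   # channels, frames, graph nodes (illustrative sizes)

# stand-ins for the three encoder outputs
feature_code = rng.standard_normal((C, T, V))   # graph node feature encoder
spatial_code = rng.standard_normal((C, T, V))   # spatial sequence-number encoder
temporal_code = rng.standard_normal((C, T, V))  # temporal sequence-number encoder

# claim 8: add the feature encoding and the space-time sequence-number
# encodings element-wise before graph convolution
encoded = feature_code + spatial_code + temporal_code
print(encoded.shape)  # (8, 4, 5)
```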
CN202110703698.3A 2021-06-24 2021-06-24 Double-speed space-time graph convolution neural network architecture and data processing method Pending CN113435576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110703698.3A CN113435576A (en) 2021-06-24 2021-06-24 Double-speed space-time graph convolution neural network architecture and data processing method

Publications (1)

Publication Number Publication Date
CN113435576A true CN113435576A (en) 2021-09-24

Family

ID=77753919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110703698.3A Pending CN113435576A (en) 2021-06-24 2021-06-24 Double-speed space-time graph convolution neural network architecture and data processing method

Country Status (1)

Country Link
CN (1) CN113435576A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN111325099A (en) * 2020-01-21 2020-06-23 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network
CN111814719A (en) * 2020-07-17 2020-10-23 江南大学 Skeleton behavior identification method based on 3D space-time diagram convolution
WO2020254924A1 (en) * 2019-06-16 2020-12-24 Way2Vat Ltd. Systems and methods for document image analysis with cardinal graph convolutional networks
CN112131908A (en) * 2019-06-24 2020-12-25 北京眼神智能科技有限公司 Action identification method and device based on double-flow network, storage medium and equipment
CN112800903A (en) * 2021-01-19 2021-05-14 南京邮电大学 Dynamic expression recognition method and system based on space-time diagram convolutional neural network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Christoph Feichtenhofer et al.: "SlowFast Networks for Video Recognition", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6201-6210 *
Shi Lei et al.: "Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12026-12035 *
Sun Ning et al.: "Multi-stream slowFast graph convolutional networks for skeleton-based action recognition", Image and Vision Computing, vol. 109, pages 1-9 *
Sun Yucheng: "Recognition of basic table tennis technique actions based on space-time graph convolution", China Master's Theses Full-text Database (Social Sciences II), pages 134-140 *
Zhong Qiubo et al.: "Research on skeleton action recognition and interaction with spatio-temporal domain fusion", 《智能***学报》, vol. 15, no. 03, pages 601-608 *

Similar Documents

Publication Publication Date Title
He et al. Asymptotic soft filter pruning for deep convolutional neural networks
CN110110624B (en) Human body behavior recognition method based on DenseNet and frame difference method characteristic input
CN115100235B (en) Target tracking method, system and storage medium
CN111091045A (en) Sign language identification method based on space-time attention mechanism
US20180018555A1 (en) System and method for building artificial neural network architectures
CN113157919B (en) Sentence text aspect-level emotion classification method and sentence text aspect-level emotion classification system
Yuan et al. DMFNet: Deep multi-modal fusion network for RGB-D indoor scene segmentation
CN111291647A (en) Single-stage action positioning method based on multi-scale convolution kernel and superevent module
CN113435520A (en) Neural network training method, device, equipment and computer readable storage medium
Liu et al. RB-Net: Training highly accurate and efficient binary neural networks with reshaped point-wise convolution and balanced activation
CN113313173A (en) Human body analysis method based on graph representation and improved Transformer
CN113836319B (en) Knowledge completion method and system for fusion entity neighbors
CN113420289B (en) Hidden poisoning attack defense method and device for deep learning model
Zhang et al. Accurate and efficient event-based semantic segmentation using adaptive spiking encoder-decoder network
Li et al. HoloParser: Holistic visual parsing for real-time semantic segmentation in autonomous driving
CN117011943A (en) Multi-scale self-attention mechanism-based decoupled 3D network action recognition method
CN116992049A (en) Knowledge graph embedding method for adding entity description based on hyperbolic space
CN116311455A (en) Expression recognition method based on improved Mobile-former
CN115268994B (en) Code feature extraction method based on TBCNN and multi-head self-attention mechanism
CN113435576A (en) Double-speed space-time graph convolution neural network architecture and data processing method
CN116978057A (en) Human body posture migration method and device in image, computer equipment and storage medium
Ashrafi et al. Knowledge distillation framework for action recognition in still images
CN115830384A (en) Image fusion method and system for generating countermeasure network based on double discriminators
CN114494284A (en) Scene analysis model and method based on explicit supervision area relation
Rahouti et al. Incremental learning implementations and vision for cyber risk detection in iot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination