CN111027681A - Time sequence data processing model training method, data processing device and storage medium


Info

Publication number: CN111027681A
Application number: CN201911252467.4A
Authority: CN (China)
Prior art keywords: data processing, time sequence, processing model, sequence data, time
Other languages: Chinese (zh)
Other versions: CN111027681B (granted publication)
Inventors: 徐挺洋, 蔡兴宇, 黄俊洲
Current and original assignee: Tencent Technology (Shenzhen) Co., Ltd.
Legal events: application filed by Tencent Technology (Shenzhen) Co., Ltd.; priority to CN201911252467.4A; publication of CN111027681A; application granted; publication of CN111027681B; legal status: Active (Granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)
  • Error Detection And Correction (AREA)

Abstract

The invention provides a time sequence data processing model training method, which comprises the following steps: acquiring a training sample set, and processing the training sample set through a time sequence data processing model to determine initial parameters of the time sequence data processing model; processing the time sequence data processing model according to the output result of the model and the dynamic time warping processing result of the model's time sequence feature extraction network, to determine update parameters of the time sequence data processing model; and iteratively updating the encoder network parameters and decoder network parameters of the time sequence data processing model through the training sample set according to the update parameters. The invention also provides a time sequence data processing method, a time sequence data processing device and a storage medium. The invention gives the time sequence data processing model stronger generalization capability, improves its training precision and training speed, and improves the accuracy and readability of time sequence data processing.

Description

Time sequence data processing model training method, data processing device and storage medium
Technical Field
The present invention relates to information processing technologies, and in particular, to a training method, a data processing method, an apparatus, and a storage medium for a time series data processing model.
Background
In a deep neural network, fitting an existing signal requires computing the similarity between the original data and the fitting parameters. Typically, similarity in deep neural networks is computed only as a Minkowski or Mahalanobis distance. However, when these distances are applied to time series data, they cannot capture the similarity of two signals that differ by a Doppler-effect-induced shift, so it is difficult for a time series data processing model to generate high-quality processing results; this hinders the processing of large-scale time series data and degrades the user experience.
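To make this limitation concrete, the following sketch (illustrative only, not part of the patent; plain numpy, classic dynamic time warping recursion) contrasts a point-wise Euclidean distance with a DTW distance on two time-shifted copies of the same signal:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch series a
                                 cost[i, j - 1],      # stretch series b
                                 cost[i - 1, j - 1])  # aligned step
    return float(cost[n, m])

t = np.linspace(0, 2 * np.pi, 100)
x = np.sin(t)          # reference signal
y = np.sin(t - 0.5)    # the same shape, shifted in time
print(np.linalg.norm(x - y))  # point-wise distance: large despite identical shape
print(dtw_distance(x, y))     # DTW: small, because the shift is warped away
```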
Disclosure of Invention
In view of this, embodiments of the present invention provide a time series data processing model training method, a data processing method, an apparatus, and a storage medium. The technical solution of the embodiments of the present invention is implemented as follows:
The embodiment of the invention provides a time sequence data processing model training method, which comprises the following steps:
acquiring a training sample set, wherein the training sample set comprises at least one set of training samples of time sequence data;
processing the training sample set through a time sequence data processing model to determine initial parameters of the time sequence data processing model;
in response to the initial parameters of the time sequence data processing model, while the initial parameters of the time sequence data processing model are kept unchanged, processing the time sequence data processing model through the output result of the time sequence data processing model and the dynamic time warping processing result of the time sequence feature extraction network of the time sequence data processing model, and determining the update parameters of the time sequence data processing model;
and according to the update parameters of the time sequence data processing model, iteratively updating the encoder network parameters and the decoder network parameters of the time sequence data processing model through the training sample set so as to process the samples containing the time sequence data through the time sequence data processing model.
In the above scheme, the method further comprises:
negative case processing is carried out on the training sample set to form a negative case sample set corresponding to the training sample set, wherein the negative case sample set is used for adjusting encoder parameters and decoder parameters of the time sequence data processing model;
and determining a corresponding evaluation reference value according to the negative example sample set, wherein the evaluation reference value serves as a supervision parameter for evaluating the processing result of the time series data processing model.
In the foregoing solution, the performing negative case processing on the training sample set includes:
randomly combining processing results to be output in a decoder of the time sequence data processing model to form a negative example sample set corresponding to the training sample set; or,
and carrying out random deletion processing or replacement processing on a processing result to be output in a decoder of the time sequence data processing model to form a negative example sample set corresponding to the training sample set.
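As an illustration of the negative-example construction just described, the sketch below treats the decoder's processing results as a token sequence; this token-level view and the helper's interface are assumptions made for the example, not the patent's implementation:

```python
import random

def make_negative_sample(tokens, mode="shuffle", vocab=None):
    """Corrupt a decoder output sequence to form one negative example."""
    neg = list(tokens)
    if mode == "shuffle":                      # random recombination
        random.shuffle(neg)
    elif mode == "delete":                     # random deletion
        del neg[random.randrange(len(neg))]
    elif mode == "replace":                    # random replacement
        neg[random.randrange(len(neg))] = random.choice(vocab)
    return neg

positive = ["9.20", "14:20", "A", "60"]        # a decoder output to corrupt
print(make_negative_sample(positive, "shuffle"))
print(make_negative_sample(positive, "delete"))
print(make_negative_sample(positive, "replace", vocab=["B", "30", "9.21"]))
```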
The embodiment of the invention also provides a data processing method of the time sequence data processing model, which comprises the following steps:
acquiring time sequence data information to be processed, and converting the time sequence data information to be processed into corresponding identifiable vector information;
determining at least one hidden variable corresponding to the vector information through an encoder network of the time sequence data processing model;
determining a dynamic time warping processing result matched with the to-be-processed time series data information through a time series characteristic extraction network of the time series data processing model;
responding to the dynamic time warping processing result, generating a data processing result corresponding to the hidden variable and a selected probability of the data processing result according to the at least one hidden variable through a decoder network of the time sequence data processing model;
forming a data processing result corresponding to the vector information according to the selected probability of the data processing result;
outputting the data processing result;
wherein the time series data processing model is obtained through training according to the preceding embodiments.
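Read as a pipeline, the claimed data processing method might look like the following sketch; the callables standing in for the encoder network, time series feature extraction network, and decoder network are hypothetical mock-ups, not the patent's networks:

```python
import numpy as np

def run_inference(raw_series, vectorize, encoder, feature_net, decoder):
    """Vectorize -> hidden variables -> DTW result -> decode -> pick best."""
    vec = vectorize(raw_series)                   # identifiable vector information
    hidden = encoder(vec)                         # at least one hidden variable
    dtw_result = feature_net(raw_series)          # dynamic time warping result
    results, probs = decoder(hidden, dtw_result)  # candidates + selected probabilities
    return results[int(np.argmax(probs))]         # result with the highest probability

# Toy stand-ins so the sketch runs end to end.
vectorize   = lambda s: np.asarray(s, dtype=float)
encoder     = lambda v: v.mean(keepdims=True)
feature_net = lambda s: 0.0
decoder     = lambda h, d: (["rise", "fall"], np.array([0.7, 0.3]))

print(run_inference([1.0, 2.0, 3.0], vectorize, encoder, feature_net, decoder))
```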
The embodiment of the invention also provides a time sequence data processing model training device, which comprises:
the data transmission module is used for acquiring a training sample set, wherein the training sample set comprises at least one group of training samples of time sequence data;
the time sequence data processing model training module is used for processing the training sample set through a time sequence data processing model so as to determine initial parameters of the time sequence data processing model;
the time sequence data processing model training module is used for responding to the initial parameters of the time sequence data processing model, and when the initial parameters of the time sequence data processing model are kept unchanged, processing the time sequence data processing model through the output result of the time sequence data processing model and the dynamic time warping processing result of the time sequence characteristic extraction network of the time sequence data processing model, and determining the updating parameters of the time sequence data processing model;
and the time sequence data processing model training module is used for carrying out iterative updating on the encoder network parameter and the decoder network parameter of the time sequence data processing model through the training sample set according to the updating parameter of the time sequence data processing model so as to realize the processing of the sample containing the time sequence data through the time sequence data processing model.
In the above scheme,
the data transmission module is used for determining a dynamic noise threshold value matched with the use environment of the time sequence data processing model;
the data transmission module is used for acquiring original data from a data source corresponding to the use environment of the time sequence data processing model;
the data transmission module is used for denoising the original data according to the dynamic noise threshold value and triggering a dynamic word segmentation strategy matched with the dynamic noise threshold value;
and the data transmission module is used for performing word segmentation processing on the original data according to a dynamic word segmentation strategy matched with the dynamic noise threshold value to form a training sample set matched with the use environment of the time sequence data processing model.
In the above scheme,
the data transmission module is used for determining a fixed noise threshold value matched with the use environment of the time sequence data processing model;
the data transmission module is used for acquiring original data from a data source corresponding to the use environment of the time sequence data processing model;
the data transmission module is used for denoising the original data according to the fixed noise threshold value and triggering a fixed word segmentation strategy matched with the fixed noise threshold value;
and the data transmission module is used for performing word segmentation processing on the original data according to a fixed word segmentation strategy matched with the fixed noise threshold value to form a training sample set matched with the use environment of the time sequence data processing model.
In the above scheme,
the time sequence data processing model training module is used for processing different training samples in the training sample set through a time sequence characteristic extraction network of the time sequence data processing model so as to determine a corresponding dynamic time warping processing result;
the time sequence data processing model training module is used for responding to the dynamic time warping processing result, substituting different training samples in the training sample set into a loss function corresponding to a self-coding network formed by an encoder and a decoder of the time sequence data processing model;
and the time sequence data processing model training module is used for determining that the parameters corresponding to the encoder and the corresponding decoder in the time sequence data processing model are used as the update parameters of the time sequence data processing model when the loss function meets the convergence condition.
In the above scheme,
the time sequence data processing model training module is used for determining a noise parameter matched with the training sample set according to an updating parameter of the time sequence data processing model, and the noise parameter is used for representing a noise value of a parallel statement sample in the training sample set;
and the time sequence data processing model training module is used for carrying out iterative updating on the parameters of an encoder and a decoder of the time sequence data processing model according to the noise value of the noise parameter when the noise parameter reaches the corresponding noise value threshold value until a loss function corresponding to a self-encoding network formed by the encoder and the decoder of the time sequence data processing model meets the corresponding convergence condition.
In the above scheme,
the time sequence data processing model training module is used for extracting time sequence data samples in the training sample set;
and the time sequence data processing model training module is used for training the time sequence characteristic extraction network of the time sequence data processing model through the time sequence data samples so as to determine network parameters matched with the time sequence characteristic extraction network.
In the above scheme,
the time sequence data processing model training module is used for carrying out negative example processing on the training sample set to form a negative example sample set corresponding to the training sample set, wherein the negative example sample set is used for adjusting the encoder parameter and the decoder parameter of the time sequence data processing model;
and the time sequence data processing model training module is used for determining a corresponding evaluation research value according to the negative sample set, wherein the evaluation research value is used as a supervision parameter to evaluate the processing result of the time sequence data processing model.
In the above scheme,
the time sequence data processing model training module is used for randomly combining processing results to be output in a decoder of the time sequence data processing model to form a negative example sample set corresponding to the training sample set;
the time sequence data processing model training module is used for carrying out random deletion processing or replacement processing on a processing result to be output in a decoder of the time sequence data processing model to form a negative sample set corresponding to the training sample set.
In the above scheme,
the time sequence data processing model training module is used for, when the usage environment of the time sequence data processing model is a financial data monitoring process, adjusting the parameters of a multi-attention-mechanism-based recurrent convolutional neural network in the decoder network according to the feature vector corresponding to the multi-modal time sequence data of the time sequence data processing model, so that the parameters of the multi-attention-mechanism-based recurrent convolutional neural network match the multi-modal time sequence data fusion feature vector.
The embodiment of the invention also provides a time sequence data processing model processing device, which comprises:
the encoder module is used for acquiring time sequence data information to be processed and converting the time sequence data information to be processed into corresponding identifiable vector information;
the encoder module is used for determining at least one hidden variable corresponding to the vector information through an encoder network of a time sequence data processing model;
the time sequence characteristic extraction module is used for determining a dynamic time warping processing result matched with the to-be-processed time sequence data information through a time sequence characteristic extraction network of the time sequence data processing model;
a decoder module, configured to generate, in response to the dynamic time warping processing result, a data processing result corresponding to the hidden variable and a selected probability of the data processing result according to the at least one hidden variable through a decoder network of the time series data processing model;
the decoder module is used for forming a data processing result corresponding to the vector information according to the selected probability of the data processing result;
and the decoder module is used for outputting the data processing result.
The embodiment of the invention also provides a training device of the time sequence data processing model, which comprises:
a memory for storing executable instructions;
and the processor is used for implementing the training method of the time sequence data processing model described above when executing the executable instructions stored in the memory.
The embodiment of the invention also provides a time series data processing model processing device, which comprises:
a memory for storing executable instructions;
and the processor is used for implementing the data processing method of the time sequence data processing model described above when executing the executable instructions stored in the memory.
An embodiment of the present invention further provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the training method of the time series data processing model described above or the data processing method of the time series data processing model described above.
The embodiment of the invention has the following beneficial effects:
the method comprises the steps of obtaining a training sample set, wherein the training sample set comprises at least one group of training samples of time sequence data; processing the training sample set through a time sequence data processing model to determine initial parameters of the time sequence data processing model; responding to the initial parameters of the time sequence data processing model, when the initial parameters of the time sequence data processing model are kept unchanged, processing the time sequence data processing model through the output result of the time sequence data processing model and the time sequence characteristic extraction network dynamic time warping processing result of the time sequence data processing model, and determining the updating parameters of the time sequence data processing model; and according to the update parameters of the time sequence data processing model, iteratively updating the encoder network parameters and the decoder network parameters of the time sequence data processing model through the training sample set so as to process the samples containing the time sequence data through the time sequence data processing model. Therefore, the generalization capability of the time sequence data processing model is stronger, the training precision and the training speed of the time sequence data processing model are improved, the gain of the existing training sample carrying the time sequence data to model training can be effectively and fully utilized, the time sequence data processing model can adapt to different use scenes, the similarity of two signals with deviation caused by Doppler effect in the time sequence data is distinguished, the influence of environmental noise on the time sequence data processing model is avoided, the time sequence data processing model can generate a high-quality data processing result, and the accuracy and the readability of time sequence data processing are improved.
Drawings
FIG. 1 is a schematic view of a usage scenario of a training method for a time series data processing model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a structure of a training apparatus for a time series data processing model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a process of processing time series data according to a conventional scheme;
FIG. 4 is a schematic flow chart illustrating an alternative method for training a time series data processing model according to an embodiment of the present invention;
FIG. 5 is an alternative schematic diagram of a temporal data processing model according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating an alternative timing information reading of the timing data processing model according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an alternative structure of an encoder in the temporal data processing model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of vector stitching of an encoder in a temporal data processing model according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating an encoding process of an encoder in a time series data processing model according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating a decoding process of a decoder in a time series data processing model according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a decoding process of a decoder in a time series data processing model according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating a decoding process of a decoder in a time series data processing model according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of an alternative machine reading representation of time series data by the time series data processing model according to an embodiment of the present invention;
FIG. 14 is a schematic flow chart illustrating an alternative method for training a time series data processing model according to an embodiment of the present invention;
FIG. 15 is a schematic flow chart illustrating an alternative method for training a time series data processing model according to an embodiment of the present invention;
FIG. 16 is a schematic diagram illustrating a structure of a sequential data processing model processing apparatus according to an embodiment of the present invention;
FIG. 17 is a schematic flow chart illustrating an alternative method for processing time series data according to the time series data processing model of the present invention;
FIG. 18 is a diagram illustrating an application environment of a time series data processing model according to an embodiment of the present invention;
FIG. 19 is a schematic flow chart illustrating an alternative method for training a time series data processing model according to an embodiment of the present invention;
FIG. 20 is a schematic diagram illustrating a working process of a time series feature extraction network in a time series data processing model according to an embodiment of the present invention;
FIG. 21 is a schematic flow chart illustrating an alternative method for training a time series data processing model according to an embodiment of the present invention;
fig. 22 is a schematic diagram of data transmission of a time series feature extraction network in the time series data processing model according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.
1) Transformer: a network architecture employing an attention mechanism that replaces the traditional encoder-decoder pattern, which must rely on other neural networks. Word vector: a representation of a single word as a fixed-dimension distributed vector. Compound word: a coarser-grained keyword composed of fine-grained keywords; its semantics are richer and more complete than those of the fine-grained keywords.
2) BERT: short for Bidirectional Encoder Representations from Transformers, a language model training method that exploits massive amounts of text. It is widely applied to various natural language processing tasks such as text classification, text matching, and machine reading comprehension.
3) Artificial neural networks: neural Network (NN) is a mathematical model or a computational model for simulating the structure and the function of a biological Neural Network and is used for estimating or approximating functions in the field of machine learning and cognitive science.
4) Model parameters: quantities that use generic variables to establish the relations between functions and variables. In artificial neural networks, the model parameters are typically real-valued matrices.
5) API: Application Programming Interface, a set of predefined functions or conventions for linking the different components of a software system. The goal is to give applications and developers the ability to access a set of routines based on certain software or hardware without having to access the native code or understand the details of the internal workings.
6) SDK: Software Development Kit, a collection of development tools for building application software for a specific software package, software framework, hardware platform, operating system, and the like; broadly, it comprises the collection of related documents, examples and tools that assist in developing a certain class of software.
7) Neural Networks (NN): an Artificial Neural Network (ANN), referred to as Neural Network or Neural Network for short, is a mathematical model or computational model that imitates the structure and function of biological Neural Network (central nervous system of animals, especially brain) in the field of machine learning and cognitive science, and is used for estimating or approximating functions.
8) Encoder-decoder architecture: a network structure commonly used in machine processing of time sequence data. The decoder receives the output result of the encoder as input and outputs a corresponding sequence, for example a text sequence in another language.
9) Convolutional Neural Networks (CNN): a class of feedforward neural networks that contain convolution computations and have a deep structure, and one of the representative algorithms of deep learning. A convolutional neural network has representation learning capability and can perform shift-invariant classification of input information according to its hierarchical structure.
10) Model training: multi-classification learning on the image data set. The model can be constructed with deep learning frameworks such as TensorFlow and PyTorch, combining multiple neural network layers such as CNN layers into a multi-classification model. The input of the model is the three-channel or original-channel matrix obtained by reading an image with a tool such as OpenCV; the output of the model is the multi-classification probability, and the category is finally output through an algorithm such as softmax. During training, the model is driven toward the correct trend through an objective function such as cross entropy (see the sketch after this list of terms).
11) Bidirectional attention neural network model: BERT (Bidirectional Encoder Representations from Transformers), proposed by Google.
12) Token: a word unit. Before any actual processing, the input text needs to be divided into language units such as words, punctuation, numbers or pure alphanumerics; these units are called word units.
13) Softmax: the normalized exponential function is a generalization of the logistic function. It can "compress" a K-dimensional vector containing arbitrary real numbers into another K-dimensional real vector, such that each element ranges between [0,1] and the sum of all elements is 1.
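As a small aside on two of the terms above, softmax (13) and the cross-entropy objective mentioned under model training (10), here is a minimal numerically stable sketch; it is a generic example, not code from the patent:

```python
import numpy as np

def softmax(z):
    """Normalized exponential: compress a K-dim real vector to probabilities."""
    e = np.exp(z - z.max())                    # subtract the max to avoid overflow
    return e / e.sum()

def cross_entropy(logits, label):
    """Negative log-probability of the true class under softmax(logits)."""
    return -float(np.log(softmax(logits)[label]))

scores = np.array([2.0, 1.0, 0.1])             # hypothetical class scores
p = softmax(scores)
print(p, p.sum())                              # each element in [0, 1], sum is 1
print(cross_entropy(scores, label=0))          # small loss: correct class favored
print(cross_entropy(scores, label=2))          # larger loss: wrong class
```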
Fig. 1 is a schematic view of a usage scenario of the training method for a time series data processing model according to an embodiment of the present invention. Referring to fig. 1, a terminal (including a terminal 10-1 and a terminal 10-2) is provided with a time series data processing client, through which a user can input information carrying time series data; the client can also receive the corresponding data processing result and display the received data processing result (a judgment or prediction on the time series data) to the user. The terminal is connected to the server 200 through a network 300, which may be a wide area network or a local area network, or a combination of the two, using wireless links for data transmission.
As an example, the server 200 is configured to deploy and train the time series data processing model, iteratively updating the network parameters of its encoder and decoder, so as to generate, through the encoder and decoder parameters of the time series data processing model, a data processing result for the time series data information to be processed, and to display the data processing result generated by the model for that information through the terminal (the terminal 10-1 and/or the terminal 10-2).
Of course, before the target data to be processed is processed by the time sequence data processing model to generate a corresponding data processing result, the time sequence data processing model also needs to be trained, which specifically includes:
acquiring a training sample set, wherein the training sample set comprises at least one set of training samples of time sequence data; processing the training sample set through a time sequence data processing model to determine initial parameters of the time sequence data processing model; in response to the initial parameters of the model, while keeping them unchanged, processing the model through the output result of the model and the dynamic time warping processing result of the model's time sequence feature extraction network, and determining the update parameters of the model; and iteratively updating the encoder network parameters and decoder network parameters of the model through the training sample set according to the update parameters, so as to process samples containing time sequence data through the model.
As described in detail below, the structure of the training apparatus for a time series data processing model according to the embodiment of the present invention may be implemented in various forms, such as a dedicated terminal with a time series data processing model training function, or a server provided with a time series data processing model training function, for example, the server 200 in the foregoing fig. 1. Fig. 2 is a schematic diagram of a component structure of a training apparatus for a time series data processing model according to an embodiment of the present invention, and it can be understood that fig. 2 only shows an exemplary structure of the training apparatus for the time series data processing model, and a part of or all of the structure shown in fig. 2 may be implemented as needed.
The training device of the time sequence data processing model provided by the embodiment of the invention comprises: at least one processor 201, memory 202, user interface 203, and at least one network interface 204. The various components in the training apparatus of the time series data processing model are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable communications among the components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 205 in fig. 2.
The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
In some embodiments, the training apparatus for a time series data processing model provided in the embodiments of the present invention may be implemented by a combination of software and hardware, and as an example, the training apparatus for a time series data processing model provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the training method for a time series data processing model provided in the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
As an example of the time series data processing model training apparatus provided by the embodiment of the present invention implemented by combining software and hardware, the time series data processing model training apparatus provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 201, the software modules may be located in a storage medium, the storage medium is located in the memory 202, the processor 201 reads executable instructions included in the software modules in the memory 202, and the time series data processing model training method provided by the embodiment of the present invention is completed in combination with necessary hardware (for example, including the processor 201 and other components connected to the bus 205).
By way of example, the Processor 201 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor or the like.
As an example of the hardware implementation of the training apparatus for the time series data processing model provided in the embodiment of the present invention, the apparatus provided in the embodiment of the present invention may be implemented directly by using the processor 201 in the form of a hardware decoding processor, for example, by using one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components to implement the time series data processing model training method provided in the embodiment of the present invention.
The memory 202 in embodiments of the present invention is used to store various types of data to support the operation of the training apparatus of the time series data processing model. Examples of such data include: any executable instructions for operating on the training apparatus of the time series data processing model; a program implementing the training method of the time series data processing model according to an embodiment of the present invention may be included in those executable instructions.
In other embodiments, the training apparatus for a time series data processing model provided by an embodiment of the present invention may be implemented in software, and fig. 2 illustrates the training apparatus for a time series data processing model stored in the memory 202, which may be software in the form of programs and plug-ins, and includes a series of modules, and as an example of the programs stored in the memory 202, the training apparatus for a time series data processing model may include the following software modules: the system comprises a data transmission module 2081 and a time sequence data processing model training module 2082. When the software modules in the training apparatus for time series data processing model are read into the RAM by the processor 201 and executed, the training method for time series data processing model provided by the embodiment of the present invention will be implemented, and the functions of the software modules in the training apparatus for time series data processing model provided by the embodiment of the present invention will be described below, wherein,
the data transmission module 2081 is configured to obtain a training sample set, where the training sample set includes at least one set of training samples of time series data;
the time sequence data processing model training module 2082 is used for processing the training sample set through a time sequence data processing model to determine initial parameters of the time sequence data processing model;
the time sequence data processing model training module 2082, configured to respond to the initial parameter of the time sequence data processing model, and when the initial parameter of the time sequence data processing model is kept unchanged, process the time sequence data processing model according to the output result of the time sequence data processing model and the dynamic time warping processing result of the time sequence feature extraction network of the time sequence data processing model, and determine an update parameter of the time sequence data processing model;
the time series data processing model training module 2082 is configured to iteratively update the encoder network parameters and the decoder network parameters of the time series data processing model through the training sample set according to the update parameters of the time series data processing model, so as to process the sample including the time series data through the time series data processing model.
Before describing the training method of the time series data processing model provided by the embodiment of the present invention, consider first how a time series data processing model generates a corresponding data processing result (which may be a prediction of the time series data) from data to be processed (carrying single or multi-modal time series data) in a conventional scheme. Fig. 3 is a schematic diagram of a time series data processing process in the conventional scheme, where the seq2seq model is an architecture built from an encoder (Encoder) and a decoder (Decoder): it generates an output sequence Y from an input sequence X. In the seq2seq model, the encoder converts the input sequence into a fixed-length vector, and the decoder decodes that fixed-length vector into the output sequence. As shown in fig. 3, the encoder encodes the input data to be processed to obtain its text features, and the decoder decodes those features and outputs the result to generate the corresponding data processing result, the encoder and decoder being in one-to-one correspondence.
It can be seen that the related-art time series data processing model based on the seq2seq model shown in fig. 3 has a disadvantage: in a conventional deep neural network, fitting an existing signal requires computing the similarity between the original data and the fitting parameters, and this similarity is typically computed only as a Minkowski or Mahalanobis distance. When these distances are applied to time series data, they cannot capture the similarity of two signals that differ by a Doppler-effect-induced shift; they are also easily disturbed by noise information, triggering useless recognition and degrading the user experience.
To solve the drawbacks of the related art, referring to fig. 4, fig. 4 is an optional flowchart of a method for training a time series data processing model according to an embodiment of the present invention, and it can be understood that the steps shown in fig. 4 may be executed by various electronic devices operating a device for training a time series data processing model, for example, a dedicated terminal with a time series data processing function, a server with a time series data processing model training function, or a server cluster. The following is a description of the steps shown in fig. 4.
Step 401: the time sequence data processing model training device obtains a training sample set.
Wherein the training sample set comprises training samples of at least one set of time series data.
In some embodiments of the present invention, obtaining the training sample set may be implemented by:
determining a dynamic noise threshold value matched with the usage environment of the time series data processing model; acquiring raw data from a data source corresponding to the usage environment of the time series data processing model; denoising the raw data according to the dynamic noise threshold value, and triggering a dynamic word segmentation strategy matched with the dynamic noise threshold value; and performing word segmentation processing on the raw data according to the dynamic word segmentation strategy matched with the dynamic noise threshold value to form a training sample set matched with the usage environment of the time series data processing model. The dynamic noise threshold differs across usage environments; for example, in a usage environment of time series data processing for power consumption, the dynamic noise threshold needs to be smaller than in a time series data environment of stock information.
In some embodiments of the present invention, the obtaining of the training sample set may be implemented by:
determining a fixed noise threshold value matched with the usage environment of the time series data processing model; acquiring raw data from a data source corresponding to the usage environment of the time series data processing model; denoising the raw data according to the fixed noise threshold value, and triggering a fixed word segmentation strategy matched with the fixed noise threshold value; and performing word segmentation processing on the raw data according to the fixed word segmentation strategy matched with the fixed noise threshold value to form a training sample set matched with the usage environment of the time series data processing model. When the time series data processing model is embedded in a corresponding hardware mechanism, such as medical detection equipment whose usage environment is monitoring time series data in the pathological information of a patient, the noise is relatively uniform, so a fixed noise threshold corresponding to the model can effectively improve the training speed of the time series data processing model and reduce the user's waiting time.
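How such a threshold (dynamic or fixed) could gate the raw data before segmentation is sketched below; the delta-clipping denoising rule and the fixed-window stand-in for the word segmentation strategy are both illustrative assumptions, not details prescribed by the embodiment:

```python
import numpy as np

def build_training_samples(raw, noise_threshold, window=16):
    """Denoise a raw series with a noise threshold, then cut it into samples."""
    deltas = np.diff(raw, prepend=raw[0])
    deltas[np.abs(deltas) < noise_threshold] = 0.0  # treat sub-threshold jitter as noise
    clean = raw[0] + np.cumsum(deltas)
    # Non-overlapping windows stand in for the segmentation strategy.
    return [clean[i:i + window]
            for i in range(0, len(clean) - window + 1, window)]

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8, 256)) + 0.05 * rng.standard_normal(256)
# A dynamic threshold would be chosen per usage environment (e.g. smaller for
# smooth power-consumption data, larger for volatile stock data); a fixed
# threshold would be baked into dedicated hardware.
samples = build_training_samples(series, noise_threshold=0.02)
print(len(samples), samples[0].shape)
```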
Step 402: and the time sequence data processing model training device processes the training sample set through a time sequence data processing model so as to determine initial parameters of the time sequence data processing model.
Step 403: and the time sequence data processing model training device responds to the initial parameters of the time sequence data processing model, and when the initial parameters of the time sequence data processing model are kept unchanged, processes the time sequence data processing model through the output result of the time sequence data processing model and the time sequence characteristic extraction network dynamic time warping processing result of the time sequence data processing model, and determines the updating parameters of the time sequence data processing model.
In some embodiments of the present invention, in response to the initial parameter of the time-series data processing model, while keeping the initial parameter of the time-series data processing model unchanged, processing the time-series data processing model through the output result of the time-series data processing model and the dynamic time warping processing result of the time-series feature extraction network of the time-series data processing model, and determining the update parameter of the time-series data processing model, may be implemented by:
processing different training samples in the training sample set through a time sequence characteristic extraction network of the time sequence data processing model to determine a corresponding dynamic time warping processing result;
responding to the dynamic time warping processing result, substituting different training samples in the training sample set into a loss function corresponding to a self-coding network formed by an encoder and a decoder of the time sequence data processing model;
and determining parameters corresponding to an encoder and corresponding decoder parameters in the time sequence data processing model when the loss function meets a convergence condition as updating parameters of the time sequence data processing model.
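A toy rendering of this update step is sketched below, using a linear autoencoder and plain gradient descent; the DTW term from the feature extraction network is deliberately left out of the gradient here, since the embodiment leaves its exact combination with the loss open, so this shows only the self-encoding loss and the convergence condition:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 32, 4                                   # series length, hidden size
W_enc = 0.1 * rng.standard_normal((k, n))      # encoder network parameters
W_dec = 0.1 * rng.standard_normal((n, k))      # decoder network parameters

def train_step(x, lr=0.05):
    """One gradient step on the self-encoding (reconstruction) loss."""
    global W_enc, W_dec
    z = W_enc @ x                              # hidden variable
    x_hat = W_dec @ z                          # reconstruction
    err = x_hat - x
    grad_dec = np.outer(err, z)                # d(0.5*||err||^2)/dW_dec
    grad_enc = np.outer(W_dec.T @ err, x)      # d(0.5*||err||^2)/dW_enc
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
    return float(np.mean(err ** 2))

samples = [np.sin(np.linspace(0, 4, n) + p) for p in rng.uniform(0, 6, 50)]
prev = np.inf
for epoch in range(500):
    loss = float(np.mean([train_step(x) for x in samples]))
    if prev - loss < 1e-8:                     # loss meets the convergence condition
        break
    prev = loss
print(epoch, loss)  # W_enc, W_dec at this point play the role of update parameters
```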
The composition of the time series data processing model can include an encoder network and a decoder network. In some embodiments of the invention, the time series data processing model may be a bidirectional attention neural network model (BERT, Bidirectional Encoder Representations from Transformers). With continuing reference to fig. 5, fig. 5 is an optional structural schematic diagram of the time series data processing model in the embodiment of the present invention, wherein the Encoder includes N = 6 identical layers, each containing two sub-layers. The first sub-layer is a multi-head attention layer, followed by a simple fully connected layer. Each sub-layer adds a residual connection and normalization.
The Decoder likewise consists of N = 6 identical layers, which are not identical to the encoder layers: each comprises three sub-layers, namely a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are both based on multi-head attention.
With continuing reference to fig. 6, fig. 6 is an alternative timing information reading schematic diagram of the timing data processing model in the embodiment of the present invention, in which the encoder and decoder portions each include 6 encoders and decoders. The input to the first encoder combines embedding and positional embedding. After passing through 6 encoders, the output goes to each decoder of the decoder portion. For an input of "at 14:20 on September 20, the price of stock A is 60 yuan", after processing by the time series data processing model, the output machine reading result is: "9.20--14:20--A--60".
With continuing reference to FIG. 7, FIG. 7 is an alternative block diagram of an encoder in the time series data processing model in the embodiment of the present invention, where the input consists of queries (Q) and keys (K) of dimension d_k and values (V) of dimension d_v; the dot products of the query with all keys are computed and the softmax function is applied to obtain the weights on the values.
With continued reference to FIG. 7, FIG. 7 shows a vector representation of an encoder in the time series data processing model in the embodiment of the present invention, where Q, K and V are obtained by multiplying the input vector x of the encoder by W^Q, W^K and W^V respectively. W^Q, W^K and W^V have dimension (512, 64) in the original article; suppose the dimension of our input is (m, 512), where m represents the number of words. Then the Q, K and V obtained after multiplying the input vector by W^Q, W^K and W^V each have dimension (m, 64).
With continued reference to FIG. 8, FIG. 8 is a schematic diagram of vector stitching of an encoder in the temporal data processing model according to an embodiment of the present invention, wherein Z_0 to Z_7 correspond to 8 parallel heads (each of dimension (m, 64)); concatenating these 8 heads gives dimension (m, 512). After the final multiplication by W^O, an output matrix of dimension (m, 512) is obtained, consistent with the dimension entering the next encoder.
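The shapes walked through in the last two paragraphs can be checked directly. In this sketch the weights are random and the eight heads reuse one projection triple purely to keep it short; a real model learns distinct W^Q, W^K, W^V per head:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d_model, d_head, heads = 10, 512, 64, 8     # words, model width, head width

X  = rng.standard_normal((m, d_model))         # encoder input, dimension (m, 512)
Wq = rng.standard_normal((d_model, d_head))    # W^Q, dimension (512, 64)
Wk = rng.standard_normal((d_model, d_head))    # W^K
Wv = rng.standard_normal((d_model, d_head))    # W^V
Wo = rng.standard_normal((d_model, d_model))   # W^O

def softmax_rows(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # each of dimension (m, 64)
Z = softmax_rows(Q @ K.T / np.sqrt(d_head)) @ V  # scaled dot-product attention

Z_cat = np.concatenate([Z] * heads, axis=-1)   # concat Z_0..Z_7 -> (m, 512)
out = Z_cat @ Wo                               # (m, 512), ready for the next encoder
print(Q.shape, Z.shape, out.shape)             # (10, 64) (10, 64) (10, 512)
```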
With continued reference to fig. 9, fig. 9 is a schematic diagram of the encoding process of an encoder in the time series data processing model according to the embodiment of the present invention, in which x1 passes through self-attention to reach the state z1. The tensor output by self-attention then goes through a residual connection and Layer Norm, followed by a fully connected feed-forward network, which performs the same operations: residual processing and normalization. The tensor finally output can enter the next encoder; this is iterated 6 times, and the iterated result enters the decoder.
With continuing reference to fig. 10, fig. 10 is a schematic diagram of a decoding process of a decoder in the time series data processing model according to the embodiment of the present invention, wherein the input and output of the decoder and the decoding process are as follows:
Output: the probability distribution of the output word corresponding to position i;
Input: the output of the encoder and the output of the decoder at position i-1. So the middle attention is not self-attention: its K and V come from the encoder, and Q comes from the decoder output at the previous position.
With continuing reference to fig. 11 and 12, fig. 11 is a schematic diagram illustrating the decoding process of a decoder in the time series data processing model according to an embodiment of the present invention, wherein the vector output by the last decoder of the decoder network passes through a Linear layer and a softmax layer. Fig. 12 is a schematic diagram of that decoding process, where the Linear layer maps the vector from the decoder portion into a logits vector, the softmax layer then converts the logits vector into probability values, and finally the position of the maximum probability value is found, completing the output of the decoder.
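A sketch of that final step, with an assumed vocabulary size and random weights purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, vocab = 512, 1000                       # model width, assumed vocabulary

dec_vec = rng.standard_normal(d_model)           # vector from the last decoder
W_linear = rng.standard_normal((vocab, d_model)) # Linear layer weights

logits = W_linear @ dec_vec                      # logits vector
probs = np.exp(logits - logits.max())
probs /= probs.sum()                             # softmax layer -> probabilities
print(int(np.argmax(probs)))                     # position of the maximum probability
```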
In some embodiments of the invention, the time series data processing model may be a bidirectional attention neural network model (BERT, Bidirectional Encoder Representations from Transformers). With continuing reference to fig. 5, fig. 5 is an optional structural schematic diagram of the time series data processing model in the embodiment of the present invention, wherein the Encoder includes: N = 6 identical layers, each layer containing two sub-layers. The first sub-layer is a multi-head attention layer, followed by a simple fully connected layer. Each sub-layer adds a residual connection and normalization.
The Decoder includes: N = 6 identical layers, where these layers are not identical to the encoder layers; each comprises three sub-layers: a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are both based on multi-head attention layers.
With continuing reference to FIG. 13, FIG. 13 is an alternative machine-reading schematic diagram of the time series data processing model in an embodiment of the present invention, wherein the encoder and decoder portions each include 6 encoders and 6 decoders. The input to the first encoder combines the embedding and the positional embedding. After passing through the 6 encoders, the output is fed to each decoder of the decoder portion. The input target is "20 kWh of electricity consumed on September 20"; after processing by the time series data processing model, the output machine reading result is: "9.20--20".
Of course, the BERT model in the present invention may also be replaced by a bidirectional long short-term memory network model (Bi-LSTM, Bi-directional Long Short-Term Memory), a gated recurrent unit network model (GRU, Gated Recurrent Unit), an ELMo (Embeddings from Language Models) model, a GPT model, or a GPT-2 model, which are not described in detail herein.
Step 404: and the time sequence data processing model training device iteratively updates the encoder network parameter and the decoder network parameter of the time sequence data processing model through the training sample set according to the updating parameter of the time sequence data processing model so as to process the sample containing the time sequence data through the time sequence data processing model.
With continuing reference to fig. 14, fig. 14 is an alternative flowchart of the time series data processing model training method according to the embodiment of the present invention, and it can be understood that the steps shown in fig. 14 may be executed by various electronic devices operating the time series data processing model training apparatus, for example, a dedicated terminal with a time series data processing model training function, a server with a time series data processing model training function, or a server cluster. The following is a description of the steps shown in fig. 14.
Step 1401: and the time sequence data processing model training device determines the noise parameters matched with the training sample set according to the updating parameters of the time sequence data processing model.
Wherein the noise parameter is used for characterizing the noise value of the parallel statement sample in the training sample set.
Step 1402: extracting time sequence data samples in the training sample set by a time sequence data processing model training device;
step 1403: and the time sequence data processing model training device trains the time sequence characteristic extraction network of the time sequence data processing model through the time sequence data sample so as to determine network parameters matched with the time sequence characteristic extraction network.
Step 1404: and when the noise parameter reaches the corresponding noise value threshold value, the time sequence data processing model training device iteratively updates the encoder parameter and the decoder parameter of the time sequence data processing model according to the noise value of the noise parameter until a loss function corresponding to a self-coding network formed by an encoder and a decoder of the time sequence data processing model meets the corresponding convergence condition.
In some embodiments of the present invention, the loss function of the encoder network is expressed as:

loss_A = Σ (decoder_A(encoder(warp(x1))) − x1)²

where decoder_A is decoder A, warp is the warping function applied to the time series data to be processed, x1 is the time series data to be processed, and encoder is the encoder.
In the iterative training process, the time series data to be processed is substituted into the loss function of the encoder network, and the parameters of encoder A and decoder A are solved for as the loss function decreases along the gradient (for example, the steepest gradient); when the loss function converges (i.e., when the hidden variable corresponding to the time series data to be processed can be formed), the training ends.
In the training process of the decoder network, the loss function of the decoder network is expressed as:

loss_B = Σ (decoder_B(encoder(warp(x2))) − x2)²

where decoder_B is decoder B, warp is the warping function applied to the time series data to be processed, x2 is the time series data to be processed, and encoder is the encoder.
In the iterative training process, the time series data to be processed is substituted into this loss function, and the parameters of encoder B and decoder B are solved for as the loss function decreases along the gradient (for example, the steepest gradient); when the loss function converges (i.e., when decoding can yield the selected probability of the data processing result corresponding to the time series data to be processed), the adjustment and training end.
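As a minimal sketch of the reconstruction loss loss_A above — assuming, purely for illustration, linear encoder and decoder maps and an identity warp function, none of which are prescribed by the invention — the following fragment evaluates the loss and takes one hand-derived gradient step:

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hid = 16, 4
    We = rng.normal(size=(d_in, d_hid)) * 0.1   # encoder weights (assumed linear)
    Wd = rng.normal(size=(d_hid, d_in)) * 0.1   # decoder_A weights (assumed linear)

    def warp(x):        # placeholder warp function; identity for illustration
        return x

    def encoder(x):
        return x @ We   # hidden variable

    def decoder_A(h):
        return h @ Wd   # reconstruction

    x1 = rng.normal(size=d_in)                  # time series data to be processed

    # loss_A = Σ (decoder_A(encoder(warp(x1))) − x1)²
    h = encoder(warp(x1))
    r = decoder_A(h)
    loss_A = np.sum((r - x1) ** 2)

    # One gradient-descent step on the linear maps (gradients derived by hand)
    g = 2.0 * (r - x1)                          # d loss / d r
    grad_Wd = np.outer(h, g)
    grad_We = np.outer(warp(x1), g @ Wd.T)
    lr = 0.01
    Wd -= lr * grad_Wd
    We -= lr * grad_We
    print(float(loss_A))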
In some embodiments of the invention, the method further comprises:
when the usage environment of the time series data processing model is a financial data monitoring process,
and adjusting parameters of the cyclic convolutional neural network based on the multiple attention mechanism in the decoder network according to the feature vectors corresponding to the multi-modal time sequence data of the time sequence data processing model so as to realize that the parameters of the cyclic convolutional neural network based on the multiple attention mechanism are matched with the multi-modal time sequence data fusion feature vectors. Therefore, the multi-modal financial data can be processed through the time series data processing model provided by the invention.
With continuing reference to fig. 15, fig. 15 is an alternative flowchart of the time series data processing model training method according to the embodiment of the present invention, and it can be understood that the steps shown in fig. 15 may be executed by various electronic devices operating the time series data processing model training apparatus, for example, a dedicated terminal with a time series data processing model training function, a server with a time series data processing model training function, or a server cluster. The following is a description of the steps shown in fig. 15.
Step 1501: and the time sequence data processing model training device carries out negative example processing on the training sample set to form a negative example sample set corresponding to the training sample set.
Wherein the negative sample set is used to adjust the encoder parameters and the decoder parameters of the time series data processing model.
Step 1502: and the time sequence data processing model training device randomly combines the processing results to be output in a decoder of the time sequence data processing model to form a negative sample set corresponding to the training sample set.
Step 1503: and the time sequence data processing model training device carries out random deletion processing or replacement processing on a processing result to be output in a decoder of the time sequence data processing model to form a negative example sample set corresponding to the training sample set.
Step 1504: determining a corresponding evaluation study value from the negative sample set.
And the evaluation research value is used as a supervision parameter to evaluate the processing result of the time series data processing model.
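Steps 1501 to 1503 can be pictured with the sketch below, which builds a negative sample set from a batch of to-be-output results by random combination, random deletion, and random replacement; the token-level granularity and the toy vocabulary are assumptions made for illustration:

    import random

    def make_negatives(outputs, vocab, seed=0):
        """Build a negative sample set from to-be-output results (token lists)."""
        rnd = random.Random(seed)
        negatives = []
        for tokens in outputs:
            shuffled = tokens[:]                  # random combination: shuffle the order
            rnd.shuffle(shuffled)
            negatives.append(shuffled)
            if len(tokens) > 1:                   # random deletion of one position
                dropped = tokens[:]
                dropped.pop(rnd.randrange(len(dropped)))
                negatives.append(dropped)
            replaced = tokens[:]                  # random replacement of one position
            replaced[rnd.randrange(len(replaced))] = rnd.choice(vocab)
            negatives.append(replaced)
        return negatives

    outputs = [["9.20", "--", "20"], ["9.21", "--", "18"]]
    print(make_negatives(outputs, vocab=["9.20", "9.21", "--", "18", "20"]))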
As described in detail below, the time series data processing model processing apparatus according to the embodiment of the present invention may be implemented in various forms, such as a dedicated terminal capable of running a time series data processing model, or a server with a time series data processing function, so as to generate a corresponding time series data processing result according to a to-be-processed statement received by an application program in the terminal (for example, the server 200 in the preceding FIG. 1). Fig. 16 is a schematic diagram of the composition structure of a time series data processing model processing apparatus according to an embodiment of the present invention; it can be understood that fig. 16 only shows an exemplary structure, and a part of or all of the structure shown in fig. 16 may be implemented as needed.
The time series data processing model processing device provided by the embodiment of the invention comprises: at least one processor 1601, a memory 1602, a user interface 1603, and at least one network interface 1604. The various components in the time series data processing model processing device 160 are coupled together by a bus system 1605. It will be appreciated that the bus system 1605 is used to enable connected communication between these components. In addition to the data bus, the bus system 1605 includes a power bus, a control bus, and a status signal bus; however, for clarity of illustration, the various buses are all labeled in FIG. 16 as the bus system 1605.
User interface 1603 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, a touch screen, or the like.
It will be appreciated that the memory 1602 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 1602 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
In some embodiments, the time series data processing model processing apparatus provided in the embodiments of the present invention may be implemented by a combination of software and hardware, and for example, the time series data processing model processing apparatus provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the time series data processing method of the time series data processing model provided in the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
As an example of the time series data processing model processing apparatus provided by the embodiment of the present invention implemented by combining software and hardware, the time series data processing model processing apparatus provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 1601, the software modules may be located in a storage medium, the storage medium is located in the memory 1602, the processor 1601 reads executable instructions included in the software modules in the memory 1602, and the time series data processing method provided by the embodiment of the present invention is completed in combination with necessary hardware (for example, including the processor 1601 and other components connected to the bus 1605).
By way of example, the Processor 1601 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc., wherein the general purpose Processor may be a microprocessor or any conventional Processor, etc.
As an example of the time series data processing model processing apparatus provided by the embodiment of the present invention being implemented by hardware, the apparatus provided by the embodiment of the present invention may be implemented by directly using the processor 1601 in the form of a hardware decoding processor, for example, the time series data processing method for implementing the time series data processing model provided by the embodiment of the present invention may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The memory 1602 in the present embodiment is used to store various types of data to support the operation of the time series data processing model processing apparatus 160. Examples of such data include: any executable instructions for operating on the time series data processing model processing apparatus 160; a program implementing the time series data processing method of the time series data processing model of the embodiment of the present invention may be included in these executable instructions.
In other embodiments, the time series data processing model processing apparatus provided in the embodiments of the present invention may be implemented in software, and fig. 16 shows the time series data processing model processing apparatus stored in the memory 1602, which may be software in the form of programs and plug-ins, and includes a series of modules, and as an example of the programs stored in the memory 1602, the time series data processing model processing apparatus may include the following software modules: an encoder module 16081, a timing feature extraction module 16082, and a decoder module 16083. When the software modules in the time series data processing model processing apparatus are read into the RAM by the processor 1601 and executed, the time series data processing method of the time series data processing model provided by the embodiment of the present invention is implemented, and the functions of each software module in the time series data processing model processing apparatus include:
the encoder module 16081 is configured to obtain time series data information to be processed, and convert the time series data information to be processed into corresponding identifiable vector information;
the encoder module 16081 is configured to determine at least one hidden variable corresponding to the vector information through an encoder network of the time-series data processing model;
a time sequence feature extraction module 16082, configured to determine, through a time sequence feature extraction network of the time sequence data processing model, a dynamic time warping processing result matched with the to-be-processed time sequence data information;
a decoder module 16083, configured to respond to the dynamic time warping processing result, generate, through a decoder network of the time-series data processing model, a data processing result corresponding to the hidden variable and a selected probability of the data processing result according to the at least one hidden variable;
the decoder module 16083, configured to compose a data processing result corresponding to the vector information according to the selected probability of the data processing result;
the decoder module 16083 is configured to output the data processing result.
Referring to fig. 17, fig. 17 is an optional flowchart of the time series data processing method of the time series data processing model provided in the embodiment of the present invention, and it can be understood that the steps shown in fig. 17 may be executed by various electronic devices operating the time series data processing model processing apparatus, such as a dedicated terminal with a time series data processing function, a server with a time series data processing function, or a server cluster. The following is a description of the steps shown in fig. 17.
Step 1701: the time sequence data processing model processing device acquires time sequence data information to be processed and converts the time sequence data information to be processed into corresponding identifiable vector information.
Step 1702: the time series data processing model processing device determines at least one hidden variable corresponding to the vector information through an encoder network of the time series data processing model.
Step 1703: and the time sequence data processing model processing device determines a dynamic time warping processing result matched with the time sequence data information to be processed through a time sequence characteristic extraction network of the time sequence data processing model.
Step 1704: and the time sequence data processing model processing device responds to the dynamic time warping processing result, passes through a decoder network of the time sequence data processing model, and generates a data processing result corresponding to the hidden variable and the selected probability of the data processing result according to the at least one hidden variable.
Step 1705: and the time sequence data processing model processing device forms a data processing result corresponding to the vector information according to the selected probability of the data processing result.
Step 1706: and the time sequence data processing model processing device outputs the data processing result.
Therefore, the processing of each type of time series data through the time series data processing model is realized.
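Steps 1701 to 1706 above can be summarized by the following pipeline sketch, in which the encoder network, time series feature extraction network, and decoder network are represented by illustrative stubs rather than the trained networks of the invention:

    import numpy as np

    def process_time_series(raw_series, encoder_net, dtw_feature_net, decoder_net):
        """Pipeline mirroring steps 1701-1706 (stubs stand in for the trained networks)."""
        vec = np.asarray(raw_series, dtype=float)           # 1701: convert to identifiable vector
        hidden = encoder_net(vec)                           # 1702: hidden variable(s)
        dtw_result = dtw_feature_net(vec)                   # 1703: dynamic time warping result
        candidates, probs = decoder_net(hidden, dtw_result) # 1704: results + selected probabilities
        best = candidates[int(np.argmax(probs))]            # 1705: compose by selected probability
        return best                                         # 1706: output the data processing result

    # Toy stubs, for illustration only
    out = process_time_series(
        [20.0, 18.5, 21.0],
        encoder_net=lambda v: v.mean(keepdims=True),
        dtw_feature_net=lambda v: v - v.min(),
        decoder_net=lambda h, f: (["rising", "falling"], np.array([0.3, 0.7])),
    )
    print(out)  # "falling"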
Taking the time series data as the power consumption data of a certain target user as an example, the method for training a time series data processing model provided by the embodiment of the present invention is described below. Fig. 18 is a schematic diagram of an application environment of the time series data processing model provided by the embodiment of the present invention. Referring to fig. 18, a terminal (including a terminal 10-1 and a terminal 10-2) is provided with a client capable of displaying different kinds of time series information (for example, a power consumption monitoring client or an electricity fee calculation client), through which a user may obtain different time series data and may also send the corresponding time series data to a server of the time series data processing model. The terminal is connected to the server 200 through a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two, and uses wireless links to realize data transmission.
Fig. 19 is an optional flowchart of the time series data processing model training method provided in the embodiment of the present invention, which specifically includes:
step 1901: and acquiring a power data training sample set matched with the use environment.
Wherein the set of training samples includes training samples of at least one set of power usage timing data.
Step 1902: and processing the electric quantity data training sample set through a time sequence data processing model to determine initial parameters of the time sequence data processing model.
Step 1903: responding to the initial parameters of the time sequence data processing model, when the initial parameters of the time sequence data processing model are kept unchanged, processing the time sequence data processing model through the output result of the time sequence data processing model and the time sequence characteristic extraction network dynamic time warping processing result of the time sequence data processing model, and determining the updating parameters of the time sequence data processing model;
step 1904: and according to the update parameters of the time sequence data processing model, carrying out iterative update on the encoder network parameters and the decoder network parameters of the time sequence data processing model through the training sample set.
Step 1905: and processing the power consumption data of the user through the time sequence data processing model to obtain a corresponding prediction result.
Fig. 20 is a schematic diagram of the working process of the time series feature extraction network in the time series data processing model according to an embodiment of the present invention. In a conventional deep neural network, the similarity between the original data and the fitting parameters needs to be calculated in the process of fitting an existing signal, and this similarity is typically computed as a Minkowski or Mahalanobis distance. When applied to time series data, however, such distances cannot capture the similarity between two signals that are offset in time by a Doppler-like effect. For example, a traditional deep neural network may directly describe the characteristics of time series data with a local feature extraction method so as to reflect the local structural characteristics of the data: an N-gram model takes N ordered entities in the data as a local structural unit, and then counts the occurrences of different units in one piece of data as the feature vector of that data. A deep learning method based on a Recurrent Neural Network (RNN) model (for example, LSTM (Long Short-Term Memory network), GRU (Gated Recurrent Unit), and the like) finds a feature representation of sequence data in a hidden space by depicting the context of each entity in the sequence; however, features extracted in these ways do not have semantics in a continuous space, and it is difficult to process global sequence information. Moreover, RNN-based methods incur huge computational overhead due to model complexity; the RNN model is difficult to learn long-distance information in time series data because of the vanishing gradient problem; the improved models LSTM and GRU are themselves difficult to train; and for data with large differences in sequence length, the training difficulty of RNN-based models increases further. These drawbacks make it difficult for a traditional time series data processing model to generate high-quality processing results, which in turn affects the processing and generation of large-scale time series data and degrades the user experience.
With reference to fig. 20, when the time series feature extraction network in the time series data processing model provided in the present application calculates similarity through Dynamic Time Warping (DTW), it can take into account the deviation caused by the Doppler effect, thereby avoiding misalignment in the signal time sequence. Referring specifically to fig. 21, fig. 21 is an optional flow diagram of the training method for the time series data processing model provided in an embodiment of the present invention, and the method includes:
step 2101: the corresponding data set Y is entered, and the label z.
Wherein x isi∈Rl,Y={(yi,zi)|yi∈Rn,zi∈Z=[1,Nclass]}
Step 2102: model parameters w are initialized, assuming that there is N (N ═ N)kernel) A DTW core, all DTW core parameters xi. And setting the cycle number T and the stop condition E, and triggering an iterative training process.
Step 2103: Collect a mini-batch for gradient descent from the data set, and compute the output of the current DTWNet in accordance with forward conduction, where (y, z) ∈ Y (the corresponding formulas are given as equation images in the original filing).
Step 2104: Record the path within the current DTW computation, thereby obtaining the determined f_t(x, y) function.
Step 2105: Compute the cross-entropy between the network output and the actual label z, and perform a backward transfer.
Step 2106: For each DTW kernel, update the kernel parameters only along the recorded path, according to the gradient of the loss.
Step 2107: Update all model parameters w and all DTW kernel parameters x_i using the SGD algorithm.
Step 2108: If the change across two passes, ΔL = |L_t − L_{t−1}|, is smaller than ε, terminate the loop to obtain G_{x,w}: R^n → Z.
The forward conduction and backward conduction involved in the above steps are explained below. Training sample data is input to the feature extraction network and finally reaches the output layer, which outputs a result; this is the forward conduction process of the feature extraction network. Because the output result of the feature extraction network has an error with respect to the actual result, the error between the output result and the actual value is calculated and conducted in the reverse direction from the output layer until it reaches the shallow layers; in the process of backward conduction, the values of the model parameters are adjusted according to the error. This process is iterated until convergence.
With continuing reference to fig. 22, fig. 22 is a schematic diagram of data transmission in the time series feature extraction network of the time series data processing model according to the embodiment of the present invention, in which the variable signal X (i.e., the learning parameter) is provided. In some embodiments of the present invention, a dotted-line signal may be randomly generated, the DTW distance and path found through Dynamic Time Warping (DTW), and DTW computed again with the learning signal of the second layer (the same dotted line, but with new learning parameters). Finally, the result passes through a fully connected layer, which gives the final label prediction.
The dynamic programming recursion is:

C_{i,j} = ||x_i − y_j|| + min{C_{i−1,j}, C_{i,j−1}, C_{i−1,j−1}} (1)

Here, C_{i,j} is the dynamic programming state at each node, x_i represents the variable signal to be learned, y_j represents the target signal, and i, j respectively denote the data point positions on the two signals. When the minimum DTW distance is found, the DTW path is also determined. It should be noted that during forward conduction, X is held constant while the path of the DTW changes.
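A minimal NumPy sketch of recursion (1), computing the DTW cost matrix for two 1-D signals and backtracking the optimal path, is given below; the 1-D absolute-difference cost and the quadratic-time implementation are illustrative choices, not requirements of the invention:

    import numpy as np

    def dtw(x, y):
        """DTW distance and optimal path via recursion (1): C[i,j] = |x_i - y_j| + min(...)."""
        n, m = len(x), len(y)
        C = np.full((n + 1, m + 1), np.inf)
        C[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                C[i, j] = abs(x[i - 1] - y[j - 1]) + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
        # Backtrack the optimal warping path from (n, m) to (1, 1)
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = int(np.argmin([C[i - 1, j - 1], C[i - 1, j], C[i, j - 1]]))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return C[n, m], path[::-1]

    x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])             # variable signal to be learned
    y = np.array([0.0, 0.5, 1.0, 2.0, 2.0, 1.0, 0.0])   # target signal
    dist, path = dtw(x, y)
    print(dist, path)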
The backward transfer mainly adjusts and learns the variable signal X through gradient transfer, so as to bring it closer to the original target signal Y. At this time, the path of the DTW is fixed. Assuming the DTW path p is determined, the distance calculation formula is:

D(x, y) = Σ_{(i,j)∈p} ||x_i − y_j|| (2)
The derivative formula with respect to X is:

∂D/∂x_i = Σ_{j:(i,j)∈p} (x_i − y_j)/||x_i − y_j|| (3)
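Assuming a 1-D signal, so that ||x_i − y_j|| reduces to an absolute value and formula (3) reduces to a sum of sign terms over the aligned pairs, one backward-conduction step on X with the path held fixed can be sketched as follows:

    import numpy as np

    def dtw_grad(x, y, path):
        """Gradient of D(x, y) = Σ_{(i,j)∈p} |x_i - y_j| w.r.t. x, with the path p held fixed."""
        g = np.zeros_like(x)
        for i, j in path:
            g[i] += np.sign(x[i] - y[j])   # d|x_i - y_j| / dx_i
        return g

    # One backward-conduction step: move the learnable signal X toward the target Y
    x = np.array([0.0, 2.0, 1.0])
    y = np.array([0.0, 1.0, 1.0])
    path = [(0, 0), (1, 1), (2, 2)]        # assumed fixed path from the forward pass
    lr = 0.1
    x -= lr * dtw_grad(x, y, path)
    print(x)                               # x moves toward y along the fixed alignment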
furthermore, in the training process, after iteration is carried out with different target signals Y for multiple times, a process of finally learning a universal signal core X in a data set can be expanded to multiple universal signal cores, namely, different X's are trained simultaneously.
The beneficial technical effects are as follows:
compared with a deep neural network in the prior art, the time sequence feature extraction network in the time sequence data processing model can consider the deviation caused by the Doppler effect through a dynamic time warping algorithm when calculating the similarity, thereby avoiding the dislocation in the signal time sequence, enabling the time sequence data processing model to generate a high-quality processing result, being beneficial to processing and generating large-scale time sequence data, and improving the use experience of users.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method for training a time series data processing model, the method comprising:
acquiring a training sample set, wherein the training sample set comprises at least one set of training samples of time sequence data;
processing the training sample set through a time sequence data processing model to determine initial parameters of the time sequence data processing model;
responding to the initial parameters of the time sequence data processing model, when the initial parameters of the time sequence data processing model are kept unchanged, processing the time sequence data processing model through the output result of the time sequence data processing model and the time sequence characteristic extraction network dynamic time warping processing result of the time sequence data processing model, and determining the updating parameters of the time sequence data processing model;
and according to the update parameters of the time sequence data processing model, iteratively updating the encoder network parameters and the decoder network parameters of the time sequence data processing model through the training sample set so as to process the samples containing the time sequence data through the time sequence data processing model.
2. The method of claim 1, wherein obtaining the set of training samples comprises:
determining a dynamic noise threshold value matched with the use environment of the time-series data processing model;
acquiring raw data in a data source corresponding to a use environment of the time-series data processing model;
denoising the original data according to the dynamic noise threshold value, and triggering a dynamic word segmentation strategy matched with the dynamic noise threshold value;
and performing word segmentation processing on the original data according to a dynamic word segmentation strategy matched with the dynamic noise threshold value to form a training sample set matched with the use environment of the time sequence data processing model.
3. The method of claim 1, wherein obtaining the set of training samples comprises:
determining a fixed noise threshold value matched with the use environment of the time-series data processing model;
acquiring raw data in a data source corresponding to a use environment of the time-series data processing model;
denoising the original data according to the fixed noise threshold value, and triggering a fixed word segmentation strategy matched with the fixed noise threshold value;
and performing word segmentation processing on the original data according to a fixed word segmentation strategy matched with the fixed noise threshold value to form a training sample set matched with the use environment of the time sequence data processing model.
4. The method of claim 1, wherein the determining updated parameters of the time series data processing model by processing the time series data processing model with output results of the time series data processing model and dynamic time warping processing results of a time series feature extraction network of the time series data processing model while keeping initial parameters of the time series data processing model constant in response to initial parameters of the time series data processing model comprises:
processing different training samples in the training sample set through a time sequence characteristic extraction network of the time sequence data processing model to determine a corresponding dynamic time warping processing result;
responding to the dynamic time warping processing result, substituting different training samples in the training sample set into a loss function corresponding to a self-coding network formed by an encoder and a decoder of the time sequence data processing model;
and determining parameters corresponding to an encoder and corresponding decoder parameters in the time sequence data processing model when the loss function meets a convergence condition as updating parameters of the time sequence data processing model.
5. The method of claim 4, wherein iteratively updating the encoder network parameters and decoder network parameters of the time series data processing model with the training sample set according to the updated parameters of the time series data processing model comprises:
determining a noise parameter matched with the training sample set according to the updated parameter of the time sequence data processing model, wherein the noise parameter is used for representing the noise value of the parallel statement samples in the training sample set;
when the noise parameter reaches the corresponding noise value threshold,
and iteratively updating the parameters of the encoder and the parameters of the decoder of the time sequence data processing model according to the noise value of the noise parameter until a loss function corresponding to a self-coding network formed by the encoder and the decoder of the time sequence data processing model meets a corresponding convergence condition.
6. The method of claim 4, further comprising:
extracting time sequence data samples in the training sample set;
and training a time sequence feature extraction network of the time sequence data processing model through the time sequence data sample so as to determine network parameters adaptive to the time sequence feature extraction network.
7. The method of claim 1, further comprising:
when the usage environment of the time series data processing model is a financial data monitoring process,
and adjusting parameters of the cyclic convolutional neural network based on the multiple attention mechanism in the decoder network according to the feature vectors corresponding to the multi-modal time sequence data of the time sequence data processing model so as to realize that the parameters of the cyclic convolutional neural network based on the multiple attention mechanism are matched with the multi-modal time sequence data fusion feature vectors.
8. A method of data processing in a time series data processing model, the method comprising:
acquiring time sequence data information to be processed, and converting the time sequence data information to be processed into corresponding identifiable vector information;
determining at least one hidden variable corresponding to the vector information through an encoder network of the time sequence data processing model;
determining a dynamic time warping processing result matched with the to-be-processed time series data information through a time series characteristic extraction network of the time series data processing model;
responding to the dynamic time warping processing result, generating a data processing result corresponding to the hidden variable and a selected probability of the data processing result according to the at least one hidden variable through a decoder network of the time sequence data processing model;
forming a data processing result corresponding to the vector information according to the selected probability of the data processing result;
outputting the data processing result;
wherein the time series data processing model is trained based on the method of any one of claims 1 to 7.
9. An apparatus for training a time series data processing model, the apparatus comprising:
the data transmission module is used for acquiring a training sample set, wherein the training sample set comprises at least one group of training samples of time sequence data;
the time sequence data processing model training module is used for processing the training sample set through a time sequence data processing model so as to determine initial parameters of the time sequence data processing model;
the time sequence data processing model training module is used for responding to the initial parameters of the time sequence data processing model, and when the initial parameters of the time sequence data processing model are kept unchanged, processing the time sequence data processing model through the output result of the time sequence data processing model and the dynamic time warping processing result of the time sequence characteristic extraction network of the time sequence data processing model, and determining the updating parameters of the time sequence data processing model;
and the time sequence data processing model training module is used for carrying out iterative updating on the encoder network parameter and the decoder network parameter of the time sequence data processing model through the training sample set according to the updating parameter of the time sequence data processing model so as to realize the processing of the sample containing the time sequence data through the time sequence data processing model.
10. The apparatus of claim 9,
the data transmission module is used for determining a dynamic noise threshold value matched with the use environment of the time sequence data processing model;
the data transmission module is used for acquiring original data from a data source corresponding to the use environment of the time sequence data processing model;
the data transmission module is used for denoising the original data according to the dynamic noise threshold value and triggering a dynamic word segmentation strategy matched with the dynamic noise threshold value;
and the data transmission module is used for performing word segmentation processing on the original data according to a dynamic word segmentation strategy matched with the dynamic noise threshold value to form a training sample set matched with the use environment of the time sequence data processing model.
11. The apparatus of claim 9,
the data transmission module is used for determining a fixed noise threshold value matched with the use environment of the time sequence data processing model;
the data transmission module is used for acquiring original data from a data source corresponding to the use environment of the time sequence data processing model;
the data transmission module is used for denoising the original data according to the fixed noise threshold value and triggering a fixed word segmentation strategy matched with the fixed noise threshold value;
and the data transmission module is used for performing word segmentation processing on the original data according to a fixed word segmentation strategy matched with the fixed noise threshold value to form a training sample set matched with the use environment of the time sequence data processing model.
12. The apparatus of claim 9,
the time sequence data processing model training module is used for processing different training samples in the training sample set through a time sequence characteristic extraction network of the time sequence data processing model so as to determine a corresponding dynamic time warping processing result;
the time sequence data processing model training module is used for responding to the dynamic time warping processing result, substituting different training samples in the training sample set into a loss function corresponding to a self-coding network formed by an encoder and a decoder of the time sequence data processing model;
and the time sequence data processing model training module is used for determining that the parameters corresponding to the encoder and the corresponding decoder in the time sequence data processing model are used as the update parameters of the time sequence data processing model when the loss function meets the convergence condition.
13. The apparatus of claim 12,
the time sequence data processing model training module is used for determining a noise parameter matched with the training sample set according to an updating parameter of the time sequence data processing model, and the noise parameter is used for representing a noise value of a parallel statement sample in the training sample set;
and the time sequence data processing model training module is used for carrying out iterative updating on the parameters of an encoder and a decoder of the time sequence data processing model according to the noise value of the noise parameter when the noise parameter reaches the corresponding noise value threshold value until a loss function corresponding to a self-encoding network formed by the encoder and the decoder of the time sequence data processing model meets the corresponding convergence condition.
14. A time series data processing model processing apparatus, characterized in that the apparatus comprises:
the encoder module is used for acquiring time sequence data information to be processed and converting the time sequence data information to be processed into corresponding identifiable vector information;
the encoder module is used for determining at least one hidden variable corresponding to the vector information through an encoder network of a time sequence data processing model;
the time sequence characteristic extraction module is used for determining a dynamic time warping processing result matched with the to-be-processed time sequence data information through a time sequence characteristic extraction network of the time sequence data processing model;
a decoder module, configured to generate, in response to the dynamic time warping processing result, a data processing result corresponding to the hidden variable and a selected probability of the data processing result according to the at least one hidden variable through a decoder network of the time series data processing model;
and the decoder module is used for forming a data processing result corresponding to the vector information according to the selected probability of the data processing result.
CN201911252467.4A 2019-12-09 2019-12-09 Time sequence data processing model training method, data processing method, device and storage medium Active CN111027681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911252467.4A CN111027681B (en) 2019-12-09 2019-12-09 Time sequence data processing model training method, data processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911252467.4A CN111027681B (en) 2019-12-09 2019-12-09 Time sequence data processing model training method, data processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111027681A true CN111027681A (en) 2020-04-17
CN111027681B CN111027681B (en) 2023-06-27

Family

ID=70205032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911252467.4A Active CN111027681B (en) 2019-12-09 2019-12-09 Time sequence data processing model training method, data processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111027681B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018191344A1 (en) * 2017-04-14 2018-10-18 Salesforce.Com, Inc. Neural machine translation with latent tree attention
JP2019159988A (en) * 2018-03-15 2019-09-19 株式会社豊田中央研究所 Neural network device and program
CN108776832A (en) * 2018-06-05 2018-11-09 腾讯科技(深圳)有限公司 Information processing method, device, computer equipment and storage medium
CN109271643A (en) * 2018-08-08 2019-01-25 北京捷通华声科技股份有限公司 A kind of training method of translation model, interpretation method and device
CN109146064A (en) * 2018-09-05 2019-01-04 腾讯科技(深圳)有限公司 Neural network training method, device, computer equipment and storage medium
CN109543195A (en) * 2018-11-19 2019-03-29 腾讯科技(深圳)有限公司 A kind of method, the method for information processing and the device of text translation
CN110162799A (en) * 2018-11-28 2019-08-23 腾讯科技(深圳)有限公司 Model training method, machine translation method and relevant apparatus and equipment
CN109862587A (en) * 2018-12-18 2019-06-07 ***通信集团云南有限公司 Mobile network quality appraisal procedure based on multiple features time series and self-encoding encoder
CN110263349A (en) * 2019-03-08 2019-09-20 腾讯科技(深圳)有限公司 Corpus assessment models training method, device, storage medium and computer equipment
CN110046359A (en) * 2019-04-16 2019-07-23 苏州大学 Neural machine translation method based on sample guidance
CN110147843A (en) * 2019-05-22 2019-08-20 哈尔滨工程大学 Voice Time Series Similar measure based on metric learning
CN110163181A (en) * 2019-05-29 2019-08-23 中国科学技术大学 Sign Language Recognition Method and device
CN110222164A (en) * 2019-06-13 2019-09-10 腾讯科技(深圳)有限公司 A kind of Question-Answering Model training method, problem sentence processing method, device and storage medium
CN110334360A (en) * 2019-07-08 2019-10-15 腾讯科技(深圳)有限公司 Machine translation method and device, electronic equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"触觉信息表征技术与分类感知试验" *
ILYA SUTSKEVER 等: "Sequence to Sequence Learning with Neural Networks" *
MARCO CUTURI 等: "Soft-DTW: a Differentiable Loss Function for Time-Series" *
VINCENT LE GUEN 等: "Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models" *
XINGYU CAI 等: "DTWNet: a Dynamic Time Warping Network" *
HAN Haoxian et al.: "Collaborative Filtering Algorithm Based on Clustering Variational Autoencoder" *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401558A (en) * 2020-06-05 2020-07-10 腾讯科技(深圳)有限公司 Data processing model training method, data processing device and electronic equipment
CN111401558B (en) * 2020-06-05 2020-10-09 腾讯科技(深圳)有限公司 Data processing model training method, data processing device and electronic equipment
CN112348068A (en) * 2020-10-28 2021-02-09 东南大学 Time sequence data clustering method based on noise reduction encoder and attention mechanism
CN112200308A (en) * 2020-11-17 2021-01-08 上海优扬新媒信息技术有限公司 Time sequence data processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN111027681B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
US11816442B2 (en) Multi-turn dialogue response generation with autoregressive transformer models
CN110956018B (en) Training method of text processing model, text processing method, text processing device and storage medium
WO2022161202A1 (en) Multimedia resource classification model training method and multimedia resource recommendation method
Zhou et al. Deep semantic dictionary learning for multi-label image classification
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN113792113A (en) Visual language model obtaining and task processing method, device, equipment and medium
CN111461157B (en) Self-learning-based cross-modal Hash retrieval method
CN111027681B (en) Time sequence data processing model training method, data processing method, device and storage medium
Gupta et al. ALMNet: Adjacent layer driven multiscale features for salient object detection
WO2021238333A1 (en) Text processing network, neural network training method, and related device
Wang et al. Semantic supplementary network with prior information for multi-label image classification
CN111898704B (en) Method and device for clustering content samples
CN111144093A (en) Intelligent text processing method and device, electronic equipment and storage medium
CN114694255B (en) Sentence-level lip language recognition method based on channel attention and time convolution network
Zhu et al. Multi-scale temporal network for continuous sign language recognition
CN111930981A (en) Data processing method for sketch retrieval
CN110659759A (en) Neural network based trend prediction
CN112307179A (en) Text matching method, device, equipment and storage medium
WO2023116572A1 (en) Word or sentence generation method and related device
Wang et al. Convolution-enhanced evolving attention networks
CN115204295A (en) Training and recommending method and device for comparison learning sequence based on self-guiding mechanism
CN115221960A (en) Training method, training device and recommendation method of recommendation model based on two-way transformations
Schneider et al. A survey of deep learning: From activations to transformers
CN109063934B (en) Artificial intelligence-based combined optimization result obtaining method and device and readable medium
Thapak et al. Transformer++

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40022190

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant