CN113536800A - Word vector representation method and device - Google Patents

Word vector representation method and device

Info

Publication number
CN113536800A
Authority
CN
China
Prior art keywords
word
subword
sub
vector representation
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010286727.6A
Other languages
Chinese (zh)
Inventor
李长亮
毛颖
唐剑波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Digital Entertainment Co Ltd filed Critical Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN202010286727.6A priority Critical patent/CN113536800A/en
Publication of CN113536800A publication Critical patent/CN113536800A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a word vector representation method and apparatus. The method includes: acquiring a subword set corresponding to a target word example, the subword set containing the vector representation of each subword of the target word example; dividing the subword set into at least one subword sequence, each subword sequence containing the vector representation of at least one subword; performing feature extraction on the at least one subword sequence to obtain a feature map corresponding to each subword sequence; and performing a pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example. By appending an operation similar to a convolutional neural network after the vector representation of each subword is obtained, the method maps the subwords back to their word example, yielding a word-example-level vector representation and strengthening the word vector model's ability to extract the textual information of word examples.

Description

Word vector representation method and device
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a word vector representation method, apparatus, computing device, and computer-readable storage medium.
Background
For natural language processing tasks, the attention mechanism of the BERT model is usually adopted to extract text information, and the model outputs a vector representation for every subword. In practice, neural network layers for different downstream tasks are stacked on top of the BERT model, and the whole network is fine-tuned with task-specific loss functions to produce a network structure suited to the specific task. To mitigate incomplete vocabulary coverage, the BERT model introduces the concept of subwords, so the vector representations output by its last layer are generally not aggregated at the word-example level. Because downstream tasks usually take the word example as the smallest processing unit, how to associate subwords with their word example and obtain a vector representation of the word example has become an urgent problem to be solved.
Disclosure of Invention
In view of the above, embodiments of the present application provide a method, an apparatus, a computing device, and a computer-readable storage medium for word vector representation, so as to solve technical defects in the prior art.
According to a first aspect of embodiments of the present specification, there is provided a word vector representation method, including:
acquiring a subword set corresponding to a target word example, wherein the subword set comprises the vector representation of each subword corresponding to the target word example;
dividing the subword set corresponding to the target word example into at least one subword sequence, wherein each subword sequence comprises the vector representation of at least one subword;
performing feature extraction on the at least one subword sequence to obtain a feature map corresponding to each subword sequence; and
performing a pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example.
According to a second aspect of embodiments of the present specification, there is provided a word vector representation apparatus, including:
a vector representation module configured to acquire a subword set corresponding to a target word example, wherein the subword set comprises the vector representation of each subword corresponding to the target word example;
a subword dividing module configured to divide the subword set corresponding to the target word example into at least one subword sequence, wherein each subword sequence comprises the vector representation of at least one subword;
a convolution module configured to perform feature extraction on the at least one subword sequence to obtain a feature map corresponding to each subword sequence; and
a pooling module configured to perform a pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example.
According to a third aspect of embodiments herein, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the word vector representation method when executing the instructions.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the word vector representation method.
According to the present application, after the vector representation of each subword corresponding to the target word example is obtained from the word vector model, an operation similar to a convolutional neural network is appended: a preset filter is used as a sliding window to divide the subword set corresponding to the target word example into several subword sequences, feature extraction is performed on each subword sequence, a pooling operation is performed on the feature map corresponding to each subword sequence, and the vector representation corresponding to the target word example is finally obtained. The subwords are thereby associated with their word example to form a word-example-level vector representation, which strengthens the word vector model's ability to extract the textual information of word examples and facilitates the use and fine-tuning of downstream natural language processing tasks.
Drawings
FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 2 is a flow chart of a word vector representation method provided in an embodiment of the present application;
FIG. 3 is another flow chart of a word vector representation method provided by an embodiment of the present application;
FIG. 4 is a diagram illustrating a method for representing word vectors according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a word vector representing apparatus according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second and, similarly, a second may also be referred to as a first without departing from the scope of one or more embodiments of the present description. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the terms involved in one or more embodiments of the present application are explained.
BERT model: a pre-trained bidirectional attention neural network model. The BERT model aims to obtain, through training on large-scale unlabeled corpora, a semantic representation of a text that contains rich semantic information; this representation is then fine-tuned for a specific natural language processing (NLP) task and finally applied to that task.
Transformer model: the Transformer model abandons the traditional convolutional neural network (CNN) and recurrent neural network (RNN); its entire network structure is composed solely of attention mechanisms, more precisely of self-attention layers and feed-forward neural networks. A trainable neural network can be built by stacking Transformer blocks, for example an encoder-decoder model with 6 layers in the encoder and 6 layers in the decoder, 12 layers in total.
Convolutional neural network: a convolutional neural network includes a feature extractor consisting of convolutional layers and sub-sampling layers. In a convolutional layer, each neuron is connected to only part of the neurons of the neighboring layer. A convolutional layer usually contains several feature maps; each feature map is composed of neurons arranged in a rectangle, and the neurons of the same feature map share weights, these shared weights being the convolution kernel. The convolution kernel is generally initialized as a matrix of small random numbers and learns reasonable weights during training of the network.
Word example: also known as a token. Before any actual processing, the input text needs to be segmented into linguistic units such as words, punctuation, numbers, or alphanumerics; these units are referred to as word examples.
Subword: a word example can be segmented into several subwords, which can be understood as parts of the word example, such as a root, a prefix, or a suffix.
ReLU function: a rectified linear unit that can be used as the activation function of a neural network to introduce non-linearity between its layers. Taking the vector [0.2, -1, 0, 0.8] as an example, it becomes [0.2, 0, 0, 0.8] after the ReLU function.
Target word example: any word example (token t) required by the downstream natural language processing task.
Subword set: the set of vector representations obtained by inputting a target word example into the word vector model, in which each subword of the word example is fused with full-text semantic information.
Subword sequence: a set of vector representations of at least one subword, obtained after the subword set is divided by a filter.
Feature map: also known as a feature mapping or activation map; the semantic features obtained after the vector representations of subwords are processed by a convolution operation and an activation function.
In the present application, a word vector representation method, apparatus, computing device and computer-readable storage medium are provided, which are described in detail in the following embodiments one by one.
FIG. 1 shows a block diagram of a computing device 100, according to an embodiment of the present description. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.
Computing device 100 also includes access device 140, which enables computing device 100 to communicate via one or more networks 160. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. Access device 140 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC), an IEEE 802.11 wireless local area network (WLAN) interface, a worldwide interoperability for microwave access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so on.
In one embodiment of the present description, the above-described components of computing device 100 and other components not shown in FIG. 1 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
Wherein the processor 120 may perform the steps of the method shown in fig. 2. Fig. 2 is a schematic flow chart diagram illustrating a word vector representation method according to an embodiment of the present application, including steps 202 to 208.
Step 202: acquiring a subword set containing the vector representation of each subword corresponding to the target word example.
In an embodiment of the present application, acquiring the subword set containing the vector representation of each subword corresponding to the target word example includes:
inputting the text information corresponding to the target word example into a pre-trained BERT model to obtain the vector representation of each subword corresponding to the target word example, so as to form the subword set corresponding to the target word example.
In the above embodiment, the system or terminal of the present application first obtains the text corpus corresponding to the target word example and segments it into several subwords; for example, the English word example "playing" is segmented into "play" and "##ing". Each subword of the text corpus is then converted into a word embedding (token embedding) by looking up the word vector table, and the segment embedding and the position embedding, which describe the position of each subword within the global semantic information, are obtained automatically. The word embedding, segment embedding and position embedding of each subword of the target word example are superimposed to obtain the text information corresponding to the target word example, which is used as the input of the BERT model. The BERT model distinguishes the influence of different parts of the input on the output through multiple layers of attention mechanisms and finally outputs the vector representation of each subword corresponding to the target word example, forming the subword set corresponding to the target word example. Taking the target word example "token t" as an example, if it is divided into n subwords, the subword set output by the BERT model for "token t" is {s(1), s(2), …, s(n)}, where n is a positive integer greater than or equal to 1 and s(n) is the vector representation of the nth subword of the target word example.
By using the BERT model, the present application obtains a vector representation of the text corpus corresponding to the target word example that contains rich semantic information. Used as the basis for downstream natural language task processing, it offers strong generality and accuracy, ensures the application effect of the vector representation of each subword of the target word example "token t", and improves the operation efficiency of the whole technical solution.
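For illustration, the following sketch shows one way step 202 could be realized with a publicly available pre-trained BERT model; the model name, the sample sentence and the way subword positions are mapped back to their word example are assumptions of the sketch rather than features of the present application.

```python
# Illustrative sketch only: obtaining the subword set {s(1), ..., s(n)} of a
# target word example from a pre-trained BERT model. The model name
# "bert-base-uncased" and the sample sentence are assumptions.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

text = "the kids are playing outside"
encoding = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**encoding).last_hidden_state[0]        # [seq_len, 768]

# word_ids() maps each subword position back to the word example it came from,
# so the vector representations of all subwords of one word example can be collected.
word_ids = encoding.word_ids(0)
target_word_index = 3                                      # the word example "playing"
subword_set = torch.stack(
    [hidden[i] for i, w in enumerate(word_ids) if w == target_word_index]
)                                                          # [n, 768] = {s(1), ..., s(n)}
```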
Step 204: dividing the subword set corresponding to the target word example into at least one subword sequence, wherein each subword sequence comprises the vector representation of at least one subword.
In an embodiment of the present application, as shown in FIG. 3, dividing the subword set corresponding to the target word example into at least one subword sequence includes steps 302 to 304.
Step 302: forming a sliding window of the corresponding scale over the subword set according to a preset filter.
In the above embodiment, the system or terminal of the present application forms, according to a preset filter, a sliding window of the corresponding size k over the subword set, where the value of k is determined experimentally and may be set to 2, 3, or another suitable value; when k is set to 2 or 3, the sliding window covers the vector representations of 2 or 3 subwords in the subword set. Taking the subword set {s(1), s(2), …, s(n)} output by the BERT model for the target word example "token t" as an example, when k is set to 3, the sliding window of size 3 may cover {s(1), s(2), s(3)}, {s(2), s(3), s(4)}, and so on.
By forming the sliding window in the manner of a filter in a convolutional neural network, the present application facilitates subsequent mathematical operations such as convolution and lays the foundation for extracting the subsequent feature maps.
Step 304: traversing the subword set with the sliding window according to a preset step length, obtaining at each step the vector representations of the subwords covered by the sliding window, so as to obtain at least one subword sequence.
In the above embodiment, the system or terminal of the present application divides the subword set {s(1), s(2), …, s(n)} output by the BERT model for the target word example "token t" with a sliding window of size k and the preset step length, obtaining the subword sequences {s(1), s(2), …, s(k)}, {s(2), s(3), …, s(k+1)}, …, {s(n-k+1), …, s(n)} corresponding to the target word example "token t".
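As an illustrative sketch only, the division of step 304 may be written as follows; the stride of 1 and the handling of word examples with fewer than k subwords are assumptions of the sketch.

```python
# Illustrative sketch: sliding a window of size k over the subword set to form
# the subword sequences of step 304.
import torch

def split_into_subword_sequences(subword_set: torch.Tensor, k: int = 3, stride: int = 1):
    """subword_set: [n, dim] tensor -> list of [k, dim] subword sequences."""
    n = subword_set.size(0)
    if n < k:
        # Assumption: a word example with fewer than k subwords yields a single
        # sequence containing all of its subwords (padding would be another option).
        return [subword_set]
    return [subword_set[i:i + k] for i in range(0, n - k + 1, stride)]
```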
By scanning the subword set corresponding to the target word example with a sliding window of size k, the present application imitates the feature extraction of a convolutional neural network, extracts the feature information within the target word example, and conveniently associates the target word example with its subwords.
Step 206: performing feature extraction on the at least one subword sequence to obtain a feature map corresponding to each subword sequence.
In an embodiment of the present application, performing feature extraction on the at least one subword sequence to obtain the feature map corresponding to each subword sequence includes:
performing a convolution operation on the vector representations of the subwords in each subword sequence and applying non-linear activation through the ReLU function to obtain the feature map corresponding to each subword sequence.
In the above embodiment, the system or terminal of the present application performs the convolution operation on the vector representations s(1) … s(n) of the subwords in each of the subword sequences {s(1), s(2), …, s(k)}, {s(2), s(3), …, s(k+1)}, …, {s(n-k+1), …, s(n-1), s(n)} corresponding to the target word example "token t", and applies the activation function to obtain the corresponding feature maps.
The convolution operation can be expressed as:
W * concat(s(1), s(2), …, s(k)) + b (1)
In formula (1), W is a weight parameter obtained by training, concat denotes the matrix or vector formed by connecting several vectors, s(k) is the vector representation of the kth subword corresponding to the target word example, and b is a bias parameter obtained by training.
Meanwhile, the ReLU function can be expressed as:
f(x) = max(0, x) (2)
In formula (2), the ReLU function takes the maximum of the input x and 0, i.e. it outputs 0 when x is negative and leaves x unchanged when x is positive, where x may be a value of the feature map corresponding to a subword sequence; an output of 0 means the neuron is not activated, which realizes one-sided suppression and introduces a non-linear characteristic into the whole convolution process.
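A minimal sketch of formulas (1) and (2) is given below; treating W and b as a single trainable linear layer and choosing the output width equal to the subword dimension (768) are assumptions of the sketch, not requirements stated above.

```python
# Illustrative sketch: formula (1), W * concat(s(1), ..., s(k)) + b, followed by
# formula (2), f(x) = max(0, x), producing the feature map of one subword sequence.
import torch
import torch.nn as nn

class SubwordSequenceConv(nn.Module):
    def __init__(self, k: int = 3, dim: int = 768):
        super().__init__()
        self.linear = nn.Linear(k * dim, dim)   # holds the trained W and b
        self.relu = nn.ReLU()                   # f(x) = max(0, x)

    def forward(self, sequence: torch.Tensor) -> torch.Tensor:
        flat = sequence.reshape(-1)             # concat(s(1), ..., s(k)) -> [k * dim]
        return self.relu(self.linear(flat))     # feature map of this subword sequence
```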
After the vector representations of the subwords are obtained, an operation similar to a convolutional neural network is appended: the convolution operation is performed on the subword sequences corresponding to the target word example and the result is rectified by the activation function, yielding the feature map corresponding to each subword sequence. The relationship between the subwords and the word example is thereby extracted, and the output of the BERT model is raised step by step from the subword level to the word-example level.
Step 208: performing a pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example.
In an embodiment of the present application, performing the pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example includes:
taking the maximum value, in each dimension, over the feature maps corresponding to the subword sequences to form the vector representation corresponding to the target word example.
In the above embodiment, the system or terminal of the present application performs max pooling on the feature maps corresponding to the subword sequences: when the dimension of the vector representation of each subword of the target word example "token t" is w, where w is a positive integer greater than or equal to 1, the maximum value in each dimension is selected over the feature maps corresponding to the subword sequences {s(1), s(2), …, s(k)}, {s(2), s(3), …, s(k+1)}, …, {s(n-k+1), …, s(n-1), s(n)}, and the resulting w-dimensional vector is taken as the vector representation corresponding to the target word example "token t".
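The max pooling of step 208 can be sketched as follows; stacking the feature maps into one tensor before taking the per-dimension maximum is simply an implementation convenience assumed here.

```python
# Illustrative sketch: step 208, per-dimension max pooling over the feature maps
# of all subword sequences to obtain the word-example-level vector representation.
import torch

def max_pool_feature_maps(feature_maps):
    """feature_maps: list of [w] tensors -> [w] vector for the target word example."""
    stacked = torch.stack(feature_maps, dim=0)   # [num_sequences, w]
    token_vector, _ = stacked.max(dim=0)         # maximum in each of the w dimensions
    return token_vector
```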
According to the present application, after the vector representation of each subword corresponding to the target word example is obtained from the word vector model, an operation similar to a convolutional neural network is appended: a preset filter is used as a sliding window to divide the subword set corresponding to the target word example into several subword sequences, each subword sequence is convolved and passed through the ReLU activation function, a pooling operation is then performed on the feature map corresponding to each subword sequence, and the vector representation corresponding to the target word example is finally obtained. The subwords are thereby associated with their word example to form a word-example-level vector representation, which strengthens the word vector model's extraction of the textual information of word examples and facilitates the use and fine-tuning of downstream natural language processing tasks.
FIG. 4 illustrates a word vector representation method according to an embodiment of the present specification, described here with the word example "retrieving" and including steps 402 to 408.
Step 402: inputting the text information of the word example "retrieving" into the trained BERT model to obtain the subword set s corresponding to the word example "retrieving", where the text information includes the text corpus of the word example "retrieving" and the original word vectors, the subword set s is {"re", "##ons", "##struct", "##ing"}, and the vector of each subword in the subword set s has 768 dimensions.
Step 404: dividing the subword set s with a sliding window of preset size 3 to obtain two subword sequences to be convolved and activated, the subword sequence s1: {"re", "##ons", "##struct"} and the subword sequence s2: {"##ons", "##struct", "##ing"}.
Step 406: convolving the subword sequence s1 and the subword sequence s2 respectively and selecting the output values of some of the neurons by means of the ReLU function. In the convolution operation,
taking the subword sequence s1 as an example, the convolution can be expressed as:
W * concat(e("re"), e("##ons"), e("##struct")) + b (3)
In formula (3), W is a weight parameter obtained by training, concat denotes the matrix or vector formed by connecting several vectors, e denotes the vector representation corresponding to each subword, and b is a bias parameter obtained by training.
Meanwhile, the ReLU function can be expressed as:
f(x) = max(0, x) (4)
In formula (4), the ReLU function takes the maximum of the input x and 0, i.e. it outputs 0 when x is negative and leaves x unchanged when x is positive, where x may be a value of the feature map corresponding to a subword sequence; an output of 0 means the neuron is not activated, which realizes one-sided suppression and introduces a non-linear characteristic into the whole convolution process.
This yields the feature map fm1 corresponding to the subword sequence s1 and the feature map fm2 corresponding to the subword sequence s2, both of which are 768-dimensional vectors.
Step 408: performing a max pooling operation on the feature map fm1 and the feature map fm2, i.e. selecting the maximum value in each dimension of the two 768-dimensional vectors, finally forming the 768-dimensional vector representation corresponding to the word example "retrieving".
Corresponding to the above method embodiments, this specification further provides an embodiment of a word vector representation apparatus; FIG. 5 shows a schematic structural diagram of the word vector representation apparatus according to an embodiment of this specification. As shown in FIG. 5, the apparatus includes:
a vector representation module 501 configured to acquire a subword set corresponding to a target word example, where the subword set includes the vector representation of each subword corresponding to the target word example;
a subword dividing module 502 configured to divide the subword set corresponding to the target word example into at least one subword sequence, where each subword sequence includes the vector representation of at least one subword;
a convolution module 503 configured to perform feature extraction on the at least one subword sequence to obtain a feature map corresponding to each subword sequence;
a pooling module 504 configured to perform a pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example.
Optionally, the vector representation module 501 includes:
a vector generation unit configured to input the text information corresponding to the target word example into a pre-trained BERT model to obtain the vector representation of each subword corresponding to the target word example, so as to form the subword set corresponding to the target word example.
Optionally, the subword dividing module 502 includes:
a window configuration unit configured to form a sliding window of the corresponding scale over the subword set according to a preset filter;
a window dividing unit configured to traverse the subword set with the sliding window according to a preset step length, obtaining at each step the vector representations of the subwords covered by the sliding window, so as to obtain at least one subword sequence.
Optionally, the convolution module 503 includes:
a convolution operation unit configured to perform a convolution operation on the vector representations of the subwords in each subword sequence and apply non-linear activation through the ReLU function to obtain the feature map corresponding to each subword sequence.
Optionally, the pooling module 504 includes:
a max pooling operation unit configured to take the maximum value, in each dimension, over the feature maps corresponding to the subword sequences to form the vector representation corresponding to the target word example.
According to the present application, after the vector representation of each subword corresponding to the target word example is obtained from the word vector model, an operation similar to a convolutional neural network is appended: a preset filter is used as a sliding window to divide the subword set corresponding to the target word example into several subword sequences, each subword sequence is convolved and passed through the ReLU activation function, a pooling operation is then performed on the feature map corresponding to each subword sequence, and the vector representation corresponding to the target word example is finally obtained. The subwords are thereby associated with their word example, which strengthens the word vector model's extraction of the textual information of word examples and facilitates the use and fine-tuning of downstream natural language processing tasks.
An embodiment of the present application further provides a computing device, including a memory, a processor, and computer instructions stored on the memory and executable on the processor, where the processor, when executing the instructions, implements the following steps:
acquiring a subword set corresponding to a target word example, wherein the subword set comprises the vector representation of each subword corresponding to the target word example;
dividing the subword set corresponding to the target word example into at least one subword sequence, wherein each subword sequence comprises the vector representation of at least one subword;
performing feature extraction on the at least one subword sequence to obtain a feature map corresponding to each subword sequence; and
performing a pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions, which when executed by a processor, implement the steps of the word vector representation method as described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the computer-readable storage medium and the technical solution of the word vector representation method belong to the same concept, and details that are not described in detail in the technical solution of the computer-readable storage medium can be referred to the description of the technical solution of the word vector representation method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code which may be in the form of source code, object code, an executable file or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, etc. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (12)

1. A word vector representation method, comprising:
acquiring a subword set containing the vector representation of each subword corresponding to a target word example;
dividing the subword set corresponding to the target word example into at least one subword sequence, wherein each subword sequence comprises the vector representation of at least one subword;
performing feature extraction on the at least one subword sequence to obtain a feature map corresponding to each subword sequence; and
performing a pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example.
2. The method according to claim 1, wherein acquiring the subword set containing the vector representation of each subword corresponding to the target word example comprises:
inputting the text information corresponding to the target word example into a pre-trained BERT model to obtain the vector representation of each subword corresponding to the target word example, so as to form the subword set corresponding to the target word example.
3. The method according to claim 1, wherein dividing the subword set corresponding to the target word example into at least one subword sequence comprises:
forming a sliding window of the corresponding scale over the subword set according to a preset filter; and
traversing the subword set with the sliding window according to a preset step length, obtaining at each step the vector representations of the subwords covered by the sliding window, so as to obtain at least one subword sequence.
4. The method according to claim 1, wherein performing feature extraction on the at least one subword sequence to obtain the feature map corresponding to each subword sequence comprises:
performing a convolution operation on the vector representations of the subwords in each subword sequence and applying non-linear activation through a ReLU function to obtain the feature map corresponding to each subword sequence.
5. The method according to claim 1, wherein performing the pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example comprises:
taking the maximum value, in each dimension, over the feature maps corresponding to the subword sequences to form the vector representation corresponding to the target word example.
6. A word vector representation apparatus, comprising:
a vector representation module configured to acquire a subword set corresponding to a target word example, wherein the subword set comprises the vector representation of each subword corresponding to the target word example;
a subword dividing module configured to divide the subword set corresponding to the target word example into at least one subword sequence, wherein each subword sequence comprises the vector representation of at least one subword;
a convolution module configured to perform feature extraction on the at least one subword sequence to obtain a feature map corresponding to each subword sequence; and
a pooling module configured to perform a pooling operation on the feature map corresponding to each subword sequence to obtain the vector representation corresponding to the target word example.
7. The apparatus according to claim 6, wherein the vector representation module comprises:
a vector generation unit configured to input the text information corresponding to the target word example into a pre-trained BERT model to obtain the vector representation of each subword corresponding to the target word example, so as to form the subword set corresponding to the target word example.
8. The apparatus according to claim 6, wherein the subword dividing module comprises:
a window configuration unit configured to form a sliding window of the corresponding scale over the subword set according to a preset filter; and
a window dividing unit configured to traverse the subword set with the sliding window according to a preset step length, obtaining at each step the vector representations of the subwords covered by the sliding window, so as to obtain at least one subword sequence.
9. The apparatus according to claim 6, wherein the convolution module comprises:
a convolution operation unit configured to perform a convolution operation on the vector representations of the subwords in each subword sequence and apply non-linear activation through a ReLU function to obtain the feature map corresponding to each subword sequence.
10. The apparatus according to claim 6, wherein the pooling module comprises:
a max pooling operation unit configured to take the maximum value, in each dimension, over the feature maps corresponding to the subword sequences to form the vector representation corresponding to the target word example.
11. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-5 when executing the instructions.
12. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 5.
CN202010286727.6A 2020-04-13 2020-04-13 Word vector representation method and device Pending CN113536800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010286727.6A CN113536800A (en) 2020-04-13 2020-04-13 Word vector representation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010286727.6A CN113536800A (en) 2020-04-13 2020-04-13 Word vector representation method and device

Publications (1)

Publication Number Publication Date
CN113536800A true CN113536800A (en) 2021-10-22

Family

ID=78119918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010286727.6A Pending CN113536800A (en) 2020-04-13 2020-04-13 Word vector representation method and device

Country Status (1)

Country Link
CN (1) CN113536800A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245107A (en) * 2023-05-12 2023-06-09 国网天津市电力公司培训中心 Electric power audit text entity identification method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781306A (en) * 2019-10-31 2020-02-11 山东师范大学 English text aspect layer emotion classification method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781306A (en) * 2019-10-31 2020-02-11 山东师范大学 English text aspect layer emotion classification method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JACOB DEVLIN: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", HTTPS://ARXIV.ORG/ABS/1810.04805, 24 May 2019 (2019-05-24), pages 4 *
ZHUOSHENG ZHANG: "Semantics-Aware BERT for Language Understanding", 《PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》, vol. 34, no. 5, 3 April 2020 (2020-04-03), pages 9629 - 9631 *
ZHUOSHENG ZHANG: "Semantics-Aware BERT for Language Understanding", 《PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》, vol. 34, no. 5, pages 9629 *
GAO Yang: "智能摘要与深度学习" [Intelligent Summarization and Deep Learning], Beijing Institute of Technology Press (北京理工大学出版社), 30 April 2019, pages: 46 - 47 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245107A (en) * 2023-05-12 2023-06-09 国网天津市电力公司培训中心 Electric power audit text entity identification method, device, equipment and storage medium
CN116245107B (en) * 2023-05-12 2023-08-04 国网天津市电力公司培训中心 Electric power audit text entity identification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US11620515B2 (en) Multi-task knowledge distillation for language model
Chen et al. Recurrent neural network-based sentence encoder with gated attention for natural language inference
EP3819809A1 (en) A dialogue system, a method of obtaining a response from a dialogue system, and a method of training a dialogue system
US11741109B2 (en) Dialogue system, a method of obtaining a response from a dialogue system, and a method of training a dialogue system
CN106650813B (en) A kind of image understanding method based on depth residual error network and LSTM
TWI754033B (en) Generating document for a point of interest
CN110782008B (en) Training method, prediction method and device of deep learning model
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN109284761B (en) Image feature extraction method, device and equipment and readable storage medium
CN109543029B (en) Text classification method, device, medium and equipment based on convolutional neural network
CN111008266A (en) Training method and device of text analysis model and text analysis method and device
Lu et al. Are deep learning methods better for twitter sentiment analysis
Neto et al. Sign language recognition based on 3d convolutional neural networks
CN115731552A (en) Stamp character recognition method and device, processor and electronic equipment
CN113255331A (en) Text error correction method, device and storage medium
WO2020177378A1 (en) Text information feature extraction method and device, computer apparatus, and storage medium
CN113536800A (en) Word vector representation method and device
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
CN113157852A (en) Voice processing method, system, electronic equipment and storage medium
CN115168537B (en) Training method and device for semantic retrieval model, electronic equipment and storage medium
CN116187401A (en) Compression method and device for neural network, electronic equipment and storage medium
CN111312215B (en) Natural voice emotion recognition method based on convolutional neural network and binaural characterization
CN115080736A (en) Model adjusting method and device of discriminant language model
CN113742525A (en) Self-supervision video hash learning method, system, electronic equipment and storage medium
CN114792388A (en) Image description character generation method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination