WO2023168814A1 - Sentence vector generation method and apparatus, computer device, and storage medium - Google Patents

Sentence vector generation method and apparatus, computer device, and storage medium

Info

Publication number
WO2023168814A1
WO2023168814A1 (PCT/CN2022/089817)
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
sequence
model
context
current
Prior art date
Application number
PCT/CN2022/089817
Other languages
English (en)
French (fr)
Inventor
陈浩
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023168814A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a sentence vector generation method and apparatus, a computer device and a storage medium.
  • Sentence embeddings, as vector representations of text data, are widely used in many application scenarios of natural language processing.
  • By mapping text data into a quantifiable vector space, sentence vector representations that capture the features, semantics, grammar and other information of the text can be obtained; vector clustering, classification and similar methods can then be used to derive relationships between text sentences, enabling sentence vectors to be applied in practical scenarios.
  • Existing solutions for sentence vector construction mainly include construction methods based on word-vector averaging and construction methods based on contrastive learning.
  • Construction methods based on word-vector averaging use models such as word2vec, GloVe and BERT.
  • Construction methods based on contrastive learning build positive samples for contrastive learning in different ways, such as dropout, replacement, deletion and back-translation.
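For comparison, a minimal sketch of the word-vector-averaging baseline mentioned above: each word is looked up in a pre-trained embedding table and the sentence vector is the element-wise mean, which is exactly the step that discards the dependencies between words. The tiny embedding table and dimensionality here are hypothetical; a real system would load word2vec or GloVe vectors.

```python
import numpy as np

# Hypothetical pre-trained word vectors (a real system would load word2vec/GloVe embeddings).
word_vectors = {
    "machine": np.array([0.2, 0.7, 0.1]),
    "learning": np.array([0.6, 0.1, 0.3]),
    "is": np.array([0.1, 0.1, 0.1]),
    "fun": np.array([0.4, 0.5, 0.9]),
}

def average_sentence_vector(tokens, table, dim=3):
    """Word-vector-averaging baseline: mean of word vectors, ignoring word order and dependencies."""
    vecs = [table[t] for t in tokens if t in table]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

print(average_sentence_vector(["machine", "learning", "is", "fun"], word_vectors))
```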
  • The inventor realized that the shortcomings of the existing solutions are: 1) the construction method based on word-vector averaging destroys the dependencies between the words in a sentence, so the accuracy of feature extraction is low; 2) in the construction method based on contrastive learning, although there are many ways to obtain positive samples, the similarity between randomly selected negative samples and the original sentence is low, which makes model training insufficiently difficult, leaves the model with inadequate transfer ability in real tasks, and in turn leads to lower accuracy of the generated sentence vectors.
  • In view of this, this application provides a sentence vector generation method and apparatus, a computer device and a storage medium, the main purpose of which is to solve the technical problems in the existing technology that the construction method based on word-vector averaging has low accuracy of sentence feature extraction, and that the construction method based on contrastive learning suffers from insufficient model transfer ability in real tasks, resulting in low accuracy of the generated sentence vectors.
  • According to one aspect of this application, a sentence vector generation method is provided, the method including:
  • performing semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
  • using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
  • wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence;
  • and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • According to another aspect of this application, a sentence vector generation apparatus is provided, the apparatus including:
  • a model training module, which can be used to use an initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence, and to obtain a trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence;
  • a preprocessing module, used to perform semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
  • an encoding module, configured to use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
  • According to yet another aspect of this application, a storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the above sentence vector generation method is implemented, including:
  • performing semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
  • using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
  • wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • According to still another aspect of this application, a computer device is provided, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor; when the processor executes the program, the above sentence vector generation method is implemented, including:
  • performing semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
  • using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
  • wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • By means of the above technical solution, a sequence-to-sequence model is trained on context sentence pair sequences, and the encoding layer of the trained sequence-to-sequence model is used to generate sentence vectors. This raises the difficulty of model training while effectively improving the accuracy of sentence vector generation and preserving the integrity of the semantic and grammatical information of the generated sentence vectors.
  • It thereby avoids the technical problems that the existing construction method based on word-vector averaging destroys the dependencies between words in a sentence and yields low accuracy of sentence feature extraction, and that the construction method based on contrastive learning makes model training insufficiently difficult, leaves the model with inadequate transfer ability in real tasks, and produces sentence vectors of lower accuracy.
  • Figure 1 shows a schematic flowchart of a sentence vector generation method provided by an embodiment of the present application
  • Figure 2 shows a schematic flowchart of another sentence vector generation method provided by an embodiment of the present application
  • Figure 3 shows a schematic diagram of the initial sequence-to-sequence model architecture provided by the embodiment of the present application
  • Figure 4 shows a schematic structural diagram of a sentence vector generation device provided by an embodiment of the present application
  • Figure 5 shows a schematic structural diagram of another sentence vector generation device provided by an embodiment of the present application.
  • Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
  • To address the above problems, this embodiment provides a sentence vector generation method, as shown in Figure 1. The method is explained by taking its application to computer equipment such as a server as an example.
  • The server can be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
  • The above method includes the following steps:
  • Step 101: Perform semantic segmentation on the acquired initial sentence text to obtain segmented sentence text.
  • In this embodiment, taking a book recommendation scenario as an example, the method is suitable for recommending other similar books based on the acquired book text content. Specifically, when a book recommendation request is received, the book text content corresponding to the book title in the request is obtained, the book text content is split into sentences based on Chinese punctuation, and through this text segmentation multiple sentence texts are obtained for input into the sentence vector generation model.
  • Depending on the needs of the actual application scenario, the book text content can be a book abstract, a book introduction, or similar text, which is not specifically limited here.
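A minimal sketch of the sentence-splitting step described above, assuming a simple rule that breaks book text on common Chinese end-of-sentence punctuation; the application does not specify the exact punctuation set or splitting rule.

```python
import re

# Split Chinese book text into sentences on common end-of-sentence punctuation
# (。！？；…, optionally followed by a closing quote); the exact rule in the application is not specified.
SENTENCE_BOUNDARY = re.compile(r'([。！？；…]+[”’」』]?)')

def split_sentences(text: str) -> list[str]:
    parts = SENTENCE_BOUNDARY.split(text)
    # Re-attach each punctuation group to the sentence that precedes it.
    sentences = [a + b for a, b in zip(parts[0::2], parts[1::2] + [""])]
    return [s.strip() for s in sentences if s.strip()]

print(split_sentences("这本书讲述了机器学习的历史。内容深入浅出！适合初学者阅读。"))
```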
  • Step 102: Use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model; wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • In this embodiment, the initial sequence-to-sequence model is trained on a constructed sentence sample set that includes context sentence pair sequences, where a context sentence pair sequence includes the current sentence and the context sentences corresponding to the current sentence. The current sentence is input into the encoding layer of the initial sequence-to-sequence model for encoding, yielding a vector representation containing the contextual feature information of the current sentence.
  • This vector representation is then input separately into the two decoding layers arranged in parallel in the initial sequence-to-sequence model, and the preceding predicted sentence and the following predicted sentence of the current sentence are obtained through decoding; the preceding and following sentences of the current sentence in the pair sequence serve as the training targets for these predictions, yielding the trained sequence-to-sequence model.
  • It can be seen that the encoding layer of the trained sequence-to-sequence model has the encoding ability to accurately predict the context of the current sentence and can preserve the integrity of the semantic and grammatical information of that context; the vector representation output on this basis can therefore contain the complete contextual feature information of the current sentence, ensuring the accuracy of subsequent book recommendations.
  • Using a context sentence pair sequence built from the current sentence and its context sentences as the input data of the initial sequence-to-sequence model preserves the textual features of interdependence and mutual influence between words without destroying the overall structure of the text data, thereby ensuring that the model can learn the complete semantic and grammatical information contained in the sentence text and improving the accuracy with which the model extracts contextual sentence features.
  • In this embodiment, according to the above solution, semantic segmentation can be performed on the acquired initial sentence text to obtain segmented sentence text, and a pre-built sentence vector generation model can be used to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model; wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in the constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • Compared with existing sentence vector generation schemes such as construction based on word-vector averaging and construction based on contrastive learning, this embodiment trains a sequence-to-sequence model on context sentence pair sequences and uses the encoding layer of the trained model to generate sentence vectors for sentence text, which preserves the integrity of the semantic and grammatical information of the sentence text and thus effectively improves the accuracy of sentence vector generation.
  • Step 201: Use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence.
  • The context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
  • To illustrate a specific implementation of step 201, as a preferred embodiment, step 201 may specifically include: performing word segmentation on the context sentence pair sequence with a word segmentation tool to obtain a tokenized context sentence pair sequence; for the current sentence in the tokenized context sentence pair sequence, using the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence; and, from the sentence embedding vector of the current sentence, using the two decoding layers arranged in parallel in the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence respectively, where the two decoding layers are a first decoding layer used to predict the preceding text and a second decoding layer used to predict the following text.
  • As another preferred embodiment, the first decoding layer used to predict the preceding text is a first GRU model, and the second decoding layer used to predict the following text is a second GRU model. The step of obtaining the preceding predicted sentence and the following predicted sentence respectively, from the sentence embedding vector of the current sentence and using the two decoding layers arranged in parallel in the initial sequence-to-sequence model, specifically includes:
  • using the sentence embedding vector of the current sentence as the input data of the reset gate, the update gate and the candidate memory unit in the first GRU model, and obtaining the preceding predicted sentence of the current sentence through decoding; and
  • using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the following predicted sentence of the current sentence through decoding.
  • In implementation, before the step of using the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence of the current sentence from the current sentence in the context sentence pair sequence, the method also includes constructing the sentence sample set, which includes the context sentence pair sequences, by splitting a book text into sentences and traversing them.
  • The context sentence pair sequences are expressed as (S_1, S_2, S_3), (S_2, S_3, S_4), (S_3, S_4, S_5), ..., (S_{i-1}, S_i, S_{i+1}), ..., (S_{n-2}, S_{n-1}, S_n), where S_i denotes the current sentence, S_{i-1} denotes the preceding target sentence adjacent to S_i, and S_{i+1} denotes the following target sentence adjacent to S_i.
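A small sketch of how the context sentence pair sequences above could be assembled: a sliding window over one book's segmented sentences yields (S_{i-1}, S_i, S_{i+1}) triples. The function name and data layout are illustrative assumptions.

```python
def build_context_triples(sentences):
    """Build (S_{i-1}, S_i, S_{i+1}) triples by sliding a window over one book's sentences.

    S_i is the current sentence fed to the encoder; S_{i-1} and S_{i+1} are the
    preceding/following target sentences used to supervise the two decoders.
    """
    return [
        (sentences[i - 1], sentences[i], sentences[i + 1])
        for i in range(1, len(sentences) - 1)
    ]

book = ["S1", "S2", "S3", "S4", "S5"]
for prev_target, current, next_target in build_context_triples(book):
    print(prev_target, current, next_target)
```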
  • In implementation, the encoding layer (Encoder) of the initial sequence-to-sequence model outputs the sentence embedding vector h_s of the current sentence, which is fed simultaneously into the first decoding layer (pre-Decoder), used to predict the preceding sentence, and the second decoding layer (next-Decoder), used to predict the following sentence; the pre-Decoder and the next-Decoder then produce the preceding predicted sentence and the following predicted sentence of the current sentence respectively. As shown in Figure 3, the specific steps include:
  • The initial sequence-to-sequence model includes one encoding layer and two decoding layers, and the base model of both the encoding layer and the decoding layers is the gated recurrent unit (GRU: Gate Recurrent Unit).
  • The first decoding layer pre-Decoder and the second decoding layer next-Decoder decode the sentence embedding vector h_s synchronously, obtaining the preceding predicted sentence and the following predicted sentence corresponding to the current sentence. Specifically:
  • The sentence embedding vector h_s is used as the input of the first decoding layer pre-Decoder (preceding-text decoding), and the preceding predicted sentence Y_{i-1} corresponding to the current sentence is obtained through decoding.
  • That is, the preceding predicted sentence Y_{i-1} is predicted from the sentence embedding vector h_s of the current sentence S_i. Because predicting "upward" does not conform to the characteristics of natural language, training the first decoding layer is more difficult than training the second decoding layer next-Decoder, so the GRU architecture is improved to raise the accuracy of preceding-text prediction while maintaining training efficiency and preventing vanishing gradients.
  • Specifically, the sentence embedding vector h_s of the current sentence is added, with corresponding parameters, to the inputs of the update gate, the reset gate and the candidate memory unit of the first decoding layer, so that during token-by-token generation the GRU at every time step can combine the sentence embedding vector h_s of the current sentence S_i. The specific formulas are:
  • z_t = σ(W_z·x_t + U_z·h_{t-1} + V_z·h_s), r_t = σ(W_r·x_t + U_r·h_{t-1} + V_r·h_s)
  • where z_t denotes the update gate of the GRU model, W_z and U_z are the update-gate parameters of the original GRU model, x_t is the input vector at the current time t, h_{t-1} is the vector passed from the previous time t-1 to the current time t, and V_z is the parameter set for the sentence embedding vector h_s. In the same way, the reset gate r_t and the candidate memory unit both incorporate h_s: W_r, U_r, V_r are the reset-gate parameters, tanh is the activation function, W_k, U_k, V_k are the parameters of the candidate memory unit, h_t is the output vector at the current time t, σ denotes a fully connected layer with an activation function, and ⊙ denotes element-wise multiplication of vectors.
  • Synchronously with the first decoding layer, the sentence embedding vector h_s is input into the second decoding layer next-Decoder, and the following predicted sentence Y_{i+1} corresponding to the current sentence is obtained through decoding. Predicting the following sentence from the current sentence conforms to the top-down character of natural language, so the second decoding layer next-Decoder uses the existing GRU model, and the sentence embedding vector h_s is used only as the initial vector of the second decoding layer.
  • It can be seen that predicting the preceding sentence of the current sentence within the encoder-decoder framework breaks the top-down rule of natural language and increases the difficulty of model training, allowing the model to be trained thoroughly and to output sentence vector representations containing complete semantic and grammatical information. Furthermore, improving the update gate, reset gate and candidate memory unit of the GRU model effectively preserves training efficiency while raising the difficulty of model training.
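A hedged PyTorch sketch of the modified GRU step described above: the update gate, reset gate and candidate memory each receive an additional term built from the sentence embedding h_s (the V_z, V_r, V_k parameters). Where the reset gate is applied inside the candidate term and the direction of the final interpolation follow the standard GRU, since those formulas are not reproduced in the text; layer sizes and initialization are illustrative.

```python
import torch
import torch.nn as nn

class SentenceConditionedGRUCell(nn.Module):
    """GRU cell whose gates additionally receive the sentence embedding h_s (V_z, V_r, V_k terms)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.W = nn.Linear(input_size, 3 * hidden_size, bias=True)    # W_z, W_r, W_k stacked
        self.U = nn.Linear(hidden_size, 3 * hidden_size, bias=False)  # U_z, U_r, U_k stacked
        self.V = nn.Linear(hidden_size, 3 * hidden_size, bias=False)  # V_z, V_r, V_k stacked

    def forward(self, x_t, h_prev, h_s):
        wx_z, wx_r, wx_k = self.W(x_t).chunk(3, dim=-1)
        uh_z, uh_r, uh_k = self.U(h_prev).chunk(3, dim=-1)
        vs_z, vs_r, vs_k = self.V(h_s).chunk(3, dim=-1)
        z_t = torch.sigmoid(wx_z + uh_z + vs_z)          # update gate with the V_z * h_s term
        r_t = torch.sigmoid(wx_r + uh_r + vs_r)          # reset gate with the V_r * h_s term
        h_tilde = torch.tanh(wx_k + r_t * uh_k + vs_k)   # candidate memory with the V_k * h_s term
        return (1.0 - z_t) * h_prev + z_t * h_tilde      # standard GRU interpolation (assumed convention)

# Toy usage: batch of 2, input and hidden dimension 8.
cell = SentenceConditionedGRUCell(8, 8)
x_t, h_prev, h_s = torch.randn(2, 8), torch.zeros(2, 8), torch.randn(2, 8)
print(cell(x_t, h_prev, h_s).shape)  # torch.Size([2, 8])
```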
  • Step 202: Using the preceding predicted sentence and the following predicted sentence of the current sentence, train the initial sequence-to-sequence model with a target loss function to obtain the trained sequence-to-sequence model.
  • The target loss function is determined from the sum of a first loss function and a second loss function; the first loss function in the target loss function is set based on the first decoding layer used to predict the preceding text, and the second loss function is set based on the second decoding layer used to predict the following text.
  • In implementation, based on the preceding target sentence S_{i-1}, the following target sentence S_{i+1}, the preceding predicted sentence Y_{i-1} and the following predicted sentence Y_{i+1}, the target loss function is used to train the network parameters of the initialized sequence-to-sequence model until the initialized sequence-to-sequence model converges, yielding the trained sequence-to-sequence model.
  • Specifically, the cross-entropy loss function over the tokens of the target sentence and the predicted sentence is used as the base loss function, where:
  • CE denotes the cross-entropy loss function,
  • S denotes the current sentence,
  • Y denotes the predicted sentence generated by the decoding layer (Decoder),
  • l denotes the number of tokens determined after segmenting the current sentence S,
  • t_j denotes the j-th token obtained by segmenting the current sentence S, and
  • y_j denotes the j-th token in the predicted sentence Y.
  • Further, based on the first decoding layer pre-Decoder and the second decoding layer next-Decoder, which output the preceding predicted sentence and the following predicted sentence respectively, the corresponding preceding-sentence loss function (first loss function) and following-sentence loss function (second loss function) are determined, and the target loss function of the initialized sequence-to-sequence model is obtained as the sum of the preceding-sentence loss function and the following-sentence loss function.
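A minimal sketch of combining the two decoder losses into the target loss, assuming token-level cross-entropy over vocabulary logits; the vocabulary size, sequence lengths and tensor shapes are illustrative.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # token-level cross-entropy, used for both decoders

vocab_size, seq_len, batch = 1000, 12, 4
# Logits produced by the two decoders (pre-Decoder predicts the preceding sentence,
# next-Decoder predicts the following sentence); targets are the token ids of S_{i-1} and S_{i+1}.
pre_logits = torch.randn(batch, seq_len, vocab_size)
next_logits = torch.randn(batch, seq_len, vocab_size)
pre_targets = torch.randint(0, vocab_size, (batch, seq_len))
next_targets = torch.randint(0, vocab_size, (batch, seq_len))

pre_loss = criterion(pre_logits.reshape(-1, vocab_size), pre_targets.reshape(-1))
next_loss = criterion(next_logits.reshape(-1, vocab_size), next_targets.reshape(-1))
target_loss = pre_loss + next_loss  # target loss = first loss + second loss
print(float(target_loss))
```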
  • The initialized sequence-to-sequence model is then trained until its target loss function value stabilizes, at which point training ends and the trained sequence-to-sequence model is obtained.
  • Step 203: Perform semantic segmentation on the acquired initial sentence text to obtain segmented sentence text.
  • Step 204: Use the pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
  • In implementation, the encoding layer of the trained sequence-to-sequence model is extracted as the sentence vector generation model. After a book recommendation request is received, the introduction text corresponding to the book title in the request is obtained, the introduction text is split into sentences based on Chinese punctuation, and the Harbin Institute of Technology LTP model is used to perform word segmentation on the split introduction text to obtain tokenized sentence text.
  • The sentence vector generation model is then used to encode the sentence text and obtain its vector representation.
  • Step 205: Calculate the similarity value between the vector representation of the sentence text and the sentence embedding vectors in a preset book sample library, where the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
  • In implementation, based on the introduction text of each book in the initial book sample library, the sentence vector generation model outputs the sentence embedding vector corresponding to that introduction text, and the preset book sample library is built from these output embedding vectors. A cosine similarity algorithm is then used to calculate the similarity value between the sentence vector produced for the book recommendation request and the sentence embedding vector corresponding to each book in the preset book sample library.
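A small sketch of the similarity step, using cosine similarity between the request's sentence vector and a pre-built library of book sentence embeddings; the vectors and book titles here are hypothetical placeholders.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical vectors: one produced by the extracted encoder for the requested book's
# introduction text, and a pre-built library of sentence embeddings for candidate books.
query_vector = np.array([0.3, 0.8, 0.1, 0.4])
book_library = {
    "book_a": np.array([0.2, 0.9, 0.0, 0.5]),
    "book_b": np.array([0.9, 0.1, 0.7, 0.2]),
}
scores = {title: cosine_similarity(query_vector, vec) for title, vec in book_library.items()}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```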
  • Step 206: Generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy the preset condition.
  • In implementation, when a user browses a book on the platform, that book is taken as the target book and a book recommendation request containing the title of the target book is generated. Based on the introduction text corresponding to the target book title, the sentence vector generation model generates the corresponding sentence vector, the similarity values between this sentence vector and each group of sentence embedding vectors in the platform's preset book sample library are calculated and sorted in descending order, and the book information corresponding to the sentence embedding vectors whose similarity values satisfy the preset condition is recommended to the user as similar books. Experiments found that, according to online A/B test results, the user click-through rate obtained with this embodiment improves by 2.31%.
  • By applying the technical solution of this embodiment, semantic segmentation is performed on the acquired initial sentence text to obtain segmented sentence text, and a pre-built sentence vector generation model is used to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model; wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in the constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • It can be seen that training a sequence-to-sequence model on context sentence pair sequences and using the encoding layer of the trained model to generate sentence vectors raises the difficulty of model training while effectively improving the accuracy of sentence vector generation and preserving the integrity of the semantic and grammatical information of the generated sentence vectors. This avoids the technical problems that the existing construction method based on word-vector averaging destroys the dependencies between words in a sentence and yields low accuracy of sentence feature extraction, and that the construction method based on contrastive learning makes model training insufficiently difficult, leaves the model with inadequate transfer ability in real tasks, and produces sentence vectors of lower accuracy.
  • Further, as a specific implementation of the method in Figure 1, an embodiment of this application provides a sentence vector generation apparatus, as shown in Figure 4.
  • The apparatus includes a model training module 41, a preprocessing module 42 and an encoding module 43.
  • The model training module 41 can be used to use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence, and to obtain the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • The preprocessing module 42 can be used to perform semantic segmentation on the acquired initial sentence text to obtain segmented sentence text.
  • The encoding module 43 can be used to use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
  • In a specific application scenario, as shown in Figure 5, a book recommendation module 44 is also included.
  • In a specific application scenario, the model training module 41 includes a training unit 411.
  • The training unit 411 can be used to train the initial sequence-to-sequence model with a target loss function based on the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model; wherein the target loss function is determined from the sum of the first loss function and the second loss function.
  • In a specific application scenario, the context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
  • In a specific application scenario, the model training module 41 can specifically be used to: perform word segmentation on the context sentence pair sequence with a word segmentation tool to obtain a tokenized context sentence pair sequence; for the current sentence in the tokenized context sentence pair sequence, use the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence; and, from the sentence embedding vector of the current sentence, use the two decoding layers arranged in parallel in the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence respectively, where the two decoding layers are the first decoding layer used to predict the preceding text and the second decoding layer used to predict the following text.
  • In a specific application scenario, the first decoding layer used to predict the preceding text is a first GRU model and the second decoding layer used to predict the following text is a second GRU model; the step of obtaining the preceding predicted sentence and the following predicted sentence respectively, from the sentence embedding vector of the current sentence and using the two decoding layers arranged in parallel in the initial sequence-to-sequence model, specifically includes:
  • using the sentence embedding vector of the current sentence as the input data of the reset gate, the update gate and the candidate memory unit in the first GRU model, and obtaining the preceding predicted sentence of the current sentence through decoding; and
  • using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the following predicted sentence of the current sentence through decoding.
  • In a specific application scenario, the first loss function in the target loss function is set based on the first decoding layer used to predict the preceding text, and the second loss function in the target loss function is set based on the second decoding layer used to predict the following text.
  • In a specific application scenario, the book recommendation module 44 includes a similarity calculation unit 441 and a generation unit 442.
  • The similarity calculation unit 441 can be used to calculate the similarity value between the vector representation of the sentence text and the sentence embedding vectors in the preset book sample library.
  • The generation unit 442 can be configured to generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy the preset condition; wherein the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
  • Based on the methods shown in Figures 1 and 2, an embodiment of this application further provides a storage medium on which a computer program is stored; when the program is executed by a processor, the sentence vector generation method is implemented, including:
  • performing semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
  • using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
  • wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • Optionally, the step of obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence specifically includes: using a target loss function to train the initial sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model, wherein the target loss function is determined from the sum of the first loss function and the second loss function.
  • Optionally, the context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
  • the storage medium is a computer-readable storage medium, which may be non-volatile or volatile.
  • Based on this understanding, the technical solution of this application can be embodied in the form of a software product, which can be stored in a storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes a number of instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each implementation scenario of this application.
  • Based on the methods shown in Figures 1 and 2 and the virtual apparatus embodiments shown in Figures 4 and 5, an embodiment of this application further provides a computer device, which can specifically be a personal computer, a server, a network device, or the like. The physical device includes a storage medium and a processor; the storage medium is used to store a computer program, and the processor is used to execute the computer program to implement the sentence vector generation method shown in Figures 1 and 2, including:
  • performing semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
  • using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
  • wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
  • Optionally, the step of obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence specifically includes: using a target loss function to train the initial sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model, wherein the target loss function is determined from the sum of the first loss function and the second loss function.
  • Optionally, the context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
  • the computer device may also include a user interface, a network interface, a camera, a radio frequency (Radio Frequency, RF) circuit, a sensor, an audio circuit, a WI-FI module, etc.
  • the user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), etc.
  • the optional user interface may also include a USB interface, a card reader interface, etc.
  • Optional network interfaces may include standard wired interfaces, wireless interfaces (such as Bluetooth interfaces, WI-FI interfaces), etc.
  • Those skilled in the art can understand that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
  • the storage medium may also include an operating system and a network communication module.
  • An operating system is a program that manages the hardware and software resources of a computer device and supports the operation of information processing programs and other software and/or programs.
  • the network communication module is used to implement communication between components within the storage medium, as well as communication with other hardware and software in the physical device.
  • Through the description of the above implementations, those skilled in the art can clearly understand that this embodiment trains a sequence-to-sequence model on context sentence pair sequences and uses the encoding layer of the trained sequence-to-sequence model to generate sentence vectors for sentence text, which preserves the integrity of the semantic and grammatical information of the sentence text and thus effectively improves the accuracy of sentence vector generation.
  • This effectively avoids the technical problems that the existing construction method based on word-vector averaging destroys the dependencies between words in a sentence, resulting in low accuracy of sentence feature extraction, and that the construction method based on contrastive learning makes model training insufficiently difficult and leaves the model with inadequate transfer ability in real tasks.
  • Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and the modules or processes in the accompanying drawings are not necessarily required for implementing this application.
  • the modules in the devices in the implementation scenario can be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or can be correspondingly changed and located in one or more devices different from the implementation scenario.
  • the modules of the above implementation scenarios can be combined into one module or further split into multiple sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses a sentence vector generation method and apparatus, a computer device and a storage medium, relates to the field of artificial intelligence technology, and can improve the accuracy of sentence vector generation. The method includes: performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text; and using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model. This application is suitable for book recommendation based on the sentence vectors of book texts.

Description

Sentence vector generation method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on March 9, 2022, with application number 202210232057.9 and title "Sentence vector generation method and apparatus, computer device, and storage medium", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a sentence vector generation method and apparatus, a computer device and a storage medium.
Background
Natural language processing is an important direction in computer science and artificial intelligence. Sentence embeddings, as vector representations of text data, are widely used in many application scenarios of natural language processing. By mapping text data into a quantifiable vector space, sentence vector representations that capture the features, semantics, grammar and other information of the text data are obtained; vector clustering, classification and similar methods can then be used to derive relationships between text sentences, enabling sentence vectors to be applied in practical scenarios.
Existing solutions for sentence vector construction mainly include construction methods based on word-vector averaging and construction methods based on contrastive learning. Construction methods based on word-vector averaging use models such as word2vec, GloVe and BERT; construction methods based on contrastive learning build positive samples for contrastive learning in different ways, such as dropout, replacement, deletion and back-translation. The inventor realized that the shortcomings of the existing solutions are: 1) the construction method based on word-vector averaging destroys the dependencies between the words in a sentence, so the accuracy of feature extraction is low; 2) in the construction method based on contrastive learning, although there are many ways to obtain positive samples, the similarity between randomly selected negative samples and the original sentence is low, which makes model training insufficiently difficult, leaves the model with inadequate transfer ability in real tasks, and in turn leads to lower accuracy of the generated sentence vectors.
Summary
In view of this, this application provides a sentence vector generation method and apparatus, a computer device and a storage medium, the main purpose of which is to solve the technical problems in the existing technology that the construction method based on word-vector averaging has low accuracy of sentence feature extraction, and that the construction method based on contrastive learning suffers from insufficient model transfer ability in real tasks, resulting in low accuracy of the generated sentence vectors.
According to one aspect of this application, a sentence vector generation method is provided, the method including:
performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence;
obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
According to another aspect of this application, a sentence vector generation apparatus is provided, the apparatus including:
a model training module, which can be used to use an initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence, and to obtain a trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence;
a preprocessing module, used to perform semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
an encoding module, used to use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
According to yet another aspect of this application, a storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the above sentence vector generation method is implemented, including:
performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence;
obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
According to still another aspect of this application, a computer device is provided, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor; when the processor executes the program, the above sentence vector generation method is implemented, including:
performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence;
obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
By means of the above technical solution, a sequence-to-sequence model is trained on context sentence pair sequences, and the encoding layer of the trained sequence-to-sequence model is used to generate sentence vectors. This raises the difficulty of model training while effectively improving the accuracy of sentence vector generation and preserving the integrity of the semantic and grammatical information of the generated sentence vectors, thereby effectively avoiding the technical problems that the existing construction method based on word-vector averaging destroys the dependencies between words in a sentence and leads to low accuracy of sentence feature extraction, and that the construction method based on contrastive learning makes model training insufficiently difficult, leaves the model with inadequate transfer ability in real tasks, and produces sentence vectors of lower accuracy.
The above description is only an overview of the technical solution of this application. In order to understand the technical means of this application more clearly so that they can be implemented in accordance with the contents of the specification, and to make the above and other purposes, features and advantages of this application more obvious and understandable, specific implementations of this application are set forth below.
Brief Description of the Drawings
The accompanying drawings described here are used to provide a further understanding of this application and constitute a part of this application. The illustrative embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation of this application. In the drawings:
Figure 1 shows a schematic flowchart of a sentence vector generation method provided by an embodiment of this application;
Figure 2 shows a schematic flowchart of another sentence vector generation method provided by an embodiment of this application;
Figure 3 shows a schematic diagram of the initial sequence-to-sequence model architecture provided by an embodiment of this application;
Figure 4 shows a schematic structural diagram of a sentence vector generation apparatus provided by an embodiment of this application;
Figure 5 shows a schematic structural diagram of another sentence vector generation apparatus provided by an embodiment of this application.
Detailed Description
This application is described in detail below with reference to the accompanying drawings and in combination with the embodiments. It should be noted that, where there is no conflict, the embodiments of this application and the features in the embodiments can be combined with each other.
The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
To address the problems in the existing technology that the construction method based on word-vector averaging has low accuracy of sentence feature extraction and that the construction method based on contrastive learning suffers from insufficient model transfer ability in real tasks, leading to low accuracy of the generated sentence vectors, this embodiment provides a sentence vector generation method, as shown in Figure 1. The method is explained by taking its application to computer equipment such as a server as an example. The server can be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The above method includes the following steps:
Step 101: Perform semantic segmentation on the acquired initial sentence text to obtain segmented sentence text.
In this embodiment, taking a book recommendation scenario as an example, the method is suitable for recommending other similar books based on the acquired book text content. Specifically, when a book recommendation request is received, the book text content corresponding to the book title in the request is obtained, the book text content is split into sentences based on Chinese punctuation, and through text segmentation multiple sentence texts are obtained for input into the sentence vector generation model. Depending on the needs of the actual application scenario, the book text content can be a book abstract, a book introduction, or similar text, which is not specifically limited here.
Step 102: Use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model; wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
In this embodiment, the initial sequence-to-sequence model is trained on a constructed sentence sample set that includes context sentence pair sequences, where a context sentence pair sequence includes the current sentence and the context sentences corresponding to the current sentence. The current sentence is input into the encoding layer of the initial sequence-to-sequence model for encoding, yielding a vector representation containing the contextual feature information of the current sentence; this vector representation is input separately into the two decoding layers arranged in parallel in the initial sequence-to-sequence model, and the preceding predicted sentence and the following predicted sentence of the current sentence are obtained through decoding. Further, by taking the preceding sentence and the following sentence of the current sentence in the context sentence pair sequence as the training targets of the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained. It can be seen that the encoding layer of the trained sequence-to-sequence model has the encoding ability to accurately predict the context of the current sentence and can preserve the integrity of the semantic and grammatical information of that context, so the vector representation output on this basis can contain the complete contextual feature information of the current sentence, ensuring the accuracy of subsequent book recommendations.
Using a context sentence pair sequence built from the current sentence and its context sentences as the input data of the initial sequence-to-sequence model preserves the textual features of interdependence and mutual influence between words without destroying the overall structure of the text data, thereby ensuring that the model can learn the complete semantic and grammatical information contained in the sentence text and improving the accuracy with which the model extracts contextual sentence features.
In this embodiment, according to the above solution, semantic segmentation can be performed on the acquired initial sentence text to obtain segmented sentence text, and a pre-built sentence vector generation model can be used to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model obtained through the above training steps. Compared with existing sentence vector generation schemes such as construction based on word-vector averaging and construction based on contrastive learning, this embodiment trains a sequence-to-sequence model on context sentence pair sequences and uses the encoding layer of the trained model to generate sentence vectors for sentence text, which preserves the integrity of the semantic and grammatical information of the sentence text and thus effectively improves the accuracy of sentence vector generation.
Further, as a refinement and extension of the specific implementation of the above embodiment, in order to fully describe the specific implementation process of this embodiment, another sentence vector generation method is provided, as shown in Figure 2. The method includes:
Step 201: Use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence.
The context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
To illustrate a specific implementation of step 201, as a preferred embodiment, step 201 may specifically include: performing word segmentation on the context sentence pair sequence with a word segmentation tool to obtain a tokenized context sentence pair sequence; for the current sentence in the tokenized context sentence pair sequence, using the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence; and, from the sentence embedding vector of the current sentence, using the two decoding layers arranged in parallel in the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence respectively, where the two decoding layers are a first decoding layer used to predict the preceding text and a second decoding layer used to predict the following text.
To illustrate another specific implementation of step 201, as another preferred embodiment, the first decoding layer used to predict the preceding text is a first GRU model and the second decoding layer used to predict the following text is a second GRU model; the step of obtaining the preceding predicted sentence and the following predicted sentence respectively, from the sentence embedding vector of the current sentence and using the two decoding layers arranged in parallel in the initial sequence-to-sequence model, specifically includes: using the sentence embedding vector of the current sentence as the input data of the reset gate, the update gate and the candidate memory unit in the first GRU model, and obtaining the preceding predicted sentence of the current sentence through decoding; and using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the following predicted sentence of the current sentence through decoding.
In implementation, before the step of using the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence of the current sentence from the current sentence in the context sentence pair sequence, the method also includes constructing the sentence sample set, which includes the context sentence pair sequences. The specific steps include:
1) Randomly select an arbitrary book text and split the selected book text into sentences based on Chinese punctuation to obtain a book text D, D = [S_1, S_2, S_3, S_4, S_5, ..., S_i, ..., S_n], where S_i denotes the i-th sentence in the book text D and n denotes the number of sentences obtained by splitting the book text D. For example, a book text collection includes 3,727 books together with the full text content of each book; an arbitrary book text is randomly selected from it, and all of its text content is split into sentences.
2) Build the context sentence pair sequences (sentence pairs) based on the book text D, that is, build the context sentence pair sequences by traversing every sentence in the book text D, obtaining the sentence sample set G. The context sentence pair sequences are expressed as (S_1, S_2, S_3), (S_2, S_3, S_4), (S_3, S_4, S_5), ..., (S_{i-1}, S_i, S_{i+1}), ..., (S_{n-2}, S_{n-1}, S_n), where S_i denotes the current sentence, S_{i-1} denotes the preceding target sentence adjacent to S_i, and S_{i+1} denotes the following target sentence adjacent to S_i.
In implementation, the encoding layer (Encoder) of the initial sequence-to-sequence model outputs the sentence embedding vector h_s of the current sentence, which is fed simultaneously into the first decoding layer (pre-Decoder), used to predict the preceding sentence, and the second decoding layer (next-Decoder), used to predict the following sentence; the pre-Decoder and the next-Decoder then produce the preceding predicted sentence and the following predicted sentence of the current sentence respectively. As shown in Figure 3, the specific steps include:
1) Use a word segmentation tool (the HIT LTP model) to tokenize the sentences in each context sentence pair sequence of the sentence sample set G; a tokenized sentence is expressed as S_i = [t_1, t_2, ..., t_p, ..., t_l], where t_p denotes the p-th token in S_i and l denotes the number of tokens obtained after tokenizing S_i.
2) Build the initial sequence-to-sequence model on the encoder-decoder architecture. The initial sequence-to-sequence model includes one encoding layer and two decoding layers, and the base model of both the encoding layer and the decoding layers is the gated recurrent unit (GRU: Gate Recurrent Unit).
3) Use the tokenized sentence sample set G as the input of the initial sequence-to-sequence model: input the current sentence of each sentence pair sequence into the encoding layer (Encoder) of the initial sequence-to-sequence model, obtain the sentence embedding vector h_s of the current sentence through encoding, and use the first decoding layer pre-Decoder and the second decoding layer next-Decoder to decode the sentence embedding vector h_s synchronously, obtaining the preceding predicted sentence of the current sentence and the following predicted sentence corresponding to the current sentence respectively. Specifically:
① Use the current sentence in the context sentence pair sequence as the input of the encoding layer Encoder of the initial sequence-to-sequence model. Taking (S_{i-1}, S_i, S_{i+1}) as an example, the tokenized sentence S_i = [t_1, t_2, ..., t_p, ..., t_l] in (S_{i-1}, S_i, S_{i+1}) is input into the encoding layer Encoder, and the sentence embedding vector h_s of S_i is obtained through encoding.
② Use the sentence embedding vector h_s as the input of the first decoding layer pre-Decoder (preceding-text decoding), and obtain the preceding predicted sentence Y_{i-1} corresponding to the current sentence through decoding. Here the preceding predicted sentence Y_{i-1} corresponding to the current sentence is predicted from the sentence embedding vector h_s of the current sentence S_i. Because predicting "upward" does not conform to the characteristics of natural language, the training of the first decoding layer is more difficult than that of the second decoding layer next-Decoder (following-text decoding), so the GRU architecture is improved to raise the accuracy of preceding-text prediction while preserving training efficiency and preventing vanishing gradients. Specifically, the sentence embedding vector h_s of the current sentence is added, with corresponding parameters, to the inputs of the update gate, the reset gate and the candidate memory unit of the first decoding layer, so that during token-by-token generation the GRU at every time step can combine the sentence embedding vector h_s of the current sentence S_i. The specific formulas are:
z_t = σ(W_z·x_t + U_z·h_{t-1} + V_z·h_s)
r_t = σ(W_r·x_t + U_r·h_{t-1} + V_r·h_s)
[candidate memory unit and output formulas of the modified GRU, given in the original as images]
where z_t denotes the update gate of the GRU model, W_z and U_z are the update-gate parameters of the original GRU model, x_t denotes the input vector at the current time t, h_{t-1} denotes the vector passed from the previous time t-1 to the current time t, and V_z denotes the parameter set for the sentence embedding vector h_s. In the same way, the reset gate r_t and the candidate memory unit both incorporate h_s: W_r, U_r, V_r denote the reset-gate parameters, tanh denotes the activation function, W_k, U_k, V_k denote the parameters of the candidate memory unit, h_t denotes the output vector at the current time t, σ denotes a fully connected layer with an activation function, and ⊙ denotes element-wise multiplication of vectors.
③ Synchronously with the first decoding layer, input the sentence embedding vector h_s into the second decoding layer next-Decoder, and obtain the following predicted sentence Y_{i+1} corresponding to the current sentence through decoding. Predicting the following sentence from the current sentence conforms to the top-down character of natural language, so the second decoding layer next-Decoder uses the existing GRU model, and the sentence embedding vector h_s is used only as the initial vector of the second decoding layer.
It can be seen that predicting the preceding sentence of the current sentence within the encoder-decoder framework breaks the top-down rule of natural language and increases the difficulty of model training, allowing the model to be trained thoroughly and to output sentence vector representations containing complete semantic and grammatical information. Further, improving the update gate, reset gate and candidate memory unit of the GRU model effectively preserves the training efficiency of the model while raising the difficulty of model training.
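A hedged PyTorch sketch of the Figure 3 architecture: one GRU encoder produces the sentence embedding h_s of the current sentence, and two GRU decoders consume it in parallel, the pre-Decoder predicting the preceding sentence and the next-Decoder predicting the following sentence with h_s only as its initial state. For brevity the pre-Decoder here concatenates h_s to every input step as a stand-in for the gate-level fusion described above; dimensions, teacher forcing and the output projections are assumptions.

```python
import torch
import torch.nn as nn

class ContextSeq2Seq(nn.Module):
    """One encoder, two parallel decoders: pre-Decoder predicts S_{i-1}, next-Decoder predicts S_{i+1}."""

    def __init__(self, vocab_size: int, emb_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        # pre-Decoder: h_s concatenated at every step (stand-in for the modified-GRU gate fusion).
        self.pre_decoder = nn.GRU(emb_dim + hidden, hidden, batch_first=True)
        # next-Decoder: plain GRU, h_s used only as the initial hidden state.
        self.next_decoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.pre_out = nn.Linear(hidden, vocab_size)
        self.next_out = nn.Linear(hidden, vocab_size)

    def forward(self, current_ids, pre_ids, next_ids):
        _, h_s = self.encoder(self.embed(current_ids))          # h_s: (1, batch, hidden)
        pre_in = torch.cat(
            [self.embed(pre_ids),
             h_s.transpose(0, 1).expand(-1, pre_ids.size(1), -1)], dim=-1)
        pre_hidden, _ = self.pre_decoder(pre_in, h_s)
        next_hidden, _ = self.next_decoder(self.embed(next_ids), h_s)
        return self.pre_out(pre_hidden), self.next_out(next_hidden)

model = ContextSeq2Seq(vocab_size=5000)
cur = torch.randint(0, 5000, (2, 10))
prev = torch.randint(0, 5000, (2, 9))
nxt = torch.randint(0, 5000, (2, 11))
pre_logits, next_logits = model(cur, prev, nxt)
print(pre_logits.shape, next_logits.shape)  # (2, 9, 5000) (2, 11, 5000)
```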
Step 202: Using the preceding predicted sentence and the following predicted sentence of the current sentence, train the initial sequence-to-sequence model with a target loss function to obtain the trained sequence-to-sequence model. The target loss function is determined from the sum of a first loss function and a second loss function; the first loss function in the target loss function is set based on the first decoding layer used to predict the preceding text, and the second loss function in the target loss function is set based on the second decoding layer used to predict the following text.
In implementation, based on the preceding target sentence S_{i-1}, the following target sentence S_{i+1}, the preceding predicted sentence Y_{i-1} and the following predicted sentence Y_{i+1}, the target loss function is used to train the network parameters of the initialized sequence-to-sequence model until the initialized sequence-to-sequence model converges, yielding the trained sequence-to-sequence model. Specifically, the cross-entropy loss function is used as the base loss function:
[cross-entropy loss formula, given in the original as an image]
where CE denotes the cross-entropy loss function, S denotes the current sentence, Y denotes the predicted sentence generated by the decoding layer Decoder, l denotes the number of tokens determined after tokenizing the current sentence S, t_j denotes the j-th token obtained by tokenizing the current sentence S, and y_j denotes the j-th token in the predicted sentence Y.
Further, based on the first decoding layer pre-Decoder and the second decoding layer next-Decoder, which output the preceding predicted sentence and the following predicted sentence respectively, the corresponding preceding-sentence loss function (first loss function) and following-sentence loss function (second loss function) are determined, and the target loss function of the initialized sequence-to-sequence model is obtained as the sum of the two: loss = pre-loss + next-loss, where pre-loss denotes the preceding-sentence loss function and next-loss denotes the following-sentence loss function.
Depending on the needs of the actual application scenario, the initialized sequence-to-sequence model is trained by setting the batch size to 128, the number of epochs to 50 and the learning rate lr to 0.005, until the target loss function value of the initialized sequence-to-sequence model stabilizes; training then ends and the trained sequence-to-sequence model is obtained.
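A minimal training-loop sketch using the hyperparameters stated above (batch size 128, 50 epochs, learning rate 0.005). The optimizer choice, the synthetic batches and the tiny stand-in model are assumptions; in practice the model would be the encoder/two-decoder network described earlier and the batches would come from the sentence sample set G.

```python
import torch
import torch.nn as nn

# Hyperparameters stated in the description: batch size 128, 50 epochs, learning rate 0.005.
BATCH_SIZE, EPOCHS, LR = 128, 50, 0.005

class TinyModel(nn.Module):
    """Stand-in for the encoder/two-decoder model; any module returning (pre_logits, next_logits) works."""
    def __init__(self, vocab=1000, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.pre_out = nn.Linear(hidden, vocab)
        self.next_out = nn.Linear(hidden, vocab)
    def forward(self, cur, prev, nxt):
        h = self.embed(cur).mean(dim=1, keepdim=True)   # crude sentence embedding of the current sentence
        return self.pre_out(self.embed(prev) + h), self.next_out(self.embed(nxt) + h)

model = TinyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)  # optimizer choice is an assumption
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    # One synthetic batch per epoch stands in for iterating the real sentence-pair dataset.
    cur = torch.randint(0, 1000, (BATCH_SIZE, 10))
    prev = torch.randint(0, 1000, (BATCH_SIZE, 10))
    nxt = torch.randint(0, 1000, (BATCH_SIZE, 10))
    pre_logits, next_logits = model(cur, prev, nxt)
    loss = (criterion(pre_logits.reshape(-1, 1000), prev.reshape(-1))
            + criterion(next_logits.reshape(-1, 1000), nxt.reshape(-1)))  # pre-loss + next-loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```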
Step 203: Perform semantic segmentation on the acquired initial sentence text to obtain segmented sentence text.
Step 204: Use the pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
In implementation, the encoding layer of the trained sequence-to-sequence model is extracted as the sentence vector generation model, so that after a book recommendation request is received, the introduction text corresponding to the book title in the request is obtained, the introduction text is split into sentences based on Chinese punctuation, and the HIT LTP model is used to tokenize the split introduction text to obtain tokenized sentence text; the sentence vector generation model is then used to encode the sentence text and obtain its vector representation.
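A small sketch of the preprocessing chain at inference time: split the introduction text into sentences, then tokenize each sentence. The patent uses the HIT LTP model for word segmentation; jieba is used here purely as a stand-in tokenizer so the sketch stays self-contained.

```python
import re
import jieba  # stand-in tokenizer; the application itself uses the HIT LTP model

def preprocess_introduction(text: str) -> list[list[str]]:
    """Split introduction text on Chinese sentence-ending punctuation, then tokenize each sentence."""
    sentences = [s for s in re.split(r"[。！？；]", text) if s.strip()]
    return [jieba.lcut(s) for s in sentences]

print(preprocess_introduction("这本书介绍了自然语言处理。内容涵盖句子向量与推荐系统！"))
```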
Step 205: Calculate the similarity value between the vector representation of the sentence text and the sentence embedding vectors in a preset book sample library, where the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
In implementation, based on the introduction text of each book in the initial book sample library, the sentence vector generation model outputs the sentence embedding vector corresponding to that introduction text, and the preset book sample library is built from these output sentence embedding vectors; a cosine similarity algorithm is then used to calculate the similarity value between the sentence vector produced for the book recommendation request and the sentence embedding vector corresponding to each book in the preset book sample library.
Step 206: Generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy the preset condition.
In implementation, when a user browses a book on the platform, that book is taken as the target book and a book recommendation request containing the title of the target book is generated; based on the introduction text corresponding to the target book title, the sentence vector generation model generates the corresponding sentence vector, the similarity values between the generated sentence vector and each group of sentence embedding vectors in the platform's preset book sample library are calculated and sorted in descending order, and the book information corresponding to the sentence embedding vectors whose similarity values satisfy the preset condition is recommended to the user as similar books. Experiments found that online A/B test results show the user click-through rate obtained with this embodiment can be effectively improved by 2.31%.
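A small sketch of the final selection step: sort candidate books by similarity in descending order and keep those meeting the preset condition. The threshold and top-k values are illustrative assumptions; the application only states that similarity values must satisfy a preset condition.

```python
def recommend(scores: dict[str, float], top_k: int = 3, threshold: float = 0.5) -> list[str]:
    """Sort candidate books by similarity (descending) and keep those meeting the preset condition."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [title for title, score in ranked[:top_k] if score >= threshold]

print(recommend({"book_a": 0.93, "book_b": 0.41, "book_c": 0.78}))
```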
By applying the technical solution of this embodiment, semantic segmentation is performed on the acquired initial sentence text to obtain segmented sentence text, and a pre-built sentence vector generation model is used to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model; wherein the trained sequence-to-sequence model is obtained through the following steps: using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in the constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence. It can be seen that training a sequence-to-sequence model on context sentence pair sequences and using the encoding layer of the trained model to generate sentence vectors raises the difficulty of model training while effectively improving the accuracy of sentence vector generation and preserving the integrity of the semantic and grammatical information of the generated sentence vectors, thereby effectively avoiding the technical problems that the existing construction method based on word-vector averaging destroys the dependencies between words in a sentence and leads to low accuracy of sentence feature extraction, and that the construction method based on contrastive learning makes model training insufficiently difficult, leaves the model with inadequate transfer ability in real tasks, and produces sentence vectors of lower accuracy.
Further, as a specific implementation of the method in Figure 1, an embodiment of this application provides a sentence vector generation apparatus, as shown in Figure 4. The apparatus includes a model training module 41, a preprocessing module 42 and an encoding module 43.
The model training module 41 can be used to use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence, and to obtain the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
The preprocessing module 42 can be used to perform semantic segmentation on the acquired initial sentence text to obtain segmented sentence text.
The encoding module 43 can be used to use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
In a specific application scenario, as shown in Figure 5, a book recommendation module 44 is also included.
In a specific application scenario, the model training module 41 includes a training unit 411.
The training unit 411 can be used to train the initial sequence-to-sequence model with a target loss function based on the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model; wherein the target loss function is determined from the sum of the first loss function and the second loss function.
In a specific application scenario, the context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
In a specific application scenario, the model training module 41 can specifically be used to: perform word segmentation on the context sentence pair sequence with a word segmentation tool to obtain a tokenized context sentence pair sequence; for the current sentence in the tokenized context sentence pair sequence, use the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence; and, from the sentence embedding vector of the current sentence, use the two decoding layers arranged in parallel in the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence respectively, where the two decoding layers are the first decoding layer used to predict the preceding text and the second decoding layer used to predict the following text.
In a specific application scenario, the first decoding layer used to predict the preceding text is a first GRU model and the second decoding layer used to predict the following text is a second GRU model; the step of obtaining the preceding predicted sentence and the following predicted sentence respectively, from the sentence embedding vector of the current sentence and using the two decoding layers arranged in parallel in the initial sequence-to-sequence model, specifically includes: using the sentence embedding vector of the current sentence as the input data of the reset gate, the update gate and the candidate memory unit in the first GRU model, and obtaining the preceding predicted sentence of the current sentence through decoding; and using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the following predicted sentence of the current sentence through decoding.
In a specific application scenario, the first loss function in the target loss function is set based on the first decoding layer used to predict the preceding text, and the second loss function in the target loss function is set based on the second decoding layer used to predict the following text.
In a specific application scenario, the book recommendation module 44 includes a similarity calculation unit 441 and a generation unit 442.
The similarity calculation unit 441 can be used to calculate the similarity value between the vector representation of the sentence text and the sentence embedding vectors in the preset book sample library.
The generation unit 442 can be used to generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy the preset condition; wherein the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
It should be noted that, for other corresponding descriptions of the functional units involved in the sentence vector generation apparatus provided by this embodiment of this application, reference may be made to the corresponding descriptions of Figures 1 and 2, which are not repeated here.
Based on the methods shown in Figures 1 and 2 above, correspondingly, an embodiment of this application further provides a storage medium on which a computer program is stored; when the program is executed by a processor, the sentence vector generation method of Figures 1 and 2 is implemented, including:
performing semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence;
obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
Optionally, the step of obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence specifically includes:
using a target loss function to train the initial sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model;
wherein the target loss function is determined from the sum of the first loss function and the second loss function.
Optionally, the context sentence pair sequence specifically includes:
the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
Optionally, the storage medium is a computer-readable storage medium, which may be non-volatile or volatile.
Based on this understanding, the technical solution of this application can be embodied in the form of a software product, which can be stored in a storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes a number of instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each implementation scenario of this application.
Based on the methods shown in Figures 1 and 2 above and the virtual apparatus embodiments shown in Figures 4 and 5, in order to achieve the above purposes, an embodiment of this application further provides a computer device, which can specifically be a personal computer, a server, a network device, or the like. The physical device includes a storage medium and a processor; the storage medium is used to store a computer program, and the processor is used to execute the computer program to implement the sentence vector generation method shown in Figures 1 and 2, including:
performing semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using an initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in a context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence;
obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
Optionally, the step of obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence specifically includes:
using a target loss function to train the initial sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model;
wherein the target loss function is determined from the sum of the first loss function and the second loss function.
Optionally, the context sentence pair sequence specifically includes:
the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and the like. The user interface may include a display (Display), an input unit such as a keyboard (Keyboard), and the like; optionally, the user interface may also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a Bluetooth interface or a Wi-Fi interface), and the like.
Those skilled in the art can understand that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the running of the information processing program and other software and/or programs. The network communication module is used to implement communication between the components inside the storage medium, as well as communication with other hardware and software in the physical device.
Through the description of the above implementations, those skilled in the art can clearly understand that this application can be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware. By applying the technical solution of this application, compared with existing construction based on word-vector averaging and construction based on contrastive learning, this embodiment trains a sequence-to-sequence model on context sentence pair sequences and uses the encoding layer of the trained sequence-to-sequence model to generate sentence vectors for sentence text, which preserves the integrity of the semantic and grammatical information of the sentence text and thus effectively improves the accuracy of sentence vector generation, thereby effectively avoiding the technical problems that the existing construction method based on word-vector averaging destroys the dependencies between words in a sentence, resulting in low accuracy of sentence feature extraction, and that the construction method based on contrastive learning makes model training insufficiently difficult and leaves the model with inadequate transfer ability in real tasks.
Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and the modules or processes in the accompanying drawings are not necessarily required for implementing this application. Those skilled in the art can understand that the modules in the apparatus in an implementation scenario can be distributed in the apparatus of that implementation scenario according to the description of the implementation scenario, or can be changed accordingly and located in one or more apparatuses different from this implementation scenario. The modules of the above implementation scenarios can be combined into one module, or can be further split into multiple sub-modules.
The above serial numbers of this application are for description only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only several specific implementation scenarios of this application; however, this application is not limited thereto, and any changes that those skilled in the art can conceive of shall fall within the protection scope of this application.

Claims (20)

  1. A sentence vector generation method, comprising:
    performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
    obtaining a vector representation of the sentence text by using a pre-built sentence vector generation model, through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
    wherein the trained sequence-to-sequence model is obtained through the following steps:
    performing encoding processing and context decoding processing on a current sentence in a context sentence-pair sequence in a constructed sentence sample set by using an initial sequence-to-sequence model, to obtain a predicted previous sentence and a predicted next sentence of the current sentence;
    obtaining the trained sequence-to-sequence model according to the predicted previous sentence and the predicted next sentence.
  2. The method according to claim 1, wherein the step of obtaining the trained sequence-to-sequence model according to the predicted previous sentence and the predicted next sentence specifically comprises:
    training the initial sequence-to-sequence model with a target loss function according to the predicted previous sentence and the predicted next sentence of the current sentence, to obtain the trained sequence-to-sequence model;
    wherein the target loss function is determined from the sum of a first loss function and a second loss function.
  3. The method according to claim 1 or 2, wherein the context sentence-pair sequence specifically comprises:
    a current sentence to be input into the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
    and a previous target sentence and a next target sentence used for training the output results of the initial sequence-to-sequence model, the output results being the predicted previous sentence and the predicted next sentence output during model training.
  4. The method according to claim 1, wherein the step of performing encoding processing and context decoding processing on the current sentence in the context sentence-pair sequence in the constructed sentence sample set by using the initial sequence-to-sequence model, to obtain the predicted previous sentence and the predicted next sentence of the current sentence, specifically comprises:
    performing word segmentation on the context sentence-pair sequence with a word segmentation tool to obtain a segmented context sentence-pair sequence;
    obtaining a sentence embedding vector of the current sentence from the current sentence in the segmented context sentence-pair sequence by using the encoding layer of the initial sequence-to-sequence model;
    obtaining the predicted previous sentence and the predicted next sentence from the sentence embedding vector of the current sentence by using two decoding layers arranged in parallel in the initial sequence-to-sequence model;
    wherein the two decoding layers are a first decoding layer for predicting the previous sentence and a second decoding layer for predicting the next sentence.
  5. The method according to claim 4, wherein the first decoding layer for predicting the previous sentence is a first GRU model and the second decoding layer for predicting the next sentence is a second GRU model, and the step of obtaining the predicted previous sentence and the predicted next sentence from the sentence embedding vector of the current sentence by using the two decoding layers arranged in parallel in the initial sequence-to-sequence model specifically comprises:
    taking the sentence embedding vector of the current sentence as input data to the reset gate, the update gate, and the candidate memory unit of the first GRU model, and obtaining the predicted previous sentence of the current sentence through decoding processing;
    taking the sentence embedding vector of the current sentence as input data to the second GRU model, and obtaining the predicted next sentence of the current sentence through decoding processing.
  6. The method according to claim 2 or 4, wherein the first loss function in the target loss function is set based on the first decoding layer for predicting the previous sentence, and the second loss function in the target loss function is set based on the second decoding layer for predicting the next sentence.
  7. The method according to claim 1, wherein, after the step of obtaining the vector representation of the sentence text by using the sentence vector generation model through the encoding processing used for predicting the context of the sentence text, the method further comprises:
    calculating similarity values between the vector representation of the sentence text and sentence embedding vectors in a preset book sample library;
    generating book recommendation information for the sentence text according to the sentence embedding vectors in the preset book sample library whose similarity values satisfy a preset condition;
    wherein the sentence embedding vectors in the preset book sample library are output by the sentence vector generation model.
  8. A sentence vector generation apparatus, comprising:
    a model training module configured to perform encoding processing and context decoding processing on a current sentence in a context sentence-pair sequence in a constructed sentence sample set by using an initial sequence-to-sequence model, to obtain a predicted previous sentence and a predicted next sentence of the current sentence, and to obtain a trained sequence-to-sequence model according to the predicted previous sentence and the predicted next sentence;
    a preprocessing module configured to perform semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
    an encoding module configured to obtain a vector representation of the sentence text by using a pre-built sentence vector generation model, through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
  9. The apparatus according to claim 8, wherein the model training module specifically comprises:
    a training unit configured to train the initial sequence-to-sequence model with a target loss function according to the predicted previous sentence and the predicted next sentence of the current sentence, to obtain the trained sequence-to-sequence model;
    wherein the target loss function is determined from the sum of a first loss function and a second loss function.
  10. The apparatus according to claim 8 or 9, wherein the context sentence-pair sequence specifically comprises:
    a current sentence to be input into the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
    and a previous target sentence and a next target sentence used for training the output results of the initial sequence-to-sequence model, the output results being the predicted previous sentence and the predicted next sentence output during model training.
  11. The apparatus according to claim 8, wherein the model training module is specifically configured to:
    perform word segmentation on the context sentence-pair sequence with a word segmentation tool to obtain a segmented context sentence-pair sequence;
    obtain a sentence embedding vector of the current sentence from the current sentence in the segmented context sentence-pair sequence by using the encoding layer of the initial sequence-to-sequence model;
    obtain the predicted previous sentence and the predicted next sentence from the sentence embedding vector of the current sentence by using two decoding layers arranged in parallel in the initial sequence-to-sequence model;
    wherein the two decoding layers are a first decoding layer for predicting the previous sentence and a second decoding layer for predicting the next sentence.
  12. The apparatus according to claim 11, wherein the first decoding layer for predicting the previous sentence is a first GRU model and the second decoding layer for predicting the next sentence is a second GRU model, and the step of obtaining the predicted previous sentence and the predicted next sentence from the sentence embedding vector of the current sentence by using the two decoding layers arranged in parallel in the initial sequence-to-sequence model specifically comprises:
    taking the sentence embedding vector of the current sentence as input data to the reset gate, the update gate, and the candidate memory unit of the first GRU model, and obtaining the predicted previous sentence of the current sentence through decoding processing;
    taking the sentence embedding vector of the current sentence as input data to the second GRU model, and obtaining the predicted next sentence of the current sentence through decoding processing.
  13. The apparatus according to claim 9 or 11, wherein the first loss function in the target loss function is set based on the first decoding layer for predicting the previous sentence, and the second loss function in the target loss function is set based on the second decoding layer for predicting the next sentence.
  14. The apparatus according to claim 8, further comprising a book recommendation module, which specifically comprises:
    a similarity calculation unit configured to calculate similarity values between the vector representation of the sentence text and sentence embedding vectors in a preset book sample library;
    a generation unit configured to generate book recommendation information for the sentence text according to the sentence embedding vectors in the preset book sample library whose similarity values satisfy a preset condition;
    wherein the sentence embedding vectors in the preset book sample library are output by the sentence vector generation model.
  15. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor, when executing the program, implements a sentence vector generation method comprising:
    performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
    obtaining a vector representation of the sentence text by using a pre-built sentence vector generation model, through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
    wherein the trained sequence-to-sequence model is obtained through the following steps:
    performing encoding processing and context decoding processing on a current sentence in a context sentence-pair sequence in a constructed sentence sample set by using an initial sequence-to-sequence model, to obtain a predicted previous sentence and a predicted next sentence of the current sentence;
    obtaining the trained sequence-to-sequence model according to the predicted previous sentence and the predicted next sentence.
  16. The computer device according to claim 15, wherein the step of obtaining the trained sequence-to-sequence model according to the predicted previous sentence and the predicted next sentence specifically comprises:
    training the initial sequence-to-sequence model with a target loss function according to the predicted previous sentence and the predicted next sentence of the current sentence, to obtain the trained sequence-to-sequence model;
    wherein the target loss function is determined from the sum of a first loss function and a second loss function.
  17. The computer device according to claim 15 or 16, wherein the context sentence-pair sequence specifically comprises:
    a current sentence to be input into the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
    and a previous target sentence and a next target sentence used for training the output results of the initial sequence-to-sequence model, the output results being the predicted previous sentence and the predicted next sentence output during model training.
  18. A storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements a sentence vector generation method comprising:
    performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
    obtaining a vector representation of the sentence text by using a pre-built sentence vector generation model, through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
    wherein the trained sequence-to-sequence model is obtained through the following steps:
    performing encoding processing and context decoding processing on a current sentence in a context sentence-pair sequence in a constructed sentence sample set by using an initial sequence-to-sequence model, to obtain a predicted previous sentence and a predicted next sentence of the current sentence;
    obtaining the trained sequence-to-sequence model according to the predicted previous sentence and the predicted next sentence.
  19. The storage medium according to claim 18, wherein the step of obtaining the trained sequence-to-sequence model according to the predicted previous sentence and the predicted next sentence specifically comprises:
    training the initial sequence-to-sequence model with a target loss function according to the predicted previous sentence and the predicted next sentence of the current sentence, to obtain the trained sequence-to-sequence model;
    wherein the target loss function is determined from the sum of a first loss function and a second loss function.
  20. The storage medium according to claim 18 or 19, wherein the context sentence-pair sequence specifically comprises:
    a current sentence to be input into the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
    and a previous target sentence and a next target sentence used for training the output results of the initial sequence-to-sequence model, the output results being the predicted previous sentence and the predicted next sentence output during model training.
PCT/CN2022/089817 2022-03-09 2022-04-28 Sentence vector generation method and apparatus, computer device, and storage medium WO2023168814A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210232057.9 2022-03-09
CN202210232057.9A CN114444471A (zh) 2022-03-09 2022-03-09 Sentence vector generation method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023168814A1 true WO2023168814A1 (zh) 2023-09-14

Family

ID=81359057

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089817 WO2023168814A1 (zh) 2022-03-09 2022-04-28 Sentence vector generation method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN114444471A (zh)
WO (1) WO2023168814A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178082A (zh) * 2019-12-05 2020-05-19 北京葡萄智学科技有限公司 Sentence vector generation method and apparatus, and electronic device
US20200218780A1 (en) * 2019-01-03 2020-07-09 International Business Machines Corporation Automated contextual dialog generation for cognitive conversation
WO2020151688A1 (zh) * 2019-01-24 2020-07-30 腾讯科技(深圳)有限公司 Encoding method and apparatus, device, and storage medium
CN111602128A (zh) * 2017-10-27 2020-08-28 巴比伦合伙有限公司 Computer-implemented determination method and system
CN112052329A (zh) * 2020-09-02 2020-12-08 平安科技(深圳)有限公司 Text summary generation method and apparatus, computer device, and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111602128A (zh) * 2017-10-27 2020-08-28 巴比伦合伙有限公司 Computer-implemented determination method and system
US20200218780A1 (en) * 2019-01-03 2020-07-09 International Business Machines Corporation Automated contextual dialog generation for cognitive conversation
WO2020151688A1 (zh) * 2019-01-24 2020-07-30 腾讯科技(深圳)有限公司 Encoding method and apparatus, device, and storage medium
CN111178082A (zh) * 2019-12-05 2020-05-19 北京葡萄智学科技有限公司 Sentence vector generation method and apparatus, and electronic device
CN112052329A (zh) * 2020-09-02 2020-12-08 平安科技(深圳)有限公司 Text summary generation method and apparatus, computer device, and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RYAN KIROS, YUKUN ZHU, RUSLAN SALAKHUTDINOV, RICHARD S ZEMEL, ANTONIO TORRALBA, RAQUEL URTASUN, SANJA FIDLER: "Skip-Thought Vectors", 22 June 2015 (2015-06-22), XP055428189, Retrieved from the Internet <URL:https://arxiv.org/pdf/1506.06726.pdf> *

Also Published As

Publication number Publication date
CN114444471A (zh) 2022-05-06

Similar Documents

Publication Publication Date Title
CN111444340B (zh) Text classification method, apparatus, device, and storage medium
WO2022007823A1 (zh) Text data processing method and apparatus
CN111967266A (zh) Chinese named entity recognition model, construction method therefor, and application thereof
WO2022022421A1 (zh) Language representation model system, pre-training method, apparatus, device, and medium
CN110163181B (zh) Sign language recognition method and apparatus
CN112131366A (zh) Method and apparatus for training a text classification model and for text classification, and storage medium
WO2020244475A1 (zh) Method and apparatus for language sequence labeling, storage medium, and computing device
CN111159485B (zh) Tail entity linking method, apparatus, server, and storage medium
CN113051356B (zh) Open relation extraction method and apparatus, electronic device, and storage medium
CN113553412B (zh) Question-answering processing method and apparatus, electronic device, and storage medium
US11487971B2 (en) Multi-dimensional language style transfer
CN110188158B (zh) Keyword and topic tag generation method, apparatus, medium, and electronic device
WO2022228127A1 (zh) Element text processing method and apparatus, electronic device, and storage medium
CN111145914B (zh) Method and apparatus for determining text entities in a lung cancer clinical disease database
JP2023002690A (ja) Semantics recognition method, apparatus, electronic device, and storage medium
CN116541492A (zh) Data processing method and related device
CN116050425A (zh) Method for building a pre-trained language model, text prediction method, and apparatus
CN115408488A (zh) Segmentation method and system for novel scene text
CN115114407A (zh) Intent recognition method and apparatus, computer device, and storage medium
WO2023116572A1 (zh) Word and sentence generation method and related device
WO2023137903A1 (zh) Method and apparatus for determining reply sentences based on rough semantics, and electronic device
WO2023168814A1 (zh) Sentence vector generation method and apparatus, computer device, and storage medium
US10706086B1 (en) Collaborative-filtering based user simulation for dialog systems
CN116881446A (zh) Semantic classification method, apparatus, device, and storage medium therefor
CN116432646A (zh) Training method for pre-trained language model, entity information recognition method, and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22930443

Country of ref document: EP

Kind code of ref document: A1