CN112560499A - Pre-training method and device of semantic representation model, electronic equipment and storage medium - Google Patents

Pre-training method and device of semantic representation model, electronic equipment and storage medium

Info

Publication number
CN112560499A
CN112560499A (application CN202011463938.9A)
Authority
CN
China
Prior art keywords
fragment
semantic
sequence
representation model
ith
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011463938.9A
Other languages
Chinese (zh)
Other versions
CN112560499B (en)
Inventor
丁思宇
王硕寰
尚骏远
孙宇
田浩
吴华
王海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011463938.9A priority Critical patent/CN112560499B/en
Publication of CN112560499A publication Critical patent/CN112560499A/en
Application granted granted Critical
Publication of CN112560499B publication Critical patent/CN112560499B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a pre-training method and device for a semantic representation model, electronic equipment, and a storage medium, and relates to artificial intelligence technologies such as deep learning and Natural Language Processing (NLP). The specific implementation scheme is as follows: obtaining an out-of-order fragment sequence of a sample text and the original sorting order, in the sample text, of the N fragments in the out-of-order fragment sequence; for the ith fragment in the out-of-order fragment sequence, inputting the semantic fusion vector of the (i-1)th fragment and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment; inputting the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted sorting order of the N fragments in the sample text; and pre-training the semantic representation model and the prediction model according to the original sorting order and the predicted sorting order. In this way the whole sample text can be processed, the global information of the sample text can be learned, and the processing efficiency of the semantic representation model can be improved.

Description

Pre-training method and device of semantic representation model, electronic equipment and storage medium
Technical Field
The application relates to the field of computer technology, in particular to artificial intelligence technologies such as deep learning and Natural Language Processing (NLP), and more particularly to a pre-training method and device for a semantic representation model, electronic equipment, and a storage medium.
Background
The current pre-training method for semantic representation models mainly extracts N natural sentences from a document to construct a text with a total length of less than 512 characters, divides the text into a plurality of fragments, randomly shuffles the fragments, inputs them into an ERNIE 2.0 model, and adjusts the model coefficients based on the predicted order of the fragments, thereby realizing pre-training.
In this method, the input is limited to a text with a total length of less than 512 characters, so the pre-trained model can only learn local information of the document, and the processing efficiency is poor.
Disclosure of Invention
The disclosure provides a pre-training method, a device, equipment and a storage medium of a semantic representation model.
According to an aspect of the present disclosure, there is provided a pre-training method for a semantic representation model, including: obtaining an out-of-order fragment sequence of a sample text and the original sorting order, in the sample text, of the N fragments in the out-of-order fragment sequence, where N is a positive integer; for the ith fragment in the out-of-order fragment sequence, inputting the semantic fusion vector of the (i-1)th fragment and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N, and repeating this step until the semantic fusion vector of the Nth fragment is obtained; inputting the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted sorting order of the N fragments in the sample text; and pre-training the semantic representation model and the prediction model according to the original sorting order and the predicted sorting order.
According to another aspect of the present disclosure, there is provided a pre-training apparatus for a semantic representation model, including: an obtaining module, configured to obtain an out-of-order fragment sequence of a sample text and the original sorting order, in the sample text, of the N fragments in the out-of-order fragment sequence, where N is a positive integer; a first input module, configured to, for the ith fragment in the out-of-order fragment sequence, input the semantic fusion vector of the (i-1)th fragment in the out-of-order fragment sequence and the ith fragment into a semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N, and to repeat this step until the semantic fusion vector of the Nth fragment is obtained; a second input module, configured to input the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted sorting order of the N fragments in the sample text; and a pre-training module, configured to pre-train the semantic representation model and the prediction model according to the original sorting order and the predicted sorting order.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the pre-training method of the semantic representation model as described above.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of pre-training a semantic representation model as described above.
According to a fifth aspect, there is provided a computer program product including instructions which, when executed by a processor, implement the pre-training method of the semantic representation model as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a semantic representation model + a prediction model;
FIG. 3 is a schematic diagram according to a second embodiment of the present application;
FIG. 4 is a schematic illustration according to a third embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing a pre-training method of a semantic representation model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to facilitate understanding; these details should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes a pre-training method, an apparatus, an electronic device, and a storage medium of a semantic representation model according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a schematic diagram according to a first embodiment of the present application. It should be noted that the execution subject of the embodiment of the present application is a pre-training device of the semantic representation model, and the pre-training device of the semantic representation model may specifically be a hardware device, or software in the hardware device, or the like.
As shown in fig. 1, the pre-training method of the semantic representation model is implemented as follows:
Step 101: obtain an out-of-order fragment sequence of a sample text and the original sorting order, in the sample text, of the N fragments in the out-of-order fragment sequence, where N is a positive integer.
In the embodiment of the present application, the sample text may be any text obtained in any manner. The out-of-order fragment sequence of the sample text may be obtained manually or automatically. To improve the efficiency of obtaining the out-of-order fragment sequence and to reduce the cost of doing so, the pre-training device may perform step 101 as follows: obtain the sample text; divide the sample text into N fragments; shuffle the N fragments to generate the out-of-order fragment sequence of the sample text; and obtain the original sorting order of the N fragments in the sample text.
In the embodiment of the present application, the fragments may be obtained, for example, by dividing the text by sentences or by paragraphs. To reduce the number of fragments while preserving the semantic integrity of each fragment, a fragment may include at least one sentence. Correspondingly, the pre-training device may divide the sample text into N fragments according to the end-of-sentence characters in the sample text and the character limit of the semantic representation model.
In the embodiment of the present application, when the sample text contains many characters, for example, more than a preset multiple of the character limit, the following procedure ensures that each fragment contains no more than the character limit. First, a span of text of at most the character limit is cut from the sample text, and it is checked whether the span ends with an end-of-sentence character. If so, the span is taken as a fragment; if not, the last end-of-sentence character within the span is located and all text up to it is combined into a fragment. Then the next span of at most the character limit is cut from the sample text, starting at that end-of-sentence character, and the above processing is repeated until all characters in the sample text have been processed.
In the embodiment of the application, when the sample text contains few characters, for example, no more than the preset multiple of the character limit, the sample text may be divided directly according to its end-of-sentence characters to obtain the N fragments.
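To make the division procedure above concrete, a minimal Python sketch is given below. It is an illustration only: the function name, the set of end-of-sentence characters and the 512-character limit are assumptions made for the example, not values taken from the patent.

    # Minimal sketch of the fragment-division procedure described above.
    END_CHARS = set("。！？.!?")   # assumed end-of-sentence characters
    CHAR_LIMIT = 512              # assumed character limit of the semantic representation model

    def split_into_fragments(sample_text: str, char_limit: int = CHAR_LIMIT) -> list:
        """Divide sample_text into fragments of at most char_limit characters,
        cutting only at end-of-sentence characters so each fragment keeps whole sentences."""
        fragments, start = [], 0
        while start < len(sample_text):
            span = sample_text[start:start + char_limit]
            if len(span) < char_limit or span[-1] in END_CHARS:
                # the span already ends at a sentence boundary, or the text is exhausted
                fragments.append(span)
                start += len(span)
                continue
            ends = [i for i, ch in enumerate(span) if ch in END_CHARS]
            cut = ends[-1] if ends else len(span) - 1   # fall back to a hard cut if no terminator is found
            fragments.append(span[:cut + 1])            # keep everything up to the last terminator
            start += cut + 1
        return fragments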
In this embodiment of the present application, the original sorting order of the N fragments in the out-of-order fragment sequence may refer to the sequence numbers of the N fragments in the sample text. For example, if the sample text consists of 4 fragments A, B, C and D, and the out-of-order fragment sequence is B, C, A, D, then the original sorting order of the N fragments in the out-of-order fragment sequence is 1, 2, 0, 3.
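Continuing the A, B, C, D example, shuffling the fragments and recording their original sorting order can be sketched as follows (the helper name and the fixed seed are illustrative assumptions; the seed only makes the example reproducible):

    import random

    def make_out_of_order_sequence(fragments, seed=0):
        """Shuffle the fragments and return (shuffled fragments, original sorting order)."""
        indexed = list(enumerate(fragments))            # remember each fragment's position in the sample text
        random.Random(seed).shuffle(indexed)
        original_order = [idx for idx, _ in indexed]    # e.g. [1, 2, 0, 3] when the shuffle yields B, C, A, D
        shuffled = [frag for _, frag in indexed]
        return shuffled, original_order

    shuffled_fragments, original_order = make_out_of_order_sequence(["A", "B", "C", "D"])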
Step 102: for the ith fragment in the out-of-order fragment sequence, input the semantic fusion vector of the (i-1)th fragment in the out-of-order fragment sequence and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N; repeat this step until the semantic fusion vector of the Nth fragment is obtained.
In the embodiment of the application, the semantic fusion vector of the (i-1)th fragment is the vector obtained by fusing the semantic vector of the (i-1)th fragment with the semantic vectors of all fragments before it in the out-of-order fragment sequence. For example, with i = 4, the semantic fusion vector of the 3rd fragment is the vector obtained by fusing the semantic vectors of the 1st, 2nd and 3rd fragments in the out-of-order fragment sequence.
In the embodiment of the present application, the semantic representation model may obtain the semantic fusion vector of the ith fragment by, for example, obtaining the semantic vector of the ith fragment and fusing it with the semantic fusion vector of the (i-1)th fragment. A semantic fusion vector that aggregates the semantic vectors of the fragments processed so far is thus obtained, and the global information of the text sample is extracted. The semantic representation model may be, for example, a semantic representation model based on the Transformer-XL architecture. On the reordering task, the Transformer-XL-based model processes the first N-1 fragments of the out-of-order fragment sequence one by one; when the last fragment in the sequence is input into the model, the semantic fusion vector of the Nth fragment is obtained and used to predict the sorting order of the N fragments in the sample text.
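The fragment-by-fragment fusion described above can be illustrated with the following PyTorch sketch. The patent does not spell out the internals of the semantic representation model beyond the Transformer-XL reference, so the encoder, the mean pooling and the concatenate-and-project fusion below are stand-in assumptions rather than the actual ERNIE or Transformer-XL implementation.

    import torch
    import torch.nn as nn

    class SemanticRepresentationModel(nn.Module):
        """Illustrative stand-in: encodes one fragment and fuses it with the
        running semantic fusion vector of the previously processed fragments."""

        def __init__(self, vocab_size=30000, hidden=768):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, hidden)
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True),
                num_layers=2,
            )
            self.fuse = nn.Linear(2 * hidden, hidden)    # fuses previous fusion vector with the current vector

        def encode_fragment(self, token_ids):
            # token_ids: (batch, seq_len) -> semantic vector of the fragment: (batch, hidden)
            hidden_states = self.encoder(self.embedding(token_ids))
            return hidden_states.mean(dim=1)             # simple pooling as the fragment's semantic vector

        def forward(self, token_ids, prev_fusion=None):
            vec = self.encode_fragment(token_ids)
            if prev_fusion is None:                      # first fragment: its vector is its fusion vector
                return vec
            return torch.tanh(self.fuse(torch.cat([prev_fusion, vec], dim=-1)))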
Step 103: input the semantic fusion vector of the Nth fragment into the prediction model to generate the predicted sorting order of the N fragments in the sample text.
In this embodiment of the application, the prediction model may be, for example, a classification model: the classification model enumerates all possible sorting orders of the N fragments, predicts the probability of each order, and takes the order with the highest probability as the predicted sorting order of the N fragments in the sample text. A schematic diagram of the semantic representation model + the prediction model is shown in FIG. 2.
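A classification model over all candidate sorting orders might look like the sketch below. Because N fragments admit N! possible orders, enumerating them is only practical for small N; the module and attribute names are illustrative assumptions.

    import itertools
    import torch
    import torch.nn as nn

    class OrderPredictionModel(nn.Module):
        """Illustrative prediction model: classifies the semantic fusion vector of the
        Nth fragment into one of the N! possible sorting orders."""

        def __init__(self, hidden=768, num_fragments=4):
            super().__init__()
            self.permutations = list(itertools.permutations(range(num_fragments)))   # all N! candidate orders
            self.classifier = nn.Linear(hidden, len(self.permutations))

        def forward(self, fusion_vector):
            return self.classifier(fusion_vector)          # logits over all candidate sorting orders

        def predict_order(self, fusion_vector):
            logits = self.forward(fusion_vector)
            best = logits.argmax(dim=-1)                   # index of the most probable order
            return [self.permutations[i] for i in best.tolist()]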
Step 104: pre-train the semantic representation model and the prediction model according to the original sorting order and the predicted sorting order.
In the embodiment of the application, the pre-training device may compute a loss value from the original sorting order, the predicted sorting order and a preset loss function; adjust the parameters of the semantic representation model and the prediction model simultaneously according to the loss value; and repeat this adjustment over a plurality of sample texts until the pre-training of the semantic representation model is completed.
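Putting the two illustrative modules together, a single pre-training step over one sample text could be sketched as follows, with a cross-entropy loss between the predicted and original sorting orders; the optimizer choice and the mapping from an order to a class index are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def pretrain_step(model, predictor, optimizer, fragment_token_ids, original_order):
        """One pre-training step on a single sample text.

        fragment_token_ids: list of (1, seq_len) tensors, one per fragment, in shuffled order.
        original_order:     tuple of original positions, e.g. (1, 2, 0, 3).
        """
        fusion = None
        for token_ids in fragment_token_ids:               # process the fragments one by one
            fusion = model(token_ids, fusion)               # running semantic fusion vector
        logits = predictor(fusion)                          # logits over all candidate orders
        target = torch.tensor([predictor.permutations.index(tuple(original_order))])
        loss = F.cross_entropy(logits, target)
        optimizer.zero_grad()
        loss.backward()                                     # adjusts both models jointly
        optimizer.step()
        return loss.item()

    # Usage sketch:
    # model = SemanticRepresentationModel()
    # predictor = OrderPredictionModel(num_fragments=4)
    # optimizer = torch.optim.AdamW(list(model.parameters()) + list(predictor.parameters()), lr=1e-4)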
In summary, an out-of-order fragment sequence of a sample text and the original sorting order of its N fragments in the sample text are obtained, where N is a positive integer; for the ith fragment in the out-of-order fragment sequence, the semantic fusion vector of the (i-1)th fragment and the ith fragment are input into the semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N, and this step is repeated until the semantic fusion vector of the Nth fragment is obtained; the semantic fusion vector of the Nth fragment is input into a prediction model to generate the predicted sorting order of the N fragments in the sample text; and the semantic representation model and the prediction model are pre-trained according to the original sorting order and the predicted sorting order. In this way the whole sample text can be processed instead of only selected sentences of it, the global information of the sample text can be learned, and the processing efficiency of the semantic representation model is improved.
Fig. 3 is a schematic diagram according to a second embodiment of the present application. It should be noted that the execution subject of the embodiment of the present application is a pre-training device of the semantic representation model, and the pre-training device of the semantic representation model may specifically be a hardware device, or software in the hardware device, or the like.
As shown in fig. 3, the pre-training method of the semantic representation model is implemented as follows:
Step 301: obtain an out-of-order fragment sequence of a sample text and the original sorting order, in the sample text, of the N fragments in the out-of-order fragment sequence, where N is a positive integer.
Step 302: determine whether the number of characters of the sample text exceeds the character limit of the semantic representation model; if so, execute step 304; if not, execute step 303.
Step 303: when the number of characters does not exceed the character limit of the semantic representation model, input the out-of-order fragment sequence into the semantic representation model and obtain the semantic fusion vector of the Nth fragment in the out-of-order fragment sequence.
In the embodiment of the application, when the number of characters of the sample text does not exceed the character limit of the semantic representation model, the out-of-order fragment sequence may be input into the semantic representation model directly, so that the model can process the fragments in the sequence in parallel and obtain the semantic fusion vector of the Nth fragment, which improves the efficiency with which the model processes the fragments.
Step 304: when the number of characters exceeds the character limit of the semantic representation model, determine whether the ith fragment in the out-of-order fragment sequence is the first fragment in the sequence; if so, execute step 305; if not, execute step 306.
Step 305: input the ith fragment into the semantic representation model, obtain the semantic vector of the ith fragment, and take the semantic vector of the ith fragment as its semantic fusion vector.
In the embodiment of the application, if the ith fragment is the first fragment in the out-of-order fragment sequence, no other fragments precede it, so its semantic vector can be taken directly as its semantic fusion vector; this removes an unnecessary fusion step and reduces the amount of computation.
Step 306: input the semantic fusion vector of the (i-1)th fragment in the out-of-order fragment sequence and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N; then return to step 304.
Step 307: repeat steps 304 to 306 until the semantic fusion vector of the Nth fragment is obtained (a control-flow sketch of steps 302 to 307 is given after this procedure).
Step 308: input the semantic fusion vector of the Nth fragment into the prediction model to generate the predicted sorting order of the N fragments in the sample text.
Step 309: pre-train the semantic representation model and the prediction model according to the original sorting order and the predicted sorting order.
In the embodiment of the present application, for detailed descriptions of steps 301, 308 and 309, reference may be made to the embodiment shown in FIG. 1; they are not repeated here.
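As a control-flow illustration of steps 302 to 307, the branching can be sketched as follows, reusing the illustrative SemanticRepresentationModel from the first embodiment. Whether a concrete model can really consume the whole shuffled sequence in one parallel pass (step 303) depends on its architecture, so the encode_sequence entry point is an assumed placeholder.

    def encode_out_of_order_sequence(model, fragment_token_ids, num_chars, char_limit=512):
        """Return the semantic fusion vector of the Nth fragment (steps 302 to 307)."""
        if num_chars <= char_limit:
            # Step 303: the whole shuffled sequence fits within the character limit,
            # so the fragments can be handed to the model together and processed in parallel.
            return model.encode_sequence(fragment_token_ids)    # assumed parallel entry point
        fusion = None
        for i, token_ids in enumerate(fragment_token_ids):
            if i == 0:
                fusion = model(token_ids, None)     # step 305: the first fragment's semantic vector is its fusion vector
            else:
                fusion = model(token_ids, fusion)   # step 306: fuse with the running fusion vector
        return fusion                               # semantic fusion vector of the Nth fragment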
In summary, an out-of-order fragment sequence of a sample text and the original sorting order of its N fragments in the sample text are obtained, where N is a positive integer; it is determined whether the number of characters of the sample text exceeds the character limit of the semantic representation model; when the number of characters does not exceed the character limit, the out-of-order fragment sequence is input into the semantic representation model to obtain the semantic fusion vector of the Nth fragment in the sequence; when the number of characters exceeds the character limit, then, for the ith fragment in the out-of-order fragment sequence, the semantic vector of the ith fragment is taken as its semantic fusion vector if it is the first fragment, and otherwise the semantic fusion vector of the (i-1)th fragment and the ith fragment are input into the semantic representation model to obtain the semantic fusion vector of the ith fragment; the semantic fusion vector of the Nth fragment is input into a prediction model to generate the predicted sorting order of the N fragments in the sample text; and the semantic representation model and the prediction model are pre-trained according to the original sorting order and the predicted sorting order. In this way the whole sample text can be processed instead of only selected sentences of it, the global information of the sample text can be learned, and the processing efficiency of the semantic representation model is improved.
In order to implement the above embodiments, the embodiments of the present application further provide a pre-training device for a semantic representation model.
Fig. 4 is a schematic diagram according to a third embodiment of the present application. As shown in fig. 4, the pre-training apparatus 400 for semantic representation model includes: an acquisition module 410, a first input module 420, a second input module 430, and a pre-training module 440.
The obtaining module 410 is configured to obtain a disordered fragment sequence of a sample text and an original sorting order of N fragments in the disordered fragment sequence in the sample text, where N is a positive integer;
a first input module 420, configured to input, for an ith segment in the out-of-order segment sequence, a semantic fusion vector of an i-1 th segment in the out-of-order segment sequence and the ith segment, into a semantic representation model to obtain a semantic fusion vector of the ith segment, where i is a positive integer less than or equal to N, and repeat this step until a semantic fusion vector of an nth segment is obtained;
a second input module 430, configured to input the semantic fusion vector of the nth segment into a prediction model to generate a prediction ordering order of the N segments in the sample text;
a pre-training module 440, configured to pre-train the semantic representation model and the prediction model according to the original sorting order and the prediction sorting order.
As a possible implementation manner of the embodiment of the present application, the obtaining module 410 is specifically configured to: obtain the sample text; divide the sample text into N fragments; shuffle the N fragments to generate the out-of-order fragment sequence of the sample text; and obtain the original sorting order of the N fragments in the out-of-order fragment sequence in the sample text.
As a possible implementation manner of the embodiment of the present application, the segment includes: at least one sentence; the obtaining module 410 is specifically configured to segment the sample text according to the sentence end characters in the sample text and the number of the limit characters of the semantic representation model, so as to obtain N segments.
As a possible implementation manner of the embodiment of the present application, the apparatus further includes: the first judgment module and the third input module; the first judging module is used for judging whether the number of characters of the sample text exceeds the limited number of the semantic representation model; and the third input module is used for inputting the disordered fragment sequence into the semantic representation model when the number of the characters does not exceed the limit number of the semantic representation model, and acquiring the semantic fusion vector of the Nth fragment in the disordered fragment sequence.
As a possible implementation manner of the embodiment of the present application, the apparatus further includes: a second judgment module; the second judging module is configured to judge whether the ith fragment is a first fragment in the out-of-order fragment sequence; the first input module is further configured to input the ith segment into a semantic representation model when the ith segment is a first segment in the disordered segment sequence, and obtain a semantic vector of the ith segment; and determining the semantic vector of the ith segment as a semantic fusion vector of the ith segment.
As a possible implementation manner of the embodiment of the present application, the semantic fusion vector of the ith segment is obtained by the semantic representation model in a manner of obtaining the semantic vector of the ith segment; and performing fusion processing on the semantic vector of the ith fragment and the semantic fusion vector of the (i-1) th fragment to obtain the semantic fusion vector of the ith fragment.
In summary, an out-of-order fragment sequence of a sample text and the original sorting order of its N fragments in the sample text are obtained, where N is a positive integer; for the ith fragment in the out-of-order fragment sequence, the semantic fusion vector of the (i-1)th fragment and the ith fragment are input into the semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N, and this step is repeated until the semantic fusion vector of the Nth fragment is obtained; the semantic fusion vector of the Nth fragment is input into a prediction model to generate the predicted sorting order of the N fragments in the sample text; and the semantic representation model and the prediction model are pre-trained according to the original sorting order and the predicted sorting order. In this way the whole sample text can be processed instead of only selected sentences of it, the global information of the sample text can be learned, and the processing efficiency of the semantic representation model is improved.
There is also provided, in accordance with an embodiment of the present application, an electronic device, a readable storage medium, and a computer program product.
Fig. 5 is a block diagram of an electronic device for pre-training a semantic representation model according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the pre-training method of semantic representation models provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the pre-training method of semantic representation models provided herein.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the pre-training method of the semantic representation model in the embodiments of the present application (e.g., the obtaining module 410, the first input module 420, the second input module 430, and the pre-training module 440 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing, i.e. implementing the pre-training method of the semantic representation model in the above-described method embodiments, by running non-transitory software programs, instructions and modules stored in the memory 502.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the stored data area may store data created from use of a pre-trained electronic device of the semantic representation model, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 optionally includes memory remotely located from processor 501, which may be connected to a pre-trained electronic device of the semantic representation model over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the pre-training method of the semantic representation model may further comprise: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the pre-trained electronic device of the semantic representation model, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited in this respect as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (15)

1. A method of pre-training a semantic representation model, comprising:
obtaining an out-of-order fragment sequence of a sample text and an original sorting order of N fragments in the out-of-order fragment sequence in the sample text, wherein N is a positive integer;
for the ith fragment in the out-of-order fragment sequence, inputting the semantic fusion vector of the (i-1)th fragment in the out-of-order fragment sequence and the ith fragment into a semantic representation model to obtain the semantic fusion vector of the ith fragment, wherein i is a positive integer less than or equal to N, and repeating this step until the semantic fusion vector of the Nth fragment is obtained;
inputting the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted sorting order of the N fragments in the sample text; and
pre-training the semantic representation model and the prediction model according to the original sorting order and the predicted sorting order.
2. The pre-training method of the semantic representation model according to claim 1, wherein the obtaining of the out-of-order fragment sequence of the sample text and the original sorting order of the N fragments in the out-of-order fragment sequence in the sample text comprises:
acquiring the sample text;
carrying out fragment division on the sample text to obtain N fragments;
shuffling the N fragments to generate the out-of-order fragment sequence of the sample text; and
acquiring the original sorting order of the N fragments in the out-of-order fragment sequence in the sample text.
3. The method of pre-training of a semantic representation model according to claim 2, wherein the segments comprise: at least one sentence;
the fragmenting the sample text to obtain N fragments includes:
and carrying out fragment division on the sample text according to the sentence end characters in the sample text and the number of the limit characters of the semantic representation model to obtain N fragments.
4. The pre-training method of semantic representation model according to claim 1, wherein before inputting the semantic fusion vector of the i-1 th segment in the out-of-order segment sequence and the ith segment into the semantic representation model to obtain the semantic fusion vector of the ith segment, the method further comprises:
judging whether the number of characters of the sample text exceeds the limit number of the semantic representation model;
and when the number of the characters does not exceed the limit number of the semantic representation model, inputting the disordered fragment sequence into the semantic representation model to obtain a semantic fusion vector of the Nth fragment in the disordered fragment sequence.
5. The pre-training method of semantic representation model according to claim 1, wherein before inputting the semantic fusion vector of the i-1 th segment in the out-of-order segment sequence and the ith segment into the semantic representation model to obtain the semantic fusion vector of the ith segment, the method further comprises:
judging whether the ith fragment is the first fragment in the disordered fragment sequence;
when the ith fragment is the first fragment in the disordered fragment sequence, inputting the ith fragment into a semantic representation model to obtain a semantic vector of the ith fragment;
and determining the semantic vector of the ith segment as a semantic fusion vector of the ith segment.
6. The pre-training method of the semantic representation model according to claim 1, wherein the semantic representation model obtains the semantic fusion vector of the ith segment in a manner,
obtaining a semantic vector of the ith fragment;
and performing fusion processing on the semantic vector of the ith fragment and the semantic fusion vector of the (i-1) th fragment to obtain the semantic fusion vector of the ith fragment.
7. An apparatus for pre-training a semantic representation model, comprising:
an obtaining module, configured to obtain an out-of-order fragment sequence of a sample text and an original sorting order of N fragments in the out-of-order fragment sequence in the sample text, wherein N is a positive integer;
a first input module, configured to input, for the ith fragment in the out-of-order fragment sequence, the semantic fusion vector of the (i-1)th fragment in the out-of-order fragment sequence and the ith fragment into a semantic representation model to obtain the semantic fusion vector of the ith fragment, wherein i is a positive integer less than or equal to N, and to repeat this step until the semantic fusion vector of the Nth fragment is obtained;
a second input module, configured to input the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted sorting order of the N fragments in the sample text; and
a pre-training module, configured to pre-train the semantic representation model and the prediction model according to the original sorting order and the predicted sorting order.
8. The pre-training apparatus for a semantic representation model according to claim 7, wherein the obtaining module is specifically configured to,
acquiring the sample text;
carrying out fragment division on the sample text to obtain N fragments;
carrying out disorder processing on the N fragments to generate a disorder fragment sequence of the sample text; and
and acquiring the original sequencing sequence of the N fragments in the disordered fragment sequence in the sample text.
9. The pre-training apparatus of the semantic representation model according to claim 8, wherein the segments comprise: at least one sentence;
the obtaining module is specifically configured to obtain,
and carrying out fragment division on the sample text according to the sentence end characters in the sample text and the number of the limit characters of the semantic representation model to obtain N fragments.
10. The apparatus for pre-training of a semantic representation model according to claim 7, wherein the apparatus further comprises: the first judgment module and the third input module;
the first judging module is used for judging whether the number of characters of the sample text exceeds the limited number of the semantic representation model;
and the third input module is used for inputting the disordered fragment sequence into the semantic representation model when the number of the characters does not exceed the limit number of the semantic representation model, and acquiring the semantic fusion vector of the Nth fragment in the disordered fragment sequence.
11. The apparatus for pre-training of a semantic representation model according to claim 7, wherein the apparatus further comprises: a second judgment module;
the second judging module is configured to judge whether the ith fragment is a first fragment in the out-of-order fragment sequence;
the first input module is further configured to input the ith segment into a semantic representation model when the ith segment is a first segment in the disordered segment sequence, and obtain a semantic vector of the ith segment; and determining the semantic vector of the ith segment as a semantic fusion vector of the ith segment.
12. The pre-training apparatus of semantic representation model according to claim 7, wherein the semantic representation model obtains the semantic fusion vector of the ith segment in a manner,
obtaining a semantic vector of the ith fragment;
and performing fusion processing on the semantic vector of the ith fragment and the semantic fusion vector of the (i-1) th fragment to obtain the semantic fusion vector of the ith fragment.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
15. A computer program product comprising instructions which, when executed by a processor, implement the method of any one of claims 1-6.
CN202011463938.9A 2020-12-11 2020-12-11 Pre-training method and device for semantic representation model, electronic equipment and storage medium Active CN112560499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011463938.9A CN112560499B (en) 2020-12-11 2020-12-11 Pre-training method and device for semantic representation model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011463938.9A CN112560499B (en) 2020-12-11 2020-12-11 Pre-training method and device for semantic representation model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112560499A true CN112560499A (en) 2021-03-26
CN112560499B CN112560499B (en) 2024-01-09

Family

ID=75063132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011463938.9A Active CN112560499B (en) 2020-12-11 2020-12-11 Pre-training method and device for semantic representation model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112560499B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361712A (en) * 2021-06-30 2021-09-07 北京百度网讯科技有限公司 Training method of feature determination model, semantic analysis method and device and electronic equipment
CN113807102A (en) * 2021-08-20 2021-12-17 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for establishing semantic representation model
CN113903329A (en) * 2021-09-08 2022-01-07 北京百度网讯科技有限公司 Voice processing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035917A (en) * 2014-06-10 2014-09-10 复旦大学 Knowledge graph management method and system based on semantic space mapping
CN109299272A (en) * 2018-10-31 2019-02-01 北京国信云服科技有限公司 A kind of large information capacity document representation method for neural network input
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
US20200356875A1 (en) * 2017-12-11 2020-11-12 Beijing Sankuai Online Technology Co., Ltd Model training
CN111950291A (en) * 2020-06-22 2020-11-17 北京百度网讯科技有限公司 Semantic representation model generation method and device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035917A (en) * 2014-06-10 2014-09-10 复旦大学 Knowledge graph management method and system based on semantic space mapping
US20200356875A1 (en) * 2017-12-11 2020-11-12 Beijing Sankuai Online Technology Co., Ltd Model training
CN109299272A (en) * 2018-10-31 2019-02-01 北京国信云服科技有限公司 A kind of large information capacity document representation method for neural network input
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN111950291A (en) * 2020-06-22 2020-11-17 北京百度网讯科技有限公司 Semantic representation model generation method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李霞 et al.: "Cross-lingual sentence semantic similarity computation model based on local and global semantic fusion" (基于局部和全局语义融合的跨语言句子语义相似度计算模型), 中文信息学报 (Journal of Chinese Information Processing), vol. 33, no. 6, pages 18-26 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361712A (en) * 2021-06-30 2021-09-07 北京百度网讯科技有限公司 Training method of feature determination model, semantic analysis method and device and electronic equipment
CN113361712B (en) * 2021-06-30 2023-07-21 北京百度网讯科技有限公司 Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment
CN113807102A (en) * 2021-08-20 2021-12-17 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for establishing semantic representation model
CN113903329A (en) * 2021-09-08 2022-01-07 北京百度网讯科技有限公司 Voice processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112560499B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN111079442B (en) Vectorization representation method and device of document and computer equipment
KR102645185B1 (en) Method, apparatus, electronic device, program and readable storage medium for creating a label marking model
CN111078865B (en) Text title generation method and device
CN112560499B (en) Pre-training method and device for semantic representation model, electronic equipment and storage medium
CN112036509A (en) Method and apparatus for training image recognition models
CN111079945B (en) End-to-end model training method and device
CN111753914A (en) Model optimization method and device, electronic equipment and storage medium
CN112507735B (en) Training method and device of machine translation model and electronic equipment
CN111144115A (en) Pre-training language model obtaining method and device, electronic equipment and storage medium
CN110797005B (en) Prosody prediction method, apparatus, device, and medium
CN111950291A (en) Semantic representation model generation method and device, electronic equipment and storage medium
CN111709252B (en) Model improvement method and device based on pre-trained semantic model
EP3896595A1 (en) Text key information extracting method, apparatus, electronic device, storage medium, and computer program product
CN111127191B (en) Risk assessment method and risk assessment device
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN111241810A (en) Punctuation prediction method and device
CN111522944A (en) Method, apparatus, device and storage medium for outputting information
CN113723278A (en) Training method and device of form information extraction model
CN111539224A (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111241234A (en) Text classification method and device
CN111967591B (en) Automatic pruning method and device for neural network and electronic equipment
CN112232089B (en) Pre-training method, device and storage medium of semantic representation model
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111325000B (en) Language generation method and device and electronic equipment
CN112397050A (en) Rhythm prediction method, training device, electronic device, and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant