CN115309879A - Multi-task semantic parsing model based on BART - Google Patents

Multi-task semantic parsing model based on BART

Info

Publication number
CN115309879A
CN115309879A
Authority
CN
China
Prior art keywords
layer
bart
attention
information
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210936486.4A
Other languages
Chinese (zh)
Inventor
张卫山
王振琦
侯召祥
孙晨瑜
陈涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN202210936486.4A priority Critical patent/CN115309879A/en
Publication of CN115309879A publication Critical patent/CN115309879A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a multi-task semantic parsing model based on BART, which belongs to the technical field of natural language processing and comprises a word embedding layer, a BART coding layer, a domain classifier, a BART decoding layer, a probability decoder, a SPARQL decoder and a grammar checker. The invention converts natural language directly into the knowledge graph query language SPARQL, simplifying the question-answering steps so as to reduce error accumulation, and performs domain identification on the question so that the corresponding professional-domain knowledge base can be queried, thereby improving question-answering accuracy.

Description

Multi-task semantic parsing model based on BART
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a multi-task semantic parsing model based on BART.
Background
Semantic parsing algorithms in traditional knowledge graph question answering adopt a multi-step pipeline: first the user's intent and the question type are identified, then entity and relation extraction is performed, and finally slot filling is carried out according to a predefined query template to form a complete SPARQL query statement, which is used to retrieve the database and return the answer to the user. This multi-step pipeline decomposes the task into different stages, which makes it flexible to operate and easy to interpret, but the cascaded decomposition causes errors to accumulate and be amplified; when questions involve many kinds of entities and relations or require reasoning, the model performs poorly, which degrades the question-answering task.
Disclosure of Invention
In view of the above problems, the present invention provides a BART-based multi-task semantic parsing model, which includes a word embedding layer, a BART coding layer, a domain classifier, a BART decoding layer, a probability decoder, a SPARQL decoder, and a grammar checker; the word embedding layer converts each character in the input question into a vector representation and inputs it into the BART coding layer; the BART coding layer learns and encodes the deep semantic information of the character vectors, and the hidden-layer vector representation of the last coding layer is input into the domain classifier and the BART decoding layer respectively; the domain classifier performs text classification on the last hidden-layer vectors to obtain the domain of the question; the BART decoding layer generates complete decoding information through a language model according to the question encoding information and the previously decoded information; the probability decoder module performs semantic-information-enhanced decoding on the vector output by the last layer of the BART decoder and inputs the decoding information into the SPARQL decoder module; the SPARQL decoder module cyclically generates a SPARQL query statement according to the decoding strategy and the domain information and inputs the statement into the grammar checker module; the grammar checker is used for checking the SPARQL query statement for syntax errors.
In one possible design, the word embedding layer converts each character in the input question into a vector representation, specifically:
the input data is formed by splicing the [CLS] character, the question and the [SEP] character; the maximum data length is 512 characters, longer data is truncated, and shorter data is padded with [PAD] characters;
special characters such as the SPARQL variables "?x" and "?y" and other special characters of the SPARQL syntax are added to the dictionary;
the vector representation of each input character is obtained by adding the word embedding and the position embedding, and the calculation formula is as follows:
E_embedding = E_word + E_position   (1)
where E_word is the word embedding vector of the character, E_position is the position embedding vector, and E_embedding is the vector representation of the input character.
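As an illustration of formula (1), the addition of word embedding and position embedding can be sketched as follows in PyTorch; the module name WordEmbeddingLayer and the vocabulary size, maximum length and hidden dimension used here are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class WordEmbeddingLayer(nn.Module):
    """Formula (1): E_embedding = E_word + E_position (sketch, assumed sizes)."""
    def __init__(self, vocab_size=50265, max_len=512, d_model=768, pad_id=1):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model, padding_idx=pad_id)  # E_word
        self.pos_emb = nn.Embedding(max_len, d_model)                           # E_position

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) ids of [CLS] + question + [SEP] (+ [PAD] padding)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)
        return self.word_emb(token_ids) + self.pos_emb(positions)               # E_embedding
```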
In one possible design, the BART uses a standard Transformer model, including a 6-layer Transformer encoder and a 6-layer Transformer decoder, and the BART coding layer comprehensively learns features in different subspaces through a bidirectional multi-head attention mechanism to capture deeper semantic information, specifically:
the multi-head attention sublayer of the Transformer encoder adopts a self-attention mechanism, and creates three vectors, namely Query, Key and Value, for each word to calculate a self-attention score; the calculation formulas are as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k) V   (2)
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)   (3)
Multihead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O   (4)
where Softmax is the logistic regression function, d_k is the dimension of the Key vectors, and the Attention function uses the Softmax function to compute self-attention; multiple groups of self-attention head_i are computed from different combinations of QW_i^Q, KW_i^K and VW_i^V, and Multihead(Q, K, V) concatenates the groups of self-attention through the Concat function to combine them into multi-head attention;
layer normalization of the Transformer encoder is performed to prevent covariate shift, and residual connections are used to prevent vanishing gradients; the calculation formula is as follows:
SubLayer = Layer_Normalization(x + sublayer(x))   (5)
where x is the multi-head attention output of the current layer and sublayer(x) is that of the next sub-layer; adding the two results directly constitutes the residual connection, and layer normalization is then performed through Layer_Normalization;
the nonlinear fitting effect of the network is improved through a feedforward network layer and a nonlinear activation function ReLU;
and the calculation result is input into the next Transformer encoder layer; 6 Transformer encoder layers are executed in total, and weights are not shared between the layers.
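The encoder layer just described (self-attention per formulas (2)-(4), residual connection and layer normalization per formula (5), and a feed-forward network with ReLU) could be sketched roughly as below; the hidden size, number of heads and feed-forward width are illustrative assumptions, and the code is a sketch of the standard Transformer encoder rather than the patented implementation itself.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model=768, h=12):
        super().__init__()
        self.h, self.d_k = h, d_model // h
        self.W_q = nn.Linear(d_model, d_model)   # per-head W_i^Q, stacked
        self.W_k = nn.Linear(d_model, d_model)   # per-head W_i^K, stacked
        self.W_v = nn.Linear(d_model, d_model)   # per-head W_i^V, stacked
        self.W_o = nn.Linear(d_model, d_model)   # W^O

    def forward(self, x, mask=None):             # x: (batch, seq, d_model)
        b, n, _ = x.shape
        def split(t):                             # -> (batch, h, seq, d_k)
            return t.view(b, n, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(self.W_q(x)), split(self.W_k(x)), split(self.W_v(x))
        # Formula (2): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        heads = F.softmax(scores, dim=-1) @ v     # formula (3), all heads at once
        # Formula (4): concatenate the heads and project with W^O
        return self.W_o(heads.transpose(1, 2).reshape(b, n, -1))

class EncoderLayer(nn.Module):
    def __init__(self, d_model=768, d_ff=3072, h=12):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, h)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, mask=None):
        # Formula (5): residual connection followed by layer normalization
        x = self.norm1(x + self.attn(x, mask))
        return self.norm2(x + self.ffn(x))
```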
In one possible design, the domain classifier takes the hidden-layer vector representation of the last layer of the BART coding layer as its input, and outputs the domain to which the question belongs through text classification.
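A minimal sketch of such a domain classifier is given below; pooling on the [CLS] position and the number of domains are assumptions made only for illustration.

```python
import torch.nn as nn

class DomainClassifier(nn.Module):
    def __init__(self, d_model=768, num_domains=10):
        super().__init__()
        self.classifier = nn.Linear(d_model, num_domains)

    def forward(self, encoder_hidden):            # (batch, seq, d_model)
        cls_vec = encoder_hidden[:, 0, :]         # hidden state at the [CLS] position
        return self.classifier(cls_vec).softmax(dim=-1)   # probability of each domain
```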
In one possible design, the BART decoding layer implements left-to-right autoregressive SPARQL text generation through a language model according to the question encoding information and the previously decoded information, specifically:
the multi-head attention sublayer of the Transformer decoder adopts a self-attention mechanism; three vectors, namely Query, Key and Value, are created for the vector of each character to calculate a self-attention score, and the calculation formulas are shown in the above formula (2), formula (3) and formula (4);
an upper-triangular MASK matrix is used to mask the following (future) tokens, so that each word can only attend to the preceding context, preventing the model from using future input tokens during training;
layer normalization of the Transformer decoder is performed to prevent covariate shift, and residual connections are used to prevent vanishing gradients; the calculation formula is shown in formula (5);
according to the question coding information and the decoding information, the SPARQL autoregressive text generation from left to right is realized through a language model, and the language model is calculated as follows:
p(y_1, y_2, y_3, ..., y_n) = p(y_1 | E_o) p(y_2 | E_o, y_1) p(y_3 | E_o, y_1, y_2) ... p(y_n | E_o, y_1, ..., y_{n-1})   (6)
The above formula is a Markov model calculation formula, where E_o represents the start character; the formula computes the probability of each following character starting from the 1st character, and p(y_i | E_o, y_1, ..., y_{i-1}) is the probability that the next character is y_i given E_o and y_1 to y_{i-1}.
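The upper-triangular MASK matrix and the autoregressive factorization of formula (6) can be illustrated with the following sketch; the helper names and the toy scoring function are assumptions, not part of the patent.

```python
import torch

def causal_mask(seq_len):
    # 1 on and below the diagonal: each position may attend only to itself and
    # earlier positions, so future SPARQL tokens stay hidden during training.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def sequence_log_prob(next_token_logits, target_ids):
    # next_token_logits: (seq_len, vocab); target_ids: (seq_len,)
    # Formula (6) in log form:
    # log p(y_1..y_n | E_o) = sum_i log p(y_i | E_o, y_1, ..., y_{i-1})
    log_probs = next_token_logits.log_softmax(dim=-1)
    return log_probs.gather(1, target_ids.unsqueeze(1)).sum()
```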
In one possible design, the probability decoder fuses the vector output by the last layer of the BART decoding layer with the classification-label [CLS] vector of the last layer of the BART coding layer, so as to realize semantic-information-enhanced decoding, and computes the probability of each word in the vocabulary.
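One possible form of this fusion is sketched below; additive fusion of the decoder states with the encoder [CLS] vector is an assumption, since the patent only states that the two vectors are fused before word probabilities are computed.

```python
import torch.nn as nn

class ProbabilityDecoder(nn.Module):
    def __init__(self, d_model=768, vocab_size=50265):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, decoder_hidden, encoder_cls):      # (b, seq, d), (b, d)
        # Assumed fusion: add the encoder [CLS] vector to every decoder state.
        fused = decoder_hidden + encoder_cls.unsqueeze(1)
        return self.proj(fused).softmax(dim=-1)           # per-step vocabulary probabilities
```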
In one possible design, the SPARQL decoder cyclically selects a word from the vocabulary distribution as the output at each time step according to the decoding strategy and the domain information, completing the generation of the SPARQL query statement.
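The cyclic generation loop could look roughly like the following sketch; greedy selection and the step_probabilities callable are placeholders for the model's actual decoding strategy, which the patent does not pin down.

```python
def generate_sparql(step_probabilities, bos_id, eos_id, max_len=128):
    # step_probabilities(tokens) is assumed to return a vocabulary distribution
    # for the next token given the tokens decoded so far.
    tokens = [bos_id]
    for _ in range(max_len):
        probs = step_probabilities(tokens)       # distribution over the vocabulary
        next_id = int(probs.argmax())            # decoding strategy: greedy in this sketch
        tokens.append(next_id)
        if next_id == eos_id:                    # stop once the end marker is produced
            break
    return tokens
```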
In one possible design, the grammar checker corrects simple grammar errors in the model output to improve the accuracy of question answering.
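As an illustration of the kind of simple grammar error such a checker might repair, the sketch below fixes an unbalanced brace or a missing trailing dot in a generated SPARQL statement; the specific repairs are assumptions, not rules stated in the patent.

```python
def check_and_fix_sparql(query: str) -> str:
    """Illustrative post-processing of a generated SPARQL statement."""
    query = query.strip()
    if query.count("{") > query.count("}"):
        query += " }"                             # close an unbalanced graph pattern
    if "{" in query and not query.rstrip("} ").endswith("."):
        head, _, tail = query.rpartition("}")
        query = head.rstrip() + " . }" + tail     # ensure the triple pattern ends with '.'
    return query
```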
The second aspect of the present invention also provides a multi-task semantic parsing device applied in a knowledge-graph question-answering system, the device comprising at least one processor and at least one memory, the processor and the memory being coupled; the memory stores a computer program implementing the parsing model according to the first aspect; when the processor executes the computer program stored in the memory, the device implements the multi-task semantic parsing function.
A third aspect of the present invention also provides a computer-readable storage medium having stored therein a program or instructions of the parsing model according to the first aspect, which when executed by a processor, causes a computer to implement a multitask semantic parsing function.
Beneficial effects: the invention provides a BART-based multi-task semantic parsing model, which can directly convert natural language into the knowledge graph query language SPARQL, simplifying the question-answering steps so as to reduce error accumulation, and which performs domain identification on the question and queries the corresponding professional-domain knowledge base according to the domain to which the question belongs, thereby improving question-answering accuracy. The BART model is based on the Encoder-Decoder architecture of the Transformer; text noise is added through word deletion, sentence permutation, document rotation, text infilling and other means, the noisy input is decoded and mapped back to the original text, a sequence-to-sequence model is obtained through training, and good results are achieved on generation tasks such as question answering, translation and summarization.
Drawings
FIG. 1 is a diagram of a multi-task semantic parsing model architecture based on BART of the present invention.
FIG. 2 is a simplified structural diagram of a multitask semantic parsing device applied in a knowledge-graph question-answering system according to the present invention.
Detailed Description
The knowledge-graph question-answering system can generate a corresponding SPARQL (a database query language) query statement according to a natural-language question input by a user and run it against a knowledge base to obtain the answer. The semantic parsing algorithm of the question-answering system ultimately converts the user question into a knowledge graph query statement through different subtasks; it can be treated as a text translation task in the NLP field, automatically inferring the relations among multiple entities from the user question and directly outputting the SPARQL statement corresponding to the question through text generation, thereby reducing the number of question-answering steps. The BART model (short for Bidirectional and Auto-Regressive Transformers, a sequence-to-sequence model) is based on the Encoder-Decoder architecture of the Transformer; text noise is added through word deletion, sentence permutation, document rotation, text infilling and other means, the noisy input is decoded and mapped back to the original text, a sequence-to-sequence model is obtained through training, and good results are achieved on generation tasks such as question answering, translation and summarization.
The invention can directly convert natural language into the knowledge graph query language SPARQL, simplifying the question-answering steps to reduce error accumulation, and can perform domain recognition on the question and query the corresponding professional-domain knowledge base according to that domain, thereby improving question-answering accuracy.
The invention is further illustrated by the following specific examples.
Example 1:
As shown in FIG. 1, the BART-based Multi-task Semantic Parsing model (MSP-BART) directly converts natural language into the knowledge graph query language SPARQL. The model includes seven parts, namely a word embedding layer, a BART coding layer, a domain classifier, a BART decoding layer, a probability decoder, a SPARQL decoder and a grammar checker; the word embedding layer obtains the vector representation of each input character by adding word embedding and position embedding; the BART coding layer comprehensively learns the features in different subspaces through a bidirectional multi-head attention mechanism and captures deeper semantic information; the domain classifier takes the hidden-layer vector representation of the last layer of the BART coding layer as its input and outputs the domain to which the question belongs through text classification; the BART decoding layer realizes left-to-right autoregressive SPARQL text generation through a language model according to the question encoding information and the previously decoded information; the probability decoder fuses the vector output by the last layer of the BART decoder with the classification-label [CLS] vector of the last layer of the BART coding layer to realize semantic-information-enhanced decoding; the SPARQL decoder cyclically selects a word from the vocabulary distribution as the output at each time step according to the decoding strategy and the domain information to complete the generation of the SPARQL query statement; the grammar checker corrects simple grammar errors in the model output, improving question-answering accuracy.
Specifically, the word embedding layer converts each character in the input question into a vector representation and inputs it into the BART coding layer; the BART coding layer learns and encodes the deep semantic information of the character vectors, and the hidden-layer vectors of the last coding layer are input into the domain classifier and the BART decoding layer respectively; the domain classifier performs text classification on the last hidden-layer vectors to obtain the domain of the question; the BART decoding layer generates complete decoding information through a language model according to the question encoding information and the previously decoded information; the probability decoder module performs semantic-information-enhanced decoding on the vector output by the last layer of the BART decoder and inputs the decoding information into the SPARQL decoder module; the SPARQL decoder module cyclically generates a SPARQL query statement according to the decoding strategy and the domain information, and inputs the statement into the grammar checker module; the grammar checker checks the SPARQL query statement for syntax errors and corrects simple grammar errors in the model output to improve question-answering accuracy.
This embodiment takes the question "Who is the wife of Yao Ming?" as an example and, with reference to FIG. 1, describes the specific workflow of the BART-based multi-task semantic parsing model of the present invention:
S1, the word embedding layer adds word embedding and position embedding to obtain the vector representation of each character of the question "Who is the wife of Yao Ming?", and inputs these vector representations into the BART coding layer;
S2, the BART coding layer comprehensively learns the features in different subspaces of the vector representation of each character of the question "Who is the wife of Yao Ming?", captures deeper semantic information, and inputs the hidden-layer vector representation of the last coding layer into the domain classifier and the BART decoding layer respectively;
S3, the domain classifier takes the hidden-layer vector representation of the last layer of the BART coding layer as its input and outputs, through text classification, the domain to which the question belongs, which here is "general life knowledge";
S4, the BART decoding layer realizes left-to-right autoregressive SPARQL text generation through the attention mechanism and the Markov language model according to the question encoding information and the previously decoded information; for example, given "Select ?x where { <Yao Ming>", the next generated character is "<";
S5, the probability decoder fuses the vector output by the last layer of the BART decoding layer with the classification-label [CLS] vector of the last layer of the BART coding layer to realize semantic-information-enhanced decoding, computes the probability of each word in the vocabulary, and inputs the decoding information into the SPARQL decoder module;
S6, the SPARQL decoder cyclically selects a word from the vocabulary distribution as the output at each time step according to the decoding strategy and the domain information, completes the generation of the SPARQL query statement, and inputs the statement into the grammar checker module; the generated statement is "Select ?x where { <Yao Ming> <wife> ?x . }";
S7, the grammar checker corrects simple grammar errors in the model output, improving question-answering accuracy. Read end to end, steps S1-S7 amount to the control flow sketched below.
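The following is a highly simplified control-flow sketch of steps S1-S7; every function name is a placeholder for the hypothetical modules sketched earlier in this description, and in practice steps S4-S6 repeat once per generated token rather than running a single time.

```python
def answer_question(question, m):
    """m bundles the seven (assumed) modules: tokenize, embedding, encoder,
    domain_classifier, decoder, probability_decoder, sparql_decoder, grammar_checker."""
    ids = m.tokenize("[CLS]" + question + "[SEP]")                      # S1: build the input sequence
    enc_hidden = m.encoder(m.embedding(ids))                            # S1-S2: embed and encode
    domain = m.domain_classifier(enc_hidden)                            # S3: domain of the question
    dec_hidden = m.decoder(enc_hidden)                                  # S4: autoregressive decoding layer
    vocab_probs = m.probability_decoder(dec_hidden, enc_hidden[:, 0])   # S5: fusion with the [CLS] vector
    sparql = m.sparql_decoder(vocab_probs, domain)                      # S6: cyclic SPARQL generation
    return m.grammar_checker(sparql)                                    # S7: fix simple syntax errors
```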
In the step S1, the word embedding layer adds word embedding and position embedding to obtain a vector representation of each input character, and mainly includes:
S11, the [CLS] character, the question "Who is the wife of Yao Ming?" and the [SEP] character are spliced to form the input data; the maximum data length is 512 characters, longer data is truncated, and shorter data is padded with [PAD] characters;
and S12, adding the word embedding and the position embedding to obtain the vector representation of each input character.
In step S2, the BART coding layer comprehensively learns the features in different subspaces in the vector representation of the character through a bidirectional multi-head attention mechanism, captures deeper semantic information, and mainly includes:
S21, the multi-head attention sublayer of the Transformer encoder adopts a self-attention mechanism; for each character of "[CLS] Who is the wife of Yao Ming? [SEP]" it creates three vectors, namely Query, Key and Value, calculates the self-attention score through the Attention(Q, K, V) formula, and calculates multi-head self-attention through the Multihead(Q, K, V) formula; the calculation formulas are as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
Multihead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O
where Softmax is the logistic regression function and the Attention function uses it to compute self-attention; multiple groups of self-attention head_i are computed from different combinations of QW_i^Q, KW_i^K and VW_i^V, and Multihead(Q, K, V) concatenates the groups of self-attention through the Concat function to combine them into multi-head attention;
S22, layer normalization of the Transformer encoder is performed through the Layer_Normalization formula, calculated as follows:
SubLayer = Layer_Normalization(x + sublayer(x))
where x is the multi-head attention output of the current layer and sublayer(x) is that of the next sub-layer; adding the two results directly constitutes the residual connection, and layer normalization is then performed through Layer_Normalization;
s23, a feedforward network layer and a nonlinear activation function ReLU are used;
and S24, inputting the calculation result of the S23 into a next-layer Transformer encoder, and executing 6 layers of Transformer encoders in total.
In step S3, the domain classifier takes the hidden-layer vector representation of the last layer of the BART coding layer as its input and outputs, through text classification, the domain to which the question belongs; here the question falls into the "general life knowledge" category.
In step S4, the BART decoding layer generates left-to-right autoregressive SPARQL text through a language model according to the question encoding information and the previously decoded information, which mainly includes:
s41, calculating a self-Attention score through an Attention (Q, K, V) formula, and calculating multi-head self-Attention through a Multihead (Q, K, V) formula;
s42, shielding the following information by using an upper triangular MASK matrix;
S43, performing layer normalization of the Transformer decoder through the Layer_Normalization formula;
S44, according to the question encoding information and the previously decoded information, left-to-right autoregressive SPARQL text generation is realized through the language model; for example, given "Select ?x where { <Yao Ming>", the next generated character is known to be "<". The language model is calculated as follows:
p(y_1, y_2, y_3, ..., y_n) = p(y_1 | E_o) p(y_2 | E_o, y_1) p(y_3 | E_o, y_1, y_2) ... p(y_n | E_o, y_1, ..., y_{n-1})
The above formula is a Markov model calculation formula, where E_o represents the start character; the formula computes the probability of each following character starting from the 1st character, and p(y_i | E_o, y_1, ..., y_{i-1}) is the probability that the next character is y_i given E_o and y_1 to y_{i-1}.
In step S5, the probability decoder fuses the vector output from the last layer of the BART decoding layer with the classification-label [CLS] vector of the last layer of the BART coding layer to realize semantic-information-enhanced decoding, and calculates the probability of each character in the vocabulary; for example, when the classification-label [CLS] vector represents a symbol, the probability that the next character is a symbol character such as "{" or "<" is calculated at this time.
In step S6, the SPARQL decoder module cyclically selects a word from the vocabulary distribution as the output at each time step according to the decoding strategy, completing the generation of the SPARQL query statement; the generated statement is "Select ?x where { <Yao Ming> <wife> ?x . }". When calculating the formula p(y_1, y_2, y_3, ..., y_n), if the characters decoded before y_i are "Select ?x", the probability that y_i is "where" is calculated at that point.
In step S7, the syntax checker corrects a simple syntax error output by the model.
The invention can directly convert natural language into the knowledge graph query language SPARQL, simplifying the question-answering steps to reduce error accumulation, and can perform domain recognition on the question and query the corresponding professional-domain knowledge base according to that domain, thereby improving question-answering accuracy.
Example 2:
As shown in FIG. 2, the present invention also provides a multi-task semantic parsing device applied in a knowledge-graph question-answering system, the device comprising at least one processor and at least one memory, the processor and the memory being coupled; the memory stores a computer program of the parsing model described in Example 1; when the processor executes the computer program stored in the memory, the device implements the multi-task semantic parsing function. The internal bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus. The memory may include high-speed RAM memory and may further include non-volatile memory (NVM), such as at least one magnetic disk memory, and may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk or an optical disk.
The device may be provided as a terminal, server, or other form of device.
Fig. 2 is a block diagram of an apparatus shown for illustration. The device may include one or more of the following components: processing components, memory, power components, multimedia components, audio components, interfaces for input/output (I/O), sensor components, and communication components. The processing components typically control overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components may include one or more processors to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component can include one or more modules that facilitate interaction between the processing component and other components. For example, the processing component may include a multimedia module to facilitate interaction between the multimedia component and the processing component.
The memory is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component provides power to various components of the electronic device. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for an electronic device. The multimedia component comprises a screen providing an output interface between said electronic device and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component is configured to output and/or input an audio signal. For example, the audio assembly includes a Microphone (MIC) configured to receive an external audio signal when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals. The I/O interface provides an interface between the processing component and a peripheral interface module, which may be a keyboard, click wheel, button, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly includes one or more sensors for providing various aspects of status assessment for the electronic device. For example, the sensor assembly may detect an open/closed state of the electronic device, the relative positioning of the components, such as a display and keypad of the electronic device, the sensor assembly may also detect a change in the position of the electronic device or a component of the electronic device, the presence or absence of user contact with the electronic device, orientation or acceleration/deceleration of the electronic device, and a change in the temperature of the electronic device. The sensor assembly may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi,2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further comprises a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
Example 3:
The present invention also provides a computer-readable storage medium having stored therein a program or instructions of the parsing model described in Example 1, which, when executed by a processor, cause the computer to implement the multi-task semantic parsing function.
In particular, a system, apparatus or device may be provided which is provided with a readable storage medium on which software program code implementing the functionality of any of the embodiments described above is stored and which causes a computer or processor of the system, apparatus or device to read out and execute instructions stored in the readable storage medium. In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), tape, and the like. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
It should be understood that a storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuits (ASIC). Of course, the processor and the storage medium may reside as discrete components in a terminal or server.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions and thereby implement aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the present invention has been described with reference to the specific embodiments, it should be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A multi-task semantic parsing model based on BART is characterized in that: the model comprises a word embedding layer, a BART coding layer, a domain classifier, a BART decoding layer, a probability decoder, a SPARQL decoder and a syntax checker; the word embedding layer converts each character in the input question into vector representation and inputs the vector representation into a BART coding layer; the BART coding layer learns the deep semantic information of the character vectors and codes the deep semantic information, and the hidden layer vector representation of the last layer of coding is respectively input into a domain classifier and a BART decoding layer; the domain classifier performs text classification on the last layer of hidden layer vectors to obtain the domain of the problem; the BART decoding layer generates complete decoding information through a language model according to the problem coding information and the above decoding information; the probability decoder performs semantic information enhancement decoding on the vector output by the last layer of the BART decoder and inputs decoding information into an SPARQL decoder module; the SPARQL decoder module circularly generates a SPARQL query statement according to the decoding strategy and the field information and inputs the statement into the syntax checker module; the syntax checker is used for checking syntax errors of the SPARQL query statement.
2. A BART-based multitask semantic parsing model according to claim 1 wherein: the word embedding layer converts each character in the input question into vector representation, specifically:
the input data is formed by splicing the [CLS] character, the question and the [SEP] character; the maximum data length is 512 characters, longer data is truncated, and shorter data is padded with [PAD] characters;
special characters such as the SPARQL variables "?x" and "?y" and other special characters of the SPARQL syntax are added to the dictionary;
the vector representation of each input character is obtained by adding the word embedding and the position embedding, and the calculation formula is as follows:
E_embedding = E_word + E_position   (1)
where E_word is the word embedding vector of the character, E_position is the position embedding vector, and E_embedding is the vector representation of the input character.
3. A BART-based multitask semantic parsing model according to claim 1 wherein: the BART uses a standard Transformer model and comprises a 6-layer Transformer encoder and a 6-layer Transformer decoder, the BART encoding layer comprehensively learns the characteristics in different subspaces through a bidirectional multi-head attention mechanism and captures deeper semantic information, and the method specifically comprises the following steps:
the multi-head attention sublayer of the Transformer encoder adopts a self-attention mechanism to create three vectors, namely Query, Key and Value, for each word to calculate a self-attention score; the calculation formulas are as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k) V   (2)
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)   (3)
Multihead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O   (4)
where Softmax is the logistic regression function, d_k is the dimension of the Key vectors, and the Attention function uses the Softmax function to compute self-attention; multiple groups of self-attention head_i are computed from different combinations of QW_i^Q, KW_i^K and VW_i^V, and Multihead(Q, K, V) concatenates the groups of self-attention through the Concat function to combine them into multi-head attention;
layer normalization of the Transformer encoder is performed to prevent covariate shift, and residual connections are used to prevent vanishing gradients; the calculation formula is as follows:
SubLayer = Layer_Normalization(x + sublayer(x))   (5)
where x is the multi-head attention output of the current layer and sublayer(x) is that of the next sub-layer; adding the two results directly constitutes the residual connection, and layer normalization is then performed through Layer_Normalization;
the nonlinear fitting effect of the network is improved through a feedforward network layer and a nonlinear activation function ReLU;
and the calculation result is input into the next Transformer encoder layer; 6 Transformer encoder layers are executed in total, and weights are not shared between the layers.
4. A BART-based multitask semantic parsing model according to claim 1 wherein: the domain classifier takes the hidden layer vector representation of the last layer of the BART coding layer as the input of the layer, and the domain to which the problem belongs is output through text classification.
5. A BART-based multitask semantic parsing model according to claim 1 wherein: the BART decoding layer realizes left-to-right autoregressive SPARQL text generation through a language model according to the question encoding information and the previously decoded information, specifically comprising the following steps:
the multi-head attention sublayer of the Transformer decoder adopts a self-attention mechanism; three vectors, namely Query, Key and Value, are created for the vector of each character to calculate a self-attention score, and the calculation formulas are shown in the above formula (2), formula (3) and formula (4);
an upper-triangular MASK matrix is used to mask the following (future) tokens, so that each word can only attend to the preceding context, preventing the model from using future input tokens during training;
layer normalization of the Transformer decoder is performed to prevent covariate shift, and residual connections are used to prevent vanishing gradients; the calculation formula is shown in formula (5);
according to the question coding information and the decoding information, the SPARQL autoregressive text generation from left to right is realized through a language model, and the language model is calculated as follows:
p(y_1, y_2, y_3, ..., y_n) = p(y_1 | E_o) p(y_2 | E_o, y_1) p(y_3 | E_o, y_1, y_2) ... p(y_n | E_o, y_1, ..., y_{n-1})   (6)
The above formula is a Markov model calculation formula, where E_o represents the start character; the formula computes the probability of each following character starting from the 1st character, and p(y_i | E_o, y_1, ..., y_{i-1}) is the probability that the next character is y_i given E_o and y_1 to y_{i-1}.
6. A BART-based multitask semantic parsing model according to claim 1 wherein: the probability decoder fuses the vector output by the last layer of the BART decoding layer with the classification-label [CLS] vector of the last layer of the BART coding layer to realize semantic-information-enhanced decoding, and computes the probability of each word in the vocabulary.
7. A BART-based multitask semantic parsing model according to claim 1 wherein: and the SPARQL decoder circularly selects a word from the dictionary distribution as a generated result at each moment according to the decoding strategy and the field information to finish the generation of the SPARQL query statement.
8. A BART-based multitask semantic parsing model according to claim 1 wherein: the grammar checker corrects simple grammar errors output by the model to improve the accuracy rate of question answering.
9. A multitask semantic parsing device applied in a knowledge-graph question-answering system, characterized in that: the device comprises at least one processor and at least one memory, the processor and memory coupled; the memory stores a computer program implementing the parsing model according to any one of claims 1 to 8; when the processor executes the computer program stored in the memory, the device implements the multi-task semantic parsing function.
10. A computer-readable storage medium, in which a program or instructions of the parsing model according to any of claims 1 to 8 are stored, which when executed by a processor, causes a computer to implement a multitask semantic parsing function.
CN202210936486.4A 2022-08-05 2022-08-05 Multi-task semantic parsing model based on BART Pending CN115309879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210936486.4A CN115309879A (en) 2022-08-05 2022-08-05 Multi-task semantic parsing model based on BART

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210936486.4A CN115309879A (en) 2022-08-05 2022-08-05 Multi-task semantic parsing model based on BART

Publications (1)

Publication Number Publication Date
CN115309879A true CN115309879A (en) 2022-11-08

Family

ID=83860479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210936486.4A Pending CN115309879A (en) 2022-08-05 2022-08-05 Multi-task semantic parsing model based on BART

Country Status (1)

Country Link
CN (1) CN115309879A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115617972A (en) * 2022-12-14 2023-01-17 成都明途科技有限公司 Robot dialogue method, device, electronic equipment and storage medium
CN115640402A (en) * 2022-12-22 2023-01-24 国网天津市电力公司营销服务中心 Multitask artificial intelligence audit opinion generation method, device and readable medium
CN115640402B (en) * 2022-12-22 2023-04-07 国网天津市电力公司营销服务中心 Multitask artificial intelligence audit opinion generation method, device and readable medium
CN116010583A (en) * 2023-03-17 2023-04-25 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Cascade coupling knowledge enhancement dialogue generation method
CN116010583B (en) * 2023-03-17 2023-07-18 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Cascade coupling knowledge enhancement dialogue generation method
CN116227466B (en) * 2023-05-06 2023-08-18 之江实验室 Sentence generation method, device and equipment with similar semantic different expressions
CN116227466A (en) * 2023-05-06 2023-06-06 之江实验室 Sentence generation method, device and equipment with similar semantic different expressions
CN116610791A (en) * 2023-07-20 2023-08-18 中国人民解放军国防科技大学 Semantic analysis-based question answering method, system and equipment for structured information
CN116610791B (en) * 2023-07-20 2023-09-29 中国人民解放军国防科技大学 Semantic analysis-based question answering method, system and equipment for structured information
CN116662582A (en) * 2023-08-01 2023-08-29 成都信通信息技术有限公司 Specific domain business knowledge retrieval method and retrieval device based on natural language
CN116662582B (en) * 2023-08-01 2023-10-10 成都信通信息技术有限公司 Specific domain business knowledge retrieval method and retrieval device based on natural language
CN117093729A (en) * 2023-10-17 2023-11-21 北方健康医疗大数据科技有限公司 Retrieval method, system and retrieval terminal based on medical scientific research information
CN117093729B (en) * 2023-10-17 2024-01-09 北方健康医疗大数据科技有限公司 Retrieval method, system and retrieval terminal based on medical scientific research information
CN117560225A (en) * 2024-01-09 2024-02-13 长沙市智为信息技术有限公司 Web attack detection method based on countermeasure generation network
CN117560225B (en) * 2024-01-09 2024-04-09 长沙市智为信息技术有限公司 Web attack detection method based on countermeasure generation network

Similar Documents

Publication Publication Date Title
CN115309879A (en) Multi-task semantic parsing model based on BART
US20220292265A1 (en) Method for determining text similarity, storage medium and electronic device
CN108399914B (en) Voice recognition method and device
WO2021208666A1 (en) Character recognition method and apparatus, electronic device, and storage medium
CN111612070A (en) Image description generation method and device based on scene graph
US11257484B2 (en) Data-driven and rule-based speech recognition output enhancement
KR20060048800A (en) Creating a speech recognition grammar for alphanumeric concepts
CN112528671A (en) Semantic analysis method, semantic analysis device and storage medium
US11676607B2 (en) Contextual denormalization for automatic speech recognition
CN111898388A (en) Video subtitle translation editing method and device, electronic equipment and storage medium
CN110633470A (en) Named entity recognition method, device and storage medium
CN113673261A (en) Data generation method and device and readable storage medium
CN114154459A (en) Speech recognition text processing method and device, electronic equipment and storage medium
CN112036195A (en) Machine translation method, device and storage medium
CN111369978A (en) Data processing method and device and data processing device
CN112183119A (en) Machine translation method, device and storage medium
CN111414772A (en) Machine translation method, device and medium
CN110232181B (en) Comment analysis method and device
CN112199963A (en) Text processing method and device and text processing device
US11907677B1 (en) Immutable universal language assistive translation and interpretation system that verifies and validates translations and interpretations by smart contract and blockchain technology
CN112002313B (en) Interaction method and device, sound box, electronic equipment and storage medium
CN111538998A (en) Text encryption method and device, electronic equipment and computer readable storage medium
CN113326706A (en) Cross-language retrieval method and device and electronic equipment
CN113971218A (en) Position coding method, position coding device and storage medium
CN113919372A (en) Machine translation quality evaluation method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination