WO2023108991A1 - Model training method and apparatus, knowledge classification method and apparatus, device, and medium - Google Patents

Model training method and apparatus, knowledge classification method and apparatus, device, and medium

Info

Publication number
WO2023108991A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
answer
option
knowledge
vector
Prior art date
Application number
PCT/CN2022/090718
Other languages
English (en)
Chinese (zh)
Inventor
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2023108991A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present application relates to the technical field of machine learning, and in particular to a model training method, a knowledge classification method, an apparatus, a device, and a medium.
  • machine reading comprehension technology can be used to give answers to questions.
  • Machine reading comprehension is a technology that enables machines to understand natural language text and produce answers when given questions and documents. This technology can be applied in many fields, such as text question answering, information extraction for knowledge graphs and event graphs, and dialogue systems.
  • the embodiment of the present application proposes a training method for a knowledge classification model, and the training method for the knowledge classification model includes:
  • the original annotation data includes question stem data, option data and answer data;
  • the preset pre-training model is trained according to the topic data to obtain a knowledge classification model; wherein, the knowledge classification model is used to perform knowledge classification processing on the target topic to obtain the type of knowledge points.
  • the embodiment of the present application proposes a knowledge classification method for multiple-choice questions, and the knowledge classification method for multiple-choice questions includes:
  • the multiple-choice data includes question stem data
  • inputting the question stem representation vector into a knowledge classification model; wherein the knowledge classification model is obtained by training according to the method described in the first aspect above;
  • Knowledge classification processing is performed according to the feature vector information to obtain knowledge point types.
  • the embodiment of the present application proposes a training device for a knowledge classification model, and the training device for the knowledge classification model includes:
  • the original data acquisition module is used to obtain the original annotation data;
  • the original annotation data includes question stem data, option data and answer data;
  • a question stem coding module configured to encode the question stem data to obtain a question stem representation vector
  • the option answer encoding module is used to encode the option data and answer data according to the preset knowledge graph to obtain option attribute values and answer attribute values;
  • a word segmentation and splicing module used to perform word segmentation and splicing processing on the option attribute value and the answer attribute value to obtain an option answer representation vector
  • a vector splicing module configured to splice the question stem representation vector and the option answer representation vector to obtain topic data
  • the classification model training module is used to train the preset pre-training model according to the topic data to obtain the knowledge classification model; wherein, the knowledge classification model is used to perform knowledge classification processing on the target topic to obtain the type of knowledge points.
  • the embodiment of the present application proposes a knowledge classification device for multiple-choice questions, and the knowledge classification device for multiple-choice questions includes:
  • the multiple-choice data acquisition module is used to obtain multiple-choice data to be classified; wherein, the multiple-choice data includes question stem data, option data and answer data;
  • a data input module configured to input the data of the multiple-choice questions into the knowledge classification model; wherein, the knowledge classification model is trained according to the method described in the first aspect above;
  • a feature extraction module configured to perform feature extraction on the multiple-choice question data through the knowledge classification model to obtain feature vector information
  • the knowledge classification module is configured to perform knowledge classification processing according to the feature vector information to obtain knowledge point types.
  • the embodiment of the present application proposes a computer device, including:
  • the program is stored in the memory, and the processor executes the at least one program to implement a knowledge classification model training method or a multiple-choice knowledge classification method
  • the knowledge classification model training method includes: obtaining original annotation data, wherein the original annotation data includes question stem data, option data and answer data; encoding the question stem data to obtain a question stem representation vector; encoding the option data and answer data according to a preset knowledge graph to obtain option attribute values and answer attribute values; performing word segmentation and splicing processing on the option attribute values and answer attribute values to obtain an option answer representation vector; splicing the question stem representation vector and the option answer representation vector to obtain topic data; and training a preset pre-training model according to the topic data to obtain a knowledge classification model, wherein the knowledge classification model is used to perform knowledge classification processing on a target topic to obtain a knowledge point type.
  • the knowledge classification method for multiple-choice questions includes: obtaining multiple-choice question data to be classified, wherein the multiple-choice question data includes question stem data; encoding the question stem data to obtain a question stem representation vector; inputting the question stem representation vector into the knowledge classification model, wherein the knowledge classification model is trained according to the above-mentioned training method; performing feature extraction on the question stem data through the knowledge classification model to obtain feature vector information; and performing knowledge classification processing according to the feature vector information to obtain a knowledge point type.
  • the embodiment of the present application provides a storage medium, the storage medium is a computer-readable storage medium, and the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are used to make a computer execute
  • a method for training a knowledge classification model or a method for classifying knowledge for multiple-choice questions, wherein the method for training the knowledge classification model includes: obtaining original annotation data, wherein the original annotation data includes question stem data, option data and answer data; encoding the question stem data to obtain a question stem representation vector; encoding the option data and answer data according to a preset knowledge graph to obtain option attribute values and answer attribute values; performing word segmentation and splicing processing on the option attribute values and answer attribute values to obtain an option answer representation vector; splicing the question stem representation vector and the option answer representation vector to obtain topic data; and training a preset pre-training model according to the topic data to obtain a knowledge classification model, wherein the knowledge classification model is used to perform knowledge classification processing on target topics to obtain knowledge point types.
  • the knowledge classification method for multiple-choice questions includes: obtaining multiple-choice question data to be classified, wherein the multiple-choice question data includes question stem data; encoding the question stem data to obtain a question stem representation vector; inputting the question stem representation vector into the knowledge classification model, wherein the knowledge classification model is trained according to the above-mentioned training method; performing feature extraction on the question stem data through the knowledge classification model to obtain feature vector information; and performing knowledge classification processing according to the feature vector information to obtain a knowledge point type.
  • according to the training method of the knowledge classification model proposed in the embodiments of the present application, knowledge classification processing can be performed on target topics to obtain knowledge point types that meet requirements, which can improve the accuracy and efficiency of knowledge classification.
  • FIG. 1 is a flowchart of a training method for a knowledge classification model provided by an embodiment of the present disclosure
  • Fig. 2 is the flowchart of step 102 in Fig. 1;
  • Fig. 3 is a partial flowchart of the training method of the knowledge classification model provided by another embodiment
  • Fig. 4 is the flowchart of step 103 in Fig. 1;
  • Fig. 5 is a flowchart of step 104 in Fig. 1;
  • FIG. 6 is a flow chart of a knowledge classification method for multiple-choice questions provided by an embodiment of the present disclosure
  • FIG. 7 is a functional block diagram of a training device for a knowledge classification model provided by an embodiment of the present disclosure.
  • Fig. 8 is a functional block diagram of a knowledge classification device for multiple-choice questions provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a hardware structure of a computer device provided by an embodiment of the present disclosure.
  • Artificial intelligence is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence; it is a branch of computer science. Artificial intelligence attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Natural language processing (NLP) uses computers to process, understand and use human languages (such as Chinese and English). NLP is a branch of artificial intelligence and an interdisciplinary subject between computer science and linguistics, also known as computational linguistics. Natural language processing includes syntactic analysis, semantic analysis, text understanding, etc. It is often used in technical fields such as machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, information retrieval, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining. It involves data mining, machine learning, knowledge acquisition, knowledge engineering and artificial intelligence research related to language processing, as well as linguistics research related to language computing.
  • A knowledge graph combines the theories and methods of disciplines such as applied mathematics, graphics, information visualization and information science with methods such as citation analysis and co-occurrence analysis, and uses visual graphs to display subject knowledge intuitively.
  • the main goal of a knowledge graph is to describe the various entities and concepts that exist in the real world, as well as the relationships between them; a relationship describes the association between two entities.
  • Entity: something that is distinguishable and exists independently, such as a particular person, city, plant or commodity. Everything in the world is made up of concrete things, which are referred to as entities. Entities are the most basic elements in a knowledge graph, and different entities have different relationships.
  • Semantic class: a collection of entities with the same characteristics, such as countries, nations, books or computers. Concepts mainly refer to collections, categories, object types and kinds of things, such as people or geography.
  • Relationship: a certain association exists between entities, between different concepts, and between concepts and entities.
  • formally, a relation is a function that maps k graph nodes (entities, semantic classes or attribute values) to a Boolean value.
  • Attribute: a property of an entity; the attribute value is what the entity points to through that attribute. Different attribute types correspond to edges with different types of attributes.
  • the attribute value mainly refers to the value of a specified attribute of an object. For example, "area", "population" and "capital" are several different attributes, and a value such as 9.6 million square kilometers is an attribute value.
  • a triple ({E, R}) is the general representation of a knowledge graph; the basic forms of triples mainly include (entity 1-relation-entity 2) and (entity-attribute-attribute value), etc.
  • Each entity (the extension of the concept) can be identified by a globally unique ID
  • each attribute-value pair (AVP) can be used to describe the intrinsic characteristics of an entity, and a relation can be used to connect two entities and describe the connection between them.
  • AVP attribute-value pair
  • Beijing is an entity
  • population is an attribute;
  • 20.693 million is an attribute value.
  • Beijing-population-20.693 million constitutes an example triple of the form (entity-attribute-attribute value).
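The two triple forms above can be illustrated with plain Python tuples. This is a minimal sketch; the `attribute_value` helper and the list layout are illustrative assumptions, not part of the patent's method.

```python
# Illustrative only: triples in the (entity-attribute-attribute value) form
# named above, stored as plain tuples.
triples = [
    ("Beijing", "population", "20.693 million"),
]

def attribute_value(entity, attribute, triples):
    """Return the attribute value for (entity, attribute), or None if absent."""
    for head, rel, tail in triples:
        if head == entity and rel == attribute:
            return tail
    return None

print(attribute_value("Beijing", "population", triples))  # 20.693 million
```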
  • a token is the basic unit of indexing, representing each indexed term. If a field is tokenized, the field has passed through an analyzer that converts its content into a token stream. During tokenization, the analyzer applies transformation logic (such as removing stop words like "a" or "the", performing stemming, or converting all text to lowercase regardless of case) to extract the text content to be indexed.
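The analyzer behaviour described above (lowercasing, splitting into tokens, dropping stop words) can be sketched as follows. The stop-word list and the `analyze` function are toy assumptions for illustration; real analyzers would also apply stemming, which is omitted here.

```python
import re

STOP_WORDS = {"a", "an", "the"}  # a tiny illustrative stop-word list

def analyze(text):
    """Toy analyzer in the spirit described above: lowercase the text,
    split it into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(analyze("The cat sat on a mat"))  # ['cat', 'sat', 'on', 'mat']
```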
  • BERT: Bidirectional Encoder Representations from Transformers
  • the BERT model further increases the generalization ability of word vector models, fully describing character-level, word-level, sentence-level and even inter-sentence relationship features; it is built on the Transformer architecture.
  • AI artificial intelligence
  • the embodiments of the present application may acquire and process relevant data based on artificial intelligence technology.
  • artificial intelligence is the theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • an embodiment of the present disclosure provides a training method for a knowledge classification model, a knowledge classification method for multiple-choice questions, a training device for a knowledge classification model, a knowledge classification device for multiple-choice questions, a computer device, and a storage medium, which can improve the accuracy and efficiency of the model's knowledge classification.
  • the knowledge classification model training method, knowledge classification method for multiple-choice questions, training device for knowledge classification model, knowledge classification device for multiple-choice questions, computer equipment, and storage media provided by the embodiments of the present disclosure will be specifically described through the following embodiments.
  • the training method of the knowledge classification model in the embodiment of the present disclosure will be specifically described through the following embodiments.
  • the training method of the knowledge classification model provided by the embodiment of the present disclosure relates to the technical field of machine learning.
  • the training method of the knowledge classification model provided by the embodiments of the present disclosure may be applied to a terminal, may also be applied to a server, and may also be software running on the terminal or the server.
  • the terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, or a smart watch;
  • the server can be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software can be an application implementing the training method for the knowledge classification model, etc., but is not limited to the above forms.
  • FIG. 1 is an optional flow chart of a method for training a knowledge classification model provided by an embodiment of the present disclosure.
  • the method in FIG. 1 may include, but is not limited to, steps 101 to 106.
  • Step 101: obtain original annotation data;
  • the original annotation data includes question stem data, option data and answer data;
  • Step 102: encode the question stem data to obtain a question stem representation vector;
  • Step 103: encode the option data and answer data according to the preset knowledge graph to obtain option attribute values and answer attribute values;
  • Step 104: perform word segmentation and splicing processing on the option attribute values and the answer attribute values to obtain an option answer representation vector;
  • Step 105: splice the question stem representation vector and the option answer representation vector to obtain topic data;
  • Step 106: train the preset pre-training model according to the topic data to obtain a knowledge classification model; wherein the knowledge classification model is used to perform knowledge classification processing on a target topic to obtain a knowledge point type.
  • in step 101 of an application scenario, it is necessary to obtain a certain amount of original annotation data, for example, 1 million pieces of original annotation data.
  • the original labeling data may be manually labeled topic data.
  • the label of the original annotation data is the type of knowledge point examined by the question.
  • for example, the knowledge point type examined by an [attributive clause] question is the attributive clause, and the knowledge point type examined by an [adverbial clause] question is the adverbial clause.
  • 1 million annotated items are used to train the model, so tens of millions or even more English questions can be automatically classified at the cost of only 1 million annotated items.
  • the original annotation data is the question stem data, option data and answer data of English multiple-choice questions.
  • the question stem representation vector is obtained, and the option data and answer data in the original annotation data are encoded according to the preset knowledge graph, so that the option attribute values and answer attribute values can be obtained; the option attribute values and answer attribute values are then subjected to word segmentation and splicing processing to obtain the option answer representation vector; next, the question stem representation vector and the option answer representation vector are spliced to obtain the topic data; finally, the preset pre-training model is trained according to the topic data to obtain a knowledge classification model, which can be used to perform knowledge classification processing on target topics to obtain knowledge point types. The knowledge classification model obtained in the embodiments of the present disclosure can improve the accuracy and efficiency of knowledge classification.
  • the question stem data is encoded to obtain the question stem representation vector, specifically including:
  • Step 201 preprocessing the question stem data to obtain a preliminary question stem sequence
  • Step 202 perform word segmentation processing on the preliminary question stem sequence to obtain a question stem representation vector.
  • step 201 includes:
  • for example, if the English content of the question stem data includes "I lOVE YOU", the text is converted entirely to lowercase, and the preliminary question stem sequence obtained is: i love you.
  • step 201 also includes:
  • the English contractions in the question stem data are restored to their full English forms to obtain the preliminary question stem sequence.
  • for example, after the contraction "I'm" is restored to its full English form, the preliminary question stem sequence obtained is: i am.
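The two preprocessing operations described for step 201 (lowercasing and contraction expansion) can be sketched together. The `CONTRACTIONS` table and the `preprocess_stem` name are illustrative assumptions; the patent does not specify a contraction list.

```python
# Illustrative subset of English contractions; a real system would use
# a fuller table than this.
CONTRACTIONS = {"i'm": "i am", "don't": "do not"}

def preprocess_stem(text):
    """Sketch of step 201: lowercase the question stem and expand
    English contractions to their full forms."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text

print(preprocess_stem("I lOVE YOU"))   # i love you
print(preprocess_stem("I'm playing"))  # i am playing
```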
  • step 202 word segmentation processing is performed on the preliminary question stem sequence to obtain a question stem representation vector, specifically including:
  • for example, for the preliminary question stem sequence "i am playing", the question stem representation vector obtained after tokenization is:
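Step 202 maps the preliminary stem sequence to a representation. As a toy sketch, assuming whitespace tokenization and an illustrative vocabulary (the patent does not specify the vocabulary or id scheme):

```python
# Illustrative vocabulary; ids are arbitrary for this sketch.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "i": 2, "am": 3, "playing": 4}

def stem_to_vector(preliminary_stem):
    """Toy version of step 202: whitespace-tokenize the preliminary
    stem sequence and map each token to a vocabulary id, falling back
    to the unknown-token id for out-of-vocabulary words."""
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in preliminary_stem.split()]

print(stem_to_vector("i am playing"))  # [2, 3, 4]
```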
  • the training method of the knowledge classification model also includes constructing a knowledge graph, which may specifically include, but is not limited to, steps 301 to 303:
  • Step 301: acquire preset knowledge points;
  • Step 302: construct a first triple and a second triple according to the preset knowledge points;
  • Step 303: construct a knowledge graph based on the first triple and the second triple; wherein the first triple includes a first knowledge entity, a relation and a second knowledge entity, and the second triple includes the second knowledge entity, an attribute and an attribute value.
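Steps 301 to 303 can be sketched with plain dictionaries: relation triples become undirected edges, and attribute triples become per-entity attribute maps. The function name and data layout are assumptions; the example data mirrors the clause triples given later in the text.

```python
from collections import defaultdict

def build_graph(first_triples, second_triples):
    """Sketch of steps 301-303: store relation triples as undirected
    edges between knowledge entities, and attribute triples as
    per-entity attribute maps."""
    edges = defaultdict(set)        # entity -> set of connected entities
    attributes = defaultdict(dict)  # entity -> {attribute: value}
    for head, _relation, tail in first_triples:
        edges[head].add(tail)
        edges[tail].add(head)       # undirected edge
    for entity, attribute, value in second_triples:
        attributes[entity][attribute] = value
    return edges, attributes

edges, attrs = build_graph(
    [("clause", "includes", "attributive clause"),
     ("clause", "includes", "adverbial clause")],
    [("attributive clause", "relative word", "which"),
     ("attributive clause", "grade", "grade 8")],
)
print(attrs["attributive clause"]["relative word"])  # which
```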
  • in step 301 of some embodiments, technical means such as a web crawler may be used to crawl relevant data such as the preset knowledge points; relevant data may also be obtained from a preset database.
  • the preset knowledge points are preset English knowledge points, such as English test points in the English online education scenario.
  • the principle of constructing the English knowledge graph is: construct the first triple and the second triple according to each of the preset knowledge points, wherein the first triple includes a first knowledge entity, a relation and a second knowledge entity, and the second triple includes the second knowledge entity, an attribute and an attribute value.
  • the association relationship between the first knowledge entity and the second knowledge entity is established; specifically, the association between the first knowledge entity and the second knowledge entity is established through an undirected edge.
  • Regarding the first triple: if there is a relationship between two knowledge nodes, the two related knowledge nodes are connected by an undirected edge. A knowledge node is called an entity, and the undirected edge represents the relationship between the two knowledge nodes; in the embodiments of the present disclosure, the two knowledge nodes correspond to the first knowledge entity and the second knowledge entity.
  • the second knowledge entity represents the name of the corresponding English knowledge point
  • the second triple represents: the name of the corresponding English knowledge point, an attribute of the English knowledge point, and the attribute value corresponding to that attribute.
  • the first triple can be expressed as: clause-includes-attributive clause, or as: clause-includes-adverbial clause; here [clause] is the corresponding English knowledge point, which includes the two knowledge points [attributive clause] and [adverbial clause], and the internal relation is inclusion.
  • the second triple can be expressed as: attributive clause-grade-grade 8, or attributive clause-relative word-which; here the [attributive clause] has an attribute [grade] whose attribute value is [grade 8], meaning the [attributive clause] is a [grade 8] knowledge point; the [attributive clause] also has an attribute [relative word], and the attribute value of this [relative word] is which.
  • in this way, the composition structure of English knowledge points and the examination points of English knowledge points can be clearly known; in addition, the sum of edges between two knowledge points can be calculated to judge whether the two knowledge points are similar, with reference to related technologies, which is not limited in the embodiments of the present disclosure.
  • the preset knowledge graph includes the first triple and the second triples, and the option data and answer data are encoded according to the preset knowledge graph to obtain the option attribute values and answer attribute values, which may specifically include, but is not limited to:
  • the knowledge graph includes a first triplet and multiple second triplets
  • the option data and answer data are encoded according to a preset knowledge graph to obtain option attribute values and answer attribute values, including:
  • Step 401: encode the option data according to the first triple and the multiple second triples to obtain option attribute values; wherein the option attribute values include the attribute values of the multiple second triples;
  • Step 402: encode the answer data according to the first triple and one of the second triples to obtain the answer attribute value; wherein the answer attribute value is one of the multiple attribute values among the option attribute values.
  • the embodiments of the present disclosure introduce the knowledge information of the knowledge graph into the encoding stage of the options and answers.
  • the options and answers of the questions are mapped to knowledge entities through the relevant information of the first triple and the second triples of the knowledge graph.
  • take an English multiple-choice question as an example.
  • a sentence containing clause content is given in the question stem data: My house, which I bought last year, has got a stylish garden.
  • in the question stem data, it is required to judge the clause type of the clause "which I bought last year".
  • the option data consists of four options A, B, C and D, where option A is an adverbial clause, option B is a subject clause, option C is an attributive clause, and option D is a predicative clause.
  • the first triple of the knowledge graph is expressed as: clause-contains-attributive clause;
  • the second triple is: attributive clause-relative word-which.
  • the "which" in the clause "which I bought last year" is a relative word;
  • the type of the corresponding clause is "attributive clause", which matches the second triple: attributive clause-relative word-which.
  • the answer corresponding to the type of the clause "which I bought last year" is that the clause is an attributive clause, which corresponds to the first triple: clause-contains-attributive clause.
  • the option data is encoded according to the first triplet and multiple second triplets, and the obtained option attribute values are: adverbial clause, subject clause, attributive clause, and predicative clause.
  • the answer data is encoded according to the first triplet and one of the second triplets, and the obtained answer attribute value is: attributive clause (that is, the attributive clause among the option attribute values); in this application scenario, the English knowledge point examined is the judgment of the attributive clause.
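As a rough illustration, the triple-based lookup described above can be sketched in Python. The triples and the word-matching rule below are simplified assumptions for the attributive-clause example, not the patent's actual knowledge graph encoding:

```python
# Illustrative triplets in the (subject, relation, object) form used above.
FIRST_TRIPLES = [("clause", "contains", "attributive clause")]
SECOND_TRIPLES = [("attributive clause", "relative word", "which")]

def answer_attribute(clause_text, second_triples):
    """Return the clause type whose relative word occurs in the clause.

    E.g. "which" in "which I bought last year" maps to "attributive clause".
    """
    words = clause_text.lower().split()
    for clause_type, _relation, relative_word in second_triples:
        if relative_word in words:
            return clause_type
    return None
```

A clause containing no known relative word simply yields no attribute value under this sketch.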
  • in step 104 of some embodiments, word segmentation and splicing are performed on the option attribute value and the answer attribute value to obtain the option answer representation vector, which may specifically include but is not limited to:
  • Step 501: perform word vectorization on the option attribute value and the answer attribute value to obtain the word-vectorized option attribute value and answer attribute value;
  • Step 502: splice the word-vectorized option attribute value and answer attribute value to obtain the option answer representation vector.
  • the knowledge words corresponding to the option attribute value and the answer attribute value are vectorized into a vector token corresponding to the option attribute value and a vector token corresponding to the answer attribute value, and the two vector tokens are then spliced to obtain the option answer representation vector.
  • alternatively, the option attribute value and the answer attribute value can be spliced first to obtain an option-answer attribute value, which is then vectorized into a vector token corresponding to the option answer, that is, the option answer representation vector.
  • the option attribute value is a sentence sequence A,
  • the answer attribute value is a sentence B,
  • and the two sentences A and B are spliced into the option answer representation vector.
  • the option answer representation vector can be a sequence with a length of 320; if its length is less than 320, the vector is zero-padded; and because the option attribute value may be very long, it may be necessary to truncate it, cutting off the tail of the longer sentence each time until the length of the entire option answer representation vector is 320.
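The padding and truncation rule above can be sketched as follows. This is a minimal illustration: the target length 320 and zero-padding follow the text, while the token representation is an assumption:

```python
MAX_LEN = 320  # target sequence length from the text above

def pad_or_truncate(tokens, max_len=MAX_LEN, pad_token=0):
    """Zero-pad short sequences; cut the tail off long ones."""
    if len(tokens) >= max_len:
        return tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))
```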
  • the question stem data gives a clause content, and it is required to judge the clause type of that content.
  • the options are A, B, C, and D:
  • option A is an adverbial clause,
  • option B is a subject clause,
  • option C is an attributive clause,
  • option D is a predicative clause;
  • the answer data corresponds to: attributive clause. That is, the option attribute values include the adverbial clause, the subject clause, the attributive clause, and the predicative clause,
  • and the answer attribute value is the attributive clause. Therefore, the option answer representation vector obtained after word segmentation and splicing of the option attribute value and the answer attribute value is expressed as [adverbial clause, subject clause, attributive clause, predicative clause, attributive clause].
  • the question stem representation vector and the option answer representation vector are vector-spliced to obtain the question data, which may specifically include but is not limited to:
  • the question stem representation vector and the option answer representation vector are vector-spliced through separators to obtain the question data.
  • the delimiter can be a pair of placeholders: a first placeholder [CLS] and a second placeholder [SEP], where the first placeholder [CLS] represents the beginning of the sequence and the second placeholder [SEP] indicates the end of the sequence.
  • [CLS] is the classifier token;
  • [SEP] is the sentence separator,
  • a special token that can also be used to separate two sentences.
  • the question stem representation vector and the option answer representation vector are vector-spliced through the separators to obtain the question data, including:
  • the question stem representation vector is set between the first placeholder and the second placeholder, the second placeholder is set between the question stem representation vector and the option answer representation vector, and the question stem representation vector and the option answer representation vector are vector-spliced to obtain the question data.
  • the representation form of the question data is: [<CLS>, question stem representation vector, <SEP>, option answer representation vector]
  • the stem representation vector is: i, am, play, ing
  • the option answer representation vector is: [adverbial clause, subject clause, attributive clause, predicative clause, attributive clause]
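The layout above can be sketched as a simple token-level concatenation. This is a sketch under the assumption that tokens are plain strings; a real tokenizer would map them to integer IDs:

```python
def build_question_data(stem_tokens, option_answer_tokens):
    """[<CLS>, stem tokens, <SEP>, option answer tokens], as described above."""
    return ["[CLS]"] + stem_tokens + ["[SEP]"] + option_answer_tokens

# The worked example from the text above.
seq = build_question_data(
    ["i", "am", "play", "ing"],
    ["adverbial clause", "subject clause", "attributive clause",
     "predicative clause", "attributive clause"])
```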
  • the preset pre-training model can be a BERT model; specifically, the question data obtained in step 105 is used as the input of the BERT model, and the BERT model is trained to obtain the knowledge classification model.
  • the basic framework of the knowledge classification model is the BERT model; the knowledge classification model is used to predict the knowledge type of the target question; specifically, the knowledge classification model includes a softmax classifier; the knowledge classification model obtains the feature vector information corresponding to <CLS> according to the input question data, and <CLS> can predict the knowledge type of the target question after passing through the softmax classifier.
  • the target question is a question input into the knowledge classification model; for example, it may be a multiple-choice question, and more specifically, in the case of English multiple-choice questions, the target question may be a multiple-choice question examining attributive clauses.
  • each token-level word includes: a token embedding, a position embedding, and a segment embedding; the token embedding is the vector representation of the token over the entire corpus, obtained after the model is pre-trained on the corpus;
  • the position embedding is the position index of the current token in the sequence;
  • the segment embedding marks whether the token belongs to sentence A or sentence B in the sequence, where the segment embedding of a token belonging to sentence A is 0 and the segment embedding of a token belonging to sentence B is 1.
  • the three embeddings (token embedding, position embedding, and segment embedding) are combined to form the word embedding of each token; the embedding of the entire sequence is input into the multi-layer bidirectional Transformer encoder, and the vector corresponding to the first token of the last hidden layer (namely [CLS]) is taken as the aggregate representation of the entire sentence, that is, the vector representation of the entire option sequence.
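The three-embedding construction can be sketched as below. Note that in the standard BERT implementation the token, position, and segment embeddings are element-wise summed rather than concatenated; this sketch follows that convention, and the random tables are illustrative placeholders, not trained weights:

```python
import random

DIM = 8        # illustrative embedding size
VOCAB = 100    # illustrative vocabulary size
MAX_POS = 512  # BERT's maximum sequence length

random.seed(0)
def _table(rows):
    # Placeholder embedding table; in practice these are learned parameters.
    return [[random.random() for _ in range(DIM)] for _ in range(rows)]

token_table, pos_table, seg_table = _table(VOCAB), _table(MAX_POS), _table(2)

def embed(token_ids, segment_ids):
    """Sum token, position, and segment embeddings at each position."""
    out = []
    for pos, (tid, sid) in enumerate(zip(token_ids, segment_ids)):
        out.append([t + p + s for t, p, s in
                    zip(token_table[tid], pos_table[pos], seg_table[sid])])
    return out
```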
  • the knowledge type of the question can be predicted by passing the sequence represented by the question data through the softmax classifier.
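The classification head on top of the <CLS> feature vector can be sketched as a linear layer followed by softmax. The weights and label set below are toy assumptions for illustration, not trained parameters:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_knowledge_type(cls_vector, weights, labels):
    """Linear layer over the <CLS> feature vector, then argmax of softmax."""
    logits = [sum(w * x for w, x in zip(row, cls_vector)) for row in weights]
    probs = softmax(logits)
    return labels[probs.index(max(probs))]
```

With toy weights, a <CLS> vector whose mass lies on the second coordinate yields the second label.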
  • the question stem data is encoded to obtain the question stem representation vector, and the option data and answer data in the original annotation data are encoded according to the preset knowledge graph, so that the option attribute value and the answer attribute value can be obtained; the option attribute value and the answer attribute value are then subjected to word segmentation and splicing to obtain the option answer representation vector; the question stem representation vector and the option answer representation vector are then vector-spliced to obtain the question data; finally, the preset pre-training model is trained according to the question data to obtain the knowledge classification model.
  • This knowledge classification model can be used to perform knowledge classification processing on the target topic to obtain the type of knowledge points.
  • the knowledge classification model obtained in the embodiments of the present disclosure can improve the accuracy and efficiency of knowledge classification.
  • when English multiple-choice questions are classified, the model can be used to automatically distinguish the knowledge points examined by the questions.
  • the technical solutions of the embodiments of the present disclosure can improve the accuracy and efficiency of knowledge classification; by introducing the knowledge graph coding information (triplet information) of the options and answers, the knowledge type of a question can be predicted more accurately. With a fixed cost of labeled samples, new questions can be classified more efficiently.
  • the embodiment of the present disclosure also provides a knowledge classification method for multiple-choice questions.
  • the knowledge classification method for multiple-choice questions provided by the embodiment of the present disclosure relates to the technical field of machine learning.
  • the multiple-choice knowledge classification method provided by the embodiments of the present disclosure can be applied to a terminal, can also be applied to a server, and can also be software running on the terminal or the server.
  • the terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, or a smart watch;
  • the server end can be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms;
  • the software can be an application implementing the knowledge classification method for multiple-choice questions, but is not limited to the above forms.
  • Fig. 6 is an optional flow chart of the multiple-choice knowledge classification method provided by the embodiment of the present disclosure.
  • the method in Fig. 6 may include but not limited to steps 601 to 604:
  • Step 601: obtain multiple-choice question data to be classified; wherein, the multiple-choice question data includes question stem data, option data and answer data;
  • Step 602: input the multiple-choice question data into the knowledge classification model; wherein, the knowledge classification model is obtained through training according to the method of the first aspect above;
  • Step 603: perform feature extraction on the multiple-choice question data through the knowledge classification model to obtain feature vector information;
  • Step 604: perform knowledge classification processing according to the feature vector information to obtain the knowledge point type.
  • multiple-choice question data to be classified includes question stem data, option data and answer data.
  • the multiple-choice question data is different from the original annotation data: the original annotation data includes knowledge point types, while the multiple-choice question data does not.
  • target questions include multiple-choice question data to be classified.
  • the knowledge classification model includes a softmax classifier.
  • in the knowledge classification method for multiple-choice questions, feature extraction is performed on the multiple-choice question data through the knowledge classification model to obtain the feature vector information corresponding to <CLS>, and the obtained feature vector information includes the question stem representation vector and the option answer representation vector;
  • the question stem representation vector is the same as the question stem representation vector in the above-mentioned training method of the knowledge classification model, that is, the question stem representation vector in this embodiment is set between the first placeholder <CLS> and the second placeholder <SEP>; it can also be said that the question stem representation vector includes the first placeholder <CLS>;
  • the knowledge classification method for multiple-choice questions in this embodiment, like the above-mentioned training method of the knowledge classification model, also provides that the second placeholder <SEP> is set between the question stem representation vector and the option answer representation vector; it can also be said that the option answer representation vector includes the second placeholder <SEP>.
  • in step 604 of some embodiments, according to the feature vector information corresponding to <CLS> obtained in step 603, the softmax classifier performs classification processing on the feature vector information corresponding to <CLS>, thereby predicting the knowledge type of the question.
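Steps 601 to 604 can be sketched as a single inference pipeline. The `encode()`/`classify()` model interface below is an illustrative assumption (with a stand-in stub model), not an actual library API:

```python
class StubModel:
    """Stand-in for the trained knowledge classification model.

    The encode()/classify() interface is an illustrative assumption.
    """
    def encode(self, sequence):
        return [float(len(sequence))]    # fake <CLS> feature vector

    def classify(self, cls_features):
        return "attributive clause"      # fake predicted knowledge point type

def classify_question(stem_tokens, option_answer_tokens, model):
    """Steps 601-604: build the input, extract <CLS> features, classify."""
    sequence = ["[CLS]"] + stem_tokens + ["[SEP]"] + option_answer_tokens
    cls_features = model.encode(sequence)    # step 603: feature extraction
    return model.classify(cls_features)      # step 604: knowledge point type
```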
  • the question stem data is encoded to obtain the question stem representation vector, and the option data and answer data in the original annotation data are encoded according to the preset knowledge graph, so that the option attribute value and the answer attribute value can be obtained; the option attribute value and the answer attribute value are then subjected to word segmentation and splicing to obtain the option answer representation vector; the question stem representation vector and the option answer representation vector are then vector-spliced to obtain the question data; finally, the preset pre-training model is trained according to the question data to obtain the knowledge classification model.
  • This knowledge classification model can be used to perform knowledge classification processing on the target topic to obtain the type of knowledge points.
  • the knowledge classification model obtained in the embodiments of the present disclosure can improve the accuracy and efficiency of knowledge classification.
  • when English multiple-choice questions are classified, the model can be used to automatically distinguish the knowledge points examined by the questions.
  • the technical solutions of the embodiments of the present disclosure can improve the accuracy and efficiency of knowledge classification; by introducing the knowledge graph coding information (triplet information) of the options and answers, the knowledge type of a question can be predicted more accurately. With a fixed cost of labeled samples, new questions can be classified more efficiently.
  • an embodiment of the present disclosure also provides a training device for a knowledge classification model, which can implement the above-mentioned training method for the knowledge classification model.
  • the training device for the knowledge classification model includes: an original data acquisition module, used to obtain original annotation data,
  • wherein the original annotation data includes question stem data, option data and answer data;
  • a question stem encoding module, used to encode the question stem data to obtain the question stem representation vector;
  • an option answer encoding module, used to encode the option data and the answer data according to the preset knowledge graph to obtain the option attribute value and the answer attribute value;
  • a word segmentation and splicing module, used to perform word segmentation and splicing on the option attribute value and the answer attribute value to obtain the option answer representation vector;
  • a vector splicing module, used for vector-splicing the question stem representation vector and the option answer representation vector to obtain the question data;
  • and a classification model training module, used for training a preset pre-training model according to the question data to obtain the knowledge classification model.
  • the training device for the knowledge classification model in the embodiment of the present disclosure is used to execute the training method of the knowledge classification model in the above embodiment; its specific processing is the same as that of the training method in the above embodiment and will not be repeated here.
  • an embodiment of the present disclosure also provides a knowledge classification device for multiple-choice questions, which can implement the above-mentioned knowledge classification method for multiple-choice questions.
  • the knowledge classification device for multiple-choice questions includes: a multiple-choice question data acquisition module, used to obtain multiple-choice question data, wherein the multiple-choice question data includes question stem data, option data and answer data; a data input module, used to input the multiple-choice question data into the knowledge classification model, wherein the knowledge classification model is trained according to the training method of the knowledge classification model of the above-mentioned first aspect; a feature extraction module, used to perform feature extraction on the multiple-choice question data through the knowledge classification model to obtain feature vector information; and a knowledge classification module, used to perform knowledge classification processing according to the feature vector information to obtain the knowledge point type.
  • the knowledge classification device for multiple-choice questions in the embodiment of the present disclosure is used to implement the knowledge classification method for multiple-choice questions in the above-mentioned embodiments; its specific processing is the same as that of the method in the above-mentioned embodiments and will not be repeated here.
  • An embodiment of the present disclosure also provides a computer device, including a processor and a memory;
  • at least one program is stored in the memory, and the processor executes the at least one program to implement the above-mentioned training method of the knowledge classification model or knowledge classification method for multiple-choice questions of the present disclosure.
  • the computer device may be any intelligent terminal including a mobile phone, a tablet computer, a personal digital assistant (PDA for short), a vehicle-mounted computer, and the like.
  • FIG. 9 illustrates a hardware structure of a computer device in another embodiment, and the computer device includes:
  • the processor 701 can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of the present disclosure;
  • the memory 702 may be implemented in the form of a ROM (Read-Only Memory), a static storage device, a dynamic storage device, or a RAM (Random Access Memory).
  • the memory 702 can store operating systems and other application programs. When the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program codes are stored in the memory 702 and called by the processor 701 to execute the methods of the embodiments of the present disclosure.
  • the input/output interface 703 is used to realize information input and output
  • the communication interface 704 is used to realize the communication interaction between the device and other devices, and the communication can be realized through a wired method (such as USB, network cable, etc.), or can be realized through a wireless method (such as a mobile network, WIFI, Bluetooth, etc.); and
  • the processor 701 , the memory 702 , the input/output interface 703 and the communication interface 704 are connected to each other within the device through the bus 705 .
  • An embodiment of the present disclosure also provides a storage medium, which is a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are used to make the computer execute the above-mentioned training method of the knowledge classification model or knowledge classification method for multiple-choice questions; wherein, the training method of the knowledge classification model includes: obtaining original annotation data, wherein the original annotation data includes question stem data, option data, and answer data; encoding the question stem data to obtain the question stem representation vector; encoding the option data and answer data according to the preset knowledge graph to obtain the option attribute value and the answer attribute value; performing word segmentation and splicing on the option attribute value and the answer attribute value to obtain the option answer representation vector; vector-splicing the question stem representation vector and the option answer representation vector to obtain the question data;
  • and training the preset pre-training model according to the question data to obtain the knowledge classification model, wherein the knowledge classification model is used to perform knowledge classification on a target question.
  • the knowledge classification method for multiple-choice questions includes: obtaining multiple-choice question data to be classified, wherein the multiple-choice question data includes question stem data; encoding the question stem data to obtain the question stem representation vector; inputting the question stem representation vector into the knowledge classification model, wherein the knowledge classification model is obtained by training according to the above-mentioned training method of the knowledge classification model; performing feature extraction on the question stem data through the knowledge classification model to obtain feature vector information; and performing knowledge classification processing according to the feature vector information to obtain the knowledge point type.
  • the training method of the knowledge classification model, the knowledge classification method for multiple-choice questions, the training device of the knowledge classification model, the knowledge classification device for multiple-choice questions, the computer device, and the storage medium proposed by the embodiments of the present disclosure obtain original annotation data,
  • encode the question stem data in the original annotation data to obtain the question stem representation vector,
  • and encode the option data and answer data in the original annotation data according to the preset knowledge graph, so that the option attribute value and the answer attribute value can be obtained; the option attribute value and the answer attribute value are then segmented and spliced to obtain the option answer representation vector, and the question stem representation vector and the option answer representation vector are vector-spliced to obtain the question data;
  • finally, the preset pre-training model is trained according to the question data to obtain the knowledge classification model.
  • the knowledge classification model can be used to perform knowledge classification processing on the target topic to obtain the type of knowledge points.
  • the knowledge classification model obtained in the embodiment of the present disclosure can improve the accuracy and efficiency of knowledge classification.
  • when English multiple-choice questions are classified, the model can be used to automatically distinguish the knowledge points examined by the questions.
  • the technical solutions of the embodiments of the present disclosure can improve the accuracy and efficiency of knowledge classification; by introducing the knowledge graph coding information (triplet information) of the options and answers, the knowledge type of a question can be predicted more accurately. With a fixed cost of labeled samples, new questions can be classified more efficiently.
  • memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the flowcharts shown in FIGS. 1-6 do not constitute a limitation to the embodiments of the present disclosure, which may include more or fewer steps than those shown in the figures, combine certain steps, or use different steps.
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure belongs to the technical field of machine learning, and provides a model training method and apparatus, a knowledge classification method and apparatus, a device, and a medium. The model training method comprises: acquiring original annotation data, the original annotation data comprising question stem data, option data and answer data; encoding the question stem data to obtain a question stem representation vector; encoding the option data and answer data according to a preset knowledge graph to obtain an option attribute value and an answer attribute value; performing word segmentation and splicing on the option attribute value and the answer attribute value to obtain an option answer representation vector; vector-splicing the question stem representation vector and the option answer representation vector to obtain question data; and training a preset pre-training model according to the question data to obtain a knowledge classification model, the knowledge classification model being used to perform knowledge classification on a target question. The knowledge classification model obtained in the embodiments of the present disclosure can improve the accuracy and efficiency of knowledge classification.
PCT/CN2022/090718 2021-12-15 2022-04-29 Procédé et appareil d'entraînement de modèle, procédé et appareil de classification de connaissances et dispositif et support WO2023108991A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111536048.0 2021-12-15
CN202111536048.0A CN114238571A (zh) Model training method, knowledge classification method, apparatus, device, and medium

Publications (1)

Publication Number Publication Date
WO2023108991A1 true WO2023108991A1 (fr) 2023-06-22

Family

ID=80756448

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090718 WO2023108991A1 (fr) 2021-12-15 2022-04-29 Procédé et appareil d'entraînement de modèle, procédé et appareil de classification de connaissances et dispositif et support

Country Status (2)

Country Link
CN (1) CN114238571A (fr)
WO (1) WO2023108991A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955589A (zh) * 2023-09-19 2023-10-27 山东山大鸥玛软件股份有限公司 Intelligent question-setting method, system, question-setting terminal and storage medium based on a textbook knowledge graph
CN117171654A (zh) * 2023-11-03 2023-12-05 酷渲(北京)科技有限公司 Knowledge extraction method, apparatus, device and readable storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201603A (zh) * 2021-11-04 2022-03-18 阿里巴巴(中国)有限公司 Entity classification method, apparatus, storage medium, processor and electronic apparatus
CN114238571A (zh) * 2021-12-15 2022-03-25 平安科技(深圳)有限公司 Model training method, knowledge classification method, apparatus, device, and medium
CN115186780B (zh) * 2022-09-14 2022-12-06 江西风向标智能科技有限公司 Subject knowledge point classification model training method, system, storage medium and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563166A (zh) * 2020-05-28 2020-08-21 浙江学海教育科技有限公司 A pre-training model method for mathematical problem classification
CN112395858A (zh) * 2020-11-17 2021-02-23 华中师范大学 Multi-knowledge-point annotation method and system fusing test question data and answer data
CN112818120A (zh) * 2021-01-26 2021-05-18 北京智通东方软件科技有限公司 Exercise annotation method, apparatus, storage medium and electronic device
US20210150152A1 (en) * 2019-11-20 2021-05-20 Oracle International Corporation Employing abstract meaning representation to lay the last mile towards reading comprehension
CN113743083A (zh) * 2021-09-06 2021-12-03 东北师范大学 A test question difficulty prediction method and system based on deep semantic representation
CN114238571A (zh) * 2021-12-15 2022-03-25 平安科技(深圳)有限公司 Model training method, knowledge classification method, apparatus, device, and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210150152A1 (en) * 2019-11-20 2021-05-20 Oracle International Corporation Employing abstract meaning representation to lay the last mile towards reading comprehension
CN111563166A (zh) * 2020-05-28 2020-08-21 浙江学海教育科技有限公司 Pre-trained model method for mathematical problem classification
CN112395858A (zh) * 2020-11-17 2021-02-23 华中师范大学 Multi-knowledge-point labeling method and system fusing test question data and answer data
CN112818120A (zh) * 2021-01-26 2021-05-18 北京智通东方软件科技有限公司 Exercise labeling method and apparatus, storage medium, and electronic device
CN113743083A (zh) * 2021-09-06 2021-12-03 东北师范大学 Test question difficulty prediction method and system based on deep semantic representation
CN114238571A (zh) * 2021-12-15 2022-03-25 平安科技(深圳)有限公司 Model training method, knowledge classification method, apparatus, device, and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIAO, ZIHUI: "Intelligent English Grammar Exercises System Based on Knowledge Graph", Master's Thesis, Beijing Forestry University, CN, 25 July 2020, pages 1-79, XP009546182, DOI: 10.26949/d.cnki.gblyu.2020.000421 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955589A (zh) * 2023-09-19 2023-10-27 山东山大鸥玛软件股份有限公司 Intelligent question-setting method and system based on textbook knowledge graph, question-setting terminal, and storage medium
CN116955589B (zh) * 2023-09-19 2024-01-30 山东山大鸥玛软件股份有限公司 Intelligent question-setting method and system based on textbook knowledge graph, question-setting terminal, and storage medium
CN117171654A (zh) * 2023-11-03 2023-12-05 酷渲(北京)科技有限公司 Knowledge extraction method, apparatus, and device, and readable storage medium
CN117171654B (zh) * 2023-11-03 2024-02-09 酷渲(北京)科技有限公司 Knowledge extraction method, apparatus, and device, and readable storage medium

Also Published As

Publication number Publication date
CN114238571A (zh) 2022-03-25

Similar Documents

Publication Publication Date Title
WO2023108991A1 (fr) Model training method and apparatus, knowledge classification method and apparatus, and device and medium
CN110019839B Medical knowledge graph construction method and system based on neural networks and distant supervision
CN108595708A Knowledge-graph-based abnormal information text classification method
CN106886580B Image sentiment polarity analysis method based on deep learning
CN110110054A Method for obtaining question-answer pairs from unstructured text based on deep learning
CN111931517B Text translation method and apparatus, electronic device, and storage medium
CN104809176A Tibetan entity relation extraction method
CN113987147A Sample processing method and apparatus
WO2023159767A1 (fr) Target word detection method and apparatus, electronic device, and storage medium
CN114722069A Language conversion method and apparatus, electronic device, and storage medium
CN115858758A Intelligent customer service knowledge graph system for recognizing multiple types of unstructured data
CN116561538A Question-answer scoring method, question-answer scoring apparatus, electronic device, and storage medium
CN114841146B Text summary generation method and apparatus, electronic device, and storage medium
CN110969023A Text similarity determination method and apparatus
CN113609267B Discourse relation recognition method and system based on the GCNDT-MacBERT neural network framework
CN111178080A Named entity recognition method and system based on structured information
CN112749556B Multilingual model training method and apparatus, storage medium, and electronic device
CN114722774B Data compression method and apparatus, electronic device, and storage medium
CN114611529B Intent recognition method and apparatus, electronic device, and storage medium
KR102455747B1 System and method for providing a fake news detection model using a deep learning algorithm
CN115270746A Question sample generation method and apparatus, electronic device, and storage medium
CN115204300A Data processing method and apparatus for semantic interaction between text and tables, and storage medium
CN115730051A Text processing method and apparatus, electronic device, and storage medium
CN114662496A Information recognition method, apparatus, device, storage medium, and product
CN112015891A Method and system for classifying messages on an online government inquiry platform based on deep neural networks

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22905759

Country of ref document: EP

Kind code of ref document: A1