CN116975578A - Logic rule network model training method, device, equipment, program and medium - Google Patents

Logic rule network model training method, device, equipment, program and medium

Info

Publication number
CN116975578A
Authority
CN
China
Prior art keywords
network model
logic
loss function
classification
logic rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310149602.2A
Other languages
Chinese (zh)
Inventor
黄予
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310149602.2A priority Critical patent/CN116975578A/en
Publication of CN116975578A publication Critical patent/CN116975578A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a logic rule network model training method, a device and electronic equipment. The method comprises the following steps: performing characterization processing on raw data to obtain training feature vectors; determining a disjunctive normal form constraint loss function, a network sparsity constraint loss function and a known rule constraint loss function of a logic rule network model; determining a fusion loss function of the logic rule network model according to the three constraint loss functions; and training the logic rule network model according to the training feature vectors and the fusion loss function to determine the network model parameters of the logic rule network model. The logic symbols in the logic expressions generated by the trained logic rule network model are unambiguous, which improves the readability of the logic expressions and reduces the difficulty of understanding the results generated by the logic rule network model. The embodiments of the invention are also applicable to scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving.

Description

Logic rule network model training method, device, equipment, program and medium
Technical Field
The present invention relates to information classification technology, and in particular to a logic rule network model training method, apparatus, electronic device, computer program product, and storage medium. The applicable fields of the present solution include, but are not limited to, automatic driving, internet of vehicles, intelligent transportation, and the like.
Background
Artificial Intelligence (AI) is a comprehensive technology of computer science. By studying the design principles and implementation methods of various intelligent machines, it gives machines the capabilities of sensing, reasoning and decision-making. Artificial intelligence is a comprehensive discipline involving a wide range of fields, such as natural language processing and machine learning/deep learning; it is believed that, with the development of technology, artificial intelligence will be applied in more fields and become increasingly valuable.
In the related art, logic rule network models are widely applied in a large number of business scenarios such as advertising, recommendation and search. However, the logic expressions generated after training the classification models provided by the related art often have poor readability due to mixed logic symbols, which increases the difficulty of understanding. Meanwhile, known rules cannot be introduced: when the amount of training data is insufficient, the data is biased, or the intrinsic rules behind the data are too complex or obscure, the model can hardly learn all the rules directly from the data.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, an electronic device, a software program, and a storage medium for training a logic rule network model, which can train the logic rule network model accurately and efficiently through machine learning. The resulting model is applicable to various classification tasks whose inputs are feature vectors; it can also mine the latent logic rule relationships between labels and their features and give interpretability to the classification results, making the model easier for users to understand.
The embodiment of the invention provides a logic rule network model training method, which comprises the following steps:
carrying out characterization processing on the original data to obtain training feature vectors;
determining a disjunctive normal form constraint loss function of the logic rule network model;
determining a network sparsity constraint loss function of the logic rule network model;
determining a known rule constraint loss function of the logical rule network model;
determining a fusion loss function of the logic rule network model according to the disjunctive normal form constraint loss function, the network sparsity constraint loss function and the known rule constraint loss function;
And training the logic rule network model according to the training feature vector and the fusion loss function, and determining network model parameters of the logic rule network model.
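As a hedged sketch of the last two steps, the fusion can be a weighted sum of the three constraint loss functions; the weighting coefficients `lam1` and `lam2` below are illustrative assumptions, since the text does not specify how the losses are combined:

```python
def fused_loss(dnf_loss, sparsity_loss, rule_loss, lam1=0.1, lam2=0.1):
    # Fusion loss: the disjunctive-normal-form constraint loss plus
    # weighted network-sparsity and known-rule constraint terms.
    return dnf_loss + lam1 * sparsity_loss + lam2 * rule_loss
```

An optimizer would then determine the network model parameters by minimizing this fused value over the training feature vectors.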
The embodiment of the invention also provides a data classification method, which comprises the following steps:
acquiring data to be classified;
splicing the data to be classified through a binarization layer of the logic rule network model to obtain a binarization classification feature vector;
connecting the binarization classification feature vector through a conjunctive logic layer of the logic rule network model to obtain a connection classification feature vector;
performing connection processing on the connection classification feature vectors through a disjunctive logic layer of the logic rule network model to obtain an or connection classification feature vector;
and converting the or connection classification feature vector through a linear conversion layer of the logic rule network model to obtain the classification probability of the data to be classified and the logic rule corresponding to the data to be classified.
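The four processing steps above (binarization, conjunction, disjunction, linear conversion) can be sketched with differentiable soft-logic operators. The product-based soft AND/OR below is one common choice and is an assumption on our part; the patent text does not fix the exact operators, and all names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_and(x, w):
    # Conjunctive logic layer: a unit is near 1 only if every literal
    # it selects (weight w close to 1) is near 1 in the input x.
    return np.prod(1.0 - w * (1.0 - x[:, None, :]), axis=-1)

def soft_or(x, w):
    # Disjunctive logic layer: a unit is near 1 if any conjunction it
    # selects is near 1.
    return 1.0 - np.prod(1.0 - w * x[:, None, :], axis=-1)

x = (rng.random((4, 6)) > 0.5).astype(float)  # binarized feature vectors
w_and = rng.random((8, 6))                    # conjunctive-layer weights
w_or = rng.random((3, 8))                     # disjunctive-layer weights
w_lin = rng.random((3, 3))                    # linear conversion weights
logits = soft_or(soft_and(x, w_and), w_or) @ w_lin.T  # one score per class
```

With binary weights these operators reduce to exact Boolean AND/OR, which is what makes the learned parameters readable as logic rules.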
The embodiment of the invention also provides a logic rule network model training device, which comprises:
the information transmission module is used for carrying out characteristic processing on the original data to obtain training feature vectors;
The information processing module is used for determining a disjunctive normal form constraint loss function of the logic rule network model;
the information processing module is used for determining a network sparsity constraint loss function of the logic rule network model;
the information processing module is used for determining a known rule constraint loss function of the logic rule network model;
the information processing module is used for determining a fusion loss function of the logic rule network model according to the disjunctive normal form constraint loss function, the network sparsity constraint loss function and the known rule constraint loss function;
the information processing module is used for training the logic rule network model according to the training feature vector and the fusion loss function, and determining network model parameters of the logic rule network model.
In the above scheme, the information transmission module is configured to segment the training feature vector to obtain a discrete training feature vector and a continuous training feature vector;
the information transmission module is configured to perform binarization processing on the continuous training feature vector through a binarization layer of the logic rule network model, and splice the continuous training feature vector with the discrete training feature vector to obtain a binarization training feature vector, where the logic rule network model includes: a binarization layer, a conjunctive logic layer, a disjunctive logic layer and a linear conversion layer;
The information transmission module is used for performing connection processing on the binarization training feature vector through the conjunctive logic layer using an and connector to obtain a connection training feature vector;
the information transmission module is used for performing connection processing on the connection training feature vector through the disjunctive logic layer using an or connector to obtain an or connection training feature vector;
the information transmission module is used for converting the or connection training feature vector through the linear conversion layer to obtain a classification vector;
the information transmission module is used for calculating the disjunctive normal form constraint loss function of the logic rule network model according to the category number of the classification vectors, the training feature vector number and the classification vectors.
In the above scheme, the information transmission module is configured to calculate a normalized exponential classification vector corresponding to the classification vector when the logic rule network model is used for a single-label multi-classification task;
the information transmission module is used for calculating cross entropy as a disjunctive normal form constraint loss function of the logic rule network model according to the category number of the classification vectors, the number of the training feature vectors and the normalized index classification vectors.
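A minimal sketch of this single-label multi-class case using the standard softmax-plus-cross-entropy formulas (the function and variable names are ours, not from the text):

```python
import numpy as np

def softmax(z):
    # Normalized exponential classification vector; subtracting the
    # row maximum keeps the exponentials numerically stable.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean cross entropy over the N training feature vectors, given
    # one integer class label per sample.
    p = softmax(logits)
    n = logits.shape[0]
    return -np.log(p[np.arange(n), labels]).mean()
```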
In the above scheme, the information transmission module is configured to calculate a logistic regression classification vector corresponding to the classification vector when the logic rule network model is used for a multi-label and multi-classification task;
the information transmission module is used for calculating binary cross entropy as a disjunctive normal form constraint loss function of the logic rule network model according to the category number of the classification vectors, the training feature vector number and the logistic regression classification vector.
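For the multi-label case, the standard logistic-regression (sigmoid) outputs and binary cross entropy can be sketched as follows (names are ours):

```python
import numpy as np

def binary_cross_entropy(logits, targets):
    # Mean binary cross entropy over all N x C label positions, where
    # each position carries an independent 0/1 target.
    p = 1.0 / (1.0 + np.exp(-logits))  # logistic regression outputs
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p)).mean()
```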
In the above scheme, the information transmission module is configured to obtain a network model parameter of a conjunctive logic layer, a network model parameter of a disjunctive logic layer, and a network model parameter of a linear conversion layer in the logic rule network model;
the information transmission module is used for regularizing and restraining network model parameters of the linear conversion layer to obtain a first constraint loss function;
the information transmission module is used for determining the length of the connected training feature vector and the length of the or connected training feature vector;
the information transmission module is used for regularizing the network model parameters of the conjunctive logic layer and the network model parameters of the disjunctive logic layer according to the length of the connection training feature vector and the length of the or connection training feature vector to obtain a second constraint loss function;
The information transmission module is used for sparsity constraint on the or connection training feature vectors according to the quantity of the training feature vectors to obtain a third constraint loss function;
the information transmission module is used for determining the network sparsity constraint loss function according to the sum of the first constraint loss function, the second constraint loss function and the third constraint loss function.
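One hedged reading of the three constraint terms above, using L1-style penalties; the exact norms and scalings are not specified in the text, so these choices are assumptions:

```python
import numpy as np

def sparsity_loss(w_and, w_or, w_lin, or_activations, n_samples):
    # First term: penalty on the linear conversion layer's weights.
    term1 = np.abs(w_lin).sum()
    # Second term: penalty on the conjunctive- and disjunctive-layer
    # weights; the mean is an assumed normalization by matrix size.
    term2 = np.abs(w_and).mean() + np.abs(w_or).mean()
    # Third term: sparsity of the or connection activations, averaged
    # over the number of training feature vectors.
    term3 = np.abs(or_activations).sum() / n_samples
    return term1 + term2 + term3
```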
In the above scheme, the information transmission module is configured to obtain a preset logic rule corresponding to the logic rule network model;
the information transmission module is used for decomposing the network model parameters of the conjunctive logic layer to obtain network model parameters of the updatable conjunctive logic layer and network model parameters of the non-updatable conjunctive logic layer;
the information transmission module is used for decomposing the network model parameters of the disjunctive logic layer to obtain network model parameters of the updatable disjunctive logic layer and network model parameters of the non-updatable disjunctive logic layer;
the information transmission module is used for expanding the network model parameters of the updatable conjunctive logic layer according to the preset logic rule to obtain the network model parameters of the expanded updatable conjunctive logic layer;
The information transmission module is used for expanding the network model parameters of the updatable disjunctive logic layer according to the preset logic rule to obtain the network model parameters of the expanded updatable disjunctive logic layer;
the information transmission module is used for calculating a pseudo-binarization training feature vector corresponding to the binarization training feature vector according to the network model parameters of the expanded updatable conjunctive logic layer and the network model parameters of the expanded updatable disjunctive logic layer;
the information transmission module is configured to keep the network model parameters of the non-updatable conjunctive logic layer and the network model parameters of the non-updatable disjunctive logic layer unchanged, and determine the known rule constraint loss function according to the number of training feature vectors, the binarized training feature vectors, and the pseudo-binarized training feature vectors.
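The final step above compares the binarized training feature vectors with the pseudo-binarized vectors computed from the expanded, rule-augmented parameters, while the non-updatable parameters receive no gradient. A minimal sketch of such a comparison, with the distance measure assumed rather than taken from the text:

```python
import numpy as np

def known_rule_loss(x_bin, x_pseudo, n_samples):
    # Squared gap between the binarized training feature vectors and
    # their pseudo-binarized reconstructions, averaged over the number
    # of training feature vectors; squared error is an assumed choice.
    return ((x_bin - x_pseudo) ** 2).sum() / n_samples
```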
In the above solution, the information transmission module is configured to keep the disjunctive normal form constraint loss function and the network sparsity constraint loss function unchanged when the preset logic rule is changed, and to dynamically update the known rule constraint loss function of the logic rule network model.
The embodiment of the invention also provides a data classification device, which comprises:
the data transmission module is used for acquiring data to be classified;
the data processing module is used for splicing the data to be classified through the binarization layer of the logic rule network model to obtain a binarization classification feature vector;
the data processing module is used for carrying out connection processing on the binary classification feature vector through a conjunctive logic layer of the logic rule network model to obtain a connection classification feature vector;
the data processing module is used for carrying out connection processing on the connection classification feature vector through a disjunctive logic layer of the logic rule network model to obtain or connect the classification feature vector;
the data processing module is used for converting the or connection classification feature vector through a linear conversion layer of the logic rule network model to obtain the classification probability of the data to be classified and the logic rule corresponding to the data to be classified.
The embodiment of the invention also provides electronic equipment, which comprises:
a memory for storing executable instructions;
and the processor is used for implementing the logic rule network model training method and the data classification method when executing the executable instructions stored in the memory.
The embodiment of the invention also provides a computer program product, which comprises a computer program or instructions and is characterized in that the computer program or instructions realize the logic rule network model training method and the data classification method when being executed by a processor.
The embodiment of the invention also provides a computer readable storage medium which stores executable instructions which are executed by a processor to realize the logic rule network model training method and the data classification method.
The embodiment of the invention has the following beneficial effects:
in the training method of the logic rule network model provided by the invention, the fusion loss function of the logic rule network model comprises the disjunctive normal form constraint loss function, the network sparsity constraint loss function and the known rule constraint loss function. As a result, the logic symbols in the logic expressions generated by the trained logic rule network model are unambiguous, which improves the readability of the logic expressions and reduces the difficulty of understanding the results generated by the logic rule network model. Meanwhile, because the known rule constraint loss function is introduced in the training process, when the amount of training data is insufficient, the data is biased, or the intrinsic rules behind the data are too complex or obscure, the logic rule network model no longer has to learn all the rules directly from the data; it can absorb prior experience through the known rules, which improves the classification accuracy of the logic rule network model.
Drawings
FIG. 1 is a schematic diagram of a usage scenario of a training method of a logic rule network model according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of an alternative method for training a logic rule network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a logical rule network model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a known rule constraint loss function calculation process for a logical rule network model;
FIG. 5 is a schematic diagram of network model parameter expansion when rule constraints are known in an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a data classification method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of processing logic of a data classification method according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a composition structure of a training device for a logic rule network model according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
Before describing the embodiments of the present invention in further detail, the terms involved in the embodiments of the present invention are explained; the following explanations apply to these terms throughout.
1) "In response to": indicates the condition or state on which a performed operation depends. When the condition or state is satisfied, the one or more operations performed may be executed in real time or with a set delay; unless otherwise specified, there is no limitation on the execution order of multiple operations.
2) "Based on": indicates the condition or state on which a performed operation relies. When the condition or state is satisfied, the one or more operations performed may be executed in real time or with a set delay; unless otherwise specified, there is no limitation on the execution order of multiple operations.
3) softmax: a common and important function in machine learning, widely used especially in multi-classification scenarios; it maps a set of inputs to real numbers between 0 and 1 and normalizes them so that their sum is guaranteed to be 1.
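The two properties stated in this definition can be checked numerically; the input values below are arbitrary:

```python
import numpy as np

# Numeric check of the stated softmax properties: every output lies
# in (0, 1) and the outputs sum to 1.
z = np.array([2.0, 1.0, 0.1])
p = np.exp(z - z.max()) / np.exp(z - z.max()).sum()  # stable softmax
```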
4) Neural Networks (NN): an artificial neural network (Artificial Neural Network, ANN), abbreviated as neural network or neural-like network, is a mathematical or computational model that mimics the structure and function of biological neural networks (the central nervous system of animals, particularly the brain) for estimating or approximating functions in the field of machine learning and cognitive sciences.
5) Model parameters: quantities that use common variables to establish the relationship between functions and variables. In artificial neural networks, model parameters are typically real-valued matrices.
6) Disjunctive Normal Form (DNF): a standardized form of a logical expression, namely a disjunction of conjunctive clauses, such as (A∧B)∨C or (A∧B)∨(C∧D).
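The example formula (A∧B)∨C can be evaluated directly to see the DNF structure: the expression is true exactly when at least one of its conjunctive clauses is true.

```python
def dnf(a, b, c):
    # (A AND B) OR C: a disjunction of the conjunctive clauses
    # (A AND B) and C.
    return (a and b) or c
```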
The embodiments of the present invention can be implemented in combination with cloud technology. Cloud technology refers to a hosting technology that integrates resources such as hardware, software and networks in a wide area network or a local area network to realize the calculation, storage, processing and sharing of data; it can also be understood as a general term for network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of a cloud computing business model. Background services of technical network systems, such as video websites, picture websites and other portal websites, require a large amount of computing and storage resources, so cloud technology needs the support of cloud computing.
Through cloud technology, using the training method of the logic rule network model provided by the present application, the usage results of each logic rule network model can be recorded in the corresponding cloud server; when a target object browses media information on different terminals, the logic rule network model stored in the cloud server can classify whether or not to recommend the information to the user.
It should be noted that cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to acquire computing power, storage space and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's perspective, resources in the cloud are infinitely expandable and can be acquired at any time, used on demand, expanded at any time and paid for according to use. As a basic capability provider of cloud computing, a cloud computing resource pool platform (cloud platform for short) generally offers Infrastructure as a Service (IaaS); multiple types of virtual resources are deployed in the resource pool for external clients to select and use. The cloud computing resource pool mainly comprises: computing devices (which may be virtualized machines, including operating systems), storage devices, and network devices.
Before introducing the training method of the logic rule network model provided by the present application, the drawbacks of logic rule network models in the related art are briefly described. When the related art builds a logic rule network model, AND, OR and NOT logic operations are introduced into the deep learning network structure design, so that the parameters of the model correspond one-to-one to logic rules; by checking the logic rules corresponding to the model parameters and their weights, a user can directly understand the mechanism and logic behind a certain classification label that is output. However, the logic expressions generated after model training is completed often have poor readability due to mixed logic symbols, which increases the difficulty of understanding; moreover, known rules cannot be introduced. When the amount of training data is insufficient, the data is biased, or the intrinsic rules behind the data are too complex or obscure, the model can hardly learn all the rules directly from the data, so it cannot be used in classification scenarios requiring clear readability, such as clinical auxiliary diagnosis and medical image segmentation.
In order to address the above drawbacks in the use of logic rule network models, FIG. 1 is a schematic diagram of a usage scenario of the logic rule network model training method according to an embodiment of the present application. Referring to FIG. 1, terminals (including a terminal 10-1 and a terminal 10-2) are provided with corresponding clients capable of executing different classification functions. For example, a terminal may acquire, through the network 300, epidemiological investigation results matched with a target object from the corresponding server 200, or may acquire epidemiological investigations of people associated with the current target (for example, close-contact cases, secondary close-contact cases, and co-travel cases). The server 200 may store the diagnosis and treatment information of each of the different target objects, or may store epidemiological investigations matched with the diagnosis and treatment information of a target object. In some embodiments of the present application, the disease type information stored in the server 200 may include information on various types of infectious diseases, which can be distinguished by corresponding disease type identifiers; confirmation and early-warning thresholds corresponding to the disease type identifiers may also be stored, and after the classification result of the data to be classified is obtained through the logic rule network model, the thresholds are used to send out confirmation and early-warning information in time to notify the corresponding disease control departments.
The disease type identifiers carried by the disease information in the present application can characterize various infectious diseases, which are specifically divided into Class A, Class B and Class C. Class A infectious diseases refer to plague and cholera. Class B infectious diseases include infectious atypical pneumonia, coronavirus infection, AIDS, viral hepatitis, and the like. Class C infectious diseases include influenza, mumps, rubella, and the like. For infectious atypical pneumonia among Class B infectious diseases, pulmonary anthrax among anthrax cases, and highly pathogenic avian influenza in humans, the prevention and control measures for Class A infectious diseases are adopted.
The logic rule network model training method provided by the embodiment of the application is realized based on artificial intelligence, wherein the artificial intelligence (Artificial Intelligence, AI) is a theory, a method, a technology and an application system which simulate, extend and expand human intelligence by using a digital computer or a machine controlled by the digital computer, sense environment, acquire knowledge and acquire an optimal result by using the knowledge. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
In the embodiments of the present application, the artificial intelligence software technologies mainly involved include speech processing and machine learning. For example, Automatic Speech Recognition (ASR) in speech technology may be involved, including speech signal preprocessing, speech signal frequency-domain analysis, speech signal feature extraction, speech signal feature matching/recognition, speech training, and the like.
For example, machine Learning (ML) may be involved, which is a multi-domain interdisciplinary, involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine Learning typically includes Deep Learning (Deep Learning) techniques, including artificial neural networks (artificial neural network), such as convolutional neural networks (Convolutional Neural Network, CNN), recurrent neural networks (Recurrent Neural Network, RNN), deep neural networks (Deep neural network, DNN), and the like.
It can be appreciated that the logic rule network model training method and speech processing provided by the present application can be applied to an intelligent device, which may be any device with an information display function, for example an intelligent terminal, an intelligent home device (such as a smart speaker or smart washing machine), an intelligent wearable device (such as a smart watch), a vehicle-mounted intelligent central control system (which displays media information to the user through applets that execute different tasks), or an AI medical device (which displays treatment cases through media information).
Referring to fig. 2, fig. 2 is an optional flowchart of a logic rule network model training method according to an embodiment of the present application. It can be understood that the steps shown in fig. 2 may be performed by various electronic devices running the logic rule network model training apparatus, for example a dedicated server carrying the logic rule network model training apparatus, or a cloud service cluster. The steps shown in fig. 2 are described below.
Step 201: the logic rule network model training device performs characterization processing on the original data to obtain training feature vectors.
The logic rule network model provided by the present application is suitable for various classification tasks whose input is a feature vector, such as disease prediction, department prediction, emotion classification, identity information classification, and bank loan approval. Because the classification tasks differ, the types of raw data differ as well; the raw data is subjected to characterization processing to obtain discrete training feature vectors and/or continuous training feature vectors. For example, in a medical classification task, the raw body data of the target object (patient) acquired by each medical image acquisition device may be continuous training feature vectors, while the disease type information is a discrete training feature vector.
Step 202: the logic rule network model training means determines a disjunctive normal form constraint loss function of the logic rule network model.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a logic rule network model according to an embodiment of the present invention, where the logic rule network model includes: a binarization layer, a conjunction logic layer, a disjunction logic layer, and a linear conversion layer. A logic expression of any form can be generated through the network architecture shown in fig. 3, and the logic expression generated after training of the logic rule network model is completed uses explicit logic connectives, which improves readability.
For example, the disjunctive normal form (Disjunctive Normal Form, DNF) output by the logic rule network model shown in fig. 3 is a standardized form of logic expression: a disjunction of conjunctive clauses, for example (A∧B)∨C, (A∧B)∨(C∧D), and the like. The disjunctive normal form is friendly for human understanding. Taking clinical diagnosis as an example, a physician's reasoning process and medical knowledge can generally be summarized as disjunctive normal forms: a patient with "fever and diarrhea" or "fever and vomiting" may have infectious diarrhea; a patient with "fever and hand-foot herpes and age less than 5 years" may have hand-foot-and-mouth disease; a patient with "fever and hyposmia" or "fever and hypogeusia" or "fever and ground-glass opacity of the lung" may have viral pneumonia. This requires the trained logic rule network model to have not only the ability to express disjunctive normal form constraints, but also the ability to absorb the prior experience of known rules. Therefore, the fusion loss function of the logic rule network model is composed of three parts: the disjunctive normal form constraint loss function, the network sparsity constraint loss function, and the known rule constraint loss function, which are described separately below.
In some embodiments of the present invention, when determining the disjunctive normal form constraint loss function of the logic rule network model, the training feature vector is first segmented to obtain a discrete training feature vector and a continuous training feature vector. Next, the continuous training feature vector is binarized through the binarization layer of the logic rule network model and concatenated with the discrete training feature vector to obtain a binarized training feature vector; the binarized training feature vector is processed through the conjunction logic layer using the AND connector to obtain a conjunction training feature vector; the conjunction training feature vector is processed through the disjunction logic layer using the OR connector to obtain a disjunction training feature vector; and the disjunction training feature vector is converted through the linear conversion layer to obtain a classification vector. Finally, the disjunctive normal form constraint loss function of the logic rule network model is calculated from the number of classes of the classification vector, the number of training feature vectors, and the classification vector. As shown in fig. 3, let x denote the features of an input sample, split into discrete features x_D in one-hot form and continuous features x_C. The continuous features are binarized by the binarization layer (Binarization Layer); for example, each continuous feature can be divided into 10 buckets by deciles and then converted into a one-hot vector. x_C, converted to a one-hot vector, is concatenated (Concat) with x_D to obtain a binarized training feature vector b ∈ {0,1}^n of length n.
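The binarization step described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the patent's implementation: the bucket boundaries, helper names, and two-feature example are all illustrative, and a real model would compute decile boundaries from the training data.

```python
# Toy sketch of the binarization layer: bucketize a continuous feature and
# one-hot encode it, then concatenate with an already-discrete one-hot part.
# Boundaries and helper names are illustrative assumptions.

def binarize_continuous(x, boundaries):
    """One-hot encode a scalar by the bucket its value falls into."""
    one_hot = [0] * (len(boundaries) + 1)
    idx = sum(1 for bound in boundaries if x >= bound)  # bucket index
    one_hot[idx] = 1
    return one_hot

def build_binary_vector(x_continuous, boundaries, x_discrete_one_hot):
    """Concatenate the binarized continuous part with the discrete part."""
    return binarize_continuous(x_continuous, boundaries) + x_discrete_one_hot

# Example: body temperature 38.1 with four illustrative bucket boundaries,
# concatenated with a two-element discrete one-hot feature.
b = build_binary_vector(38.1, [36.0, 37.0, 38.0, 39.0], [0, 1])
```

The resulting b is the binarized training feature vector b ∈ {0,1}^n fed to the conjunction logic layer.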
b first passes through the conjunction logic layer (Conjunction Layer), which connects selected Boolean variables b_j using the AND connector (∧) to obtain a conjunction training feature vector h_c of length m. The i-th element of h_c is given by formula 1:

h_c^(i) = ∧_{j: W^c_{ij} = 1} b_j   (formula 1)

where h_c is the conjunction training feature vector, n is the length of the binarized training feature vector, W^c_{ij} indicates whether the j-th Boolean variable participates in the i-th conjunctive clause, and b_j is a Boolean variable of the binarized training feature vector.
Each element h_c^(i) of the vector corresponds to a conjunctive clause. h_c then passes through the disjunction logic layer (Disjunction Layer), which connects the conjunctive clauses using the OR connector (∨) to obtain a disjunction training feature vector h_d of length t. The i-th element of h_d is given by formula 2:

h_d^(i) = ∨_{j: W^d_{ij} = 1} h_c^(j)   (formula 2)

where h_c is the conjunction training feature vector, m is its length, and W^d_{ij} indicates whether the j-th conjunctive clause participates in the i-th disjunctive normal form.
Each element h_d^(i) of the vector corresponds to a disjunctive normal form. h_d passes through a linear layer (Linear Layer) to obtain a classification vector (logits) z of length K, where z is calculated by formula 3:

z = W_l h_d + b_l   (formula 3)

where W^l_{kj} expresses the importance of the j-th disjunctive normal form to the k-th class, z is the classification vector, h_d is the disjunction training feature vector, and b_l is the bias of the linear layer.
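The three layers of formulas 1 to 3 can be sketched end to end. As a hedge: the sketch below uses hard 0/1 weight masks for readability, whereas the trained model uses continuous learnable weights; all matrices and sizes are illustrative.

```python
# Toy forward pass through the conjunction, disjunction, and linear layers
# (formulas 1-3), with 0/1 masks standing in for learned weights.

def conjunction_layer(b, W_c):
    # h_c[i] = AND of all b[j] selected by row i of W_c (formula 1)
    return [int(all(b[j] == 1 for j in range(len(b)) if W_c[i][j] == 1))
            for i in range(len(W_c))]

def disjunction_layer(h_c, W_d):
    # h_d[i] = OR of all h_c[j] selected by row i of W_d (formula 2)
    return [int(any(h_c[j] == 1 for j in range(len(h_c)) if W_d[i][j] == 1))
            for i in range(len(W_d))]

def linear_layer(h_d, W_l, b_l):
    # z = W_l @ h_d + b_l (formula 3)
    return [sum(W_l[k][j] * h_d[j] for j in range(len(h_d))) + b_l[k]
            for k in range(len(W_l))]

b = [1, 1, 0, 1]                      # binarized input features
W_c = [[1, 1, 0, 0], [0, 0, 1, 1]]    # two conjunctive clauses
W_d = [[1, 1]]                        # one disjunction over both clauses
z = linear_layer(disjunction_layer(conjunction_layer(b, W_c), W_d),
                 [[2.0], [-1.0]], [0.5, 0.5])
```

Here the first clause (b_0 ∧ b_1) fires, the second does not, so the single disjunctive normal form evaluates to 1 and the logits follow from the linear weights.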
In some embodiments of the present invention, the classification tasks of the logic rule network model include single-label multi-class tasks, such as emotion classification, and multi-label multi-class tasks, such as disease type prediction and department prediction. Therefore, the calculation of the disjunctive normal form constraint loss function of the logic rule network model covers two cases:
1) When the logic rule network model is used for a single-label multi-class task, the normalized exponential (softmax) of the classification vector is calculated, and the cross entropy is calculated as the disjunctive normal form constraint loss function of the logic rule network model from the number of classes, the number of training feature vectors, and the softmax classification vector. Specifically, referring to formula 4, softmax and cross entropy are employed as the disjunctive normal form constraint loss function:

L_cls = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} y^(i)_k log softmax(z^(i))_k   (formula 4)

where L_cls is the disjunctive normal form constraint loss function, N is the number of training samples, K is the number of classes, y^(i)_k is the label of sample i for class k, and softmax() is the normalized exponential function.
2) When the logic rule network model is used for a multi-label multi-class task, the logistic regression (sigmoid) of the classification vector is calculated, and the binary cross entropy is calculated as the disjunctive normal form constraint loss function from the number of classes, the number of training feature vectors, and the sigmoid classification vector. Specifically, referring to formula 5, sigmoid and binary cross entropy are employed as the disjunctive normal form constraint loss function:

L_cls = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} [ y^(i)_k log sigmoid(z^(i)_k) + (1 − y^(i)_k) log(1 − sigmoid(z^(i)_k)) ]   (formula 5)

where L_cls is the disjunctive normal form constraint loss function, N is the number of training samples, K is the number of classes, and sigmoid() is the logistic regression function.
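The two loss choices of formulas 4 and 5 can be sketched without a deep-learning framework. This is a minimal pure-Python illustration of the standard softmax cross entropy and sigmoid binary cross entropy; function names are illustrative.

```python
# Sketch of the single-label (formula 4) and multi-label (formula 5) losses.
import math

def softmax_cross_entropy(z, y):
    """z: logits for one sample; y: true class index (formula 4)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]          # numerically stable softmax
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[y])

def sigmoid_bce(z, y):
    """z: logits; y: 0/1 label vector of the same length (formula 5)."""
    total = 0.0
    for zk, yk in zip(z, y):
        p = 1.0 / (1.0 + math.exp(-zk))
        total += -(yk * math.log(p) + (1 - yk) * math.log(1 - p))
    return total / len(z)
```

With uniform logits both losses reduce to log 2 per class, and the loss decreases as the logit of the true class grows, as expected.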
The calculation of the disjunctive normal form constraint loss function is now complete. Then, the network model parameters W_C of the conjunction logic layer, W_d of the disjunction logic layer, and W_l of the linear conversion layer are obtained. From the non-zero elements of W_C and W_d, the disjunctive normal forms corresponding to the logic rule network model can be obtained, and from W_l the weight of each disjunctive normal form for the various classes can be extracted accordingly.
Step 203: the logic rule network model training means determines a network sparsity constraint loss function of the logic rule network model.
In some embodiments of the present invention, the network sparsity constraint loss function is obtained by adding a first constraint loss function, a second constraint loss function, and a third constraint loss function. Specifically, the network model parameters of the linear conversion layer are regularized to obtain the first constraint loss function; the length of the conjunction training feature vector and/or the length of the disjunction training feature vector are determined, and the network model parameters of the conjunction logic layer and of the disjunction logic layer are regularized according to these lengths to obtain the second constraint loss function; a sparsity constraint is applied to the disjunction training feature vector according to the number of training feature vectors to obtain the third constraint loss function; and the network sparsity constraint loss function is determined as the sum of the first, second, and third constraint loss functions.
The configuration principle of the first constraint loss function is: the total number of disjunctive normal forms actually used should be as small as possible. Referring to formula 6, Group Lasso is used to constrain W_l:

L_r1 = Σ_{j=1}^{t} ‖W_l[:, j]‖_2 = Σ_{j=1}^{t} sqrt( Σ_{k=1}^{K} (W^l_{kj})^2 )   (formula 6)

where L_r1 is the first constraint loss function, W^l_{kj} represents the importance of the j-th disjunctive normal form to the k-th class, and t is the total number of disjunctive normal forms.
It is desirable for some columns of W_l to be zero simultaneously, i.e., for some disjunctive normal form rules to never be used. Group Lasso regularization is therefore used, which drives a large number of parameters in the linear conversion layer to effectively zero and achieves a high degree of sparsification.
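The column-wise Group Lasso of formula 6 can be sketched directly: each column of W_l (one rule's weights across all classes) forms a group, and the loss sums the L2 norms of the columns, so whole columns, i.e. whole unused rules, are pushed to zero together. The matrix shape below is an illustrative assumption.

```python
# Sketch of the Group Lasso constraint (formula 6) over the columns of W_l.
import math

def group_lasso_columns(W_l):
    """Sum of L2 norms of the columns of W_l (K classes x t rules)."""
    n_rows, n_cols = len(W_l), len(W_l[0])
    return sum(
        math.sqrt(sum(W_l[k][j] ** 2 for k in range(n_rows)))
        for j in range(n_cols)
    )

# Second column is already zero, so only the first column contributes.
loss = group_lasso_columns([[3.0, 0.0], [4.0, 0.0]])  # column norms: 5, 0
```

Unlike an element-wise L1 penalty, the column-grouped norm gives no incentive to keep a column "almost zero": once a rule's weights shrink jointly, the whole column can vanish.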
The configuration principle of the second constraint loss function is: increase the sparsity of W_C and W_d, referring to formula 7:

L_r2 = Σ_{i=1}^{m} Σ_{j=1}^{n} |W^c_{ij}| + Σ_{i=1}^{t} Σ_{j=1}^{m} |W^d_{ij}|   (formula 7)

where L_r2 is the second constraint loss function, W^c_{ij} and W^d_{ij} represent the importance of the j-th input to the i-th clause or normal form, t is the total number of disjunctive normal forms, m is the length of the conjunction training feature vector, and n is the length of the binarized training feature vector.
The configuration principle of the third constraint loss function is: apply a sparsity constraint to the h_d generated for all training samples, referring to formula 8:

L_r3 = (1/n) Σ_{i=1}^{n} max(0, Σ_j h_d^(i)_j − 1)   (formula 8)

where L_r3 is the third constraint loss function, n is the number of training samples, and h_d^(i) is the disjunction training feature vector of the i-th sample.
For each sample, a penalty is applied when the number of hit disjunctive normal forms is greater than 1, reducing the number of disjunctive normal forms hit per sample.
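The per-sample penalty just described can be sketched as a hinge on the number of hit rules. The exact functional form in formula 8 is not fully legible in the source, so the hinge max(0, hits − 1) below is an assumption consistent with "penalize when the number of hits is greater than 1".

```python
# Sketch of the third sparsity constraint: average hinge penalty on the
# number of disjunctive normal forms hit per sample (assumed form).

def excess_hit_penalty(h_d_batch):
    """h_d_batch: list of 0/1 disjunction vectors, one per sample."""
    n = len(h_d_batch)
    return sum(max(0.0, sum(h_d) - 1.0) for h_d in h_d_batch) / n

# Three samples hitting 0, 1, and 3 rules respectively: only the third
# sample is penalized (by 3 - 1 = 2), averaged over the batch.
penalty = excess_hit_penalty([[0, 0, 0], [1, 0, 0], [1, 1, 1]])
```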
Step 204: the logic rule network model training means determines a known rule constraint loss function of the logic rule network model.
Referring to fig. 4, fig. 4 is a schematic diagram of a known rule constraint loss function calculation process of a logic rule network model, specifically including the following steps:
Step 401: and acquiring a preset logic rule corresponding to the logic rule network model.
The essence of machine learning is to use training data to fit a mapping from input (features) to output (labels), and in the fitting process to mine the intrinsic connection between the features and the labels. However, in most scenarios, due to insufficient data volume, biased data, or intrinsic rules behind the data that are too complex or obscure, it is difficult for a machine-learned logic rule network model to learn all the rules directly from the data.
For example, for a simple pendulum system whose task is to predict the system state (position, velocity) at the next moment, it is difficult for the logic rule network model to learn from the data that the total energy of the system gradually diminishes. More concretely, for intelligent consultation with preset logic rules, in an xx pneumonia prediction task, if most patients in a certain area do not exhibit hyposmia and hypogeusia, the logic rule network model cannot learn the association between these two symptoms and xx pneumonia from the data alone.
Step 402: decomposing the network model parameters of the conjunctive logic layer to obtain network model parameters of the updatable conjunctive logic layer and network model parameters of the non-updatable conjunctive logic layer; and decomposing the network model parameters of the disjunctive logic layer to obtain network model parameters of the updatable disjunctive logic layer and network model parameters of the non-updatable disjunctive logic layer.
In step 402, when decomposing the network model parameters of the conjunction logic layer and of the disjunction logic layer, the parameters W_c of the conjunction logic layer are first decomposed into an updatable part W_c^u and a non-updatable part W_c^f; similarly, the parameters W_d of the disjunction logic layer are decomposed into an updatable part W_d^u and a non-updatable part W_d^f. Because it may be unknown whether a given logic rule acts on each class (it may be either unknown or known, so that various usage scenarios are accommodated), the non-updatable parameters of the conjunction logic layer and the non-updatable parameters of the disjunction logic layer remain unchanged during training and do not participate in the parameter updates of the logic rule network model.
Step 403: expanding the network model parameters of the updatable conjunctive logic layer according to a preset logic rule to obtain the network model parameters of the expanded updatable conjunctive logic layer, and expanding the network model parameters of the updatable disjunctive logic layer according to the preset logic rule to obtain the network model parameters of the expanded updatable disjunctive logic layer.
In step 403, when the network model parameters of the updatable conjunction logic layer and of the updatable disjunction logic layer are extended, the experience carried by a preset logic rule is injected into the parameters of the conjunction and disjunction logic layers. For example, referring to fig. 5, fig. 5 is a schematic diagram of network model parameter expansion under known rule constraints in an embodiment of the present invention. The two preset logic rules to be introduced are:
q 1 =(A∧B)∨(C∧D)
q 2 =(A∧E)
First, for each binarized feature item A, B, C, D, E: if some features are not currently present in b, then b needs to be extended (increasing the length of b by adding elements). Next, for each conjunctive clause, namely (A∧B), (C∧D), and (A∧E), a row of the non-updatable part of the conjunction layer parameters is filled with 0s and 1s, so that each row expresses one conjunctive clause. Finally, for each logic rule q_i to be introduced, a row of the non-updatable part of the disjunction layer parameters is filled with 0s and 1s, so that each row expresses one disjunctive normal form. In this way, the logic rules are explicitly preset in the logic rule network model.
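The row-filling just described can be sketched for the two preset rules q1 = (A∧B)∨(C∧D) and q2 = (A∧E). The feature order [A, B, C, D, E], the row layout, and the helper names are illustrative assumptions; the frozen blocks below correspond to the non-updatable parts of the conjunction and disjunction layer parameters.

```python
# Sketch of steps 402-403: building the frozen (non-updatable) 0/1 blocks of
# the conjunction and disjunction layer parameters from preset rules.

FEATURES = ["A", "B", "C", "D", "E"]

def clause_row(clause):
    """0/1 row of the frozen conjunction block for one conjunctive clause."""
    return [1 if f in clause else 0 for f in FEATURES]

def rule_row(rule_clauses, all_clauses):
    """0/1 row of the frozen disjunction block for one preset rule."""
    return [1 if c in rule_clauses else 0 for c in all_clauses]

clauses = [("A", "B"), ("C", "D"), ("A", "E")]
W_c_frozen = [clause_row(c) for c in clauses]
W_d_frozen = [rule_row([("A", "B"), ("C", "D")], clauses),  # q1
              rule_row([("A", "E")], clauses)]              # q2
```

During training these frozen rows stay fixed while only the updatable blocks receive gradient updates, so the preset rules cannot be unlearned.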
Step 404: and calculating a pseudo-binarization training feature vector corresponding to the binarization training feature vector according to the network model parameters of the extensible and updatable conjunctive logic layer and the network model parameters of the extensible and updatable disjunctive logic layer.
When the effect of a preset logic rule on each class is unknown, for example, when it is not known whether obesity is a high-risk factor for xx pneumonia, or whether patients of different identities or different () families are concentrated in distribution, the model is trained directly after the rules are preset, and W_l is then used to judge whether obesity has a positive or negative effect on xx pneumonia, or whether the xx pneumonia class has a positive or negative effect for the A identity or the A () family.
When the effect of a preset logic rule on a certain class is known, for example, when it is known that a patient with (fever ∧ ground-glass opacity of the lung) ∨ hyposmia has a high probability of suffering from xx pneumonia, then after the preset rules are embedded in the model, pseudo samples are continuously generated during training, and the corresponding loss function L_r4 constrains the update direction of the model.
Step 405: and keeping the network model parameters of the non-updatable conjunctive logic layer and the network model parameters of the non-updatable disjunctive logic layer unchanged, and determining a known rule constraint loss function according to the number of training feature vectors, the binarization training feature vectors and the pseudo-binarization training feature vectors.
Following the processing in step 404, let q denote a preset disjunctive normal form, for example q = (A∧B)∨(C∧D), and suppose q = True has a positive effect on class k (when q = True, the input sample has a higher probability of belonging to class k). For the i-th training sample x^(i) with binarized training feature vector b^(i), randomly select any conjunctive clause p of q, say p = (A∧B), and manually set the elements of b^(i) at the positions of features A and B to 1, i.e., apply a perturbation to b^(i) to obtain the pseudo-binarized vector b'^(i). Since q = True has a positive effect on class k, the predicted probability ŷ'^(i)_k of class k obtained from input b'^(i) should be greater than or equal to the predicted probability ŷ^(i)_k obtained from input b^(i). Therefore, the known rule constraint loss function L_r4 can be calculated by formula 9:

L_r4 = (1/N) Σ_{i=1}^{N} max(0, ŷ^(i)_k − ŷ'^(i)_k)   (formula 9)

where L_r4 is the known rule constraint loss function, N is the number of training samples, ŷ'^(i)_k is the predicted probability of class k for b'^(i), and ŷ^(i)_k is the predicted probability of class k for b^(i).
Conversely, if q = True is known to have a negative effect on class k, the known rule constraint loss function is calculated by formula 10:

L_r4 = (1/N) Σ_{i=1}^{N} max(0, ŷ'^(i)_k − ŷ^(i)_k)   (formula 10)

where L_r4 is the known rule constraint loss function, N is the number of training samples, ŷ'^(i)_k is the predicted probability of class k for b'^(i), and ŷ^(i)_k is the predicted probability of class k for b^(i).
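The pseudo-sample perturbation and hinge losses of formulas 9 and 10 can be sketched together. As a hedge: `predict` below is a stand-in for the trained model's class-k probability, and the monotone toy predictor is purely illustrative.

```python
# Sketch of the known rule constraint (formulas 9-10): perturb the binarized
# sample so a chosen conjunctive clause holds, then penalize predictions that
# move against the known direction of the rule.

def perturb(b, clause_positions):
    """Force the features of one conjunctive clause to 1 (pseudo sample)."""
    b2 = list(b)
    for j in clause_positions:
        b2[j] = 1
    return b2

def rule_constraint_loss(samples, clause_positions, predict, positive=True):
    total = 0.0
    for b in samples:
        p, p2 = predict(b), predict(perturb(b, clause_positions))
        # positive rule: want p2 >= p (formula 9); negative: p2 <= p (formula 10)
        total += max(0.0, p - p2) if positive else max(0.0, p2 - p)
    return total / len(samples)

toy_predict = lambda b: sum(b) / len(b)   # monotone toy "model", assumption
loss = rule_constraint_loss([[0, 0, 1], [1, 0, 0]], [0, 1], toy_predict)
```

With the monotone toy predictor, forcing features on can only raise the prediction, so the positive-rule loss is zero while the negative-rule loss is positive, matching the intended asymmetry.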
Adding the known rule constraint loss function during training guides the logic layers to learn rules that the training data cannot cover, so that the learned model has better robustness and adaptability. At the same time, when the training data volume is insufficient, or the data is biased, or the intrinsic rules behind the data are too complex or obscure, the logic rule network model cannot learn all the rules directly from the data; the known rule constraint allows the model to absorb prior experience, improving the classification accuracy of the logic rule network model.
In some embodiments of the present invention, when the preset logic rule changes, the disjunctive normal form constraint loss function and the network sparsity constraint loss function can be kept unchanged, and the known rule constraint loss function of the logic rule network model is dynamically updated. In this way, due to the change of priori knowledge, the known rule constraint loss function can be adjusted in time, so that the logic rule network model can absorb the existing rule information in time, and the classification result of the logic rule network model can be more accurate
After the disjunctive normal form constraint loss function, the network sparsity constraint loss function, and the known rule constraint loss function are obtained through steps 202 to 204, the fusion loss function of the logic rule network model is calculated through step 205.
Step 205: the logic rule network model training device determines a fusion loss function of the logic rule network model according to the disjunctive normal form constraint loss function, the network sparsity constraint loss function and the known rule constraint loss function.
The fusion loss function L of the logic rule network model is calculated with reference to formula 11:

L = L_cls + α_1 L_r1 + α_2 L_r2 + α_3 L_r3 + α_4 L_r4   (formula 11)

where L is the fusion loss function, L_cls is the disjunctive normal form constraint loss function, L_r1, L_r2, and L_r3 are the network sparsity constraint loss functions, L_r4 is the known rule constraint loss function, and each weight satisfies α_s ≥ 0. When there is no preset rule, or the effect of the preset logic rules on each class is unknown, α_4 = 0.
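Formula 11 is a plain weighted sum and can be sketched in a few lines. The alpha values below are illustrative assumptions, not tuned hyperparameters.

```python
# Sketch of the fusion loss (formula 11): classification loss plus the three
# weighted constraint terms, with alpha_4 = 0 when no preset rule is known.

def fused_loss(l_cls, l_r1, l_r2, l_r3, l_r4, alphas):
    a1, a2, a3, a4 = alphas
    assert min(alphas) >= 0, "each alpha_s must be non-negative"
    return l_cls + a1 * l_r1 + a2 * l_r2 + a3 * l_r3 + a4 * l_r4

# No preset rules: alpha_4 = 0, so l_r4 does not contribute at all.
total = fused_loss(1.0, 0.5, 0.2, 0.1, 9.9, (0.1, 0.1, 0.1, 0.0))
```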
Step 206: the logic rule network model training device trains the logic rule network model according to the training feature vector and the fusion loss function, and determines network model parameters of the logic rule network model.
So far, through the training of steps 201 to 206, the network model parameters of the logic rule network model have been determined, and the trained logic rule network model may be deployed in a server to classify the received data, and the trained logic rule network model is used for: classifying data to be classified, and determining logic rules corresponding to the data to be classified; fig. 6 is a schematic diagram of a processing procedure of a data classification method according to an embodiment of the present invention, which specifically includes the following steps:
step 601: and acquiring data to be classified, and splicing the data to be classified through a binarization layer of the logic rule network model to obtain a binarization classification feature vector.
Taking an intelligent consultation system as an example, the trained logic rule network model can determine whether the target object is infected with the type A coronavirus according to the medical data to be classified. Fig. 7 is a schematic diagram of the processing logic of the data classification method in an embodiment of the present invention, where the logic rule network model can be trained through steps 201 to 206 of the previous embodiment. Suppose the diagnosis early-warning threshold for the type A coronavirus is 0.3. When the classification result of the data to be classified is greater than or equal to 0.3, the target object may exhibit symptoms such as reduced muscle strength, sensory symptoms, aphasia, blurred vision, dizziness, headache, nausea, vomiting, cognitive dysfunction, and impaired consciousness, and may be infected with the type A coronavirus. Diagnosis alarm information therefore needs to be issued in time and the target object isolated for treatment. At the same time, the logic rule corresponding to the data to be classified, i.e., the logic rule for infection with the type A coronavirus, can be output: "fever and hyposmia" or "fever and hypogeusia" or "fever and ground-glass opacity of the lung".
In step 601, the binarization layer of the logic rule network model splices the data to be classified to obtain a binarized classification feature vector. Before splicing, the data to be classified may be subjected to characterization processing to obtain continuous feature vectors and discrete feature vectors, which may include: a continuous feature vector representing the body temperature of the target object, a continuous feature vector representing the lung CT image of the target object, and a discrete feature vector representing the target object's self-described olfactory change.
Step 602: the binarized classification feature vector is connected through the conjunction logic layer of the logic rule network model to obtain a conjunction classification feature vector.
The connection processing of the conjunction logic layer yields conjunction classification feature vectors, which may be, for example: "hyposmia" ∧ "hypogeusia"; "fever" ∧ "diarrhea".
Step 603: the conjunction classification feature vector is connected through the disjunction logic layer of the logic rule network model to obtain a disjunction classification feature vector.
The connection processing result of the disjunction logic layer on the conjunction classification feature vectors may be: ("hyposmia" ∧ "hypogeusia") ∨ ("fever" ∧ "diarrhea") ∨ ("body temperature 38.1°" ∨ "39.1°" ∨ "40.2°").
Step 604: the disjunction classification feature vector is converted through the linear conversion layer of the logic rule network model to obtain the classification probability of the data to be classified and the logic rule corresponding to the data to be classified.
Through the processing of steps 601 to 604, the probability that the target object is infected with the type A coronavirus is determined by the logic rule network model to be 0.6, and at the same time the logic rule for infection with the type A coronavirus is output: "fever and hyposmia" or "fever and hypogeusia" or "sustained high body temperature".
In some embodiments of the present application, with the progress of medical technology and medical diagnosis experience, when a preset logic rule of an infection type a coronavirus changes, a disjunctive normal form constraint loss function and a network sparsity constraint loss function can be kept unchanged, and a known rule constraint loss function of a logic rule network model is dynamically updated, so that the logic rule network model can output a new logic rule of an infection type a coronavirus, and the accuracy of prediction of the infection type a coronavirus is improved.
In order to implement the logic rule network model training method provided by the present application, the present application further provides a corresponding hardware device, and the following details the structure of the logic rule network model training device according to the embodiment of the present application, where the logic rule network model training device may be implemented in various forms, such as a dedicated terminal with a logic rule network model training processing function, or may be a server provided with a logic rule network model training device processing function, for example, a server 200 in fig. 1. Fig. 8 is a schematic diagram of a component structure of a logic rule network model training device according to an embodiment of the present application, and it can be understood that fig. 8 only shows an exemplary structure of the logic rule network model training device, but not all the structure, and part or all of the structures shown in fig. 8 can be implemented as required.
The logic rule network model training device provided by the embodiment of the invention comprises: at least one processor 201, a memory 202, a user interface 203, and at least one network interface 204. The various components in the logic rule network model training apparatus are coupled together by a bus system 205. It is understood that the bus system 205 is used to enable connected communications between these components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus system 205 in fig. 8.
The user interface 203 may include, among other things, a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, or touch screen, etc.
It will be appreciated that the memory 202 may be either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operation on the terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application may comprise various applications.
In some embodiments, the logic rule network model training apparatus provided by the embodiments of the present invention may be implemented by combining software and hardware, and as an example, the logic rule network model training apparatus provided by the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the logic rule network model training method provided by the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more application specific integrated circuits (ASICs, application Specific Integrated Circuit), DSPs, programmable logic devices (PLDs, programmable Logic Device), complex programmable logic devices (CPLDs, complex Programmable Logic Device), field programmable gate arrays (FPGAs, field-Programmable Gate Array), or other electronic components.
As an example of implementing the logic rule network model training device by combining software and hardware, the device provided by the embodiment of the present invention may be embodied directly as a combination of software modules executed by the processor 201. The software modules may be located in a storage medium, the storage medium is located in the memory 202, and the processor 201 reads the executable instructions included in the software modules in the memory 202 and, in combination with the necessary hardware (including, for example, the processor 201 and other components connected to the bus 205), completes the logic rule network model training method provided by the embodiment of the present invention.
By way of example, the processor 201 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
As an example of implementing the logic rule network model training device purely in hardware, the device provided by the embodiment of the present invention may be implemented directly by the processor 201 in the form of a hardware decoding processor, for example by one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components, to carry out the logic rule network model training method provided by the embodiment of the present invention.
The memory 202 in embodiments of the present invention is used to store various types of data to support the operation of the logic rule network model training device. Examples of such data include any executable instructions for operation on the logic rule network model training device; a program implementing the logic rule network model training method of embodiments of the present invention may be included in the executable instructions.
In other embodiments, the logic rule network model training device provided in the embodiments of the present invention may be implemented in software. Fig. 8 shows the logic rule network model training device stored in the memory 202, which may be software in the form of a program, a plug-in, or a series of modules. As an example of the program stored in the memory 202, the logic rule network model training device includes the following software modules:
an information transmission module 2081 and an information processing module 2082. When the software modules in the logic rule network model training device are read by the processor 201 into RAM and executed, the logic rule network model training method provided by the embodiment of the present invention is implemented. The functions of each software module in the logic rule network model training device include:
the information transmission module 2081 is configured to perform characterization processing on the original data to obtain a training feature vector;
an information processing module 2082 for determining a disjunctive normal form constraint loss function of the logical rule network model;
an information processing module 2082 for determining a network sparsity constraint loss function of the logical rule network model;
an information processing module 2082 for determining a known rule constraint loss function of the logic rule network model;
the information processing module 2082 is configured to determine a fusion loss function of the logic rule network model according to the disjunctive normal form constraint loss function, the network sparsity constraint loss function, and the known rule constraint loss function;
the information processing module 2082 is configured to train the logic rule network model according to the training feature vector and the fusion loss function, and determine network model parameters of the logic rule network model.
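As an illustration of how the information processing module 2082 may combine the three constraint losses, the fusion loss function can be sketched as a weighted sum. The weighting coefficients below are illustrative assumptions, not values fixed by the embodiment:

```python
def fusion_loss(dnf_loss, sparsity_loss, rule_loss, w_sparsity=0.1, w_rule=1.0):
    """Fuse the disjunctive normal form constraint loss, the network sparsity
    constraint loss, and the known rule constraint loss into one training
    objective. The weights w_sparsity and w_rule are illustrative
    hyperparameters; the embodiment does not specify their values."""
    return dnf_loss + w_sparsity * sparsity_loss + w_rule * rule_loss
```

Training then minimizes this single scalar with respect to the network model parameters.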
In connection with the electronic device shown in Fig. 8, in one aspect the application also provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of the electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the various embodiments and combinations of embodiments provided in the alternative implementations of the logic rule network model training method or the data classification method described above.
The beneficial technical effects are as follows:
First, in the training method of the logic rule network model provided by the invention, the fusion loss function of the logic rule network model comprises a disjunctive normal form constraint loss function, a network sparsity constraint loss function, and a known rule constraint loss function. As a result, the logic symbols in the logic expressions generated by the trained logic rule network model are clear, which improves the readability of the logic expressions and reduces the difficulty of understanding the results generated by the model. Meanwhile, a known rule constraint loss function is introduced in the training process: when the amount of training data is insufficient, the data is biased, or the intrinsic rules behind the data are too complex or obscure for the model to learn all rules directly from the data, the known rule constraint enables the logic rule network model to absorb prior experience, improving the classification accuracy of the model.
Second, when prior knowledge changes, the known rule constraint loss function can be adjusted promptly so that the logic rule network model absorbs the existing rule information in time, making its classification results more accurate.
The above embodiments are merely examples of the present invention, and are not intended to limit the scope of the present invention, so any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (13)

1. A method for training a logic rule network model, the method comprising:
carrying out characterization processing on the original data to obtain training feature vectors, wherein the original data comprises: original body data and disease manifestation information of the target object;
determining a disjunctive normal form constraint loss function of the logic rule network model;
determining a network sparsity constraint loss function of the logic rule network model;
determining a known rule constraint loss function of the logical rule network model;
determining a fusion loss function of the logic rule network model according to the disjunctive normal form constraint loss function, the network sparsity constraint loss function and the known rule constraint loss function;
training the logic rule network model according to the training feature vector and the fusion loss function, and determining network model parameters of the logic rule network model, wherein the trained logic rule network model is used for: classifying the data to be classified, and determining a logic rule corresponding to the data to be classified.
2. The method of claim 1, wherein determining the disjunctive normal form constraint loss function of the logically regular network model comprises:
segmenting the training feature vector to obtain a discrete training feature vector and a continuous training feature vector;
performing binarization processing on the continuous training feature vector through a binarization layer of the logic rule network model, and splicing the continuous training feature vector with the discrete training feature vector to obtain a binarization training feature vector, wherein the logic rule network model comprises: a binarization layer, a conjunctive logic layer, a disjunctive logic layer and a linear conversion layer;
performing connection processing on the binarization training feature vector through the conjunctive logic layer using a conjunction connective, to obtain a connection training feature vector;
performing connection processing on the connection training feature vector through the disjunctive logic layer using a disjunction connective, to obtain an or connection training feature vector;
converting the or connection training feature vector through the linear conversion layer to obtain a classification vector;
and calculating the disjunctive normal form constraint loss function of the logic rule network model according to the category number of the classification vectors, the number of the training feature vectors and the classification vectors.
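The layer sequence recited in claim 2 (binarization, conjunctive logic, disjunctive logic, linear conversion) admits a minimal soft-logic sketch. The product-based soft AND/OR and the weight shapes below are assumptions for illustration only, not the patented implementation:

```python
import numpy as np

def forward(x_bin, w_and, w_or, w_lin):
    """Sketch of the DNF-structured forward pass.

    x_bin : (batch, n_literals) binarized training feature vectors in {0, 1}
    w_and : (n_literals, n_conj) soft membership of literals in conjunctions
    w_or  : (n_conj, n_disj)    soft membership of conjunctions in disjunctions
    w_lin : (n_disj, n_classes) linear conversion to classification vectors
    """
    # Soft AND: a conjunction is 1 only if every selected literal is 1.
    conj = np.prod(1.0 - w_and.T[None, :, :] * (1.0 - x_bin[:, None, :]), axis=2)
    # Soft OR: a disjunction is 1 if any selected conjunction is 1.
    disj = 1.0 - np.prod(1.0 - w_or.T[None, :, :] * conj[:, None, :], axis=2)
    # Linear conversion layer producing the classification vector.
    return disj @ w_lin
```

With hard {0, 1} weights this reduces to an exact disjunctive normal form, which is what makes the learned rules readable.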
3. The method of claim 2, wherein said calculating the disjunctive normal form constraint loss function of the logically regular network model from the number of categories of the classification vector, the number of training feature vectors, and the classification vector comprises:
when the logic rule network model is used for a single-label multi-classification task, calculating a normalized index classification vector corresponding to the classification vector;
and calculating cross entropy as a disjunctive normal form constraint loss function of the logic rule network model according to the category number of the classification vectors, the number of the training feature vectors and the normalized index classification vectors.
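A minimal sketch of claim 3: a normalized exponential (softmax) classification vector followed by cross entropy, averaged over the number of training feature vectors. The numerically stabilized softmax is an implementation choice:

```python
import numpy as np

def dnf_loss_single_label(scores, labels):
    """Cross entropy over softmax-normalized classification vectors
    for a single-label multi-classification task."""
    # Subtract the row maximum for numerical stability before exponentiating.
    z = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = scores.shape[0]  # number of training feature vectors
    # Average negative log-probability of the true class.
    return -np.mean(np.log(p[np.arange(n), labels]))
```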
4. The method of claim 2, wherein said calculating the disjunctive normal form constraint loss function of the logically regular network model from the number of categories of the classification vector, the number of training feature vectors, and the classification vector comprises:
when the logic rule network model is used for multi-label multi-classification tasks, calculating a logistic regression classification vector corresponding to the classification vector;
and calculating binary cross entropy as a disjunctive normal form constraint loss function of the logic rule network model according to the category number of the classification vectors, the training feature vector number and the logistic regression classification vector.
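A minimal sketch of claim 4: per-class logistic (sigmoid) classification vectors followed by binary cross entropy. The epsilon clamp is an implementation detail, not part of the claim:

```python
import numpy as np

def dnf_loss_multi_label(scores, targets):
    """Binary cross entropy over per-class sigmoid classification vectors
    for a multi-label multi-classification task."""
    p = 1.0 / (1.0 + np.exp(-scores))  # logistic regression classification vector
    eps = 1e-12                        # avoid log(0)
    return -np.mean(targets * np.log(p + eps)
                    + (1 - targets) * np.log(1 - p + eps))
```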
5. The method of claim 2, wherein said determining a network sparsity constraint loss function of the logically regular network model comprises:
acquiring network model parameters of a conjunctive logic layer, network model parameters of a disjunctive logic layer and network model parameters of a linear conversion layer in the logic rule network model;
regularization constraint is carried out on network model parameters of the linear conversion layer, so that a first constraint loss function is obtained;
determining the length of the connected training feature vector and the length of the or connected training feature vector;
regularizing the network model parameters of the conjunctive logic layer and the network model parameters of the disjunctive logic layer according to the length of the connection training feature vector and the length of the or connection training feature vector to obtain a second constraint loss function;
according to the number of the training feature vectors, sparsity constraint is carried out on the or connection training feature vectors, and a third constraint loss function is obtained;
and determining the network sparsity constraint loss function according to the sum of the first constraint loss function, the second constraint loss function and the third constraint loss function.
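The three constraint terms of claim 5 can be sketched as follows. The use of L1 norms and the exact scaling are assumptions, since the claim fixes only the overall structure (a linear-layer penalty, a logic-layer penalty scaled by the feature-vector lengths, and a batch-averaged activation penalty):

```python
import numpy as np

def sparsity_loss(w_and, w_or, w_lin, disj_out):
    """Sketch of the three-part network sparsity constraint."""
    # First term: regularization of the linear conversion layer parameters.
    l1 = np.abs(w_lin).sum()
    # Second term: logic-layer parameters scaled by the lengths of the
    # connection and or connection training feature vectors.
    len_conj, len_disj = w_and.shape[1], w_or.shape[1]
    l2 = np.abs(w_and).sum() / len_conj + np.abs(w_or).sum() / len_disj
    # Third term: or connection activations averaged over the number of
    # training feature vectors.
    l3 = disj_out.sum() / disj_out.shape[0]
    return l1 + l2 + l3
```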
6. The method of claim 2, wherein said determining a known rule constraint loss function of the logical rule network model comprises:
acquiring a preset logic rule corresponding to the logic rule network model;
decomposing the network model parameters of the conjunctive logic layer to obtain network model parameters of an updatable conjunctive logic layer and network model parameters of an un-updatable conjunctive logic layer;
decomposing the network model parameters of the disjunctive logic layer to obtain network model parameters of the updatable disjunctive logic layer and network model parameters of the non-updatable disjunctive logic layer;
expanding the network model parameters of the updatable conjunctive logic layer according to the preset logic rule to obtain the network model parameters of the expanded updatable conjunctive logic layer;
expanding the network model parameters of the updatable disjunctive logic layer according to the preset logic rules to obtain the network model parameters of the expanded updatable disjunctive logic layer;
calculating a pseudo-binarization training feature vector corresponding to the binarization training feature vector according to the network model parameters of the expanded updatable conjunctive logic layer and the network model parameters of the expanded updatable disjunctive logic layer;
keeping the network model parameters of the non-updatable conjunctive logic layer and the network model parameters of the non-updatable disjunctive logic layer unchanged, and determining the known rule constraint loss function according to the number of the training feature vectors, the binarization training feature vectors, and the pseudo-binarization training feature vectors.
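The known rule constraint of claim 6 compares the binarization training feature vectors with their pseudo-binarization counterparts reconstructed from the rule-expanded parameters. A squared-error sketch, averaged over the number of training feature vectors (the particular distance measure is an assumption):

```python
import numpy as np

def known_rule_loss(x_bin, x_pseudo):
    """Distance between the binarization training feature vectors and the
    pseudo-binarization training feature vectors, averaged over the
    number of training feature vectors."""
    n = x_bin.shape[0]
    return np.sum((x_bin - x_pseudo) ** 2) / n
```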
7. The method of claim 6, wherein the method further comprises:
and when the preset logic rule is changed, keeping the disjunctive normal form constraint loss function and the network sparsity constraint loss function unchanged, and dynamically updating the known rule constraint loss function of the logic rule network model.
8. A method of classifying data, the method comprising:
acquiring data to be classified;
splicing the data to be classified through a binarization layer of a logic rule network model to obtain a binarization classification feature vector;
connecting the binarization classification feature vector through a conjunctive logic layer of the logic rule network model to obtain a connection classification feature vector;
connecting the connection classification feature vectors through a disjunctive logic layer of the logic rule network model to obtain or connect the classification feature vectors;
converting the or connection classification feature vector through a linear conversion layer of the logic rule network model to obtain the classification probability of the data to be classified and the logic rule corresponding to the data to be classified;
wherein the logical rule network model is trained by the method of any one of claims 1 to 7.
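Once trained, the logic rule corresponding to a classification can be read out of the conjunctive and disjunctive layer weights. The hard threshold of 0.5 and the literal-naming scheme below are illustrative assumptions:

```python
import numpy as np

def extract_rules(w_and, w_or, literal_names, threshold=0.5):
    """Read a disjunctive normal form expression out of trained logic-layer
    weights by treating weights above `threshold` as hard memberships."""
    # Build one AND-clause per column of the conjunctive layer.
    conj_strs = []
    for j in range(w_and.shape[1]):
        lits = [literal_names[i] for i in range(w_and.shape[0])
                if w_and[i, j] > threshold]
        conj_strs.append(" AND ".join(lits) if lits else "TRUE")
    # OR together the conjunctions selected by each disjunctive-layer column.
    rules = []
    for k in range(w_or.shape[1]):
        parts = ["(" + conj_strs[j] + ")" for j in range(w_or.shape[0])
                 if w_or[j, k] > threshold]
        rules.append(" OR ".join(parts) if parts else "FALSE")
    return rules
```

This readout is what makes the classification result interpretable: each output class is explained by an explicit logical expression over the binarized features.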
9. A logic rule network model training apparatus, the apparatus comprising:
the information transmission module is used for performing characterization processing on the original data to obtain training feature vectors, wherein the original data comprises: original body data and disease manifestation information of the target object;
the information processing module is used for determining a disjunctive normal form constraint loss function of the logic rule network model;
the information processing module is used for determining a network sparsity constraint loss function of the logic rule network model;
the information processing module is used for determining a known rule constraint loss function of the logic rule network model;
the information processing module is used for determining a fusion loss function of the logic rule network model according to the disjunctive normal form constraint loss function, the network sparsity constraint loss function and the known rule constraint loss function;
the information processing module is configured to train the logic rule network model according to the training feature vector and the fusion loss function, and determine network model parameters of the logic rule network model, where the trained logic rule network model is used for: classifying the data to be classified, and determining a logic rule corresponding to the data to be classified.
10. A data sorting apparatus, the apparatus comprising:
the data transmission module is used for acquiring data to be classified;
the data processing module is used for splicing the data to be classified through the binarization layer of the logic rule network model to obtain a binarization classification feature vector;
the data processing module is used for carrying out connection processing on the binary classification feature vector through a conjunctive logic layer of the logic rule network model to obtain a connection classification feature vector;
the data processing module is used for carrying out connection processing on the connection classification feature vector through a disjunctive logic layer of the logic rule network model to obtain or connect the classification feature vector;
the data processing module is used for converting the or connection classification feature vector through a linear conversion layer of the logic rule network model to obtain the classification probability of the data to be classified and the logic rule corresponding to the data to be classified;
wherein the logical rule network model is trained by the method of any one of claims 1 to 7.
11. A computer program product comprising a computer program or instructions which, when executed by a processor, implements the logic rule network model training method of any one of claims 1 to 7 or implements the data classification method of claim 8.
12. An electronic device, the electronic device comprising:
a memory for storing executable instructions;
a processor configured to implement the logic rule network model training method of any one of claims 1-7 or the data classification method of claim 8 when executing the executable instructions stored in the memory.
13. A computer readable storage medium storing executable instructions which when executed by a processor implement the logic rule network model training method of any one of claims 1-7 or the data classification method of claim 8.
CN202310149602.2A (priority and filing date 2023-02-09): Logic rule network model training method, device, equipment, program and medium — Pending — CN116975578A

Priority Applications (1)

Application Number: CN202310149602.2A · Priority/Filing Date: 2023-02-09 · Title: Logic rule network model training method, device, equipment, program and medium

Publications (1)

Publication Number: CN116975578A · Publication Date: 2023-10-31

Family ID: 88475524

Family Applications (1)

Application Number: CN202310149602.2A · Filing Date: 2023-02-09 · Status: Pending

Country Status (1)

Country: CN — CN116975578A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746167A (en) * 2024-02-20 2024-03-22 四川大学 Training method and classifying method for oral panorama image swing bit error classification model
CN117746167B (en) * 2024-02-20 2024-04-19 四川大学 Training method and classifying method for oral panorama image swing bit error classification model


Legal Events

Date Code Title Description
PB01 Publication