CN116069948A - Content wind control knowledge base construction method, device, equipment and storage medium - Google Patents
- Publication number
- CN116069948A, CN202310094574.9A
- Authority
- CN
- China
- Prior art keywords
- wind control
- entity
- content
- ontology
- modeling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Machine Translation (AREA)
Abstract
The application discloses a method, a device, equipment and a storage medium for constructing a content wind control (risk control) knowledge base. The method comprises the following steps: modeling the content wind control domain ontology according to preset corpus data, wherein the modeling comprises ontology concept modeling and ontology relationship modeling; extracting content wind control knowledge according to the modeled ontology concepts and ontology relationships, wherein the extraction comprises entity relationship extraction and entity extraction; and constructing a content wind control knowledge base according to the extracted entity relationships and entities. The application forms content wind control knowledge through ontology design and constructs a domain knowledge base oriented to content wind control. This provides knowledge support for knowledge-graph-based content wind control services, gives computers a reliable knowledge base for language understanding and knowledge reasoning, and improves the accuracy and reliability of intelligent auditing.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method, a device, equipment and a storage medium for constructing a content wind control knowledge base.
Background
In the face of the growing volume of internet content data and the regulatory requirements of content security, content wind control services that rely on technology as their main tool continue to expand. However, most traditional content wind control knowledge bases are literature collections that cannot provide structured, systematic wind control knowledge, so they struggle to meet application requirements in the content wind control field. With the rapid progress of artificial intelligence and knowledge graph technology, a content wind control knowledge base with knowledge reasoning and knowledge updating capabilities is increasingly needed and has broad room for application.
Disclosure of Invention
In view of the foregoing, the present application provides a method, apparatus, device, and storage medium for constructing a content wind control knowledge base that overcome or at least partially solve the foregoing problems.
According to one aspect of the present application, there is provided a content wind control knowledge base construction method, including:
modeling the content wind control domain ontology according to preset corpus data; wherein the modeling comprises ontology concept modeling and ontology relationship modeling;
content wind control knowledge extraction is carried out according to the modeled ontology concepts and ontology relations; the content wind control knowledge extraction comprises entity relation extraction and entity extraction;
and constructing a content wind control knowledge base according to the extracted entity relationship and the entity.
According to another aspect of the present application, there is provided a content wind-controlled knowledge base construction apparatus, including:
the modeling module is used for modeling the content wind control domain ontology according to preset corpus data; wherein the modeling comprises ontology concept modeling and ontology relationship modeling;
the knowledge extraction module is used for carrying out content wind control knowledge extraction according to the modeled ontology concepts and ontology relations; the content wind control knowledge extraction comprises entity relation extraction and entity extraction;
and the knowledge base construction module is used for constructing a content wind control knowledge base according to the extracted entity relation and the entity.
According to another aspect of the present application, there is provided an electronic device including: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the content wind control knowledge base construction method.
According to another aspect of the present application, there is provided a computer storage medium, where at least one executable instruction is stored, where the executable instruction causes a processor to execute operations corresponding to the content wind control knowledge base construction method described herein.
According to the method, the device and the storage medium for constructing the content wind control knowledge base, the content wind control domain ontology is modeled according to preset corpus data; wherein the modeling comprises ontology concept modeling and ontology relationship modeling; content wind control knowledge extraction is carried out according to the modeled ontology concepts and ontology relations; the content wind control knowledge extraction comprises entity relation extraction and entity extraction; and constructing a content wind control knowledge base according to the extracted entity relationship and the entity. The content wind control knowledge is formed through ontology design, a domain knowledge base oriented to content wind control is constructed, knowledge support is provided for content wind control technical service based on a knowledge graph, a reliable content wind control knowledge base is provided for language understanding and knowledge reasoning of a computer, and accuracy and reliability of intelligent auditing are improved.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the present application are more apparent, specific embodiments of the present application are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
Fig. 1 is a schematic flow chart of a method for constructing a content wind control knowledge base according to an embodiment of the present application;
Fig. 2 is a schematic diagram of ontology modeling in a content wind control knowledge base construction method according to a first embodiment of the present application;
Fig. 3 is a schematic diagram of entity relationship extraction and entity extraction in a method for constructing a content wind control knowledge base according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a modeling process of a NARRE double-tower model in a content wind control knowledge base construction method according to a second embodiment of the present application;
Fig. 5 is a schematic structural diagram of a content wind control knowledge base construction device according to a third embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example 1
Fig. 1 is a schematic flow chart of a method for constructing a content wind control knowledge base according to an embodiment of the present application. As shown in fig. 1, the method includes:
Step S11, modeling the content wind control domain ontology according to preset corpus data; wherein the modeling includes ontology concept modeling and ontology relationship modeling.
The preset corpus data may be obtained in advance and may be, for example, sentences or words. Specifically, the official web pages of mainstream media can be used as the source: crawling proceeds from point to area and layer by layer in depth, following network links, parsing the HyperText Markup Language (HTML) markup in depth, and collecting and parsing the HTML-marked content at scheduled intervals to obtain the original corpus data. After the original corpus data is obtained, it is preprocessed, which includes de-duplicating original corpus data from multiple sources and removing tags and special characters from the text. In the feature extraction process, this embodiment uses the topic relevance of words to calculate keyword weights and thereby extract text features, applies a similarity algorithm to obtain the degree of semantic similarity between data items, and integrates a fast clustering algorithm to obtain the final semantic similarity, so that the original corpus data is de-duplicated and the preset corpus data is obtained.
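The cleaning and de-duplication step can be illustrated with a minimal sketch. The source does not name the similarity or clustering algorithm, so everything here is an assumption made for illustration: a regular expression stands in for HTML parsing, plain term frequencies stand in for topic-weighted keyword features, cosine similarity is the similarity measure, and the `0.9` threshold is arbitrary. Chinese text would additionally need word segmentation before tokenizing.

```python
import html
import math
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")   # crude tag stripper, enough for a sketch
TOKEN_RE = re.compile(r"\w+")


def clean(raw: str) -> str:
    """Remove HTML tags, unescape entities, and collapse whitespace."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return " ".join(text.split())


def vectorize(text: str) -> Counter:
    """Term frequencies (a stand-in for topic-weighted keyword features)."""
    return Counter(TOKEN_RE.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def deduplicate(raw_docs, threshold=0.9):
    """Keep a document only if it is not too similar to one already kept."""
    kept, kept_vecs = [], []
    for raw in raw_docs:
        text = clean(raw)
        vec = vectorize(text)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept


docs = [
    "<p>Platform removes &amp; reports harmful posts</p>",
    "<div>Platform removes &amp; reports harmful  posts</div>",
    "<p>Regulator publishes new content rules</p>",
]
corpus = deduplicate(docs)
```

The greedy pass compares each new document against every kept one, which is quadratic; a clustering step such as the one the embodiment mentions would normally be used to limit those comparisons on a large corpus.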
An ontology is an important knowledge base that represents the basic terms of a subject domain and the relationships among them. The content wind control domain ontology is a system comprising content wind control terms, the normative relationships among those terms, and their descriptions. This embodiment extracts ontology terms with a multi-strategy fusion method: the preset corpus data is word-segmented and part-of-speech tagged; a domain term filtering algorithm is designed from elements such as stop words, numerals, measure words, date and place nouns, low-frequency person names found by named entity recognition, and manually screened keywords; the initial terms are filtered over multiple rounds to remove words that carry no clear meaning, have a disordered grammatical structure, or are semantically near-duplicate; and the ontology terms of the content wind control domain are finally obtained.
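The multi-round term filtering can be sketched as a chain of predicates applied to candidate terms. This is only an illustration under stated assumptions: the word lists, the `min_person_freq` cutoff, the choice to drop all place nouns, and the treatment of manually screened keywords as a blocklist are hypothetical, and the part-of-speech tags `ns` (place name) and `nr` (person name) follow a convention common in Chinese taggers rather than anything the source specifies.

```python
import re

STOP_WORDS = {"the", "of", "very"}        # hypothetical stop-word list
MEASURE_WORDS = {"piece", "batch"}        # hypothetical measure words
MANUAL_BLOCKLIST = {"click here"}         # hypothetical manually screened terms
NUMERAL_RE = re.compile(r"\d+(?:[.,]\d+)*")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def round_lexical(c):
    """Round 1: drop stop words, measure words, numerals, and dates."""
    term = c["term"]
    return not (
        term in STOP_WORDS
        or term in MEASURE_WORDS
        or NUMERAL_RE.fullmatch(term)
        or DATE_RE.fullmatch(term)
    )


def round_named_entities(c, min_person_freq=3):
    """Round 2: drop place nouns and low-frequency person names."""
    if c["pos"] == "ns":
        return False
    if c["pos"] == "nr" and c["freq"] < min_person_freq:
        return False
    return True


def round_manual(c):
    """Round 3: drop terms rejected by manual screening."""
    return c["term"] not in MANUAL_BLOCKLIST


def filter_terms(candidates):
    """Apply the rounds in order and return the surviving term strings."""
    for keep in (round_lexical, round_named_entities, round_manual):
        candidates = [c for c in candidates if keep(c)]
    return [c["term"] for c in candidates]


candidates = [
    {"term": "rumor", "pos": "n", "freq": 40},
    {"term": "the", "pos": "u", "freq": 900},
    {"term": "2023-02-01", "pos": "t", "freq": 5},
    {"term": "12,000", "pos": "m", "freq": 7},
    {"term": "Beijing", "pos": "ns", "freq": 30},
    {"term": "Zhang San", "pos": "nr", "freq": 1},
    {"term": "click here", "pos": "v", "freq": 15},
    {"term": "illegal fundraising", "pos": "n", "freq": 22},
]
terms = filter_terms(candidates)
```

Keeping each round as a separate function makes it easy to add or reorder strategies, which matches the multi-strategy framing of the embodiment.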
Step S12, content wind control knowledge extraction is carried out according to the modeled ontology concepts and ontology relations; the content wind control knowledge extraction comprises entity relation extraction and entity extraction.
Specifically, a pre-trained language model fine-tuned on a classification task is first used to extract entity relationships, and the entity relationship information is then fused with the pre-trained language model to extract entities.
Step S13, constructing a content wind control knowledge base according to the extracted entity relationships and entities.
Specifically, an RDF (Resource Description Framework) storage system may be employed to store the content wind control knowledge as graph data, using a relational database as the underlying storage.
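Storing graph data as triples in a relational database can be sketched as follows. This is an assumption-laden illustration, not the storage system of the application: it uses Python's built-in `sqlite3` with a single three-column table, and the `ex:` identifiers, the `add_triple` and `match` helpers, and the sample facts are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        PRIMARY KEY (subject, predicate, object)
    )
    """
)


def add_triple(s, p, o):
    """Insert one (subject, predicate, object) statement; duplicates are ignored."""
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("subject", s), ("predicate", p), ("object", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    query = (
        "SELECT subject, predicate, object FROM triples"
        + where
        + " ORDER BY subject, predicate, object"
    )
    return conn.execute(query, params).fetchall()


add_triple("ex:PersonA", "rdf:type", "ex:Person")
add_triple("ex:PersonA", "ex:worksFor", "ex:InstitutionB")
add_triple("ex:InstitutionB", "rdf:type", "ex:Institution")
add_triple("ex:PersonA", "ex:worksFor", "ex:InstitutionB")  # duplicate, ignored

works_for = match(p="ex:worksFor")
```

A single triples table is the simplest relational layout for RDF; production stores typically add indexes on each column permutation and dictionary-encode the terms.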
Therefore, this embodiment models the content wind control domain ontology according to the preset corpus data, where the modeling comprises ontology concept modeling and ontology relationship modeling; extracts content wind control knowledge according to the modeled ontology concepts and ontology relationships, where the extraction comprises entity relationship extraction and entity extraction; and constructs a content wind control knowledge base according to the extracted entity relationships and entities. Content wind control knowledge is formed through ontology design, and a knowledge base oriented to content wind control is constructed. This provides knowledge support for knowledge-graph-based content wind control services, gives computers a reliable knowledge base for language understanding and knowledge reasoning, and improves the accuracy and reliability of intelligent auditing.
In an alternative embodiment, the ontology concept modeling includes:
acquiring ontology terms of the content wind control domain according to preset corpus data; calculating word embedding features of the ontology terms of the content wind control domain, and performing multistage clustering on the word embedding features; and modeling the ontology concepts of the content wind control domain as person, institution, event, and domain feature vocabulary according to the characteristics of content wind control and the 5W elements of media content.
The 5W elements are when, where, what, why, and who. Specifically, this embodiment uses word embedding features: the word embedding feature of each term is calculated with the Directional Skip-Gram (DSG) algorithm, and the k-means algorithm is then used to cluster the term embeddings in multiple stages. As shown in fig. 2, combining the characteristics of content wind control with the 5W elements of media content, the content wind control domain ontology concepts are modeled as: person, institution, event, and domain feature vocabulary.
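Multistage clustering of term embeddings can be sketched as k-means applied at a top level and then again inside each resulting cluster. The sketch below makes several assumptions: the 2-D vectors are toy values rather than Skip-Gram embeddings, centroids are initialized from the first `k` points to keep the run deterministic, and the two-level depth and the `k_top` and `k_sub` parameters are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def multistage(terms, vectors, k_top, k_sub):
    """Cluster all terms, then re-cluster the members of each top-level cluster."""
    top = kmeans(vectors, k_top)
    result = {}
    for c in range(k_top):
        idx = [i for i, label in enumerate(top) if label == c]
        if len(idx) <= k_sub:
            # Too few members to split further.
            for i in idx:
                result[terms[i]] = (c, 0)
            continue
        sub = kmeans([vectors[i] for i in idx], k_sub)
        for i, s in zip(idx, sub):
            result[terms[i]] = (c, s)
    return result


terms = ["minister", "agency", "anchor", "committee",
         "spokesperson", "bureau", "reporter", "ministry"]
vectors = [(0.0, 0.0), (10.0, 10.0), (3.0, 0.0), (13.0, 10.0),
           (0.0, 1.0), (10.0, 11.0), (3.0, 1.0), (13.0, 11.0)]
clusters = multistage(terms, vectors, k_top=2, k_sub=2)
```

Each term ends up with a `(top_cluster, sub_cluster)` pair, which gives a two-level grouping that an expert could then map onto concept classes. K-means is sensitive to initialization, so a real run would use several random restarts or k-means++ seeding.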
In an alternative embodiment, the ontology relationship modeling includes:
when the relation among ontology terms in the content wind control field is a hierarchical relation, extracting by adopting a template preset by an expert and a multi-strategy mode based on language rules and a clustering method; when the relation between the ontology terms in the content wind control field is a non-hierarchical relation, the corpus data is analyzed by adopting a natural language processing technology, the core verbs in each sentence are identified, the terms adjacent to the core verbs are found by combining the context, and the relation between the two terms is constructed.
The relationships among ontology terms are divided into hierarchical and non-hierarchical relationships. Hierarchical relationships can be extracted with a multi-strategy approach that combines expert-preset templates, language rules, and clustering; for example, relationships between a person and an institution are distinguished from relationships between persons, the latter including relatives, colleagues, superiors and subordinates, and so on. Non-hierarchical relationships are extracted with deep natural language processing: syntactic analysis and semantic dependency analysis are performed on the corpus, the core verb of each sentence is identified, the terms adjacent to the core verb are then found using the context, and a relationship between the two terms is constructed.
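The non-hierarchical case can be sketched as finding the nearest ontology term on each side of the core verb. This is a simplification under stated assumptions: the position of the core verb (`core_index`) is taken as given, as if a dependency parser had already supplied it, and "adjacent" is interpreted as the closest term to the left and to the right. The function name and sample sentence are hypothetical.

```python
def extract_relation(tokens, core_index, terms):
    """Build a (term, verb, term) triple around the core verb.

    tokens: the words of one sentence.
    core_index: position of the core verb (assumed to come from a parser).
    terms: the set of known ontology terms.
    """
    # Nearest ontology term before the verb, scanning right to left.
    left = next((t for t in reversed(tokens[:core_index]) if t in terms), None)
    # Nearest ontology term after the verb, scanning left to right.
    right = next((t for t in tokens[core_index + 1:] if t in terms), None)
    if left is None or right is None:
        return None
    return (left, tokens[core_index], right)


tokens = ["the", "regulator", "recently", "fined", "the", "platform"]
ontology_terms = {"regulator", "platform"}
triple = extract_relation(tokens, 3, ontology_terms)
```

When no term is found on one side, the function returns `None` rather than guessing, so incomplete sentences do not produce spurious relationships.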
In an alternative embodiment, the entity relationship extraction includes:
given a sentence, feeding the sentence into an encoder to obtain the corresponding word vectors; implicitly encoding entity information by modeling the importance of the word vectors and the correlation between them, and adding an average pooling operation to obtain the entity embedding feature of the sentence; and concatenating the entity embedding vector with the word vectors and classifying the result through a neural network to obtain the entity relationship representation of the whole sentence.
Specifically, the entity relationship extraction task within a sentence is implemented as sentence-level text classification based on the hidden-layer embeddings of the language model. As shown in fig. 3, given a sentence $S_n$, $S_n$ is first fed into a RoBERTa encoder to obtain the corresponding word vectors, where $n$ is the number of words in the sentence and $d$ is the vector dimension. Entity relationships depend on prior knowledge such as entity category and position order, which manifests within a sentence as the degree of association between words. An attention-based entity information encoding layer is therefore placed after the hidden layer: it implicitly encodes entity information by modeling the importance of each word vector and the correlation between word vectors, and an average pooling operation is added to obtain the entity embedding feature of the whole sentence. Here AttNet denotes the use of a self-attention mechanism to obtain the embedded information between hidden-layer vectors. The generated entity embedding vector is concatenated with the word vectors and classified by a neural network to obtain the relationship representation of the whole sentence, where $\sigma$ is the sigmoid activation function. A threshold $\varepsilon$ is set; when the output for relation $r_i$ exceeds $\varepsilon$, the sentence is determined to contain relation $r_i$.
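The formulas of this passage did not survive in the text, so the following is only one possible formalization consistent with the description. The symbols $H_n$, $e_n$, $h_n$, $W$, $b$, and $\hat{p}_n$ are assumptions introduced here, as is the use of a sentence-level summary $h_n$ of the word vectors so that the concatenation has a fixed size.

```latex
\begin{aligned}
H_n &= \mathrm{RoBERTa}(S_n) \in \mathbb{R}^{n \times d} \\
e_n &= \mathrm{AvgPool}\big(\mathrm{AttNet}(H_n)\big) \\
\hat{p}_n &= \sigma\big(W\,[\,e_n \,;\, h_n\,] + b\big) \\
r_i \text{ is predicted for } S_n &\iff \hat{p}_{n,i} > \varepsilon
\end{aligned}
```

Because $\sigma$ is applied per relation, each relation is scored independently, so a sentence may be judged to contain several relations at once.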
In an alternative embodiment, the entity extraction includes:
obtaining a continuous representation of the relationship prompt information according to the entity relationship representation; and fusing the continuous representation with the word vector, identifying the entity by using the conditional random field, and obtaining the output of each word in the entity classification stage.
Specifically, as shown in fig. 3, the relation obtained in the entity relationship extraction stage is converted into a one-hot vector and re-parameterized with a multi-layer perceptron to obtain a continuous representation $P_r$ of the relation prompt. The relation prompt $P_r$ and the word vectors are fused through a Transformer network module: $P_r$ is concatenated with the $K$ and $V$ vectors in the network to form $K_p$ and $V_p$, which changes the attention computation, and the prompt parameters $P_r$ and the attention weight matrices are updated synchronously during training. A conditional random field is then used to identify entities, giving the output $Y_n$ for each word in the entity classification stage, i.e., the probability that a token in the current input is an entity of a given category.
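The conditional random field step can be illustrated by its decoding side. The sketch below is a generic Viterbi decoder for a linear-chain CRF, not the model of the application: the tag set, the emission scores, and the transition scores are made-up values, and training (which would learn those scores) is omitted.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions[t][y]: score of tag y at position t.
    transitions[a][b]: score of tag a being followed by tag b.
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for y in range(n_tags):
            # Best previous tag for reaching tag y at this position.
            best = max(range(n_tags), key=lambda a: score[a] + transitions[a][y])
            prev_best.append(best)
            new_score.append(score[best] + transitions[best][y] + emit[y])
        back.append(prev_best)
        score = new_score
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda y: score[y])]
    for prev_best in reversed(back):
        path.append(prev_best[path[-1]])
    return path[::-1]


TAGS = ["O", "B-PER", "I-PER"]
NEG = -1e4  # effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],   # O     -> O, B-PER, I-PER (I-PER cannot follow O)
    [0.0, 0.0, 1.0],   # B-PER -> ...
    [0.0, 0.0, 0.5],   # I-PER -> ...
]
# Toy emission scores for the tokens: spokesperson / Zhang / San / said
emissions = [
    [2.0, 0.1, 0.0],
    [0.2, 1.5, 1.6],
    [0.1, 0.3, 2.0],
    [2.5, 0.0, 0.1],
]
tags = [TAGS[i] for i in viterbi(emissions, transitions)]
```

At the second token the emission score slightly favors `I-PER`, but the transition scores rule out `I-PER` directly after `O`, so the decoder picks `B-PER`. Enforcing this kind of sequence-level consistency is the reason a CRF layer is placed on top of per-token scores.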
Example 2
This embodiment of the present application provides a specific implementation of the content wind control knowledge base construction method, used to describe the scheme of the present invention in detail. As shown in fig. 4, it includes the following steps:
Step S21, data acquisition and processing.
In this embodiment, the official web pages of mainstream media are used as the source: crawling proceeds from point to area and layer by layer in depth, following network links and parsing the HTML markup in depth, so that HTML-marked content is collected and parsed at scheduled intervals. After data acquisition, the original data is preprocessed, which includes de-duplicating multi-source data and removing tags and special characters from the text. In the feature extraction process, this embodiment uses the topic relevance of words to calculate keyword weights and thereby extract text features, applies a similarity algorithm to obtain the degree of semantic similarity between data items, and integrates a fast clustering algorithm to obtain the final semantic similarity, so that the data is de-duplicated and the final corpus data is obtained.
Step S22, modeling the content wind control domain ontology.
An ontology is an important knowledge base that represents basic terms and relationships of vocabulary for a topic area. The content wind control field ontology is a system which comprises content wind control terms and canonical relations among the terms and descriptions. The content wind control field ontology modeling comprises the following parts:
Ontology term definition. Ontology terms are extracted with a multi-strategy fusion method: the corpus data is word-segmented and part-of-speech tagged; a domain term filtering algorithm is designed from elements such as stop words, numerals, measure words, date and place nouns, low-frequency person names found by named entity recognition, and manually screened keywords; the initial terms are filtered over multiple rounds to remove words that carry no clear meaning, have a disordered grammatical structure, or are semantically near-duplicate; and the ontology terms of the content wind control domain are finally obtained.
Ontology concept modeling. This embodiment uses word embedding features: the word embedding feature of each term is calculated with the Directional Skip-Gram (DSG) algorithm, and the k-means algorithm is then used to cluster the term embeddings in multiple stages. As shown in fig. 2, combining the characteristics of content wind control with the 5W elements of media content, the content wind control domain ontology concepts are modeled as: person, institution, event, and domain feature vocabulary.
Ontology relationship modeling. The relationships among ontology terms are divided into hierarchical and non-hierarchical relationships. Hierarchical relationships are extracted with a multi-strategy approach that combines expert-preset templates, language rules, and clustering; for example, relationships between a person and an institution are distinguished from relationships between persons, the latter including relatives, colleagues, superiors and subordinates, and so on. Non-hierarchical relationships are extracted with deep natural language processing: syntactic analysis and semantic dependency analysis are performed on the corpus, the core verb of each sentence is identified, the terms adjacent to the core verb are then found using the context, and a relationship between the two terms is constructed.
Step S23, extracting the content wind control knowledge.
This embodiment provides a two-stage entity and relationship extraction method: first, a pre-trained language model fine-tuned on a classification task extracts entity relationships; then the entity relationship information is fused with the pre-trained language model to extract entities. The implementation steps are as follows:
Entity relationship extraction. The entity relationship extraction task within a sentence is implemented as sentence-level text classification based on the hidden-layer embeddings of the language model. As shown in fig. 3, given a sentence $S_n$, $S_n$ is first fed into a RoBERTa encoder to obtain the corresponding word vectors, where $n$ is the number of words in the sentence and $d$ is the vector dimension. Entity relationships depend on prior knowledge such as entity category and position order, which manifests within a sentence as the degree of association between words. An attention-based entity information encoding layer is therefore placed after the hidden layer: it implicitly encodes entity information by modeling the importance of each word vector and the correlation between word vectors, and an average pooling operation is added to obtain the entity embedding feature of the whole sentence. Here AttNet denotes the use of a self-attention mechanism to obtain the embedded information between hidden-layer vectors. The generated entity embedding vector is concatenated with the word vectors and classified by a neural network to obtain the relationship representation of the whole sentence, where $\sigma$ is the sigmoid activation function. A threshold $\varepsilon$ is set; when the output for relation $r_i$ exceeds $\varepsilon$, the sentence is determined to contain relation $r_i$.
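The forward pass of the relationship classification stage can be sketched in a few lines. This is a toy illustration under stated assumptions: the word vectors `H` are hand-written rather than produced by RoBERTa, the self-attention uses identity projections instead of learned weights, the sentence summary is a simple mean of the word vectors, and the classifier weights `W`, biases `b`, and threshold `EPSILON` are arbitrary values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(H):
    """Scaled dot-product self-attention with identity projections."""
    d = len(H[0])
    out = []
    for q in H:
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in H]
        )
        out.append([sum(w * v[j] for w, v in zip(weights, H)) for j in range(d)])
    return out


def mean_pool(H):
    """Average the rows of a matrix into one vector."""
    return [sum(col) / len(H) for col in zip(*H)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relation_scores(H, W, b):
    """One sigmoid score per relation type."""
    entity_emb = mean_pool(self_attention(H))   # entity embedding feature
    sentence_emb = mean_pool(H)                 # assumed sentence-level summary
    features = entity_emb + sentence_emb        # list concatenation
    return [
        sigmoid(sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(W, b)
    ]


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # n = 3 words, d = 2
W = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
b = [0.0, 0.0]
EPSILON = 0.5
scores = relation_scores(H, W, b)
predicted = [i for i, s in enumerate(scores) if s > EPSILON]
```

Each relation gets its own sigmoid score and is compared to the threshold independently, so the output is a set of relations rather than a single class.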
Entity extraction. The relation obtained in the entity relationship extraction stage is converted into a one-hot vector and re-parameterized with a multi-layer perceptron to obtain a continuous representation $P_r$ of the relation prompt. The relation prompt $P_r$ and the word vectors are fused through a Transformer network module: $P_r$ is concatenated with the $K$ and $V$ vectors in the network to form $K_p$ and $V_p$, which changes the attention computation, and the prompt parameters $P_r$ and the attention weight matrices are updated synchronously during training. A conditional random field is then used to identify entities, giving the output $Y_n$ for each word in the entity classification stage, i.e., the probability that a token in the current input is an entity of a given category.
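How a relation prompt alters the attention computation can be sketched as follows. The code is a simplified illustration, not the network of the application: the perceptron weights `W1` and `W2` are toy values, a single prompt vector is used for both keys and values, the word vectors double as queries, keys, and values, and no training is performed.

```python
import math


def one_hot(index, size):
    """Encode a relation index as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def mlp(x, W1, W2):
    """Two-layer perceptron with tanh, standing in for the re-parameterization."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def attend(query, keys, values):
    """Scaled dot-product attention for one query; returns output and weights."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return out, weights


NUM_RELATIONS = 3
W1 = [[0.5, -0.5, 0.1], [0.2, 0.3, -0.4]]       # hidden x relations (toy values)
W2 = [[1.0, 0.0], [0.0, 1.0]]                   # d x hidden (toy values)
P_r = mlp(one_hot(1, NUM_RELATIONS), W1, W2)    # continuous prompt for relation 1

H = [[1.0, 0.0], [0.0, 1.0]]                    # toy word vectors
K_p = [P_r] + H                                 # prompt prepended to the keys
V_p = [P_r] + H                                 # prompt prepended to the values
out, weights = attend(H[0], K_p, V_p)
```

The first attention weight is the share of attention the word gives to the relation prompt, so the detected relation directly shapes each token's representation before it reaches the entity tagger.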
Step S24, constructing a content wind control knowledge base.
In this embodiment, an RDF storage system is adopted: the content wind control knowledge is stored as graph data, and a relational database is used as the underlying storage scheme.
This embodiment automatically performs data cleaning, structured extraction and knowledge mining, and finally constructs a content wind control knowledge base. The knowledge base comprises an underlying term base for the content wind control domain and the relational knowledge among the terms, and can provide support services for media content wind control.
Example 3
Fig. 5 shows a schematic structural diagram of a content wind control knowledge base construction device according to a third embodiment of the present application. As shown in fig. 5, the apparatus includes: a modeling module 31, a knowledge extraction module 32, and a knowledge base construction module 33; wherein:
the modeling module 31 is configured to model the content wind control domain ontology according to preset corpus data; wherein the modeling comprises ontology concept modeling and ontology relationship modeling;
the knowledge extraction module 32 is configured to perform content wind control knowledge extraction according to the modeled ontology concepts and ontology relationships; the content wind control knowledge extraction comprises entity relation extraction and entity extraction;
the knowledge base construction module 33 is configured to construct a content wind control knowledge base according to the extracted entity relationships and entities.
Further, the modeling module 31 is specifically configured to: acquire ontology terms of the content wind control field according to preset corpus data; calculate word embedding features of the ontology terms in the content wind control field, and perform multistage clustering on the word embedding features; and model the ontology concepts of the content wind control field as persons, institutions, events and domain feature word lists according to the characteristics of content wind control and the 5W elements of media content.
Further, the modeling module 31 is specifically configured to: when the relation among ontology terms in the content wind control field is a hierarchical relation, extract it using expert-preset templates and a multi-strategy approach based on language rules and a clustering method; when the relation between the ontology terms in the content wind control field is a non-hierarchical relation, analyze the corpus data with natural language processing technology, identify the core verb in each sentence, find the terms adjacent to the core verb by combining the context, and construct the relation between the two terms.
Further, the knowledge extraction module 32 is specifically configured to: given a sentence, feed the sentence into an encoder to obtain the corresponding word vectors; realize implicit coding of entity information by modeling the importance of the word vectors and the correlations between them, and add an average pooling operation to obtain the entity embedding feature of the sentence; and concatenate the entity embedding vector with the word vectors and classify through a neural network to obtain the entity relation representation of the whole sentence.
Further, the knowledge extraction module 32 is specifically configured to: obtain a continuous representation of the relation prompt information according to the entity relation representation; and fuse the continuous representation with the word vectors, identify entities using a conditional random field, and obtain the output of each word in the entity classification stage.
Further, the hierarchical relationship includes a relationship between a person and an organization, or a relationship between persons; the relationships between a person and an organization are establishment/establishment, wilting and visit/visit, and the relationships between persons are relatives and colleagues/superiors and subordinates.
Further, the knowledge base construction module 33 is specifically configured to: adopt a Resource Description Framework (RDF) storage system, store the content wind control knowledge as graph data, and use a relational database as the underlying storage.
The content wind control knowledge base construction device in this embodiment is used to execute the content wind control knowledge base construction methods of the first and second embodiments; its working principle and technical effects are similar and are not repeated here.
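The non-hierarchical relation strategy handled by the modeling module (identify the core verb, then the terms adjacent to it) can be sketched in a few lines. This is an illustrative simplification under explicit assumptions: the input is already tokenized and part-of-speech tagged, the first token tagged `VERB` stands in for the core verb (a real system would more likely take it from a dependency parse), and `extract_relation`, the tag names and the example sentence are hypothetical.

```python
def extract_relation(tagged_tokens, domain_terms):
    """Return (left_term, core_verb, right_term) or None.

    tagged_tokens: list of (token, pos_tag) pairs for one sentence.
    domain_terms: set of known ontology terms.
    Assumption: the first token tagged "VERB" is treated as the core verb.
    """
    verb_positions = [i for i, (_, pos) in enumerate(tagged_tokens) if pos == "VERB"]
    if not verb_positions:
        return None
    v = verb_positions[0]
    # Nearest domain term to the left of the verb.
    left = next((tok for tok, _ in reversed(tagged_tokens[:v]) if tok in domain_terms), None)
    # Nearest domain term to the right of the verb.
    right = next((tok for tok, _ in tagged_tokens[v + 1:] if tok in domain_terms), None)
    if left is None or right is None:
        return None
    return (left, tagged_tokens[v][0], right)

# Hypothetical tagged sentence and term list.
sentence = [("Alice", "PROPN"), ("founded", "VERB"), ("the", "DET"),
            ("Acme Institute", "PROPN"), ("in", "ADP"), ("2010", "NUM")]
terms = {"Alice", "Acme Institute"}
print(extract_relation(sentence, terms))  # ('Alice', 'founded', 'Acme Institute')
```

The returned triple has the same subject-predicate-object shape as the graph data stored by the knowledge base construction module, so extracted relations can be written to the store directly.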
Example IV
A fourth embodiment of the present application provides a non-volatile computer storage medium storing at least one executable instruction, where the executable instruction can perform the content wind control knowledge base construction method of any of the foregoing method embodiments.
Example V
Fig. 6 shows a schematic structural diagram of an electronic device according to a fifth embodiment of the present application. The specific embodiments of the present application do not limit the specific implementation of the electronic device.
As shown in fig. 6, the electronic device may include: a processor 402, a communication interface (Communications Interface) 404, a memory 406, and a communication bus 408.
The processor 402, the communication interface 404, and the memory 406 communicate with each other via the communication bus 408. The communication interface 404 is used for communicating with network elements of other devices, such as clients or other servers. The processor 402 is used for executing the program 410, and may specifically perform the relevant steps in the method embodiments described above.
In particular, the program 410 may include program code, and the program code includes computer operation instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included in the electronic device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing the program 410. The memory 406 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, embodiments of the present application are not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and the above description of specific languages is provided for disclosure of preferred embodiments of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of embodiments of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the present application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components according to embodiments of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present application may also be embodied as an apparatus or device program (e.g., computer program and computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.
Claims (10)
1. A method for constructing a content wind control knowledge base, characterized by comprising the following steps:
modeling the content wind control domain ontology according to preset corpus data; wherein the modeling comprises ontology concept modeling and ontology relationship modeling;
content wind control knowledge extraction is carried out according to the modeled ontology concepts and ontology relations; the content wind control knowledge extraction comprises entity relation extraction and entity extraction;
and constructing a content wind control knowledge base according to the extracted entity relationship and the entity.
2. The method of claim 1, wherein the ontology concept modeling comprises:
acquiring ontology terms of the content wind control field according to preset corpus data;
calculating word embedding features of the ontology terms in the content wind control field, and performing multistage clustering on the word embedding features;
and modeling the ontology concepts of the content wind control field as persons, institutions, events and domain feature word lists according to the characteristics of content wind control and the 5W elements of media content.
3. The method of claim 1, wherein the ontology relationship modeling comprises:
when the relation among ontology terms in the content wind control field is a hierarchical relation, extracting by adopting a template preset by an expert and a multi-strategy mode based on language rules and a clustering method;
when the relation between the ontology terms in the content wind control field is a non-hierarchical relation, the corpus data is analyzed by adopting a natural language processing technology, the core verbs in each sentence are identified, the terms adjacent to the core verbs are found by combining the context, and the relation between the two terms is constructed.
4. The method of claim 1, wherein the entity relationship extraction comprises:
giving a sentence, and sending the sentence into an encoder to obtain a corresponding word vector;
the hidden coding of the entity information is realized by simulating the importance degree and the relativity between the word vectors, and the entity embedding characteristics of sentences are obtained by adding an average pooling operation;
and splicing the entity embedding vector and the word vector, and classifying through a neural network to obtain entity relation representation of the whole sentence.
5. The method of claim 4, wherein the entity extraction comprises:
obtaining a continuous representation of relationship prompt information according to the entity relationship representation;
and fusing the continuous representation with the word vector, identifying the entity by using a conditional random field, and obtaining the output of each word in the entity classification stage.
6. The method of claim 3, wherein the hierarchical relationship comprises a relationship between a person and an organization, or a relationship between persons; the relationships between a person and an organization are establishment/establishment, wilting and visit/visit, and the relationships between persons are relatives and colleagues/superiors and subordinates.
7. The method according to any one of claims 1-6, wherein constructing a content wind control knowledge base from the extracted entity relationships and entities comprises:
adopting a Resource Description Framework (RDF) storage system, storing the content wind control knowledge as graph data, and using a relational database as the underlying storage.
8. A content wind control knowledge base construction apparatus, comprising:
the modeling module is used for modeling the ontology of the content wind control field according to preset corpus data; wherein the modeling comprises ontology concept modeling and ontology relationship modeling;
the knowledge extraction module is used for carrying out content wind control knowledge extraction according to the modeled ontology concepts and ontology relations; the content wind control knowledge extraction comprises entity relation extraction and entity extraction;
and the knowledge base construction module is used for constructing a content wind control knowledge base according to the extracted entity relation and the entity.
9. An electronic device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the content wind control knowledge base construction method according to any one of claims 1 to 7.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the content wind control knowledge base construction method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310094574.9A CN116069948B (en) | 2023-01-17 | 2023-01-17 | Content wind control knowledge base construction method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310094574.9A CN116069948B (en) | 2023-01-17 | 2023-01-17 | Content wind control knowledge base construction method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116069948A (en) | 2023-05-05 |
CN116069948B (en) | 2024-01-09 |
Family
ID=86179869
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310094574.9A Active CN116069948B (en) | 2023-01-17 | 2023-01-17 | Content wind control knowledge base construction method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116069948B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180232443A1 (en) * | 2017-02-16 | 2018-08-16 | Globality, Inc. | Intelligent matching system with ontology-aided relation extraction |
CN110968700A (en) * | 2019-11-01 | 2020-04-07 | 数地科技(北京)有限公司 | Domain event map construction method and device fusing multi-class affairs and entity knowledge |
CN111832307A (en) * | 2020-07-09 | 2020-10-27 | 北京工业大学 | Entity relationship extraction method and system based on knowledge enhancement |
CN111930856A (en) * | 2020-07-06 | 2020-11-13 | 北京邮电大学 | Method, device and system for constructing domain knowledge graph ontology and data |
CN112559766A (en) * | 2020-12-08 | 2021-03-26 | 杭州互仲网络科技有限公司 | Legal knowledge map construction system |
CN114064918A (en) * | 2021-11-06 | 2022-02-18 | 中国电子科技集团公司第五十四研究所 | Multi-modal event knowledge graph construction method |
CN114661856A (en) * | 2020-12-23 | 2022-06-24 | 沈阳新松机器人自动化股份有限公司 | Fusion map construction method |
CN114780745A (en) * | 2022-04-20 | 2022-07-22 | 北京明略昭辉科技有限公司 | Method and device for constructing knowledge system, electronic equipment and storage medium |
CN115292506A (en) * | 2022-06-24 | 2022-11-04 | 北京百度网讯科技有限公司 | Knowledge graph ontology construction method and device applied to office field |
CN115309915A (en) * | 2022-09-29 | 2022-11-08 | 北京如炬科技有限公司 | Knowledge graph construction method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116069948B (en) | 2024-01-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9613024B1 (en) | System and methods for creating datasets representing words and objects | |
KR101339103B1 (en) | Document classifying system and method using semantic feature | |
US9880998B1 (en) | Producing datasets for representing terms and objects based on automated learning from text contents | |
US9965726B1 (en) | Adding to a knowledge base using an ontological analysis of unstructured text | |
WO2021051518A1 (en) | Text data classification method and apparatus based on neural network model, and storage medium | |
KR101136007B1 (en) | System and method for anaylyzing document sentiment | |
CN103229223A (en) | Providing answers to questions using multiple models to score candidate answers | |
CN112861990A (en) | Topic clustering method and device based on keywords and entities and computer-readable storage medium | |
Kavitha et al. | Chatbot for healthcare system using Artificial Intelligence | |
Rodrigues et al. | Advanced applications of natural language processing for performing information extraction | |
CN115098706A (en) | Network information extraction method and device | |
Al-Zoghby et al. | Semantic relations extraction and ontology learning from Arabic texts—a survey | |
US9262395B1 (en) | System, methods, and data structure for quantitative assessment of symbolic associations | |
Vaissnave et al. | Modeling of automated glowworm swarm optimization based deep learning model for legal text summarization | |
Alruily | Using text mining to identify crime patterns from arabic crime news report corpus | |
Phan et al. | Applying skip-gram word estimation and SVM-based classification for opinion mining Vietnamese food places text reviews | |
Rao et al. | Enhancing multi-document summarization using concepts | |
CN115714002B (en) | Training method for depression risk detection model, depression symptom early warning method and related equipment | |
CN116069948B (en) | Content wind control knowledge base construction method, device, equipment and storage medium | |
Lee | Natural Language Processing: A Textbook with Python Implementation | |
Lai et al. | An unsupervised approach to discover media frames | |
Halterman | Extracting political events from text using syntax and semantics | |
Varga et al. | LELA-A natural language processing system for Romanian tourism | |
Ledeneva et al. | Recent advances in computational linguistics | |
Suta et al. | Matching question and answer using similarity: an experiment with stack overflow |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||