CN112035646A - Key content extraction method

Key content extraction method

Info

Publication number
CN112035646A
Authority
CN
China
Prior art keywords
subject
extracted
vocabulary
key content
word
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010905863.9A
Other languages
Chinese (zh)
Inventor
王鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Original Assignee
Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Application filed by Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Priority to CN202010905863.9A
Publication of CN112035646A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a key content extraction method comprising: acquiring subject information of the key content to be extracted, and generating a corresponding subject knowledge base according to the subject information; extracting an original text from the subject knowledge base, and performing data processing on the original text to obtain a corresponding target text; and performing word segmentation and cluster analysis on the target text, and obtaining the key content of the target text according to a preset analysis method, the key content comprising knowledge points and/or keywords. The technical solution automatically extracts key content from subject-related text and improves both the efficiency and the accuracy of key content extraction; compared with manually labeling exercises, it improves working efficiency and saves a large amount of manpower.

Description

Key content extraction method
Technical Field
The invention relates to the field of data processing, and in particular to a key content extraction method.
Background
With the continuous development of computer and internet technology and the growing popularity of smart electronic devices, students increasingly study with electronic products because of their intelligence and convenience. As a result, a large number of electronic exercises are now used in teaching. At present, the knowledge points and corresponding keywords of these exercises are essentially identified by manual labeling, which is inefficient and involves a heavy workload.
Disclosure of Invention
The invention provides a key content extraction method that aims to automatically extract the key content of electronic exercises.
The invention provides a method for extracting key content, which comprises the following steps:
acquiring subject information of key content to be extracted, and generating a corresponding subject knowledge base according to the subject information;
extracting an original text from the subject knowledge base, and performing data processing on the original text to obtain a corresponding target text;
performing word segmentation and cluster analysis on the target text, and obtaining the key content of the target text according to a preset analysis method, wherein the key content comprises knowledge points and/or keywords.
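The three steps above can be pictured as a small pipeline. The Python sketch below is illustrative only: the corpus layout (a dict from subject name to raw texts), the function name `extract_key_content`, and the use of raw word frequency in place of the clustering and comprehensive-effective-value ranking described later are all assumptions, not part of the disclosed method.

```python
from collections import Counter
from typing import Callable, Dict, List


def extract_key_content(
    subject: str,
    corpus: Dict[str, List[str]],         # hypothetical layout: subject -> raw texts
    clean: Callable[[str], str],          # step S20: original text -> target text
    segment: Callable[[str], List[str]],  # step S30: word segmentation
    top_n: int = 3,
) -> List[str]:
    """Illustrative sketch of steps S10-S30, not the claimed method."""
    # S10 (simplified): the subject knowledge base is just the texts stored
    # for one subject, so words from different subjects are never mixed.
    knowledge_base = corpus.get(subject, [])
    # S20: preprocess every original text into a target text.
    targets = [clean(text) for text in knowledge_base]
    # S30 (stand-in): rank segmented words by raw frequency instead of the
    # heat values, clustering and comprehensive effective values described later.
    counts = Counter(word for text in targets for word in segment(text))
    return [word for word, _ in counts.most_common(top_n)]


corpus = {
    "physics": ["Force moment force", "moment Newton force"],
    "biology": ["cell gene"],
}
print(extract_key_content("physics", corpus, clean=str.lower, segment=str.split, top_n=2))
# ['force', 'moment']
```

Passing the cleaning and segmentation steps in as functions keeps the sketch neutral about which preprocessing rules and which Chinese word segmenter an implementation would actually use.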
Further, acquiring the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information includes:
analyzing the subject information to acquire the known subject knowledge points and known subject keywords corresponding to the subject information;
and generating a subject knowledge base containing the acquired known subject knowledge points and known subject keywords.
Further, generating the subject knowledge base containing the known subject knowledge points and known subject keywords includes:
labeling the acquired known subject knowledge points and known subject keywords, using the labeled knowledge points and keywords as label samples, and generating a subject knowledge base containing the label samples.
Further, acquiring the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information includes:
acquiring the subject information of the key content to be extracted, and analyzing it to obtain the subject type and subject characteristics corresponding to the subject information;
acquiring the specialized subject vocabulary and high-frequency vocabulary corresponding to the subject type and subject characteristics;
and labeling the specialized subject vocabulary and the high-frequency vocabulary, using the labeled vocabulary as label samples, and generating a subject knowledge base containing the label samples.
Further, acquiring the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information includes:
acquiring the subject information of the key content to be extracted, and collecting known subject knowledge points and known subject keywords from the subject information;
and generating, from the collected known subject knowledge points and known subject keywords, a subject knowledge base corresponding to a subject knowledge graph.
Further, performing data processing on the original text to obtain the corresponding target text includes:
performing data preprocessing on the original text according to the subject knowledge base, and removing irrelevant characters, including spaces, from the original text to obtain the corresponding target text.
Further, performing word segmentation and cluster analysis on the target text and obtaining the key content of the target text according to the preset analysis method includes:
performing word segmentation on the target text to obtain a plurality of segmented words, and calculating the current heat value of each segmented word;
performing cluster analysis on the segmented words to obtain the corresponding segmented-word sets;
extracting target words from each segmented-word set using N preset word extraction modes to obtain a plurality of extracted vocabulary sets for each segmented-word set, each extracted vocabulary set containing the corresponding target words;
determining the comprehensive effective value of each extracted vocabulary set according to the current heat values of its target words;
sorting the extracted vocabulary sets by comprehensive effective value in descending order and taking the first n sets;
and extracting the key content from each of the first n extracted vocabulary sets to obtain the key content of the target text.
Further, calculating the current heat value of each segmented word includes:
calculating the current heat value of each segmented word using formula (1):
Formula (1): [image not reproduced]
where $S_k$ is the current heat value of the k-th segmented word; $\beta_k$ is the vocabulary attribute value of the k-th segmented word, a preset value in the range $[1, 5]$; $n$ is the number of unit time periods contained in a preset total time period; $\chi_{ki}$ is the attention degree of the k-th segmented word in the i-th unit time period; $\chi_k'$ is the average attention degree of the k-th segmented word over the total time period; and $\chi_{k\max}$ is the maximum attention degree of the k-th segmented word over all unit time periods in the total time period;
where $\chi_{ki}$ is calculated by formula (2):
Formula (2): [image not reproduced]
where $p_{ki}$ is the search frequency of the k-th segmented word in the i-th unit time period, and $P1_i$ is the total search frequency of all the different segmented words in the i-th unit time period.
Further, determining the comprehensive effective value of each extracted vocabulary set according to the current heat values of the target words includes:
calculating the comprehensive effective value of each extracted vocabulary set using formula (3) and formula (4):
Formula (3): [image not reproduced]
Formula (4): [image not reproduced]
where $Z_a$ is the comprehensive effective value of the a-th extracted vocabulary set; $m$ is the total number of words finally extracted when the target words in the a-th extracted vocabulary set are extracted by each of the N word extraction modes; $S_{aj}$ is the current heat value of the j-th extracted word; $p_{a\max}$ is the extraction probability of the word extracted the greatest number of times when the target words in the a-th extracted vocabulary set are extracted by the N word extraction modes; $p_{a\min}$ is the extraction probability of the word extracted the fewest times under the same extraction;
$d_{aj}$ is the total number of times the j-th extracted word occurs during extraction with the N word extraction modes; and $k_{ad}$ is the number of words extracted when the d-th word extraction mode is applied to the a-th extracted vocabulary set.
The key content extraction method of the invention acquires subject information of the key content to be extracted and generates a corresponding subject knowledge base according to the subject information; extracts an original text from the subject knowledge base and performs data processing on it to obtain a corresponding target text; and performs word segmentation and cluster analysis on the target text, obtaining the key content of the target text, comprising knowledge points and/or keywords, according to a preset analysis method. The method thereby extracts the knowledge points and keywords of electronic exercises automatically and improves the efficiency and accuracy of their extraction; compared with manually labeling exercises, it improves working efficiency, reduces the error rate, and saves a large amount of manpower.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described below by means of the accompanying drawings and examples.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
Fig. 1 is a schematic workflow diagram of an embodiment of the key content extraction method of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The invention provides a key content extraction method that addresses the low efficiency and heavy workload of manually labeling exercises by automatically extracting the knowledge points and keywords of electronic exercises.
As shown in Fig. 1, a schematic workflow diagram of an embodiment of the key content extraction method of the present invention, the method may be implemented as steps S10 to S30 below.
Step S10: acquiring subject information of the key content to be extracted, and generating a corresponding subject knowledge base according to the subject information.
In the embodiment of the invention, the system acquires the subject information of the key content to be extracted, which comprises subjects such as mathematics, language, physics and chemistry together with all electronic exercises for those subjects, and generates a corresponding subject knowledge base according to the subject information. To facilitate the extraction of knowledge points and keywords, each subject knowledge base may contain only the information for a single subject.
Step S20: extracting an original text from the subject knowledge base, and performing data processing on the original text to obtain a corresponding target text.
The original text may be an electronic exercise or other subject-related text.
An original text is extracted from the generated subject knowledge base. During data preprocessing, content in the original text that is irrelevant to the words, such as spaces and other characters with no symbolic meaning, is removed to obtain the corresponding target text.
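One way to implement this preprocessing is a single regular-expression filter. The sketch assumes that letters and digits in any script count as meaningful, together with a small hypothetical whitelist of math symbols (`KEEP_SYMBOLS`); the text does not say which characters are kept, and this sketch does not consult the subject knowledge base.

```python
import re

# ASSUMPTION: "meaningful" characters are letters and digits in any script
# (Python's Unicode-aware \w also matches CJK characters) plus a small,
# hypothetical whitelist of math symbols that matter in exercises.
KEEP_SYMBOLS = "+-*/=<>()%."
# Drop everything outside that set; \w also matches "_", so drop it explicitly.
_DROP = re.compile(r"[^\w" + re.escape(KEEP_SYMBOLS) + r"]|_")


def to_target_text(original: str) -> str:
    """Remove spaces and other characters with no symbolic meaning."""
    return _DROP.sub("", original)


print(to_target_text("F = m * a ;  力矩 #"))  # F=m*a力矩
```

A whitelist is used rather than a blacklist so that unexpected debris (stray punctuation, control characters) is removed by default.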
Step S30: performing word segmentation and cluster analysis on the target text, and obtaining the key content of the target text according to a preset analysis method, the key content comprising knowledge points and/or keywords.
In the embodiment of the invention, word segmentation is performed on the target text to obtain the corresponding segmented words, and cluster analysis is then performed on these words to obtain the corresponding segmented-word sets. The preset analysis method includes, but is not limited to: extracting target words from each segmented-word set in the corresponding extraction mode to obtain the corresponding extracted sets, then extracting the corresponding knowledge points and/or keywords from those extracted sets and outputting them.
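The text does not name a segmentation or clustering algorithm. The sketch below swaps in two simple, well-known stand-ins: dictionary-based forward maximum matching (a classic Chinese word segmentation technique, here using the knowledge-base vocabulary as the dictionary) and single-linkage grouping of words that co-occur in the same target text, implemented with union-find. The function names and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def fmm_segment(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest
    vocabulary entry, falling back to a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def cooccurrence_clusters(docs: List[List[str]], min_count: int = 1) -> List[Set[str]]:
    """Single-linkage grouping: words co-occurring in at least `min_count`
    documents end up in the same cluster (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pair_counts: Counter = Counter()
    for doc in docs:
        unique = sorted(set(doc))  # sorted so each pair is counted under one key
        for word in unique:
            find(word)  # register every word, even one that never co-occurs
        pair_counts.update(combinations(unique, 2))
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            parent[find(a)] = find(b)
    clusters: Dict[str, Set[str]] = {}
    for word in list(parent):
        clusters.setdefault(find(word), set()).add(word)
    return list(clusters.values())


# Terms: 力矩 moment, 牛顿 newton, 细胞 cell, 基因 gene; 和 means "and".
vocab = {"力矩", "牛顿", "细胞", "基因"}
docs = [fmm_segment(text, vocab) for text in ["力矩牛顿", "牛顿和力矩", "细胞基因"]]
print(docs)  # [['力矩', '牛顿'], ['牛顿', '和', '力矩'], ['细胞', '基因']]
print(cooccurrence_clusters(docs, min_count=2))
```

With `min_count=2` only moment and newton are merged, since they are the only pair that co-occurs in two texts; every other word stays in its own cluster.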
In an embodiment, acquiring the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information may be implemented as follows:
analyzing the subject information to acquire the known subject knowledge points and known subject keywords corresponding to the subject information;
and generating a subject knowledge base containing the acquired known subject knowledge points and known subject keywords.
In an embodiment, generating the subject knowledge base containing the known subject knowledge points and known subject keywords may be implemented as follows:
labeling the acquired known subject knowledge points and known subject keywords, using the labeled knowledge points and keywords as label samples, and generating a subject knowledge base containing the label samples.
In an embodiment, acquiring the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information may also be implemented as follows:
acquiring the subject information of the key content to be extracted, and analyzing it to obtain the subject type and subject characteristics corresponding to the subject information;
acquiring the specialized subject vocabulary and high-frequency vocabulary corresponding to the subject type and subject characteristics;
and labeling the specialized subject vocabulary and the high-frequency vocabulary, using the labeled vocabulary as label samples, and generating a subject knowledge base containing the label samples.
In the embodiment of the invention, known specialized subject vocabulary and high-frequency vocabulary are labeled and then stored in the corresponding subject knowledge base as label samples. For example, terms appearing in physics such as moment and newton (the SI unit of force) are labeled and stored as label samples in the physics knowledge base.
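A subject knowledge base holding such label samples might be represented as below. This is a minimal sketch assuming a simple in-memory mapping from term to label; the class names and label strings such as `"subject_term"` are hypothetical, since the text does not specify a storage format or label scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class LabelSample:
    term: str
    label: str  # hypothetical labels, e.g. "subject_term", "high_frequency"


@dataclass
class SubjectKnowledgeBase:
    subject: str
    samples: Dict[str, LabelSample] = field(default_factory=dict)

    def add(self, term: str, label: str) -> None:
        """Store a labeled term; re-adding a term overwrites its label."""
        self.samples[term] = LabelSample(term, label)

    def label_of(self, term: str) -> Optional[str]:
        """Return the term's label, or None if it is not a label sample."""
        sample = self.samples.get(term)
        return sample.label if sample is not None else None

    def vocabulary(self) -> Set[str]:
        """All labeled terms, e.g. as a dictionary for word segmentation."""
        return set(self.samples)


physics = SubjectKnowledgeBase("physics")
physics.add("moment", "subject_term")
physics.add("newton", "subject_term")
print(physics.label_of("newton"))  # subject_term
print(physics.label_of("cell"))    # None
```

Keeping one knowledge base object per subject mirrors the option, described for step S10, of restricting each knowledge base to a single subject.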
In an embodiment, acquiring the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information may also be implemented as follows:
acquiring the subject information of the key content to be extracted, and collecting known subject knowledge points and known subject keywords from the subject information;
and generating, from the collected known subject knowledge points and known subject keywords, a subject knowledge base corresponding to a subject knowledge graph.
In an embodiment, performing data processing on the original text to obtain the corresponding target text may be implemented as follows:
performing data preprocessing on the original text according to the subject knowledge base, and removing irrelevant characters, including spaces, from the original text to obtain the corresponding target text.
In an embodiment, performing word segmentation and cluster analysis on the target text and obtaining the key content of the target text according to the preset analysis method may be implemented as follows:
performing word segmentation on the target text to obtain a plurality of segmented words, and calculating the current heat value of each segmented word;
performing cluster analysis on the segmented words to obtain the corresponding segmented-word sets;
extracting target words from each segmented-word set using N preset word extraction modes to obtain a plurality of extracted vocabulary sets for each segmented-word set, each extracted vocabulary set containing the corresponding target words;
determining the comprehensive effective value of each extracted vocabulary set according to the current heat values of its target words;
sorting the extracted vocabulary sets by comprehensive effective value in descending order and taking the first n sets;
and extracting the key content from each of the first n extracted vocabulary sets to obtain the key content of the target text.
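The final ranking and top-n selection can be sketched as follows. The sketch assumes the comprehensive effective values $Z_a$ have already been computed and that "extracting the key content" of a set amounts to collecting its words in rank order; the function name `key_content` and the example values are hypothetical.

```python
from typing import Dict, List


def key_content(extracted_sets: Dict[str, List[str]],
                effective_value: Dict[str, float],
                n: int) -> List[str]:
    """Keep the n sets with the largest comprehensive effective value Z_a
    and return their words in rank order, without duplicates."""
    # sorted() is stable even with reverse=True, so ties keep insertion order.
    ranked = sorted(extracted_sets, key=lambda a: effective_value[a], reverse=True)[:n]
    seen = set()
    result: List[str] = []
    for a in ranked:
        for word in extracted_sets[a]:
            if word not in seen:
                seen.add(word)
                result.append(word)
    return result


sets = {"A": ["moment", "torque"], "B": ["newton"], "C": ["cell", "torque"]}
z = {"A": 0.9, "B": 0.4, "C": 0.7}  # illustrative values only
print(key_content(sets, z, n=2))  # ['moment', 'torque', 'cell']
```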
In one embodiment, calculating the current heat value of each segmented word may be performed as follows:
calculating the current heat value of each segmented word using formula (1):
Formula (1): [image not reproduced]
where $S_k$ is the current heat value of the k-th segmented word; $\beta_k$ is the vocabulary attribute value of the k-th segmented word, a preset value in the range $[1, 5]$; $n$ is the number of unit time periods contained in a preset total time period; $\chi_{ki}$ is the attention degree of the k-th segmented word in the i-th unit time period; $\chi_k'$ is the average attention degree of the k-th segmented word over the total time period; and $\chi_{k\max}$ is the maximum attention degree of the k-th segmented word over all unit time periods in the total time period;
where $\chi_{ki}$ is calculated by formula (2):
Formula (2): [image not reproduced]
where $p_{ki}$ is the search frequency of the k-th segmented word in the i-th unit time period, and $P1_i$ is the total search frequency of all the different segmented words in the i-th unit time period.
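Formulas (1) and (2) survive only as images, so the sketch below does not attempt $S_k$ itself. It computes the ingredients the definitions name: the attention degrees $\chi_{ki}$, their mean $\chi_k'$ and maximum $\chi_{k\max}$ over the $n$ unit periods, plus a range check on $\beta_k$. It assumes, without confirmation from the text, that formula (2) is the search share $\chi_{ki} = p_{ki} / P1_i$; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def check_attribute_value(beta: float) -> float:
    """beta_k is a preset vocabulary attribute value in [1, 5]."""
    if not 1 <= beta <= 5:
        raise ValueError(f"beta_k must lie in [1, 5], got {beta}")
    return beta


def attention_series(word_searches: Sequence[float],
                     total_searches: Sequence[float]) -> List[float]:
    """chi_ki for each unit time period i, ASSUMING formula (2) is the
    share p_ki / P1_i (the formula itself is not reproduced)."""
    if len(word_searches) != len(total_searches):
        raise ValueError("need one search count per unit time period")
    return [p / total if total else 0.0
            for p, total in zip(word_searches, total_searches)]


def attention_summary(chi: Sequence[float]) -> Tuple[float, float]:
    """(chi_k', chi_kmax): mean and maximum attention over the n periods."""
    if not chi:
        raise ValueError("the total time period needs at least one unit period")
    return sum(chi) / len(chi), max(chi)


# Hypothetical search counts for one word over n = 3 unit periods.
chi = attention_series([2, 5, 3], [10, 20, 10])
mean_chi, max_chi = attention_summary(chi)
print(chi, max_chi)  # [0.2, 0.25, 0.3] 0.3
```

Under this assumption, periods with no searches at all are given zero attention rather than raising a division error.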
In one embodiment, determining the comprehensive effective value of each extracted vocabulary set according to the current heat values of the target words may be implemented as follows:
calculating the comprehensive effective value of each extracted vocabulary set using formula (3) and formula (4):
Formula (3): [image not reproduced]
Formula (4): [image not reproduced]
where $Z_a$ is the comprehensive effective value of the a-th extracted vocabulary set; $m$ is the total number of words finally extracted when the target words in the a-th extracted vocabulary set are extracted by each of the N word extraction modes; $S_{aj}$ is the current heat value of the j-th extracted word; $p_{a\max}$ is the extraction probability of the word extracted the greatest number of times when the target words in the a-th extracted vocabulary set are extracted by the N word extraction modes; $p_{a\min}$ is the extraction probability of the word extracted the fewest times under the same extraction;
$d_{aj}$ is the total number of times the j-th extracted word occurs during extraction with the N word extraction modes; and $k_{ad}$ is the number of words extracted when the d-th word extraction mode is applied to the a-th extracted vocabulary set.
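Formulas (3) and (4) are not reproduced either, but the counts they rely on are defined in the text. The sketch below derives $k_{ad}$, $d_{aj}$, $m$, $p_{a\max}$ and $p_{a\min}$ from the word lists returned by the N extraction modes for one set $a$. It assumes that $m$ counts distinct extracted words and that the extraction probability is $p_{aj} = d_{aj} / \sum_d k_{ad}$; both readings, and the names used, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class ExtractionStats(NamedTuple):
    k: List[int]       # k_ad: number of words extracted by mode d, d = 1..N
    d: Dict[str, int]  # d_aj: total extractions of word j across the N modes
    m: int             # assumed reading: number of distinct extracted words
    p_max: float       # p_amax: probability of the most-extracted word
    p_min: float       # p_amin: probability of the least-extracted word


def extraction_stats(mode_outputs: List[List[str]]) -> ExtractionStats:
    """mode_outputs[d] lists the words extraction mode d returned for one
    extracted vocabulary set a."""
    k = [len(words) for words in mode_outputs]
    d = Counter(word for words in mode_outputs for word in words)
    total = sum(k)
    if total == 0:
        raise ValueError("no words were extracted by any mode")
    # ASSUMPTION: p_aj = d_aj / sum_d k_ad; formula (4) is not reproduced,
    # so this normalisation is illustrative only.
    p = {word: count / total for word, count in d.items()}
    return ExtractionStats(k, dict(d), len(d), max(p.values()), min(p.values()))


stats = extraction_stats([["moment", "newton"], ["moment"], ["moment", "torque", "newton"]])
print(stats.k, stats.m, stats.p_max)  # [2, 1, 3] 3 0.5
```

Because $p_{aj}$ is proportional to $d_{aj}$ under this assumption, the word with the largest count automatically carries $p_{a\max}$, matching the definition above.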
In this implementation, the current heat value of each target word is determined from the current heat value of the corresponding segmented word, and the number of word types in an extracted set is less than or equal to the number of word types in the corresponding segmented-word set.
The N word extraction modes may, for example, extract words according to an attribute related to their popularity, or according to an attribute related to their difficulty level.
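The two kinds of extraction mode mentioned here could be modeled as filters over a segmented-word set. In this sketch the heat values, difficulty levels and thresholds are hypothetical inputs, and the names `popularity_mode`, `difficulty_mode` and `apply_modes` are illustrative rather than taken from the text.

```python
from typing import Callable, Dict, List

ExtractionMode = Callable[[List[str]], List[str]]


def popularity_mode(heat: Dict[str, float], threshold: float) -> ExtractionMode:
    """Hypothetical popularity-based mode: keep words whose current heat
    value reaches `threshold` (duplicates removed, order preserved)."""
    def extract(words: List[str]) -> List[str]:
        return [w for w in dict.fromkeys(words) if heat.get(w, 0.0) >= threshold]
    return extract


def difficulty_mode(level: Dict[str, int], min_level: int) -> ExtractionMode:
    """Hypothetical difficulty-based mode: keep words at or above `min_level`."""
    def extract(words: List[str]) -> List[str]:
        return [w for w in dict.fromkeys(words) if level.get(w, 0) >= min_level]
    return extract


def apply_modes(words: List[str], modes: List[ExtractionMode]) -> List[List[str]]:
    """Run all N modes on one segmented-word set."""
    return [mode(words) for mode in modes]


words = ["moment", "newton", "moment", "cell"]
outputs = apply_modes(words, [
    popularity_mode({"moment": 4.0, "newton": 2.5, "cell": 1.0}, threshold=2.0),
    difficulty_mode({"moment": 3, "newton": 1, "cell": 2}, min_level=2),
])
print(outputs)  # [['moment', 'newton'], ['moment', 'cell']]
```

Because each mode only selects from the words it is given, no extracted set can contain more word types than its segmented-word set, consistent with the constraint stated above.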
In this implementation, preprocessing yields the target text, which makes the subsequent word segmentation more efficient; calculating the current heat value of each word provides a data basis for obtaining knowledge points; clustering the words and applying the preset extraction modes makes it easier to build accurate extracted sets and to screen key words effectively and comprehensively; calculating the comprehensive effective value of each extracted set makes it easier to judge that set's validity; and screening the first n1 extracted sets, based on the subject knowledge base, makes it easier to obtain valid key content.
In an embodiment, step S10 of the embodiment shown in Fig. 1, "acquiring subject information of the key content to be extracted, and generating a corresponding subject knowledge base according to the subject information", may also be implemented as follows:
acquiring the subject information of the key content to be extracted, and collecting known subject knowledge points and known subject keywords from the subject information; and generating, from the collected known subject knowledge points and known subject keywords, a subject knowledge base corresponding to a subject knowledge graph.
In the embodiment of the invention, the concept of a knowledge graph is introduced, and the association relations between different terms in the subject information are represented by the knowledge graph. This approach suits subjects in which the association relations between different terms are best represented as a graph, such as biology, for example when describing chromosome-related information among multiple individuals with genetic relationships. The embodiment of the invention may also label the known subject knowledge points and known subject keywords as label samples and store them in the subject knowledge base.
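A subject knowledge graph can be as simple as a set of (head, relation, tail) triples. The sketch below uses the biology setting from the text; the class name, the relation names and the choice of an in-memory triple set are assumptions rather than the disclosed data model.

```python
from typing import Set, Tuple


class SubjectKnowledgeGraph:
    """Tiny in-memory triple store of (head, relation, tail) edges."""

    def __init__(self) -> None:
        self.triples: Set[Tuple[str, str, str]] = set()

    def add(self, head: str, relation: str, tail: str) -> None:
        self.triples.add((head, relation, tail))

    def neighbors(self, term: str) -> Set[str]:
        """Terms linked to `term` by any relation, in either direction."""
        outgoing = {t for h, _, t in self.triples if h == term}
        incoming = {h for h, _, t in self.triples if t == term}
        return outgoing | incoming

    def vocabulary(self) -> Set[str]:
        """Every term that appears in at least one triple."""
        return {term for h, _, t in self.triples for term in (h, t)}


biology = SubjectKnowledgeGraph()
biology.add("gene", "located_on", "chromosome")
biology.add("chromosome", "part_of", "cell")
print(sorted(biology.neighbors("chromosome")))  # ['cell', 'gene']
```

A linear scan over the triples is enough for a sketch; a larger knowledge graph would index triples by head and tail.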
The key content extraction method of the invention acquires subject information of the key content to be extracted and generates a corresponding subject knowledge base according to the subject information; extracts an original text from the subject knowledge base and performs data processing on it to obtain a corresponding target text; and performs word segmentation and cluster analysis on the target text, obtaining the key content of the target text, comprising knowledge points and/or keywords, according to a preset analysis method. This technical solution automatically extracts key content from subject-related text and improves the efficiency and accuracy of key content extraction; compared with manually labeling exercises, extracting knowledge points and keywords in this way improves working efficiency, reduces the error rate, and saves a large amount of manpower.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A method for extracting key content, the method comprising:
acquiring subject information of key content to be extracted, and generating a corresponding subject knowledge base according to the subject information;
extracting an original text from the subject knowledge base, and performing data processing on the original text to obtain a corresponding target text;
performing word segmentation and cluster analysis on the target text, and obtaining the key content of the target text according to a preset analysis method, wherein the key content comprises knowledge points and/or keywords.
2. The method for extracting key content according to claim 1, wherein acquiring the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information comprises:
analyzing the subject information to acquire the known subject knowledge points and known subject keywords corresponding to the subject information;
and generating a subject knowledge base containing the acquired known subject knowledge points and known subject keywords.
3. The method for extracting key content according to claim 2, wherein generating the subject knowledge base containing the known subject knowledge points and known subject keywords comprises:
labeling the acquired known subject knowledge points and known subject keywords, using the labeled knowledge points and keywords as label samples, and generating a subject knowledge base containing the label samples.
4. The method for extracting key content according to claim 1, wherein obtaining the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information comprises:
acquiring the subject information of the key content to be extracted, analyzing the subject information, and acquiring the subject type and subject characteristics corresponding to the subject information;
acquiring, according to the subject type and the subject characteristics, the professional subject vocabulary and high-frequency vocabulary corresponding to them;
and labeling the subject vocabulary and the high-frequency vocabulary, taking the labeled vocabulary as label samples, and generating a subject knowledge base containing the label samples.
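Claim 4 does not say how the high-frequency vocabulary is chosen or how label samples are stored. The following is a minimal sketch under two assumptions. First, "high-frequency" is taken to mean a simple count threshold over a tokenized subject corpus. Second, the knowledge base is modeled as a flat list of labeled entries. `LabelSample`, `SubjectKnowledgeBase`, `high_frequency_vocabulary`, `build_knowledge_base`, the label strings and `min_count` are all hypothetical names. None of them comes from the patent.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LabelSample:
    """A labeled vocabulary entry (hypothetical structure)."""
    term: str
    label: str  # hypothetical labels: "subject_vocabulary" or "high_frequency"


@dataclass
class SubjectKnowledgeBase:
    """Hypothetical container for the label samples of one subject type."""
    subject_type: str
    samples: list[LabelSample] = field(default_factory=list)

    def terms(self) -> set[str]:
        return {s.term for s in self.samples}


def high_frequency_vocabulary(tokens: list[str], min_count: int = 2) -> list[str]:
    """ASSUMPTION: 'high-frequency' means occurring at least min_count times.

    Returns the qualifying tokens, most frequent first."""
    counts = Counter(tokens)
    return [word for word, count in counts.most_common() if count >= min_count]


def build_knowledge_base(
    subject_type: str,
    subject_vocabulary: list[str],
    corpus_tokens: list[str],
    min_count: int = 2,
) -> SubjectKnowledgeBase:
    """Label professional subject terms and high-frequency terms as samples."""
    kb = SubjectKnowledgeBase(subject_type)
    for term in subject_vocabulary:
        kb.samples.append(LabelSample(term, "subject_vocabulary"))
    seen = set(subject_vocabulary)
    for term in high_frequency_vocabulary(corpus_tokens, min_count):
        if term not in seen:  # avoid labeling the same term twice
            kb.samples.append(LabelSample(term, "high_frequency"))
    return kb


tokens = ["derivative", "limit", "derivative", "integral", "limit", "derivative"]
kb = build_knowledge_base("mathematics", ["function"], tokens)
print([(s.term, s.label) for s in kb.samples])
# → [('function', 'subject_vocabulary'), ('derivative', 'high_frequency'), ('limit', 'high_frequency')]
```

A count threshold is only one possible reading. A TF-IDF cutoff against a general corpus would fit the claim wording just as well.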
5. The method for extracting key content according to claim 1, wherein obtaining the subject information of the key content to be extracted and generating the corresponding subject knowledge base according to the subject information comprises:
acquiring the subject information of the key content to be extracted, and collecting known subject knowledge points and known subject keywords from the subject information;
and generating, from the collected known subject knowledge points and known subject keywords, a subject knowledge base corresponding to a subject knowledge graph.
6. The method for extracting key content according to any one of claims 1 to 5, wherein performing data processing on the original text to obtain the corresponding target text comprises:
performing data preprocessing on the original text according to the subject knowledge base, and removing irrelevant characters, including spaces, from the original text to obtain the corresponding target text.
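Claim 6 says only that irrelevant characters, including spaces, are removed "according to the subject knowledge base". It does not define which characters count as irrelevant. This minimal sketch assumes a relevance rule: a character is kept if it is alphanumeric, which in Python includes CJK ideographs, or if it occurs in some knowledge-base term. Under that rule, a symbol such as `=` inside a formula term survives, while other punctuation is dropped. The function name `preprocess` and this rule are hypothetical.

```python
def preprocess(original_text: str, kb_terms: set[str]) -> str:
    """Hypothetical cleaning step: strip whitespace and 'irrelevant' characters.

    ASSUMPTION: a character is relevant if it is alphanumeric (this includes
    CJK ideographs) or occurs in some knowledge-base term.
    """
    protected = {ch for term in kb_terms for ch in term}
    kept = []
    for ch in original_text:
        if ch.isspace():
            continue  # spaces are always removed, as claim 6 states
        if ch.isalnum() or ch in protected:
            kept.append(ch)
    return "".join(kept)


print(preprocess("牛顿 第二定律： F = ma 。", {"F=ma"}))  # → 牛顿第二定律F=ma
```

Removing every space suits Chinese text, which has no word-delimiting spaces. For English input, the same step would erase word boundaries before segmentation.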
7. The method for extracting key content according to any one of claims 1 to 5, wherein performing word segmentation and cluster analysis on the target text and obtaining the key content in the target text according to the preset analysis method comprises:
performing word segmentation on the target text to obtain a plurality of segmented words, and calculating the current heat value of each segmented word;
performing cluster analysis on the segmented words to obtain the corresponding segmented-word sets;
extracting target words from each segmented-word set using N preset word extraction modes to obtain a plurality of extracted word sets for each segmented-word set, each extracted word set containing the corresponding target words;
determining the comprehensive effective value of each extracted word set according to the current heat values of its target words;
sorting the comprehensive effective values in descending order and selecting the first n extracted word sets;
and extracting the key content of each of the first n extracted word sets to obtain the key content in the target text.
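The control flow of claim 7 can be sketched as follows. Each of the N extraction modes is applied to every segmented-word set. The resulting extracted word sets are then ranked by their comprehensive effective value, and the first n are kept.

This sketch does not reproduce the patent's extraction modes or formula (3). The two lambdas are hypothetical stand-ins. The scoring function, which sums heat values, is a placeholder and not the claimed comprehensive effective value. The names `extract_candidates` and `select_top_n`, and the heat values, are illustrative.

```python
from collections.abc import Callable, Iterable

WordSet = frozenset[str]
ExtractionMode = Callable[[WordSet], WordSet]


def extract_candidates(
    clusters: Iterable[WordSet], modes: list[ExtractionMode]
) -> list[WordSet]:
    """Apply each of the N extraction modes to every segmented-word set,
    giving N extracted word sets per cluster."""
    return [mode(cluster) for cluster in clusters for mode in modes]


def select_top_n(
    candidates: list[WordSet],
    effective_value: Callable[[WordSet], float],
    n: int,
) -> list[WordSet]:
    """Sort by comprehensive effective value, largest first, and keep n."""
    return sorted(candidates, key=effective_value, reverse=True)[:n]


# Hypothetical heat values, clusters and extraction modes for illustration.
heat = {"derivative": 3.0, "limit": 2.0, "integral": 1.5, "area": 0.5}
clusters = [frozenset({"derivative", "limit"}), frozenset({"integral", "area"})]
modes = [
    lambda s: s,                                          # keep the whole set
    lambda s: frozenset(w for w in s if heat[w] >= 2.5),  # heat threshold
]


def placeholder_value(words: WordSet) -> float:
    """Placeholder score (sum of heat values); NOT the patent's formula (3)."""
    return sum(heat[w] for w in words)


top = select_top_n(extract_candidates(clusters, modes), placeholder_value, n=2)
```

In this example, the candidate sets score 5.0, 3.0, 2.0 and 0.0. The two kept sets are therefore {derivative, limit} and {derivative}.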
8. The method for extracting key content according to claim 7, wherein calculating the current heat value of each segmented word comprises:
calculating the current heat value of each segmented word using formula (1):
[Formula (1): image FDA0002661438670000031, not reproduced in this text]
in formula (1), $S_k$ is the current heat value of the $k$-th segmented word; $\beta_k$ is the vocabulary attribute value of the $k$-th segmented word, a preset value in the range $[1, 5]$; $n$ is the number of unit time periods in a preset total time period; $\chi_{ki}$ is the attention degree of the $k$-th segmented word in the $i$-th unit time period; $\bar{\chi}_k$ is the average attention degree of the $k$-th segmented word over the total time period; and $\chi_{k\max}$ is the maximum attention degree of the $k$-th segmented word over all unit time periods in the total time period;
where $\chi_{ki}$ is calculated by formula (2):
[Formula (2): image FDA0002661438670000032, not reproduced in this text]
where $p_{ki}$ is the search frequency of the $k$-th segmented word in the $i$-th unit time period, and $p_{1i}$ is the total search frequency of the different segmented words in the $i$-th unit time period.
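Because the formula images are missing, formula (1) cannot be reproduced here. The sketch below only computes the quantities that formula (1) uses: $\chi_{ki}$ for each unit period, the mean $\bar{\chi}_k$, and the maximum $\chi_{k\max}$.

It rests on one loudly stated assumption: formula (2) is taken to be the search share $\chi_{ki} = p_{ki} / p_{1i}$. That reading fits the symbol definitions but is not confirmed by the text. The preset attribute value $\beta_k$ is not needed for these quantities. The function names and search counts are hypothetical.

```python
def attention_degrees(word_searches: list[float], total_searches: list[float]) -> list[float]:
    """chi_ki for i = 1..n.

    ASSUMPTION: formula (2) is the search share p_ki / p_1i; the original
    formula image is not available.
    """
    if len(word_searches) != len(total_searches):
        raise ValueError("need one total per unit time period")
    return [p / total if total else 0.0 for p, total in zip(word_searches, total_searches)]


def attention_statistics(
    word_searches: list[float], total_searches: list[float]
) -> tuple[list[float], float, float]:
    """Return (chi_k1..chi_kn, mean chi_k, max chi_k) over the total time period."""
    chi = attention_degrees(word_searches, total_searches)
    if not chi:
        raise ValueError("the total time period must contain at least one unit period")
    return chi, sum(chi) / len(chi), max(chi)


# Three unit time periods (n = 3) of hypothetical search counts.
chi, chi_mean, chi_max = attention_statistics([10, 30, 20], [100, 100, 200])
print(chi, chi_max)  # → [0.1, 0.3, 0.1] 0.3
```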
9. The method for extracting key content according to claim 7, wherein determining the comprehensive effective value of each extracted word set according to the current heat values of the target words comprises:
calculating the comprehensive effective value of each extracted word set using formulas (3) and (4):
[Formulas (3) and (4): images FDA0002661438670000041 and FDA0002661438670000042, not reproduced in this text]
where $Z_a$ is the comprehensive effective value of the $a$-th extracted word set; $m$ is the total number of words finally extracted when the target words in the $a$-th extracted word set are extracted by each of the $N$ word extraction modes; $S_{aj}$ is the current heat value of the $j$-th extracted word; $p_{a\max}$ is the extraction probability of the word extracted the most times among the extracted words when the target words in the $a$-th extracted word set are extracted by each of the $N$ word extraction modes; $p_{a\min}$ is the extraction probability of the word extracted the fewest times under the same conditions; $d_{aj}$ is the total number of times the $j$-th extracted word occurs across the extraction processes of the $N$ word extraction modes; and $k_{ad}$ is the number of words extracted when the $d$-th word extraction mode is applied to the $a$-th extracted word set.
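Formulas (3) and (4) are also unavailable. This sketch therefore computes only the counting quantities that claim 9 defines for one extracted word set $a$: $N$, $k_{ad}$, $d_{aj}$, $m$, $p_{a\max}$ and $p_{a\min}$.

Two interpretations are assumptions and are marked in the code. First, $m$ is taken to count distinct extracted words. Second, a word's extraction probability is taken to be $d_{aj}/N$, the share of modes that extracted it. The input layout (one set of words per mode) and the function name are hypothetical.

```python
from collections import Counter


def extraction_statistics(extractions: list[set[str]]) -> dict:
    """Counting quantities named in claim 9 for one extracted word set a.

    extractions[d] holds the words that the d-th of the N extraction modes
    extracted from the a-th set (hypothetical input layout).
    """
    n_modes = len(extractions)                                 # N
    k_ad = [len(words) for words in extractions]               # k_ad, one per mode
    d_aj = Counter(w for words in extractions for w in words)  # d_aj, one per word
    if not d_aj:
        raise ValueError("no words were extracted")
    # ASSUMPTION: extraction probability of word j is d_aj / N, i.e. the share
    # of modes that extracted it; formulas (3) and (4) are not reproduced here.
    prob = {w: count / n_modes for w, count in d_aj.items()}
    return {
        "N": n_modes,
        "k_ad": k_ad,
        "d_aj": dict(d_aj),
        "m": len(d_aj),  # ASSUMPTION: m counts distinct extracted words
        "p_max": max(prob.values()),  # probability of the most-extracted word
        "p_min": min(prob.values()),  # probability of the least-extracted word
    }


stats = extraction_statistics([{"derivative", "limit"}, {"derivative"}, {"derivative", "slope"}])
print(stats["k_ad"], stats["m"], stats["p_max"])  # → [2, 1, 2] 3 1.0
```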
CN202010905863.9A 2020-09-01 2020-09-01 Key content extraction method Pending CN112035646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010905863.9A CN112035646A (en) 2020-09-01 2020-09-01 Key content extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010905863.9A CN112035646A (en) 2020-09-01 2020-09-01 Key content extraction method

Publications (1)

Publication Number Publication Date
CN112035646A true CN112035646A (en) 2020-12-04

Family

ID=73592198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010905863.9A Pending CN112035646A (en) 2020-09-01 2020-09-01 Key content extraction method

Country Status (1)

Country Link
CN (1) CN112035646A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111702A (en) * 2021-03-01 2021-07-13 联想(北京)有限公司 Information determination method and device and electronic equipment
CN114416890A (en) * 2022-01-21 2022-04-29 中国人民解放军国防科技大学 Heterogeneous knowledge point integrated representation, storage, retrieval, generation and interaction method

Similar Documents

Publication Publication Date Title
CN109543084B (en) Method for establishing detection model of hidden sensitive text facing network social media
CN102662930B (en) Corpus tagging method and corpus tagging device
CN112632989B (en) Method, device and equipment for prompting risk information in contract text
CN107657008A (en) Across media training and search method based on depth discrimination sequence study
CN111475615B (en) Fine granularity emotion prediction method, device and system for emotion enhancement and storage medium
CN112035646A (en) Key content extraction method
CN112035675A (en) Medical text labeling method, device, equipment and storage medium
CN111144079A (en) Method and device for intelligently acquiring learning resources, printer and storage medium
CN107301411A (en) Method for identifying mathematical formula and device
CN117150151B (en) Wrong question analysis and test question recommendation system and method based on large language model
CN110689018A (en) Intelligent marking system and processing method thereof
CN110175657A (en) A kind of image multi-tag labeling method, device, equipment and readable storage medium storing program for executing
CN109947923A (en) A kind of elementary mathematics topic type extraction method and system based on term vector
CN112182237A (en) Topic knowledge point association method, topic knowledge point association system and storage medium
CN117573894B (en) Knowledge graph-based resource recommendation system and method
CN106022389B (en) A kind of related feedback method actively selecting more example multiple labeling digital pictures
CN117592470A (en) Low-cost gazette data extraction method driven by large language model
CN112989811A (en) BilSTM-CRF-based historical book reading auxiliary system and control method thereof
CN110442858B (en) Question entity identification method and device, computer equipment and storage medium
CN110674678A (en) Method and device for identifying sensitive mark in video
CN115982460A (en) Personalized recommendation method, system and medium for health science popularization information
CN115879463A (en) Course element recognition model training and recognition method based on text mining
CN111341404B (en) Electronic medical record data set analysis method and system based on ernie model
CN113515599A (en) Method for arranging help semantic analysis and scheme recommendation
CN118012921B (en) Man-machine interaction data processing system for intellectual property virtual experiment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination