CN117473078A - Visual reading system of long literature based on cross-domain named entity recognition
- Publication number
- CN117473078A (application CN202311298279.1A)
- Authority
- CN
- China
- Prior art keywords
- entity
- character
- literature
- visualization
- entities
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a visual reading system for long literature based on cross-domain named entity recognition, comprising: a data acquisition module, which acquires the source text of a literary work through a web crawler program; a literature entity recognition module, which recognizes character entities, place entities and family entities in the source text using an open-source named entity recognition model and generates a coarse-grained entity data set; a literature entity optimization module, which trains a parameter-transfer-based cross-domain named entity recognition model on the coarse-grained entity data set, optimizes the model results through a rule weight network, and then recognizes and generates fine-grained character, place and family entity data sets; and a literature-entity-oriented visual analysis and display module, which uses open-source visualization tools to analyze and display the fine-grained literature entity data set and comprises four units: character relationship network, character movement trajectory, character emotion change and character appearance frequency.
Description
Technical Field
The invention relates to the technical fields of natural language processing and data visualization, and in particular to a visual reading system for long literature based on cross-domain named entity recognition.
Background
Since the start of the internet age of "information explosion", the tension between the vast and complex volume of information and readers' limited time has led people increasingly to pursue fragmented information and to pay less attention to traditional literature, especially long literary works. Long literary works are typically lengthy, with complex character relationships and intricate plots; they are often obscure and hard to follow, so readers find it difficult to persist. With the rapid development of information science, the literary field has gradually begun to explore the application value and development potential of visual text analysis. Text analysis can draw on natural language processing techniques, while visualization conveys information clearly and effectively in graphical form. Combining text analysis and visualization in the literary field brings together the core strengths of both, and lets readers grasp the content and characteristics of a literary work deeply and quickly through interaction.
An existing visualization method for literary works is disclosed in publication CN116151255A, which describes a text analysis and visualization method and system. However, that method only visualizes how frequently characters appear; its form is relatively limited and it cannot meet readers' needs.
Disclosure of Invention
The invention aims to provide a visual reading system for long literature based on cross-domain named entity recognition. Taking the text of novels as its object of study, the system performs character relationship analysis, character trajectory analysis, emotion change analysis, appearance frequency analysis and the like on the original text, summarizes the patterns and emotional factors it contains, and presents them in a more efficient and intuitive way, so that users can read the novel more easily and understand its characters, themes and emotions more clearly. The invention builds a visualization platform focused on intelligent analysis of, and interaction with, long literary works, thereby improving on the traditional way of reading, increasing interest in reading, improving readers' overall understanding of literary works, making lengthy and complex stories clear and easy to follow, and drawing attention to literary works.
The specific technical solution for achieving the aim of the invention is as follows:
A visual reading system for long literature based on cross-domain named entity recognition, comprising:
a data acquisition module, which acquires the source text of a literary work from electronic resources using a Python web crawler program;
a literature entity recognition module, which recognizes character entities, place entities and family entities in the source text using an open-source named entity recognition model and generates a coarse-grained entity data set;
a literature entity optimization module, which trains a parameter-transfer-based cross-domain named entity recognition model on the coarse-grained entity data set, builds a rule weight network by introducing context constraint rules for specific entity types, optimizes the model results, and then re-recognizes the source text of the literary work to generate fine-grained character, place and family entity data sets;
a literature-entity-oriented visual analysis and display module, which uses open-source visualization tools to analyze and display the fine-grained literature entity data set, and which comprises four units: character relationship network visualization, character movement trajectory visualization, character emotion change visualization and character appearance frequency visualization.
Preferably, in the literature entity optimization module of the visual reading system for long literature based on cross-domain named entity recognition, the context constraint rules for specific entity types include: rules for identifying character entities, rules for identifying place entities, and rules for identifying family entities.
Preferably, the literature-entity-oriented visual analysis and display module of the visual reading system for long literature based on cross-domain named entity recognition comprises:
a character relationship network visualization unit, which displays the complex network of character relationships in the literary work and estimates the strength of the relationship between two characters by counting how many times they appear together in the same sentence; using the open-source visualization tool D3.js, it takes character names as nodes and the relationship strength values between characters as edge weights, and presents the character relationship network visually.
Preferably, the literature-entity-oriented visual analysis and display module of the visual reading system for long literature based on cross-domain named entity recognition further comprises:
a character movement trajectory visualization unit, which displays the movement trajectories and important events of characters in the literary work; it extracts the place entities from the passages involving each character, by character name and order of appearance, to build a character trajectory data set; extracts important events with a rule-based matching method; and draws a line chart of each character's movement trajectory using the open-source visualization tool ECharts; the unit also integrates interactive functions for the storylines and places of different chapters, providing a reading navigation tool that helps readers understand and follow complex character relationships and story development.
Preferably, the literature-entity-oriented visual analysis and display module of the visual reading system for long literature based on cross-domain named entity recognition further comprises:
a character emotion change visualization unit, which shows how characters' emotions change as the plot of the literary work develops; the unit uses the open-source tool NLTK to extract sentences describing characters, performs emotion analysis on those sentences, and computes each character's score in the different emotion dimensions; it then uses the open-source visualization tool ECharts to present, as a line chart, how each character's emotions change across chapters.
Preferably, the literature-entity-oriented visual analysis and display module of the visual reading system for long literature based on cross-domain named entity recognition further comprises:
a character appearance frequency visualization unit, which shows how the frequency of each character's appearances changes from chapter to chapter in the literary work; a character appearance frequency data set is built by counting how many times each character appears in each chapter, and a line chart of appearance frequency is drawn using the open-source visualization tool ECharts, in which the user can drag a time axis to see how often a given character appears across the whole novel, helping the user understand how the character's importance changes as the plot develops.
Compared with the prior art, the invention has at least the following advantages or beneficial effects:
(1) Visual presentation of novel text: by processing and analyzing the text, the invention builds a complete visualization system for literary works. Through visualization methods, including charts, graphs and other visual elements, readers can intuitively understand and perceive the plot, character relationships and other important elements of a novel in a new form. This visual approach offers a new reading experience and makes literary works more vivid and easier to understand and appreciate.
(2) Automated entity extraction: the invention uses an automated method to extract the fictional character and place entities of a novel. With cross-domain named entity recognition, the entities and relationships in a novel can be extracted efficiently and accurately. Compared with traditional manual extraction, the automated method greatly reduces manual workload and improves efficiency.
(3) Lowering the reading threshold: the invention enables readers to learn about the content of novels in a faster way, thereby lowering the threshold of reading. By processing and analyzing the novel text, key information can be extracted, and a brief summary or abstract is constructed. The reader does not need to fully read the entire book to understand its main content. The method not only saves reading time, but also increases the interest and interactivity of reading.
Drawings
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a flow chart of the literature entity optimization module of the present invention;
FIG. 3 is a character relationship network diagram provided by an embodiment of the present invention;
FIG. 4 is a character movement trajectory diagram provided by an embodiment of the present invention;
FIG. 5 is a character emotion change chart provided by an embodiment of the present invention;
FIG. 6 is a character appearance frequency chart provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following specific examples and drawings. Except where specifically noted below, the procedures, conditions, experimental methods and the like for carrying out the invention are general knowledge and common practice in the art, and the invention is not particularly limited in these respects.
Examples
FIG. 1 is a schematic diagram of the visual reading system for long literature based on cross-domain named entity recognition. As shown in FIG. 1, the system comprises:
a data acquisition module, which acquires the source text of a literary work from electronic resources using a Python web crawler program;
a literature entity recognition module, which recognizes character entities, place entities and family entities in the source text using an open-source named entity recognition model and generates a coarse-grained entity data set;
a literature entity optimization module, which trains a parameter-transfer-based cross-domain named entity recognition model on the coarse-grained entity data set, builds a rule weight network by introducing context constraint rules for specific entity types, optimizes the model results, and then re-recognizes the source text of the literary work to generate fine-grained character, place and family entity data sets;
a literature-entity-oriented visual analysis and display module, which uses open-source visualization tools to analyze and display the fine-grained literature entity data set, and which comprises four units: character relationship network visualization, character movement trajectory visualization, character emotion change visualization and character appearance frequency visualization.
In this embodiment, the literary work selected is the English-language novel series A Song of Ice and Fire, and the literature entities include the character names, place names and family names in the work.
In this embodiment, the literature entity recognition module uses an open-source crawler tool to fetch the text of A Song of Ice and Fire and uses the open-source natural language processing library spaCy to perform entity recognition, building a novel-text entity data set named GOT that includes entities such as character names, place names and family names.
In this embodiment, the literature entity optimization module (shown in FIG. 2) is implemented in the following steps:
Step 1: Using the CoNLL03 data set as the source data set, several models are pre-trained on it, including BERT-based, BiLSTM-based and BiLSTM+CRF-based models; these models obtain a shared semantic representation by learning named entity features in the source data set.
Step 2: Taking GOT as the target data set, the pre-trained models are fine-tuned. During fine-tuning, the GOT data set is used for supervised training so that the models adapt better to entity recognition tasks in the literary domain.
Step 3: Context constraint rules for specific entity types are introduced. Rules for identifying character entities comprise a matching rule and a mention rule: the matching rule states that if an entity appears before a verb as its subject or object, it is likely to be a character entity; the mention rule states that if an entity is mentioned in a sentence together with a known character entity, it is likely to be a character entity. Rules for identifying place entities comprise a matching rule and a description rule: the matching rule states that if an entity occurs after a place preposition (e.g., on, at), it is likely to be a place entity; the description rule states that if an entity is mentioned in a sentence together with a known place entity, it is likely to be a place entity. Rules for identifying family entities comprise a description rule: if an entity is mentioned in a sentence together with a known family entity, it is likely to be a family entity.
Step 4: A rule weight network is designed. A fully connected neural network serves as the weight network; it takes the feature representation of a rule as input and outputs the weight of that rule. The resulting weight coefficients are added during fine-tuning to optimize the model results.
Step 5: After testing a series of model transfer methods, the BiLSTM+CRF-based pre-trained model was found to perform best, and it was therefore chosen as the final pre-trained model.
Step 6: The BiLSTM+CRF-based cross-domain named entity recognition model optimized in Steps 1 to 4 is used to re-recognize the entities in the text of A Song of Ice and Fire, yielding a fine-grained entity data set for the subsequent visual analysis and display module.
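As an illustration of Steps 3 and 4 only, the sketch below encodes some of the context constraint rules as binary features and passes them through a single fully connected layer that outputs one weight per rule. Everything specific here is an assumption made for the example: the seed entities, the preposition list beyond "on" and "at", the rule-to-label mapping, the untrained all-zero parameters, and the way the weights are added to hypothetical tagger scores. The character matching rule (entity as subject or object of a verb) needs part-of-speech information and is left out. The text does not specify how the weight coefficients enter fine-tuning, so this is one possible reading, not the patented implementation.

```python
import math
import re

# Hypothetical seed entities; in the described system these would come from
# the coarse-grained GOT data set.
KNOWN = {
    "PERSON": {"Jon Snow", "Arya Stark"},
    "PLACE": {"Winterfell", "Castle Black"},
    "FAMILY": {"House Stark"},
}
# Assumed preposition list; the text only gives "on" and "at" as examples.
PLACE_PREPOSITIONS = {"on", "at", "in", "to", "from", "near"}

# Which entity type each rule votes for (illustrative mapping).
RULE_LABELS = {
    "place_after_preposition": "PLACE",
    "mentioned_with_person": "PERSON",
    "mentioned_with_place": "PLACE",
    "mentioned_with_family": "FAMILY",
}


def rule_features(sentence, candidate):
    """Return one 0/1 feature per context constraint rule."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    first = candidate.split()[0]
    idx = tokens.index(first) if first in tokens else -1
    prev_word = tokens[idx - 1].lower() if idx > 0 else ""

    def co_mentioned(label):
        return any(n != candidate and n in sentence for n in KNOWN[label])

    return {
        "place_after_preposition": float(prev_word in PLACE_PREPOSITIONS),
        "mentioned_with_person": float(co_mentioned("PERSON")),
        "mentioned_with_place": float(co_mentioned("PLACE")),
        "mentioned_with_family": float(co_mentioned("FAMILY")),
    }


def rule_weights(features, weight_matrix, bias):
    """One fully connected layer with a sigmoid: rule features -> rule weights."""
    x = list(features.values())
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(weight_matrix, bias)
    ]


def adjust_scores(model_scores, features, weights):
    """Add each fired rule's weight to the score of the label it votes for."""
    adjusted = dict(model_scores)
    for (rule, fired), weight in zip(features.items(), weights):
        adjusted[RULE_LABELS[rule]] += weight * fired
    return adjusted


sentence = "Arya Stark rode to Harrenhal with Jon Snow."
features = rule_features(sentence, "Harrenhal")
# Untrained (all-zero) parameters, so every rule weight is sigmoid(0) = 0.5.
n = len(features)
weights = rule_weights(features, [[0.0] * n for _ in range(n)], [0.0] * n)
# Hypothetical tagger scores for the candidate before rule adjustment.
adjusted = adjust_scores({"PERSON": 0.4, "PLACE": 0.5, "FAMILY": 0.1}, features, weights)
best_label = max(adjusted, key=adjusted.get)
print(features, adjusted, best_label)
```

In this toy case both the preposition rule and the character mention rule fire, and the preposition rule tips the candidate toward the place label. In a trained system the layer parameters would be learned together with the tagger rather than fixed at zero.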
In this embodiment, the visual analysis and display module for literature entities includes:
a character relationship network visualization unit, which displays the complex network of character relationships in the literary work and estimates the strength of the relationship between two characters by counting how many times they appear together in the same sentence; using the open-source visualization tool D3.js, it takes character names as nodes and the relationship strength values between characters as edge weights, and presents the character relationship network visually.
In a specific example, the character relationship network visualization unit is implemented as follows:
Step 1: Construct character relationships. Based on the fine-grained literature entity data set, the number of interactions between characters is counted; for example, if two character entities appear in the same sentence of the novel, one interaction between those two characters is recorded.
Step 2: Form the character relationship data set. The counted character interaction data are organized into a database table structure suitable for visualization, so that they can be read and rendered at display time.
Step 3: Visual display. An interactive character relationship network diagram is built with the open-source visualization library D3.js, as shown in FIG. 3. Character nodes represent character entities and carry basic information about each character, including name, nickname, family and character profile. Relationships between characters are drawn as links between character nodes, and the links are labelled to enrich the content of the network.
Step 4: Interaction. In the visualization, the user can click a node in the network diagram to view that character's information, including name, nickname, family and character profile; the user can also click an edge between nodes to see the strength of the relationship between the two characters, and so understand the complexity of the character relationships.
In this embodiment, the visual analysis module for literature entities further includes:
a character movement trajectory visualization unit, which displays the movement trajectories and important events of characters in the literary work; it extracts the place entities from the passages involving each character, by character name and order of appearance, to build a character trajectory data set; extracts important events with a rule-based matching method; and draws a line chart of each character's movement trajectory using the open-source visualization tool ECharts; the unit also integrates interactive functions for the storylines and places of different chapters, providing a reading navigation tool that helps readers understand and follow complex character relationships and story development.
In a specific example, the character movement trajectory visualization unit is implemented as follows:
Step 1: Extract character trajectory data. For A Song of Ice and Fire, named entity recognition is used to extract the character name from each chapter title; the text of the relevant chapters is then checked against the place names on the Ice and Fire Wikipedia to extract the places where the character appears.
Step 2: Find a story map of A Song of Ice and Fire and manually label each place on the map with its coordinates.
Step 3: The character, place and family information obtained through named entity recognition is organized into JSON format and used as the data source for the character trajectory visualization.
Step 4: An interactive line chart and scatter plot (shown in FIG. 4) are implemented with the open-source visualization library ECharts to show how character trajectories change. Each character has its own trajectory, and an intersection of trajectories indicates that the characters involved are linked by some event.
Step 5: The trajectory visualization page also integrates chapter and place interactions, so that readers can quickly find places and events and jump to the corresponding chapter. This function serves as a navigation tool that guides the reader through the reading process.
In this embodiment, the visual analysis and display module for literature entities further includes:
a character emotion change visualization unit, which shows how characters' emotions change as the plot of the literary work develops; the unit uses the open-source tool NLTK to extract sentences describing characters, performs emotion analysis on those sentences, and computes each character's score in the different emotion dimensions; it then uses the open-source visualization tool ECharts to present, as a line chart, how each character's emotions change across chapters.
In a specific example, the character emotion change visualization unit is implemented as follows:
Step 1: Use the open-source tool NLTK to extract sentences describing characters from the text and perform emotion analysis. The NRC emotion lexicon published by the National Research Council of Canada is used, which covers the emotion categories joy, fear, sadness, anger, surprise, disgust, trust and anticipation. Each character's score in the different emotion dimensions is computed by determining the emotion categories present in the sentences.
Step 2: Organize the character emotion analysis results and create an emotion data set containing character names and their corresponding emotion scores.
Step 3: An interactive line chart (shown in FIG. 5) is implemented with ECharts to display each character's emotion scores and how they change. Each character has a separate emotion line chart, and the magnitude of a given emotion score is reflected in the height of the line.
Step 4: The system provides a sliding time axis with which the user can select different time ranges as needed, to observe how plot development affects a character's emotions.
In this embodiment, the visual analysis and display module for literature entities further includes:
a character appearance frequency visualization unit, which shows how the frequency of each character's appearances changes from chapter to chapter in the literary work; a character appearance frequency data set is built by counting how many times each character appears in each chapter, and a line chart of appearance frequency is drawn using the open-source visualization tool ECharts, in which the user can drag a time axis to see how often a given character appears across the whole novel, helping the user understand how the character's importance changes as the plot develops.
In a specific example, the character appearance frequency visualization unit is implemented as follows:
Step 1: Following the chapter division of the novel, count the number of times each character appears in each chapter and build the character appearance frequency data set.
Step 2: Draw a line chart of character appearance frequency with ECharts, as shown in FIG. 6, in which the horizontal axis represents chapters and the vertical axis represents appearance frequency; the user can drag the time axis to see how often a given character appears as the plot develops across the whole novel, which helps readers understand how the character's importance changes over the course of the story.
Claims (3)
1. A visual reading system for long literature based on cross-domain named entity recognition, comprising:
a data acquisition module, which acquires the source text of a literary work from electronic resources by using a Python web crawler program;
a literature entity recognition module, which uses an open-source named entity recognition model to recognize character entities, place entities, and family entities in the source text and generates a coarse-grained entity data set;
a literature entity optimization module, which trains a cross-domain named entity recognition model based on parameter migration using the coarse-grained entity data set, optimizes the model's results by introducing context constraint rules for specific entity types and designing a rule weight network, and then recognizes the source text of the literary work again to generate fine-grained character entity, place entity, and family entity data sets;
a visual analysis and display module for literature entities, which uses open-source visualization tools to visually analyze and display the fine-grained literature entity data sets, and comprises four units: character relation network visualization, character movement track visualization, character emotion change visualization, and character appearance frequency visualization.
2. The visual reading system for long literature based on cross-domain named entity recognition of claim 1, wherein the context constraint rules for specific entity types in the literature entity optimization module include rules for identifying character entities, rules for identifying place entities, and rules for identifying family entities.
3. The visual reading system for long literature based on cross-domain named entity recognition of claim 1, wherein the visual analysis and display module for literature entities comprises:
a character relation network visualization unit, which shows the network of relationships among characters in a literary work; the strength of the relationship between two characters is evaluated by counting the number of times they appear together in the same sentence; using the open-source visualization tool D3.js, the character relation network is displayed visually with character names as nodes and the relationship strength values between characters as edge weights;
a character movement track visualization unit, which displays the movement tracks and important events of characters in a literary work; place entities are extracted from the passages in which a character appears, according to the character's name and in order of appearance, to construct a character track data set; a line chart of the character's movement track is drawn with the open-source visualization tool ECharts; important events in the literary work are extracted by a rule-based matching method, and the summary, place, and chapter information of each important event is obtained; the summaries, places, and chapters of the important events are integrated into the character movement track chart, providing a reading navigation tool that helps readers understand and follow the characters' tracks and the plot development;
a character emotion change visualization unit, which displays how characters' emotions change as the plot of a literary work develops; sentences describing the characters are extracted with the open-source tool NLTK, and emotion analysis is performed on the extracted sentences, that is, each character's score on different emotion dimensions is calculated; the open-source visualization tool ECharts is then used to present, as a line chart, how each character's emotions change across chapters;
a character appearance frequency visualization unit, which shows how the frequency of each character's appearances changes as the chapters of a literary work progress; a character appearance frequency data set is constructed by counting the number of times each character appears in each chapter; a line chart of appearance frequency is drawn with the open-source visualization tool ECharts, and by dragging a time axis the user can check how many times a character appears across the whole novel, which helps readers understand how the character's number of appearances changes as the plot develops.
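As an illustration of the same-sentence co-occurrence counting recited for the character relation network, the following Python sketch builds a node and edge list from raw text. It is only a sketch under stated assumptions: sentences are split naively on end punctuation, characters are matched by exact name, and the `nodes`/`links` output shape is an assumed format of the kind often fed to a D3.js force layout, not one specified in the claims. The function name `cooccurrence_graph` and the sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_graph(text, names):
    """Count same-sentence co-occurrences and return a nodes/links dict."""
    weights = Counter()
    # Naive sentence split on Western and Chinese end punctuation.
    for sentence in re.split(r"[.!?。！？]+", text):
        present = sorted(n for n in names if n in sentence)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    nodes = [{"id": n} for n in names]
    links = [
        {"source": a, "target": b, "value": w}
        for (a, b), w in sorted(weights.items())
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical text with three characters.
text = "Alice met Bob. Bob and Carol argued with Alice! Carol left."
graph = cooccurrence_graph(text, ["Alice", "Bob", "Carol"])
print(graph["links"])
# [{'source': 'Alice', 'target': 'Bob', 'value': 2},
#  {'source': 'Alice', 'target': 'Carol', 'value': 1},
#  {'source': 'Bob', 'target': 'Carol', 'value': 1}]
```

The `value` field plays the role of the relationship strength used as the edge weight.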
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311298279.1A CN117473078A (en) | 2023-10-09 | 2023-10-09 | Visual reading system of long literature based on cross-domain named entity recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311298279.1A CN117473078A (en) | 2023-10-09 | 2023-10-09 | Visual reading system of long literature based on cross-domain named entity recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117473078A (en) | 2024-01-30 |
Family
ID=89636954
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311298279.1A Pending CN117473078A (en) | 2023-10-09 | 2023-10-09 | Visual reading system of long literature based on cross-domain named entity recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117473078A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118170919A (en) * | 2024-05-13 | 2024-06-11 | 南昌理工学院 | Method and system for classifying literary works |
CN118170919B (en) * | 2024-05-13 | 2024-07-19 | 南昌理工学院 | Method and system for classifying literary works |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11551567B2 (en) | System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter | |
CN109189942B (en) | Construction method and device of patent data knowledge graph | |
CN104915446B (en) | Event Evolvement extraction method and its system based on news | |
CN111444326B (en) | Text data processing method, device, equipment and storage medium | |
Gu et al. | "What parts of your apps are loved by users?" (T) | |
Sung et al. | CRIE: An automated analyzer for Chinese texts | |
US20180366013A1 (en) | System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter | |
CN110674271B (en) | Question and answer processing method and device | |
CN106951438A (en) | A kind of event extraction system and method towards open field | |
CN105512687A (en) | Emotion classification model training and textual emotion polarity analysis method and system | |
CN110612524B (en) | Information processing apparatus, information processing method, and recording medium | |
CN111209384A (en) | Question and answer data processing method and device based on artificial intelligence and electronic equipment | |
CN103064956A (en) | Method, computing system and computer-readable storage media for searching electric contents | |
CN111291210A (en) | Image material library generation method, image material recommendation method and related device | |
CN110825867B (en) | Similar text recommendation method and device, electronic equipment and storage medium | |
US20110231448A1 (en) | Device and method for generating opinion pairs having sentiment orientation based impact relations | |
US20160117954A1 (en) | System and method for automated teaching of languages based on frequency of syntactic models | |
CN117473078A (en) | Visual reading system of long literature based on cross-domain named entity recognition | |
CN115599899A (en) | Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph | |
CN114661872A (en) | Beginner-oriented API self-adaptive recommendation method and system | |
Da et al. | Deep learning based dual encoder retrieval model for citation recommendation | |
Morie et al. | Information extraction model to improve learning game metadata indexing | |
CN116578697A (en) | Finance-oriented language emotion analysis and labeling method | |
CN116257618A (en) | Multi-source intelligent travel recommendation method based on fine granularity emotion analysis | |
CN114780755A (en) | Playing data positioning method and device based on knowledge graph and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||