CN109508382A - Tag annotation method and apparatus, and computer-readable storage medium - Google Patents

Tag annotation method and apparatus, and computer-readable storage medium

Info

Publication number
CN109508382A
Authority
CN
China
Prior art keywords
entity
tag types
marked
text
recorded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811221612.8A
Other languages
Chinese (zh)
Other versions
CN109508382B (en)
Inventor
徐安华
张亚启
欧阳佑
路德龙
马瑞璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201811221612.8A
Publication of CN109508382A
Application granted
Publication of CN109508382B
Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

This application discloses a tag annotation method and apparatus, and a computer-readable storage medium. The method comprises: detecting whether an entity in a text to be annotated is a pre-recorded entity; if it is a pre-recorded entity, obtaining the tag type state chain corresponding to the pre-recorded entity, the tag type state chain being used to store the sequence of tag types already annotated; and automatically annotating tag types for the entity in the text to be annotated according to the tag type state chain. By using the tag type state chains of pre-recorded entities to annotate repeated entities automatically, the application improves the efficiency of tag annotation, reduces the number of operations annotators must perform, and makes the tool friendlier to use.

Description

Tag annotation method and apparatus, and computer-readable storage medium
Technical field
The present invention relates to the technical field of natural language processing (NLP), and in particular to a tag annotation method and apparatus, and a computer-readable storage medium.
Background
With the spread of big data and artificial intelligence (AI), natural language processing technology is used more and more in enterprise-level applications. Many large companies already offer models for part-of-speech tagging, entity recognition, relation recognition and the like as Hypertext Transfer Protocol (HTTP) services, but the vast majority of the natural language processing models behind these services are trained on Internet data. Text on the Internet comes from a wide range of sources, including professional media and content written by individual users, and it differs considerably in vocabulary and writing style from the text found inside an enterprise. For natural language processing to perform well in enterprise-level applications, it is therefore usually necessary to annotate the enterprise's own text and retrain a model suited to the enterprise's needs.
Important NLP tasks such as part-of-speech tagging and entity recognition both require annotating the enterprise's text data before a model can be trained. In entity annotation, many annotated entities recur in large numbers across different texts. When an already-annotated entity appears again, it should therefore, with high probability, be annotated in the new text as well. To spare users a large amount of repetitive work, it is worth learning an annotation strategy from the annotation history, which raises the problem of assisting the annotation of repeated entities.
Summary of the invention
Embodiments of the present invention provide a tag annotation method and apparatus, and a computer-readable storage medium, which can improve the efficiency of tag annotation.
To solve the above technical problem, the technical solution of the embodiments of the present invention is implemented as follows:
An embodiment of the present invention provides a tag annotation method, comprising:
detecting whether an entity in a text to be annotated is a pre-recorded entity;
if it is a pre-recorded entity, obtaining the tag type state chain corresponding to the pre-recorded entity, the tag type state chain being used to store the sequence of tag types already annotated; and automatically annotating tag types for the entity in the text to be annotated according to the tag type state chain.
In one embodiment, the length of the tag type state chain is M, where M is a natural number, and automatically annotating tag types for the entity in the text to be annotated according to the tag type state chain comprises:
counting the number of times N that the pre-recorded entity occurs in the text to be annotated, where N is a natural number;
if N is less than or equal to M, annotating the N occurrences of the pre-recorded entity in the text to be annotated, in order, with the first N tag types in the tag type state chain;
if N is greater than M, annotating the first M occurrences of the pre-recorded entity in the text to be annotated, in order, with the M tag types in the tag type state chain, and annotating the (M+1)-th to N-th occurrences with the M-th tag type in the tag type state chain.
In one embodiment, before the method, the method further comprises:
segmenting the text to be annotated according to the pre-recorded entities.
In one embodiment, the text to be annotated is segmented with the forward maximum matching algorithm, which is specifically: with the pre-recorded entities as the segmentation dictionary, the longest run of consecutive characters in the text to be annotated that matches an entry in the segmentation dictionary is selected as a token.
In one embodiment, the method further comprises:
detecting whether the tag type of an entity in the text to be annotated has been updated, and whether the entity whose tag type was updated is a pre-recorded entity;
if the tag type of an entity in the text to be annotated has been updated and that entity is not a pre-recorded entity, recording the entity and its corresponding tag type state chain;
if the tag type of an entity in the text to be annotated has been updated and that entity is a pre-recorded entity, modifying the tag type state chain corresponding to the entity according to the updated tag type.
In one embodiment, when the i-th occurrence of a recorded entity in the text to be annotated has no tag type annotated, the i-th tag type in the tag type state chain corresponding to that recorded entity is empty, where i is a natural number.
An embodiment of the present invention further provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the tag annotation method described in any of the above.
An embodiment of the present invention further provides a tag annotation apparatus comprising a processor and a memory, wherein:
the processor is configured to execute a tag annotation program stored in the memory to implement the steps of the tag annotation method described in any of the above.
An embodiment of the present invention further provides a tag annotation apparatus comprising a storage module, a detection module and an automatic annotation module, wherein:
the storage module is configured to store pre-recorded entities and the tag type state chains corresponding to the entities, the tag type state chain being used to store the sequence of tag types already annotated;
the detection module is configured to detect whether an entity in the text to be annotated is an entity pre-recorded in the storage module and, if it is, to notify the automatic annotation module;
the automatic annotation module is configured to receive the notification from the detection module, obtain the tag type state chain corresponding to the entity pre-recorded in the storage module, and automatically annotate tag types for the entity in the text to be annotated according to the tag type state chain.
In one embodiment, the tag annotation apparatus further comprises a recording module, wherein:
the detection module is further configured to detect whether the tag type of an entity in the text to be annotated has been updated and whether the entity whose tag type was updated is a pre-recorded entity; if the tag type has been updated and the entity is not an entity pre-recorded in the storage module, to send a first notification to the recording module; and if the tag type has been updated and the entity is an entity pre-recorded in the storage module, to send a second notification to the recording module;
the recording module is configured to receive the first notification from the detection module and record the entity and its corresponding tag type state chain in the storage module; and to receive the second notification from the detection module and, according to the updated tag type, modify the tag type state chain corresponding to the entity recorded in the storage module.
The technical solution of the embodiments of the present invention has the following beneficial effects:
The tag annotation method and apparatus and the computer-readable storage medium provided by the embodiments of the present invention automatically annotate tag types for entities in the text to be annotated according to the tag type state chains of pre-recorded entities. Repeated entities are annotated automatically, which improves the efficiency of tag annotation, reduces the number of operations annotators must perform, and makes the tool friendlier to use.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow diagram of a tag annotation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a text that has been annotated with tags according to an embodiment of the present invention;
Fig. 3 is a structural diagram of some of the tag type state chains recorded for the text in Fig. 2;
Fig. 4 is a schematic diagram of another text that has been annotated with tags according to an embodiment of the present invention;
Fig. 5 is a structural diagram of the tag type state chain recorded for the text in Fig. 4;
Fig. 6 is a structural diagram of a tag annotation apparatus according to an embodiment of the present invention;
Fig. 7 is a structural diagram of another tag annotation apparatus according to an embodiment of the present invention;
Fig. 8 is a structural diagram of yet another tag annotation apparatus according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the drawings. It should be noted that, where they do not conflict, the embodiments in this application and the features in the embodiments may be combined with one another in any way.
Natural language processing is the general name for a broad class of problems that process and transform data such as speech and text and extract information from it. Entity here refers mainly to the entities of named entity recognition (NER) in natural language processing, but is not limited to named entities. Relation here refers mainly to the relation between one entity and another in natural language processing. Entity recognition extracts entities carrying specific semantic information from the input text, such as person names, dates, places and organizations. Relation recognition extracts from the input text the relations between entities carrying specific semantic information, such as parent-child, employment, office-holding and geographic relations. Training, in machine learning, is the process by which a machine updates model parameters according to training data and a loss function. Chinese word segmentation (CWS) cuts a sequence of Chinese characters into individual words; segmentation is the process of recombining a continuous character sequence into a word sequence according to a given specification.
As shown in Fig. 1, a tag annotation method according to an embodiment of the present invention comprises the following steps:
Step 101: detect whether an entity in the text to be annotated is a pre-recorded entity.
For example, an annotated text is shown in Fig. 2, and some of the recorded entities and their corresponding tag type state chains are shown in Fig. 3. The recorded entities include "Tesla", "radio", "radio remote-control boat", "radio remote-control technology", "radio communication" and so on, and each recorded entity corresponds to its own tag type state chain.
In one embodiment of the present invention, before the method, the method further comprises:
segmenting the text to be annotated according to the pre-recorded entities.
Before the text to be annotated is segmented, a vocabulary is prepared in advance in which every word is a pre-recorded entity. During segmentation, each vocabulary word is picked out of the text to be annotated one by one.
In one embodiment of the present invention, the text to be annotated is segmented with the forward maximum matching algorithm, which is specifically: with the pre-recorded entities as the segmentation dictionary, the longest run of consecutive characters in the text to be annotated that matches an entry in the segmentation dictionary is selected as a token.
For example, in the sentence from Fig. 2, "He built the world's first radio remote-control boat, and the radio remote-control technology was patented", the word "radio" is in the vocabulary. However, because the characters that follow combine with "radio" to form the longer words "radio remote-control boat" and "radio remote-control technology" (this is forward maximum matching), the tokens selected are "radio remote-control boat" and "radio remote-control technology".
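The forward maximum matching described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `forward_max_match`, the character-level scan and the English sample strings (standing in for the Chinese originals) are assumptions made for the example.

```python
def forward_max_match(text, lexicon):
    """Return (start, entity) pairs found by forward maximum matching.

    `lexicon` plays the role of the segmentation dictionary, i.e. the
    pre-recorded entities. Names here are illustrative assumptions.
    """
    words = set(lexicon)
    max_len = max((len(w) for w in words), default=0)
    hits = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in words:
                match = candidate
                break
        if match is None:
            i += 1                      # no entity starts here; move on
        else:
            hits.append((i, match))
            i += len(match)             # continue after the matched entity
    return hits


lexicon = ["radio", "radio remote-control boat", "radio remote-control technology"]
sentence = "radio remote-control boat and radio"
print(forward_max_match(sentence, lexicon))
```

Because the scan is character-based and performs no word-boundary check, it suits unsegmented Chinese text; for space-delimited languages a boundary check would probably be needed.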
Step 102: if it is a pre-recorded entity, obtain the tag type state chain corresponding to the pre-recorded entity, the tag type state chain being used to store the sequence of tag types already annotated; and automatically annotate tag types for the entity in the text to be annotated according to the tag type state chain.
In one embodiment of the present invention, the length of the tag type state chain is M, where M is a natural number, and automatically annotating tag types for the entity in the text to be annotated according to the tag type state chain in step 102 comprises:
counting the number of times N that the pre-recorded entity occurs in the text to be annotated, where N is a natural number;
if N is less than or equal to M, annotating the N occurrences of the pre-recorded entity in the text to be annotated, in order, with the first N tag types in the tag type state chain;
if N is greater than M, annotating the first M occurrences of the pre-recorded entity in the text to be annotated, in order, with the M tag types in the tag type state chain, and annotating the (M+1)-th to N-th occurrences with the M-th tag type in the tag type state chain.
For example, in the annotated text shown in Fig. 4, the tag type state chain corresponding to the recorded entity "watt" is shown in Fig. 5. When the next text is annotated, the first occurrence of "watt" is annotated with the tag type Unit, the second occurrence with Company, and the third and all later occurrences with Name.
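The N versus M rule of step 102 amounts to indexing the chain with the occurrence number and clamping at the last entry. A minimal sketch, assuming a hypothetical helper `apply_chain` and a shortened three-entry chain for "watt" (the chain in Fig. 5 may be longer):

```python
def apply_chain(chain, n_occurrences):
    """Return one tag type per occurrence of an entity.

    `chain` is the recorded tag type state chain (length M) and
    `n_occurrences` is N. Occurrences beyond the end of the chain
    reuse the M-th (last) tag type.
    """
    if not chain:
        # Assumption: an entity with an empty chain is left untagged.
        return [None] * n_occurrences
    last = len(chain) - 1
    return [chain[min(i, last)] for i in range(n_occurrences)]


watt_chain = ["Unit", "Company", "Name"]   # illustrative chain for "watt"
print(apply_chain(watt_chain, 2))          # N <= M: first N tag types
print(apply_chain(watt_chain, 5))          # N > M: last tag type repeated
```

Clamping the index covers both branches of the rule in one expression, so no separate code path is needed for N greater than M.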
In one embodiment of the present invention, the method further comprises:
detecting whether the tag type of an entity in the text to be annotated has been updated, and whether the entity whose tag type was updated is a pre-recorded entity;
if the tag type of an entity in the text to be annotated has been updated and that entity is not a pre-recorded entity, recording the entity and its corresponding tag type state chain;
if the tag type of an entity in the text to be annotated has been updated and that entity is a pre-recorded entity, modifying the tag type state chain corresponding to the entity according to the updated tag type.
When the tag annotation method of the embodiment of the present invention is used, if a tag type state chain has already been recorded for the entity currently being annotated, the recorded state is updated or overwritten: when the existing chain is no longer than the new chain, the new chain overwrites the existing chain; when the existing chain is longer than the new chain, the new chain updates the leading part of the existing chain, over a length equal to that of the new chain. If no tag type state chain has been recorded for the entity, a new record is created.
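The update, overwrite and create cases above can be sketched as a single function. This is an illustration under assumptions: the store is modelled as a plain dictionary from entity string to chain, and the name `record_chain` is hypothetical.

```python
def record_chain(store, entity, new_chain):
    """Create, overwrite or partially update the chain recorded for `entity`.

    `store` maps entity strings to tag type state chains (lists).
    """
    existing = store.get(entity)
    if existing is None:
        # No record yet: create one.
        store[entity] = list(new_chain)
    elif len(existing) <= len(new_chain):
        # Existing chain is no longer than the new one: overwrite it.
        store[entity] = list(new_chain)
    else:
        # Existing chain is longer: replace only its leading part and
        # keep the remaining tail of the old record.
        store[entity] = list(new_chain) + existing[len(new_chain):]
    return store[entity]


demo_store = {}
record_chain(demo_store, "watt", ["Unit", "Company", "Name"])
record_chain(demo_store, "watt", ["Name"])
print(demo_store["watt"])
```

Keeping the tail of a longer existing chain means that annotating a short text does not discard what was learned from a longer one.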
For example, let Mention be the array of entities to be annotated in the text, where Mention[j] is the j-th entity to be annotated, generally a character string in the text. Let Tag be the array of tag types, where Tag[s] is the name of the s-th tag type; tag types may be person name, address, date, company, invention, unit and so on. Each occurrence of Mention[j] in the text corresponds to one tag type, which may be empty. Position is the position of an entity to be annotated: the k-th position at which the j-th entity occurs in the text is recorded as Position[j][k]. MentionTag[j][k] is the tag type of the j-th entity at its k-th position in the text, where j and k are integers greater than or equal to 0. The process of updating or overwriting an existing record and of creating a new record is as follows:
For every position Position[j][k] at which Mention[j] occurs, obtain the state MentionTag[j][k]. If the occurrence is not annotated, MentionTag[j][k] = None; if it is annotated as Tag[s] or modified to Tag[s], where s is an integer greater than or equal to 0, then MentionTag[j][k] = Tag[s]. For the entity "watt" in Fig. 4, the recorded entity, tag types and tag type state chain are as follows:
Mention[0] = 'watt'; Position[0][k] (k = 0 to 6); Tag[0] = 'Unit'; Tag[1] = 'Company'; Tag[2] = 'Name'; MentionTag[0][0] = Tag[0], MentionTag[0][1] = Tag[1], MentionTag[0][2] = Tag[2], MentionTag[0][3] = Tag[2], MentionTag[0][4] = Tag[2], MentionTag[0][5] = Tag[2], MentionTag[0][6] = Tag[2].
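One way to derive the MentionTag chains from an annotated text is to sort the annotated occurrences by position and append each tag type, using None for occurrences that carry no tag. The sketch below assumes annotations arrive as `(position, entity, tag)` tuples; that input format, the sample positions and the name `build_chains` are illustrative assumptions, not something the source specifies.

```python
from collections import defaultdict


def build_chains(annotations):
    """Build one tag type state chain per entity from an annotated text.

    `annotations` is an iterable of (position, entity, tag) tuples, where
    `tag` is None for an occurrence that was not annotated.
    """
    chains = defaultdict(list)
    # Order occurrences by their position in the text, as Position[j][k] does.
    for _position, entity, tag in sorted(annotations, key=lambda a: a[0]):
        chains[entity].append(tag)
    return dict(chains)


# Seven occurrences of "watt", mirroring MentionTag[0][0..6] above.
tags = ["Unit", "Company", "Name", "Name", "Name", "Name", "Name"]
annotated = [(10 * k, "watt", tag) for k, tag in enumerate(tags)]
print(build_chains(annotated))
```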
When the currently recorded tag type state chains are applied to a text to be annotated, all recorded entity names Mention[j] are found first. Forward maximum matching is then performed with all recorded Mention[j] over the unannotated content of the full text (the annotated content may be empty, or may be the result of applying some other, higher-priority tag type state chain) to find every position Position[j][k] at which a Mention[j] occurs. The content stored in the currently recorded tag type state chain MentionTag[j][k] is then used for annotation.
When MentionTag[j][k] is applied to each occurrence of Mention[j] in the text, if the number of occurrences k1 of Mention[j] in the text is less than or equal to the length k2 of the tag type state chain, the first k1 tag types in the chain are used to annotate the 1st to k1-th occurrences of Mention[j] in the text, in order. If k1 is greater than k2, the k2 tag types in the chain are used to annotate the 1st to k2-th occurrences of Mention[j] in order, and the (k2+1)-th to k1-th occurrences are annotated with the k2-th tag type MentionTag[j][k2-1], where k1 and k2 are natural numbers.
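Matching over the unannotated content and then applying the chains can be combined into one pass. The sketch below is an interpretation under stated assumptions: already-annotated content is modelled as half-open character spans in `locked`, matches overlapping a locked span are skipped and do not count towards the occurrence index, and the name `auto_annotate` is hypothetical. The source does not specify these details.

```python
def auto_annotate(text, chains, locked=()):
    """Annotate recorded entities in `text`, skipping locked character spans.

    `chains` maps entity strings to tag type state chains; `locked` holds
    (start, end) spans, end exclusive, that are already annotated.
    Returns a list of (start, entity, tag) tuples.
    """
    blocked = set()
    for start, end in locked:
        blocked.update(range(start, end))
    longest = max((len(e) for e in chains), default=0)
    seen = {}      # entity -> occurrences annotated so far
    result = []
    i = 0
    while i < len(text):
        hit = None
        # Forward maximum matching: longest candidate first.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in chains and blocked.isdisjoint(range(i, i + size)):
                hit = candidate
                break
        if hit is None:
            i += 1
            continue
        chain = chains[hit]
        k = seen.get(hit, 0)
        # Occurrences past the end of the chain reuse its last tag type.
        tag = chain[min(k, len(chain) - 1)] if chain else None
        result.append((i, hit, tag))
        seen[hit] = k + 1
        i += len(hit)
    return result


print(auto_annotate("watt watt watt watt", {"watt": ["Unit", "Company"]}, locked=[(5, 9)]))
```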
In one embodiment of the present invention, when the i-th occurrence of a recorded entity in the text to be annotated has no tag type annotated (the tag type of this occurrence may have been deleted, or it may never have been annotated), the i-th tag type in the tag type state chain corresponding to that recorded entity is empty, where i is a natural number.
An embodiment of the present invention further provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the tag annotation method described in any of the above.
An embodiment of the present invention further provides a tag annotation apparatus comprising a processor and a memory, wherein:
the processor is configured to execute a tag annotation program stored in the memory to implement the steps of the tag annotation method described in any of the above.
As shown in Fig. 6, a tag annotation apparatus according to an embodiment of the present invention comprises a storage module 601, a detection module 602 and an automatic annotation module 603, wherein:
the storage module 601 is configured to store pre-recorded entities and the tag type state chains corresponding to the entities, the tag type state chain being used to store the sequence of tag types already annotated;
the detection module 602 is configured to detect whether an entity in the text to be annotated is an entity pre-recorded in the storage module 601 and, if it is, to notify the automatic annotation module 603;
the automatic annotation module 603 is configured to receive the notification from the detection module 602, obtain the tag type state chain corresponding to the entity pre-recorded in the storage module 601, and automatically annotate tag types for the entity in the text to be annotated according to the tag type state chain.
For example, an annotated text is shown in Fig. 2, and some of the recorded entities and their corresponding tag type state chains are shown in Fig. 3. The recorded entities include "Tesla", "radio", "radio remote-control boat", "radio remote-control technology", "radio communication" and so on, and each recorded entity corresponds to its own tag type state chain.
In one embodiment of the present invention, the length of the tag type state chain is M, where M is a natural number, and the automatic annotation module 603 automatically annotating tag types for the entity in the text to be annotated according to the tag type state chain comprises:
counting the number of times N that the pre-recorded entity occurs in the text to be annotated, where N is a natural number;
if N is less than or equal to M, annotating the N occurrences of the pre-recorded entity in the text to be annotated, in order, with the first N tag types in the tag type state chain;
if N is greater than M, annotating the first M occurrences of the pre-recorded entity in the text to be annotated, in order, with the M tag types in the tag type state chain, and annotating the (M+1)-th to N-th occurrences with the M-th tag type in the tag type state chain.
For example, in the annotated text shown in Fig. 4, the tag type state chain corresponding to the recorded entity "watt" is shown in Fig. 5. When the automatic annotation module 603 annotates the next text, the first occurrence of "watt" is annotated with the tag type Unit, the second occurrence with Company, and the third and all later occurrences with Name.
In one embodiment of the present invention, as shown in Fig. 7, the tag annotation apparatus further comprises a word segmentation module 604, wherein:
the word segmentation module 604 is configured to segment the text to be annotated according to the entities pre-recorded in the storage module 601.
The word segmentation module 604 prepares a vocabulary in advance in which every word is a pre-recorded entity and, according to the vocabulary, picks each vocabulary word out of the text to be annotated one by one.
In this embodiment, the word segmentation module 604 segments the text to be annotated with the forward maximum matching algorithm, which is specifically: with the pre-recorded entities as the segmentation dictionary, the longest run of consecutive characters in the text to be annotated that matches an entry in the segmentation dictionary is selected as a token.
For example, in the sentence from Fig. 2, "He built the world's first radio remote-control boat, and the radio remote-control technology was patented", the word "radio" is in the vocabulary. However, because the characters that follow combine with "radio" to form the longer words "radio remote-control boat" and "radio remote-control technology" (this is forward maximum matching), the tokens selected by the word segmentation module 604 are "radio remote-control boat" and "radio remote-control technology".
In one embodiment of the present invention, as shown in Fig. 8, the tag annotation apparatus further comprises a recording module 605, wherein:
the detection module 602 is further configured to detect whether the tag type of an entity in the text to be annotated has been updated and whether the entity whose tag type was updated is a pre-recorded entity; if the tag type has been updated and the entity is not an entity pre-recorded in the storage module 601, to send a first notification to the recording module 605; and if the tag type has been updated and the entity is an entity pre-recorded in the storage module 601, to send a second notification to the recording module 605;
the recording module 605 is configured to receive the first notification from the detection module 602 and record the entity and its corresponding tag type state chain in the storage module 601; and to receive the second notification from the detection module 602 and, according to the updated tag type, modify the tag type state chain corresponding to the entity recorded in the storage module 601.
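The division of work between the modules might be wired together as in the sketch below. It is one possible reading, not the patented apparatus: the class and method names are hypothetical, notifications are modelled as direct method calls, and the word segmentation module 604 is omitted for brevity.

```python
class StorageModule:
    """Module 601: holds recorded entities and their tag type state chains."""
    def __init__(self):
        self.chains = {}   # entity string -> list of tag types


class AutoAnnotationModule:
    """Module 603: annotates occurrences from the recorded chain."""
    def __init__(self, storage):
        self.storage = storage

    def annotate(self, entity, occurrences):
        chain = self.storage.chains[entity]
        if not chain:
            return [None] * occurrences
        return [chain[min(i, len(chain) - 1)] for i in range(occurrences)]


class RecordingModule:
    """Module 605: creates or modifies records in the storage module."""
    def __init__(self, storage):
        self.storage = storage

    def create(self, entity, chain):     # reaction to the first notification
        self.storage.chains[entity] = list(chain)

    def modify(self, entity, chain):     # reaction to the second notification
        old = self.storage.chains[entity]
        # The new chain replaces the front; any longer old tail is kept.
        self.storage.chains[entity] = list(chain) + old[len(chain):]


class DetectionModule:
    """Module 602: decides which module to notify."""
    def __init__(self, storage, annotator, recorder):
        self.storage, self.annotator, self.recorder = storage, annotator, recorder

    def entity_seen(self, entity, occurrences):
        if entity in self.storage.chains:
            return self.annotator.annotate(entity, occurrences)
        return None                       # not pre-recorded: nothing to apply

    def tags_updated(self, entity, chain):
        if entity in self.storage.chains:
            self.recorder.modify(entity, chain)
        else:
            self.recorder.create(entity, chain)


storage = StorageModule()
detector = DetectionModule(storage, AutoAnnotationModule(storage), RecordingModule(storage))
detector.tags_updated("watt", ["Unit", "Company", "Name"])
print(detector.entity_seen("watt", 4))
```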
When the tag annotation apparatus of the embodiment of the present invention is used, if a state has already been recorded for the entity currently being annotated, the recording module 605 updates or overwrites the recorded state: when the existing tag type state chain is no longer than the new chain, the new chain overwrites the existing chain; when the existing chain is longer than the new chain, the new chain updates the leading part of the existing chain, over a length equal to that of the new chain. If no state has been recorded, the recording module 605 creates a new record.
For example, let Mention be the array of entities to be annotated in the text, where Mention[j] is the j-th entity to be annotated, generally a character string in the text. Let Tag be the array of tag types, where Tag[s] is the name of the s-th tag type; tag types may be person name, address, date, company, invention, unit and so on. Each occurrence of Mention[j] in the text corresponds to one tag type, which may be empty. Position is the position of an entity to be annotated: the k-th position at which the j-th entity occurs in the text is recorded as Position[j][k]. MentionTag[j][k] is the tag type of the j-th entity at its k-th position in the text, where j and k are integers greater than or equal to 0. The process by which the recording module 605 updates or overwrites an existing record and creates a new record is as follows:
For every position Position[j][k] at which Mention[j] occurs, obtain the state MentionTag[j][k]. If the occurrence is not annotated, MentionTag[j][k] = None; if it is annotated as Tag[s] or modified to Tag[s], where s is an integer greater than or equal to 0, then MentionTag[j][k] = Tag[s]. For the entity "watt" in Fig. 4, the recorded entity, tag types and tag type state chain are as follows:
Mention[0] = 'watt'; Position[0][k] (k = 0 to 6); Tag[0] = 'Unit'; Tag[1] = 'Company'; Tag[2] = 'Name'; MentionTag[0][0] = Tag[0], MentionTag[0][1] = Tag[1], MentionTag[0][2] = Tag[2], MentionTag[0][3] = Tag[2], MentionTag[0][4] = Tag[2], MentionTag[0][5] = Tag[2], MentionTag[0][6] = Tag[2].
When the tag types state chain of current record to be applied in text to be marked, all records are first found out Entity name Mention [j], all Mention [s] recorded can be used at this time to the unlabelled content of full text (marked content can be sky, or be also possible to the tag types state chain application result of some other high priority) carries out Forward Maximum Method finds out the position Position [j] [k] of all Mention [j] appearance, then uses the mark of current record The content stored in label type state chain MentionTag [j] [k] is labeled.
When using each of MentionTag [j] [k] to text Mention [j], if Mention [j] is in the text Frequency of occurrence k1 be less than or equal to the length k2 of tag types state chain, then directly take preceding k1 in tag types state chain Tag types successively carry out automatic marking to the Mention [j] [0] to Mention [j] [k1-1] in text;If The frequency of occurrence k1 of Mention [j] in the text is greater than the length k2 of tag types state chain, then uses tag types state chain In k2 tag types automatic marking is successively carried out to the Mention [j] [0] to Mention [j] [k2-1] in text, it is right Mention [j] [k2] to Mention [j] [k1-1] in text uses 2 tag types MentionTag [j] of kth [k2-1] carries out automatic marking, wherein k1, k2 are natural number.
In one embodiment of the invention, when the i-th occurrence of an entity in the text to be marked has no tag type (the tag type of this occurrence may have been deleted, or it may never have been labeled), the i-th tag type in the tag types state chain that the logging module 605 records for that entity is empty, where i is a natural number.
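Recording an empty entry for an unlabeled occurrence can be sketched as below. The helper names `build_state_chain` and `clear_tag`, the use of character offsets as positions, and the 0-based index are assumptions for illustration only.

```python
def build_state_chain(positions, labels):
    """Build the chain for one entity.

    positions: start offsets of every occurrence of the entity.
    labels: mapping from start offset to tag type, for labeled occurrences only.
    Unlabeled occurrences are recorded as None.
    """
    return [labels.get(pos) for pos in sorted(positions)]


def clear_tag(chain, i):
    """Return a copy of the chain with the tag type of occurrence i (0-based) removed."""
    updated = list(chain)
    updated[i] = None
    return updated


chain = build_state_chain([0, 10, 25], {0: "Unit", 25: "Name"})
print(chain)                # the occurrence at offset 10 was never labeled
print(clear_tag(chain, 0))  # the tag of the first occurrence was deleted
```

Keeping the empty entries preserves the alignment between chain indices and occurrence order, so later occurrences keep their own recorded tag types.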
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method can be completed by a program instructing the relevant hardware, and that the program can be stored in a computer readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments can also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments can be implemented in hardware or in the form of a software function module. The present invention is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the present invention, and the invention may of course have other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make various corresponding changes and modifications in accordance with the invention, and all such changes and modifications shall fall within the scope of protection of the appended claims.

Claims (10)

1. A label labeling method, characterized by comprising:
Detecting whether an entity in a text to be marked is a pre-recorded entity;
If it is a pre-recorded entity, obtaining the tag types state chain corresponding to the pre-recorded entity, the tag types state chain being used to store the sequence of tag types that have been labeled; and, according to the tag types state chain, automatically labeling the entity in the text to be marked with tag types.
2. The method according to claim 1, characterized in that, assuming the length of the tag types state chain is M, where M is a natural number, the automatically labeling the entity in the text to be marked with tag types according to the tag types state chain comprises:
Counting the number of times N that the pre-recorded entity occurs in the text to be marked, where N is a natural number;
If N is less than or equal to M, using the first N tag types in the tag types state chain to label, in order, the N occurrences of the pre-recorded entity in the text to be marked;
If N is greater than M, using the M tag types in the tag types state chain to label, in order, the first M occurrences of the pre-recorded entity in the text to be marked, and using the M-th tag type in the tag types state chain to label the (M+1)-th to N-th occurrences of the pre-recorded entity in the text to be marked.
3. The method according to claim 1, characterized in that, before the method, the following is further included:
Segmenting the text to be marked according to the pre-recorded entity.
4. The method according to claim 3, characterized in that the text to be marked is segmented using a forward maximum matching algorithm, the forward maximum matching algorithm being specifically: taking the pre-recorded entities as the segmentation dictionary, and selecting as a segment the longest run of consecutive characters in the text to be marked that matches an entry in the segmentation dictionary.
5. The method according to claim 1, characterized in that the method further comprises:
Detecting whether an entity in the text to be marked has its tag type updated, and whether the entity whose tag type is updated is a pre-recorded entity;
If an entity in the text to be marked has its tag type updated and the entity whose tag type is updated is not a pre-recorded entity, recording the entity and its corresponding tag types state chain;
If an entity in the text to be marked has its tag type updated and the entity whose tag type is updated is a pre-recorded entity, modifying the tag types state chain corresponding to the entity according to the updated tag type.
6. The method according to claim 5, characterized in that, when the i-th occurrence of an entity in the text to be marked has no labeled tag type, the i-th tag type in the recorded tag types state chain corresponding to the entity is empty, where i is a natural number.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the label labeling method according to any one of claims 1 to 6.
8. A label labeling device, characterized by comprising a processor and a memory, wherein:
The processor is used to execute a label labeling program stored in the memory, to implement the steps of the label labeling method according to any one of claims 1 to 6.
9. A label labeling device, characterized by comprising a storage module, a detection module, and an automatic labeling module, wherein:
The storage module is used to store pre-recorded entities and the tag types state chain corresponding to each entity, the tag types state chain being used to store the sequence of tag types that have been labeled;
The detection module is used to detect whether an entity in a text to be marked is an entity pre-recorded in the storage module and, if it is a pre-recorded entity, to notify the automatic labeling module;
The automatic labeling module is used to receive the notification from the detection module, obtain the tag types state chain corresponding to the entity pre-recorded in the storage module, and, according to the tag types state chain, automatically label the entity in the text to be marked with tag types.
10. The label labeling device according to claim 9, characterized by further comprising a logging module, wherein:
The detection module is further used to detect whether an entity in the text to be marked has its tag type updated and whether the entity whose tag type is updated is a pre-recorded entity; if an entity in the text to be marked has its tag type updated and the entity whose tag type is updated is not an entity pre-recorded in the storage module, to send a first notification to the logging module; and if an entity in the text to be marked has its tag type updated and the entity whose tag type is updated is an entity pre-recorded in the storage module, to send a second notification to the logging module;
The logging module is used to receive the first notification from the detection module and record the entity and its corresponding tag types state chain in the storage module; and to receive the second notification from the detection module and, according to the updated tag type, modify the tag types state chain corresponding to the entity recorded in the storage module.
CN201811221612.8A 2018-10-19 2018-10-19 Label labeling method and device and computer readable storage medium Active CN109508382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811221612.8A CN109508382B (en) 2018-10-19 2018-10-19 Label labeling method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811221612.8A CN109508382B (en) 2018-10-19 2018-10-19 Label labeling method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109508382A true CN109508382A (en) 2019-03-22
CN109508382B CN109508382B (en) 2020-08-21

Family

ID=65746753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811221612.8A Active CN109508382B (en) 2018-10-19 2018-10-19 Label labeling method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109508382B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006309559A (en) * 2005-04-28 2006-11-09 Dainippon Printing Co Ltd Web browser equipped with bookmark function, bookmark management method and bookmark management program
CN101799802A (en) * 2009-02-05 2010-08-11 日电(中国)有限公司 Method and system for extracting entity relationship by using structural information
US20160224543A1 (en) * 2012-12-10 2016-08-04 General Electric Company System and method for extracting ontological information from a body of text
CN106021229A (en) * 2016-05-19 2016-10-12 苏州大学 Chinese event co-reference resolution method and system
CN106156286A (en) * 2016-06-24 2016-11-23 广东工业大学 Type extraction system and method towards technical literature knowledge entity
CN107622050A (en) * 2017-09-14 2018-01-23 武汉烽火普天信息技术有限公司 Text sequence labeling system and method based on Bi LSTM and CRF
CN107992597A (en) * 2017-12-13 2018-05-04 国网山东省电力公司电力科学研究院 A kind of text structure method towards electric network fault case
CN108491373A (en) * 2018-02-01 2018-09-04 北京百度网讯科技有限公司 A kind of entity recognition method and system
CN108647319A (en) * 2018-05-10 2018-10-12 思派(北京)网络科技有限公司 A kind of labeling system and its method based on short text clustering

Also Published As

Publication number Publication date
CN109508382B (en) 2020-08-21

Similar Documents

Publication Publication Date Title
CN107766371B (en) Text information classification method and device
CN109902271B (en) Text data labeling method, device, terminal and medium based on transfer learning
CN107577662A (en) Towards the semantic understanding system and method for Chinese text
CN108334493B (en) Question knowledge point automatic extraction method based on neural network
GB2432448A (en) Method and system for word sequence processing
CN106776538A (en) The information extracting method of enterprise's noncanonical format document
CN110197279B (en) Transformation model training method, device, equipment and storage medium
CN107343223A (en) The recognition methods of video segment and device
CN111723569A (en) Event extraction method and device and computer readable storage medium
CN110209828A (en) Case querying method and case inquiry unit, computer equipment and storage medium
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN114186056A (en) Commodity label labeling method and device, equipment, medium and product thereof
CN110321549A (en) Based on the new concept method for digging for serializing study, relation excavation, Time-Series analysis
CN113434688B (en) Data processing method and device for public opinion classification model training
CN110196963A (en) Model generation, the method for semantics recognition, system, equipment and storage medium
CN104809105A (en) Method and system for identifying event argument and argument role based on maximum entropy
CN109446523A (en) Entity attribute extraction model based on BiLSTM and condition random field
CN112101029A (en) College instructor recommendation management method based on bert model
CN110348017A (en) A kind of text entities detection method, system and associated component
CN112989811B (en) History book reading auxiliary system based on BiLSTM-CRF and control method thereof
CN110442858B (en) Question entity identification method and device, computer equipment and storage medium
CN107526724A (en) For marking the method and device of language material
CN112084788A (en) Automatic marking method and system for implicit emotional tendency of image captions
CN110866394A (en) Company name identification method and device, computer equipment and readable storage medium
CN109508382A (en) A kind of label for labelling method and apparatus, computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant