CN104731768B - Method for extracting event locations from Chinese news text - Google Patents

Method for extracting event locations from Chinese news text

Info

Publication number
CN104731768B
Authority
CN
China
Prior art keywords
news text
event
feature
location
place
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510097822.0A
Other languages
Chinese (zh)
Other versions
CN104731768A (en)
Inventor
何绯娟
孙霞
缪相林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University City College
Original Assignee
Xian Jiaotong University City College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University City College
Priority to CN201510097822.0A
Publication of CN104731768A
Application granted
Publication of CN104731768B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for extracting event locations from Chinese news text. First, the Chinese news text T is segmented with the ICTCLAS Chinese word segmentation tool, and the words tagged as organization names, location nouns, or place names are selected to form the set of candidate event locations. Next, for each word in the candidate set, a three-dimensional feature vector is built from a contextual feature, a position feature, and a topological feature. Finally, using these feature vectors, a Random Forest classifier performs binary classification of each candidate word as an event location or a non-event location, thereby extracting the event locations. The invention combines several kinds of features found in news text: it assembles the contextual, position, and topological features into a feature vector and uses a Random Forest classifier to identify event locations among the organization names, location nouns, and place names obtained by word segmentation. It thus goes beyond place name recognition and identifies where a news event occurred.

Description

Method for extracting event locations from Chinese news text
Technical field
The present invention relates to text mining, natural language processing, and public opinion analysis within computer science and technology, and in particular to a method for extracting event locations from Chinese news text.
Background
News text contains words and phrases such as organization names, location nouns, and place names, but these are not necessarily where the event occurred. For example, the news text "On June 19, 2012, during the G20 summit held in Los Cabos, Mexico, Argentine President Cristina handed British Prime Minister Cameron a letter concerning sovereignty over the Malvinas Islands" contains three place names, "Mexico", "Los Cabos", and "Malvinas Islands", yet "Malvinas Islands" is not where the event took place. How to identify the event location among organization names, location nouns, and place names is therefore an open problem in event extraction.
Regarding existing patents related to event location extraction from Chinese news text: the patent "Place name recognition method and device" (publication number CN103186524A) discloses a method and apparatus for recognizing place names. The method comprises: segmenting the character string to be recognized to obtain candidate words; looking up the category of each candidate word in a place-name database; and traversing the candidate words. If the category of the current candidate word is the first category, the word is added to the set of alternative place names as a place name. If its category is the second category, the current candidate word is combined with the candidate words associated with it in the place-name database to form a compound word, which is added to the set of alternative place names as a place name. That patent, however, can only recognize place names in text; it cannot identify event locations.
Summary of the invention
To overcome the shortcomings of the prior art described above, the object of the invention is to provide a method for extracting event locations from Chinese news text. The method extracts three features from the news text (a contextual feature, a position feature, and a topological feature) to form a feature vector, and uses a Random Forest classifier to identify event locations among the organization names, location nouns, and place names obtained by word segmentation. It thus goes beyond place name recognition and identifies where a news event occurred.
To achieve this object, the technical solution of the invention is as follows:
A method for extracting event locations from Chinese news text, comprising the following steps:
Step 1: Candidate event location extraction
(1) First, segment the Chinese news text T with the ICTCLAS Chinese word segmentation tool to produce a sequence of pairs S_T = (w_1, p_1), (w_2, p_2), ..., (w_i, p_i), ..., (w_n, p_n), where n is the number of segmented words, n > 0, w_i is a word produced by ICTCLAS, and p_i is the part of speech of w_i;
(2) From S_T, select in order all pairs that satisfy one of the three conditions p_i = "ni", p_i = "nl", or p_i = "ns", which indicate that w_i is an organization name, a location noun, or a place name, respectively. The words w_i of all selected pairs form a set W_T = {w'_1, w'_2, ..., w'_j, ..., w'_m}, which serves as the set of candidate event locations; m is the number of words in W_T, m > 0;
Step 2: Feature vector construction
For each w'_j in the set W_T, three features are selected: the contextual feature of w'_j in news text T, the position feature of w'_j in T, and the topological feature of w'_j in T. The three features are computed as follows:
Feature 1: the contextual feature c_j of w'_j in news text T
The contextual feature of w'_j in T is the weight of the regular expression that w'_j matches, denoted c_j:
(1) If w'_j matches a single regular expression in the following table in T, say the k-th, then c_j = a_k. Here a_k is the proportion of words matching the k-th regular expression that are event locations, computed as a_k = |L_k| / |S_k|, where S_k is the set of all words in the labeled data that match the k-th regular expression, L_k is the subset of S_k whose words are event locations in the labeled data, and the labeled data is a collection of news texts in which event locations have been manually annotated;
(2) If w'_j matches several regular expressions in the following table in T, say the k_1-th, k_2-th, ..., k_l-th (l > 1), then
(3) If w'_j matches no regular expression in the following table in T, then c_j = 0.
Feature 2: the position feature p_j of w'_j in news text T
p_j = loc(w'_j, T), where loc(w'_j, T) is the position at which the word w'_j first occurs in T, that is, the number of characters between the start of T and the first occurrence of w'_j.
Feature 3: the topological feature t_j of w'_j in news text T
Initialize E as the empty set. For each pair (w'_j, w'_i) ∈ W_T × W_T, perform the following two steps:
STEP1: Concatenate the string "http://www.baike.com/wiki/" with w'_j to form a URL and download the corresponding page; if the page cannot be downloaded, skip the pair (w'_j, w'_i);
STEP2: Match the page file against the regular expression
/<a[^>]*href=["']([^'">]*)['"][^>]*>(.*)</a>/ig to obtain all anchor texts; if an anchor text contains w'_i, assign E ∪ {(w'_j, w'_i)} to E;
After these two steps have been performed for every pair (w'_j, w'_i) ∈ W_T × W_T, the result is a directed graph G with W_T as its node set and E as its set of directed edges;
For each w'_j ∈ W_T, compute its clustering coefficient C(w'_j), which measures how tightly the nodes are connected to one another;
In the formula for C(w'_j), G_Δ(w'_j) is the number of closed triplets in G that contain w'_j, and G_∧(w'_j) is the number of open triplets in G that contain w'_j. A closed triplet is any three nodes in the graph that are pairwise connected; an open triplet is three nodes in the graph connected by two edges;
Let the topological feature t_j of w'_j in T be C(w'_j), i.e. t_j = C(w'_j);
Using the contextual, position, and topological features of w'_j in T, build the three-dimensional feature vector (c_j, p_j, t_j) of w'_j.
Step 3: Event location recognition
Select 100-200 feature vectors and manually assign each a class label (event location or non-event location) to form a training data set. Train a classifier with the Random Forest algorithm and use it to perform binary classification of each w'_j in W_T as an event location or a non-event location, thereby extracting the event locations.
Based on the contextual, position, and topological features of words in the news text, the invention builds a classifier that extracts event locations automatically.
Brief description of the drawings
Fig. 1 shows the process of extracting event locations from Chinese news text.
Fig. 2 is an example of a directed graph used to compute the topological feature t_j.
Detailed description
The invention is described in detail below with reference to the accompanying drawings.
With reference to the drawings, an embodiment of the method comprises three steps: candidate event location extraction, feature vector construction, and event location recognition. They are described in detail below:
Step 1: Candidate event location extraction
A) First, segment the Chinese news text T with the ICTCLAS Chinese word segmentation tool to produce a sequence of pairs S_T = (w_1, p_1), (w_2, p_2), ..., (w_i, p_i), ..., (w_n, p_n), where n is the number of segmented words, n > 0, w_i is a word produced by ICTCLAS, and p_i is the part of speech of w_i;
B) From S_T, select in order all pairs that satisfy one of the three conditions p_i = "ni", p_i = "nl", or p_i = "ns", which indicate that w_i is an organization name, a location noun, or a place name, respectively. The words w_i of all selected pairs form a set W_T = {w'_1, w'_2, ..., w'_j, ..., w'_m}, which serves as the set of candidate event locations.
Taking the news text "On June 19, 2012, during the G20 summit held in Los Cabos, Mexico, Argentine President Cristina handed British Prime Minister Cameron a letter concerning sovereignty over the Malvinas Islands" as an example, the steps above generate the set {Mexico, Los Cabos, Argentina, Britain, Malvinas Islands}.
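The selection in step B) can be sketched in Python as a filter over tagged pairs. This is a minimal illustration, not the patented implementation: the function name `extract_candidates`, the English stand-in tokens, and the tags assigned to them are assumptions, and a real run would take its pairs from the segmenter's output.

```python
# Part-of-speech tags named in the text: organization name, location noun, place name.
CANDIDATE_TAGS = {"ni", "nl", "ns"}

def extract_candidates(tagged):
    """Build W_T: distinct words whose tag is in CANDIDATE_TAGS, in order of first appearance."""
    seen = set()
    candidates = []
    for word, tag in tagged:
        if tag in CANDIDATE_TAGS and word not in seen:
            seen.add(word)
            candidates.append(word)
    return candidates

# Hypothetical segmenter output (S_T) for the example sentence; tokens and tags are assumed.
tagged_example = [
    ("Mexico", "ns"), ("Los Cabos", "ns"), ("held", "v"), ("summit", "n"),
    ("Argentina", "ns"), ("President", "n"), ("Britain", "ns"),
    ("Prime Minister", "n"), ("Malvinas Islands", "ns"),
]
candidates = extract_candidates(tagged_example)
```

Keeping first-appearance order makes the result easy to compare with the set listed in the example.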
Step 2: Feature vector construction
For each w'_j in the set W_T, three features are selected: the contextual feature of w'_j in news text T, the position feature of w'_j in T, and the topological feature of w'_j in T. The three features are computed as follows:
Feature 1: the contextual feature c_j of w'_j in news text T
The contextual feature of w'_j in T is the weight of the regular expression that w'_j matches, denoted c_j:
a) If w'_j matches a single regular expression in the following table in T, say the k-th, then c_j = a_k. Here a_k is the proportion of words matching the k-th regular expression that are event locations, computed as a_k = |L_k| / |S_k|, where S_k is the set of all words in the labeled data that match the k-th regular expression, L_k is the subset of S_k whose words are event locations in the labeled data, and the labeled data is a collection of news texts in which event locations have been manually annotated;
b) If w'_j matches several regular expressions in the following table in T, say the k_1-th, k_2-th, ..., k_l-th (l > 1), then
c) If w'_j matches no regular expression in the following table in T, then c_j = 0.
Taking the news text "On June 19, 2012, during the G20 summit held in Los Cabos, Mexico, Argentine President Cristina handed British Prime Minister Cameron a letter concerning sovereignty over the Malvinas Islands" as an example, "Mexico" in the set {Mexico, Los Cabos, Argentina, Britain, Malvinas Islands} matches the first regular expression "^ W+ $", so its c_j is 0.64.
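The weight a_k = |L_k| / |S_k| and cases a) and c) can be sketched as follows. The patterns and labeled items are invented placeholders (the regular expression table is not reproduced in this text), the function names are hypothetical, and the combination rule for case b) is deliberately left unimplemented because it is not given here.

```python
import re

def pattern_weights(patterns, labeled):
    """Return [a_1, ..., a_K] with a_k = |L_k| / |S_k|.

    labeled: list of (candidate_in_context, is_event_location) pairs.
    """
    weights = []
    for pattern in patterns:
        rx = re.compile(pattern)
        # S_k: labeled items matching pattern k; the True flags among them form L_k.
        flags = [is_loc for item, is_loc in labeled if rx.search(item)]
        weights.append(sum(flags) / len(flags) if flags else 0.0)
    return weights

def context_feature(item, patterns, weights):
    """c_j for one candidate: a_k for a single match, 0 for no match."""
    hits = [k for k, p in enumerate(patterns) if re.search(p, item)]
    if not hits:
        return 0.0
    if len(hits) == 1:
        return weights[hits[0]]
    raise NotImplementedError("rule for multiple matches is not given in this text")

# Placeholder patterns and labeled data, for illustration only.
patterns = [r"held in \w+", r"sovereignty over"]
labeled = [
    ("summit held in Mexico", True),
    ("talks held in Geneva", True),
    ("vote held in doubt", False),
    ("sovereignty over the islands", False),
]
weights = pattern_weights(patterns, labeled)
```

With these placeholders, the first pattern matches three labeled items of which two are event locations, so its weight is 2/3.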
Feature 2: the position feature p_j of w'_j in news text T
Statistical analysis of news texts with manually annotated event locations shows that the earlier an organization name, location noun, or place name appears in the text, the more likely it is to be the event location. A position feature is therefore introduced and defined as p_j = loc(w'_j, T), where loc(w'_j, T) is the position at which the word w'_j first occurs in T, that is, the number of characters between the start of T and the first occurrence of w'_j.
Again taking the news text "On June 19, 2012, during the G20 summit held in Los Cabos, Mexico, Argentine President Cristina handed British Prime Minister Cameron a letter concerning sovereignty over the Malvinas Islands" as an example, "Mexico" first occurs at position 12 in the original Chinese text, so its p_j is 12.
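The position feature reduces to a character offset, which a short Python sketch makes concrete. The Chinese string below is a back-translation of the opening of the example sentence and is an assumption, not the patent's verbatim text; `position_feature` is a hypothetical helper name.

```python
def position_feature(word, text):
    """p_j = loc(w'_j, T): number of characters before the first occurrence of word.

    Returns -1 if the word does not occur in the text.
    """
    return text.find(word)

# Assumed back-translation of the start of the example news text.
text = "2012年6月19日，在墨西哥洛斯卡沃斯举行的G20峰会期间"
p_mexico = position_feature("墨西哥", text)          # "Mexico"
p_los_cabos = position_feature("洛斯卡沃斯", text)    # "Los Cabos"
```

Twelve characters (the date, the comma, and the preposition) precede "墨西哥" in this string, which agrees with the value of 12 given in the example.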
Feature 3: the topological feature t_j of w'_j in news text T
Statistical analysis of news texts with manually annotated event locations shows that when an event takes place at several locations, those locations are strongly correlated. Whether two places are correlated is therefore described by whether one of them appears on the other's page in the Hudong Baike ("Interactive Encyclopedia", baike.com) online encyclopedia.
Initialize E as the empty set. For each pair (w'_j, w'_i) ∈ W_T × W_T, perform the following two steps:
STEP1: Concatenate the string "http://www.baike.com/wiki/" with w'_j to form a URL and download the corresponding page; if the page cannot be downloaded, skip the pair (w'_j, w'_i);
STEP2: Match the page file against the regular expression
/<a[^>]*href=["']([^'">]*)['"][^>]*>(.*)</a>/ig to obtain all anchor texts; if an anchor text contains w'_i, assign E ∪ {(w'_j, w'_i)} to E.
After these two steps have been performed for every pair (w'_j, w'_i) ∈ W_T × W_T, the result is a directed graph G with W_T as its node set and E as its set of directed edges;
Taking the news text "On June 19, 2012, during the G20 summit held in Los Cabos, Mexico, Argentine President Cristina handed British Prime Minister Cameron a letter concerning sovereignty over the Malvinas Islands" as an example, the directed graph shown in Fig. 2 is generated.
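STEP1 and STEP2 can be sketched without network access by passing in already-downloaded pages. Everything named here is illustrative: `build_edges` is a hypothetical helper, the HTML snippets and link targets are invented, self-pairs are skipped as an assumption, and the anchor group is made non-greedy (`.*?`), a deviation from the expression in the text, so that several links on one line are captured separately.

```python
import re

# Anchor-tag pattern adapted from the text; note the non-greedy (.*?) group.
ANCHOR_RE = re.compile(
    r"""<a[^>]*href=["']([^'">]*)['"][^>]*>(.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)

def build_edges(candidates, pages):
    """Return the directed edge set E.

    pages maps a candidate w'_j to the HTML of its encyclopedia page,
    or to None when the download failed.
    """
    edges = set()
    for wj in candidates:
        html = pages.get(wj)
        if html is None:
            continue  # STEP1: page unavailable, skip every pair (wj, wi)
        anchors = [m.group(2) for m in ANCHOR_RE.finditer(html)]
        for wi in candidates:
            # STEP2: add (wj, wi) when some anchor text contains wi.
            if wi != wj and any(wi in a for a in anchors):
                edges.add((wj, wi))
    return edges

# Invented pages for three candidates; the third download is treated as failed.
pages = {
    "Mexico": '<a href="/wiki/x">Los Cabos</a> <a href="/wiki/y">Argentina</a>',
    "Los Cabos": '<a href="/wiki/z">Mexico</a>',
    "Argentina": None,
}
edges = build_edges(["Mexico", "Los Cabos", "Argentina"], pages)
```

Separating the download from the parsing keeps the graph construction testable; a real pipeline would fill `pages` from the URLs formed in STEP1.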
For each w'_j ∈ W_T, compute its clustering coefficient C(w'_j), which measures how tightly the nodes are connected to one another; here it measures the degree of correlation among the candidate event locations;
In the formula for C(w'_j), G_Δ(w'_j) is the number of closed triplets in G that contain w'_j, and G_∧(w'_j) is the number of open triplets in G that contain w'_j. A closed triplet is any three nodes in the graph that are pairwise connected; an open triplet is three nodes in the graph connected by two edges;
Let the topological feature t_j of w'_j in T be C(w'_j), i.e. t_j = C(w'_j).
According to Fig. 2, the topological feature of "Mexico" is t_j = 0.6.
Using the contextual, position, and topological features of w'_j in T, build the three-dimensional feature vector (c_j, p_j, t_j) of w'_j.
For the news text "On June 19, 2012, during the G20 summit held in Los Cabos, Mexico, Argentine President Cristina handed British Prime Minister Cameron a letter concerning sovereignty over the Malvinas Islands", the feature vector of "Mexico" in this text is therefore (0.64, 12, 0.6).
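A clustering coefficient built from closed and open triplets can be sketched as below. The patent's exact formula is not reproduced in this text, so the sketch assumes the common local definition: ignore edge direction, count triplets centered on the node, and return closed / (closed + open). It is not claimed to reproduce the value 0.6 from Fig. 2, and `clustering_coefficient` is a hypothetical name.

```python
from itertools import combinations

def clustering_coefficient(node, edges):
    """Assumed definition: closed / (closed + open) over triplets centered on node."""
    # Drop direction and self-loops.
    undirected = {frozenset(e) for e in edges if e[0] != e[1]}
    neighbors = {next(iter(e - {node})) for e in undirected if node in e}
    closed = 0
    open_ = 0
    for a, b in combinations(sorted(neighbors), 2):
        if frozenset((a, b)) in undirected:
            closed += 1  # a, b and node are pairwise connected
        else:
            open_ += 1   # only the two edges through node exist
    total = closed + open_
    return closed / total if total else 0.0

# Toy graph with placeholder node names.
toy_edges = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")}
c_a = clustering_coefficient("A", toy_edges)
c_d = clustering_coefficient("D", toy_edges)
```

In the toy graph, node A has three neighbors and only the pair (B, C) is connected, so one of its three triplets is closed.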
Step 3: Event location recognition
Select 100-200 feature vectors and manually assign each a class label (event location or non-event location) to form a training data set.
Using the contextual, position, and topological features of w'_j in news text T, train a classification model with the Random Forest algorithm. The model performs binary classification of the words in W_T as event locations or non-event locations, thereby extracting the event locations.
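The training and classification step can be sketched with scikit-learn's `RandomForestClassifier`, which is one available Random Forest implementation; the text does not say which library was used. The six training vectors below are synthetic placeholders, not data from the patent, and the hyperparameters are arbitrary.

```python
from sklearn.ensemble import RandomForestClassifier

# Synthetic (c_j, p_j, t_j) vectors for illustration only.
# Label 1 = event location, label 0 = non-event location.
X_train = [
    [0.70, 5, 0.5], [0.60, 20, 0.7], [0.65, 10, 0.6],
    [0.10, 80, 0.0], [0.05, 120, 0.1], [0.00, 95, 0.0],
]
y_train = [1, 1, 1, 0, 0, 0]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Classify two unseen candidate vectors.
pred = clf.predict([[0.66, 12, 0.6], [0.02, 100, 0.0]])
```

In practice the training set would be the 100-200 manually labeled vectors described above, and `predict` would be applied to the feature vector of every word in W_T.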

Claims (1)

1. A method for extracting event locations from Chinese news text, characterized in that it comprises the following steps:
Step 1: Candidate event location extraction
(1) First, segment the Chinese news text T with the ICTCLAS Chinese word segmentation tool to produce a sequence of pairs S_T = (w_1, p_1), (w_2, p_2), ..., (w_i, p_i), ..., (w_n, p_n), where n is the number of segmented words, n > 0, w_i is a word produced by ICTCLAS, and p_i is the part of speech of w_i;
(2) From S_T, select in order all pairs that satisfy one of the three conditions p_i = "ni", p_i = "nl", or p_i = "ns", which indicate that w_i is an organization name, a location noun, or a place name, respectively. The words w_i of all selected pairs form a set W_T = {w'_1, w'_2, ..., w'_j, ..., w'_m}, which serves as the set of candidate event locations; m is the number of words in W_T, m > 0;
Step 2: Feature vector construction
For each w'_j in the set W_T, three features are selected: the contextual feature of w'_j in news text T, the position feature of w'_j in T, and the topological feature of w'_j in T;
Step 3: Event location recognition
Select 100-200 feature vectors and manually assign each a class label (event location or non-event location) to form a training data set. Train a classifier with the Random Forest algorithm and use it to perform binary classification of each w'_j in W_T as an event location or a non-event location, thereby extracting the event locations;
The three features of Step 2, namely the contextual feature of w'_j in T, the position feature of w'_j in T, and the topological feature of w'_j in T, are computed as follows:
Feature 1: the contextual feature c_j of w'_j in news text T
The contextual feature of w'_j in T is the weight of the regular expression that w'_j matches, denoted c_j:
(1) If w'_j matches a single regular expression in the following table in T, say the k-th, then c_j = a_k. Here a_k is the proportion of words matching the k-th regular expression that are event locations, computed as a_k = |L_k| / |S_k|, where S_k is the set of all words in the labeled data that match the k-th regular expression, L_k is the subset of S_k whose words are event locations in the labeled data, and the labeled data is a collection of news texts in which event locations have been manually annotated;
(2) If w'_j matches several regular expressions in the following table in T, say the k_1-th, k_2-th, ..., k_l-th (l > 1), then
(3) If w'_j matches no regular expression in the following table in T, then c_j = 0;
Feature 2: the position feature p_j of w'_j in news text T
p_j = loc(w'_j, T), where loc(w'_j, T) is the position at which the word w'_j first occurs in T, that is, the number of characters between the start of T and the first occurrence of w'_j;
Feature 3: the topological feature t_j of w'_j in news text T
Initialize E as the empty set. For each pair (w'_j, w'_i) ∈ W_T × W_T, perform the following two steps:
STEP1: Concatenate the string "http://www.baike.com/wiki/" with w'_j to form a URL and download the corresponding page; if the page cannot be downloaded, skip the pair (w'_j, w'_i);
STEP2: Match the page file against the regular expression
/<a[^>]*href=["']([^'">]*)['"][^>]*>(.*)</a>/ig to obtain all anchor texts; if an anchor text contains w'_i, assign E ∪ {(w'_j, w'_i)} to E;
After these two steps have been performed for every pair (w'_j, w'_i) ∈ W_T × W_T, the result is a directed graph G with W_T as its node set and E as its set of directed edges;
For each w'_j ∈ W_T, compute its clustering coefficient C(w'_j), which measures how tightly the nodes are connected to one another;
In the formula for C(w'_j), G_Δ(w'_j) is the number of closed triplets in G that contain w'_j, and G_∧(w'_j) is the number of open triplets in G that contain w'_j. A closed triplet is any three nodes in the graph that are pairwise connected; an open triplet is three nodes in the graph connected by two edges;
Let the topological feature t_j of w'_j in T be C(w'_j), i.e. t_j = C(w'_j);
Using the contextual, position, and topological features of w'_j in T, build the three-dimensional feature vector (c_j, p_j, t_j) of w'_j.
CN201510097822.0A 2015-03-05 2015-03-05 Method for extracting event locations from Chinese news text Expired - Fee Related CN104731768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510097822.0A CN104731768B (en) 2015-03-05 2015-03-05 Method for extracting event locations from Chinese news text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510097822.0A CN104731768B (en) 2015-03-05 2015-03-05 Method for extracting event locations from Chinese news text

Publications (2)

Publication Number Publication Date
CN104731768A CN104731768A (en) 2015-06-24
CN104731768B true CN104731768B (en) 2017-10-20

Family

ID=53455666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510097822.0A Expired - Fee Related CN104731768B (en) 2015-03-05 2015-03-05 Method for extracting event locations from Chinese news text

Country Status (1)

Country Link
CN (1) CN104731768B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528421B (en) * 2015-12-07 2018-09-04 中国人民大学 A kind of search dimension method for digging for query word in mass data
CN107016556B (en) * 2016-01-27 2021-02-05 创新先进技术有限公司 Data processing method and device
CN110298039B (en) * 2019-06-20 2023-05-30 北京百度网讯科技有限公司 Event place identification method, system, equipment and computer readable storage medium
CN111090994A (en) * 2019-11-12 2020-05-01 北京信息科技大学 Chinese-internet-forum-text-oriented event place attribution province identification method
CN113127713A (en) * 2019-12-30 2021-07-16 北大方正集团有限公司 Method, device, equipment and storage medium for identifying news location
CN113111169A (en) * 2020-02-13 2021-07-13 北京明亿科技有限公司 Deep learning model-based alarm receiving and processing text address information extraction method and device
CN113111170A (en) * 2020-02-13 2021-07-13 北京明亿科技有限公司 Method and device for extracting alarm receiving and processing text track ground information based on deep learning model
CN113111166A (en) * 2020-02-13 2021-07-13 北京明亿科技有限公司 Method and device for determining types of alarm receiving and processing places based on deep learning model
CN111581954B (en) * 2020-05-15 2023-06-09 中国人民解放军国防科技大学 Text event extraction method and device based on grammar dependency information
CN111914535B (en) * 2020-07-31 2023-03-24 平安科技(深圳)有限公司 Word recognition method and device, computer equipment and storage medium
CN112307364B (en) * 2020-11-25 2021-10-29 哈尔滨工业大学 Character representation-oriented news text place extraction method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699568A (en) * 2013-11-16 2014-04-02 西安交通大学城市学院 Method for extracting hyponymy relation of field terms from wikipedia
CN103902649A (en) * 2014-02-17 2014-07-02 复旦大学 Knowledge extraction method based on online encyclopedia link entities

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130086072A1 (en) * 2011-10-03 2013-04-04 Xerox Corporation Method and system for extracting and classifying geolocation information utilizing electronic social media

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
《随机森林特征选择》;王全才;《中国优秀硕士学位论文全文数据库 信息科技辑》;20120715;I138-1504 *
事件报道中地点实体的提取研究;高燕;《中国优秀硕士学位论文全文数据库 信息科技辑》;20111215(第S1期);I141-139 *
基于SimRank 的百度百科词条语义相似度计算;尹坤 等;《山东大学学报(工学版)》;20140317;第44卷(第3期);第0节,图1 *
基于事件框架的突发事件信息抽取;冯礼;《中国优秀硕士学位论文全文数据库 信息科技辑》;20080715;第2.2.1、2.3.2、2.3.4节,图2、7 *

Also Published As

Publication number Publication date
CN104731768A (en) 2015-06-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171020

Termination date: 20180305
