CN104765858A - Construction method for public security synonym library and obtained public security synonym library - Google Patents


Info

Publication number
CN104765858A
Authority
CN
China
Legal status
Pending
Application number
CN201510190990.4A
Other languages
Chinese (zh)
Inventor
陈明洁
朱鑫巍
郭平
Current Assignee
Shanghai Branch Office Of Beijing Aerospace Changfeng Science And Technology Industry Group Co Ltd
China Changfeng Science Technology Industry Group Corp Shanghai Branch
Original Assignee
Shanghai Branch Office Of Beijing Aerospace Changfeng Science And Technology Industry Group Co Ltd
Priority date
2015-04-21
Filing date
2015-04-21
Publication date
2015-07-08
Application filed by Shanghai Branch Office Of Beijing Aerospace Changfeng Science And Technology Industry Group Co Ltd
Priority to CN201510190990.4A
Publication of CN104765858A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a construction method for a public security synonym library and relates to the technical field of data processing. The method addresses the long computation time and slow computation speed of existing approaches. A synonym library is constructed first, a data element library is constructed from known data elements, and all data elements are divided into an object class, a feature class, and a representation class. When a new word needs to be inserted, it is divided into three segments according to the object class, the feature class, and the representation class; the object class, feature class, and representation class data elements in which the respective segments appear most frequently are found in the data element library; the matching degree of the new word is calculated; and whether the new word is a synonym is judged from that matching degree. If it is a synonym, the synonym relation between the new word and those data elements is stored in the synonym library. The construction method is suitable for processing public security data.

Description

Construction method for a public security synonym library, and the public security synonym library obtained thereby
Technical field
The present invention relates to the technical field of data processing, and in particular to a construction method for a public security synonym library and to the public security synonym library obtained by that method.
Background art
Existing synonym libraries are mainly constructed in one of the following two ways:
1) Building the synonym library manually.
In this method, the relations between the words in the library are defined manually by linguistic experts. The method is simple and effective, but its results are strongly influenced by human subjectivity, and the library is hard to update as the language changes and develops.
2) Learning a vector space of words from a large-scale Internet corpus.
This method applies machine learning to a large-scale Internet corpus. Its results are more objective and readily follow changes in the corpus, but the computation is complex, which makes it long and slow. It is also designed mainly for the Internet and cannot be applied to relatively closed public security databases.
Summary of the invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is to provide a construction method for a public security synonym library that has a short computation time, a fast computation speed, and objective results.
To solve the above technical problem, the construction method for a public security synonym library provided by the present invention comprises the following steps:
First, build a synonym library; build a data element library from the known data elements; set a matching-degree threshold; and classify each data element as one of three types: object class, feature class, and representation class.
When a new word is to be inserted, perform the following steps:
1) Obtain the name, type, and length of the new word, and split the new word according to the three types (object class, feature class, representation class) into three segments, namely a first, a second, and a third segment, where the first segment is the object class segment, the second segment is the feature class segment, and the third segment is the representation class segment;
2) From the data element library, find the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the representation class data element in which the third segment occurs most often;
3) Calculate the matching degree of the new word with the following formula:
P=A×Q1+B×Q2+C×Q3;
A=X1/Xn,B=Y1/Yn,C=Z1/Zn;
where P is the matching degree of the new word; A is the similarity between the first segment and the data element library, and Q1 is the object class weight; B is the similarity between the second segment and the data element library, and Q2 is the feature class weight; C is the similarity between the third segment and the data element library, and Q3 is the representation class weight; Q1, Q2, and Q3 are preset constants;
X1 is the largest number of occurrences of the first segment in any single object class data element in the data element library, and Xn is the total number of occurrences of the first segment across all object class data elements in the library; Y1 is the largest number of occurrences of the second segment in any single feature class data element, and Yn is its total number of occurrences across all feature class data elements; Z1 is the largest number of occurrences of the third segment in any single representation class data element, and Zn is its total number of occurrences across all representation class data elements;
4) Judge the new word by its matching degree: if the matching degree exceeds the matching-degree threshold, the new word is judged to be a synonym, and synonym relations are established between the new word and three data elements in the data element library, namely the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the representation class data element in which the third segment occurs most often;
5) Store the synonym relations between the new word and the three data elements in the synonym library.
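As a worked illustration of the formula in step 3, the Python sketch below evaluates P for one hypothetical set of counts: X1 = 6, Xn = 8, Y1 = 3, Yn = 4, Z1 = 5, Zn = 5, with hypothetical weights Q1 = 0.5, Q2 = 0.25, Q3 = 0.25. None of these numbers comes from the patent, which only states that Q1, Q2, Q3 and the threshold are preset constants. They give A = 0.75, B = 0.75, C = 1, so P = 0.375 + 0.1875 + 0.25 = 0.8125. Because X1 can never exceed Xn (and likewise for Y and Z), each of A, B, C lies between 0 and 1, so P cannot exceed Q1 + Q2 + Q3; with weights that sum to 1, a meaningful threshold lies between 0 and 1. The fallback to 0 when a segment never occurs in its class is an assumption, since the patent does not cover that case.

```python
def ratio(top: int, total: int) -> float:
    """Share of a segment's occurrences that fall in its best-matching data element.

    `top` is X1 (or Y1, Z1) and `total` is Xn (or Yn, Zn). Returns 0.0 when the
    segment never occurs in the class; this fallback is an assumption.
    """
    if not 0 <= top <= total:
        raise ValueError("the single-element maximum cannot exceed the class total")
    if total == 0:
        return 0.0
    return top / total


def matching_degree(x1, xn, y1, yn, z1, zn, q1, q2, q3):
    """P = A*Q1 + B*Q2 + C*Q3 with A = X1/Xn, B = Y1/Yn, C = Z1/Zn."""
    a = ratio(x1, xn)
    b = ratio(y1, yn)
    c = ratio(z1, zn)
    return a * q1 + b * q2 + c * q3


if __name__ == "__main__":
    # Hypothetical counts and weights, chosen only for illustration.
    p = matching_degree(6, 8, 3, 4, 5, 5, q1=0.5, q2=0.25, q3=0.25)
    print(p)        # 0.8125
    print(p > 0.7)  # True: exceeds a hypothetical threshold of 0.7
```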
In a second aspect, the present invention provides a public security synonym library obtained by the above method.
Building on the existing data elements and synonyms, the construction method provided by the invention calculates the matching degree between a new word and the existing data elements, treats the new word as a synonym when the matching degree exceeds the set value, and stores the synonym relations in the synonym library. The calculation is simple, requires little computation time, runs fast, and gives objective results.
Embodiment
The technical solution of the present invention is described in further detail below with reference to a specific embodiment. The embodiment does not limit the present invention; any structure similar to that of the present invention, and any similar variation thereof, shall fall within the scope of protection of the present invention. In the present invention, the enumeration comma (、) always denotes an "and" relation.
The construction method for a public security synonym library provided by the embodiment of the present invention comprises the following steps:
First, build a synonym library; build a data element library from the known data elements (for example, the Shanghai public security department currently has about 800 data elements); set a matching-degree threshold; and classify each data element as one of three types: object class, feature class, and representation class.
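The patent does not say how a new word is split into its three segments in step 1 below. One possible approach, sketched here in Python as an assumption rather than the patent's method, treats the classified data element library as vocabularies: take the longest known object class term that begins the word, the longest known representation class term that ends the rest, and treat what lies between them as the feature class segment. The function names and vocabularies are hypothetical, and the sketch ignores the type and length of the new word, which the patent obtains but does not say how to use.

```python
def longest_prefix(word: str, vocabulary: set[str]) -> str:
    """Longest vocabulary term that starts `word`, or '' if none does."""
    matches = [term for term in vocabulary if term and word.startswith(term)]
    return max(matches, key=len, default="")


def longest_suffix(word: str, vocabulary: set[str]) -> str:
    """Longest vocabulary term that ends `word`, or '' if none does."""
    matches = [term for term in vocabulary if term and word.endswith(term)]
    return max(matches, key=len, default="")


def split_new_word(word: str, object_terms: set[str], representation_terms: set[str]):
    """Split `word` into (object, feature, representation) segments.

    Hypothetical heuristic: object term as prefix, representation term as
    suffix of the remainder, and whatever is left in between as the feature
    segment. Returns None when either end cannot be matched or nothing is
    left for the feature segment, so the caller can skip the word.
    """
    obj = longest_prefix(word, object_terms)
    rep = longest_suffix(word[len(obj):], representation_terms)
    if not obj or not rep:
        return None
    feature = word[len(obj):len(word) - len(rep)].strip()
    if not feature:
        return None
    return obj.strip(), feature, rep.strip()
```

For example, with the hypothetical vocabularies {"suspect", "suspect vehicle"} and {"number", "code"}, the word "suspect vehicle plate number" splits into ("suspect vehicle", "plate", "number"); preferring the longest match keeps a multi-word object class term intact. Chinese data element names contain no spaces, so the `strip` calls are harmless there.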
When a new word is to be inserted, perform the following steps:
1) Obtain the name, type, and length of the new word, and split the new word according to the three types (object class, feature class, representation class) into three segments, namely a first, a second, and a third segment, where the first segment is the object class segment, the second segment is the feature class segment, and the third segment is the representation class segment;
2) From the data element library, find the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the representation class data element in which the third segment occurs most often;
3) Calculate the matching degree of the new word with the following formula:
P=A×Q1+B×Q2+C×Q3;
A=X1/Xn,B=Y1/Yn,C=Z1/Zn;
where P is the matching degree of the new word; A is the similarity between the first segment and the data element library, and Q1 is the object class weight; B is the similarity between the second segment and the data element library, and Q2 is the feature class weight; C is the similarity between the third segment and the data element library, and Q3 is the representation class weight; Q1, Q2, and Q3 are preset constants;
X1 is the largest number of occurrences of the first segment in any single object class data element in the data element library, and Xn is the total number of occurrences of the first segment across all object class data elements in the library; Y1 is the largest number of occurrences of the second segment in any single feature class data element, and Yn is its total number of occurrences across all feature class data elements; Z1 is the largest number of occurrences of the third segment in any single representation class data element, and Zn is its total number of occurrences across all representation class data elements;
4) Judge the new word by its matching degree: if the matching degree exceeds the matching-degree threshold, the new word is judged to be a synonym, and synonym relations are established between the new word and three data elements in the data element library, namely the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the representation class data element in which the third segment occurs most often;
5) Store the synonym relations between the new word and the three data elements in the synonym library.
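Steps 2 to 5 above can be combined into one routine. The Python sketch below is a minimal illustration under stated assumptions: each data element is modeled by a hypothetical `DataElement` record whose `text` (for example its name plus known aliases) is where segment occurrences are counted, since the patent does not say what is counted; a segment that never occurs in its class gets a similarity of 0; ties between equally frequent data elements go to the first one in library order; and the synonym library is an in-memory dict from each new word to a set of data element identifiers. The new word is assumed to have been split into its three segments already.

```python
from dataclasses import dataclass

CLASSES = ("object", "feature", "representation")


@dataclass(frozen=True)
class DataElement:
    ident: str  # hypothetical identifier, e.g. a data element code
    cls: str    # one of CLASSES
    text: str   # assumed text in which segment occurrences are counted


def best_element(segment: str, library: list[DataElement], cls: str):
    """Step 2 for one segment: (element with most occurrences, its count, class total)."""
    if not segment:
        return None, 0, 0
    counts = [(e, e.text.count(segment)) for e in library if e.cls == cls]
    total = sum(count for _, count in counts)
    if total == 0:
        return None, 0, 0
    # max() keeps the first element among equal counts, i.e. library order wins ties.
    element, top = max(counts, key=lambda pair: pair[1])
    return element, top, total


def insert_new_word(word, segments, library, synonyms, weights, threshold):
    """Steps 2-5: compute P and, if it exceeds the threshold, record synonym relations.

    `segments` is (object, feature, representation) and `weights` is (Q1, Q2, Q3).
    Returns the matching degree P.
    """
    p = 0.0
    best = []
    for segment, cls, weight in zip(segments, CLASSES, weights):
        element, top, total = best_element(segment, library, cls)
        similarity = top / total if total else 0.0  # A, B or C (0 if absent: assumption)
        p += similarity * weight
        best.append(element)
    # Step 4: a synonym only if P exceeds the threshold and all three elements exist.
    if p > threshold and all(e is not None for e in best):
        # Step 5: store the relations between the new word and the three data elements.
        synonyms.setdefault(word, set()).update(e.ident for e in best)
    return p
```

For instance, with a hypothetical library holding object class elements "suspect" and "vehicle", feature class elements "name" and "nickname", and one representation class element "code", the segments ("suspect", "name", "code") give A = 1, B = 0.5 (the segment "name" also occurs inside "nickname"), and C = 1; with weights (0.5, 0.25, 0.25) that yields P = 0.875. Keeping Q1, Q2, Q3 and the threshold as parameters mirrors the patent's treatment of them as preset constants.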
Building on the existing data elements and synonyms, the construction method provided by the invention calculates the matching degree between a new word and the existing data elements, treats the new word as a synonym when the matching degree exceeds the set value, and stores the synonym relations in the synonym library. The calculation is simple, requires little computation time, runs fast, and gives objective results.

Claims (2)

1. A construction method for a public security synonym library, characterized in that it comprises the following steps:
First, build a synonym library; build a data element library from the known data elements; set a matching-degree threshold; and classify each data element as one of three types: object class, feature class, and representation class.
When a new word is to be inserted, perform the following steps:
1) Obtain the name, type, and length of the new word, and split the new word according to the three types (object class, feature class, representation class) into three segments, namely a first, a second, and a third segment, where the first segment is the object class segment, the second segment is the feature class segment, and the third segment is the representation class segment;
2) From the data element library, find the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the representation class data element in which the third segment occurs most often;
3) Calculate the matching degree of the new word with the following formula:
P=A×Q1+B×Q2+C×Q3;
A=X1/Xn,B=Y1/Yn,C=Z1/Zn;
where P is the matching degree of the new word; A is the similarity between the first segment and the data element library, and Q1 is the object class weight; B is the similarity between the second segment and the data element library, and Q2 is the feature class weight; C is the similarity between the third segment and the data element library, and Q3 is the representation class weight; Q1, Q2, and Q3 are preset constants;
X1 is the largest number of occurrences of the first segment in any single object class data element in the data element library, and Xn is the total number of occurrences of the first segment across all object class data elements in the library; Y1 is the largest number of occurrences of the second segment in any single feature class data element, and Yn is its total number of occurrences across all feature class data elements; Z1 is the largest number of occurrences of the third segment in any single representation class data element, and Zn is its total number of occurrences across all representation class data elements;
4) Judge the new word by its matching degree: if the matching degree exceeds the matching-degree threshold, the new word is judged to be a synonym, and synonym relations are established between the new word and three data elements in the data element library, namely the object class data element in which the first segment occurs most often, the feature class data element in which the second segment occurs most often, and the representation class data element in which the third segment occurs most often;
5) Store the synonym relations between the new word and the three data elements in the synonym library.
2. A public security synonym library obtained by the construction method according to claim 1.
CN201510190990.4A 2015-04-21 2015-04-21 Construction method for public security synonym library and obtained public security synonym library Pending CN104765858A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510190990.4A CN104765858A (en) 2015-04-21 2015-04-21 Construction method for public security synonym library and obtained public security synonym library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510190990.4A CN104765858A (en) 2015-04-21 2015-04-21 Construction method for public security synonym library and obtained public security synonym library

Publications (1)

Publication Number Publication Date
CN104765858A (en) 2015-07-08

Family

ID=53647686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510190990.4A Pending CN104765858A (en) 2015-04-21 2015-04-21 Construction method for public security synonym library and obtained public security synonym library

Country Status (1)

Country Link
CN (1) CN104765858A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006323594A (en) * 2005-05-18 2006-11-30 Ntt Docomo Inc Synonymous word extraction system and synonymous word extraction method
CN101901325A (en) * 2010-07-21 2010-12-01 赵步 Copyright protection method
CN102332137A (en) * 2011-09-23 2012-01-25 纽海信息技术(上海)有限公司 Goods matching method and system
CN103455623A (en) * 2013-09-12 2013-12-18 广东电子工业研究院有限公司 Clustering mechanism capable of fusing multilingual literature
CN103886093A (en) * 2014-04-03 2014-06-25 江苏物联网研究发展中心 Method for processing synonyms of electronic commerce search engine

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018095281A1 (en) * 2016-11-25 2018-05-31 阿里巴巴集团控股有限公司 Name matching method and apparatus
US10726028B2 (en) 2016-11-25 2020-07-28 Alibaba Group Holding Limited Method and apparatus for matching names
CN110222266A (en) * 2019-05-31 2019-09-10 江苏三六五网络股份有限公司 A kind of house property profession phonetic searching system and method based on speech recognition
CN113139657A (en) * 2021-04-08 2021-07-20 北京泰豪智能工程有限公司 Method and device for realizing machine thinking
CN113139657B (en) * 2021-04-08 2024-03-29 北京泰豪智能工程有限公司 Machine thinking realization method and device

Similar Documents

Publication Publication Date Title
AU2017260007A1 (en) System and method for displaying search results for a trademark query in an interactive graphical representation
MX368777B (en) System and method for automatic product matching.
WO2014200724A3 (en) Smart fill
WO2014066106A3 (en) Techniques for input method editor language models using spatial input models
MX2008014865A (en) Method and apparatus for multilingual spelling corrections.
CN106547743B (en) Translation method and system
JP2018081377A5 (en)
CN104765858A (en) Construction method for public security synonym library and obtained public security synonym library
WO2014190220A3 (en) Language model trained using predicted queries from statistical machine translation
WO2019140382A3 (en) Probabilistic modeling system and method
WO2015050321A8 (en) Apparatus for generating self-learning alignment-based alignment corpus, method therefor, apparatus for analyzing destructive expression morpheme by using alignment corpus, and morpheme analysis method therefor
WO2020026229A3 (en) Proposition identification in natural language and usage thereof
AU2019268138A1 (en) Determining digital value of a digital technology initiative
Hales et al. A post-processing method to calibrate large-scale hydrologic models with limited historical observation data leveraging machine learning and spatial analysis
Asim et al. Analytic network process decision making algorithm
Kim College Students' Attitudes toward World Englishes
WO2013182885A8 (en) Cross-language relevance determination device, cross-language relevance determination program, cross-language relevance determination method, and storage medium
MY181113A (en) Method and system for generating phonetically similar masked data
Yu et al. A comparative study of word sense disambiguation of english modal verb by BP neural network and support vector machine
Pathiraja et al. Model Uncertainty Quantification Methods For Data Assimilation In Partially Observed Multi-Scale Systems
로이 Jeju 4.3: Planetary Consciousness and Psychosocial Processes for Social Healing and Reconciliation
WO2016109269A3 (en) Simplified overlay ads
Chowdhury Mithun et al. Weakly Supervised Video Moment Retrieval From Text Queries
Peng et al. Sensitivity of Sea ice Simulations to Ice Dynamic Treatments
Mayes The formation of ultra-compact dwarf galaxies and their black holes in a cosmological simulation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150708