CN108062306A - System and method for constructing an index system for business environment evaluation - Google Patents


Info

Publication number
CN108062306A
Authority
CN
China
Prior art keywords
text data
data
lexical
index system
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711479622.7A
Other languages
Chinese (zh)
Inventor
常子青
孙玉权
夏耘海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd filed Critical Guoxin Youe Data Co Ltd
Priority to CN201711479622.7A priority Critical patent/CN108062306A/en
Publication of CN108062306A publication Critical patent/CN108062306A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a system and method for constructing an index system for business environment evaluation. The system includes: a text data acquisition module, for obtaining a text data set related to business environment data; a lexical feature determining module, for determining the lexical feature of each text data item in the text data set; a data clustering processing module, for clustering the text data set based on the determined lexical features to obtain multiple text data subsets; and an index system establishing module, for determining the keyword information of each text data subset and establishing, based on the determined keyword information, an index system for evaluating the business environment. By combining text feature extraction with cluster analysis, the invention builds the index system for business environment evaluation automatically, avoids the subjectivity and limitations of manually specified indexes, and offers good accuracy and adaptability.

Description

System and method for constructing an index system for business environment evaluation
Technical Field
The present invention relates to the technical field of business environment evaluation, and in particular to a system and method for constructing an index system for business environment evaluation.
Background
With the rapid development of the economy and society, problems such as the shift in the pace of economic growth and the pains of structural adjustment have brought challenges. At present, governments at all levels treat building a good business environment as an important lever for meeting these challenges. Evaluating the business environment in the course of economic development, and identifying the factors that affect that evaluation, helps administrators understand the situation in a timely manner and make adjustments.
At present, the business environment is generally evaluated using manually specified indexes. However, because political and economic systems differ across countries, manually specified indexes cannot be fully applied to every country, so their applicability is poor. In addition, manually specified indexes depend on human experience and therefore lack objectivity and accuracy.
It can be seen that there is an urgent need for an index system construction scheme for business environment evaluation that is widely applicable, highly automated, and accurate.
Summary of the Invention
In view of this, an object of the present invention is to provide a system and method for constructing an index system for business environment evaluation that builds the index system automatically, avoids the subjectivity and limitations of manually specified indexes, and offers good accuracy and adaptability.
In a first aspect, the present invention provides a system for constructing an index system for business environment evaluation, the system comprising:
a text data acquisition module, for obtaining a text data set related to business environment data;
a lexical feature determining module, for determining the lexical feature of each text data item in the text data set;
a data clustering processing module, for clustering the text data set based on the determined lexical features to obtain multiple text data subsets;
an index system establishing module, for determining the keyword information of each text data subset and establishing, based on the determined keyword information, an index system for evaluating the business environment.
With reference to the first aspect, the present invention provides a first possible implementation of the first aspect, wherein the lexical feature determining module includes:
a data word segmentation processing unit, for performing word segmentation on each text data item in the text data set to obtain lexical data;
a lexical data screening unit, for ranking the lexical data obtained by word segmentation in descending order of its frequency of occurrence in the text data set and selecting the top preset number of items of lexical data;
a lexical feature determination unit, for determining, for each text data item in the text data set, the lexical feature of that text data item according to the frequency with which each selected item of lexical data occurs in it.
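The three units above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes word segmentation has already produced token lists, and the function names `select_vocabulary` and `frequency_feature` and the toy documents are hypothetical.

```python
from collections import Counter

def select_vocabulary(tokenized_docs, top_n):
    """Return the top_n tokens ranked by total frequency over the whole data set."""
    totals = Counter()
    for tokens in tokenized_docs:
        totals.update(tokens)
    return [token for token, _ in totals.most_common(top_n)]

def frequency_feature(tokens, vocabulary):
    """Count how often each selected token occurs in a single document."""
    counts = Counter(tokens)
    return [counts[token] for token in vocabulary]

# Toy, already-segmented documents (hypothetical data).
docs = [
    ["market", "access", "reform", "market"],
    ["tax", "reform", "market"],
    ["tax", "service", "tax"],
]
vocabulary = select_vocabulary(docs, top_n=3)
features = [frequency_feature(tokens, vocabulary) for tokens in docs]
```

Each row of `features` is the raw frequency vector of one document over the selected vocabulary; a weighting scheme such as TF-IDF can then be applied on top of these counts.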
With reference to the first possible implementation of the first aspect, the present invention provides a second possible implementation of the first aspect, wherein the lexical feature determination unit is specifically configured to: establish a feature vector for each text data item based on the selected lexical data; and, for each text data item, determine the score of each selected item of lexical data in that text data item using the TF-IDF algorithm, and take the score as the value of the element corresponding to that item of lexical data in the feature vector of the text data item.
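The patent names TF-IDF but does not state which variant is used. One common formulation, given here only as an assumption, defines the score of lexical item $t$ in text data item $d$ as follows, where $n_{t,d}$ is the number of occurrences of $t$ in $d$, $N$ is the number of text data items in the set $D$, and $x_{d,t}$ is the corresponding element of the feature vector of $d$:

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \log \frac{N}{\left|\{\, d' \in D : t \in d' \,\}\right|}, \qquad
x_{d,t} = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)
```

Under this definition, a lexical item that occurs in every text data item receives a score of zero, so it does not influence the clustering.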
With reference to the first aspect, the present invention provides a third possible implementation of the first aspect, wherein the index system establishing module includes a keyword determination unit and an index system establishing unit, wherein:
the keyword determination unit is configured to, for each text data subset, perform word segmentation on each text data item in the subset to obtain lexical data, and determine the keyword information of the subset according to the frequency with which each item of lexical data occurs in the subset;
the index system establishing unit is configured to establish, based on the keyword information determined by the keyword determination unit, an index system for evaluating the business environment.
With reference to the third possible implementation of the first aspect, the present invention provides a fourth possible implementation of the first aspect, wherein the index system establishing unit is specifically configured to, for each text data subset: determine the subject information corresponding to the subset based on the multiple items of keyword information determined for it; determine the first-level evaluation index corresponding to the subset based on the subject information; and determine multiple second-level evaluation indexes under that first-level evaluation index according to the first-level evaluation index, the multiple items of keyword information of the subset, and the obtained user-focus indexes.
In a second aspect, the present invention also provides a method for constructing an index system for business environment evaluation, the method comprising:
obtaining a text data set related to business environment data;
determining the lexical feature of each text data item in the text data set;
clustering the text data set based on the determined lexical features to obtain multiple text data subsets;
determining the keyword information of each text data subset, and establishing, based on the determined keyword information, an index system for evaluating the business environment.
With reference to the second aspect, the present invention provides a first possible implementation of the second aspect, wherein determining the lexical feature of each text data item in the text data set includes:
performing word segmentation on each text data item in the text data set to obtain lexical data;
ranking the lexical data obtained by word segmentation in descending order of its frequency of occurrence in the text data set, and selecting the top preset number of items of lexical data;
for each text data item in the text data set, determining the lexical feature of that text data item according to the frequency with which each selected item of lexical data occurs in it.
With reference to the first possible implementation of the second aspect, the present invention provides a second possible implementation of the second aspect, wherein determining, for each text data item in the text data set, the lexical feature of that text data item according to the frequency with which each selected item of lexical data occurs in it includes:
establishing a feature vector for each text data item based on the selected lexical data;
for each text data item, determining the score of each selected item of lexical data in that text data item using the TF-IDF algorithm, and taking the score as the value of the element corresponding to that item of lexical data in the feature vector of the text data item.
With reference to the second aspect, the present invention provides a third possible implementation of the second aspect, wherein determining the keyword information of each text data subset includes:
for each text data subset, performing word segmentation on each text data item in the subset to obtain lexical data;
determining the keyword information of the subset according to the frequency with which each item of lexical data occurs in the subset.
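The two steps above amount to a per-subset word frequency ranking. The sketch below is one possible reading: the number of keywords kept per subset (`top_k`), the function name `subset_keywords`, and the sample data are assumptions, since the patent does not state how many keywords are retained.

```python
from collections import Counter

def subset_keywords(tokenized_subset, top_k):
    """Rank tokens by frequency inside one text data subset and keep the top_k."""
    counts = Counter()
    for tokens in tokenized_subset:
        counts.update(tokens)
    return [token for token, _ in counts.most_common(top_k)]

# Two hypothetical subsets produced by clustering, already segmented.
subsets = {
    0: [["tax", "refund", "tax"], ["tax", "invoice", "refund"]],
    1: [["license", "approval"], ["license", "registration", "license"]],
}
keywords = {subset_id: subset_keywords(docs, top_k=2)
            for subset_id, docs in subsets.items()}
```

In practice, function words and other uninformative tokens would usually be filtered out before counting; that step is omitted here because the patent does not describe it.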
With reference to the third possible implementation of the second aspect, the present invention provides a fourth possible implementation of the second aspect, wherein establishing, based on the determined keyword information, an index system for evaluating the business environment includes:
for each text data subset, determining the subject information corresponding to the subset based on the multiple items of keyword information determined for it;
determining the first-level evaluation index corresponding to the subset based on the subject information;
determining multiple second-level evaluation indexes under the first-level evaluation index corresponding to the subset according to the first-level evaluation index, the multiple items of keyword information of the subset, and the obtained user-focus indexes.
In the system and method for constructing an index system for business environment evaluation provided by the present invention, the text data acquisition module obtains a text data set related to business environment data; the lexical feature determining module determines the lexical feature of each text data item in the text data set; the data clustering processing module clusters the text data set based on the determined lexical features to obtain multiple text data subsets; and the index system establishing module determines the keyword information of each text data subset and establishes, based on the determined keyword information, an index system for evaluating the business environment. In other words, by combining text feature extraction with cluster analysis, the invention builds the index system for business environment evaluation automatically, avoids the subjectivity and limitations of manually specified indexes, and offers good accuracy and adaptability.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a structural diagram of a system for constructing an index system for business environment evaluation provided by an embodiment of the present invention;
Fig. 2 shows a structural diagram of the lexical feature determining module in the system provided by an embodiment of the present invention;
Fig. 3 shows a structural diagram of the index system establishing module in the system provided by an embodiment of the present invention;
Fig. 4 and Fig. 5 show application illustrations of the system provided by an embodiment of the present invention;
Fig. 6 shows a flowchart of a method for constructing an index system for business environment evaluation provided by an embodiment of the present invention;
Fig. 7 shows a flowchart of another method for constructing an index system for business environment evaluation provided by an embodiment of the present invention;
Fig. 8 shows a flowchart of another method for constructing an index system for business environment evaluation provided by an embodiment of the present invention;
Fig. 9 shows a flowchart of another method for constructing an index system for business environment evaluation provided by an embodiment of the present invention;
Fig. 10 shows a flowchart of another method for constructing an index system for business environment evaluation provided by an embodiment of the present invention.
Description of main reference numerals:
11: text data acquisition module; 22: lexical feature determining module; 33: data clustering processing module; 44: index system establishing module; 221: data word segmentation processing unit; 222: lexical data screening unit; 223: lexical feature determination unit; 441: keyword determination unit; 442: index system establishing unit.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In the related art, the business environment is evaluated using manually specified indexes. Because political and economic systems differ across countries, manually specified indexes cannot be fully applied to every country, so their applicability is poor; in addition, they depend on human experience and therefore lack objectivity and accuracy. In view of this, the embodiments of the present invention provide a system and method for constructing an index system for business environment evaluation that builds the index system automatically, avoids the subjectivity and limitations of manually specified indexes, and offers good accuracy and adaptability.
Referring to Fig. 1, which is a structural diagram of the system for constructing an index system for business environment evaluation provided by an embodiment of the present application, the system includes:
a text data acquisition module 11, for obtaining a text data set related to business environment data;
a lexical feature determining module 22, for determining the lexical feature of each text data item in the text data set;
a data clustering processing module 33, for clustering the text data set based on the determined lexical features to obtain multiple text data subsets;
an index system establishing module 44, for determining the keyword information of each text data subset and establishing, based on the determined keyword information, an index system for evaluating the business environment.
Here, for each text data item in the obtained text data set, the embodiment performs feature extraction to obtain the corresponding lexical feature. The lexical feature is determined from the frequency with which the high-frequency lexical data of the text data set occurs in each text data item. The embodiment then performs cluster analysis on the text data set based on the determined lexical features, grouping text data of the same class together to obtain the text data subsets. Word frequency analysis is then performed on each text data subset to obtain its keyword information, and an index system for evaluating the business environment is established based on that keyword information.
The text data set is the set of text data related to business environment data. The text data can be obtained through a data interface and/or a web crawler. At the data interface level, the text data can be obtained through data interfaces opened by internet sites (for example, the Chinese government website). At the web crawler level, web crawler technology can be used, for example a crawler implemented in Python, to crawl text data related to business environment data from the network. Combining the two levels, the embodiment can obtain the text data set as follows:
First, more than 1500 policy documents related to the business environment are collected from the Chinese government website. They were issued by central government departments such as the Ministry of Commerce, the State Administration for Industry and Commerce, the State Administration of Taxation, the General Administration of Quality Supervision, Inspection and Quarantine, the National Audit Office, the National Development and Reform Commission, and the Ministry of Housing and Urban-Rural Development. Word frequency analysis is performed on the policy texts, and the 10 keywords mentioned most often across all policy documents are selected: business environment; streamlining administration and delegating power; commercial system; delegation, regulation and service reform; shortening time limits; optimizing processes; market access; negative list; five certificates in one; one license, one code. Then, based on these 10 keywords, a search of the whole web is carried out, for example by writing a crawler in Python to crawl related articles. A total of 209,236 related articles are crawled from 171 websites, and these more than 200,000 articles are taken as the obtained text data set. A text data set obtained in this way is richer in volume and more precise.
In addition, to further improve the efficiency of subsequent processing of the text data set, the embodiment can clean the articles after the more than 200,000 articles have been obtained. Specifically, deduplication is performed first, that is, articles with identical titles or identical content are removed. The articles are then screened by length and relevance: articles that are too short (for example, below a preset number of words, such as 50 words), too long (for example, above a preset number of words, such as 5000 words), or irrelevant in content are removed. In this way 99,536 articles are retained as the business environment analysis sample, and this sample is used as the text data set for further analysis and processing.
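The cleaning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: articles are represented as dictionaries with hypothetical `title` and `body` keys, length is measured in characters, the bounds 50 and 5000 are the example values from the text, and the content relevance check is left out because the patent does not say how it is performed.

```python
def clean_articles(articles, min_len=50, max_len=5000):
    """Drop articles with a repeated title or body, then drop those outside the length bounds."""
    seen_titles, seen_bodies = set(), set()
    kept = []
    for article in articles:
        title = article["title"].strip()
        body = article["body"].strip()
        # Deduplication: skip if either the title or the content was seen before.
        if title in seen_titles or body in seen_bodies:
            continue
        seen_titles.add(title)
        seen_bodies.add(body)
        # Length screening: keep only bodies within [min_len, max_len].
        if min_len <= len(body) <= max_len:
            kept.append(article)
    return kept

# Hypothetical crawl results.
articles = [
    {"title": "A", "body": "x" * 100},
    {"title": "A", "body": "y" * 100},   # duplicate title
    {"title": "B", "body": "x" * 100},   # duplicate content
    {"title": "C", "body": "short"},     # too short
    {"title": "D", "body": "z" * 6000},  # too long
    {"title": "E", "body": "w" * 5000},  # at the upper bound, kept
]
cleaned = clean_articles(articles)
```

Exact string comparison only catches verbatim duplicates; near-duplicate detection would need an additional similarity measure, which is outside what the text describes.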
To better extract lexical features from the text data in the text data set, the embodiment first performs word frequency analysis on the text data set as a whole, and then performs word frequency analysis on each text data item based on that result. Referring to Fig. 2, the lexical feature determining module 22 in the embodiment includes:
a data word segmentation processing unit 221, for performing word segmentation on each text data item in the text data set to obtain lexical data;
a lexical data screening unit 222, for ranking the lexical data obtained by word segmentation in descending order of its frequency of occurrence in the text data set and selecting the top preset number of items of lexical data;
a lexical feature determination unit 223, for determining, for each text data item in the text data set, the lexical feature of that text data item according to the frequency with which each selected item of lexical data occurs in it.
Here, the jiebaR package in the R development environment can first be used to perform word segmentation on the obtained text data set, yielding the lexical data. The lexical data is then ranked in descending order of its frequency of occurrence in the text data set, and the top preset number (for example, the top 1000) of items of lexical data are selected. A feature vector is then established for each text data item based on the selected lexical data. Finally, for each text data item, the term frequency-inverse document frequency (TF-IDF) algorithm is used to determine the score of each selected item of lexical data in that text data item, and the score is taken as the value of the element corresponding to that item of lexical data in the feature vector of the text data item.
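The TF-IDF scoring step can be sketched in Python as follows. The patent uses jiebaR in R for segmentation; this sketch assumes the documents are already segmented into token lists and that the vocabulary has already been selected, and it uses a plain relative-frequency TF with an unsmoothed logarithmic IDF, which is one common variant and not necessarily the one the inventors used. The function name `tfidf_vectors` and the sample data are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs, vocabulary):
    """Build one TF-IDF feature vector per document over a fixed vocabulary."""
    n_docs = len(tokenized_docs)
    vocab_set = set(vocabulary)
    # Document frequency: in how many documents each vocabulary token appears.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens) & vocab_set)
    idf = {t: math.log(n_docs / doc_freq[t]) if doc_freq[t] else 0.0
           for t in vocabulary}
    vectors = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vectors

docs = [["tax", "reform"], ["tax", "market"], ["tax", "tax"]]
vocabulary = ["tax", "reform", "market"]
vectors = tfidf_vectors(docs, vocabulary)
```

In this toy data set "tax" appears in every document, so its IDF, and therefore its score, is zero everywhere, while "reform" and "market" each distinguish a single document.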
After selecting the top preset number of items of lexical data, and before establishing a feature vector for each text data item based on them, the embodiment can screen the selected lexical data further. Based on the data similarity between each selected item of lexical data and the business environment data, the embodiment can keep only the lexical data relevant to business environment data, which improves the accuracy and efficiency of the subsequent word frequency analysis of the text data.
In the embodiment, the K-Means clustering method can be used to perform cluster analysis on the text data set based on the determined lexical features. The data clustering processing module 33 is specifically configured to: randomly select k text data items from the text data set as the centroids of the clusters; assign each remaining text data item in the text data set to the cluster whose centroid is nearest to it, where the distance between a text data item and a centroid is determined by the similarity of their lexical features; recalculate the centroid of each cluster and reassign the text data items based on the recalculated centroids; and stop the assignment when the distance between each updated centroid and the corresponding centroid before the update satisfies a preset distance threshold, thereby obtaining the multiple text data subsets.
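The clustering procedure described above can be sketched in plain Python as follows. Several choices here are assumptions not fixed by the patent: cosine distance stands in for "lexical feature similarity", centroid movement is measured with Euclidean distance, and the `threshold`, `max_iter`, and `seed` parameters are illustrative.

```python
import math
import random

def cosine_distance(a, b):
    """1 - cosine similarity; treated as maximal (1.0) if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def k_means(vectors, k, threshold=1e-6, max_iter=100, seed=0):
    """Return k lists of indices into `vectors`, one list per cluster."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each vector joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, vec in enumerate(vectors):
            nearest = min(range(k), key=lambda c: cosine_distance(vec, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = [
            mean_vector([vectors[i] for i in members]) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        shift = max(math.dist(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= threshold:  # centroids have stopped moving
            break
    return clusters

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = k_means(vectors, k=2)
```

For a corpus of the size described in the text, a library implementation over sparse matrices would be more practical; the sketch only makes the assignment, update, and stopping steps explicit.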
Using the K-Means clustering method, the embodiment can divide the text data set into a preset number (for example, 30) of text data subsets, that is, divide the text data set into 30 classes. For each resulting text data subset, the keyword information corresponding to the subset can be determined from the frequency with which each item of lexical data occurs in the subset. The corresponding first-level evaluation index is determined based on the keyword information, and multiple second-level evaluation indexes under the first-level evaluation index are then determined based on the first-level evaluation index, the keyword information, and the user-focus indexes. Specifically, referring to Fig. 3, the index system establishing module 44 provided by the embodiment includes a keyword determination unit 441 and an index system establishing unit 442, wherein:
the keyword determination unit 441 is configured to, for each text data subset, perform word segmentation on each text data item in the subset to obtain lexical data, and determine the keyword information of the subset according to the frequency with which each item of lexical data occurs in the subset;
the index system establishing unit 442 is configured to, for each text data subset: determine the subject information corresponding to the subset based on the multiple items of keyword information determined for it; determine the first-level evaluation index corresponding to the subset based on the subject information; and determine multiple second-level evaluation indexes under that first-level evaluation index according to the first-level evaluation index, the multiple items of keyword information of the subset, and the obtained user-focus indexes.
Here, for each text data subset, the corresponding subject information can be determined based on the multiple pieces of keyword information determined for the subset, and the first-level evaluation index corresponding to the subset can then be determined based on the subject information. According to the first-level evaluation index, the multiple pieces of keyword information of the subset, and the obtained user-concern indexes, multiple second-level evaluation indexes under the first-level evaluation index corresponding to the subset can be determined; by analogy, multiple third-level evaluation indexes under each second-level evaluation index corresponding to the subset can also be determined. It can be seen that the embodiment of the present invention builds the index system for business environment evaluation automatically by means of text data feature extraction and the corresponding cluster analysis techniques, which avoids the subjectivity and limitations caused by manually set indexes and gives better accuracy and adaptability.
In the embodiment of the present invention, the user-concern indexes may be selected as follows: for the 10 departments that relevant policy documents mention as closely related to business environment reform, and in combination with the results of big data word frequency analysis, the business index with the largest business volume or the highest degree of attention in each department is selected as a user-concern index.
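The per-department selection can be sketched as follows. The data layout (a nested dictionary of departments, business indexes, and two scores named `volume` and `attention`) and the function name are hypothetical; the source does not specify how business volume or attention is measured or stored.

```python
def select_user_concern_indexes(department_stats, by="volume"):
    """Pick, for each department, the business index with the highest score.

    department_stats: {department: {index_name: {"volume": ..., "attention": ...}}}
    by: which hypothetical score to rank on, "volume" or "attention".
    """
    selected = {}
    for department, indexes in department_stats.items():
        # Choose the index whose chosen score is largest in this department.
        selected[department] = max(indexes, key=lambda name: indexes[name][by])
    return selected
```

For example, ranking by `attention` instead of `volume` may pick a different index for the same department, which mirrors the "largest business volume or highest degree of attention" alternative in the text.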
It is worth noting that, for the 30 classes, as shown in Fig. 4, each serial number corresponds to one class, and each class corresponds to one group of keyword information. The corresponding subject information can be determined based on each group of keyword information; based on the subject information, each group of keyword information can then be summarized into a corresponding category, and the first-level evaluation indexes are generated automatically from the summary result of each category. Referring to Fig. 5, for each automatically generated first-level evaluation index, the second-level evaluation indexes under it can be determined according to the corresponding keyword information and the user-concern indexes; by analogy, the third-level evaluation indexes under each second-level evaluation index can also be determined.
Based on the same inventive concept, the embodiment of the present application also provides an index system construction method corresponding to the index system establishment system for business environment evaluation. Because the principle by which the method solves the problem is similar to that of the index system establishment system described above, the implementation of the method may refer to the implementation of the system, and repeated description is omitted. Fig. 6 is a flow chart of the index system construction method for business environment evaluation provided by the embodiment of the present application. The method includes:
S101, obtaining a text data set related to business environment data;
S102, determining the lexical feature of each piece of text data in the text data set;
S103, performing clustering processing on the text data set based on the determined lexical features to obtain multiple text data subsets after the clustering processing;
S104, determining the keyword information of each text data subset, and establishing, based on the determined keyword information, an index system for evaluating the business environment.
In the embodiment of the present invention, referring to Fig. 7, step S102 above specifically includes the following steps:
S201, performing word segmentation on each piece of text data in the text data set to obtain the lexical data after word segmentation;
S202, sorting the lexical data after word segmentation in descending order of frequency of occurrence in the text data set, and selecting the top preset number of lexical data items;
S203, for each piece of text data in the text data set, determining the lexical feature of that piece of text data according to the frequency with which each selected lexical data item occurs in it.
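Steps S201 to S203 can be sketched as follows. This is an illustration under stated assumptions: `tokenize` is a placeholder that splits on whitespace, whereas Chinese text would need a real word segmenter, and the function names and `top_n` parameter are introduced only for the example.

```python
from collections import Counter


def tokenize(text):
    """Placeholder for S201: whitespace split stands in for word segmentation."""
    return text.lower().split()


def select_vocabulary(documents, top_n):
    """S202: keep the top_n words by frequency across the whole text data set."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return [word for word, _ in counts.most_common(top_n)]


def frequency_features(document, vocabulary):
    """S203: count how often each selected word occurs in one piece of text data."""
    counts = Counter(tokenize(document))
    return [counts[word] for word in vocabulary]
```

Each piece of text data is thereby mapped to a vector whose length equals the size of the selected vocabulary, so all vectors are directly comparable.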
Referring to Fig. 8, step S203 above specifically includes the following steps:
S301, establishing a feature vector for each piece of text data based on the selected lexical data;
S302, for each piece of text data, determining the score of each selected lexical data item in that piece of text data using the TF-IDF algorithm, and setting the score as the value of the element corresponding to that lexical data item in the feature vector of the piece of text data.
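Steps S301 and S302 can be sketched as follows. The source names TF-IDF but does not give the exact formula, so this sketch assumes one common variant: term frequency normalized by document length, multiplied by a smoothed inverse document frequency. The function name and the input format (already tokenized documents) are likewise assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs, vocabulary):
    """Build one TF-IDF feature vector per document over a fixed vocabulary."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vector = []
        for word in vocabulary:
            tf = counts[word] / total
            # Smoothed IDF (an assumed variant, not taken from the source).
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vectors
```

Words that occur in many documents receive a lower weight than words concentrated in a few, which is what lets the resulting vectors separate topics during clustering.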
Referring to Fig. 9, determining the keyword information of each text data subset specifically includes the following steps:
S401, for each text data subset, performing word segmentation on each piece of text data in the subset to obtain the lexical data after word segmentation;
S402, determining the keyword information of the subset according to the frequency with which each lexical data item after word segmentation occurs in the subset.
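Steps S401 and S402 can be sketched as follows, assuming the clustering step has produced one cluster label per piece of text data. Whitespace splitting again stands in for word segmentation, and the function names and the `top_n` cutoff are hypothetical; the source does not say how many keywords are kept per subset.

```python
from collections import Counter, defaultdict


def group_by_cluster(documents, assignment):
    """Collect the text data belonging to each cluster label into a subset."""
    subsets = defaultdict(list)
    for document, cluster_id in zip(documents, assignment):
        subsets[cluster_id].append(document)
    return dict(subsets)


def subset_keywords(subset_docs, top_n=3):
    """Return the top_n most frequent words within one text data subset."""
    counts = Counter()
    for document in subset_docs:
        counts.update(document.lower().split())  # placeholder segmentation
    return [word for word, _ in counts.most_common(top_n)]
```

Running `subset_keywords` on every value returned by `group_by_cluster` gives one group of keywords per class.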
Referring to Fig. 10, establishing the index system for evaluating the business environment based on the determined keyword information specifically includes the following steps:
S501, for each text data subset, determining the subject information corresponding to the subset based on the multiple pieces of keyword information determined for the subset;
S502, determining the first-level evaluation index corresponding to the subset based on the subject information;
S503, determining multiple second-level evaluation indexes under the first-level evaluation index corresponding to the subset according to the first-level evaluation index, the multiple pieces of keyword information of the subset, and the obtained user-concern indexes.
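The source does not specify how subject information is derived from keywords or how second-level indexes are matched, so the following is only one possible reading of S501 to S503. It assumes a hypothetical `topic_lexicon` that maps each candidate subject to a set of words, takes the subject with the largest keyword overlap as the first-level evaluation index, and keeps as second-level indexes those user-concern indexes whose names mention any keyword.

```python
def determine_subject(keywords, topic_lexicon):
    """S501: pick the subject whose (hypothetical) word set overlaps the keywords most."""
    keyword_set = set(keywords)
    return max(topic_lexicon, key=lambda topic: len(topic_lexicon[topic] & keyword_set))


def build_index_entry(keywords, topic_lexicon, user_concern_indexes):
    """S502 and S503: derive first-level and second-level indexes for one subset."""
    # S502: the subject itself is used as the first-level evaluation index.
    first_level = determine_subject(keywords, topic_lexicon)
    # S503: keep user-concern indexes whose names mention any subset keyword.
    second_level = [
        index_name
        for index_name in user_concern_indexes
        if any(keyword in index_name.lower() for keyword in keywords)
    ]
    return {"first_level": first_level, "second_level": second_level}
```

Applying the same matching step again beneath each second-level index would give third-level indexes, in line with the "by analogy" remark in the description.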
In the index system establishment system and method for business environment evaluation provided by the embodiment of the present invention, the text data acquisition module 11 obtains a text data set related to business environment data; the lexical feature determining module 22 determines the lexical feature of each piece of text data in the text data set; the data clustering processing module 33 performs clustering processing on the text data set based on the determined lexical features to obtain multiple text data subsets after the clustering processing; and the index system establishing module 44 determines the keyword information of each text data subset and establishes, based on the determined keyword information, an index system for evaluating the business environment. That is, the index system for business environment evaluation is built automatically by means of text data feature extraction and the corresponding cluster analysis techniques, which avoids the subjectivity and limitations caused by manually set indexes and gives better accuracy and adaptability.
In addition, the index system establishment system and method for business environment evaluation provided by the embodiment of the present invention can bring the following technical effects:
1) The established index system covers all aspects of the business environment, including the efficiency with which enterprises handle affairs at government agencies, the government's ability to serve enterprises in social and economic construction, and the social environment of the city, and therefore better meets the needs of China's actual conditions.
2) The evaluation indexes in the established index system can be quantified, which avoids subjective and uncertain indexes; dimensioned index measures and numerical percentages are used, giving better accuracy and adaptability.
3) Each item of data for the established index system may be obtained by combining data surveys with the collection of public data. For example, approval data can be obtained through data surveys of government departments and enterprises, while government service and social environment data can be collected directly from data disclosed by the government and society and through the internet and media, so that the established index system has a stronger updating capability and better adaptability.
The computer program product of the method for establishing an index system for business environment evaluation provided by the embodiment of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to perform the method described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, and details are not repeated here.
The index system establishment system for business environment evaluation provided by the embodiment of the present invention may be specific hardware on a device, or software or firmware installed on a device. In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments provided by the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In addition, the terms "first", "second", "third", and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent substitutions for some of the technical features. Such modifications, variations, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An index system establishment system for business environment evaluation, characterized by comprising:
a text data acquisition module, configured to obtain a text data set related to business environment data;
a lexical feature determining module, configured to determine the lexical feature of each piece of text data in the text data set;
a data clustering processing module, configured to perform clustering processing on the text data set based on the determined lexical features to obtain multiple text data subsets after the clustering processing;
an index system establishing module, configured to determine the keyword information of each text data subset and to establish, based on the determined keyword information, an index system for evaluating the business environment.
2. The index system establishment system according to claim 1, characterized in that the lexical feature determining module comprises:
a data word segmentation processing unit, configured to perform word segmentation on each piece of text data in the text data set to obtain the lexical data after word segmentation;
a lexical data screening unit, configured to sort the lexical data after word segmentation in descending order of frequency of occurrence in the text data set and to select the top preset number of lexical data items;
a lexical feature determination unit, configured to, for each piece of text data in the text data set, determine the lexical feature of that piece of text data according to the frequency with which each selected lexical data item occurs in it.
3. The index system establishment system according to claim 2, characterized in that the lexical feature determination unit is specifically configured to: establish a feature vector for each piece of text data based on the selected lexical data; and, for each piece of text data, determine the score of each selected lexical data item in that piece of text data using the TF-IDF algorithm, and set the score as the value of the element corresponding to that lexical data item in the feature vector of the piece of text data.
4. The index system establishment system according to claim 1, characterized in that the index system establishing module comprises a keyword determination unit and an index system establishing unit, wherein:
the keyword determination unit is configured to, for each text data subset, perform word segmentation on each piece of text data in the subset to obtain the lexical data after word segmentation, and determine the keyword information of the subset according to the frequency with which each lexical data item after word segmentation occurs in the subset;
the index system establishing unit is configured to establish the index system for evaluating the business environment based on the keyword information determined by the keyword determination unit.
5. The index system establishment system according to claim 4, characterized in that the index system establishing unit is specifically configured to: for each text data subset, determine the subject information corresponding to the subset based on the multiple pieces of keyword information determined for the subset; determine the first-level evaluation index corresponding to the subset based on the subject information; and determine multiple second-level evaluation indexes under the first-level evaluation index corresponding to the subset according to the first-level evaluation index, the multiple pieces of keyword information of the subset, and the obtained user-concern indexes.
6. An index system construction method for business environment evaluation, characterized by comprising:
obtaining a text data set related to business environment data;
determining the lexical feature of each piece of text data in the text data set;
performing clustering processing on the text data set based on the determined lexical features to obtain multiple text data subsets after the clustering processing;
determining the keyword information of each text data subset, and establishing, based on the determined keyword information, an index system for evaluating the business environment.
7. The index system construction method according to claim 6, characterized in that determining the lexical feature of each piece of text data in the text data set comprises:
performing word segmentation on each piece of text data in the text data set to obtain the lexical data after word segmentation;
sorting the lexical data after word segmentation in descending order of frequency of occurrence in the text data set, and selecting the top preset number of lexical data items;
for each piece of text data in the text data set, determining the lexical feature of that piece of text data according to the frequency with which each selected lexical data item occurs in it.
8. The index system construction method according to claim 7, characterized in that, for each piece of text data in the text data set, determining the lexical feature of that piece of text data according to the frequency with which each selected lexical data item occurs in it comprises:
establishing a feature vector for each piece of text data based on the selected lexical data;
for each piece of text data, determining the score of each selected lexical data item in that piece of text data using the TF-IDF algorithm, and setting the score as the value of the element corresponding to that lexical data item in the feature vector of the piece of text data.
9. The index system construction method according to claim 6, characterized in that determining the keyword information of each text data subset comprises:
for each text data subset, performing word segmentation on each piece of text data in the subset to obtain the lexical data after word segmentation;
determining the keyword information of the subset according to the frequency with which each lexical data item after word segmentation occurs in the subset.
10. The index system construction method according to claim 9, characterized in that establishing the index system for evaluating the business environment based on the determined keyword information comprises:
for each text data subset, determining the subject information corresponding to the subset based on the multiple pieces of keyword information determined for the subset;
determining the first-level evaluation index corresponding to the subset based on the subject information;
determining multiple second-level evaluation indexes under the first-level evaluation index corresponding to the subset according to the first-level evaluation index, the multiple pieces of keyword information of the subset, and the obtained user-concern indexes.
CN201711479622.7A 2017-12-29 2017-12-29 A kind of index system establishment system and method for business environment evaluation Pending CN108062306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711479622.7A CN108062306A (en) 2017-12-29 2017-12-29 A kind of index system establishment system and method for business environment evaluation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711479622.7A CN108062306A (en) 2017-12-29 2017-12-29 A kind of index system establishment system and method for business environment evaluation

Publications (1)

Publication Number Publication Date
CN108062306A true CN108062306A (en) 2018-05-22

Family

ID=62140955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711479622.7A Pending CN108062306A (en) 2017-12-29 2017-12-29 A kind of index system establishment system and method for business environment evaluation

Country Status (1)

Country Link
CN (1) CN108062306A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009117830A1 (en) * 2008-03-27 2009-10-01 Hotgrinds Canada System and method for query expansion using tooltips
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method
CN104008143A (en) * 2014-05-09 2014-08-27 启秀科技(北京)有限公司 Vocational ability index system establishment method based on data mining
CN104598532A (en) * 2014-12-29 2015-05-06 中国联合网络通信有限公司广东省分公司 Information processing method and device
CN105787097A (en) * 2016-03-16 2016-07-20 中山大学 Distributed index establishment method and system based on text clustering
CN105893551A (en) * 2016-03-31 2016-08-24 上海智臻智能网络科技股份有限公司 Method and device for processing data and knowledge graph
CN106610955A (en) * 2016-12-13 2017-05-03 成都数联铭品科技有限公司 Dictionary-based multi-dimensional emotion analysis method
CN107103043A (en) * 2017-03-29 2017-08-29 国信优易数据有限公司 A kind of Text Clustering Method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王宇等: "基于电子商务评论的商家信誉维度构建", 《数据分析与知识发现》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190986A (en) * 2018-09-10 2019-01-11 张连祥 Business environment analysis and evaluation system and method based on direct objective data
CN109389321B (en) * 2018-10-30 2021-08-06 北京筑龙信息技术有限责任公司 Item list classification method and device
CN109389321A (en) * 2018-10-30 2019-02-26 北京筑龙信息技术有限责任公司 A kind of price evaluation method and device
CN109636150A (en) * 2018-11-30 2019-04-16 南京市城市规划编制研究中心 A kind of method for building up and its system of smart city " more rule unifications " appraisement system
CN109657070B (en) * 2018-12-11 2023-06-09 南京大学 Construction method of terminal-assisted SWOT index system
CN109684484A (en) * 2018-12-11 2019-04-26 南京大学 A kind of SWOT index system establishment system
CN109657070A (en) * 2018-12-11 2019-04-19 南京大学 A kind of construction method of terminal auxiliary SWOT index system
CN109684484B (en) * 2018-12-11 2023-06-09 南京大学 SWOT index system construction system
CN110532357A (en) * 2019-09-04 2019-12-03 深圳前海微众银行股份有限公司 Generation method, device, equipment and the readable storage medium storing program for executing of ESG score-system
CN110532357B (en) * 2019-09-04 2024-03-12 深圳前海微众银行股份有限公司 ESG scoring system generation method, device, equipment and readable storage medium
CN111767401A (en) * 2020-07-02 2020-10-13 中国标准化研究院 NQI index automatic generation method
CN111767401B (en) * 2020-07-02 2023-04-28 中国标准化研究院 NQI index automatic generation method
CN111985836A (en) * 2020-08-31 2020-11-24 平安医疗健康管理股份有限公司 Medical insurance scoring index system construction method, device, equipment and storage medium
CN111985836B (en) * 2020-08-31 2024-04-05 平安医疗健康管理股份有限公司 Medical insurance scoring index system construction method, device, equipment and storage medium
CN112508376A (en) * 2020-11-30 2021-03-16 中国科学院深圳先进技术研究院 Index system construction method
CN112836038A (en) * 2021-01-21 2021-05-25 中国科学院沈阳自动化研究所 Intelligent recommendation system based on multi-source data credibility

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100070, No. 101-8, building 1, 31, zone 188, South Fourth Ring Road, Beijing, Fengtai District

Applicant after: Guoxin Youyi Data Co.,Ltd.

Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing

Applicant before: SIC YOUE DATA Co.,Ltd.
RJ01 Rejection of invention patent application after publication

Application publication date: 20180522