CN114708000A - Enterprise credit classification system construction method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114708000A
CN114708000A CN202210227344.0A CN202210227344A CN114708000A CN 114708000 A CN114708000 A CN 114708000A CN 202210227344 A CN202210227344 A CN 202210227344A CN 114708000 A CN114708000 A CN 114708000A
Authority
CN
China
Prior art keywords
enterprise
data
credit
entities
enterprise credit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210227344.0A
Other languages
Chinese (zh)
Inventor
金鑫
李成龙
杨虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central University of Finance and Economics
Original Assignee
Central University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central University of Finance and Economics
Priority to CN202210227344.0A
Publication of CN114708000A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Mathematical Physics (AREA)
  • General Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Animal Behavior & Ethology (AREA)

Abstract

The application provides a method and a device for constructing an enterprise credit classification system, an electronic device, and a storage medium. The method comprises the following steps: acquiring enterprise-related public data, fusing and organizing the public data into standardized, unified data, and storing it to obtain a public database; constructing an enterprise knowledge graph from the public database; selecting enterprise credit evaluation indexes according to the enterprise knowledge graph; and, based on the enterprise credit evaluation indexes, performing enterprise credit classification prediction with a classification algorithm. The enterprise credit classification system constructed by this scheme can improve the accuracy of enterprise credit classification.

Description

Enterprise credit classification system construction method and device, electronic equipment and storage medium
Technical Field
The invention belongs to the technical field of market economy, and particularly relates to a method and a device for constructing an enterprise credit classification system, electronic equipment and a storage medium.
Background
Enterprise credit evaluation is an important link in building and improving the order of the market economy, since a market economy is a credit economy. Today, enterprise business activity spans many fields. Enterprise credit can generally be expressed in two ways: one measures whether the enterprise is willing to actively fulfill its commitments, and the other measures whether the enterprise is able to fulfill them on time. Credit construction in the bidding and tendering field is an important component of the construction of the social credit system. At the present stage, illegal and dishonest conduct persists in this field: common practices such as falsified materials, collusive bidding, refusal to perform contracts, and improper ties of interest seriously damage the competitive environment and infringe on the public interest. Establishing a credit system for the bidding field and enabling government credit monitoring therefore helps form a unified, standardized, open, fair, competitive, and orderly bidding market.
Over the past hundred years, enterprise credit evaluation methods have gradually matured. The technical approach to enterprise credit evaluation has gone through three main stages. The first is the experience-based judgment stage, in which enterprise credit is judged from financial factor frameworks such as 6C, 5P, and LAPP together with the evaluator's experience; the second is the statistical method stage, represented by linear and quadratic discriminant models; the third is the artificial intelligence method stage.
Existing research on enterprise credit classification mostly takes an external view of the evaluated enterprise and applies various methods to judge the likelihood of default. According to the conclusions of the existing literature, the factors influencing enterprise credit risk fall into the following broad categories: financial features, corporate features, shareholder features, and credit record features. Among these, financial features are an important basis for measuring enterprise credit.
Because corporate financial data is difficult to obtain or of low authenticity, more and more researchers have begun to measure corporate credit using more multidimensional data features. With the development of the internet, the amount of available information has increased, and data mining technology has gradually been applied to enterprise credit assessment. For example, data mining methods have been used to predict credit rating changes; an online-lending credit prediction model based on the AdaBoost model achieves high classification accuracy; and Chengyun et al. proposed a hybrid ensemble strategy that uses an ensemble learning model based on RSA SVM to improve the prediction accuracy of a credit risk assessment model.
Disclosure of Invention
The embodiment of the specification aims to provide a method and a device for constructing an enterprise credit classification system, an electronic device and a storage medium.
In order to solve the above technical problem, the embodiments of the present application are implemented as follows:
in a first aspect, the present application provides a method for constructing an enterprise credit classification system, where the method includes:
acquiring enterprise-related public data, fusing and sorting the public data into standard unified data, and storing the standard unified data to obtain a public database;
constructing an enterprise knowledge graph according to the public database;
selecting an enterprise credit evaluation index according to the enterprise knowledge graph;
and based on the enterprise credit evaluation indexes, carrying out enterprise credit classification prediction by adopting a classification algorithm.
In one embodiment, constructing an enterprise knowledge graph according to the public database includes:
extracting entities in the public database and the relationships between the entities to construct a preliminary enterprise knowledge graph;
and performing hidden knowledge inference on the preliminary enterprise knowledge graph to supplement hidden relationships between the entities, thereby obtaining the enterprise knowledge graph.
In one embodiment, extracting entities and relationships between entities in the public database and constructing a preliminary enterprise knowledge graph includes:
extracting entities in the public database through a bidirectional long short-term memory (Bi-LSTM) recurrent neural network and a conditional random field (CRF), and extracting the relationships between the entities through dependency parsing, to construct the preliminary enterprise knowledge graph.
In one embodiment, performing hidden knowledge inference on the preliminary enterprise knowledge graph to supplement hidden relationships between entities and obtain the enterprise knowledge graph includes:
performing hidden knowledge inference on the preliminary enterprise knowledge graph through a TransE model to supplement hidden relationships between entities, thereby obtaining the enterprise knowledge graph.
In one embodiment, the enterprise knowledge graph comprises a plurality of dimensional characteristics of the enterprise;
selecting an enterprise credit evaluation index according to the enterprise knowledge graph includes:
determining feature correlation between each feature and the label corresponding to the feature;
selecting an optimal feature subset from all feature correlations;
all indexes in the optimal feature subset are enterprise credit evaluation indexes.
In one embodiment, selecting the optimal feature subset from all feature correlations includes:
and sorting all the feature correlations in a descending order, and searching an optimal feature subset by adopting an optimal searching method.
In one embodiment, the classification algorithm includes a support vector machine algorithm, a decision tree algorithm, a naive Bayes algorithm, and a deep forest algorithm.
In a second aspect, the present application provides an apparatus for constructing an enterprise credit classification system, the apparatus comprising:
the data acquisition module is used for acquiring enterprise-related public data, fusing and sorting the public data into standard unified data and storing the standard unified data to obtain a public database;
the graph construction module is used for constructing an enterprise knowledge graph according to the public database;
the enterprise credit evaluation index selection module is used for selecting enterprise credit evaluation indexes according to the enterprise knowledge graph;
and the classification prediction module is used for performing enterprise credit classification prediction by adopting a classification algorithm based on the enterprise credit evaluation indexes.
In a third aspect, the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for constructing the enterprise credit classification system according to the first aspect is implemented.
In a fourth aspect, the present application provides a readable storage medium, on which a computer program is stored, which when executed by a processor, implements the enterprise credit classification system construction method according to the first aspect.
According to the technical scheme provided by the embodiment of the specification, the enterprise credit classification system constructed by the scheme can improve the accuracy of enterprise credit classification.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
FIG. 1 is a schematic flow chart of a method for constructing an enterprise credit classification system provided by the present application;
FIG. 2 is a schematic representation of entity extraction by Bi-LSTM-CRF as provided herein;
FIG. 3 is a diagram illustrating relationships between entities extracted by a dependency parsing method according to the present application;
FIG. 4 is a graph of four trends of evaluation curves provided herein;
FIG. 5 is a schematic structural diagram of an enterprise credit classification system construction device provided by the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments described herein without departing from the scope or spirit of the application. Other embodiments will be apparent to the skilled person from the description of the present application. The specification and examples are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.
In the present application, "parts" are in parts by mass unless otherwise specified.
With the development of computer technology, big data technology has become an important part of enterprise credit classification methods. Big data technology yields more enterprise information and presents an enterprise's credit status comprehensively and objectively; it also allows deeper data mining in a particular industry or field, which supports more targeted enterprise credit classification methods.
In 2021, Kong Xing et al. applied a KSVM model to data from the bidding field, studying 2019 data on 285 bidding enterprises in 12 cities of Jiangsu and Zhejiang. In that study, the authors proposed focusing on the relationship between two important factors influencing enterprise credit, namely integrity behavior and credit capability, and carried out credit research inside the enterprise to analyze enterprise credit, offering a new way of thinking about the credit value of an enterprise.
The specific classification process is as follows:
1. Data collection: relevant data on a total of 292 companies in 12 cities of Jiangsu and Zhejiang were collected, cleaned, and stored; after inapplicable data were removed, 285 companies remained for research and analysis.
2. An enterprise credit value index system was constructed, covering enterprise quality, public opinion information, capital strength, bidding behavior monitoring, and so on; it comprises 2 first-level indexes, 7 second-level indexes, 17 third-level indexes, and 52 fourth-level indexes.
3. Empirical analysis was performed on the 285 enterprises using the KSVM method according to the established index system. Assuming that credit capability and integrity behavior are consistent, the integrity behavior of an enterprise is predicted from its credit capability indexes. The KSVM method proceeds as follows: (1) divide the sample data into a training set and a test set, and construct a binary classifier and its support vector set; (2) for each test sample x_i, compute the distance f(x_i) between the sample and the classifier; (3) compare the distance f(x_i) with a distance threshold ε, and according to the result choose either the KNN algorithm or the classifier to classify the sample.
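The selection rule in step (3) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the cited study's implementation: the decision function is assumed to be linear, samples far from the decision boundary (|f(x)| > ε) are assumed to go to the classifier while samples near it go to KNN over the support vectors (the text does not say which branch applies to which case), and all names and values are hypothetical.

```python
import math
from collections import Counter

def decision_value(x, weights, bias):
    """Signed score f(x) of a linear classifier (assumed form; the source does not fix the kernel)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def knn_label(x, support_vectors, k):
    """Majority label among the k support vectors closest to x."""
    nearest = sorted(support_vectors, key=lambda sv: math.dist(x, sv[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def ksvm_predict(x, weights, bias, support_vectors, eps=0.5, k=1):
    """Use the classifier when x is far from the boundary, KNN when it is close (assumed direction)."""
    f = decision_value(x, weights, bias)
    if abs(f) > eps:
        return 1 if f > 0 else -1
    return knn_label(x, support_vectors, k)

# Toy data: hypothetical weights and support vectors, labels +1 / -1.
weights, bias = [1.0, 0.0], 0.0
support_vectors = [([0.2, 0.0], 1), ([0.3, 1.0], 1), ([-0.2, 0.0], -1)]

far = ksvm_predict([2.0, 0.0], weights, bias, support_vectors)    # |f| = 2.0, classifier decides
near = ksvm_predict([-0.1, 0.0], weights, bias, support_vectors)  # |f| = 0.1, KNN decides
```

The design intent of such a hybrid is that points close to the boundary, where the classifier is least reliable, are decided by their nearest support vectors instead.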
Existing enterprise credit classification methods also have the following defects:
1) Low utilization of enterprise credit data
In current domestic research on enterprise credit classification, the information used comes mainly from enterprise annual reports, bank databases, and the like, while public online data such as credit notices and bid-winning announcements is underused, because such public data is usually presented as text, images, and similar formats and cannot be used directly. As a result, current enterprise credit classification makes little use of public online data and cannot measure enterprise credit comprehensively.
To address this defect, the present application uses Python crawler technology to obtain data on enterprises participating in bidding activities nationwide, including information from websites such as the China Government Procurement Network and the National Enterprise Credit Information Publicity System, from enterprise annual reports, and from the Wind financial database. By constructing an enterprise knowledge graph for the bidding field, it stores multi-dimensional information on the relevant enterprises, such as basic information, bid-winning transaction information, and credit information, and makes full use of the various types of data available online to profile enterprises from more angles when classifying enterprise credit.
2) Inaccurate feature selection for existing enterprise credit data
At present, domestic enterprise credit research mainly uses enterprise financial data as the measuring index. In the bidding field, however, a large proportion of companies are small and do not disclose their financial data, so that data is difficult to obtain or inaccurate, and credit evaluation methods that rely on financial data as the main feature cannot accurately evaluate the credit of enterprises in this field. Meanwhile, information such as an enterprise's bid-winning time, bid-winning frequency, bid-winning amount, and contract performance record is also an important feature for measuring enterprise credit in the bidding field, yet current credit evaluation methods do not consider data in these dimensions.
To address this defect, the present application designs a suitable feature selection method based on the relevant data of enterprises in the bidding field, incorporates enterprises' bid-winning information into the credit evaluation features, and reduces the dependence of credit evaluation on enterprise financial data.
3) Low precision of existing enterprise credit classification systems
When classification algorithms are applied to enterprise credit classification, the effect of imbalanced data on their accuracy needs to be fully considered. Existing classification algorithms rarely account for data imbalance when classifying enterprises, yet in practice dishonest enterprises make up only a small proportion of the total, so existing algorithms do not achieve good results in enterprise credit classification.
To address this defect, the present application performs bidirectional sampling on the imbalanced enterprise credit data: the minority class is oversampled and the majority class is undersampled. Using an ensemble learning approach, minority-class samples are synthesized multiple times and majority-class samples are randomly selected to form multiple training sets that differ substantially from one another, which reduces the information lost when undersampling the majority class and improves the accuracy of the classification model.
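The bidirectional sampling idea can be illustrated with a small sketch. It assumes that minority samples are synthesized by linear interpolation between random pairs of minority points (a SMOTE-like stand-in; the text does not name the synthesis method), and the function names, labels, and toy data are hypothetical.

```python
import random

def synthesize_minority(minority, n_new, rng):
    """Create n_new synthetic points by interpolating between random pairs of minority samples (assumed method)."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

def build_training_sets(majority, minority, per_class, n_sets, seed=0):
    """Build n_sets balanced training sets: undersample the majority, oversample the minority."""
    if not (2 <= len(minority) <= per_class <= len(majority)):
        raise ValueError("need 2 <= len(minority) <= per_class <= len(majority)")
    rng = random.Random(seed)
    training_sets = []
    for _ in range(n_sets):
        under = rng.sample(majority, per_class)  # a fresh random subset for each set
        over = minority + synthesize_minority(minority, per_class - len(minority), rng)
        samples = [(x, 0) for x in under] + [(x, 1) for x in over]  # 1 = minority (e.g. dishonest)
        rng.shuffle(samples)
        training_sets.append(samples)
    return training_sets

# Toy data: 20 majority-class points and 4 minority-class points.
majority = [[float(i), float(i % 3)] for i in range(20)]
minority = [[100.0, 0.0], [101.0, 1.0], [102.0, 0.5], [103.0, 2.0]]
training_sets = build_training_sets(majority, minority, per_class=8, n_sets=3)
```

Each training set would then train one base classifier, and the ensemble would combine their votes; because every set draws a different majority subset, less majority-class information is discarded overall than with a single undersampled set.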
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to FIG. 1, a schematic flow chart of a method for constructing an enterprise credit classification system provided by an embodiment of the present application is shown. It will be appreciated that the method can be applied to enterprise credit classification in the bidding field.
As shown in FIG. 1, the method for constructing the enterprise credit classification system may include:
S110, acquiring enterprise-related public data, fusing and organizing the public data into standardized, unified data for storage, and obtaining a public database.
In the bidding field today, data such as government procurement records, bidding documents, and credit records are highly dispersed, spread across the websites of governments at all levels, financial departments, industrial and commercial credit agencies, and so on. When a government audits bidding companies, it often uses only local data and does not consider information from other regions and institutions. The data are stored in different databases or websites with varied storage structures, which makes them difficult to integrate and use directly. The method therefore starts from the premise that data from the various websites and databases must be acquired, fused, and organized into standardized, unified data for storage and use.
Accordingly, with reference to enterprise credit evaluation standards, the present application uses a Python crawler to acquire from the public network information on enterprises participating in bidding activities nationwide, including information from websites such as the China Government Procurement Network and the National Enterprise Credit Information Publicity System, from enterprise annual reports, and from the Wind financial database. The data types include structured data, unstructured data, and so on. The data are organized and stored in a public database in preparation for the subsequent data analysis.
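As a rough illustration of fusing records from different sources into standardized, unified data, the sketch below maps source-specific field names onto one schema and merges records that refer to the same enterprise. The source names, field names, and sample records are all hypothetical; the application does not specify a schema.

```python
# Hypothetical per-source field mappings into one unified schema.
FIELD_MAP = {
    "procurement_site": {"supplier": "enterprise_name", "amount": "bid_amount"},
    "credit_registry": {"company": "enterprise_name", "penalties": "penalty_count"},
}

def normalize_name(name):
    """Trim and collapse whitespace so the same enterprise matches across sources."""
    return " ".join(name.split())

def to_unified(source, record):
    """Rename known fields to the unified schema and drop everything else."""
    mapping = FIELD_MAP[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}

def fuse(records):
    """Merge (source, record) pairs into one entry per enterprise."""
    database = {}
    for source, record in records:
        unified = to_unified(source, record)
        name = normalize_name(unified["enterprise_name"])
        unified["enterprise_name"] = name
        database.setdefault(name, {}).update(unified)
    return database

# Toy records standing in for crawled data.
raw_records = [
    ("procurement_site", {"supplier": "  Example Software Co. ", "amount": 120000, "page": 3}),
    ("credit_registry", {"company": "Example Software Co.", "penalties": 1}),
]
public_database = fuse(raw_records)
```

In practice, matching enterprises across sources would likely rely on a stable identifier such as a registration number rather than a cleaned name; the name-based key here only keeps the sketch short.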
S120, constructing an enterprise knowledge graph according to the public database, which includes:
extracting entities in the public database and the relationships between the entities to construct a preliminary enterprise knowledge graph;
and performing hidden knowledge inference on the preliminary enterprise knowledge graph to supplement hidden relationships between the entities, thereby obtaining the enterprise knowledge graph.
Extracting the entities in the public database and the relationships between the entities to construct the preliminary enterprise knowledge graph includes:
extracting entities in the public database through a bidirectional long short-term memory (Bi-LSTM) recurrent neural network and a conditional random field (CRF), and extracting the relationships between the entities through dependency parsing, to construct the preliminary enterprise knowledge graph.
Performing hidden knowledge inference on the preliminary enterprise knowledge graph to supplement hidden relationships between the entities and obtain the enterprise knowledge graph includes:
performing hidden knowledge inference on the preliminary enterprise knowledge graph through a TransE model to supplement hidden relationships between entities, thereby obtaining the enterprise knowledge graph.
Specifically, for the bidding field, the hidden relationship between the entities includes, for example, an association relationship between bidding companies and review experts, and the like.
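For readers unfamiliar with TransE, the sketch below shows how such a model can rank candidate hidden relationships. TransE represents entities and relations as vectors and treats a triple (head, relation, tail) as plausible when head + relation is close to tail. The two-dimensional embeddings here are hand-picked toy values rather than trained ones, and the entity and relation names are hypothetical.

```python
import math

def transe_distance(head, relation, tail):
    """Euclidean norm of head + relation - tail; smaller means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Toy embeddings (in practice these are learned from the preliminary knowledge graph).
entities = {
    "bidder_A": [0.0, 0.0],
    "expert_X": [1.0, 0.1],
    "expert_Y": [-1.0, 2.0],
}
relations = {"associated_with": [1.0, 0.0]}

def rank_tails(head_name, relation_name, candidates):
    """Order candidate tail entities from most to least plausible."""
    h, r = entities[head_name], relations[relation_name]
    return sorted(candidates, key=lambda name: transe_distance(h, r, entities[name]))

ranking = rank_tails("bidder_A", "associated_with", ["expert_X", "expert_Y"])
best_distance = transe_distance(
    entities["bidder_A"], relations["associated_with"], entities["expert_X"]
)
```

A candidate triple whose distance falls below a chosen threshold could then be added to the graph as a supplemented hidden relationship; the threshold itself would have to be tuned and is not given in the text.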
Entities in unstructured text are extracted by a bidirectional long short-term memory recurrent neural network combined with a conditional random field (Bi-LSTM-CRF), relationships among the entities are extracted by dependency parsing, and a preliminary enterprise knowledge graph is constructed. FIG. 2 shows a schematic diagram of entity extraction by Bi-LSTM-CRF. As shown in FIG. 2, the bottom layer is the unstructured text, whose entities include person names, enterprise names, place names, organization names, addresses, and so on. At this layer each character in the text is given a BIO label: for a person name, for example, the first character (the surname) is labeled B-Per (e.g., "top" in FIG. 2), the characters that follow within the name are labeled I-Per (e.g., "sensitive" in FIG. 2), and non-entity parts are labeled O (e.g., "connected" and "person" in FIG. 2). The labeled data is then encoded with Word2vec into word vectors (x1-x5 in FIG. 2), which are fed into the Bi-LSTM model (comprising a forward LSTM and a backward LSTM) for forward and backward training. The Bi-LSTM model outputs BIO label predictions p1-p5, which are passed to the CRF layer, and the CRF layer outputs the final labels.
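To make the BIO scheme concrete, the following sketch converts a tagged token sequence, such as the final labels a CRF layer would emit, back into entity spans. The tokens and tags are hypothetical examples in English rather than the contents of FIG. 2, and the helper name is illustrative.

```python
def decode_bio(tokens, tags, sep=" "):
    """Group tokens into (entity_text, entity_type) pairs according to their BIO tags."""
    entities = []
    current, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any entity in progress.
            if current:
                entities.append((sep.join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # An I- tag of the same type extends the current entity.
            current.append(token)
        else:
            # An O tag (or a stray I- tag) ends the current entity.
            if current:
                entities.append((sep.join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((sep.join(current), current_type))
    return entities

tokens = ["Zhang", "Min", "joined", "Beiming", "Software"]
tags = ["B-Per", "I-Per", "O", "B-Org", "I-Org"]
entities = decode_bio(tokens, tags)
```

For Chinese text labeled character by character, passing `sep=""` joins the characters of an entity without spaces.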
Dependency parsing finds the semantic relationships in a sentence by analyzing the dependency relations among its words. Specifically, a sentence is first segmented into words, each word is assigned a grammatical role such as subject, predicate, or object, and the relations among the words are then analyzed according to grammatical knowledge to uncover the semantic relationships implied by the sentence. The extracted entities, the relationships among them, and the supplemented hidden relationships are assembled into triples and stored in a Neo4j database, which constitutes the enterprise knowledge graph. For example, the Language Technology Platform (LTP) of Harbin Institute of Technology provides an open-source dependency parsing system. Relation extraction is illustrated in FIG. 3: the example sentence is segmented, each word is tagged with its part of speech, and the dependency relations among the words are determined from the part-of-speech tags. Parsing shows that the sentence has a subject-predicate structure in which "Beiming Software Co., Ltd." and "information service platform project" are entities and the core verb is "won the bid"; the relationship between the two entities is therefore "won the bid", and the extracted triple is (Beiming Software Co., Ltd., won the bid, information service platform project).
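The triple extraction described above can be sketched with a hand-written parse of the example sentence. The arc format, the labels "subject" and "object", and the Cypher statement are illustrative assumptions: a real parser emits its own label set, the company name is a transliteration, and the query is only one plausible way to store a triple in Neo4j.

```python
def extract_triples(words, arcs):
    """Build (subject, verb, object) triples from arcs of the form (dependent_index, head_index, label)."""
    subjects, objects = {}, {}
    for dependent, head, label in arcs:
        if label == "subject":
            subjects[head] = words[dependent]
        elif label == "object":
            objects[head] = words[dependent]
    # Keep only verbs that have both a subject and an object attached.
    return [(subjects[v], words[v], objects[v]) for v in subjects if v in objects]

# Hand-written parse of the example sentence (no parser is called here).
words = ["Beiming Software Co., Ltd.", "won the bid", "information service platform project"]
arcs = [(0, 1, "subject"), (2, 1, "object")]
triples = extract_triples(words, arcs)

# Parameterized Cypher that could store one triple; MERGE avoids creating duplicate nodes.
CYPHER = (
    "MERGE (h:Entity {name: $head}) "
    "MERGE (t:Entity {name: $tail}) "
    "MERGE (h)-[:RELATED {type: $relation}]->(t)"
)
params = [{"head": h, "relation": r, "tail": t} for h, r, t in triples]
```

Each dictionary in `params` would be passed together with `CYPHER` to a Neo4j session; the relation name is stored as a property because Cypher does not allow relationship types to be supplied as query parameters.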
An enterprise knowledge graph for the bidding field is constructed so that enterprises can be profiled along more dimensions. The enterprise profile can be analyzed from multiple aspects, such as basic enterprise information (shareholders, registered capital, years of operation, and the like), enterprise credit information (administrative penalties, default records, and the like), and bidding transaction records (number of winning bids, total winning-bid amount, winning-bid time, and the like), so that enterprise credit evaluation indexes can be extracted more accurately and enterprise credit can be measured.
According to the embodiment of the application, an enterprise knowledge graph for the bidding field is established and public network data are organized, which improves the utilization of enterprise-related credit data and depicts the credit profile of bidding-related enterprises along more dimensions. Public data are acquired from websites such as the China Government Procurement Network and the National Enterprise Credit Information Publicity System, enterprise information covering more dimensions is collected, and entity relations are completed through knowledge reasoning. Enterprises are profiled in terms of shareholder information, registration information, default information, bid-winning information, and the like, so that enterprise credit is reflected more comprehensively and support is provided for selecting appropriate enterprise credit features.
S130, selecting the enterprise credit evaluation indexes according to the enterprise knowledge graph, wherein the selection comprises the following steps:
determining feature correlation between each feature and the label corresponding to the feature;
selecting an optimal feature subset from all feature correlations;
all indexes in the optimal feature subset are enterprise credit evaluation indexes.
Wherein, selecting an optimal feature subset from all feature correlations includes:
and sorting all the feature correlations in a descending order, and searching an optimal feature subset by adopting an optimal searching method.
Specifically, selecting the enterprise credit evaluation indexes and constructing the credit model are important links in evaluating enterprise credit.
The feature correlation between each feature and its corresponding label is first measured with several commonly used methods: chi-squared, ReliefF, information gain, and gain ratio. The feature correlation is calculated with each of these methods and normalized, and the final feature correlation of each feature is obtained by averaging its normalized values. The final feature correlations are then sorted in descending order to obtain an ordered feature list R = [R1, R2, ..., RN]. Finally, the optimal feature subset is searched for with the optimal searching method.
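The normalize-average-sort procedure can be sketched as follows. The text does not say which normalization is used, so min-max scaling is an assumption here, and the feature names and scores are made-up placeholders rather than results from the application.

```python
from typing import Dict, List

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale one method's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}

def rank_features(per_method: Dict[str, Dict[str, float]]) -> List[str]:
    """Average the normalized scores of every method and sort descending."""
    normalized = [min_max_normalize(s) for s in per_method.values()]
    features = list(next(iter(per_method.values())))
    final = {f: sum(n[f] for n in normalized) / len(normalized) for f in features}
    return sorted(features, key=lambda f: final[f], reverse=True)

# Hypothetical relevance scores from two of the four methods.
scores = {
    "chi_squared": {"bid_wins": 40.0, "registered_capital": 10.0, "default_records": 30.0},
    "information_gain": {"bid_wins": 0.30, "registered_capital": 0.05, "default_records": 0.40},
}
ranked = rank_features(scores)
print(ranked)
```

Normalizing before averaging keeps a method with a large numeric range, such as chi-squared, from dominating the combined ranking.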
Definition: the evaluation value $m(S_i)$ is the classification performance metric, on the training dataset, of a predictor that uses only the feature subset $S_i$.
The evaluation curve $f$ represents the evaluation value $m(S_i)$ as a function of the feature subset size $|S_i|$; FIG. 4 shows four kinds of variation trends of the evaluation curve.
Because the feature list is sorted in descending order, a point is reached at which adding a feature $R_i$ to the previous subset $S_{i-1}$ no longer changes the evaluation value $m(S_i)$. For the four cases shown in FIG. 4, the goal is to find the optimal solution with the smallest feature subset size. The specific steps are as follows:
Step 1: Initialize $I_t = [a_t, b_t]$ with $t = 0$, where $a_t$ is the minimum value in the feature list, i.e., $R_N$, and $b_t$ is the maximum value in the feature list, i.e., $R_1$.
Step 2: While $|I_t| > \varepsilon$, where $|I_t| = b_t - a_t$ and $\varepsilon$ is a region threshold (which can be set according to actual requirements), compute:
[Equation image BDA0003536300010000091: not recoverable from the extracted text]
[Equation image BDA0003536300010000092: not recoverable from the extracted text]
Step 3: If $f_1 \ge f_2$, then $I_{t+1} = [a_t, p_{t+1}]$; otherwise, $I_{t+1} = [k_{t+1}, b_t]$.
Step 4: Let $t = t + 1$ and return to Step 2; execution continues until $|I_t| \le \varepsilon$, which gives the final feature-subset feasible region $I_T = [a^{(T)}, b^{(T)}]$.
Step 5: According to the final feature-subset feasible region $I_T = [a^{(T)}, b^{(T)}]$, determine the final candidate subset $S = \{S_j \mid j \in I_T\}$.
Step 6: From the final candidate subset $S = \{S_j \mid j \in I_T\}$, find the optimal feature subset $S^*$; all indexes in the optimal feature subset are the enterprise credit evaluation indexes.
To find the optimal feature subset, the optimal searching method is first used to find an optimal value $m^* = \max_i m(S_i)$. After iteration, a region $I_T = [a^{(T)}, b^{(T)}]$ is produced such that $m^* \ge \max\{m(S_{a^{(T)}}), m(S_{b^{(T)}})\}$. A sequential backward search is then performed on the final feasible region $I_T$ to obtain the optimal feature subset. The aim is to find the optimal subset, evaluate its actual performance, and optimize the interpretability and performance of the model. Model performance is measured by AUC (area under the ROC curve).
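The interval-shrinking steps can be sketched as below. Because the interior-point formulas of Step 2 are not available in the text, this sketch assumes golden-section points with ratio 0.618, searches over integer subset sizes from 1 to N, and replaces the sequential backward search with a simple downward scan of the final interval; the evaluation values are invented stand-ins for AUC, and the function names are hypothetical.

```python
from typing import Callable, Tuple

GOLDEN = 0.618  # assumed shrink ratio; the source's own formulas are not recoverable

def shrink_interval(m: Callable[[int], float], a: int, b: int, eps: int) -> Tuple[int, int]:
    """Shrink the subset-size interval [a, b] while its length exceeds eps."""
    while b - a > eps:
        k = a + round((1 - GOLDEN) * (b - a))   # lower interior point
        p = a + round(GOLDEN * (b - a))         # upper interior point
        if k >= p or k <= a or p >= b:
            break                               # no two distinct interior points left
        f1, f2 = m(k), m(p)
        if f1 >= f2:
            b = p                               # keep [a_t, p_{t+1}]
        else:
            a = k                               # keep [k_{t+1}, b_t]
    return a, b

def best_size_in(m: Callable[[int], float], a: int, b: int) -> int:
    """Scan sizes from b down to a; return the smallest size reaching the best score."""
    best_j, best_score = b, m(b)
    for j in range(b - 1, a - 1, -1):
        score = m(j)
        if score >= best_score:
            best_j, best_score = j, score
    return best_j

# Hypothetical evaluation values m(S_i) for subset sizes 1..10 (rise, plateau, slight drop).
curve = [0.60, 0.68, 0.74, 0.79, 0.82, 0.82, 0.82, 0.81, 0.80, 0.80]

def m(size: int) -> float:
    return curve[size - 1]

a_T, b_T = shrink_interval(m, 1, len(curve), eps=2)
best = best_size_in(m, a_T, b_T)
print((a_T, b_T), best)
```

On this toy curve the search keeps the plateau inside the final interval and the scan then picks the smallest subset size on the plateau, which matches the stated goal of the smallest subset with the best evaluation value.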
This embodiment provides a new feature selection method for enterprise credit information in the bidding field. Because financial data of companies in the bidding field are difficult to obtain, the embodiment proposes evaluating enterprise credit with the bidding transaction data of bidding enterprises: transaction data, default records, and the like are included in the enterprise credit evaluation indexes, and appropriate credit features related to bid winning are selected for classifying enterprise credit.
And S140, based on the enterprise credit evaluation indexes, carrying out enterprise credit classification prediction by adopting a classification algorithm.
Specifically, a suitable classification algorithm is selected to perform enterprise credit classification prediction on the enterprise credit evaluation indexes obtained in the previous stage. In practice, the data to be classified are often imbalanced: in the bidding field, distressed enterprises make up only a small share of all enterprises. To reduce the influence of this imbalance, a bidirectional sampling method is adopted. The original data set (namely the features in the enterprise knowledge graph) is divided into clusters by the K-Means algorithm, after which the minority-class samples are oversampled and the majority-class samples are undersampled. To reduce the information loss caused by undersampling the majority class, an ensemble learning method repeats the sampling several times, producing multiple training sets that differ substantially from one another for classification. Finally, classification algorithms such as the support vector machine, decision tree, naive Bayes, and deep forest algorithms are compared, and the most suitable model is selected for credit classification prediction of enterprises in the bidding field.
In summary, in the credit classification algorithm for imbalanced enterprise credit data, the data set is divided into clusters by the K-Means algorithm, the minority-class samples are oversampled, and the majority-class samples are undersampled. To reduce the data loss caused by undersampling the majority class, data sets that differ substantially from one another are drawn during sampling, classifiers that differ from one another are trained on them, and ensemble learning is used to improve the classification effect.
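A minimal sketch of the bidirectional sampling idea is given below. It omits the K-Means clustering step for brevity and simply resamples the two classes; the per-class target size (the midpoint of the two class sizes), the sample format, and the function name `build_training_sets` are assumptions, and the data are synthetic.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[float, ...]

def build_training_sets(minority: Sequence[Sample], majority: Sequence[Sample],
                        n_sets: int, seed: int = 0) -> List[Tuple[List[Sample], List[int]]]:
    """Return n_sets balanced (samples, labels) pairs; label 1 marks the minority class.

    Assumes len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    # Assumed target size per class: midpoint between the two class sizes.
    target = (len(minority) + len(majority)) // 2
    sets = []
    for _ in range(n_sets):
        # Oversample: keep every minority sample, then draw extras with replacement.
        extras = [rng.choice(minority) for _ in range(target - len(minority))]
        over = list(minority) + extras
        # Undersample: draw a different random subset of the majority each round.
        under = rng.sample(list(majority), target)
        sets.append((over + under, [1] * len(over) + [0] * len(under)))
    return sets

# Synthetic data: 5 distressed (minority) and 95 normal (majority) enterprises.
minority = [(float(i), 1.0) for i in range(5)]
majority = [(float(i), 0.0) for i in range(95)]
training_sets = build_training_sets(minority, majority, n_sets=3)
for samples, labels in training_sets:
    print(len(samples), sum(labels))
```

Each training set would then be used to fit one base classifier, and the predictions of the base classifiers combined, for example by majority vote; because every round undersamples a different part of the majority class, less of its information is lost overall.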
At present, existing enterprise credit evaluation methods seldom focus specifically on enterprises in the bidding field, the enterprise credit features they use do not take bidding-related enterprise information into account, and existing classification methods rely too heavily on financial indexes and do not adequately consider the influence of imbalanced data samples on the classification algorithm. Building on previous research, the present application collects public enterprise data from more sources, builds an enterprise knowledge graph for the bidding field, completes the hidden relations between entities with knowledge reasoning, and improves the selection of enterprise credit classification features. An enterprise basic information network, an enterprise stakeholder information network, and an enterprise bidding transaction information network are built; enterprise bid-winning information is used to reduce the dependence on financial data; and bidirectional sampling and ensemble learning reduce the influence of data imbalance on classification accuracy, thereby realizing credit evaluation of enterprises in the bidding field.
Referring to fig. 5, a schematic structural diagram of an enterprise credit classification system construction apparatus according to an embodiment of the application is shown.
As shown in fig. 5, the apparatus 500 for constructing the enterprise credit classification system may include:
the data acquisition module 510 is configured to acquire public data related to an enterprise, fuse the public data, and arrange the public data into standard unified data for storage, so as to obtain a public database;
the map construction module 520 is used for constructing an enterprise knowledge graph according to the public database;
an enterprise credit evaluation index selection module 530, configured to select an enterprise credit evaluation index according to an enterprise knowledge graph;
and the classification prediction module 540 is configured to perform enterprise credit classification prediction by using a classification algorithm based on the enterprise credit evaluation index.
Optionally, the map construction module 520 is further configured to:
extracting entities in the public database and the relations between the entities to construct a preliminary enterprise knowledge graph;
and carrying out hidden knowledge inference on the preliminary enterprise knowledge graph, and supplementing hidden relations among the entities to obtain the enterprise knowledge graph.
Optionally, the map construction module 520 is further configured to:
and extracting entities in the public database and the relations between the entities through the bidirectional long short-term memory recurrent neural network and the conditional random field, and constructing a preliminary enterprise knowledge graph.
Optionally, the map construction module 520 is further configured to:
and carrying out hidden knowledge inference on the preliminary enterprise knowledge graph through a TransE model, and supplementing hidden relations among entities to obtain the enterprise knowledge graph.
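The TransE-based inference named above can be illustrated with a minimal scoring sketch. TransE models a true triple (h, r, t) as h + r ≈ t in embedding space, so candidate tails can be ranked by the distance ||h + r - t||. The two-dimensional embeddings, the entity and relation names, and the helper functions below are toy assumptions; a real system would learn the embeddings from the preliminary knowledge graph.

```python
import math
from typing import Dict, List, Sequence

def transe_distance(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """L2 distance ||h + r - t||; smaller means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_tails(head: str, relation: str,
               entities: Dict[str, List[float]],
               relations: Dict[str, List[float]]) -> List[str]:
    """Rank all other entities as candidate tails for (head, relation, ?)."""
    h, r = entities[head], relations[relation]
    candidates = [name for name in entities if name != head]
    return sorted(candidates, key=lambda name: transe_distance(h, r, entities[name]))

# Toy, untrained embeddings chosen by hand for illustration only.
entities = {
    "company_a": [0.0, 0.0],
    "project_x": [1.0, 0.1],
    "person_b": [-1.0, 0.5],
}
relations = {"wins_bid": [1.0, 0.0]}

ranking = rank_tails("company_a", "wins_bid", entities, relations)
print(ranking)
```

A hidden relation would be added to the graph only when the best-ranked candidate's distance falls below some chosen threshold, which is a design decision the text leaves open.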
Optionally, the enterprise credit evaluation index selecting module 530 is further configured to:
determining feature correlation between each feature and the label corresponding to the feature;
selecting an optimal feature subset from all feature correlations;
and all indexes in the optimal feature subset are the enterprise credit evaluation indexes.
Optionally, the enterprise credit evaluation index selecting module 530 is further configured to:
and sorting all the feature correlations in a descending order, and searching an optimal feature subset by adopting an optimal searching method.
Optionally, the classification algorithm includes a support vector machine algorithm, a decision tree algorithm, a naive bayes algorithm, and a deep forest algorithm.
The device for constructing an enterprise credit classification system provided by this embodiment may implement the embodiments of the method described above, and its implementation principle and technical effect are similar, and are not described herein again.
FIG. 6 is a schematic structural diagram of an electronic device 300 suitable for implementing the embodiments of the present application.
As shown in fig. 6, the electronic apparatus 300 includes a Central Processing Unit (CPU)301 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the apparatus 300 are also stored. The CPU 301, ROM 302, and RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse, and the like; an output section 307 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card, a modem, or the like. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 310 as necessary, so that a computer program read out therefrom is installed into the storage section 308 as necessary.
In particular, the process described above with reference to fig. 1 may be implemented as a computer software program, according to an embodiment of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described enterprise credit classification system construction method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 309, and/or installed from the removable medium 311.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor. The names of these units or modules do not in some cases constitute a limitation of the unit or module itself.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a mobile phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
As another aspect, the present application also provides a storage medium, which may be the storage medium contained in the foregoing device in the above embodiment; or may be a storage medium that exists separately and is not assembled into the device. The storage medium stores one or more programs that are used by one or more processors to perform the enterprise credit classification system construction methods described herein.
Storage media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal or a carrier wave.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims (10)

1. A method for constructing an enterprise credit classification system is characterized by comprising the following steps:
acquiring enterprise-related public data, fusing and sorting the public data into standard unified data, and storing the standard unified data to obtain a public database;
constructing an enterprise knowledge graph according to the public database;
selecting an enterprise credit evaluation index according to the enterprise knowledge graph;
and based on the enterprise credit evaluation index, carrying out enterprise credit classification prediction by adopting a classification algorithm.
2. The method of claim 1, wherein said building an enterprise knowledge graph from said public database comprises:
extracting entities in the public database and the relations between the entities to construct a preliminary enterprise knowledge graph;
and carrying out hidden knowledge inference on the preliminary enterprise knowledge graph, and supplementing hidden relations among the entities to obtain the enterprise knowledge graph.
3. The method of claim 2, wherein the extracting entities and relationships between entities in the public database to construct a preliminary enterprise knowledge graph comprises:
and extracting entities in the public database through a bidirectional long short-term memory recurrent neural network and a conditional random field, and extracting the relations between the entities through dependency parsing to construct a preliminary enterprise knowledge graph.
4. The method of claim 2, wherein performing hidden knowledge inference on the preliminary enterprise knowledge graph to supplement hidden relationships between the entities to obtain the enterprise knowledge graph comprises:
and carrying out hidden knowledge inference on the preliminary enterprise knowledge graph through a TransE model, and supplementing hidden relations among the entities to obtain the enterprise knowledge graph.
5. The method of any of claims 1-4, wherein the enterprise knowledge graph comprises a number of dimensional features of an enterprise;
the selecting of the enterprise credit evaluation index according to the enterprise knowledge graph comprises the following steps:
determining feature correlation between each feature and the label corresponding to the feature;
selecting an optimal feature subset from all the feature correlations;
all indexes in the optimal feature subset are the enterprise credit evaluation indexes.
6. The method of claim 5, wherein said selecting an optimal subset of features from all of said feature correlations comprises:
and sorting all the feature correlations in a descending order, and searching the optimal feature subset by adopting an optimal searching method.
7. The method according to any one of claims 1-4, wherein the classification algorithm comprises a support vector machine algorithm, a decision tree algorithm, a naive Bayes algorithm, a deep forest algorithm.
8. An apparatus for constructing an enterprise credit classification system, the apparatus comprising:
the data acquisition module is used for acquiring enterprise-related public data, fusing and sorting the public data into standard unified data and storing the standard unified data to obtain a public database;
the map construction module is used for constructing an enterprise knowledge graph according to the public database;
the enterprise credit evaluation index selection module is used for selecting enterprise credit evaluation indexes according to the enterprise knowledge graph;
and the classification prediction module is used for performing enterprise credit classification prediction by adopting a classification algorithm based on the enterprise credit evaluation indexes.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of constructing an enterprise credit classification system according to any one of claims 1 to 7 when executing the program.
10. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of constructing an enterprise credit classification system according to any one of claims 1 to 7.
CN202210227344.0A 2022-03-08 2022-03-08 Enterprise credit classification system construction method and device, electronic equipment and storage medium Pending CN114708000A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210227344.0A CN114708000A (en) 2022-03-08 2022-03-08 Enterprise credit classification system construction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210227344.0A CN114708000A (en) 2022-03-08 2022-03-08 Enterprise credit classification system construction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114708000A true CN114708000A (en) 2022-07-05

Family

ID=82169381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210227344.0A Pending CN114708000A (en) 2022-03-08 2022-03-08 Enterprise credit classification system construction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114708000A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115456753A (en) * 2022-09-07 2022-12-09 安徽省优质采科技发展有限责任公司 Enterprise credit information analysis method and system for bidding platform
CN117788132A (en) * 2024-02-28 2024-03-29 东亚银行(中国)有限公司 Bank anti-money laundering stock right tracing method and device based on knowledge graph
CN117788132B (en) * 2024-02-28 2024-05-31 东亚银行(中国)有限公司 Bank anti-money laundering stock right tracing method and device based on knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination