CN110598066B - Bank full-name rapid matching method based on word vector expression and cosine similarity - Google Patents

Publication number
CN110598066B
Authority
CN
China
Prior art keywords
word
idf
bank
word vector
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910851391.0A
Other languages
Chinese (zh)
Other versions
CN110598066A (en)
Inventor
李振
鲍东岳
张刚
尹正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minsheng Science And Technology Co ltd
Original Assignee
Minsheng Science And Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-10
Filing date
2019-09-10
Publication date
2022-05-10
Application filed by Minsheng Science And Technology Co ltd
Priority to CN201910851391.0A
Publication of CN110598066A
Application granted
Publication of CN110598066B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a rapid bank-name matching method based on word vector expression and cosine similarity. A bank name library serves as the training set, and training on it yields a word vector matrix and a training model. The bank name to be matched is then segmented and converted into a word vector. Finally, following the cosine similarity calculation, the word vector of the name to be retrieved is multiplied by the transpose of the word vector matrix, and the bank name is obtained by combining the maximum of each row of the product with the result of comparing the retrieved name against the training model. To improve speed, the computation is expressed as a matrix multiplication and run in 2 processes simultaneously, finally reaching a speed of 2000 names in 2 s. Because each row pairs one input with every record in the bank library and the cosine similarities between their word vectors are computed in a single matrix operation, the time that an explicit loop would take is greatly reduced.

Description

Bank full-name rapid matching method based on word vector expression and cosine similarity
Technical Field
The invention belongs to the technical field of bank information processing, and particularly relates to a bank full-name rapid matching method based on word vector expression and cosine similarity.
Background
With the rapid growth of small, medium and micro enterprises, banks' corporate (public-to-public) business keeps increasing. It covers enterprise electronic banking, unit deposits, credit, institutional business, international business, entrusted housing finance, fund clearing, intermediary business, asset recommendation, fund custody and the like. The basic departments and functions inside a bank are savings (private customers), accounting (corporate customers) and credit. Accounting is the back office and service department for credit; credit handles the deposit and loan business of organizations; and all business between organizations and the bank goes through the accounting department. Specifically, corporate business mainly serves customers such as enterprise legal persons and organizations, and carries out cheque, remittance, loan and other services around corporate accounts. This business suffers from slow large-batch manual retrieval, and the text similarity matching algorithms currently on the market are slow, so they cannot meet banks' need for fast lookup.
Disclosure of Invention
To solve the existing problems, namely 1. banks have a large volume of corporate business tasks and manual retrieval is slow, and 2. existing text similarity matching algorithms are too slow for banks' needs, the invention provides a bank full-name rapid matching method based on word vector expression and cosine similarity. The method processes a bank full-name library to obtain a training set and trains on it to obtain a word vector matrix and a training model; it then segments the bank full name to be matched and converts it into a word vector; finally, following the cosine similarity calculation, it multiplies the word vector to be retrieved by the transpose of the word vector matrix, and obtains the bank full name by combining the maximum of each row of the product with the result of comparing the name to be matched against the training model;
further, the fast matching method comprises the following steps:
S1: perform word removal, segmentation and combination on the bank full-name library to obtain a training set;
S2: convert the training set into word vectors to obtain the tf-idf word vector matrix of the training set, normalize each row, and store both the tf-idf word vector matrix and the training model;
S3: input the bank full name to be matched; perform word removal, segmentation and combination on it to obtain a plurality of 2-word phrases; convert the processed bank full name together with the 2-word phrases into a character string; and finally convert the character string into a tf-idf word vector;
S5: multiply the tf-idf word vector obtained in S3 by the transpose of the training set's tf-idf word vector matrix from S2, and, from the resulting matrix, select the bank full name at the position of the maximum value in each row;
S6: compare the bank full name to be matched with the training model, merge the two sets of bank full names according to the comparison result and the result of S5, and output the final result;
further, the word removal, segmentation and combination processing in S1 and S3 is specifically:
word removal: remove the words that carry no key information from the bank full name to reduce the amount of computation;
segmentation: perform word segmentation on the bank full name from which those words have been removed, to obtain a simplified entry;
combination: combine the characters of the segmented simplified entry in pairs to obtain a plurality of 2-word phrases;
the combined 2-word phrases and the set of simplified entries from S1 together form the training set;
further, the words without key information that are removed include, but are not limited to, "limited company", "joint-stock limited company", "bank" and "branch";
further, the 2-word phrases in S3 and S1 are obtained as follows: any two characters are selected from the simplified entry, and every possible combination is formed with the two characters kept in the order in which they appear in the simplified entry, each combination being a 2-word phrase;
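The word removal, segmentation and combination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list UNINFORMATIVE, the function names and the space-separated output format are hypothetical and are not taken from the patent; the sample name 民生银行顺义支行 (Minsheng Bank Shunyi Branch) is the one used in the embodiment.

```python
from itertools import combinations

# Hypothetical list of uninformative words; the source only names
# "limited company", "joint-stock limited company", "bank" and "branch"
# as examples, so this exact list is an assumption.
UNINFORMATIVE = ("股份有限公司", "有限公司", "银行", "支行")


def strip_uninformative(name, words=UNINFORMATIVE):
    """Remove every uninformative word from the name, longest word first."""
    for word in sorted(words, key=len, reverse=True):
        name = name.replace(word, "")
    return name


def two_char_phrases(entry):
    """All pairs of characters, each pair kept in its original left-to-right order."""
    return [a + b for a, b in combinations(entry, 2)]


def preprocess(name):
    """Single characters followed by 2-character phrases, joined into one string."""
    entry = strip_uninformative(name)
    tokens = list(entry) + two_char_phrases(entry)
    return " ".join(tokens)


print(preprocess("民生银行顺义支行"))
```

itertools.combinations never reorders its input, which matches the requirement that the two characters keep the order they have in the simplified entry. A four-character entry therefore gives six phrases rather than twelve.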
further, S2 is specifically: each simplified entry in the training set, together with the 2-word phrases obtained by combining it, is converted into a character string, and the character string is then converted into a tf-idf word vector; all the tf-idf word vectors together form an m × n matrix recorded as tf-idf-train, each row vector of which is normalized so that its modulus equals 1, where m is the total number of samples in the training set and n is the dimension of the word vectors;
further, the training model in S2 is the specific mapping used to convert text into tf-idf word vectors: the tf-idf values are computed on the training set to obtain, for each word, the mapping to its corresponding tf-idf word vector;
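One way to read the "training model" is as a stored vocabulary plus one idf weight per token, which is later reused to vectorize new names. The standard-library sketch below follows that reading. The function names, the sparse dictionary row format and the smoothed idf variant are all assumptions made for illustration; the patent does not specify them.

```python
import math
from collections import Counter


def fit_idf(documents):
    """Learn the vocabulary (token -> column index) and one idf weight per token."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.split()))
    vocabulary = {tok: i for i, tok in enumerate(sorted(doc_freq))}
    n_docs = len(documents)
    # Smoothed idf (an assumption): the +1 terms keep the ratio finite.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
           for tok, df in doc_freq.items()}
    return vocabulary, idf


def transform(doc, vocabulary, idf):
    """Map one string to a sparse tf-idf row {column: weight} with modulus 1."""
    counts = Counter(tok for tok in doc.split() if tok in vocabulary)
    total = sum(counts.values())
    row = {vocabulary[t]: (c / total) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in row.values()))
    return {col: w / norm for col, w in row.items()} if norm else {}


# Tiny hypothetical training set: strings of characters and 2-character phrases.
train_docs = ["民 生 民生", "工 商 工商"]
vocabulary, idf = fit_idf(train_docs)
tf_idf_train = [transform(doc, vocabulary, idf) for doc in train_docs]
```

Storing (vocabulary, idf) alongside tf_idf_train means a query is mapped into the same n columns as the library. Tokens never seen in training are simply ignored.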
further, S5 is specifically: the bank full names to be matched are converted into tf-idf word vectors, which form an a × b matrix recorded as tf-idf-test, where a is the number of bank full names to be matched and b is the dimension of the word vectors;
result_cos_sim=tf-idf-test*tf-idf-train.T;
wherein result_cos_sim is an a × m matrix, each row of which holds the cosine similarities between one input and the word vectors of the records in the bank library;
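The product tf-idf-test * tf-idf-train.T and the per-row maximum can be spelled out on toy data as follows. This is a sketch only: the vectors and bank names are made up, and the nested comprehension merely shows what the single matrix product computes. A real implementation would presumably call a numerical library's matrix multiplication instead of looping in Python.

```python
import math


def normalize(row):
    """Scale a vector so that its modulus equals 1 (zero vectors are left as is)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm else list(row)


def cosine_matrix(test_rows, train_rows):
    """result[i][j] = dot(test_rows[i], train_rows[j]), i.e. test * train.T."""
    return [[sum(x * y for x, y in zip(t, r)) for r in train_rows]
            for t in test_rows]


def best_match(result, names):
    """For each row, return the name at the column holding the maximum value."""
    return [names[max(range(len(row)), key=row.__getitem__)] for row in result]


# Hypothetical toy data: m = 2 library records, a = 2 queries, n = b = 3.
names = ["bank A", "bank B"]
tf_idf_train = [normalize([1, 1, 0]), normalize([0, 1, 1])]
tf_idf_test = [normalize([1, 0, 0]), normalize([0, 0, 2])]

result_cos_sim = cosine_matrix(tf_idf_test, tf_idf_train)  # a x m matrix
matches = best_match(result_cos_sim, names)
```

The product is only defined when b equals n, that is, when queries are vectorized with the same mapping as the library.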
further, tf-idf is calculated as follows:
tf-idf = tf × idf, where:
tf calculation formula:
[formula image not reproduced]
idf calculation formula:
[formula image not reproduced]
the larger the tf-idf value, the more likely the word is a keyword;
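The formula images are not reproduced in this text. For orientation only, the common textbook definition of tf-idf is sketched below. The symbols are assumptions introduced here: n_{i,j} is the number of times token i occurs in entry j, N is the number of entries in the training set, and df_i is the number of entries containing token i. The patent's own formulas may differ, for example in smoothing terms.

```latex
\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}, \qquad
\mathrm{idf}_{i} = \log\frac{N}{df_{i}}, \qquad
\text{tf-idf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_{i}
```

Under this definition a token that is frequent within one entry but rare across the library receives a large weight, which is consistent with the statement that a larger tf-idf value indicates a more likely keyword.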
the invention has the following beneficial effects:
1) to increase speed, the computation is expressed as a matrix multiplication and run in 2 processes simultaneously, ultimately reaching a speed of 2000 names in 2 s;
2) each row pairs one input with every record in the bank library, and computing the cosine similarities between their word vectors in a single matrix operation greatly reduces the time that looping would take;
3) the input bank full names, which may contain errors, are divided into two batches and run in two processes, doubling the speed.
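The two-batch, two-process arrangement might look like the sketch below. It is a minimal illustration under assumptions: match_batch is a hypothetical stand-in for the real matcher (here it only trims whitespace), and the function names are not from the patent. multiprocessing.Pool is Python's standard process pool.

```python
import multiprocessing


def split_in_two(items):
    """Split a list into two batches of (almost) equal size."""
    middle = (len(items) + 1) // 2
    return [items[:middle], items[middle:]]


def match_batch(batch):
    """Hypothetical placeholder for the real matcher: it only trims whitespace."""
    return [name.strip() for name in batch]


def match_sequential(names):
    """Reference path: process the two batches one after the other."""
    return [r for batch in split_in_two(names) for r in match_batch(batch)]


def match_with_two_processes(names):
    """Run the two batches in two worker processes and merge them in input order."""
    with multiprocessing.Pool(processes=2) as pool:
        parts = pool.map(match_batch, split_in_two(names))
    return [r for part in parts for r in part]
```

On platforms that start workers by spawning a fresh interpreter, match_with_two_processes should be called from under an `if __name__ == "__main__":` guard. The doubling of speed is the source's claim; the gain actually observed depends on process start-up cost and on how much data has to be copied to each worker.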
Drawings
FIG. 1 is a detailed flow chart of the training steps in the method of the present invention;
FIG. 2 is a diagram of the bank full-name matching steps in the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
The invention is further described with reference to the following figures and specific examples, which are not intended to be limiting. The following are preferred examples of the present invention:
As shown in FIGS. 1-2, the present invention provides a bank full-name rapid matching method based on word vector expression and cosine similarity. The rapid matching method comprises:
S1: perform word removal, segmentation and combination on the bank full-name library to obtain a training set;
S2: convert the training set into word vectors to obtain the tf-idf word vector matrix of the training set, normalize each row, and store both the tf-idf word vector matrix and the training model;
S3: input the bank full name to be matched; perform word removal, segmentation and combination on it to obtain a plurality of 2-word phrases; convert the processed bank full name together with the 2-word phrases into a character string; and finally convert the character string into a tf-idf word vector;
S5: multiply the tf-idf word vector obtained in S3 by the transpose of the training set's tf-idf word vector matrix from S2, and, from the resulting matrix, select the bank full name at the position of the maximum value in each row;
S6: compare the bank full name to be matched with the training model, merge the two sets of bank full names according to the comparison result and the result of S5, and output the final result;
the word removal, segmentation and combination processing in S1 and S3 is specifically:
word removal: remove the words that carry no key information from the bank full name to reduce the amount of computation;
segmentation: perform word segmentation on the bank full name from which those words have been removed, to obtain a simplified entry;
combination: combine the characters of the segmented simplified entry in pairs to obtain a plurality of 2-word phrases;
the combined 2-word phrases and the set of simplified entries from S1 together form the training set;
the words without key information that are removed include, but are not limited to, "limited company", "joint-stock limited company", "bank" and "branch";
the 2-word phrases in S3 and S1 are obtained as follows: any two characters are selected from the simplified entry, and every possible combination is formed with the two characters kept in the order in which they appear in the simplified entry, each combination being a 2-word phrase;
S2 is specifically: each simplified entry in the training set, together with the 2-word phrases obtained by combining it, is converted into a character string, and the character string is then converted into a tf-idf word vector; all the tf-idf word vectors together form an m × n matrix recorded as tf-idf-train, each row vector of which is normalized so that its modulus equals 1, where m is the total number of samples in the training set and n is the dimension of the word vectors;
the training model in S2 is the specific mapping used to convert text into tf-idf word vectors: the tf-idf values are computed on the training set to obtain, for each word, the mapping to its corresponding tf-idf word vector;
S5 is specifically: the bank full names to be matched are converted into tf-idf word vectors, which form an a × b matrix recorded as tf-idf-test, where a is the number of bank full names to be matched and b is the dimension of the word vectors;
result_cos_sim=tf-idf-test*tf-idf-train.T;
result_cos_sim is an a × m matrix, each row of which holds the cosine similarities between one input and the word vectors of the records in the bank library;
further, tf-idf is calculated as follows:
tf-idf = tf × idf, where:
tf calculation formula:
[formula image not reproduced]
idf calculation formula:
[formula image not reproduced]
the larger the tf-idf value, the more likely the word is a keyword.
The invention mainly addresses the problem of matching a batch of manually entered bank names (typically about 2000), which may contain errors, to the correct bank full names. The invention converts all texts into word vectors and then computes the cosine similarity between the word vectors; to improve speed, this computation is expressed as a matrix multiplication and run in 2 processes simultaneously. The process can ultimately reach a speed of 2000 names in 2 s.
The formula used in the present invention is as follows:
1. cosine similarity calculation formula:
$$\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\lVert \vec{x} \rVert \, \lVert \vec{y} \rVert}$$
the larger this value, the closer $\vec{x}$ and $\vec{y}$ are.
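The link between this formula and the matrix product used by the method can be made explicit. In the sketch below, u_i denotes row i of tf-idf-test and v_j denotes row j of tf-idf-train; these symbols, and U and V for the two matrices, are introduced here for illustration and do not appear in the source.

```latex
\cos(u_i, v_j) = \frac{u_i \cdot v_j}{\lVert u_i \rVert \, \lVert v_j \rVert}
= u_i \cdot v_j \quad \text{when } \lVert u_i \rVert = \lVert v_j \rVert = 1,
\qquad\text{so}\qquad
\bigl(U V^{\mathsf{T}}\bigr)_{ij} = u_i \cdot v_j = \cos(u_i, v_j).
```

If only the rows of tf-idf-train had modulus 1, entry (i, j) of the product would equal the cosine similarity multiplied by the positive constant ‖u_i‖, which is the same for every column of row i. The position of the maximum in each row would therefore be unchanged.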
2. tf-idf calculation method:
tf-idf = tf × idf, where:
tf calculation formula:
[formula image not reproduced]
idf calculation formula:
[formula image not reproduced]
the larger the tf-idf value, the more likely the word is a keyword.
The following takes the Minsheng Bank Shunyi Branch (民生银行顺义支行) as an example to explain the procedure of an embodiment in detail:
1. Words that carry no key information (such as "limited company", "joint-stock limited company", "bank" and "branch") are removed from every entry of the bank full-name library to reduce the amount of computation. Example: "民生银行顺义支行" (Minsheng Bank Shunyi Branch) becomes "民生顺义" (Minsheng Shunyi).
2. Segmentation: each text is split into its individual characters. Example: "民生顺义" becomes "民, 生, 顺, 义".
3. Constructing new 2-word phrases: people often abbreviate bank names, and character order must be taken into account because the same two characters in reverse order form a different term, so the characters of the bank name are combined in pairs in their original order. Example: "民生顺义" yields "民生, 民顺, 民义, 生顺, 生义, 顺义";
4. After the first three steps, the input "民生银行顺义支行" produces the output "民, 生, 顺, 义, 民生, 民顺, 民义, 生顺, 生义, 顺义".
5. Each record in the bank library is converted into a character string by the first three steps and then into a tf-idf word vector. All the word vectors together form a matrix (137907 × 305628) recorded as tf-idf-train, and each row vector of the matrix is normalized so that its modulus equals 1 (the steps so far are shown in FIG. 1).
6. The input bank full names are converted into tf-idf vectors in the same way. For example, if 2000 names are input, the word vectors form a (2000 × 305628) matrix recorded as tf-idf-test, and result_cos_sim = tf-idf-test * tf-idf-train.T. By the cosine similarity formula (1), result_cos_sim is a (2000 × 137907) matrix in which each row holds the cosine similarities between one input and the 137907 word vectors recorded in the bank library; computing them in a single matrix operation greatly reduces the time that looping would take.
7. For each row, the bank at the position of the maximum cosine similarity is selected.
8. The input bank full names, which may contain errors, are divided into two batches and run in two processes simultaneously, doubling the speed (steps 6-8 are shown in FIG. 2).
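The matrix sizes quoted in steps 5 and 6 invite a quick back-of-the-envelope check of memory use. The sketch below assumes 64-bit floating-point values, which the source does not state. The source also does not say how the matrices are stored, so the conclusion about sparse storage drawn after the code is an inference rather than a description of the patented implementation.

```python
TRAIN_ROWS = 137907     # bank records (from the embodiment)
DIMENSIONS = 305628     # tf-idf vector dimension (from the embodiment)
QUERIES = 2000          # names matched per run (from the embodiment)
BYTES_PER_VALUE = 8     # assumption: 64-bit floats


def dense_bytes(rows, cols, bytes_per_value=BYTES_PER_VALUE):
    """Memory needed to store a rows x cols matrix with every entry materialized."""
    return rows * cols * bytes_per_value


train_gb = dense_bytes(TRAIN_ROWS, DIMENSIONS) / 1e9
result_gb = dense_bytes(QUERIES, TRAIN_ROWS) / 1e9
print(f"dense tf-idf-train:   {train_gb:.0f} GB")
print(f"dense result_cos_sim: {result_gb:.1f} GB")
```

Stored densely, tf-idf-train would need hundreds of gigabytes, whereas the 2000 × 137907 result needs roughly 2 GB. Since a simplified entry of k characters yields only k + k(k-1)/2 tokens, almost every entry of a row is zero, so keeping only the non-zero entries (a sparse format) is the plausible way to make the product practical.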
The above-described embodiment is only one of the preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (8)

1. A bank full-name rapid matching method based on word vector expression and cosine similarity, characterized in that the method processes a bank full-name library to obtain a training set and trains on the training set to obtain a word vector matrix and a training model; then segments the bank full name to be matched and converts it into a word vector; and finally, following the cosine similarity calculation, multiplies the word vector to be retrieved by the transpose of the word vector matrix, and obtains the bank full name by combining the maximum of each row of the product with the result of comparing the name to be matched against the training model; the rapid matching method comprises the following steps:
S1: perform word removal, segmentation and combination on the bank full-name library to obtain a training set;
S2: convert the training set into word vectors to obtain the tf-idf word vector matrix of the training set, normalize each row, and store both the tf-idf word vector matrix and the training model;
S3: input the bank full name to be matched; perform word removal, segmentation and combination on it to obtain a plurality of 2-word phrases; convert the processed bank full name together with the 2-word phrases into a character string; and finally convert the character string into a tf-idf word vector;
S4: multiply the tf-idf word vector obtained in S3 by the transpose of the training set's tf-idf word vector matrix from S2, and, from the resulting matrix, select the bank at the position of the maximum value in each row as the final output result.
2. The method according to claim 1, wherein the word removal, segmentation and combination processing in S1 and S3 is specifically:
word removal: remove the words that carry no key information from the bank full name to reduce the amount of computation;
segmentation: perform word segmentation on the bank full name from which those words have been removed, to obtain a simplified entry;
combination: combine the characters of the segmented simplified entry in pairs to obtain a plurality of 2-word phrases;
and the combined 2-word phrases and the set of simplified entries from S1 together form the training set.
3. The method of claim 2, wherein the words without key information that are removed include, but are not limited to, "limited company", "joint-stock limited company", "bank" and "branch".
4. The method according to claim 2, wherein the 2-word phrases in S3 and S1 are obtained as follows: any two characters are selected from the simplified entry, and every possible combination is formed with the two characters kept in the order in which they appear in the simplified entry, each combination being a 2-word phrase.
5. The method according to claim 2, wherein S2 is specifically: each simplified entry in the training set, together with the 2-word phrases obtained by combining it, is converted into a character string, and the character string is then converted into a tf-idf word vector; all the tf-idf word vectors together form an m × n matrix recorded as tf-idf-train, each row vector of which is normalized so that its modulus equals 1, wherein m is the total number of samples in the training set and n is the dimension of the word vectors.
6. The method as claimed in claim 5, wherein the training model in S2 is the specific mapping used to convert text into tf-idf word vectors, the tf-idf values being computed on the training set to obtain, for each word, the mapping to its corresponding tf-idf word vector.
7. The method according to claim 5, wherein S4 is specifically: the bank full names to be matched are converted into tf-idf word vectors, which form an a × b matrix recorded as tf-idf-test, wherein a is the number of bank full names to be matched and b is the dimension of the word vectors;
result_cos_sim=tf-idf-test*tf-idf-train.T;
wherein result_cos_sim is an a × m matrix, each row of which holds the cosine similarities between one input and the word vectors of the records in the bank library;
tf-idf-train.T is the transpose of tf-idf-train.
8. The method of claim 6, wherein tf-idf is calculated as follows:
tf-idf = tf × idf, where:
tf calculation formula:
[formula image not reproduced]
idf calculation formula:
[formula image not reproduced]
the larger the tf-idf value, the more likely the word is a keyword.
CN201910851391.0A 2019-09-10 2019-09-10 Bank full-name rapid matching method based on word vector expression and cosine similarity Active CN110598066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910851391.0A CN110598066B (en) 2019-09-10 2019-09-10 Bank full-name rapid matching method based on word vector expression and cosine similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910851391.0A CN110598066B (en) 2019-09-10 2019-09-10 Bank full-name rapid matching method based on word vector expression and cosine similarity

Publications (2)

Publication Number Publication Date
CN110598066A CN110598066A (en) 2019-12-20
CN110598066B true CN110598066B (en) 2022-05-10

Family

ID=68858416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910851391.0A Active CN110598066B (en) 2019-09-10 2019-09-10 Bank full-name rapid matching method based on word vector expression and cosine similarity

Country Status (1)

Country Link
CN (1) CN110598066B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant