CN110598066B - Bank full-name rapid matching method based on word vector expression and cosine similarity - Google Patents
- Publication number: CN110598066B
- Application number: CN201910851391.0A
- Authority
- CN
- China
- Prior art keywords
- word
- idf
- bank
- word vector
- name
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
Abstract
The invention provides a rapid bank full-name matching method based on word vector expression and cosine similarity. The bank full-name library is used as a training set and trained to obtain a word vector matrix and a training model; the bank full name to be matched is then segmented and converted into word vectors; finally, based on a cosine similarity calculation, the query word vectors are multiplied by the transpose of the word vector matrix, and the bank full name is obtained by combining the position of the maximum value in each row of the product matrix with a comparison between the query and the training model. To improve speed, the calculation is cast as a matrix multiplication run in 2 parallel processes, finally reaching a speed of 2000 records in 2 seconds; because each row of the product matrix holds the cosine similarities between one input and the word vectors of the recorded bank names, the matrix operation avoids the slow explicit loop.
Description
Technical Field
The invention belongs to the technical field of bank information processing, and particularly relates to a bank full-name rapid matching method based on word vector expression and cosine similarity.
Background
With the rapid growth of small, medium, and micro enterprises, corporate banking business keeps increasing. It includes corporate e-banking, corporate deposit services, credit services, institutional services, international services, entrusted housing finance, fund clearing, intermediary services, asset recommendation, fund custody, and so on. The basic internal departments of a bank are savings (retail), accounting (corporate), and credit. Accounting is the back office and service department for credit; credit handles corporate deposit and loan business; and all transactions between corporate clients and the bank are realized through the accounting department. Corporate business mainly serves enterprise legal persons, institutions, and similar customers, with various check, remittance, loan, and other services built around corporate accounts. Large-batch manual retrieval of bank names is slow, and the text similarity matching algorithms currently on the market are also slow, so the banks' need for fast lookup cannot be met.
Disclosure of Invention
To solve the existing problems that (1) banks handle a large volume of corporate-business tasks internally and manual retrieval is slow, and (2) existing matching algorithms are too slow, the invention provides a rapid bank full-name matching method based on word vector expression and cosine similarity: the bank full-name library is processed to obtain a training set; the training set is trained to obtain a word vector matrix and a training model; the bank full name to be matched is then segmented and converted into word vectors; finally, based on a cosine similarity calculation, the query word vectors are multiplied by the transpose of the word vector matrix, and the bank full name is obtained by combining the maximum value of each row of the product matrix with a comparison between the matched bank full name and the training model;
further, the fast matching method comprises the following steps:
s1: removing non-key words from, segmenting, and combining the entries of the bank full-name library to obtain a training set;
s2: converting the training set into a tf-idf word vector matrix, normalizing each row, and saving both the tf-idf word vector matrix and the training model;
s3: inputting the bank full name to be matched; applying the same word removal, segmentation, and combination to obtain several 2-character phrases; joining the processed full name and the 2-character phrases into one string; and converting that string into a tf-idf word vector;
s5: multiplying the tf-idf word vector obtained in S3 by the transpose of the training-set tf-idf word vector matrix obtained in S2, and selecting, for each row of the product matrix, the bank full name at the position of the maximum value;
s6: comparing the bank full name to be matched with the training model, merging the two sets of bank full names according to the comparison result and the result of S5, and outputting the final result;
further, the word removal, segmentation and combination processing in S1 and S3 is specifically:
word removal: removing words that carry no key information from the bank full name, to reduce the amount of calculation;
segmentation: segmenting the bank full name, with the non-key words removed, character by character to obtain a simplified entry;
combination: combining the characters of the segmented simplified entry pairwise to obtain several 2-character phrases;
the combined 2-character phrases together with the simplified entries in S1 form the training set;
further, the non-key words removed during segmentation processing include, but are not limited to, "Co., Ltd.", "joint-stock company", "Bank" and "Branch";
further, the "2-character phrases" in S3 and S1 are obtained as follows: any two characters are selected from the simplified entry, and all possible pairs are formed with each pair keeping the forward order of the characters in the entry, yielding the 2-character phrases;
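For illustration, the forward-order pairwise combination described above can be sketched in Python (`char_bigrams` is an illustrative name, not one from the patent):

```python
from itertools import combinations

def char_bigrams(entry: str) -> list:
    # All 2-character combinations of the simplified entry; combinations()
    # already preserves the original (forward) order of the characters.
    return ["".join(pair) for pair in combinations(entry, 2)]

# A 4-character entry yields C(4, 2) = 6 ordered 2-character phrases:
print(char_bigrams("ABCD"))  # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
```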
further, S2 specifically includes: joining each simplified entry in the training set with its correspondingly combined 2-character phrases into a string, then converting each string into a tf-idf word vector; all tf-idf word vectors together form an m × n matrix, recorded as tf-idf-train, each row vector of which is normalized to modulus 1, where m is the total number of samples in the training set and n is the dimension of the word vectors;
further, the training model in S2 is the concrete mapping used to convert text into tf-idf word vectors; the tf-idf word vectors are computed on the training set to obtain the mapping from each word to its corresponding tf-idf word vector;
further, S5 specifically includes: converting the bank full names to be matched into tf-idf word vectors, which form an a × b matrix recorded as tf-idf-test, where a is the number of bank full names to be matched and b is the dimension of the word vectors;
result_cos_sim=tf-idf-test*tf-idf-train.T;
wherein result_cos_sim is an a × m matrix, each row of which holds the cosine similarities between one input and the word vectors of the recorded bank names;
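Since every row of tf-idf-train is normalized to modulus 1, multiplying normalized query vectors by its transpose directly yields cosine similarities. A NumPy sketch with small stand-in matrices (the shapes and names here are illustrative, not the patent's real data):

```python
import numpy as np

rng = np.random.default_rng(0)
tf_idf_train = rng.random((5, 8))   # stand-in for the m x n library matrix
tf_idf_test = rng.random((2, 8))    # stand-in for the a x b query matrix

# Normalize each row to unit modulus so that a plain dot product equals
# cosine similarity, mirroring the normalization described above.
tf_idf_train /= np.linalg.norm(tf_idf_train, axis=1, keepdims=True)
tf_idf_test /= np.linalg.norm(tf_idf_test, axis=1, keepdims=True)

result_cos_sim = tf_idf_test @ tf_idf_train.T  # a x m similarity matrix
best_match = result_cos_sim.argmax(axis=1)     # best library row per query
```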
further, tf-idf is calculated in the following manner:
tf-idf = tf × idf, wherein:
tf calculation formula: tf(t, d) = n(t, d) / Σ_k n(k, d), i.e. the number of occurrences of term t in document d divided by the total number of terms in d;
idf calculation formula: idf(t) = log( N / df(t) ), where N is the total number of documents and df(t) is the number of documents containing term t;
the larger the tf-idf value, the greater the probability that the term is a keyword;
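A minimal Python sketch of this weighting (the patent does not specify its exact smoothing or tokenization, so a plain, unsmoothed variant is assumed here):

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of tokenized documents. Returns one dict per document
    # mapping term -> tf * idf, with tf = count / len(doc) and
    # idf = log(N / df) (unsmoothed; an assumed, minimal variant).
    N = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: (c / len(doc)) * math.log(N / df[t])
             for t, c in Counter(doc).items()} for doc in docs]

w = tf_idf([["bank", "north"], ["bank", "south"]])
# "bank" occurs in every document, so its idf, and hence its weight, is 0.
```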
the invention has the following beneficial effects:
1) in order to increase the speed, the calculation is converted into a matrix multiplication run in 2 parallel processes, finally reaching a speed of 2000 records in 2 seconds;
2) each row of the result matrix holds the cosine similarities between one input and the word vectors of the records in the bank name library; computing them through a matrix operation is far faster than using an explicit loop;
3) the input bank full names, which may contain errors, are divided into two batches run by two processes, doubling the speed.
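The two-batch, two-process split in 1) and 3) might look like the following (a hedged sketch using Python's multiprocessing; `match_batch` is a placeholder for the real tf-idf scoring, not the patent's code):

```python
from multiprocessing import Pool

def match_batch(batch):
    # Placeholder worker: the real one would vectorize the batch and
    # multiply it against tf-idf-train; here we just echo row indices.
    return list(range(len(batch)))

def match_all(names, workers=2):
    # Split the possibly erroneous inputs into two halves, score them
    # concurrently in two processes, then merge the results in order.
    half = len(names) // 2
    batches = [names[:half], names[half:]]
    with Pool(workers) as pool:
        results = pool.map(match_batch, batches)
    return results[0] + results[1]

if __name__ == "__main__":
    merged = match_all([f"name_{i}" for i in range(2000)])
    print(len(merged))  # 2000
```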
Drawings
FIG. 1 is a detailed flow chart of the training steps in the method of the present invention;
fig. 2 is a diagram of the steps of matching the bank full name in the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. On the contrary, the invention is intended to cover the alternatives, modifications and equivalents which may be included within its spirit and scope as defined by the appended claims. Furthermore, in the following detailed description, certain specific details are set forth in order to provide a better understanding of the invention. It will be apparent to one skilled in the art that the invention may be practiced without these specific details.
The invention is further described with reference to the following figures and specific examples, which are not intended to be limiting. The following are preferred examples of the present invention:
as shown in fig. 1-2, the present invention provides a bank full-name fast matching method based on word vector expression and cosine similarity, the fast matching method includes:
s1: removing non-key words from, segmenting, and combining the entries of the bank full-name library to obtain a training set;
s2: converting the training set into a tf-idf word vector matrix, normalizing each row, and saving both the tf-idf word vector matrix and the training model;
s3: inputting the bank full name to be matched; applying the same word removal, segmentation, and combination to obtain several 2-character phrases; joining the processed full name and the 2-character phrases into one string; and converting that string into a tf-idf word vector;
s5: multiplying the tf-idf word vector obtained in S3 by the transpose of the training-set tf-idf word vector matrix obtained in S2, and selecting, for each row of the product matrix, the bank full name at the position of the maximum value;
s6: comparing the bank full name to be matched with the training model, merging the two sets of bank full names according to the comparison result and the result of S5, and outputting the final result;
the de-word, segmentation and combination processing in S1 and S3 specifically comprises:
and (3) word removal: removing characters of irrelevant key information in the bank full name to reduce the calculated amount;
cutting: carrying out word segmentation processing on the bank full name without the key information characters to obtain a simplified entry;
combining: performing 2-word combination on the simplified entries after word segmentation processing to obtain a plurality of 2-word phrases;
the combined 2-word group and the set of the multiple simplified entries in the S1 are used as a training set;
characters without key information in the segmentation processing include but are not limited to companies, stocks companies, banks and branches;
the obtaining method of the 2-word phrase in the S3 and the S1 is as follows: randomly selecting two characters from the simplified entries, and arranging and combining all possible characters according to the positive sequence of the Chinese characters in the simplified entries to form a 2-character group;
the S2 specifically includes: converting each simplified entry in the training set and a 2-word phrase obtained by correspondingly combining the simplified entries into a character string, then converting the character string into tf-idf word vectors, and finally, enabling all tf-idf word vectors to be m-n matrixes which are recorded as tf-idf-train, wherein each row vector of the matrixes is standardized and the modulus is equal to 1, wherein m is the total sample number in the training set, and n is the corresponding dimension of the word vector;
the training model in the S2 is a specific method for converting characters into tf-idf word vectors, and the tf-idf word vectors are calculated on a training set to obtain a mapping relation of the tf-idf word vectors corresponding to each word and character;
the S5 specifically includes: converting the bank full scale to be matched into tf-idf word vectors, wherein the tf-idf word vectors are a, b and are recorded as tf-idf-test, wherein a is the number of the bank full scale to be matched, and b is the dimension of the word vectors;
result_cos_sim=tf-idf-test*tf-idf-train.T;
the result _ cos _ sim is a matrix of a × m, and cosine similarity between input of each behavior in the matrix and word vectors recorded in the bank is calculated;
Further, tf-idf is calculated in the following manner:
tf-idf = tf × idf, wherein:
tf calculation formula: tf(t, d) = n(t, d) / Σ_k n(k, d), i.e. the number of occurrences of term t in document d divided by the total number of terms in d;
idf calculation formula: idf(t) = log( N / df(t) ), where N is the total number of documents and df(t) is the number of documents containing term t;
The larger the tf-idf value, the greater the probability that the term is a keyword.
The invention mainly solves the problem of matching a batch of manually entered, possibly erroneous bank names (about 2000 at a time) to the correct bank full names. All texts are converted into word vectors, the cosine similarity between the word vectors is calculated, and, to improve speed, the calculation is cast as a matrix multiplication run in 2 parallel processes. The process finally reaches a speed of 2000 records in 2 seconds.
The formulas used in the present invention are as follows:
1. cosine similarity calculation formula: cos(A, B) = (A · B) / (‖A‖ ‖B‖)    (1)
2. tf-idf calculation mode:
tf-idf = tf × idf, wherein:
tf calculation formula: tf(t, d) = n(t, d) / Σ_k n(k, d), i.e. the number of occurrences of term t in document d divided by the total number of terms in d;
idf calculation formula: idf(t) = log( N / df(t) ), where N is the total number of documents and df(t) is the number of documents containing term t;
The larger the tf-idf value, the greater the probability that the term is a keyword.
The following takes the Minsheng Bank Shunyi Branch as an embodiment and explains the procedure in detail:
1. Words carrying no key information (e.g. "Co., Ltd.", "joint-stock company", "Bank", "Branch") are removed from the bank full-name library to reduce the amount of calculation, for example: "Minsheng Bank Shunyi Branch" → "Minsheng Shunyi".
2. Segmentation: each text is split character by character, for example: "Minsheng Shunyi" → "Min, Sheng, Shun, Yi".
3. New 2-character phrases are constructed: people often abbreviate bank names, and the order of the characters must be preserved (reversing the two characters of a word changes its meaning), so the characters of the name are combined pairwise in forward order. For example: "Minsheng Shunyi" → "MinSheng, MinShun, MinYi, ShengShun, ShengYi, ShunYi";
4. Through the operations of the first three steps, "Minsheng Bank Shunyi Branch" yields the output: "Min, Sheng, Shun, Yi, MinSheng, MinShun, MinYi, ShengShun, ShengYi, ShunYi".
5. Every record in the bank name library is processed by the first three steps, joined into a character string, and converted into a tf-idf word vector; all word vectors together form a 137907 × 305628 matrix, recorded as tf-idf-train, each row vector of which is normalized to modulus 1 (the steps up to here are shown in fig. 1).
6. The input bank full names are converted into tf-idf vectors in the same way. For example, with 2000 inputs the word vector matrix is 2000 × 305628 and is recorded as tf-idf-test, and result_cos_sim = tf-idf-test * tf-idf-train.T. According to the cosine similarity formula (1), result_cos_sim is a 2000 × 137907 matrix, each row of which holds the cosine similarities between one input and the 137907 recorded word vectors; performing this as a matrix operation is far faster than using an explicit loop.
7. For each row, the bank name at the position with the maximum cosine similarity is retrieved.
8. The input bank full names, which may contain errors, are divided into two batches run together by two processes, doubling the speed (steps 6-8 are shown in fig. 2).
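Putting steps 1-7 together, the pipeline can be sketched end to end on a toy library (an illustrative reimplementation, not the patent's exact code; the stop-word list, the function names, and the idf smoothing are assumptions):

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np

STOP = ("银行", "支行", "有限公司")  # assumed non-key words; the patent lists more

def tokens(name):
    # Steps 1-3: remove non-key words, split into single characters,
    # and add the forward-order 2-character phrases.
    for w in STOP:
        name = name.replace(w, "")
    return list(name) + ["".join(p) for p in combinations(name, 2)]

def build(library):
    # Step 5: vocabulary, idf weights, and a row-normalized tf-idf matrix.
    docs = [tokens(n) for n in library]
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    df = Counter(t for d in docs for t in set(d))
    idf = np.array([math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in vocab])
    mat = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for t, c in Counter(d).items():
            mat[r, vocab[t]] = c / len(d)
    mat *= idf
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    mat /= np.where(norms == 0, 1, norms)
    return vocab, idf, mat

def match(queries, library, vocab, idf, mat):
    # Steps 6-7: vectorize the queries the same way, multiply by the
    # transpose of the library matrix, and take the row-wise argmax.
    q = np.zeros((len(queries), len(vocab)))
    for r, name in enumerate(queries):
        for t, c in Counter(tokens(name)).items():
            if t in vocab:  # tokens unseen in training are ignored
                q[r, vocab[t]] = c / len(tokens(name))
    q *= idf
    norms = np.linalg.norm(q, axis=1, keepdims=True)
    q /= np.where(norms == 0, 1, norms)
    return [library[i] for i in (q @ mat.T).argmax(axis=1)]

library = ["民生银行顺义支行", "民生银行朝阳支行"]  # toy two-entry library
vocab, idf, mat = build(library)
print(match(["民生顺义支行"], library, vocab, idf, mat))  # → ['民生银行顺义支行']
```

The 137907-entry library and the two-process split are omitted here; the sketch only demonstrates the data flow of steps 1-7.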
The above-described embodiment is only one of the preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.
Claims (8)
1. A rapid bank full-name matching method based on word vector expression and cosine similarity, characterized in that the bank full-name library is processed to obtain a training set; the training set is trained to obtain a word vector matrix and a training model; the bank full name to be matched is then segmented and converted into word vectors; finally, based on a cosine similarity calculation, the query word vectors are multiplied by the transpose of the word vector matrix, and the bank full name is obtained by combining the maximum value of each row of the product matrix with a comparison between the matched bank full name and the training model; the rapid matching method comprises the following steps:
s1: removing non-key words from, segmenting, and combining the entries of the bank full-name library to obtain a training set;
s2: converting the training set into a tf-idf word vector matrix, normalizing each row, and saving both the tf-idf word vector matrix and the training model;
s3: inputting the bank full name to be matched; applying the same word removal, segmentation, and combination to obtain several 2-character phrases; joining the processed full name and the 2-character phrases into one string; and converting that string into a tf-idf word vector;
s4: multiplying the tf-idf word vector obtained in step S3 by the transpose of the training-set tf-idf word vector matrix obtained in step S2, and, for each row of the product matrix, selecting the bank at the position of the maximum value as the final output.
2. The method according to claim 1, wherein the word removal, segmentation and combination processing in S1 and S3 is specifically:
word removal: removing words that carry no key information from the bank full name, to reduce the amount of calculation;
segmentation: segmenting the bank full name, with the non-key words removed, character by character to obtain a simplified entry;
combination: combining the characters of the segmented simplified entry pairwise to obtain several 2-character phrases;
and the combined 2-character phrases together with the simplified entries in S1 form the training set.
3. The method of claim 2, wherein the non-key words removed during segmentation processing include, but are not limited to, "Co., Ltd.", "joint-stock company", "Bank" and "Branch".
4. The method according to claim 2, wherein the "2-character phrases" in S3 and S1 are obtained by: selecting any two characters from the simplified entry and forming all possible pairs, each pair keeping the forward order of the characters in the entry, to yield the 2-character phrases.
5. The method according to claim 2, wherein S2 is specifically: joining each simplified entry in the training set with its correspondingly combined 2-character phrases into a string, then converting each string into a tf-idf word vector; all tf-idf word vectors together form an m × n matrix, recorded as tf-idf-train, each row vector of which is normalized to modulus 1, where m is the total number of samples in the training set and n is the dimension of the word vectors.
6. The method as claimed in claim 5, wherein the training model in S2 is the concrete mapping used to convert text into tf-idf word vectors; the tf-idf word vectors are computed on the training set to obtain the mapping from each word to its corresponding tf-idf word vector.
7. The method according to claim 5, wherein S4 is specifically: converting the bank full names to be matched into tf-idf word vectors, which form an a × b matrix recorded as tf-idf-test, where a is the number of bank full names to be matched and b is the dimension of the word vectors;
result_cos_sim=tf-idf-test*tf-idf-train.T;
wherein result_cos_sim is an a × m matrix, each row of which holds the cosine similarities between one input and the word vectors of the recorded bank names;
and tf-idf-train.T is the transpose of tf-idf-train.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910851391.0A CN110598066B (en) | 2019-09-10 | 2019-09-10 | Bank full-name rapid matching method based on word vector expression and cosine similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910851391.0A CN110598066B (en) | 2019-09-10 | 2019-09-10 | Bank full-name rapid matching method based on word vector expression and cosine similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110598066A CN110598066A (en) | 2019-12-20 |
CN110598066B true CN110598066B (en) | 2022-05-10 |
Family
ID=68858416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910851391.0A Active CN110598066B (en) | 2019-09-10 | 2019-09-10 | Bank full-name rapid matching method based on word vector expression and cosine similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110598066B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111797206A (en) * | 2020-07-09 | 2020-10-20 | 民生科技有限责任公司 | Bank name matching method and system based on natural language word vectors |
CN111797616A (en) * | 2020-07-09 | 2020-10-20 | 民生科技有限责任公司 | TF-IDF word vector-based bank name batch correction method and system |
CN112164391B (en) * | 2020-10-16 | 2024-04-05 | 腾讯科技(深圳)有限公司 | Statement processing method, device, electronic equipment and storage medium |
CN112862604B (en) * | 2021-04-25 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Card issuing organization information processing method, device, equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103295145A (en) * | 2012-02-28 | 2013-09-11 | 北京星源无限传媒科技有限公司 | Mobile phone advertising method based on user consumption feature vector |
CN103413215A (en) * | 2013-07-12 | 2013-11-27 | 广州银联网络支付有限公司 | Electronic bank code matching method based on matrix similarity algorithm |
CN104102626A (en) * | 2014-07-07 | 2014-10-15 | 厦门推特信息科技有限公司 | Method for computing semantic similarities among short texts |
CN106649597A (en) * | 2016-11-22 | 2017-05-10 | 浙江大学 | Method for automatically establishing back-of-book indexes of book based on book contents |
CN107967255A (en) * | 2017-11-08 | 2018-04-27 | 北京广利核***工程有限公司 | A kind of method and system for judging text similarity |
CN108280689A (en) * | 2018-01-30 | 2018-07-13 | 浙江省公众信息产业有限公司 | Advertisement placement method, device based on search engine and search engine system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8122043B2 (en) * | 2009-06-30 | 2012-02-21 | Ebsco Industries, Inc | System and method for using an exemplar document to retrieve relevant documents from an inverted index of a large corpus |
US9842110B2 (en) * | 2013-12-04 | 2017-12-12 | Rakuten Kobo Inc. | Content based similarity detection |
CN107403068B (en) * | 2017-07-31 | 2018-06-01 | 合肥工业大学 | Merge the intelligence auxiliary way of inquisition and system of clinical thinking |
- 2019-09-10: application CN201910851391.0A filed in China, granted as patent CN110598066B (status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103295145A (en) * | 2012-02-28 | 2013-09-11 | 北京星源无限传媒科技有限公司 | Mobile phone advertising method based on user consumption feature vector |
CN103413215A (en) * | 2013-07-12 | 2013-11-27 | 广州银联网络支付有限公司 | Electronic bank code matching method based on matrix similarity algorithm |
CN104102626A (en) * | 2014-07-07 | 2014-10-15 | 厦门推特信息科技有限公司 | Method for computing semantic similarities among short texts |
CN106649597A (en) * | 2016-11-22 | 2017-05-10 | 浙江大学 | Method for automatically establishing back-of-book indexes of book based on book contents |
CN107967255A (en) * | 2017-11-08 | 2018-04-27 | 北京广利核***工程有限公司 | A kind of method and system for judging text similarity |
CN108280689A (en) * | 2018-01-30 | 2018-07-13 | 浙江省公众信息产业有限公司 | Advertisement placement method, device based on search engine and search engine system |
Non-Patent Citations (3)
Title |
---|
Research on text similarity computing based on word vector model of neural networks; Yuan Sun et al.; 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS); 2015-11-30; pp. 994-997 *
Text similarity computation based on semantic sentiment orientation; You Chunhui; China Masters' Theses Full-text Database, Information Science and Technology; 2009-04-15; No. 04 (2009); I138-1131 *
Research on entity matching methods and their application in credit reference systems; Chen Bo; China Doctoral Dissertations Full-text Database, Economics and Management Science; 2010-09-15; No. 09 (2010); J145-31 *
Also Published As
Publication number | Publication date |
---|---|
CN110598066A (en) | 2019-12-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110598066B (en) | Bank full-name rapid matching method based on word vector expression and cosine similarity | |
US11507601B2 (en) | Matching a first collection of strings with a second collection of strings | |
CN112434535B (en) | Element extraction method, device, equipment and storage medium based on multiple models | |
CN109101489B (en) | Text automatic summarization method and device and electronic equipment | |
CN111046142A (en) | Text examination method and device, electronic equipment and computer storage medium | |
CN116402630B (en) | Financial risk prediction method and system based on characterization learning | |
Phi et al. | Distant supervision for relation extraction via piecewise attention and bag-level contextual inference | |
Bondielli et al. | On the use of summarization and transformer architectures for profiling résumés | |
CN112328735A (en) | Hot topic determination method and device and terminal equipment | |
Wu et al. | Tedm-pu: A tax evasion detection method based on positive and unlabeled learning | |
Whitfield | Using gpt-2 to create synthetic data to improve the prediction performance of nlp machine learning classification models | |
EP4040328A1 (en) | Extracting mentions of complex relation types from documents | |
CN110287493A (en) | Risk phrase chunking method, apparatus, electronic equipment and storage medium | |
CN116776173A (en) | Power measurement data desensitization method based on convolutional neural network | |
CN117077682A (en) | Document analysis method and system based on semantic recognition | |
Mishra et al. | Explainability for NLP | |
CN114580398A (en) | Text information extraction model generation method, text information extraction method and device | |
US20210342640A1 (en) | Automated machine-learning dataset preparation | |
CN113822063A (en) | Event similarity comparison method based on improved cosine similarity algorithm | |
CN111797206A (en) | Bank name matching method and system based on natural language word vectors | |
CN111080433A (en) | Credit risk assessment method and device | |
CN112989814B (en) | Search map construction method, search device, search apparatus, and storage medium | |
US8176407B2 (en) | Comparing values of a bounded domain | |
Chin et al. | Leveraging Natural Language Processing on Chinese Broker Research Reports for Stock Selection. | |
Justnes | Using Word Embeddings to Determine Concepts of Values In Insurance Claim Spreadsheets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||