CN115329076A - Bank data screening processing method, device, system and medium - Google Patents

Publication number
CN115329076A
Authority
CN
China
Prior art keywords
information
financial
policy information
words
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210955359.9A
Other languages
Chinese (zh)
Inventor
牙祖将
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202210955359.9A priority Critical patent/CN115329076A/en
Publication of CN115329076A publication Critical patent/CN115329076A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a bank data screening processing method, device, system and medium, which can be applied to the field of big data or the field of finance. The method comprises: obtaining financial supervision policy information; converting the financial supervision policy information into vectors in a vector space by using a vector space model, wherein the vectors comprise initial feature words of the financial supervision policy information; calculating the weights of the initial feature words by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weights are lower than a preset threshold to obtain filtered feature words; screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary; and classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering algorithm (BIRCH) to obtain classification information. In this way, the financial supervision policy information can be screened without manual processing, efficiency and classification accuracy are improved, and the latest bank supervision policies can be acquired in a timely and effective manner.

Description

Bank data screening processing method, device, system and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a system, and a medium for screening and processing bank data.
Background
With the advent of the internet, text data is growing explosively. Even when such data is processed by machine, it remains very difficult to dispense with manual work entirely, and this is still a worldwide problem. Text mining is therefore an important research direction in big data research. Text mining refers to the process of applying data mining to text information, and text clustering is an important text mining technique: it automatically groups a document set according to the similarity between documents, so that documents with highly similar content are placed in the same class as far as possible.
Text data is complex and is usually semi-structured or unstructured, and Chinese text in particular carries deep semantics. Consequently, although traditional platforms and websites use crawler technology, the captured data cannot be processed automatically by machine, because traditional clustering algorithms are not well suited to text clustering; manual cleaning, manual labeling and manual review are still required.
With the continuous development of finance at home and abroad, countries are issuing more and more bank supervision policies. At present, banks have no unified platform for querying and analyzing these policies, so accuracy and efficiency are low and the latest bank supervision policies cannot be acquired in a timely and effective manner.
Disclosure of Invention
In view of the above, this summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The application aims to provide a bank data screening processing method, a device, a system and a medium, which can improve the efficiency of bank data screening processing and the accuracy of classification and timely and effectively acquire the latest policy of bank supervision.
In order to achieve the purpose, the technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a method for screening and processing bank data, including:
acquiring financial supervision policy information;
converting the financial regulatory policy information into vectors in vector space using a vector space model; the vector comprises initial feature words of the financial regulatory policy information;
calculating the weights of the initial feature words by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weights are lower than a preset threshold to obtain filtered feature words;
screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary;
and classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering algorithm (BIRCH) to obtain classification information.
In one possible implementation, the vector includes:
$V(\mathrm{Doc}) = (t_1\,w_1(d),\ t_2\,w_2(d),\ \dots,\ t_n\,w_n(d))$;
wherein $V(\mathrm{Doc})$ is the vector representing the financial regulatory policy information Doc, $t_1, t_2, \dots, t_n$ are the initial feature words, and $w_1(d), w_2(d), \dots, w_n(d)$ are the respective weights of the initial feature words in the financial regulatory policy information.
In one possible implementation, the obtaining financial regulatory policy information includes:
capturing initial information of financial supervision policies;
performing word segmentation on the initial information of the financial supervision policy to obtain word segmentation information of the financial supervision policy;
and filtering punctuation marks, special characters and stop words in the financial supervision policy word segmentation information to obtain the financial supervision policy information.
In one possible implementation, the method further includes:
and displaying the classification information according to a preset theme.
In a second aspect, an embodiment of the present application provides a bank data screening processing apparatus, including:
the acquisition unit is used for acquiring financial supervision policy information;
a conversion unit for converting the financial regulatory policy information into vectors in a vector space using a vector space model; the vector comprises initial feature words of the financial regulatory policy information;
a calculation unit for calculating the weights of the initial feature words by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weights are lower than a preset threshold to obtain filtered feature words;
a screening unit for screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary;
and a classification unit for classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering algorithm (BIRCH) to obtain classification information.
In one possible implementation, the vector includes:
$V(\mathrm{Doc}) = (t_1\,w_1(d),\ t_2\,w_2(d),\ \dots,\ t_n\,w_n(d))$;
wherein $V(\mathrm{Doc})$ is the vector representing the financial regulatory policy information Doc, $t_1, t_2, \dots, t_n$ are the initial feature words, and $w_1(d), w_2(d), \dots, w_n(d)$ are the respective weights of the initial feature words in the financial regulatory policy information.
In a possible implementation manner, the obtaining unit is specifically configured to:
capturing initial information of financial supervision policies;
performing word segmentation on the initial information of the financial supervision policy to obtain word segmentation information of the financial supervision policy;
and filtering punctuation marks, special characters and stop words in the financial supervision policy word segmentation information to obtain the financial supervision policy information.
In one possible implementation, the apparatus further includes:
and the display unit is used for displaying the classification information according to a preset theme.
In a third aspect, an embodiment of the present application provides a system for screening and processing bank data, including:
a memory for storing a computer program;
and a processor, configured to implement the steps of the bank data screening processing method described above when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the bank data screening processing method described above.
Compared with the prior art, the embodiment of the application has the following beneficial effects:
The embodiments of the application provide a bank data screening processing method, device, system and medium, which can be applied to the field of big data or the field of finance. The method comprises: obtaining financial supervision policy information; converting the financial supervision policy information into vectors in a vector space by using a vector space model, wherein the vectors comprise initial feature words of the financial supervision policy information; calculating the weights of the initial feature words by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weights are lower than a preset threshold to obtain filtered feature words; screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary; and classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering algorithm (BIRCH) to obtain classification information. In this way, the financial supervision policy information can be screened without manual processing, efficiency and classification accuracy are improved, and the latest bank supervision policies can be acquired in a timely and effective manner.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a flowchart illustrating a method for screening and processing bank data according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating another bank data screening processing method according to an embodiment of the present application;
Fig. 3 is a flowchart illustrating a further method for screening and processing bank data according to an embodiment of the present application;
Fig. 4 is a schematic diagram illustrating a framework of a data screening processing apparatus according to an embodiment of the present application;
Fig. 5 shows a schematic diagram of a bank data screening processing apparatus according to an embodiment of the present application.
Detailed Description
It should be noted that the bank data screening processing method, device, system and medium provided by the invention can be applied to the field of big data or the field of finance. The foregoing is merely an example, and does not limit the application fields of the bank data screening method, apparatus, system, and medium provided by the present invention.
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
The term "including" and variations thereof as used herein is intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.
As described in the background, the applicant has found that, with the advent of the internet, text data is growing explosively. Even when such data is processed by machine, it remains very difficult to dispense with manual work entirely, and this is still a worldwide problem. Text mining is therefore an important research direction in big data research. Text mining refers to the process of applying data mining to text information, and text clustering is an important text mining technique: it automatically groups a document set according to the similarity between documents, so that documents with highly similar content are placed in the same class as far as possible.
Text data is complex and is usually semi-structured or unstructured, and Chinese text in particular carries deep semantics. Consequently, although traditional platforms and websites use crawler technology, the captured data cannot be processed automatically by machine, because traditional clustering algorithms are not well suited to text clustering; manual cleaning, manual labeling and manual review are still required.
With the continuous development of finance at home and abroad, countries are issuing more and more bank supervision policies. At present, banks have no unified platform for querying and analyzing these policies, so accuracy and efficiency are low and the latest bank supervision policies cannot be acquired in a timely and effective manner.
To solve the above technical problems, embodiments of the present application provide a bank data screening processing method, device, system and medium, which can be applied to the field of big data or the field of finance. The method comprises: obtaining financial supervision policy information; converting the financial supervision policy information into vectors in a vector space by using a vector space model, wherein the vectors comprise initial feature words of the financial supervision policy information; calculating the weights of the initial feature words by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weights are lower than a preset threshold to obtain filtered feature words; screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary; and classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering algorithm (BIRCH) to obtain classification information. In this way, the financial supervision policy information can be screened without manual processing, efficiency and classification accuracy are improved, and the latest bank supervision policies can be acquired in a timely and effective manner.
Exemplary method
Referring to fig. 1, a flowchart of a method for screening and processing bank data provided in an embodiment of the present application includes:
s101: financial regulatory policy information is obtained.
In the embodiment of the application, financial supervision authorities at home and abroad differ in their supervision approaches, strategies and policies. A large bank that has both domestic institutions and numerous overseas branches therefore needs to keep abreast of domestic and overseas supervision policies in its daily operations, and needs this information to be acquired automatically and as early as possible.
Therefore, financial regulatory policy information at home and abroad can be acquired first. In a possible implementation manner, referring to fig. 2, a flowchart of another bank data screening processing method provided in the embodiment of the present application is shown.
Specifically, information from domestic and overseas financial supervision policy websites can be captured by constructing a custom crawler and a precise policy search engine, and the extracted text is stored locally as a corpus. The capture modes mainly include policy search, policy indexing, policy analysis and web page crawling. For example, a custom crawler that crawls policy websites 1, 2, …, n can be built with Python (a programming language), Selenium (a browser automation testing framework) and XPath (XML Path Language).
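As a minimal sketch of the extraction step only, the following shows how XPath-style expressions could pull a policy title and body out of a page and produce one corpus entry. It assumes the page source has already been fetched (for example by a browser automation tool) and is well-formed markup; the sample page, the `policy` class name and the function `extract_policy_text` are hypothetical, and Python's standard `xml.etree.ElementTree` module (which supports only a subset of XPath) stands in for a full crawler stack.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for a fetched policy page.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="policy">
      <h1>Notice on capital adequacy reporting</h1>
      <p>Banks shall report capital adequacy ratios quarterly.</p>
      <p>Overseas branches follow the local regulator's schedule.</p>
    </div>
    <div class="footer"><p>Contact us</p></div>
  </body>
</html>
"""


def extract_policy_text(page_source):
    """Return the title and body text of the first policy block, if any."""
    root = ET.fromstring(page_source)
    # ElementTree's XPath subset: any <div> whose class attribute is "policy".
    block = root.find(".//div[@class='policy']")
    if block is None:
        return {"title": "", "body": ""}
    title = block.findtext("h1", default="").strip()
    paragraphs = [p.text.strip() for p in block.findall("p") if p.text]
    return {"title": title, "body": " ".join(paragraphs)}


# One corpus entry; a crawler would repeat this for policy websites 1, 2, ..., n.
corpus_entry = extract_policy_text(SAMPLE_PAGE)
```

Real policy pages are rarely well-formed XML, so a production crawler would more plausibly evaluate the XPath expressions inside the browser automation tool or an HTML-tolerant parser.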
In a possible implementation, the initial financial supervision policy information can be captured first and then segmented into words to obtain financial supervision policy word segmentation information; punctuation marks, special characters and stop words in the word segmentation information are then filtered out to obtain the financial supervision policy information.
Specifically, word segmentation of the initial financial supervision policy information may include Chinese and English word segmentation, where word segmentation means splitting a sentence into a sequence of separate words. For example, the jieba word segmentation tool for Python can be used; it supports importing a custom dictionary so that proper nouns and fixed phrases are segmented correctly.
In text mining, punctuation marks, special characters and stop words can be filtered out to keep redundant data from affecting the results and to improve analysis efficiency, so that these tokens do not interfere with the valuable information contained in the text.
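The segmentation and filtering steps can be sketched as follows for English text. This is an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and a regular expression stands in for a real segmenter, which would be required for Chinese text because Chinese words are not delimited by spaces.

```python
import re

# Tiny illustrative stop-word list; a real deployment would load a full list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "shall"}


def preprocess(text):
    """Segment English text into lowercase tokens and drop noise tokens."""
    # Keeping only runs of letters and digits removes punctuation marks
    # and special characters in the same pass as the segmentation.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words, which carry little information for clustering.
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess(
    "Notice: banks shall report the capital-adequacy ratio (CAR) quarterly!"
)
```

Here `tokens` holds only the content-bearing words of the sentence, which is the form the later vectorization steps expect.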
S102: converting the financial regulatory policy information into vectors in vector space using a vector space model; the vector includes initial feature words of the financial regulatory policy information.
In the embodiment of the application, a vector space model (VSM) simplifies the processing of text content into vector operations in a vector space and expresses semantic similarity as spatial similarity, which makes it intuitive and easy to understand.
Specifically, the web page corpus can be converted into a series of feature term vectors by using the vector space model. For example, the vector obtained by converting the text of financial regulatory policy information Doc can be represented as:
$V(\mathrm{Doc}) = (t_1\,w_1(d),\ t_2\,w_2(d),\ \dots,\ t_n\,w_n(d))$;
wherein $V(\mathrm{Doc})$ is the vector representing the financial regulatory policy information Doc, $t_1, t_2, \dots, t_n$ are the initial feature words, and $w_1(d), w_2(d), \dots, w_n(d)$ are the respective weights of the initial feature words in the financial regulatory policy information.
Here, $t_1, t_2, \dots, t_n$ are distinct feature words, and each weight $w_i(d)$ can generally be expressed as the frequency with which $t_i$ appears in the financial regulatory policy information.
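A minimal sketch of this representation, assuming the document has already been segmented into tokens and taking each weight to be the relative frequency of the feature word in the document (one possible reading of "frequency"; the function name `to_vector` is hypothetical):

```python
from collections import Counter


def to_vector(tokens):
    """Build V(Doc) as a list of (feature word t_i, weight w_i(d)) pairs."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # Sorting the terms keeps the vector layout deterministic.
    return [(term, counts[term] / total) for term in sorted(counts)]


doc_tokens = ["capital", "ratio", "bank", "capital", "report", "bank", "capital"]
vector = to_vector(doc_tokens)
```

Each pair in `vector` corresponds to one $t_i\,w_i(d)$ component, and the weights of a document sum to 1 under this choice of weighting.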
S103: calculating the weights of the initial feature words by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weights are lower than a preset threshold to obtain filtered feature words.
In the embodiment of the application, the weights of the initial feature words can be calculated by using the TF-IDF technique, and the initial feature words whose weights are lower than a preset threshold are filtered out to obtain the filtered feature words.
The term frequency-inverse document frequency (TF-IDF) technique is a classic weighting technique that has been widely used in text mining in recent years. It measures the importance of a feature word from the frequency with which the word appears in a text and the number of documents in the whole data set in which it appears, so that feature words with high influence are retained as far as possible and common but inconsequential words are filtered out.
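The weighting and threshold filtering can be sketched as below. The sketch assumes the common unsmoothed form weight = tf × ln(N / df), where tf is the relative frequency of a word in a document, N is the number of documents and df is the number of documents containing the word; the application does not specify which TF-IDF variant or threshold value is used, so the function names, sample documents and the threshold 0.05 are all illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per document, weight = tf * ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def filter_terms(weights, threshold):
    """Drop words whose weight is lower than the preset threshold."""
    return [{t: w for t, w in doc.items() if w >= threshold} for doc in weights]


docs = [
    ["bank", "capital", "ratio", "capital"],
    ["bank", "liquidity", "report"],
    ["bank", "capital", "disclosure"],
]
weights = tf_idf(docs)
filtered = filter_terms(weights, threshold=0.05)
```

In this sample the word "bank" appears in every document, so its inverse document frequency is ln(3/3) = 0 and it is filtered out everywhere, which is the intended effect for common but uninformative words.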
S104: screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary.
In the embodiment of the application, the LDA topic model can be used to screen the filtered feature words so as to construct the feature dictionary.
Latent Dirichlet allocation (LDA) is a topic model that gives the topics of each document in a document set in the form of a probability distribution.
Specifically, each word of a document can be assigned to a topic with a certain probability, and a set of related words can then be selected from that topic. The embodiment of the application uses an LDA model combined with semantic similarity to construct the feature dictionary.
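To make the idea concrete, the following is a compact sketch of LDA fitted by collapsed Gibbs sampling, followed by a feature dictionary built from the top words of each topic. It is an assumption-laden illustration: the application does not say how its LDA model is trained, the hyperparameters `alpha` and `beta`, the number of topics and the sample documents are arbitrary, and the semantic-similarity step mentioned above is omitted.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, top_words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                          # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token, then resample its topic from the
                # conditional distribution given all other assignments.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                probs = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta)
                    / (topic_total[j] + n_vocab * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=probs)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Per-document topic distribution (each row sums to 1).
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Up to three most frequent words assigned to each topic.
    top_words = [
        [vocab[v] for v in sorted(range(n_vocab), key=lambda v: -row[v])[:3]
         if row[v] > 0]
        for row in topic_word
    ]
    return theta, top_words


docs = [
    ["capital", "ratio", "capital", "buffer"],
    ["liquidity", "coverage", "liquidity", "ratio"],
    ["capital", "buffer", "ratio"],
    ["liquidity", "coverage", "funding"],
]
theta, top_words = lda_gibbs(docs, n_topics=2)
# Hypothetical feature dictionary: the union of the top words of all topics.
feature_dictionary = sorted({w for words in top_words for w in words})
```

Each row of `theta` is the topic probability distribution of one document, and `top_words` lists the related words selected from each topic. On a corpus this small the topics are not stable, so the example shows the mechanics rather than meaningful topics.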
S105: classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering algorithm (BIRCH) to obtain classification information.
In the embodiment of the application, the feature dictionary can be classified by using the BIRCH algorithm to obtain the classification information.
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a comprehensive hierarchical clustering algorithm. It uses the clustering feature (CF) and the clustering feature tree (CF tree) to summarize cluster descriptions. The CF tree summarizes the useful information of the clusters and occupies much less space than the original data set, so it can be kept in memory, which improves the clustering speed and the scalability of the algorithm on large data sets.
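The compactness of the clustering feature can be illustrated with a short sketch. It covers only the CF summary itself, namely the triple (N, LS, SS) of point count, linear sum and squared sum, and not the CF tree or the full BIRCH procedure; the class name and the sample points are hypothetical.

```python
import math


class ClusteringFeature:
    """CF summary of a cluster: point count N, linear sum LS, squared sum SS."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add_point(self, point):
        """Absorb one point by updating the three summary statistics."""
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def merge(self, other):
        """CFs are additive: merging two clusters just adds their summaries."""
        merged = ClusteringFeature(len(self.ls))
        merged.n = self.n + other.n
        merged.ls = [a + b for a, b in zip(self.ls, other.ls)]
        merged.ss = self.ss + other.ss
        return merged

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        """Root-mean-square distance of the points to the centroid."""
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


a = ClusteringFeature(2)
a.add_point([1.0, 1.0])
a.add_point([3.0, 1.0])
b = ClusteringFeature(2)
b.add_point([2.0, 4.0])
merged = a.merge(b)
```

Because the centroid and radius are computed from the summary alone, the original points need not be kept, which is why a tree of such summaries can stay in memory for large data sets.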
A feature word frequency matrix can then be constructed to perform topic analysis on the clustered documents. In this way the financial supervision policy information is screened, and efficiency and classification accuracy are improved.
In one possible implementation, referring to Fig. 3, the classification information may be displayed according to preset themes. For example, a display engine may present the policy themes obtained after aggregation, including but not limited to policy notifications, policy documents, public announcements, and policy headlines.
In addition, policies can be analyzed and screened from multiple angles: the theme dimensions and the targeted screening of the feature dictionary can be adjusted, the result can be connected to the data engine platform for supervision policy big data, the processing efficiency of the data engine can be improved through reasonable configuration and theme definition, and feedback can be provided.
In a possible implementation, referring to Fig. 4, a schematic diagram of the framework of a data screening processing apparatus provided in an embodiment of the present application is shown, namely an architecture diagram of a supervision policy service big data platform, which includes the following:
In the architecture established according to the data specification system and the financial supervision knowledge base system, a portal website and an APP can provide services such as search, push, docking and aggregation for financial enterprise employees and individuals.
The data engine can comprise a supervision policy precision search engine and a basic database, and the supervision policy precision search engine can comprise policy search, policy index, policy analysis and webpage crawling; the base database may include modules for policy queries, policy documents, public announcements, and policy headlines.
The management platform module mainly provides basic management functions such as member management, API configuration, resource configuration, service management, platform cooperation and reference management.
The PaaS (Platform as a Service) module mainly comprises a big data cloud platform and functional modules such as learning, deep learning, natural semantic technology and artificial intelligence.
The IaaS (Infrastructure as a Service) module mainly provides functions such as computing and storage.
The embodiment of the application provides a bank data screening processing method, which can be applied to the field of big data or the field of finance. The method comprises: obtaining financial supervision policy information; converting the financial supervision policy information into vectors in a vector space by using a vector space model, wherein the vectors comprise initial feature words of the financial supervision policy information; calculating the weights of the initial feature words by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weights are lower than a preset threshold to obtain filtered feature words; screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary; and classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering algorithm (BIRCH) to obtain classification information. In this way, the financial supervision policy information can be screened without manual processing, efficiency and classification accuracy are improved, and the latest bank supervision policies can be acquired in a timely and effective manner.
Exemplary devices
Referring to fig. 5, a schematic diagram of a bank data screening processing apparatus provided in an embodiment of the present application is shown, including:
an obtaining unit 201, configured to obtain financial regulatory policy information;
a conversion unit 202, configured to convert the financial regulatory policy information into a vector in a vector space by using a vector space model; the vector comprises initial feature words of the financial regulatory policy information;
a calculating unit 203, configured to calculate the weights of the initial feature words by using the term frequency-inverse document frequency (TF-IDF) technique, and filter out the initial feature words whose weights are lower than a preset threshold to obtain filtered feature words;
a screening unit 204, configured to screen the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary;
and a classifying unit 205, configured to classify the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering algorithm (BIRCH) to obtain classification information.
In one possible implementation, the vector includes:
$V(\mathrm{Doc}) = (t_1\,w_1(d),\ t_2\,w_2(d),\ \dots,\ t_n\,w_n(d))$;
wherein $V(\mathrm{Doc})$ is the vector representing the financial regulatory policy information Doc, $t_1, t_2, \dots, t_n$ are the initial feature words, and $w_1(d), w_2(d), \dots, w_n(d)$ are the respective weights of the initial feature words in the financial regulatory policy information.
In a possible implementation manner, the obtaining unit is specifically configured to:
capturing initial information of financial supervision policies;
performing word segmentation on the initial information of the financial supervision policy to obtain word segmentation information of the financial supervision policy;
and filtering punctuation marks, special characters and stop words in the financial supervision policy word segmentation information to obtain the financial supervision policy information.
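The three steps of the obtaining unit can be sketched as follows. The sketch assumes that the captured text is already available as a string and uses a simple regular-expression tokenizer on English text; Chinese policy text would need a dedicated word segmenter instead. The stop-word list, the sample text, and the function names `segment` and `clean` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real deployment would load a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "to", "on", "for", "in"}


def segment(text):
    """Split raw text into tokens, keeping punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def clean(tokens, stop_words=STOP_WORDS):
    """Drop punctuation marks, special characters, and stop words."""
    return [t for t in tokens if t.isalnum() and t not in stop_words]


# Hypothetical captured initial information.
raw = "Notice on the Capital Adequacy of Commercial Banks (2022), No. 5!"
tokens = segment(raw)
policy_terms = clean(tokens)
```

Segmentation and filtering are kept as separate functions so that the stop-word list or the tokenizer can be swapped without touching the other step.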
In one possible implementation, the apparatus further includes:
a display unit, configured to display the classification information according to a preset theme.
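Displaying the classification information according to a preset theme could look like the following minimal sketch. It assumes that each classified item is a (title, theme) pair and that items whose theme is not preset are collected under a fallback heading; the data, the theme names, and the functions `group_by_theme` and `render` are hypothetical.

```python
def group_by_theme(classified, preset_themes, fallback="other"):
    """Group (title, theme) pairs under the preset themes, in preset order."""
    grouped = {theme: [] for theme in preset_themes}
    grouped[fallback] = []
    for title, theme in classified:
        grouped[theme if theme in grouped else fallback].append(title)
    # Omit themes that ended up with no items.
    return {theme: titles for theme, titles in grouped.items() if titles}


def render(grouped):
    """Format the grouped items as plain text, one heading per theme."""
    lines = []
    for theme, titles in grouped.items():
        lines.append(f"[{theme}]")
        lines.extend(f"  - {title}" for title in titles)
    return "\n".join(lines)


# Hypothetical classification results.
classified = [
    ("Notice on capital adequacy", "capital"),
    ("Liquidity coverage rules", "liquidity"),
    ("Capital buffer guidance", "capital"),
    ("Data security notice", "data"),
]
grouped = group_by_theme(classified, preset_themes=["liquidity", "capital"])
report = render(grouped)
```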
The embodiment of the application provides a bank data screening processing apparatus that can be applied to the field of big data or the field of finance. The method performed by the apparatus comprises: obtaining financial supervision policy information; converting the financial supervision policy information into a vector in a vector space by using a vector space model, where the vector comprises the initial feature words of the financial supervision policy information; calculating the weight of each initial feature word by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weight is lower than a preset threshold to obtain filtered feature words; screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary; and classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering (BIRCH) algorithm to obtain classification information. The financial supervision policy information can therefore be screened and processed without manual handling, which improves efficiency and classification accuracy and allows the latest bank supervision policies to be obtained in a timely and effective manner.
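The hierarchy-based balanced iterative clustering step (commonly known as BIRCH) builds on a compact summary called a clustering feature: the point count, the linear sum, and the squared sum of a cluster. The sketch below illustrates only that bookkeeping with a flat list of clustering features. It deliberately omits the clustering-feature tree (branching factor, node splitting) and the final global clustering phase of the full algorithm. The class and function names, the 2-D sample vectors, and the radius threshold are hypothetical.

```python
import math


class ClusteringFeature:
    """BIRCH clustering feature: point count, linear sum, squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.squared_sum = 0.0

    def add(self, point):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.squared_sum += sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius_if_added(self, point):
        """Radius the cluster would have after absorbing the point."""
        n = self.n + 1
        ls = [s + x for s, x in zip(self.linear_sum, point)]
        ss = self.squared_sum + sum(x * x for x in point)
        variance = ss / n - sum((s / n) ** 2 for s in ls)
        return math.sqrt(max(variance, 0.0))


def birch_flat(points, threshold):
    """Assign each point to the nearest feature, or open a new one."""
    features, labels = [], []
    for point in points:
        best = None
        if features:
            best = min(
                range(len(features)),
                key=lambda i: math.dist(features[i].centroid(), point),
            )
        if best is None or features[best].radius_if_added(point) > threshold:
            features.append(ClusteringFeature(len(point)))
            best = len(features) - 1
        features[best].add(point)
        labels.append(best)
    return labels, features


# Hypothetical 2-D feature vectors.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
labels, features = birch_flat(points, threshold=0.5)
```

Because a clustering feature is additive, absorbing a point only updates three running totals, which is why the summaries can be maintained in a single pass over the data.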
On the basis of the foregoing embodiments, an embodiment of the present application provides a bank data screening processing system, including:
a memory for storing a computer program;
and a processor, configured to implement the steps of the bank data screening processing method when executing the computer program.
On the basis of the foregoing embodiments, an embodiment of the present application further provides a computer readable medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the foregoing bank data screening processing method are implemented.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the system described above; or may exist separately and not be assembled into the system.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart.
The embodiments in this specification are described in a progressive manner. For the parts that are the same or similar among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiment is substantially similar to the method embodiment, it is described relatively briefly, and the relevant points can be found in the description of the method embodiment.
The foregoing is merely a preferred embodiment of the present application. Although the present application has been described with reference to the preferred embodiments, they are not intended to limit it. Those skilled in the art can, using the methods and techniques disclosed above, make many possible variations and modifications to the disclosed embodiments, or modify them into equivalent embodiments, without departing from the scope of the technical solution of the present application. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present application, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present application.

Claims (10)

1. A bank data screening processing method, characterized by comprising the following steps:
acquiring financial supervision policy information;
converting the financial supervision policy information into a vector in a vector space by using a vector space model, wherein the vector comprises initial feature words of the financial supervision policy information;
calculating the weight of each initial feature word by using the term frequency-inverse document frequency (TF-IDF) technique, and filtering out the initial feature words whose weight is lower than a preset threshold to obtain filtered feature words;
screening the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary;
and classifying the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering (BIRCH) algorithm to obtain classification information.
2. The method of claim 1, wherein the vector comprises:
$V(\mathrm{Doc}) = (t_1\, w_1(d),\ t_2\, w_2(d),\ \ldots,\ t_n\, w_n(d))$;
wherein $V(\mathrm{Doc})$ is the financial supervision policy information, $t_1, t_2, \ldots, t_n$ are the initial feature words, and $w_1(d), w_2(d), \ldots, w_n(d)$ are the respective weights of the initial feature words in the financial supervision policy information.
3. The method of claim 1, wherein acquiring the financial supervision policy information comprises:
capturing initial information of a financial supervision policy;
performing word segmentation on the initial information of the financial supervision policy to obtain word segmentation information of the financial supervision policy;
and filtering punctuation marks, special characters and stop words in the financial supervision policy word segmentation information to obtain the financial supervision policy information.
4. The method of claim 1, further comprising:
and displaying the classification information according to a preset theme.
5. A bank data screening processing apparatus, comprising:
an obtaining unit, configured to obtain financial supervision policy information;
a conversion unit, configured to convert the financial supervision policy information into a vector in a vector space by using a vector space model, wherein the vector comprises initial feature words of the financial supervision policy information;
a calculating unit, configured to calculate the weight of each initial feature word by using the term frequency-inverse document frequency (TF-IDF) technique, and to filter out the initial feature words whose weight is lower than a preset threshold to obtain filtered feature words;
a screening unit, configured to screen the filtered feature words by using a latent Dirichlet allocation (LDA) topic model to construct a feature dictionary;
and a classifying unit, configured to classify the feature dictionary by using the hierarchy-based balanced iterative reducing and clustering (BIRCH) algorithm to obtain classification information.
6. The apparatus of claim 5, wherein the vector comprises:
$V(\mathrm{Doc}) = (t_1\, w_1(d),\ t_2\, w_2(d),\ \ldots,\ t_n\, w_n(d))$;
wherein $V(\mathrm{Doc})$ is the financial supervision policy information, $t_1, t_2, \ldots, t_n$ are the initial feature words, and $w_1(d), w_2(d), \ldots, w_n(d)$ are the respective weights of the initial feature words in the financial supervision policy information.
7. The apparatus according to claim 5, wherein the obtaining unit is specifically configured to:
capturing initial information of financial supervision policies;
performing word segmentation on the initial information of the financial supervision policy to obtain word segmentation information of the financial supervision policy;
and filtering punctuation marks, special characters and stop words in the financial supervision policy word segmentation information to obtain the financial supervision policy information.
8. The apparatus of claim 5, further comprising:
a display unit, configured to display the classification information according to a preset theme.
9. A system for screening and processing bank data, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the bank data screening processing method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable medium, characterized in that a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the bank data screening processing method according to any one of claims 1 to 4 are implemented.
CN202210955359.9A 2022-08-10 2022-08-10 Bank data screening processing method, device, system and medium Pending CN115329076A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210955359.9A CN115329076A (en) 2022-08-10 2022-08-10 Bank data screening processing method, device, system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210955359.9A CN115329076A (en) 2022-08-10 2022-08-10 Bank data screening processing method, device, system and medium

Publications (1)

Publication Number Publication Date
CN115329076A true CN115329076A (en) 2022-11-11

Family

ID=83921673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210955359.9A Pending CN115329076A (en) 2022-08-10 2022-08-10 Bank data screening processing method, device, system and medium

Country Status (1)

Country Link
CN (1) CN115329076A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116611917A (en) * 2023-07-19 2023-08-18 建信金融科技有限责任公司 Financial risk early warning method, device, equipment and storage medium
CN116611917B (en) * 2023-07-19 2023-10-03 建信金融科技有限责任公司 Financial risk early warning method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination