CN114760365B - Data extraction method and device and electronic equipment

Data extraction method and device and electronic equipment

Info

Publication number
CN114760365B
Authority
CN
China
Prior art keywords
message
domain
information
standard
storing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210421274.2A
Other languages
Chinese (zh)
Other versions
CN114760365A (en)
Inventor
许彦键
杨润斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China filed Critical Agricultural Bank of China
Priority to CN202210421274.2A priority Critical patent/CN114760365B/en
Publication of CN114760365A publication Critical patent/CN114760365A/en
Application granted granted Critical
Publication of CN114760365B publication Critical patent/CN114760365B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/06 Notations for structuring of protocol data, e.g. abstract syntax notation one [ASN.1]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a data extraction method, a data extraction device, and electronic equipment. A plurality of message information storage tables with specific mapping relations among them are preconfigured in the data extraction device. When a message needs to be extracted, the message type is acquired, the message standard page corresponding to the message type is determined from the SWIFT message standard book, the message body information is extracted from that page, and the message body information is stored in the corresponding table among the message information storage tables. The message body information includes a message domain. A set domain corresponding to the message domain is then determined, the standard interfaces of the message domain and the set domain are visited in turn, and at least one of the name information, format information, and domain coding information of each standard interface is extracted and stored in the corresponding table. This solves the problem of how to extract data from the SWIFT message standard book.

Description

Data extraction method and device and electronic equipment
Technical Field
The present invention relates to the field of data extraction, and in particular, to a data extraction method, apparatus, and electronic device.
Background
Before each SWIFT upgrade, some banking business systems take part in the upgrade project in a lead or supporting role, and the upgrade scope is determined by analyzing the upgrade package of the message standard book in combination with the message types that the current system handles. Analyzing the upgrade package mainly means automatically extracting the data in the new SWIFT message standard book and storing it as static data tables.
The architecture of the SWIFT message standard book is complex, and one message type is described by nesting a plurality of web pages, so how to extract the data in the SWIFT message standard book is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a data extraction method, apparatus, and electronic device to solve the problem of how to extract data from a SWIFT message standard book.
In order to solve the technical problems, the invention adopts the following technical scheme:
A data extraction method applied to a data extraction device, wherein a plurality of message information storage tables having specific mapping relations are preconfigured in the data extraction device, the data extraction method comprising:
obtaining a message type, and determining a message standard page corresponding to the message type from a SWIFT message standard book;
extracting message body information from the message standard page, and storing the message body information in the corresponding table among the message information storage tables, the message body information comprising a message domain;
determining a set domain corresponding to the message domain;
jumping in turn to the standard interfaces of the message domain and the set domain, extracting at least one of name information, format information, and domain coding information from each standard interface, and storing it in the corresponding table among the message information storage tables.
Optionally, the message information storage tables include a message body table, a message set domain mapping table, a message name table, a message format table, and a message domain coding table;
extracting message body information from the message standard page and storing it in the corresponding table among the message information storage tables comprises:
extracting at least the message domain, the loop information, and the mandatory-input information from the message standard page, and storing them in the message body table.
Optionally, determining the set domain corresponding to the message domain includes:
extracting an option value of the message domain;
determining whether the option value is a special standard symbol;
if yes, identifying the set domain corresponding to the message domain based on the type of the special standard symbol.
Optionally, after determining the set domain corresponding to the message domain, the method further includes:
storing the mapping relation between the message domain and its corresponding set domain in the message set domain mapping table.
Optionally, jumping in turn to the standard interfaces of the message domain and the set domain, extracting at least one of name information, format information, and domain coding information from each standard interface, and storing it in the corresponding table among the message information storage tables comprises:
jumping in turn to a first standard interface of the message domain, and extracting the name information, format information, and domain coding information of the first standard interface;
jumping in turn to a second standard interface of the set domain, and extracting the name information and format information of the second standard interface;
storing the extracted name information in the message name table, the extracted format information in the message format table, and the extracted domain coding information in the message domain coding table.
Optionally, after storing the domain coding information in the message domain coding table, the method further includes:
converting the data stored in the message information storage tables into database script statements, and using the resulting database script as a static data model.
A data extraction apparatus applied to a data extraction device in which a plurality of message information storage tables having a specific mapping relationship are preconfigured, the data extraction apparatus comprising:
the page determining module is used for acquiring the message type and determining a message standard page corresponding to the message type from the SWIFT message standard book;
the first data storage module is used for extracting message main body information from the message standard page and storing the message main body information into corresponding tables in the multiple message information storage tables; the message body information comprises a message domain;
the set domain determining module is used for determining a set domain corresponding to the message domain;
and the second data storage module is used for jumping in turn to the standard interfaces of the message domain and the set domain, extracting at least one of name information, format information, and domain coding information from each standard interface, and storing it in the corresponding table among the message information storage tables.
Optionally, the message information storage tables include a message body table, a message set domain mapping table, a message name table, a message format table, and a message domain coding table;
the first data storage module is specifically configured to:
extract at least the message domain, the loop information, and the mandatory-input information from the message standard page, and store them in the message body table.
Optionally, the set domain determining module is specifically configured to:
extract an option value of the message domain, determine whether the option value is a special standard symbol, and if so, identify the set domain corresponding to the message domain based on the type of the special standard symbol.
An electronic device, comprising: a memory and a processor;
wherein the memory is used for storing programs;
the processor invokes the program and is used to perform the data extraction method described above.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a data extraction method, a data extraction device, and electronic equipment. A plurality of message information storage tables with specific mapping relations among them are preconfigured in the data extraction device. When a message needs to be extracted, the message type is acquired, the message standard page corresponding to the message type is determined from a SWIFT message standard book, the message body information is extracted from that page, and the message body information is stored in the corresponding table among the message information storage tables. The message body information includes a message domain. A set domain corresponding to the message domain is then determined, the standard interfaces of the message domain and the set domain are visited in turn, and at least one of the name information, format information, and domain coding information of each standard interface is extracted and stored in the corresponding table. This solves the problem of how to extract data from a SWIFT message standard book.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of the mapping relations among the message information storage tables according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data extraction method according to an embodiment of the present invention;
FIG. 3 is a first schematic diagram of a data extraction method according to an embodiment of the present invention;
FIG. 4 is a second schematic diagram of a data extraction method according to an embodiment of the present invention;
FIG. 5 is a third schematic diagram of a data extraction method according to an embodiment of the present invention;
FIG. 6 is a fourth schematic diagram of a data extraction method according to an embodiment of the present invention;
FIG. 7 is a fifth schematic diagram of a data extraction method according to an embodiment of the present invention;
FIG. 8 is a sixth schematic diagram of a data extraction method according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data extraction device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the technical content of the present invention may be more clearly understood by those skilled in the art, some terms of art will be explained.
SWIFT: the Society for Worldwide Interbank Financial Telecommunication.
SWIFT message: a message in a standard format, defined by SWIFT, for communication between banks.
SWIFT message standard book: the SWIFT message standard handbook, used to look up the standard format of messages, among other things.
HTML: HyperText Markup Language, used for creating web pages.
Generally, SWIFT organizes an upgrade of the message standard each year, covering the affected message types and message fields; transaction elements are modified and mapping relations adjusted accordingly. Some banking systems used by SWIFT members (banks) must support the latest message standard, and only if the switchover is completed at the specified time will the normal sending and receiving of system messages be unaffected. The associated systems are therefore required to respond to and implement such upgrades as quickly as possible to meet the latest industry standards and business requirements.
At present, before each SWIFT upgrade, some banking systems take part in the upgrade project in a lead or supporting role, and the upgrade scope is determined by analyzing the upgrade package of the message standard book in combination with the message types that the current system handles. Analyzing the upgrade package mainly means analyzing the changes to the new SWIFT message standard domains and changing the mapping relations accordingly. If the new message standard domain data could be extracted automatically and stored as static data tables, with the mapping relations formed in the program, then each update would only require extracting the data and refreshing the static data tables, which would reduce manual work and greatly shorten the update time.
At present, data extraction can be performed by first defining the data extraction tags and the format in which the data tags are used, then using a tag data parser to extract and parse the tag data of an HTML web page, obtaining multi-dimensional data from the parsed result, and finally generating a text file.
However, the SWIFT message standard book is complex: one message type is described by several nested web pages, so this tag-parsing approach is not suitable for parsing the SWIFT message standard book.
To solve the technical problem that the SWIFT message standard book cannot be parsed, the inventor found that the standard book is presented as HTML web pages, so node data can be extracted by locating HTML nodes. The SWIFT message standard is complex, however. It covers the standard domains of a message, the module each domain belongs to, whether a domain is mandatory or optional, the format rules of the domain, whether it loops, and so on. Most domains also have options such as A, B, and C, and the format rules of a domain change with the option. If the data were simply extracted and stored in a single table, the table data would be hard to convert into the data developers need.
Therefore, the data of the SWIFT message standard book is presented as 5 different data tables. The data is extracted automatically from the standard book and the corresponding database insert statements are generated. The data in the 5 tables correspond one to one by message type and number; each table has its own role, and together they form the mapping relations that make up a static data model. With this technical scheme, new SWIFT message standard data can be obtained quickly, program changes caused by message domain changes are reduced, and manual workload is reduced, so the scheme is convenient, flexible, adaptable, and centrally managed.
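As a minimal sketch of how extracted rows might be turned into database insert statements, the following Python fragment renders one row as an INSERT statement. The table name, column names, and the sample domain number are hypothetical; the source does not specify the SQL dialect or the table layout.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted, None as NULL)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quote so the text stays one valid SQL string literal.
    return "'" + str(value).replace("'", "''") + "'"


def to_insert(table, row):
    """Build one INSERT statement from a dict mapping column name -> value."""
    columns = ", ".join(row)
    values = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


# Hypothetical table and column names, for illustration only.
row = {"msg_type": "MT", "msg_no": "300", "domain_no": "300-057", "mandatory": False}
print(to_insert("msg_body", row))
```

Writing one statement per row keeps the generated script easy to diff between two releases of the standard book, which fits the stated goal of refreshing static tables on each upgrade.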
Based on the foregoing, the embodiment of the invention provides a data extraction method, which is applied to data extraction equipment, wherein the data extraction equipment can be a controller, a processor and other equipment.
The data extraction method is mainly used to extract data from the SWIFT message standard book. In the standard book, each message contains a large number of domains and each domain has a large number of attributes, all of which must be mapped one by one into static table data. The complexity of the message domain mapping shows mainly in the following points:
1. Message domains are divided into mandatory domains and optional domains;
2. Some domains in a message are cyclic (they repeat in a loop);
3. The same domain may appear in different sequences; for example, 57a (a field of the message) may appear in Sequence A and also in Sequence B;
4. A sequence may itself contain sub-sequences; for example, Sequence B may contain several sub-sequences of its own;
5. The same domain may contain multiple set domains; for example, 57a contains 57A, 57D, 57J, and so on. The format and meaning of each set domain differ: the 57A domain consists of Party Identifier and Identifier Code, while the 57D domain consists of Party Identifier and Name and Address;
6. The format of each domain is different. Formats are written as regular-expression-like patterns and contain optional and mandatory parts; for example, the 57D format is [/1!a][/34x] 4*35x, in which /1!a and /34x are optional and 4*35x is mandatory;
7. Most domains contain multiple formats, and most message domains are assembled from several patterns; for example, the 57A format is [/1!a][/34x] 4!a2!a2!c[3!c], which contains the parts /1!a, /34x, 4!a, 2!a, 2!c, and 3!c, where the parts enclosed in [ ] are optional;
8. Some domains have default values and optional values.
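The optional and mandatory parts described in points 6 and 7 can be separated mechanically. The following Python sketch splits one format line into its parts and marks the bracketed ones as optional. The function name and the regular expression are assumptions made for illustration; the pattern covers only the simple forms quoted above (such as /1!a, /34x, 4!a, 4*35x) and does not handle nested brackets.

```python
import re

# One part: an optional leading "/", a length such as 34, 1! or 4*35,
# then a one-letter character-set code (a, c, n, x, ...).
COMPONENT = re.compile(r"/?\d+(?:!|\*\d+)?[a-z]")


def split_format_line(line):
    """Split one format line into (part, is_optional) pairs.

    Parts wrapped in [ ] are optional; everything else is mandatory.
    """
    parts = []
    # Alternate between bracketed groups and the text outside them.
    for match in re.finditer(r"\[([^\[\]]*)\]|([^\[\]]+)", line):
        inside, outside = match.group(1), match.group(2)
        optional = inside is not None
        text = inside if optional else outside
        for component in COMPONENT.findall(text):
            parts.append((component, optional))
    return parts


for line in ("[/1!a][/34x]", "4!a2!a2!c[3!c]", "4*35x"):
    print(line, "->", split_format_line(line))
```

Keeping the optional flag next to each part lets the format table store one record per part without losing the bracket information.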
Because the attributes of a message domain are complex, a single static data table is far from sufficient to hold these mapping relations. The invention therefore designs five static data tables; the mapping relations and hierarchical relations among the five tables form a message structure tree that shows the mapping relations of the message domains clearly.
Therefore, in the present invention, the data extraction device is preconfigured with a plurality of message information storage tables, and the plurality of message information storage tables have a specific mapping relationship.
Specifically, the five tables of the message structure tree are designed as follows, with reference to FIG. 1:
1. Message body table: stores the body of the message, mainly which domains the message type has, the type of each domain, whether it is mandatory, whether it loops, and so on.
The body information of each message domain is registered in the message body table; after classification by message type, each message domain has a unique message domain number.
2. Message set domain mapping table: used for domains that have multiple set domains; stores the mapping between a domain and its set domains, such as 57a of Sequence A mapping to 57A, 57D, 57J, and so on.
The mapping relations between message set domains are registered in the message set domain mapping table. If a message domain has set domains (such as 57A/57D/57J), a new message domain number is generated in this table for each set domain, so that the set domains can be told apart and referenced in the mapping relations of the subsequent tables.
The message domain numbers in the message set domain mapping table correspond to the message domain numbers in the message body table. If a message domain has corresponding set domains, it has corresponding entries in the message set domain mapping table.
3. Message format table: stores the format of each message domain; a domain that contains multiple formats is split into multiple records.
The message domain number in the message format table corresponds to the message domain number in the message body table and to the new message domain number in the message set domain mapping table. The format of every domain (message domains and set domains alike) is stored in the message format table. Because a domain may span multiple rows, a new message sub-row number is generated in this table for each row of the domain, and this number links the record to the message name table and the message domain coding table.
4. Message name table: stores the name of each message domain; if a domain is split into multiple rows, it is stored as multiple records, with one name per row.
The name of each sub-row is stored in the message name table and is linked to the message sub-row number in the message format table.
5. Message domain coding table: stores the domains that have default values and optional values, together with those values.
The message domain coding table stores the optional values of a sub-row and is linked to the message sub-row number in the message format table.
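The linkage among the five tables can be pictured with a small Python sketch. The class and attribute names below are hypothetical and only a few columns are shown; the point is the chain of keys (message domain number, new message domain number, sub-row number) that the description relies on.

```python
from dataclasses import dataclass


@dataclass
class BodyRow:                # message body table
    domain_no: str            # unique message domain number
    tag: str                  # e.g. "57a"
    mandatory: bool
    in_loop: bool


@dataclass
class SetMapRow:              # message set domain mapping table
    domain_no: str            # refers to BodyRow.domain_no
    new_domain_no: str        # number generated for one set domain, e.g. 57A


@dataclass
class FormatRow:              # message format table
    domain_no: str            # BodyRow.domain_no or SetMapRow.new_domain_no
    subrow_no: str            # generated per format line
    fmt: str


@dataclass
class NameRow:                # message name table
    subrow_no: str            # refers to FormatRow.subrow_no
    name: str


@dataclass
class CodeRow:                # message domain coding table
    subrow_no: str            # refers to FormatRow.subrow_no
    code: str


def names_for(domain_no, set_map, formats, names):
    """Walk body -> set mapping -> format -> name for one message domain."""
    # A domain with set domains is reached through its new domain numbers;
    # otherwise the format rows hang directly off the body domain number.
    targets = [m.new_domain_no for m in set_map if m.domain_no == domain_no] or [domain_no]
    subrows = {f.subrow_no for f in formats if f.domain_no in targets}
    return [n.name for n in names if n.subrow_no in subrows]
```

A lookup that starts from the body table and ends at the name table is the "structure tree" traversal in miniature: each level only needs the key generated by the level above it.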
On the basis of the above five message information storage tables, referring to fig. 2, the data extraction method may include:
S11, obtaining the message type, and determining the message standard page corresponding to the message type from the SWIFT message standard book.
Because the SWIFT message standard book is in HTML form, every item of data can be traced back to the value of a specific node tag, so the required data can be extracted by reading the appropriate HTML nodes.
Specifically, acquiring the message type means first determining which message type is needed, for example that the message data of MT300 is to be acquired. The SWIFT message standard book is organized by message type, so the corresponding message standard page can be looked up in it by message type.
For example, to query MT300, the MT300 entry is found among the messages whose numbers begin with 3, and the message standard page of MT300 is then entered; see FIG. 3.
S12, extracting message main body information from the message standard page, and storing the message main body information into corresponding tables in the multiple message information storage tables.
Wherein, the message body information comprises a message domain.
Specifically, at least the message domain name, the loop information, and the mandatory-input information are extracted from the message standard page and stored in the message body table.
In detail, the information on the standard home page of the message type is extracted: all message domain names of the message and basic attributes such as whether each domain is mandatory and whether it loops are extracted and put into the data list of the message body table. The message domain name is obtained from the Tag column of the SWIFT message standard book (see FIG. 3). Whether a domain is mandatory is obtained by extracting its Status value from the HTML tag node value; referring to FIG. 4, M means mandatory and O means optional.
Referring to FIG. 5, when a loop-start arrow appears in the list, the domains between it and the END loop flag form a loop body; whether a domain loops is determined by whether it lies inside a loop body.
After the message domain names and the basic attributes (mandatory or not, looping or not) have been extracted, they are stored in the message body table as the message body information.
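The extraction in step S12 can be sketched with Python's standard html.parser module. The page layout assumed here (a table whose data rows hold Status and Tag in the first two cells, and loop markers "----->" and "-----|" on rows of their own) is a guess for illustration, and the sample rows are made up rather than copied from the standard book.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None      # cells of the row being read, or None
        self._cell = None     # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


LOOP_START, LOOP_END = "----->", "-----|"   # assumed marker texts


def extract_body_rows(html):
    """Return one dict per domain: tag, mandatory flag, and loop membership."""
    collector = RowCollector()
    collector.feed(html)
    body, depth = [], 0
    for cells in collector.rows:
        if cells == [LOOP_START]:
            depth += 1
        elif cells == [LOOP_END]:
            depth = max(depth - 1, 0)
        elif len(cells) >= 2 and cells[0] in ("M", "O"):
            body.append({"tag": cells[1], "mandatory": cells[0] == "M", "in_loop": depth > 0})
    return body


SAMPLE = """
<table>
<tr><th>Status</th><th>Tag</th><th>Field Name</th></tr>
<tr><td>M</td><td>15A</td><td>Sample field one</td></tr>
<tr><td>-----&gt;</td></tr>
<tr><td>O</td><td>22L</td><td>Sample field two</td></tr>
<tr><td>-----|</td></tr>
<tr><td>O</td><td>57a</td><td>Sample field three</td></tr>
</table>
"""
print(extract_body_rows(SAMPLE))
```

Tracking a depth counter rather than a single flag is a design choice that would also tolerate nested loops, should the real pages contain them.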
Other information in the message body table, such as the message type, message number, sequence number, sequence order, message domain type, and message domain number, is generated as follows:
Message type: fixed as MT for SWIFT messages.
Message number: obtained from the home page of the SWIFT message standard book; the title of the standard book detail page is truncated to get it (see the title in FIG. 7).
Sequence number: a unique identifier generated by the script according to a predefined rule.
Sequence order: generated by the script for the message domains of the type, in ascending order of sequence number.
Message domain type: whether the domain is a set domain, judged from the Content/Options column of the SWIFT message standard book (see FIG. 3).
Message domain number: generated by the script from the message domain according to a predefined rule.
S13, determining the set domain corresponding to the message domain.
Specifically, the option value of the message domain is extracted, and whether the option value is a special standard symbol is determined. If yes, identifying the set domain corresponding to the message domain based on the type of the special standard symbol.
In detail, referring to FIG. 6, it is checked whether the message has special domains (domains that include multiple types): the option value of the message domain is extracted, and it is judged whether the option value consists of special standard symbols such as A, D, or J. If so, the domain is classified by type (A/D/J), the set domains corresponding to the message domain are determined, and the mapping between the message domain and its set domains is stored in the message set domain mapping table.
The fields configured in the message set domain mapping table are: message domain number, message type, message number, new message domain number, and message domain order.
Message domain number: corresponds to the message domain number in the message body table.
Message type: corresponds to the message type in the message body table.
Message number: corresponds to the message number in the message body table.
New message domain number: the number of a set domain belonging to the message domain number in the message body table, generated by the script from the message domain number according to a predefined rule.
Message domain order: treating one message domain as a whole, sequence numbers are generated by the script in ascending order.
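Step S13 can be sketched as a small Python function that inspects the option value and, when it is a list of option letters, produces one mapping row per set domain. The function name, the row keys, and the rule used to build the new message domain number are all hypothetical; the source only says that the number is generated by a script according to a rule.

```python
import re


def expand_set_domain(tag, options, domain_no):
    """Return mapping rows for a tag such as "57a" with options "A, D, or J".

    An option value counts as a list of special symbols only if every token
    (ignoring the word "or") is a single uppercase letter.
    """
    tokens = [t for t in re.split(r"[,\s]+", options.strip()) if t and t != "or"]
    if not tokens or not all(re.fullmatch(r"[A-Z]", t) for t in tokens):
        return []                       # ordinary domain, no set domains
    rows = []
    for order, letter in enumerate(tokens, start=1):
        rows.append({
            "domain_no": domain_no,                    # key into the body table
            "new_domain_no": f"{domain_no}-{letter}",  # hypothetical numbering rule
            "set_tag": tag[:2] + letter,               # assumes a two-digit tag, e.g. 57 + A
            "order": order,
        })
    return rows


for row in expand_set_domain("57a", "A, D, or J", "300-057"):
    print(row)
```

An option value that is an ordinary format, such as 16x, yields an empty list, so the same call can be made for every row of the body table.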
S14, jumping in turn to the standard interfaces of the message domain and the set domain, extracting at least one of name information, format information, and domain coding information from each standard interface, and storing it in the corresponding table among the message information storage tables.
Specifically, step S14 may include:
1) Jumping in turn to a first standard interface of the message domain, and extracting the name information, format information, and domain coding information of the first standard interface.
2) Jumping in turn to a second standard interface of the set domain, and extracting the name information and format information of the second standard interface.
3) Storing the extracted name information in the message name table, the extracted format information in the message format table, and the extracted domain coding information in the message domain coding table.
Specifically, the format of each field (including the message field and the aggregate field) is circularly extracted: and circularly entering a standard page (such as a first standard interface of a message domain and a second standard interface of a collection domain) of each domain, splitting according to the number of lines by reading the format of each domain, splitting the optional attribute of the format (the optional domain with [ ] in the format of the domain, such as the optional value of [/1!a ]/34 x ]4 |a2 |c [3 |c ] in the format of [/1!a ] ], and the optional value of the [ ] in the format of the domain) and the content of the message format, and putting the split data and the content of the message format into a message format table and a message name table.
As shown in fig. 7, fig. 7 is a detail page of the 53a field of the MT300, and the format of each line in the 53a field and the corresponding description thereof (contents in the text box in fig. 7) are extracted to form a mapping relationship.
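A minimal sketch of the line-by-line split and the separation of optional parts is given below. It assumes the format of a domain is available as a multi-line string and that optional parts are non-nested bracketed groups. The function name `split_format` and the sample input are illustrative and are not taken from fig. 7.

```python
import re

# A bracketed group such as [/1!a] marks an optional part of a format line.
OPTIONAL_PART = re.compile(r"\[([^\[\]]*)\]")


def split_format(domain_format: str):
    """Split a multi-line domain format into per-line rows.

    Each row is (sub-row order, format line, list of optional parts).
    Nested brackets are not handled in this sketch.
    """
    rows = []
    lines = [ln.strip() for ln in domain_format.splitlines() if ln.strip()]
    for order, line in enumerate(lines, start=1):
        optional = OPTIONAL_PART.findall(line)
        rows.append((order, line, optional))
    return rows


# Illustrative two-line format; the sub-row order follows the line number.
sample_rows = split_format("[/1!a][/34x]\n4!a2!a2!c[3!c]")
```

Each resulting row could then be written to the message format table, with the matching description from the same line of the page going to the message name table.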
Through the above steps, the name information and the format information are obtained. The domain coding information is extracted only from the first standard interface of a message domain.
Specifically, the script checks whether the content of a message domain has optional values: it checks whether the CODES section of the domain's standard page lists optional values and, if so, puts those values into the message domain coding table.
As shown in fig. 8, the optional values of the 49 field of the MT300 are N and Y; the optional values of a domain are obtained by extracting the node values under the CODES header and are stored in the message domain coding table.
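The check for optional values can be sketched as below. The sketch assumes the CODES section has already been read into text lines whose first column is the code. The function name `extract_codes` and the row layout (domain number, optional value order, optional value) are assumptions made for illustration.

```python
def extract_codes(domain_number, codes_section):
    """Return coding-table rows (domain, order, value) from CODES text lines.

    An empty section means the domain has no optional list values, so
    nothing is written to the coding table.
    """
    rows = []
    order = 0
    for line in codes_section:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        order += 1
        # The first column of the CODES part holds the optional value.
        rows.append((domain_number, order, parts[0]))
    return rows


# Placeholder description text; only the first column is used.
coding_rows = extract_codes("49", ["N  (description)", "Y  (description)"])
```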
In this embodiment, the meaning of the fields in the message format table is as follows:
Message domain number: corresponding to the message domain number of the message main body table and the new message domain number of the message set domain mapping table.
Message type: corresponding to the message type of the message main body table.
Message number: corresponding to the message number of the message main body table.
Message field type: corresponding to the type of the message domain in the message body table.
Message domain order: with one message domain taken as a whole, sequence numbers generated by the script in ascending order of line number.
Message subrow numbering: generated by the script from the message domain number according to a certain rule.
Message format: obtained from the second column of the format part of the message domain (see fig. 7).
Start symbol: the start character of the message format.
End symbol: the end character of the message format.
The meaning of the fields in the message name table is as follows:
Message domain number: corresponding to the message domain number of the message main body table and the new message domain number of the message set domain mapping table.
Message domain order: corresponding to the sequence of the message domains of the message aggregation domain mapping table.
Message subrow numbering: corresponding to the message subrow numbers of the message format table.
Message subrow order: corresponding to the sequence of message fields of the message format table.
Message subrow name: obtained from the third column of the format part of the message domain (see fig. 7).
The meaning of the fields in the message field encoding table is as follows:
Message domain number: corresponding to the message domain number of the message main body table and the new message domain number of the message set domain mapping table.
Message domain order: corresponding to the sequence of the message domains of the message aggregation domain mapping table.
Message subrow numbering: corresponding to the message subrow numbers of the message format table.
Message subrow order: corresponding to the sequence of message fields of the message format table.
Optional value order: with one message domain taken as a whole, sequence numbers generated by the script in ascending order of line number.
Optional values: obtained from the first column of the codes part of the message domain (see fig. 8).
Type: corresponding to the message type of the message main body table.
Whether or not to input: corresponding to the mandatory-input flag of the message main body table.
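The three tables described above could be laid out as in the following sketch, which uses an in-memory SQLite database purely for illustration. All table names, column names and column types are assumptions derived from the field descriptions. The patent does not name a database product or give a schema.

```python
import sqlite3

# Hypothetical schema; one column per field described in the text.
SCHEMA = """
CREATE TABLE message_format (
    domain_number  TEXT,
    message_type   TEXT,
    message_number TEXT,
    domain_type    TEXT,
    domain_order   INTEGER,
    subrow_number  TEXT,
    message_format TEXT,
    start_symbol   TEXT,
    end_symbol     TEXT
);
CREATE TABLE message_name (
    domain_number TEXT,
    domain_order  INTEGER,
    subrow_number TEXT,
    subrow_order  INTEGER,
    subrow_name   TEXT
);
CREATE TABLE message_domain_coding (
    domain_number  TEXT,
    domain_order   INTEGER,
    subrow_number  TEXT,
    subrow_order   INTEGER,
    value_order    INTEGER,
    optional_value TEXT,
    message_type   TEXT,
    mandatory      TEXT
);
"""


def create_tables(conn):
    """Create the three illustrative tables on the given connection."""
    conn.executescript(SCHEMA)


schema_conn = sqlite3.connect(":memory:")
create_tables(schema_conn)
```

The shared `domain_number` and `subrow_number` columns carry the mapping relationship between the tables that the field descriptions refer to.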
After the domain coding information is stored in the message domain coding table, the method further comprises:
Converting the data stored in the various message information storage tables into a database script language, and taking the database script language as a static data model.
Specifically, the extracted data is formatted into a database script language (SQL) to construct a static data model.
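One way to format extracted rows as SQL is sketched below. The helper names `sql_literal` and `to_insert_script`, as well as the table and column names in the example, are hypothetical. The patent states only that the data is converted into SQL.

```python
def sql_literal(value):
    """Render a Python value as an SQL literal (strings are quoted)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape embedded single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def to_insert_script(table, columns, rows):
    """Build one INSERT statement per row and join them into a script."""
    col_list = ", ".join(columns)
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {table} ({col_list}) VALUES ({values});")
    return "\n".join(statements)


script = to_insert_script(
    "message_domain_coding",
    ["domain_number", "value_order", "optional_value"],
    [("49", 1, "N"), ("49", 2, "Y")],
)
```

The resulting script can be stored alongside the application and replayed into a database, which is what makes the data model static.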
The invention extracts the standard format data of SWIFT messages into static data tables and establishes a data model on them, thereby realizing centralized management.
In addition, the invention turns the message standard into data. If a system reads the message configuration from this data instead of hard-coding it, fewer code-level changes are needed when the SWIFT message standard later changes, which greatly reduces repeated work.
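The benefit of reading configuration from data can be illustrated with a small check that looks up the permitted values of a domain at run time. The table layout, the function names and the rule that a domain without coded values accepts any content are all assumptions of this sketch, not part of the patent.

```python
import sqlite3


def allowed_values(conn, domain_number):
    """Read the optional values of a domain from the coding table."""
    cur = conn.execute(
        "SELECT optional_value FROM message_domain_coding "
        "WHERE domain_number = ? ORDER BY value_order",
        (domain_number,),
    )
    return [row[0] for row in cur]


def is_valid(conn, domain_number, value):
    """Check a value against the stored codes instead of hard-coded lists."""
    allowed = allowed_values(conn, domain_number)
    # A domain with no coded values accepts any content in this sketch.
    return not allowed or value in allowed


# Illustrative data: a cut-down coding table held in memory.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE message_domain_coding "
    "(domain_number TEXT, value_order INTEGER, optional_value TEXT)"
)
demo.executemany(
    "INSERT INTO message_domain_coding VALUES (?, ?, ?)",
    [("49", 1, "N"), ("49", 2, "Y")],
)
```

When the standard changes, only the rows in the table need to be regenerated; the checking code stays the same.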
Optionally, on the basis of the embodiment of the data extraction method, another embodiment of the present invention provides a data extraction device, which is applied to a data extraction apparatus, where a plurality of message information storage tables are preconfigured in the data extraction apparatus, and the plurality of message information storage tables have a specific mapping relationship, and referring to fig. 9, the data extraction device includes:
The page determining module 11 is configured to obtain a message type, and determine a message standard page corresponding to the message type from a SWIFT message standard book;
the first data storage module 12 is configured to extract message body information from the message standard page, and store the message body information into a corresponding table in the multiple message information storage tables; the message body information comprises a message domain;
a set domain determining module 13, configured to determine a set domain corresponding to the message domain;
the second data storage module 14 is configured to sequentially jump to the standard interfaces of the message domain and the aggregation domain, extract at least one of name information, format information and domain coding information of the standard interface, and store the at least one of name information, format information and domain coding information in corresponding tables in the multiple message information storage tables.
Further, the multiple message information storage tables comprise a message main body table, a message set domain mapping table, a message name table, a message format table and a message domain coding table;
The first data storage module is specifically configured to:
Extract at least the message domain, the circulation information and the essential input information from the message standard page, and store them in the message main body table.
Further, the aggregate domain determining module is specifically configured to:
and extracting an option value of the message domain, determining whether the option value is a special standard symbol, and if so, identifying a set domain corresponding to the message domain based on the type of the special standard symbol.
Further, the device further comprises:
A third data storage module, configured to store the mapping relation between the message domain and the set domain corresponding to the message domain into the message set domain mapping table.
Further, the second data storage module 14 includes:
the first extraction sub-module is used for sequentially jumping to a first standard interface of the message domain and extracting name information, format information and domain coding information of the first standard interface;
The second extraction submodule is used for sequentially jumping to a second standard interface of the aggregation domain and extracting name information and format information of the second standard interface;
The data storage sub-module is used for storing the extracted name information into a message name table, storing the extracted format information into a message format table and storing the extracted domain coding information into a message domain coding table.
Further, the device further comprises:
A model determining module, configured to convert the data stored in the message information storage tables into a database script language and to take the database script language as a static data model.
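How the four main modules cooperate can be sketched as a simple pipeline. The class name `DataExtractionDevice`, the method `run` and the callable-based wiring are assumptions of this sketch. The comments map each call to the module numbers used above.

```python
class DataExtractionDevice:
    """Wires the four modules together; each module is a plain callable."""

    def __init__(self, determine_page, store_body,
                 determine_set_domains, store_domain_info):
        self.determine_page = determine_page                # module 11
        self.store_body = store_body                        # module 12
        self.determine_set_domains = determine_set_domains  # module 13
        self.store_domain_info = store_domain_info          # module 14

    def run(self, message_type):
        """Run the modules in the order given in the embodiment."""
        page = self.determine_page(message_type)
        message_domains = self.store_body(page)
        set_domains = self.determine_set_domains(message_domains)
        self.store_domain_info(message_domains, set_domains)
        return message_domains, set_domains
```

The third data storage module and the model determining module would be added as further steps after `determine_set_domains` and `store_domain_info` respectively.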
In this embodiment, the data extraction device is preconfigured with a plurality of message information storage tables that have a specific mapping relationship. When message extraction is needed, the message type is acquired, the message standard page corresponding to the message type is determined from the SWIFT message standard book, and message main body information, which includes a message domain, is extracted from the message standard page and stored in the corresponding table among the message information storage tables. The set domain corresponding to the message domain is then determined, the standard interfaces of the message domain and the set domain are visited in sequence, and at least one of the name information, format information and domain coding information of each standard interface is extracted and stored in the corresponding table. This meets the need to extract the data in the SWIFT message standard book.
The working process of each module and sub-module in this embodiment is described in the corresponding part of the above embodiment and is not repeated here.
Optionally, on the basis of the embodiments of the data extraction method and apparatus, another embodiment of the present invention provides an electronic device, which may be the data extraction device described above.
An electronic device includes: a memory and a processor;
wherein the memory is used for storing programs;
the processor invokes the program and is used to perform the data extraction method described above.
In this embodiment, the data extraction device is preconfigured with a plurality of message information storage tables that have a specific mapping relationship. When message extraction is needed, the message type is acquired, the message standard page corresponding to the message type is determined from the SWIFT message standard book, and message main body information, which includes a message domain, is extracted from the message standard page and stored in the corresponding table among the message information storage tables. The set domain corresponding to the message domain is then determined, the standard interfaces of the message domain and the set domain are visited in sequence, and at least one of the name information, format information and domain coding information of each standard interface is extracted and stored in the corresponding table. This meets the need to extract the data in the SWIFT message standard book.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A data extraction method, characterized in that it is applied to a data extraction device, in which a plurality of message information storage tables are preconfigured, the plurality of message information storage tables having a specific mapping relationship, the data extraction method comprising:
Obtaining a message type, and determining a message standard page corresponding to the message type from a SWIFT message standard book;
extracting message main body information from the message standard page, and storing the message main body information into corresponding tables in the multiple message information storage tables; the message body information comprises a message domain;
determining a set domain corresponding to the message domain;
Sequentially jumping to standard interfaces of the message domain and the aggregation domain, extracting at least one of name information, format information and domain coding information of the standard interfaces, and storing the at least one of name information, format information and domain coding information in corresponding tables in the various message information storage tables;
the multiple message information storage tables comprise a message main body table, a message set domain mapping table, a message name table, a message format table and a message domain coding table; the message name table is used for storing the names of the message domains; the message domain coding table is used for storing domains and values with default values and optional values;
Extracting message main body information from the message standard page, and storing the message main body information into corresponding tables in the multiple message information storage tables, wherein the method comprises the following steps: extracting at least a message domain, circulation information and necessary entry information from the message standard page, and storing the message domain, the circulation information and the necessary entry information into the message main body table;
Sequentially jumping to the standard interfaces of the message domain and the aggregation domain, extracting at least one of name information, format information and domain coding information of the standard interface, and storing the at least one of name information, format information and domain coding information into corresponding tables in the various message information storage tables, wherein the method comprises the following steps: sequentially jumping to a first standard interface of the message domain, and extracting name information, format information and domain coding information of the first standard interface; sequentially jumping to a second standard interface of the aggregation domain, and extracting name information and format information of the second standard interface; storing the extracted name information into a message name table, storing the extracted format information into a message format table, and storing the extracted domain coding information into a message domain coding table.
2. The method of claim 1, wherein determining the aggregate domain corresponding to the message domain comprises:
extracting an option value of the message domain;
determining whether the option value is a special standard symbol;
if yes, identifying the set domain corresponding to the message domain based on the type of the special standard symbol.
3. The data extraction method according to claim 2, further comprising, after determining the set domain corresponding to the message domain:
And storing the mapping relation between the message domain and the set domain corresponding to the message domain into the message set domain mapping table.
4. The data extraction method according to claim 1, further comprising, after storing the domain coding information in a message domain coding table:
And converting the data stored in the various message information storage tables into a database script language, and taking the database script language as a static data model.
5. A data extraction apparatus, characterized in that it is applied to a data extraction device in which a plurality of message information storage tables having a specific mapping relationship are preconfigured, the data extraction apparatus comprising:
the page determining module is used for acquiring the message type and determining a message standard page corresponding to the message type from the SWIFT message standard book;
the first data storage module is used for extracting message main body information from the message standard page and storing the message main body information into corresponding tables in the multiple message information storage tables; the message body information comprises a message domain;
The set domain determining module is used for determining a set domain corresponding to the message domain;
The second data storage module is used for sequentially jumping to the standard interfaces of the message domain and the aggregation domain, extracting at least one of name information, format information and domain coding information of the standard interfaces, and storing the at least one of the name information, the format information and the domain coding information into corresponding tables in the various message information storage tables;
the multiple message information storage tables comprise a message main body table, a message set domain mapping table, a message name table, a message format table and a message domain coding table; the message name table is used for storing the names of the message domains; the message domain coding table is used for storing domains and values with default values and optional values;
The first data storage module is specifically configured to extract at least a message domain, cycle information and essential input information from the message standard page, and store the extracted message domain, cycle information and essential input information in the message main body table;
The second data storage module is specifically configured to jump to a first standard interface of the message domain in sequence, and extract name information, format information and domain coding information of the first standard interface; sequentially jumping to a second standard interface of the aggregation domain, and extracting name information and format information of the second standard interface; storing the extracted name information into a message name table, storing the extracted format information into a message format table, and storing the extracted domain coding information into a message domain coding table.
6. The data extraction device of claim 5, wherein the aggregate domain determining module is specifically configured to:
and extracting an option value of the message domain, determining whether the option value is a special standard symbol, and if so, identifying a set domain corresponding to the message domain based on the type of the special standard symbol.
7. An electronic device, comprising: a memory and a processor;
wherein the memory is used for storing programs;
a processor calling a program and being adapted to perform the data extraction method according to any of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210421274.2A CN114760365B (en) 2022-04-21 2022-04-21 Data extraction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114760365A CN114760365A (en) 2022-07-15
CN114760365B true CN114760365B (en) 2024-06-11

Family

ID=82332072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210421274.2A Active CN114760365B (en) 2022-04-21 2022-04-21 Data extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114760365B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344183A (en) * 2018-01-30 2019-02-15 深圳壹账通智能科技有限公司 Data interactive method, device, computer equipment and storage medium
WO2019149019A1 (en) * 2018-01-30 2019-08-08 深圳壹账通智能科技有限公司 Data interaction method and apparatus, computer device, and storage medium
CN113312108A (en) * 2021-06-18 2021-08-27 中国农业银行股份有限公司 SWIFT message checking method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114760365A (en) 2022-07-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant