CN112288586A

CN112288586A - Insurance industry data integration method based on HBase and related equipment

Info

Publication number: CN112288586A
Application number: CN202011312448.9A
Authority: CN
Inventors: 范铮; 陈学亮; 赵星光; 高擎阳; 袁利鸥; 曲明钰
Original assignee: China Life Insurance Co Ltd China
Current assignee: China Life Insurance Co Ltd China
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-01-29

Abstract

One or more embodiments of the present specification provide an HBase-based insurance industry data integration method and related devices. The method comprises the following steps: performing reverse-order or hash processing on the policy number, and using the processed policy number as the row key (rowkey) of a data integration model table in the open-source database HBase; setting the column name of each column in the HBase data integration model table to the table name of a data source table to be integrated plus the primary key value of the corresponding row of that data source table; and splicing each row of data of the data source table into a JSON string and storing the JSON strings in the corresponding fields of the HBase data integration model table. By exploiting the technical advantages of HBase in light of the characteristics of insurance industry data, the method and related equipment provided by the specification address three problems of existing data integration schemes: difficult data updates, high performance consumption during integration, and the high cost of adding fields.

Description

Insurance industry data integration method based on HBase and related equipment

Technical Field

One or more embodiments of the present disclosure relate to the field of big data technologies, and in particular, to an insurance industry data integration method based on HBase and a related device.

Background

While building out their systems, insurance companies purchase or develop many applications, such as a core business transaction system and a client resource management system. As the business develops these systems keep evolving, and business data may be generated by several of them. This data is a valuable asset for insurance enterprises, but it is scattered across information silos and its value cannot be realized. Effectively integrating data that is large in volume, diverse in type, fast-growing, and low in value density is a difficult problem for every insurance company.

In the prior art, an insurance industry data warehouse provides data integration: it usually establishes a number of subject tables according to business processes (such as policy, underwriting, and claim settlement) and integrates data from multiple systems into these subject tables. However, insurance business data is characterized not only by inserts of new records but also by frequent updates. In a traditional relational database, processing an update during data integration is much more complicated than a simple insert, because the original data often has to be deleted and then rewritten, which incurs high system performance overhead.

A data integration method that makes updates simple and keeps system performance overhead low is therefore needed.

Disclosure of Invention

In view of the above, one or more embodiments of the present disclosure aim to provide an HBase-based insurance industry data integration method and related equipment that overcome all or part of the deficiencies of the prior art.

To this end, one or more embodiments of the present specification provide an HBase-based insurance industry data integration method, including:

determining a row key of a data integration model table in the open-source database HBase according to a policy number in at least one data source table to be integrated;

setting the column name of each column of the HBase data integration model table according to the table name of the at least one data source table;

splicing fields of each row of data of the at least one data source table into a character string respectively; and

storing the character string into a corresponding field in the HBase data integration model table; the row key of the corresponding field is a row key determined by the policy number corresponding to the character string; and the column name of the corresponding field is the column name set by the table name of the data source table where the character string is located.

Based on the same inventive concept, one or more embodiments of the present specification further provide an insurance industry data integration apparatus based on HBase, including:

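a determining module, configured to determine a row key of a data integration model table in the open-source database HBase according to a policy number in at least one data source table to be integrated;

a setting module, configured to set the column name of each column of the HBase data integration model table according to the table name of the at least one data source table; and

a splicing and storing module, configured to splice the fields of each row of data of the at least one data source table into a character string and to store the character string in the corresponding field of the HBase data integration model table, where the row key of the corresponding field is the row key determined from the policy number associated with the character string, and the column name of the corresponding field is the column name set from the table name of the data source table containing the character string.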

Based on the same inventive concept, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the method as described in any one of the above items when executing the program.

Based on the same inventive concept, one or more embodiments of the present specification also provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing the computer to perform the method as described in any one of the above.

As can be seen from the above, the HBase-based insurance industry data integration method and related device provided in one or more embodiments of the present disclosure exploit the technical advantages of HBase in light of the characteristics of insurance industry data to form a data integration scheme that addresses difficult data updates, high performance consumption during integration, and the high cost of adding fields.

Drawings

In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.

FIG. 1 is a flow diagram of an HBase-based insurance industry data integration method according to one or more embodiments of the present disclosure;

FIG. 2 is a diagram illustrating an HBase-based charge table data integration method in one or more embodiments of the present disclosure;

FIG. 3 is a schematic diagram of an HBase-based insurance industry data integration method in accordance with one or more embodiments of the present disclosure;

FIG. 4 is a schematic structural diagram of an HBase-based insurance industry data integration apparatus according to one or more embodiments of the present disclosure;

FIG. 5 is a schematic structural diagram of an electronic device according to one or more embodiments of the present disclosure.

Detailed Description

For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification have the ordinary meaning understood by those of ordinary skill in the art to which this disclosure belongs. The terms "comprising", "including", and the like, as used in one or more embodiments of the present specification, mean that the element or item preceding the term encompasses the elements or items listed after the term and their equivalents, without excluding other elements or items.

As described in the background section, existing insurance industry data integration methods suffer from complex updates, high system performance overhead, and similar problems. In the process of implementing the present disclosure, the applicant found that existing insurance industry data integration methods have the following defects:

(1) Joining multiple large data tables is costly. In an integrated model, different fields often come from different systems or different tables. When a row of the model needs to be updated, the changed data must first be used as a join condition to reassemble the entire row before the update can be applied, so the join cost is high.

(2) Detail data is easily lost. A model built in a traditional relational database is stored as a two-dimensional table with fixed fields, so it is difficult to retain all details. Whenever new fields need to be added, the model must be modified again, which in turn requires changes to the data processing programs and a complete reprocessing of the data, at very high cost.

In recent years, more and more enterprises have introduced big data technologies to handle problems, such as data integration and real-time analysis, that traditional software struggles with, and new components and technologies keep emerging. Among them, the open-source framework Hadoop is one of the mainstream big data solutions; it continues to develop, and a large number of components have been derived from it to meet the various business requirements of enterprises.

The distributed key-value open-source database HBase is an important member of the Hadoop ecosystem. It runs on the distributed file system HDFS, inherits the high availability and scalability of HDFS, and also offers many characteristics of key-value databases, for example: an insert is also an update, partial column updates are allowed, a table can hold billions of rows and millions of columns, and highly concurrent queries return in milliseconds. The aim is to make full use of these characteristics to form a complete solution covering loading, storage, processing, and querying, and thereby offer a new design option for building large enterprise data platforms in the insurance industry.
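The two HBase properties singled out above, that writing to an existing row key acts as an update and that a write may touch only some columns, can be illustrated with a minimal in-memory sketch. This is a hypothetical Python model: the `Table` type and `put` function are illustrative names, not the HBase client API, and the model ignores column families, timestamps, and cell versions, all of which real HBase maintains.

```python
from typing import Dict

# Hypothetical in-memory stand-in for one HBase table: row key -> {column: value}.
# Not the HBase client API; column families and cell versions are ignored.
Table = Dict[str, Dict[str, str]]


def put(table: Table, row_key: str, cells: Dict[str, str]) -> None:
    """Write only the given cells of one row.

    If the row key already exists the call acts as an update (insert is update),
    and columns not named in `cells` are left untouched (partial column update).
    """
    row = table.setdefault(row_key, {})
    row.update(cells)


table: Table = {}
put(table, "row1", {"colA": "1", "colB": "2"})  # first write creates the row
put(table, "row1", {"colB": "3"})               # second write updates colB only
print(table)  # {'row1': {'colA': '1', 'colB': '3'}}
```

Unlike the relational update path described in the background, nothing here is deleted and rewritten: the second write simply replaces one cell.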

In view of this, one or more embodiments of the present disclosure provide an HBase-based insurance industry data integration method. Specifically, the policy number is first subjected to reverse-order or hash processing, and the processed policy number is used as the row key (rowkey) of the data integration model table in the open-source database HBase, which yields a rowkey that distributes data evenly. Next, the column name of each column in the HBase data integration model table is set to the table name of a data source table to be integrated plus the primary key value of the corresponding row of that data source table. Finally, each row of data of the data source table is spliced into a JSON string, and the JSON strings are stored in the corresponding fields of the HBase data integration model table to complete data integration.

Therefore, the HBase-based insurance industry data integration method and related equipment in one or more embodiments of the specification exploit the technical advantages of HBase in light of the characteristics of insurance industry data to form a data integration scheme that addresses difficult data updates, high performance consumption during integration, and the high cost of adding fields.

The technical solutions of one or more embodiments of the present specification are described in detail below with reference to specific embodiments.

Referring to FIG. 1, an HBase-based insurance industry data integration method according to an embodiment of the present disclosure includes the following steps:

step S101, determining a row key of an open source database HBase data integration model table according to a policy number in at least one data source table to be integrated.

In this step, the HBase data integration model records all data relating to a business entity in a single row. When data from each data source table is imported into the HBase data integration model, a rowkey field must be specified. For an insurance company, most applications and data are centered on the policy, so the policy number is used as the rowkey of the HBase data integration model table.

Ideally, the data of the HBase data integration model table is evenly distributed across multiple regions. Each region is bounded by a startrow and an endrow (when there is only one region, it has neither). Because data is stored in regions according to rowkey, and rowkeys are sorted in byte (ASCII) order, distributing data evenly requires a rowkey that scatters the data uniformly. A policy number, however, typically begins with the organization code and the year, and its last n digits are usually an auto-incrementing sequence (serial number), so consecutive policies share long common prefixes. The policy number is therefore reversed so that the serial number comes first, and the reversed policy number is used as the rowkey of the HBase data integration model table.

Reversing the policy number specifically means writing the character string of the policy number into the HBase data integration model table in reverse order, so that the serial number in the policy number comes first.
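Why reversal helps can be shown with a small simulation. The sketch below assumes a hypothetical policy-number format (organization code 8601, year 2020, six-digit serial) and a hypothetical table pre-split into ten regions on the leading character; neither detail comes from this specification. Region lookup follows the boundary rule described above: a row belongs to the region whose start row is the largest split key not greater than the row key.

```python
import bisect
from collections import Counter
from typing import List


def reverse_policy_no(policy_no: str) -> str:
    """Reverse the policy-number string so the trailing serial digits lead."""
    return policy_no[::-1]


def region_index(row_key: str, split_keys: List[str]) -> int:
    """Index of the region holding row_key, given sorted region split keys.

    Region i covers [split_keys[i-1], split_keys[i]). Byte-wise comparison of
    ASCII digit strings matches Python's str ordering.
    """
    return bisect.bisect_right(split_keys, row_key)


# Hypothetical sequential policy numbers: organization 8601 + year 2020 + serial.
policies = [f"86012020{serial:06d}" for serial in range(1, 1001)]

# Hypothetical pre-split: 10 regions bounded by leading characters "1".."9".
splits = [str(d) for d in range(1, 10)]

raw = Counter(region_index(p, splits) for p in policies)
rev = Counter(region_index(reverse_policy_no(p), splits) for p in policies)
print(sorted(raw.items()))  # [(8, 1000)]: every key lands in one region
print(sorted(rev.items()))  # ten regions with 100 keys each
```

With raw policy numbers every write hits the same region (a hotspot); after reversal the last serial digit leads, so the keys spread evenly across all ten regions.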

In this embodiment, instead of reversing the policy number, hash processing may also be performed on the policy number to obtain a rowkey that distributes data evenly, and the hashed policy number is then used as the rowkey of the HBase data integration model table.
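The hash alternative can be sketched the same way. The specification does not name a hash function; MD5 via Python's `hashlib` is an assumption made here, as is the sample policy-number format. Because the digest can be recomputed from a policy number, point lookups by policy number still work, but keys of related policies are no longer adjacent, so range scans over policy-number prefixes are not possible with this key.

```python
import hashlib
from collections import Counter


def hashed_row_key(policy_no: str) -> str:
    """Row key = hex MD5 digest of the policy number (MD5 is an assumed choice)."""
    return hashlib.md5(policy_no.encode("utf-8")).hexdigest()


# Hypothetical sequential policy numbers: organization 8601, year 2020, serial.
policies = [f"86012020{serial:06d}" for serial in range(1, 1001)]
keys = [hashed_row_key(p) for p in policies]

# Sequential inputs give unrelated digests, so the leading hex character
# varies across 0-f; a table pre-split on that character would be fed evenly.
leading = Counter(k[0] for k in keys)
print(len(leading), min(leading.values()), max(leading.values()))
```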

Step S102, setting the column name of each column of the HBase data integration model table according to the table name of the at least one data source table.

In this step, specifically, the column name of each column in the HBase data integration model table is set to the table name of the at least one data source table plus the primary key value of the corresponding row of that data source table, so that every column in the HBase data integration model table can be distinguished. For example, in the insurance industry, the multiple charges of one policy are recorded as multiple rows of a normalized table in the source system, whereas in the HBase data integration model each of those rows is stored in its own column, named with the data source table name plus the primary key value of that row. Specifically, FIG. 2 is a schematic diagram of an HBase-based charge table data integration method in an embodiment of this specification. In the figure, the data source table is named "charge" and contains three rows of data whose primary key values are 01, 02, and 03. To store these rows in the HBase data integration model, the column names set in the HBase data integration model table are charge01, charge02, and charge03.
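Column naming is a plain concatenation of table name and primary key value. The sketch below reproduces the FIG. 2 names; the optional separator is a hypothetical addition not described in this specification, shown because bare concatenation can collide when one table name is a prefix of another.

```python
def column_name(source_table: str, primary_key: str, sep: str = "") -> str:
    """Column name = source table name + primary key value of the row.

    `sep` is a hypothetical option; the default "" gives the bare
    concatenation described in the text.
    """
    return f"{source_table}{sep}{primary_key}"


print([column_name("charge", pk) for pk in ["01", "02", "03"]])
# ['charge01', 'charge02', 'charge03']

# Bare concatenation is ambiguous: both pairs below yield "charge101".
print(column_name("charge", "101") == column_name("charge1", "01"))  # True
# A separator keeps them apart: "charge|101" vs "charge1|01".
print(column_name("charge", "101", "|") == column_name("charge1", "01", "|"))  # False
```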

Step S103, splicing the fields of each row of data of the at least one data source table into a character string and storing the character string in the corresponding field of the HBase data integration model table, where the row key of the corresponding field is the row key determined from the policy number associated with the character string, and the column name of the corresponding field is the column name set from the table name of the data source table containing the character string.

In this step, the character string is in JSON format, which represents data as key-value pairs. JSON is a widely used format that preserves every field of the data source table, and many high-level languages provide libraries for handling it.

For example, if data source table A has two fields a and b, and one row of table A has the values v_a and v_b, the row is spliced into the key-value JSON string {a: v_a, b: v_b}, that is, a set of "column name: column value" pairs, which is stored in the corresponding field of the HBase data integration model table. Storing whole rows as JSON strings has two advantages. First, if every field of the data source table were mapped one-to-one to its own HBase column (that is, if all fields were stored separately), the number of columns would grow dozens of times faster, causing performance problems. Second, a JSON string preserves the row-column relationship of the data source table (several column values belonging to the same row), which is difficult to maintain when the fields are stored separately. Specifically, referring to FIG. 2, the data source table has three rows of data. To integrate it into the HBase data integration model table, each row is spliced into a JSON string, namely {primary key: 01, policy number: 001, type: a, amount: 100}, {primary key: 02, policy number: 001, type: b, amount: 100}, and {primary key: 03, policy number: 001, type: c, amount: 100}, and each string is stored in the corresponding column of the HBase data integration model table. The three rows of the data source table thus become three columns of data.
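The splicing step for the FIG. 2 data can be sketched with Python's standard `json` module. The English field names (`primary_key`, `policy_no`, `type`, `amount`) and the choice to keep the key and policy number as strings are assumptions for illustration; the specification only requires each row to become one key-value JSON string.

```python
import json
from typing import Any, Dict, List


def row_to_json(row: Dict[str, Any]) -> str:
    """Splice one source row into a compact JSON string of column: value pairs."""
    return json.dumps(row, ensure_ascii=False, separators=(",", ":"))


# The three rows of the "charge" source table from FIG. 2.
charge_rows: List[Dict[str, Any]] = [
    {"primary_key": "01", "policy_no": "001", "type": "a", "amount": 100},
    {"primary_key": "02", "policy_no": "001", "type": "b", "amount": 100},
    {"primary_key": "03", "policy_no": "001", "type": "c", "amount": 100},
]

# Three source rows become three cells (columns) of one wide-table row.
cells = {f"charge{r['primary_key']}": row_to_json(r) for r in charge_rows}
for column, value in cells.items():
    print(column, value)
# charge01 {"primary_key":"01","policy_no":"001","type":"a","amount":100}
```

Parsing a cell with `json.loads` returns the original row with all of its fields, which is how the row-column relationship survives storage in a single cell.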

FIG. 3 is a schematic diagram of an HBase-based insurance industry data integration method according to an embodiment of the present disclosure. The insurance industry has data source tables for many subjects, for example the policy table, applicant table, claim settlement table, security table, and charge table. All of them are organized around the policy number, which is stored in each of these tables. Each row record of the data source tables is spliced into a JSON string, and the JSON strings are stored in the fields of the data integration model table in the open-source database HBase, so that all data related to a given policy number is placed in a single row of the HBase data integration model table.
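Putting the three steps together, integrating several subject tables into one row per policy can be sketched as below. The wide table is modeled as a plain dictionary with insert-or-update semantics rather than a live HBase table, all columns are assumed to sit in a single column family (the specification does not name one), and the table names, field names, and `integrate` helper are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical in-memory wide table: row key -> {column name: JSON string}.
WideTable = Dict[str, Dict[str, str]]


def integrate(wide: WideTable, table_name: str, pk_field: str,
              rows: List[Dict[str, Any]]) -> None:
    """Load every row of one source table into the wide table."""
    for row in rows:
        row_key = str(row["policy_no"])[::-1]          # step 1: reversed policy number
        column = f"{table_name}{row[pk_field]}"         # step 2: table name + primary key
        value = json.dumps(row, separators=(",", ":"))  # step 3: whole row as JSON
        wide.setdefault(row_key, {})[column] = value    # insert-or-update one cell


wide: WideTable = {}
integrate(wide, "policy", "policy_no", [{"policy_no": "P001", "status": "active"}])
integrate(wide, "charge", "charge_id", [
    {"charge_id": "01", "policy_no": "P001", "amount": 100},
    {"charge_id": "02", "policy_no": "P001", "amount": 100},
])
integrate(wide, "claim", "claim_id", [{"claim_id": "C9", "policy_no": "P001", "paid": 500}])
print(sorted(wide["100P"]))  # ['charge01', 'charge02', 'claimC9', 'policyP001']

# An updated source row rewrites only its own cell; the rest of the row is untouched.
integrate(wide, "charge", "charge_id", [{"charge_id": "02", "policy_no": "P001", "amount": 120}])
print(wide["100P"]["charge02"])  # {"charge_id":"02","policy_no":"P001","amount":120}
```

All four source tables end up in the single row keyed by the reversed policy number "100P", and a change to one source row touches one cell without any join or row reassembly.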

After the source data tables are integrated according to the HBase data integration model, a policy detail table is formed: all data of each policy is stored in one wide table, aggregated by source table and by row.
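Reading a policy back from the wide table reverses the process: each column is assigned to its source table and its JSON value is parsed, giving the data aggregated by table and by row. The sketch assumes the reader knows the list of source table names and assigns each column to the longest table name that prefixes it; `group_by_table` is a hypothetical helper, not something defined in this specification, and longest-prefix matching is a heuristic that cannot resolve every ambiguity of bare name concatenation.

```python
import json
from typing import Any, Dict, List


def group_by_table(row: Dict[str, str],
                   table_names: List[str]) -> Dict[str, List[Dict[str, Any]]]:
    """Split one wide-table row back into per-source-table record lists."""
    grouped: Dict[str, List[Dict[str, Any]]] = {}
    for column in sorted(row):
        matches = [t for t in table_names if column.startswith(t)]
        if not matches:
            continue  # column from an unknown table; skipped in this sketch
        table = max(matches, key=len)  # longest matching table-name prefix
        grouped.setdefault(table, []).append(json.loads(row[column]))
    return grouped


wide_row = {
    "charge01": '{"amount":100}',
    "charge02": '{"amount":120}',
    "policyP001": '{"status":"active"}',
}
grouped = group_by_table(wide_row, ["policy", "charge", "claim"])
print(grouped)
# {'charge': [{'amount': 100}, {'amount': 120}], 'policy': [{'status': 'active'}]}
```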

As can be seen, in the HBase-based insurance industry data integration method provided in the embodiments of the present specification, using the reversed or hashed policy number as the row key (rowkey) of the HBase data integration model table distributes data evenly. By combining the characteristics of insurance industry data with the technical advantages of HBase, the method addresses difficult data updates, high performance consumption during integration, and the high cost of adding fields to a data integration scheme. Because the reversed or hashed policy number serves as the row key and all data related to a policy number is stored in one row, the integrated data is well suited as the integrated detail layer of a data warehouse: it is easy to update, retains data details, and avoids the loss of detail data.

It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Based on the same inventive concept, corresponding to any embodiment method, one or more embodiments of the present specification further provide an insurance industry data integration device based on HBase. Referring to fig. 4, the HBase-based insurance industry data integration apparatus includes:

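a determining module 401, configured to determine a row key of a data integration model table in the open-source database HBase according to a policy number in at least one data source table to be integrated;

a setting module 402, configured to set the column name of each column of the HBase data integration model table according to the table name of the at least one data source table; and

a splicing and storing module 403, configured to splice the fields of each row of data of the at least one data source table into a character string and to store the character string in the corresponding field of the HBase data integration model table, where the row key of the corresponding field is the row key determined from the policy number associated with the character string, and the column name of the corresponding field is the column name set from the table name of the data source table containing the character string.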

As an optional embodiment, the determining module 401 is specifically configured to perform reverse-order processing on the policy number and to use the reversed policy number as the row key of the HBase data integration model table.

As an optional embodiment, reverse-order processing of the policy number specifically comprises writing the character string of the policy number into the HBase data integration model table in reverse order, so that the serial number in the policy number comes first.

As an optional embodiment, the determining module 401 is specifically configured to perform hash processing on the policy number; and taking the policy number after the hash processing as a row key of the HBase data integration model table.

As an alternative embodiment, the setting module 402 is specifically configured to set the column name of each column in the HBase data integration model table as the table name of the at least one data source table plus the primary key value of each row of the data source table.

As an optional embodiment, splicing the fields of each row of data of the at least one data source table into a character string specifically comprises splicing each row of data of the data source table into a JSON string.

As an alternative embodiment, the HBase data integration model table is a wide table.

For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.

The apparatus in the foregoing embodiment is used to implement the corresponding method for integrating insurance industry data based on HBase in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.

Based on the same inventive concept, corresponding to any of the above embodiments, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the HBase-based insurance industry data integration method according to any of the above embodiments is implemented.

FIG. 5 is a schematic diagram of a more specific hardware structure of an electronic device according to this embodiment. The electronic device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050, where the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to one another within the device via the bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.

The memory 1020 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.

The input/output interface 1030 is used to connect an input/output module for inputting and outputting information. The input/output module may be built into the device as a component (not shown) or may be external to the device to provide the corresponding function. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like; output devices may include a display, a speaker, a vibrator, an indicator light, and the like.

The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.

The electronic device in the above embodiment is used to implement the corresponding HBase-based insurance industry data integration method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.

Based on the same inventive concept, corresponding to any of the above-described embodiment methods, one or more embodiments of the present specification further provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the HBase-based insurance industry data integration method according to any of the above-described embodiments.

Computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The computer instructions stored in the storage medium of the above embodiment are used to enable the computer to execute the HBase-based insurance industry data integration method according to any of the above embodiments, and have the beneficial effects of corresponding method embodiments, which are not described herein again.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.

While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.

It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. An insurance industry data integration method based on HBase comprises the following steps:

2. The method according to claim 1, wherein the determining the row key of the HBase data integration model table according to the policy number in the at least one data source table to be integrated comprises:

performing reverse-order processing on the policy number; and

taking the reversed policy number as the row key of the HBase data integration model table.

3. The method of claim 2, wherein the reverse-order processing of the policy number comprises: writing the character string corresponding to the policy number into the HBase data integration model table in reverse order so that the serial number in the policy number comes first.

4. The method according to claim 1, wherein the determining the row key of the HBase data integration model table according to the policy number in the at least one data source table to be integrated comprises:

performing hash processing on the policy number; and

taking the hashed policy number as the row key of the HBase data integration model table.

5. The method according to claim 1, wherein setting the column name of each column of the HBase data integration model table according to the table name of the at least one data source table comprises: setting the column name of each column in the HBase data integration model table to the table name of the at least one data source table plus the primary key value of the corresponding row of the data source table.

6. The method of claim 1, wherein splicing the fields of each row of data of the at least one data source table into a character string comprises: splicing each row of data of the data source table into a JSON string.

7. The method according to claim 1, wherein the HBase data integration model table is a wide table.

8. An insurance industry data integration device based on HBase, includes:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.

10. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing the computer to perform the method of any one of claims 1 to 7.