CN114490738A - External data query method and device based on Flink and computer equipment - Google Patents

Publication number
CN114490738A
Authority
CN
China
Prior art keywords
interface
flink
parallelism
lookuptable
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210015771.2A
Other languages
Chinese (zh)
Inventor
王鲁宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Daishu Technology Co ltd
Original Assignee
Hangzhou Daishu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Daishu Technology Co ltd filed Critical Hangzhou Daishu Technology Co ltd
Priority to CN202210015771.2A priority Critical patent/CN114490738A/en
Publication of CN114490738A publication Critical patent/CN114490738A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Stored Programmes (AREA)

Abstract

The invention provides a Flink-based external data query method, a Flink-based external data query device and computer equipment. The method comprises the following steps: extending a Java interface of the Flink system and adding a parallelism setting option to obtain a first interface and a second interface with the parallelism setting option; modifying the SQL compiler built into the Flink system; modifying the code by which the Flink system connects to an external database; writing a Flink SQL statement for querying the external database, and judging whether the user has set the parallelism of the LookupTable through the first interface or the second interface; and if the user has set the parallelism of the LookupTable, running the Flink SQL statement with the parallelism of the LookupTable. The user can therefore set the parallelism of the LookupTable according to the actual situation and thereby control the number of LookupTable threads. This avoids both overloading the external data source with too many threads and degrading real-time query performance with too few threads, and improves the usability of the system.

Description

External data query method and device based on Flink and computer equipment
Technical Field
The invention relates to the technical field of computers, and in particular to a Flink-based external data query method, a Flink-based external data query device and computer equipment.
Background
With the development of enterprise digital transformation, enterprises place ever higher real-time requirements on data processing, especially the real-time processing of massive data. Among current open-source projects for real-time processing of massive data, Flink offers the best functionality and performance. After massive data enters the Flink system, results are computed at the millisecond level. The system can be used in fields such as real-time recommendation, real-time monitoring and real-time information analysis. During real-time computation, data tables of external data sources such as MySQL and PostgreSQL need to be queried. Tables of external data sources are abstracted as a Lookup Table in the Flink system; Flink provides the LookupTableSource, which can be used to implement a dimension table.
In the related art, only the overall parallelism of Flink SQL can be specified; the parallelism of the LookupTable cannot be set separately. Generally, one unit of parallelism corresponds to one thread. This leads to resource or performance bottlenecks during queries: either too many threads overload the external data source, or too few threads degrade real-time computing performance.
Disclosure of Invention
The present invention is directed to solving at least one of the technical problems in the related art to some extent. Therefore, a first object of the present invention is to provide a Flink-based external data query method that enables a user to set the parallelism of the LookupTable according to the actual situation and thereby control the number of LookupTable threads. This avoids both overloading the external data source with too many threads and degrading real-time query performance with too few threads, and improves system usability.
A second object of the invention is to provide a Flink-based external data query device.
A third object of the invention is to provide a computer device.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides an external data query method based on Flink, including:
extending a Java interface of the Flink system, and adding a parallelism setting option to obtain a first interface and a second interface with the parallelism setting option;
modifying the SQL compiler built into the Flink system so that the SQL compiler recognizes the parallelism setting option;
modifying the code by which the Flink system connects to an external database, so that the Flink system connects to the external database using the first interface and the second interface;
writing a Flink SQL statement for querying the external database, and judging whether the user has set the parallelism of the LookupTable through the first interface or the second interface;
if the user has set the parallelism of the LookupTable, running the Flink SQL statement with the parallelism of the LookupTable;
and if the user has not set the parallelism of the LookupTable, acquiring the global parallelism and running the SQL statement with the global parallelism.
In addition, the external data query method based on Flink proposed according to the above embodiment of the present invention may also have the following additional technical features:
according to an embodiment of the present invention, extending the Java interface of the Flink system and adding the parallelism setting option to obtain the first interface and the second interface with the parallelism setting option includes:
calling the TableFunctionProvider interface, the AsyncTableFunctionProvider interface and the ParallelismProvider interface of the Flink system;
inheriting the TableFunctionProvider interface and the ParallelismProvider interface using standard Java syntax to obtain the first interface, and adding a parallelism setting option to the first interface;
inheriting the AsyncTableFunctionProvider interface and the ParallelismProvider interface using standard Java syntax to obtain the second interface, and adding a parallelism setting option to the second interface;
placing the first interface and the second interface in the org.apache.flink.table.connector.source package of the flink-table-common module of the Flink system;
compiling the flink-table-common module, and replacing the Jar package whose name begins with flink-table-common in the Flink lib directory with the compiled package.
According to one embodiment of the invention, the first interface is named ParallelTableFunctionProvider and the second interface is named ParallelAsyncTableFunctionProvider.
According to an embodiment of the present invention, modifying the SQL compiler built in the Flink system to make the SQL compiler recognize the parallelism setting option includes:
modifying the source code class CommonLookupJoin of the SQL compiler built into the Flink system so that the SQL compiler recognizes the parallelism setting option;
obtaining the modified class, and placing the modified class in the org.apache.flink.table.planner.plan.nodes.common package of the flink-table-planner-blink module;
compiling the flink-table-planner-blink module, and replacing the Jar package whose name begins with flink-table-planner-blink in the Flink lib directory with the compiled package.
According to an embodiment of the present invention, the method for querying external data based on Flink further includes:
adding judgment logic to the source code class CommonLookupJoin;
judging, based on the judgment logic, whether the ParallelismProvider interface is implemented, and reading and using the parallelism of the LookupTable if the ParallelismProvider interface is implemented;
if the ParallelismProvider interface is not implemented, executing the original code.
According to an embodiment of the present invention, modifying the code of the Flink system connecting to the external database to make the Flink system connect to the external database using the first interface and the second interface includes:
modifying the source code of any flink-connectors module so that the flink-connectors source code uses the first interface and the second interface;
adding a parallelism preset option to the flink-connectors source code, and passing the value of the parallelism preset option to the parallelism setting option of the first interface or the second interface.
According to an embodiment of the present invention, the parallelism of the LookupTable is set by the user by setting the value of the parallelism setting option.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides a Flink-based external data query device, including:
an extension module, used for extending a Java interface of the Flink system and adding a parallelism setting option to obtain a first interface and a second interface with the parallelism setting option;
a first modification module, used for modifying the SQL compiler built into the Flink system so that the SQL compiler recognizes the parallelism setting option;
a second modification module, used for modifying the code by which the Flink system connects to an external database, so that the Flink system connects to the external database using the first interface and the second interface;
a writing module, used for writing a Flink SQL statement for querying the external database and judging whether the user has set the parallelism of the LookupTable through the first interface or the second interface;
a first running module, used for running the Flink SQL statement with the parallelism of the LookupTable if the user has set the parallelism of the LookupTable;
and a second running module, used for acquiring the global parallelism and running the SQL statement with the global parallelism if the user has not set the parallelism of the LookupTable.
In order to achieve the above object, an embodiment of a third aspect of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for querying external data based on Flink according to the embodiment of the first aspect of the present invention is implemented.
According to the technical solution of the invention, a Java interface of the Flink system is extended, a parallelism setting option is added, and both the SQL compiler built into the Flink system and the code by which the Flink system connects to an external database are modified. The user can thus define the parallelism of the LookupTable through the new extended interfaces and adjust it according to specific requirements, so that when the external database is queried, the query statement runs with the LookupTable parallelism set by the user. The user can therefore set the parallelism of the LookupTable according to the actual situation and thereby control the number of LookupTable threads. This avoids both overloading the external data source with too many threads and degrading real-time query performance with too few threads, and improves the usability of the system.
Drawings
Fig. 1 is a flowchart of an external data query method based on Flink according to an embodiment of the present invention.
Fig. 2 is a logic diagram of querying external data according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an expanded interface structure according to an embodiment of the present invention.
Fig. 4 is a block diagram of an external data query device based on Flink according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an external data query method based on Flink according to an embodiment of the present invention.
As shown in fig. 1, the method comprises the steps of:
S1, extending the Java interface of the Flink system, and adding a parallelism setting option to obtain a first interface and a second interface with the parallelism setting option.
The parallelism setting option is an option for customizing the parallelism of the Lookup Table. Parallelism is a concept defined in the Flink system; in essence, each unit of parallelism corresponds to one thread of the computer's operating system. For example, when the parallelism is set to 5, five operating-system threads are started to run the Lookup Table code.
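The thread model described above can be illustrated with a small stand-alone sketch. It is not Flink code: it only assumes that each unit of parallelism maps to one worker thread, and the names `ParallelLookupDemo` and `runLookupSubtasks` are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: one worker thread per unit of parallelism. */
class ParallelLookupDemo {

    /** Starts `parallelism` threads and returns the names of the threads that ran the lookup code. */
    static Set<String> runLookupSubtasks(int parallelism) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        // The latch keeps every subtask alive until all of them have started,
        // so each subtask occupies its own thread.
        CountDownLatch allStarted = new CountDownLatch(parallelism);
        for (int i = 0; i < parallelism; i++) {
            pool.submit(() -> {
                // Placeholder for the Lookup Table code run by this subtask.
                threadNames.add(Thread.currentThread().getName());
                allStarted.countDown();
                try {
                    allStarted.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threadNames;
    }
}
```

Calling `ParallelLookupDemo.runLookupSubtasks(5)` returns five distinct thread names, one per subtask.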
Specifically, the original Java interfaces of the Flink system, such as the TableFunctionProvider interface and the AsyncTableFunctionProvider interface, may be extended. The original interfaces have no parallelism setting option; the option is added by the extension, which yields a first interface and a second interface that each have the parallelism setting option, that is, that provide the parallelism setting function.
S2, modifying the SQL compiler built in the Flink system so that the SQL compiler identifies the parallelism setting option.
Specifically, the SQL compiler built into the Flink system may be modified according to the Flink Blink planner compiler standard. The compiler contains the translateToPlanInternal method of the CommonLookupJoin class, which by itself does not recognize the parallelism option; in step S2 the code of this method is extended so that it recognizes the parallelism setting options of the first interface and the second interface.
S3, modifying the code of the Flink system for connecting the external database so that the Flink system uses the first interface and the second interface for connecting the external database.
Specifically, the code by which Flink connects to the external database may be modified according to the Flink Table API Connector development standard, so that Flink connects to the external database through the first interface and the second interface.
S4, writing a Flink SQL statement for querying the external database, and judging whether the user has set the parallelism of the LookupTable through the first interface or the second interface.
S5, if the user has set the parallelism of the LookupTable, running the Flink SQL statement with the parallelism of the LookupTable.
S6, if the user has not set the parallelism of the LookupTable, acquiring the global parallelism and running the SQL statement with the global parallelism.
The global parallelism may be the overall parallelism of Flink SQL and is defined by the user.
Specifically, the user may customize the parallelism of the LookupTable through the first interface or the second interface. After the SQL statement for querying the external database has been written, as shown in fig. 2, it is determined whether the user has set the parallelism of the LookupTable. If so, the set value, that is, the parallelism of the LookupTable, is used to run the SQL statement; otherwise, the SQL statement is run with the user-defined global parallelism (the global value). Once the statement has run, the data of the external database can be queried.
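The decision in steps S4 to S6 amounts to a simple fallback. The following is a minimal sketch of that fallback, not code from the Flink source; the class `LookupParallelismResolver` and its `resolve` method are hypothetical, and treating non-positive values as "not set" is an assumption.

```java
import java.util.Optional;

/** Hypothetical helper mirroring the decision in steps S4 to S6. */
class LookupParallelismResolver {

    /**
     * Returns the parallelism for the lookup operator: the user-defined
     * LookupTable value when present, otherwise the global parallelism.
     */
    static int resolve(Optional<Integer> lookupParallelism, int globalParallelism) {
        return lookupParallelism
                .filter(p -> p > 0)        // assumption: non-positive values count as "not set"
                .orElse(globalParallelism);
    }
}
```

For instance, with a global parallelism of 5, `resolve(Optional.of(3), 5)` yields 3, while `resolve(Optional.empty(), 5)` falls back to 5.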
In the related art, the external database is queried using only the global parallelism. The embodiment of the invention keeps the global parallelism and, on that basis, extends the Java interface of the Flink system, adds a parallelism setting option, and modifies both the SQL compiler built into the Flink system and the code by which the Flink system connects to the external database. The user can thus define the parallelism of the LookupTable through the new extended interfaces and adjust it according to specific requirements, so that when the external database is queried, the query statement runs with the LookupTable parallelism set by the user. Compared with the related art, the embodiment of the invention can therefore prevent resource or performance bottlenecks during queries: because the user can set the parallelism of the LookupTable according to specific conditions, it avoids both overloading the external data source with too many threads and degrading real-time query and computing performance with too few threads.
Therefore, according to the external data query method based on the Flink, the user can set the parallelism of the LookupTable according to the actual situation, so that the number of threads of the LookupTable can be controlled, the phenomenon that an external data source cannot bear due to too many threads and the phenomenon that the real-time query performance is reduced due to too few threads can be avoided, and the usability of the system is improved.
In an embodiment of the present invention, the step S1 may include the following steps S11 to S15.
S11, calling the TableFunctionProvider interface, the AsyncTableFunctionProvider interface and the ParallelismProvider interface of the Flink system.
The TableFunctionProvider interface, the AsyncTableFunctionProvider interface and the ParallelismProvider interface are Java interfaces that already exist in the Flink source code. The TableFunctionProvider interface and the AsyncTableFunctionProvider interface are used to create a table in Flink, and the ParallelismProvider interface is used to set parallelism.
S12, inheriting the TableFunctionProvider interface and the ParallelismProvider interface using standard Java syntax to obtain a first interface, and adding a parallelism setting option to the first interface.
The first interface is named ParallelTableFunctionProvider.
Specifically, in the related Flink technology only two interfaces, the TableFunctionProvider interface and the AsyncTableFunctionProvider interface, are provided, and neither can set the parallelism. As shown in fig. 3, in the embodiment of the present invention the original TableFunctionProvider interface and ParallelismProvider interface of Flink are inherited using standard Java syntax and a parallelism function is added; the resulting sub-interface (i.e., the first interface) may be named ParallelTableFunctionProvider.
The ParallelTableFunctionProvider interface combines the TableFunctionProvider interface and the ParallelismProvider interface, so that it can both create a table and set the parallelism.
S13, inheriting the AsyncTableFunctionProvider interface and the ParallelismProvider interface using standard Java syntax to obtain a second interface, and adding a parallelism setting option to the second interface.
The second interface is named ParallelAsyncTableFunctionProvider.
Specifically, referring to fig. 3, the original AsyncTableFunctionProvider interface and ParallelismProvider interface of Flink may be inherited using standard Java syntax and a parallelism function added; the resulting sub-interface (i.e., the second interface) may be named ParallelAsyncTableFunctionProvider.
The ParallelAsyncTableFunctionProvider interface combines the AsyncTableFunctionProvider interface and the ParallelismProvider interface, so that it can both create a table and set the parallelism.
It should be noted that the names of the first interface and the second interface are not limited to those provided in the embodiments of the present invention, and may also be defined as other names that do not affect the operation of the system.
S14, placing the first interface and the second interface in the org.apache.flink.table.connector.source package of the flink-table-common module of Flink.
The flink-table-common module is a Maven module of the Flink source code that contains all common code related to the Flink Table module. The source package is a package of the Flink source code, and its contents are the official standard interfaces through which the Flink system queries external system data.
S15, compiling the flink-table-common module, and replacing the Jar package whose name begins with flink-table-common in the Flink lib directory with the compiled package.
Specifically, the new first interface ParallelTableFunctionProvider and the new second interface ParallelAsyncTableFunctionProvider are placed in the package of the flink-table-common module named in step S14. The flink-table-common module is then compiled to obtain a compiled package. Because the system loads compiled files only from the Flink lib directory, the Jar package whose name begins with flink-table-common in the Flink lib directory is replaced with the compiled package. A Java program is compiled into a Jar package, and the modified code of the embodiment of the invention takes effect only after the original Jar package in the Flink lib directory has been replaced.
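The replacement step can be sketched as a small file operation. This is only an illustration under stated assumptions: the helper `LibJarReplacer` is hypothetical and not part of Flink, the compiled Jar is assumed to live outside the lib directory, and the actual Jar file names depend on the Flink build.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical deployment helper; not part of Flink. */
class LibJarReplacer {

    /**
     * Deletes every Jar in libDir whose file name starts with modulePrefix,
     * then copies the freshly compiled Jar into libDir and returns its new path.
     */
    static Path replace(Path libDir, String modulePrefix, Path compiledJar) {
        try {
            try (DirectoryStream<Path> oldJars =
                    Files.newDirectoryStream(libDir, modulePrefix + "*.jar")) {
                for (Path oldJar : oldJars) {
                    Files.delete(oldJar);
                }
            }
            Path target = libDir.resolve(compiledJar.getFileName());
            return Files.copy(compiledJar, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same helper would apply to any other recompiled module by passing that module's name prefix.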
Through the above steps S11 to S15, the ParallelTableFunctionProvider interface and the ParallelAsyncTableFunctionProvider interface are set up; the resulting interface structure is shown in fig. 3.
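A rough sketch of the two extended interfaces follows. So that it compiles without Flink on the classpath, the three base interfaces are declared here as simplified stand-ins (the real ones live in the flink-table-common module and have richer signatures); the `of` factory methods are an assumption about how a parallelism value might be attached, not the patent's actual code.

```java
import java.util.Optional;

// Simplified stand-ins for the existing Flink interfaces (assumed shapes).
interface ParallelismProvider {
    default Optional<Integer> getParallelism() { return Optional.empty(); }
}
interface TableFunctionProvider<T> { T createTableFunction(); }
interface AsyncTableFunctionProvider<T> { T createAsyncTableFunction(); }

/** First interface: a synchronous lookup provider that also carries a parallelism. */
interface ParallelTableFunctionProvider<T>
        extends TableFunctionProvider<T>, ParallelismProvider {

    /** Hypothetical factory; a null parallelism means "not set". */
    static <T> ParallelTableFunctionProvider<T> of(T function, Integer parallelism) {
        return new ParallelTableFunctionProvider<T>() {
            @Override public T createTableFunction() { return function; }
            @Override public Optional<Integer> getParallelism() {
                return Optional.ofNullable(parallelism);
            }
        };
    }
}

/** Second interface: the asynchronous counterpart. */
interface ParallelAsyncTableFunctionProvider<T>
        extends AsyncTableFunctionProvider<T>, ParallelismProvider {

    /** Hypothetical factory; a null parallelism means "not set". */
    static <T> ParallelAsyncTableFunctionProvider<T> of(T function, Integer parallelism) {
        return new ParallelAsyncTableFunctionProvider<T>() {
            @Override public T createAsyncTableFunction() { return function; }
            @Override public Optional<Integer> getParallelism() {
                return Optional.ofNullable(parallelism);
            }
        };
    }
}
```

Because both sub-interfaces extend ParallelismProvider, a single `instanceof ParallelismProvider` check is enough for downstream code to detect either of them.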
In an embodiment of the present invention, the step S2 may include the following steps S21 to S23.
S21, modifying the source code class CommonLookupJoin of the SQL compiler built into the Flink system so that the SQL compiler recognizes the parallelism setting option.
CommonLookupJoin is part of the Flink Blink planner compiler, and the code in this class is executed whenever a LookupTable is used.
S22, obtaining the modified class, and placing the modified class in the org.apache.flink.table.planner.plan.nodes.common package of the flink-table-planner-blink module.
The flink-table-planner-blink module is a Maven module of the Flink source code. All compiler code of the Flink Blink planner is in this module, and the package above contains part of that compiler code.
S23, compiling the flink-table-planner-blink module, and replacing the Jar package whose name begins with flink-table-planner-blink in the Flink lib directory with the compiled package.
Further, the Flink-based external data query method may further include: adding judgment logic to the source code class CommonLookupJoin; judging, based on the judgment logic, whether the ParallelismProvider interface is implemented, and reading and using the parallelism of the LookupTable if it is; and executing the original code if it is not. A Java program is compiled into a Jar package, and the modified code takes effect only after the original Jar package in the Flink lib directory has been replaced.
Specifically, the Flink SQL compiler source code is modified so that it recognizes the new ParallelTableFunctionProvider interface and ParallelAsyncTableFunctionProvider interface. The specific source code class is CommonLookupJoin. Judgment logic is added to CommonLookupJoin: if the ParallelismProvider interface is implemented, the parallelism is read and used; if not, the original code is executed. The modified class is then placed in the org.apache.flink.table.planner.plan.nodes.common package of the flink-table-planner-blink module, and the flink-table-planner-blink module is compiled to obtain a compiled package. Because the system loads compiled files only from the Flink lib directory, the Jar package whose name begins with flink-table-planner-blink in the Flink lib directory is replaced with the compiled package.
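The judgment logic can be sketched as a single type check. The sketch below is written in Java for consistency with the rest of this description and is not the planner's actual code; `ParallelismProvider` is a simplified stand-in, and `LookupJoinParallelism.choose` is a hypothetical name.

```java
import java.util.Optional;

/** Stand-in for Flink's ParallelismProvider (assumed shape: an optional parallelism). */
interface ParallelismProvider {
    Optional<Integer> getParallelism();
}

/** Hypothetical sketch of the check added to the lookup join translation. */
class LookupJoinParallelism {

    /**
     * If the runtime provider implements ParallelismProvider and carries a value,
     * that value is used; otherwise the parallelism of the original code path is kept.
     */
    static int choose(Object runtimeProvider, int originalParallelism) {
        if (runtimeProvider instanceof ParallelismProvider) {
            return ((ParallelismProvider) runtimeProvider)
                    .getParallelism()
                    .orElse(originalParallelism);
        }
        return originalParallelism; // provider without parallelism support: original behavior
    }
}
```

Providers that do not implement the interface pass through unchanged, which keeps existing connectors working without modification.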
In an embodiment of the present invention, the step S3 may include the following steps S31 and S32.
S31, modifying the source code of any flink-connectors module so that the flink-connectors source code uses the first interface and the second interface.
The modules through which Flink connects to external data systems (for querying or writing) are collectively referred to as "flink-connectors". For example, the module that connects to a file system is called "flink-connector-file".
S32, adding a parallelism preset option to the source code of any flink-connectors module, and passing the value of the parallelism preset option to the parallelism setting option of the first interface or the second interface.
The parallelism preset option is an option added to the flink-connectors source code for setting the parallelism. It can be given a name in the embodiment of the invention; for example, it can be named lookup-parallelism.
Further, the parallelism of the LookupTable in the above steps S4 to S6 is set by the user by setting the value of the parallelism setting option.
Specifically, the source code of any flink-connectors module may be obtained and modified to use the new ParallelTableFunctionProvider interface or ParallelAsyncTableFunctionProvider interface. The preceding steps S1 and S2 complete the modification of the Flink kernel; the purpose of step S3 is to let the Flink connector plug-ins use the new Flink kernel code.
A lookup-parallelism option may be added to the source code of any flink-connectors module, and the value of this option is passed to the parallelism setting option of the ParallelTableFunctionProvider interface or the ParallelAsyncTableFunctionProvider interface. The effect of this step is that the user can use all of the code modified by the embodiments of the present invention simply by configuring the lookup-parallelism option.
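On the connector side, reading such an option could look roughly like the sketch below. It works on a plain map of table options rather than Flink's configuration classes; the helper `LookupOptions` is hypothetical, and rejecting non-positive values is an assumption.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical connector-side helper; the option key follows the description above. */
class LookupOptions {

    static final String LOOKUP_PARALLELISM = "lookup-parallelism";

    /** Reads the option from the table options; empty when the user did not set it. */
    static Optional<Integer> lookupParallelism(Map<String, String> tableOptions) {
        String raw = tableOptions.get(LOOKUP_PARALLELISM);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            // Assumption: a parallelism must be a positive integer.
            throw new IllegalArgumentException(
                    LOOKUP_PARALLELISM + " must be positive, got " + value);
        }
        return Optional.of(value);
    }
}
```

The returned value is what the connector would hand to the parallelism setting option of the extended provider interface.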
After the above steps S1 to S3 have been performed, steps S4 to S6 are performed, that is: a Flink SQL statement for querying external data is written, and the parallelism of the LookupTable can be freely controlled by setting the lookup-parallelism option in the DDL.
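To make the DDL usage concrete, the sketch below assembles an example table definition and lookup join as strings. All table names, column names and connection values are made up for illustration, and `lookup-parallelism` is the option introduced by this embodiment, not an option of stock Flink connectors.

```java
/** Hypothetical example statements; table, column and connection values are made up. */
class LookupJoinSql {

    /** DDL for the dimension table, carrying the lookup parallelism as a table option. */
    static String dimensionTableDdl(int lookupParallelism) {
        return String.join("\n",
                "CREATE TABLE user_info (",
                "  id BIGINT,",
                "  name STRING",
                ") WITH (",
                "  'connector' = 'jdbc',",
                "  'url' = 'jdbc:mysql://localhost:3306/demo',",
                "  'table-name' = 'user_info',",
                "  'lookup-parallelism' = '" + lookupParallelism + "'",
                ")");
    }

    /** Lookup join of a stream table against the dimension table. */
    static String lookupJoinQuery() {
        return String.join("\n",
                "SELECT e.user_id, u.name",
                "FROM events AS e",
                "JOIN user_info FOR SYSTEM_TIME AS OF e.proc_time AS u",
                "ON e.user_id = u.id");
    }
}
```

In a modified Flink build, both strings would be submitted as ordinary SQL statements; omitting the option from the DDL would leave the lookup running at the global parallelism.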
It should be noted that the steps S11 to S15, S21 to S23, and S31 and S32 may be implemented based on the Flink 1.12 version source code.
Taking as an example a scenario in which a large number of users send data to a real-time computing engine and user identity information is looked up, the invention is implemented through the following steps:
Step 1: write the logic for reading the message queue, looking up the database and writing to the database, that is, the SQL statements.
Step 2: set the global parallelism to 5.
Step 3: because the part that looks up the external database uses the Flink source code modified according to the embodiment of the invention, its parallelism can be changed to 3, which reduces the access pressure on the database.
Step 4: submit the computing task to the Flink computing cluster, which executes the computing logic.
In summary, the Flink-based external data query method in the embodiment of the present invention extends and modifies the Apache Flink source code so that the user can define the parallelism of the Lookup Table module, which improves the usability of the whole system.
Fig. 4 is a block diagram of an external data query device based on Flink according to an embodiment of the present invention.
As shown in fig. 4, the external data query device 100 based on Flink includes: an extension module 10, a first modification module 20, a second modification module 30, a authoring module 40, a first execution module 50, and a second execution module 60.
The expansion module 10 is configured to expand a Java interface of the Flink system and add a parallelism setting option to obtain a first interface and a second interface with the parallelism setting option; the first modification module 20 is used for modifying the built-in SQL compiler of the Flink system so that the SQL compiler can identify the parallelism setting option; a second modification module 30, configured to modify a code of the Flink system connecting to the external database, so that the Flink system connects to the external database using the first interface and the second interface; the compiling module 40 is used for compiling a Flink SQL statement for querying an external database and judging whether the user sets the parallelism of the LookupTable through the first interface or the second interface; the first running module 50 is configured to run the Flink SQL statement by using the parallelism of the LookupTable if the parallelism of the LookupTable is set by the user; and a second running module 60, configured to obtain the global parallelism if the user does not set the parallelism of the LookupTable, and run the SQL statement using the global parallelism.
In one embodiment, the expansion module 10 may be specifically configured to: call the TableFunctionProvider interface, the AsyncTableFunctionProvider interface, and the ParallelismProvider interface of the Flink system; inherit the TableFunctionProvider interface and the ParallelismProvider interface using standard Java syntax to obtain the first interface, and add a parallelism setting option to the first interface; inherit the AsyncTableFunctionProvider interface and the ParallelismProvider interface using standard Java syntax to obtain the second interface, and add a parallelism setting option to the second interface; place the first interface and the second interface in the org.apache.flink.table.connector.source package of the flink-table-common module of the Flink system; and compile the flink-table-common module and use the compiled package to replace the program Jar package whose name begins with flink-table-common in the Flink lib directory.
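The interface extension described above can be sketched as follows. This is a minimal illustration, not the patented source code: the three base interfaces are simplified local stand-ins for the Flink interfaces of the same names so that the example compiles without Flink on the classpath, and the static factory method of(...) is an assumption. The names ParallelTableFunctionProvider and ParallelAsyncTableFunctionProvider are the ones given in claim 3.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Simplified stand-ins for the Flink interfaces (assumption: the real ones
// carry more methods and different type parameters).
interface ParallelismProvider {
    default Optional<Integer> getParallelism() { return Optional.empty(); }
}
interface TableFunctionProvider<F> { F createTableFunction(); }
interface AsyncTableFunctionProvider<F> { F createAsyncTableFunction(); }

// First interface: synchronous lookup provider plus a parallelism option.
interface ParallelTableFunctionProvider<F>
        extends TableFunctionProvider<F>, ParallelismProvider {
    // Hypothetical factory; a null parallelism means "not set by the user".
    static <F> ParallelTableFunctionProvider<F> of(Supplier<F> factory, Integer parallelism) {
        return new ParallelTableFunctionProvider<F>() {
            @Override public F createTableFunction() { return factory.get(); }
            @Override public Optional<Integer> getParallelism() {
                return Optional.ofNullable(parallelism);
            }
        };
    }
}

// Second interface: asynchronous lookup provider plus a parallelism option.
interface ParallelAsyncTableFunctionProvider<F>
        extends AsyncTableFunctionProvider<F>, ParallelismProvider {
    static <F> ParallelAsyncTableFunctionProvider<F> of(Supplier<F> factory, Integer parallelism) {
        return new ParallelAsyncTableFunctionProvider<F>() {
            @Override public F createAsyncTableFunction() { return factory.get(); }
            @Override public Optional<Integer> getParallelism() {
                return Optional.ofNullable(parallelism);
            }
        };
    }
}

final class InterfaceSketch {
    public static void main(String[] args) {
        ParallelTableFunctionProvider<String> sync =
            ParallelTableFunctionProvider.of(() -> "lookup-fn", 3);
        System.out.println(sync.getParallelism().isPresent()); // parallelism was set
    }
}
```

Because both new interfaces inherit ParallelismProvider, a planner only has to test for that one type to find out whether a lookup parallelism is available.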
In one embodiment, the first modification module 20 may be specifically configured to: modify the source code class CommonLookupJoin of the SQL compiler built into the Flink system so that the SQL compiler can recognize the parallelism setting option; obtain the modified class and place it in the org.apache.flink.table.planner.plan.nodes.common package of the flink-table-planner-blink module; and compile the flink-table-planner-blink module and use the compiled package to replace the program Jar package whose name begins with flink-table-planner-blink in the Flink lib directory.
Further, the Flink-based external data query device 100 may further include: an adding module, configured to add judgment logic to the source code class CommonLookupJoin; a judging module, configured to determine, based on the judgment logic, whether the ParallelismProvider interface is implemented, and to read and use the parallelism of the ParallelismProvider interface if it is implemented; and an execution module, configured to execute the original code if the ParallelismProvider interface is not implemented.
In one embodiment, the second modification module 30 may be specifically configured to: modify the source code of any of the flink-connectors so that it uses the first interface or the second interface; and add a parallelism preset option to the source code of that connector and pass the value of the parallelism preset option to the parallelism setting option of the first interface or the second interface.
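The judgment logic can be sketched as below. This is an illustration under assumptions, not the actual CommonLookupJoin code: ParallelismProvider is a local stand-in for the Flink interface, the class LookupParallelismResolver and its method resolve(...) are hypothetical, and the "original code" path is represented simply by returning the global parallelism.

```java
import java.util.Optional;

// Local stand-in for Flink's ParallelismProvider (simplified).
interface ParallelismProvider {
    default Optional<Integer> getParallelism() { return Optional.empty(); }
}

// Hypothetical helper showing the decision the modified planner would make.
final class LookupParallelismResolver {

    static int resolve(Object runtimeProvider, int globalParallelism) {
        // Judgment: does the lookup runtime provider implement ParallelismProvider?
        if (runtimeProvider instanceof ParallelismProvider) {
            Optional<Integer> configured =
                ((ParallelismProvider) runtimeProvider).getParallelism();
            if (configured.isPresent()) {
                int parallelism = configured.get();
                if (parallelism <= 0) {
                    throw new IllegalArgumentException(
                        "Lookup parallelism must be positive: " + parallelism);
                }
                // The user set a LookupTable parallelism: use it.
                return parallelism;
            }
        }
        // Interface not implemented, or no value set: original behaviour,
        // the lookup operator runs with the global parallelism.
        return globalParallelism;
    }

    public static void main(String[] args) {
        ParallelismProvider three = new ParallelismProvider() {
            @Override public Optional<Integer> getParallelism() { return Optional.of(3); }
        };
        System.out.println(resolve(three, 8));
        System.out.println(resolve(new Object(), 8));
    }
}
```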
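On the connector side, the parallelism preset option has to be read from the table options before it can be handed to the first or second interface. A minimal sketch of that step follows, assuming the options arrive as a plain string map and that the option key is 'lookup.parallelism'; the key, the class LookupOptions, and the method readLookupParallelism(...) are all hypothetical and do not appear in the source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper a modified connector could use to read the preset option.
final class LookupOptions {

    // HYPOTHETICAL option key; the source does not name the actual key.
    static final String LOOKUP_PARALLELISM = "lookup.parallelism";

    // Returns the configured lookup parallelism, or empty if the user set none.
    static Optional<Integer> readLookupParallelism(Map<String, String> tableOptions) {
        String raw = tableOptions.get(LOOKUP_PARALLELISM);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        int value = Integer.parseInt(raw.trim()); // NumberFormatException on bad input
        if (value <= 0) {
            throw new IllegalArgumentException(
                LOOKUP_PARALLELISM + " must be a positive integer, got " + value);
        }
        return Optional.of(value);
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("connector", "jdbc");
        options.put(LOOKUP_PARALLELISM, "3");
        System.out.println(readLookupParallelism(options).isPresent());
    }
}
```

The connector would then pass the returned value to the parallelism setting option of the provider it creates; an empty result leaves the decision to the global parallelism.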
It should be noted that, in the embodiment of the present invention, other specific embodiments of the external data query device based on Flink may refer to the specific embodiments of the external data query method based on Flink, and in order to avoid redundancy, no further description is given here.
The device allows the user to set the parallelism of the LookupTable according to actual conditions and thereby control the number of LookupTable threads. This avoids both overloading the external data source with too many threads and degrading real-time query performance with too few threads, which improves the usability of the system.
The invention further provides a computer device corresponding to the embodiment.
The computer device of the embodiment of the invention comprises a memory, a processor, and a computer program that is stored in the memory and can run on the processor. When the processor executes the computer program, the Flink-based external data query method according to the above embodiment of the invention is implemented.
According to the computer device of the embodiment of the invention, when the processor executes the computer program, the Java interface of the Flink system is expanded, a parallelism setting option is added, and the SQL compiler built into the Flink system and the code of the Flink system that connects to the external database are modified, so that the user can define the parallelism of the LookupTable through the new expanded interfaces. The user can therefore set the parallelism of the LookupTable according to actual conditions and control the number of LookupTable threads, which avoids both overloading the external data source with too many threads and degrading real-time query performance with too few threads, and improves the usability of the system.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The meaning of "plurality" is two or more unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided that they do not contradict one another.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and when executed, the program performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A Flink-based external data query method, characterized by comprising the following steps:
expanding a Java interface of the Flink system, and adding a parallelism setting option to obtain a first interface and a second interface with the parallelism setting option;
modifying an SQL compiler built into the Flink system to enable the SQL compiler to recognize the parallelism setting option;
modifying the code of the Flink system that connects to an external database, so that the Flink system connects to the external database using the first interface and the second interface;
writing a Flink SQL statement for querying the external database, and determining whether the user has set the parallelism of the LookupTable through the first interface or the second interface;
if the user has set the parallelism of the LookupTable, running the Flink SQL statement using the parallelism of the LookupTable;
and if the user has not set the parallelism of the LookupTable, obtaining the global parallelism and running the Flink SQL statement using the global parallelism.
2. The Flink-based external data query method according to claim 1, wherein expanding the Java interface of the Flink system and adding the parallelism setting option to obtain the first interface and the second interface with the parallelism setting option comprises:
calling a TableFunctionProvider interface, an AsyncTableFunctionProvider interface and a ParallelismProvider interface of the Flink system;
inheriting the TableFunctionProvider interface and the ParallelismProvider interface using standard Java syntax to obtain the first interface, and adding a parallelism setting option to the first interface;
inheriting the AsyncTableFunctionProvider interface and the ParallelismProvider interface using standard Java syntax to obtain the second interface, and adding a parallelism setting option to the second interface;
placing the first interface and the second interface in the org.apache.flink.table.connector.source package of the flink-table-common module of the Flink system;
compiling the flink-table-common module, and using the compiled package to replace the program Jar package whose name begins with flink-table-common in the Flink lib directory.
3. The Flink-based external data query method according to claim 2, wherein the first interface is named ParallelTableFunctionProvider and the second interface is named ParallelAsyncTableFunctionProvider.
4. The Flink-based external data query method according to claim 1, wherein modifying the SQL compiler built into the Flink system to enable the SQL compiler to recognize the parallelism setting option comprises:
modifying the source code class CommonLookupJoin of the SQL compiler built into the Flink system so that the SQL compiler recognizes the parallelism setting option;
obtaining the modified class, and placing the modified class in the org.apache.flink.table.planner.plan.nodes.common package of the flink-table-planner-blink module;
compiling the flink-table-planner-blink module, and using the compiled package to replace the program Jar package whose name begins with flink-table-planner-blink in the Flink lib directory.
5. The Flink-based external data query method according to claim 4, further comprising:
adding judgment logic to the source code class CommonLookupJoin;
determining, based on the judgment logic, whether the ParallelismProvider interface is implemented, and reading and using the parallelism of the LookupTable if the ParallelismProvider interface is implemented;
executing the original code if the ParallelismProvider interface is not implemented.
6. The Flink-based external data query method according to claim 1, wherein modifying the code of the Flink system that connects to the external database, so that the Flink system connects to the external database using the first interface and the second interface, comprises:
modifying the source code of any of the flink-connectors so that the source code of the flink-connectors uses the first interface and the second interface;
adding a parallelism preset option to the flink-connectors source code, and passing the value of the parallelism preset option to the parallelism setting option of the first interface or the second interface.
7. The Flink-based external data query method according to claim 6, wherein the user sets the parallelism of the LookupTable by setting the value of the parallelism setting option.
8. A Flink-based external data query device, comprising:
an expansion module, configured to expand a Java interface of the Flink system and add a parallelism setting option to obtain a first interface and a second interface with the parallelism setting option;
a first modification module, configured to modify an SQL compiler built into the Flink system so that the SQL compiler recognizes the parallelism setting option;
a second modification module, configured to modify the code of the Flink system that connects to an external database, so that the Flink system connects to the external database using the first interface and the second interface;
an authoring module, configured to write a Flink SQL statement for querying the external database and to determine whether the user has set the parallelism of the LookupTable through the first interface or the second interface;
a first running module, configured to run the Flink SQL statement using the parallelism of the LookupTable if the user has set the parallelism of the LookupTable;
and a second running module, configured to obtain the global parallelism and run the Flink SQL statement using the global parallelism if the user has not set the parallelism of the LookupTable.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the Flink-based external data query method according to any one of claims 1-7 when executing the computer program.
CN202210015771.2A 2022-01-07 2022-01-07 External data query method and device based on Flink and computer equipment Pending CN114490738A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210015771.2A CN114490738A (en) 2022-01-07 2022-01-07 External data query method and device based on Flink and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210015771.2A CN114490738A (en) 2022-01-07 2022-01-07 External data query method and device based on Flink and computer equipment

Publications (1)

Publication Number Publication Date
CN114490738A true CN114490738A (en) 2022-05-13

Family

ID=81509550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210015771.2A Pending CN114490738A (en) 2022-01-07 2022-01-07 External data query method and device based on Flink and computer equipment

Country Status (1)

Country Link
CN (1) CN114490738A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115760368A (en) * 2022-11-24 2023-03-07 中电金信软件有限公司 Credit business approval method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129713A1 (en) * 2016-11-10 2018-05-10 Sap Se Efficient execution of data stream processing systems on multi-core processors
US20190236194A1 (en) * 2018-01-31 2019-08-01 Splunk Inc. Dynamic query processor for streaming and batch queries
CN112084016A (en) * 2020-07-27 2020-12-15 北京明略软件***有限公司 Flow calculation performance optimization system and method based on flink
CN113535354A (en) * 2021-06-30 2021-10-22 深圳市云网万店电子商务有限公司 Method and device for adjusting parallelism of Flink SQL operator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination