CN102158531A - Distributed transmission method for query data stream - Google Patents

Publication number
CN102158531A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100341229A
Other languages
Chinese (zh)
Inventor
陈立军
汪罕
卢阳
王潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN2011100341229A
Publication of CN102158531A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed transmission method for query data streams, comprising the following steps: 1) at each network node, storing the data streams passing through the node, or the results of recent queries, as views; 2) when a new query is generated at a network node, using the data streams or recent query results stored at the network nodes on the transmission path as the data source of the new query, and taking the lowest-cost network node whose stored data contains the data of the new query as the query execution node; and 3) transmitting the query data stream from the query generation node to the query execution node, where the query is executed. In distributed data stream query processing, the method shares transmission among multiple data streams as far as possible, thereby reducing the volume of data transmitted in the network.

Description

A distributed query data stream transmission method
Technical field
The present invention relates to a method that effectively reduces the volume of communicated data when data streams are transmitted over a network in a distributed, real-time manner.
Background
Current distributed data stream query processing considers the best operator placement for each query in isolation in order to reduce data traffic, and ignores sharing among data streams. In practice, many data streams exist simultaneously between any two network nodes, and they overlap or contain one another. Transmitting each of these streams independently clearly increases the volume of data carried by the network. StreamGlobe, currently a typical data stream sharing method, considers only containment between queries; it cannot fully exploit the overlapping parts of data streams and cannot adjust dynamically.
Summary of the invention
To overcome the excessive redundant traffic caused by transmitting distributed data streams independently in existing networks, the present invention proposes a data stream sharing method that fully discovers the containment and overlap relationships among distributed data streams, thereby reducing the traffic transmitted in the network.
The principle of the shared data stream transmission method is as follows. A query is placed on the network node nearest to the data source, so that the data stream is processed into a query result as early as possible; this reduces the amount of information transmitted over the network and hence the network transmission delay. At the same time, the processing and storage capacity of the nodes near the data source is limited, and placing all queries there would degrade the performance of those nodes. In addition, each node keeps the results of recent queries, stored as materialized views. When a new query arrives, the nodes in the network generally hold some views, formed either from the data streams flowing through the node or from the results of queries processed on it. A containment relationship may exist between the newly arrived query and an existing view. In that case the view can serve as the data source of the new query, and the original data source need not be used. This not only relieves the load on the nodes near the original data source but also reduces the amount of information transmitted and the network delay. When several queries arrive at the same time, the part common to their results is identified, and the transmission paths of the shared part and the independent parts are considered separately, reducing the total transmission cost and delay. For queries that have already been submitted, a dynamic adjustment method is provided that, following the dynamic adjustment principle, optimizes the placement of queries in the network so as to reduce the amount of information transmitted and the network delay.
A distributed query data stream transmission method, comprising the steps of:
1) each network node stores the data streams flowing through it or the results of recent queries;
2) when a new query is generated at a network node, the data streams or recent query results stored by the network nodes on the transmission path are used as the data source of the new query, and the lowest-cost network node whose stored data contains the data of the new query is taken as the query execution node;
3) the query data stream is transmitted from the query generation node to the query execution node, where the query is executed.
The data streams flowing through a node or the results of recent queries are stored as views.
In step 2), containment of the new query is determined as follows: the network nodes on the transmission path of the query are put into a queue; a network node is selected, and the semantic segments in that node's send view cache are merged; it is then determined whether the merged semantic segments contain the query.
In step 2), a plan-generation function is used to compute the delay between each network node serving as a data source and the node that generated the new query, and the lowest-cost network node is taken as the query execution node.
In step 3), a dynamic adjustment method is used to adjust the queries on the transmission path between the query execution node and the query generation node. The specific steps are:
1) the nodes between the query execution node and the query generation node are saved in an array;
2) for each node in the array, each query q in its query cache is checked; if q has not yet been adjusted and is contained by the newly arrived query, it is marked as adjusted;
3) the node k at which the result data stream of the new query and the result data stream of q diverge is found; q is placed on node k for execution and deleted from the query cache of node n;
4) the data source of q is changed to the result data stream of query Q.
For each query q on the execution node that shares a common data source with the newly arrived query Q, the queries are merged if the new benefit is greater than the previous benefit value.
If the benefit is less than the original benefit value, query decomposition is performed: the node k at which the result data streams of the newly arrived query Q and the merged query q' diverge is found first, and queries Q and q' are added on node k for execution.
Nodes are stored in a queue, and only one node is enqueued at a time.
A network node caches data as it receives it; if the cache is too small, time is used as the criterion and the data that was cached earliest is released.
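The time-based release rule can be illustrated with a minimal sketch. The class name `TimeOrderedCache`, the size units, and the choice to treat re-cached data as a fresh arrival are assumptions made for illustration; the source only states that the earliest-cached data is released when space is insufficient.

```python
from collections import OrderedDict


class TimeOrderedCache:
    """Hypothetical cache that releases the earliest-cached entries first."""

    def __init__(self, capacity):
        self.capacity = capacity      # total space, in arbitrary size units
        self.used = 0
        self.items = OrderedDict()    # key -> (payload, size), oldest entry first

    def put(self, key, payload, size):
        if size > self.capacity:
            return False              # can never fit; leave the cache untouched
        if key in self.items:         # assumption: re-caching counts as a new arrival
            self.used -= self.items.pop(key)[1]
        while self.used + size > self.capacity:
            # Release the entry that was cached earliest.
            _, (_, old_size) = self.items.popitem(last=False)
            self.used -= old_size
        self.items[key] = (payload, size)
        self.used += size
        return True
```

With a capacity of 10 units, caching three 4-unit entries in sequence releases the first one to make room for the third.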
When cached data is released, it is first determined whether views can be merged, and the cost of merging and the resulting space benefit are evaluated; if benefit - cost > the cost of creating a new view, the cached views are merged.
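The merge test can be written out as a small sketch. It assumes, purely for illustration, that a view is a closed interval `(lo, hi)` over one attribute and that the space benefit of merging equals the length of the overlap (data that would otherwise be stored twice). The function names and the caller-supplied `merge_cost` and `new_view_cost` values are hypothetical; the source gives only the inequality.

```python
def overlap(a, b):
    """Length of the overlap of two interval views (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def try_merge(a, b, merge_cost, new_view_cost):
    """Return the merged view if benefit - cost > cost of a new view, else None."""
    benefit = overlap(a, b)           # assumed space benefit: duplicated range
    if benefit > 0 and benefit - merge_cost > new_view_cost:
        return (min(a[0], b[0]), max(a[1], b[1]))
    return None
```

Under these assumptions, two views with a large overlap are merged, while views that barely overlap are left as they are.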
When the transmission path is determined, the parent of the current node is selected in the same way as the next node, and the new node information is sent to that next node; in the direction opposite to data transmission, the data transfer request is forwarded to the next node in turn until it reaches the root node.
To merge views, the domains of the selection operators are recorded, and a view is determined by the union of the selection operators. When merging, if the data volume is small, Dijkstra's algorithm is used directly to find the shortest path to the root node and fetch the data there; if the data volume is large and the benefit falls short, no merge is performed.
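For reference, the shortest-path search named above is the standard Dijkstra algorithm. The sketch below is a generic textbook version; the adjacency-dictionary graph format and the function name `shortest_path` are assumptions, since the source does not specify how link costs are represented.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over graph = {node: {neighbor: non-negative weight}}.

    Returns (path, distance), or (None, inf) if goal is unreachable.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue                  # stale heap entry
        for w, weight in graph.get(u, {}).items():
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = u
                heapq.heappush(heap, (nd, w))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:          # walk predecessors back to the start
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]
```

A node would call this with itself as `start` and the root node as `goal` to decide where to fetch missing data.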
After a query has been executed, the cache on each node must be updated; when a new query then arrives, the algorithm runs on the updated cache contents.
The beneficial effect of the invention is that, in distributed data stream query processing, sharing transmission among multiple data streams as far as possible reduces the volume of data transmitted in the network.
Description of the drawings
Fig. 1 is the tree network structure diagram of the shared data stream transmission method of the invention.
Fig. 2 is the structure diagram of a single network node in the shared data stream transmission method of the invention.
Fig. 3 is the flow chart of the shared data stream transmission method based on query containment.
Fig. 4 shows the transmission cost optimization effect of the shared data stream transmission method of the invention for multiple queries.
Fig. 5 is the flow chart of the dynamic adjustment method.
Fig. 6 is the flow chart of the data stream sharing method based on query merging.
Wherein: 1 - newly arrived query Q; 2 - data stream; 3 - root node; 4 - query registration node; 5 - query execution node; 6 - receive view cache (RVC); 7 - query cache (QC); 8 - remaining space; 9 - send view cache (SVC); 10 - network node; 11 - metadata; 12 - cached data.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments:
Fig. 1 shows the tree structure used for distributed continuous data queries; during a distributed query the network nodes are organized as shown. If a new query 1 is generated at node 4, node 4 is the registration node of the newly arrived query; to simplify data transmission, the query is placed on network node 5, the node nearest to the data source, for execution.
Fig. 2 shows the structure of a single network node in the network structure diagram. The root node stores the original data, and the remaining nodes are child nodes. Network node 10 comprises metadata 11 and cached data 12; cached data 12 comprises remaining space 8 and send view cache (SVC) 9; and send view cache (SVC) 9 comprises receive view cache (RVC) 6, query cache (QC) 7 and remaining space 8.
Fig. 3 shows the query containment procedure of the data stream sharing method of the invention. Q is a newly arrived query for which a processing node must be arranged; v is the registration node of query Q (each query has its own registration node, and a newly arrived query is located on its registration node), and the result of Q must be returned to node v. The output is the node p on which query Q is executed.
1. First, node v is pushed onto the queue Queue of nodes to be processed; cost is initialized to infinity; and flag, which indicates whether a placement node for query Q has been found, is initialized to false.
2. While the queue is not empty, the method executes the following loop:
1) Following the first-in-first-out principle, the node n that entered the queue earliest is dequeued.
2) The semantic segments q in the SVC of node n are merged.
3) It is determined whether the merged q contains Q.
4) If the merged q contains Q, the delay from node n to node v is computed with the predefined transmission cost function and saved in cost1.
5) If cost1 is less than cost, placing the query on node n is better than the previous plan; the placement node and the delay cost are stored in p and cost, which hold the final result, and flag is set to true. When the parent of node n is not the root node, the parent of node n is pushed onto Queue.
3. If no usable data source for Q has been found after the loop exits, n is taken as the execution node of Q; at this point n is a child of the root node, and Q is executed using the data stream of the root node as its data source.
4. The semantic segments of each node from node n to node v are updated along the transmission path (the cached data is updated as the query executes).
5. Finally, node p is returned.
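The placement procedure above can be sketched as follows. Several modeling choices are assumptions rather than part of the source: a semantic segment is a closed interval `(lo, hi)` over a single attribute, the transmission cost function is a hop count, the parent is enqueued whether or not the current node contains Q (the source text is ambiguous on this point), and p is set to n in the fallback case. The names `Node`, `merge_segments`, `contains`, `hops` and `place_query` are hypothetical, and step 4 (refreshing the cached segments) is omitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    parent: Optional["Node"] = None
    svc: list = field(default_factory=list)   # cached semantic segments (lo, hi)


def merge_segments(segments):
    """Union overlapping or touching intervals into maximal intervals."""
    merged = []
    for lo, hi in sorted(segments):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def contains(segments, query):
    """True if one merged segment covers the whole query range."""
    q_lo, q_hi = query
    return any(lo <= q_lo and q_hi <= hi for lo, hi in merge_segments(segments))


def hops(n, v):
    """Assumed cost function: number of hops from v up to its ancestor n."""
    d, cur = 0, v
    while cur is not n:
        cur, d = cur.parent, d + 1
    return d


def place_query(query, v, delay=hops):
    queue, cost, p, n = deque([v]), float("inf"), None, v
    while queue:
        n = queue.popleft()                    # first in, first out
        if contains(n.svc, query):
            cost1 = delay(n, v)
            if cost1 < cost:                   # better than the plan so far
                p, cost = n, cost1
        if n.parent is not None and n.parent.parent is not None:
            queue.append(n.parent)             # parent is not the root
    if p is None:
        p = n      # fallback: child of the root, fed by the root's data stream
    return p
```

Because the queue only ever holds the next ancestor, the search walks from the registration node toward the root and keeps the cheapest node whose merged cache covers the query.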
Fig. 4 shows the transmission cost optimization effect of the shared data stream transmission method of the invention for multiple queries. Before optimization, the queries are transmitted along separate paths on the two sides; after optimization, they are carried over a shared intermediate path and then forwarded to the two sides.
Fig. 5 illustrates the dynamic adjustment method, which dynamically adjusts the queries on the transmission path:
Q is the newly arrived query, for which an execution node must be arranged; v is the registration node of query Q. The output of the method, node p, is the node on which query Q is placed.
1. First, the data stream sharing method based on query containment is used to find the placement node p of query Q.
2. Next, the queries on the nodes along the path from node p to node v are dynamically adjusted. The nodes from node p to node v are saved in the array array.
3. For each node in array, each query q in its query cache is checked. If q has not been adjusted and is contained by query Q, its position is adjusted. It is first marked as adjusted, so that it is not adjusted again later.
4. The diverge function is used to find the node k at which the result data stream of Q and the result data stream of q diverge. Query q is placed on node k for execution and deleted from the query cache of node n.
5. The data source of q is changed to the result data stream of query Q. Because the execution location of q has changed, queries that use the result of q as their data source may be affected. However, because query Q contains query q, the result data stream of Q can also serve as the data source for those affected queries; the data source of each such query is therefore changed to the result data stream of Q.
6. Finally, the execution node p of Q is returned.
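A minimal sketch of the adjustment loop follows. It assumes that each query records its result path as a list of node names from its execution node to its registration node, that a query is a closed range `(lo, hi)`, and that the divergence node is the last node of the common prefix of two result paths. Only the name `diverge` comes from the text; `Query`, `adjust` and the dictionary-based query cache are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Query:
    name: str
    rng: tuple            # assumed selection range (lo, hi)
    result_path: list     # node names, execution node -> registration node
    source: str = "root"
    adjusted: bool = False


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def diverge(path_a, path_b):
    """Last node shared by two result paths that start at the same node."""
    k = None
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        k = a
    return k


def adjust(new_q, query_cache):
    """query_cache maps a node name to the list of queries executed there."""
    for i, n in enumerate(new_q.result_path):         # nodes from p to v
        for q in list(query_cache.get(n, [])):
            if q.adjusted or q is new_q or not contains(new_q.rng, q.rng):
                continue
            q.adjusted = True                          # never adjust q twice
            k = diverge(new_q.result_path[i:], q.result_path)
            query_cache[n].remove(q)                   # delete q from node n
            query_cache.setdefault(k, []).append(q)    # execute q on node k
            q.result_path = q.result_path[q.result_path.index(k):]
            q.source = new_q.name                      # q now reads Q's results
    return new_q.result_path[0]                        # execution node p of Q
```

Moving q to the divergence node means its results are no longer sent alongside Q's results on the shared part of the path, which is where the saving comes from in this model.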
Fig. 6 shows the query merging procedure of the data stream sharing method of the invention. v is the registration node of query Q, and the result of Q must be returned to node v. The output of the method is the node p on which query Q is executed.
1. First, the data stream sharing method based on query containment is used to find the placement node of query Q, which is saved in p.
2. Next, it is determined whether node p holds a query that can be merged. The benefit c is initialized to 0 and the merge candidate q' to empty. For each query q executed on p, it is determined whether Q and q satisfy the merge condition, namely that they share a common data source; the benefit is then computed with the benefit formula (original cost minus cost after merging).
3. If the new benefit is greater than the current value, the new benefit is saved in c and the query to be merged is saved in q'. If q' is not empty, node p holds a query that can be merged.
4. The mergeQ function merges query Q and query q' into a new query Q'. Q and q' are deleted, and the new query Q' is added to the query cache of node p.
5. Next, query decomposition is handled. The node k at which the result data streams of Q and q' diverge is found first, and queries Q and q' are added on that node for execution. Finally, the execution node p of query Q is returned.
Note that when the query q' to be merged is itself the result of an earlier merge, query decomposition must be decided according to the nodes through which each query participating in the merge flows.
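Steps 2 to 4 of the merging procedure can be sketched as below. The cost model is an assumption: the cost of a query is taken to be the width of its selection range, so the benefit (original cost minus merged cost) equals the overlap of the two ranges. `merge_q` stands in for the mergeQ function named in the text; `Query`, `cost`, `benefit` and `merge_on_node` are hypothetical names, and the decomposition of step 5 is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    name: str
    source: str    # data source identifier
    lo: float
    hi: float


def cost(q):
    """Assumed cost: width of the selected range (a proxy for data volume)."""
    return q.hi - q.lo


def merge_q(a, b):
    """Merged query covering both ranges (plays the role of mergeQ)."""
    return Query(f"{a.name}+{b.name}", a.source, min(a.lo, b.lo), max(a.hi, b.hi))


def benefit(a, b):
    """Original cost minus the cost after merging."""
    return cost(a) + cost(b) - cost(merge_q(a, b))


def merge_on_node(new_q, cache):
    """Return the query cache of node p after trying to merge new_q into it."""
    best, c = None, 0                      # q' starts empty, benefit c at 0
    for q in cache:
        if q.source != new_q.source:       # merge condition: common data source
            continue
        gain = benefit(new_q, q)
        if gain > c:
            best, c = q, gain
    if best is None:
        return cache + [new_q]             # nothing worth merging
    # Delete Q and q', then add the merged query Q'.
    return [q for q in cache if q is not best] + [merge_q(new_q, best)]
```

Under this model a query is merged with the cached query whose range it overlaps most, and disjoint ranges are never merged because their benefit is negative.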

Claims (10)

1. A distributed query data stream transmission method, comprising the steps of:
1) each network node stores the data streams flowing through it or the results of recent queries;
2) when a new query is generated at a network node, the data streams or recent query results stored by the network nodes on the transmission path are used as the data source of the new query, and the lowest-cost network node whose stored data contains the data of the new query is taken as the query execution node;
3) the query data stream is transmitted from the query generation node to the query execution node, where the query is executed.
2. The method of claim 1, characterized in that the data streams flowing through a node or the results of recent queries are stored as views.
3. The method of claim 1, characterized in that in step 2) containment of the new query is determined as follows: the network nodes on the transmission path of the query are put into a queue; a network node is selected, and the semantic segments in that node's send query cache are merged; it is then determined whether the merged semantic segments contain the query.
4. The method of claim 1, characterized in that in step 2) a plan-generation function is used to compute the delay between each network node serving as a data source and the node that generated the new query, and the lowest-cost network node is taken as the network node that executes the query.
5. The method of claim 1, characterized in that in step 3) a dynamic adjustment method is used to adjust the query data streams on the transmission path between the query execution node and the query generation node, the specific steps being:
1) the nodes between the query execution node and the node that generated query Q are saved in an array;
2) for each node in the array, each query q in its query cache is checked; if q has not yet been adjusted and is contained by the newly arrived query, it is marked as adjusted;
3) the node k at which the newly arrived query data stream and the result data stream of a node on the transmission path diverge is found; the query is placed on node k for execution, and q is deleted from the query cache of node n;
4) the data source of q is changed to the result data stream of query Q.
6. The method of claim 1, characterized in that it further comprises a query merging step:
for each query q on the execution node that shares a common data source with the newly arrived query Q, the queries are merged if the new benefit is greater than the previous benefit value.
7. The method of claim 5, characterized in that, if the benefit is less than the original benefit value, query decomposition is performed: the node k at which the result data streams of the newly arrived query Q and the merged query q' diverge is found first, and queries Q and q' are added on node k for execution.
8. The method of claim 3, characterized in that nodes are stored in a queue and only one node is enqueued at a time.
9. The method of claim 1, characterized in that a network node caches data as it receives it; if the cache is too small, time is used as the criterion and the data that was cached earliest is released.
10. The method of claim 9, characterized in that, when cached data is released, if the cached data can be merged and benefit - cost > the cost of creating new cached data, the cached data is merged.
CN2011100341229A 2010-02-01 2011-01-31 Distributed transmission method for query data stream Pending CN102158531A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2011100341229A (CN102158531A) | 2010-02-01 | 2011-01-31 | Distributed transmission method for query data stream

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201010103453 2010-02-01
CN201010103453.9 2010-02-01
CN2011100341229A CN102158531A (en) 2010-02-01 2011-01-31 Distributed transmission method for query data stream

Publications (1)

Publication Number | Publication Date
CN102158531A (en) | 2011-08-17

Family

ID=44439708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100341229A Pending CN102158531A (en) 2010-02-01 2011-01-31 Distributed transmission method for query data stream

Country Status (1)

Country Link
CN (1) CN102158531A (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ABERER, K. ET AL.: "《Infrastructure for Data Processing in Large-Scale Interconnected Sensor Networks》", 《2007 INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT》 *
王潇等: "《网内查询处理中一种基于数据流共享的过滤查询算法》", 《计算机研究与发展》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546247B (en) * 2011-12-29 2014-08-27 华中科技大学 Massive data continuous analysis system suitable for stream processing
CN102546247A (en) * 2011-12-29 2012-07-04 华中科技大学 Massive data continuous analysis system suitable for stream processing
CN102737134A (en) * 2012-06-29 2012-10-17 电子科技大学 Query processing method being suitable for large-scale real-time data stream
CN103207897A (en) * 2013-03-15 2013-07-17 北京京东世纪贸易有限公司 Distributed storage query system, operation method thereof and operation device
CN103207897B (en) * 2013-03-15 2016-08-17 北京京东世纪贸易有限公司 A kind of distributed storage inquiry system and operation method thereof and running gear
CN103177130A (en) * 2013-04-25 2013-06-26 苏州大学 Continuous query method and continuous query system for K-Skyband on distributed data stream
CN103177130B (en) * 2013-04-25 2016-03-23 苏州大学 K-Skyband continuous-query method and system on a kind of distributed traffic
CN103399943A (en) * 2013-08-14 2013-11-20 曙光信息产业(北京)有限公司 Communication method and communication device for parallel query of clustered databases
CN103916478B (en) * 2014-04-11 2017-06-06 华为技术有限公司 The method and apparatus that streaming based on distributed system builds data side
CN103916478A (en) * 2014-04-11 2014-07-09 华为技术有限公司 Streaming data cube establishing method and device based on distributed system
US10019505B2 (en) 2014-04-11 2018-07-10 Huawei Technologies Co., Ltd. Method and apparatus for creating data cube in streaming manner based on distributed system
CN105812202A (en) * 2014-12-31 2016-07-27 阿里巴巴集团控股有限公司 Log real time monitoring and early warning method and device employing same
CN106021284A (en) * 2016-04-29 2016-10-12 乐视控股(北京)有限公司 Data query method, data monitoring method and device
CN106897446A (en) * 2017-03-02 2017-06-27 中国农业银行股份有限公司 A kind of data flow method for visualizing and device
CN108881415A (en) * 2018-05-31 2018-11-23 广州亿程交通信息集团有限公司 Distributed big data analysis system in real time
CN108881415B (en) * 2018-05-31 2020-11-17 广州亿程交通信息集团有限公司 Distributed real-time big data analysis system
CN110689418A (en) * 2019-09-27 2020-01-14 支付宝(杭州)信息技术有限公司 Bill generation method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110817