CN105630874A - Array model-based database system - Google Patents

Array model-based database system

Info

Publication number
CN105630874A
Authority
CN
China
Prior art keywords
task
data
database node
database
subsystem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510952434.6A
Other languages
Chinese (zh)
Inventor
李晖
陈梅
邱能俊
李宏源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Youlian Borui Technology Co Ltd
Guizhou University
Original Assignee
Guizhou Youlian Borui Technology Co Ltd
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Youlian Borui Technology Co Ltd and Guizhou University
Priority to CN201510952434.6A
Publication of CN105630874A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Abstract

The invention discloses an array model-based database system. The system comprises a data processing subsystem (1), which distributes tasks to a data storage subsystem (2); a system monitoring subsystem (3) collects monitoring data from the data storage subsystem (2) and feeds it back to the data processing subsystem (1), and adjusts the tasks distributed by the data processing subsystem (1) through a distribution method. The data processing subsystem (1) performs data processing, analysis and conversion; the data storage subsystem (2) supports seamless storage of data into an array database; and the system monitoring subsystem (3) monitors all database nodes in real time and produces a system monitoring log. By providing the system monitoring subsystem and adopting a principled distribution method, the system meets the analysis and storage demands of scientific data and supports system tuning and optimized analysis without manual intervention.

Description

An Array Model-Based Database System
Technical field
The present invention relates to a database system, and in particular to a database system based on an array model.
Background
Computer simulations and scientific instruments produce massive volumes of scientific data every year, and most of these data exist in the form of arrays. Traditional relational databases cannot adequately meet the storage and analysis demands of data-intensive scientific fields. Traditional relational databases such as SQL Server are now widely used in many fields; however, given the high noise, intensive computation and complex non-relational schemas that characterize scientific data, they do not support the storage and analysis of large-scale scientific data well.
Database systems based on the relational model (such as SQL Server, MySQL and Oracle) are now widely used in the commercial field. However, the high noise and complex computational analysis characteristic of scientific data make such data ill-suited to traditional relational databases. In scientific research, many scientists want to spend more time on the scientific data and the analysis results rather than on learning data analysis and processing tools, and current commercial data management systems cannot meet the storage and analysis demands of scientific data.
SciDB is an open-source scientific database system for scientific data management and analysis. Its development was led mainly by Stonebraker, with sponsorship from Paradigm4. It was designed to address scientific problems such as large data volumes and data lineage in research. Unlike a traditional DBMS, SciDB supports large-scale complex analysis for scientific applications in order to meet their growing demands. It adopts an array data model and therefore supports the analysis of multidimensional scientific data well. The main features of SciDB are as follows. First, SciDB stores different versions of data without overwriting, using a time dimension to distinguish the historical versions of an array; it also uses compression algorithms to save space. Second, it introduces in-situ data: SciDB defines its own data format and provides adapters that serve as interfaces to common external data formats, so that users can analyze and process data directly without loading it into the SciDB engine. Third, it supports named versions: a user can apply specific changes to part of an array while keeping the rest unchanged. Fourth, it records how data was derived, to meet the requirement that derivations be repeatable. Because data collected in scientific fields is often inexact, SciDB also supports data that carries error.
The existing SciDB system cannot interface with the application interface of KVM virtualization technology and requires manual intervention; moreover, when the data volume is too large, the database cannot allocate tasks effectively, resulting in long waiting times and slow loading. The distribution method of the existing SciDB system therefore cannot meet the analysis and storage demands of scientific data, and is poorly suited to system tuning and optimized analysis.
Summary of the invention
The object of the invention is to provide a database system based on an array model. By providing a system monitoring subsystem and adopting a principled distribution method, the invention meets the analysis and storage demands of scientific data and thereby accommodates the further needs of system tuning and optimized analysis; it also adds an application interface for KVM virtualization technology, so that little manual intervention is required.
Technical solution: a database system based on an array model comprises a data processing subsystem, which assigns tasks to a data storage subsystem; a system monitoring subsystem collects monitoring data from the data storage subsystem and feeds it back to the data processing subsystem, and adjusts the tasks assigned by the data processing subsystem through a distribution method;
The data processing subsystem is used for data processing, analysis and conversion;
The data storage subsystem supports seamless storage of data into the array database engine;
The system monitoring subsystem monitors all database nodes in real time and obtains a system monitoring log.
In the aforesaid array model-based database system, the system monitoring subsystem comprises monitoring clients; each monitoring client collects monitoring data and sends it to a monitoring agent, the monitoring agent forwards the monitoring data to a monitoring server, and the monitoring server adjusts the tasks assigned by the data processing subsystem through the distribution method.
In the aforesaid array model-based database system, the data processing subsystem comprises a coordinator engine, in which a first execution engine is provided; the coordinator engine is responsible for placing the data that users load into the database system into the local storage of each database node, and for updating, in the metadata database, the statistics on the local data of each database node.
In the aforesaid array model-based database system, the data storage subsystem is provided with one or more database nodes; each database node has a corresponding monitoring client, and a second execution engine is also provided in each database node.
In the aforesaid array model-based database system, the monitoring data collected by the system monitoring subsystem comprise, over the past t seconds: CPU utilization, amount of memory, disk read/write speed, network bandwidth utilization, average task waiting time, number of tasks and database node load, where 600 ≤ t ≤ 7200.
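As an illustration of collecting monitoring data "over the past t seconds", the following is a minimal sketch of a sliding window of samples for one database node. The class and field names (Sample, MetricWindow, cpu_util) are hypothetical and do not come from the patent; only the bound 600 ≤ t ≤ 7200 is taken from the text, and only the CPU metric is shown.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One monitoring sample reported for a database node (hypothetical layout)."""
    timestamp: float  # seconds since an arbitrary epoch
    cpu_util: float   # CPU utilization as a fraction in [0, 1]


class MetricWindow:
    """Keeps only the samples that fall within the past t seconds."""

    def __init__(self, t_seconds: float) -> None:
        # The text bounds the window length: 600 <= t <= 7200.
        if not 600 <= t_seconds <= 7200:
            raise ValueError("t must satisfy 600 <= t <= 7200")
        self.t_seconds = t_seconds
        self.samples = deque()

    def add(self, sample: Sample) -> None:
        """Append a sample and drop those older than the window."""
        self.samples.append(sample)
        cutoff = sample.timestamp - self.t_seconds
        while self.samples and self.samples[0].timestamp < cutoff:
            self.samples.popleft()

    def mean_cpu(self) -> float:
        """Average CPU utilization over the retained samples."""
        if not self.samples:
            raise ValueError("no samples in window")
        return sum(s.cpu_util for s in self.samples) / len(self.samples)
```

The other metrics named in the text (memory, disk speed, bandwidth, waiting time, task count) could be windowed in the same way.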
In the aforesaid array model-based database system, when tasks are distributed, they are distributed on the basis of the monitoring data collected by the system monitoring subsystem, according to the following distribution methods:
If the CPU utilization of a database node exceeds a%, that database node is not allowed to participate in executing task T and its subtasks, where 65% ≤ a% ≤ 1;
If the amount of memory that task T and its subtasks require when running on a database node exceeds b% of that node's free memory, that database node is not allowed to participate in executing task T and its subtasks, where 60% ≤ b% ≤ 1;
If the disk read/write speed of a database node exceeds c% of the maximum speed, that database node is not allowed to participate in executing task T and its subtasks, where 60% ≤ c% ≤ 1;
If the network bandwidth utilization of a database node exceeds d% of the maximum bandwidth, that database node is not allowed to participate in executing task T and its subtasks, where 60% ≤ d% ≤ 1;
If the estimated average waiting time of the tasks currently waiting in a database node's task queue exceeds t1 seconds, that database node is not allowed to participate in executing task T and its subtasks, where 300 ≤ t1 ≤ 7200;
If the number of tasks waiting to be executed on a database node exceeds n1, that database node is not allowed to participate in executing task T and its subtasks, where 10 ≤ n1;
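The six threshold rules above can be sketched as a single eligibility check. This is an illustrative sketch only: the names NodeState, Limits and exclusion_reasons are hypothetical, percentages are represented as fractions, and the caller supplies every threshold (a, b, c, d, t1, n1) within the ranges given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeState:
    """Current monitoring values for one database node (hypothetical layout)."""
    cpu_util: float         # current CPU utilization, fraction in [0, 1]
    free_memory_mb: float   # free memory on the node
    disk_speed_frac: float  # current disk read/write speed / maximum speed
    net_util_frac: float    # bandwidth in use / maximum bandwidth
    est_avg_wait_s: float   # estimated average wait of queued tasks, seconds
    pending_tasks: int      # number of tasks waiting to be executed


@dataclass(frozen=True)
class Limits:
    """Thresholds a, b, c, d (as fractions), t1 (seconds) and n1 (count)."""
    a: float
    b: float
    c: float
    d: float
    t1: float
    n1: int


def exclusion_reasons(node: NodeState, task_memory_mb: float, lim: Limits) -> List[str]:
    """Return the rules that bar the node from task T; an empty list means eligible."""
    reasons = []
    if node.cpu_util > lim.a:
        reasons.append("cpu")
    if task_memory_mb > lim.b * node.free_memory_mb:
        reasons.append("memory")
    if node.disk_speed_frac > lim.c:
        reasons.append("disk")
    if node.net_util_frac > lim.d:
        reasons.append("network")
    if node.est_avg_wait_s > lim.t1:
        reasons.append("wait_time")
    if node.pending_tasks > lim.n1:
        reasons.append("queue_length")
    return reasons
```

Returning the list of violated rules rather than a bare boolean is a design choice of this sketch; it makes it easy to log why a node was skipped.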
In the aforesaid array model-based database system, when tasks are distributed, they are distributed on the basis of the monitoring data collected by the system monitoring subsystem, according to the following distribution methods:
If, over the past t2 seconds, the average CPU utilization of a database node has exceeded e%, and its current CPU utilization exceeds r × e%, that database node is not allowed to participate in executing task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t2 ≤ 7200, 65% ≤ e% ≤ 1;
If, over the past t3 seconds, the average memory utilization of a database node has exceeded f%, its current memory utilization exceeds r × f%, and the amount of memory that task T and its subtasks require when running on that node exceeds g% of the node's free memory, that database node is not allowed to participate in executing task T and its subtasks, where 0.6 ≤ r ≤ 1, g < f, 300 ≤ t3 ≤ 7200, 60% ≤ f% ≤ 1, 75% ≤ g% ≤ 1;
If, over the past t4 seconds, the disk read/write speed of a database node has exceeded h% of the maximum speed, and its current disk read/write speed exceeds r × h% of the maximum speed, that database node is not allowed to participate in executing task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t4 ≤ 7200, 60% ≤ h% ≤ 1;
If, over the past t5 seconds, the network bandwidth utilization of a database node has exceeded k% of the maximum bandwidth, and its current network bandwidth utilization exceeds r × k% of the maximum bandwidth, that database node is not allowed to participate in executing task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t5 ≤ 7200, 60% ≤ k% ≤ 1;
If, over the past t6 seconds, the average waiting time of the tasks waiting in a database node's task queue was T1 and the number of tasks was n2, and the estimated average waiting time of the tasks in its current task queue exceeds r × T1, that database node is not allowed to participate in executing task T and its subtasks, where 1.3 ≤ r, 300 ≤ t6 ≤ 7200, 1 ≤ n2, T1 = t6/n2;
If, over the past t7 seconds, the length of a database node's task waiting queue was N, and the number of tasks currently waiting in its task queue exceeds r × N, that database node is not allowed to participate in executing task T and its subtasks, where 1.3 ≤ r, 300 ≤ t7 ≤ 7200, 10 ≤ N.
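The history-based rules above combine an average over a past window with a scaled current reading. A minimal sketch of three of them follows; the function names and the default values shown for e and r are illustrative assumptions chosen from within the stated ranges, not values fixed by the patent. Percentages are written as fractions.

```python
def cpu_rule_excludes(avg_util: float, current_util: float,
                      e: float = 0.65, r: float = 0.6) -> bool:
    """Exclude the node if its average CPU utilization over the window exceeded e
    and its current utilization exceeds r * e (0.6 <= r <= 1, 0.65 <= e <= 1)."""
    return avg_util > e and current_util > r * e


def wait_rule_excludes(t6: float, n2: int, est_wait_s: float, r: float = 1.3) -> bool:
    """Exclude the node if the estimated average wait of its current queue exceeds
    r * T1, where T1 = t6 / n2 as defined in the text (r >= 1.3, n2 >= 1)."""
    T1 = t6 / n2
    return est_wait_s > r * T1


def queue_rule_excludes(N: int, pending: int, r: float = 1.3) -> bool:
    """Exclude the node if its current number of waiting tasks exceeds r * N,
    where N is the waiting-queue length over the past window (r >= 1.3, N >= 10)."""
    return pending > r * N
```

For example, with t6 = 600 seconds and n2 = 10 tasks, T1 is 60 seconds, so with r = 1.3 a node whose current estimated average wait is 100 seconds would be excluded, while one at 70 seconds would not. The memory, disk and bandwidth rules follow the same pattern as the CPU rule.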
In the aforesaid array model-based database system, when tasks are distributed, they are distributed on the basis of the monitoring data collected by the system monitoring subsystem, according to the following distribution methods:
For the n database nodes able to execute task T and its subtasks, denoted CN1, CN2, ..., CNn, let the average waiting times of the tasks run on CN1, CN2, ..., CNn over the past t8 seconds be TCN1, TCN2, ..., TCNn respectively. If TCNi is the smallest of TCN1, TCN2, ..., TCNn, database node CNi is assigned to participate in executing task T and its subtasks; among database nodes with the same average task waiting time, tasks are distributed giving priority to nodes with more free memory and lower CPU utilization; where 300 ≤ t8 ≤ 7200;
If no database node qualifies to participate in executing task T and its subtasks, task T is divided into more subtasks, and database nodes are then assigned to participate in executing task T and its subtasks giving priority to the node whose queued tasks had the shortest average waiting time over the past t9 seconds; among database nodes with the same average task waiting time, tasks are distributed giving priority to nodes with more free memory and lower CPU utilization, where 300 ≤ t9 ≤ 7200.
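The selection and fallback rules above can be sketched as one function. The names Candidate and pick_node are hypothetical, and the order of the tie-break (free memory first, then CPU utilization) is an assumption; the text names both criteria without ranking them. The sketch only signals that task T should be split; the splitting itself is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Summary of one database node as seen by the allocator (hypothetical layout)."""
    name: str
    avg_wait_s: float      # average task waiting time over the past window
    free_memory_mb: float
    cpu_util: float        # fraction in [0, 1]
    eligible: bool = True  # False if some exclusion rule applies


def pick_node(nodes: List[Candidate]) -> Tuple[Optional[Candidate], bool]:
    """Return (chosen node, needs_split).

    Prefers the eligible node with the shortest average wait; ties go to the
    node with more free memory, then lower CPU utilization. If no node is
    eligible, all nodes are considered and needs_split is True, signalling
    that task T should first be divided into more subtasks.
    """
    pool = [n for n in nodes if n.eligible]
    needs_split = not pool
    if needs_split:
        pool = list(nodes)
    if not pool:
        return None, needs_split
    best = min(pool, key=lambda n: (n.avg_wait_s, -n.free_memory_mb, n.cpu_util))
    return best, needs_split
```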
To meet the storage and analysis demands of large-scale scientific data, the present invention proposes an array model-based database system (FASTDB). It is developed on the basis of the array-oriented SciDB software system and is a scientific database system capable of storing and analyzing large-scale scientific data.
Compared with the prior art, the invention builds a system monitoring subsystem into the FASTDB system. This subsystem collects, in real time, monitoring data related to load conditions in the database cluster, including CPU utilization, amount of memory, disk read/write speed, network bandwidth utilization, average waiting time and number of tasks. The collected data is fed back to the data processing subsystem, which formulates task allocation methods based on it and then applies those methods to assign the tasks waiting in the system to suitable database nodes for execution. The FASTDB design thus meets the analysis and storage demands of scientific data and accommodates the further needs of system tuning and optimized analysis. The invention also adds an application interface for KVM virtualization technology, so that the database nodes of a FASTDB database cluster can be deployed conveniently on KVM virtual machine nodes without much manual intervention.
The applicant carried out a series of experiments on the FASTDB system of this application and on the SkyServer system, which is based on a traditional relational database; the experimental design and performance evaluation are as follows:
Experimental environment
SkyServer provides a public interface to the Sloan Digital Sky Survey data warehouse, giving astronomers and the general public access. The experiments use the ninth data release of the Sloan Digital Sky Survey (SDSS DR9). SDSS DR9 contains the multicolor photometric data and spectroscopic data of more than one million celestial objects, obtained by the Sloan Digital Sky Survey in observing 25% of the sky; its relational data is approximately 12 TB in size.
All cluster nodes in the experiments use dual Intel(R) Xeon(R) E5-2620 2.00 GHz processors, running CentOS 6.4 and Windows 2008. The coordinator node is configured with 40 GB of memory and 1 TB of disk space, and each worker node (database node) with 8 GB of memory and 1 TB of disk space. FASTDB uses the storage engine of SciDB 14.3, and SkyServer uses Microsoft SQL Server 2008 R2. For reproducibility of the results, both SciDB 14.3 and Microsoft SQL Server 2008 R2 use their default configurations.
Experimental design
In the commercial field, traditional data storage and management systems are mostly based on relational databases, and those based on SQL Server are among the more mature and widely used. In astronomy, the SkyServer-based data storage and management system is likewise widely recognized, and computer professionals have optimized it to some degree for certain queries and analysis statements commonly used in the field. FASTDB implements the storage and management of astronomical big data on top of SciDB, whereas SkyServer is based on SQL Server; SkyServer and the SciDB-based FASTDB were therefore chosen for the performance comparison. To assess the performance of SkyServer and FASTDB, three different types of astronomical analysis task are tested. The three types comprise the following 8 query analysis tasks in total:
Q1 finds, from the PhotoObj table, the celestial objects with run number 1336 and field number 11.
Q2 finds, from the Galaxy table, all galaxies brighter than magnitude 22 whose local extinction is greater than 0.175. A smaller magnitude means a brighter object; r is the magnitude and extinction_r is the local extinction.
Q3 finds, from the Galaxy table, the galaxies whose coordinates fall in the region satisfying -0.642788*cx + 0.766044*cy >= 0 and -0.984808*cx - 0.173648*cy < 0.
Q4 finds, from the PhotoPrimary table, cataclysmic variables or pre-cataclysmic variables, identified by a white dwarf with a close companion.
Q5 finds quasars from the Star table.
Q6 finds, from the Galaxy table, the galaxies blended with stars, and outputs information such as the magnitude of each galaxy and the objID of the star.
Q7 finds, from the PhotoPrimary table, all celestial objects that have a similarly colored neighbor within 30 arcseconds, where objects are identified by objID.
Q8 merges all galaxies and outputs the total number of galaxies.
Of the above analysis tasks, Q1 to Q5 belong to the first type, whose statements take the form "SELECT * FROM * WHERE *". Q1 finds objects by specific conditions, Q2 finds galaxies by given brightness information, and Q3, Q4 and Q5 search the whole sky for objects under given conditions. Q6 belongs to the second type, whose statements take the form "SELECT * FROM * JOIN * ON * WHERE *". The third type takes the form "SELECT * FROM * AS * JOIN * ON * AS * JOIN * ON * WHERE * AND *"; Q7 and Q8 belong to this type. In addition, we randomly extracted 5 data sets of different sizes from SDSS DR9 for testing; the specific data sizes and record counts are shown in Table 1.
Table 1
When evaluating SkyServer and FASTDB, each experiment used a cold start and a cleared system cache so that caching would not affect the results. The smallest data set contains 80,000 records and the largest 8,000,000 records.
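The coordinate condition used in Q3 above is the intersection of two half-planes in (cx, cy). The sketch below applies that condition to in-memory rows purely for illustration; the function names and the dictionary row layout are hypothetical, and this is not how either database system evaluates the query.

```python
from typing import Dict, Iterable, List


def in_q3_region(cx: float, cy: float) -> bool:
    """True if (cx, cy) satisfies both inequalities stated for Q3."""
    first = -0.642788 * cx + 0.766044 * cy >= 0
    second = -0.984808 * cx - 0.173648 * cy < 0
    return first and second


def filter_galaxies(rows: Iterable[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep the rows whose cx and cy values fall inside the Q3 region."""
    return [row for row in rows if in_q3_region(row["cx"], row["cy"])]
```

For instance, the point (cx, cy) = (0.5, 0.8) satisfies both inequalities, while (1, 0) fails the first and (-1, 0) fails the second.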
Experimental results and performance evaluation
Fig. 3 to Fig. 10 show the performance of SkyServer and FASTDB on the 8 analysis tasks over the 5 data sets. The horizontal axis is the data size and the vertical axis is the time each analysis task takes. For Q6, Q7 and Q8, FASTDB took far longer than SkyServer and its response time exceeded the specified time threshold, so the portion beyond the threshold is omitted, as in Fig. 8 to Fig. 10.
The figures show that FASTDB's response times for the five queries Q1 to Q5 are far lower than those of SkyServer. Averaged over total response time, FASTDB is 2 orders of magnitude faster than SkyServer. The figures also show that FASTDB's performance gets better and better as the data set grows. Two factors account for this performance. First, FASTDB treats the array as a first-class citizen, and the multidimensional array model it supports matches the data storage model of most scientific fields. Most scientific data today exists in array form, and keeping that storage form (the array) at analysis time greatly helps in extracting its scientific value. In fact, most scientific analysis tasks use linear operations such as matrix operations, which are fundamentally array-based. FASTDB's data model therefore supports the analysis of scientific data well. Second, FASTDB compresses data and partitions it for storage across the worker nodes, which also underpins its performance. Compression in FASTDB speeds up query analysis and reduces response time, and it also lets data travel over the network at a finer granularity, reducing network overhead.
The results for the second and third types, that is, statements Q6, Q7 and Q8, share a common feature: all three statements contain at least one join. At present, distributed scientific databases do not support join operations well, because a join must operate on data across all FASTDB worker nodes, causing frequent network packet transmission that makes the network a performance bottleneck. In many distributed databases, query analysis tasks often cannot execute effectively once the network is congested. In addition, FASTDB compresses data when storing it, which imposes an unavoidable decompression overhead on join queries and in itself degrades FASTDB's performance on join statements. For other queries this overhead matters less, because they can decompress in parallel. Furthermore, a join requires a large amount of computation on the coordinator node, which often incurs a huge memory cost as the data set grows.
To show the performance of FASTDB and SkyServer more clearly, the Q5 response times of the two systems are plotted in different units, as shown in Fig. 11 and Fig. 12.
As the figures show, the time FASTDB and SkyServer need to execute Q5 increases with the data set size. However, when the data set grows from 20 GB to 50 GB, both show an obvious performance inflection. FASTDB's performance changes less than SkyServer's over this interval because its data is distributed across all nodes and analyzed in parallel, and the volume of returned results does not reach the network bottleneck, so queries are fast; FASTDB also compresses the data when distributing it to the database nodes. Its performance is therefore about 10 to 30 times better than SkyServer's.
Description of the drawings
Fig. 1 is a block diagram of the basic architecture of the invention;
Fig. 2 is a structural block diagram of the system monitoring subsystem of the invention;
Fig. 3 compares the overall Q1 processing time of FASTDB and SkyServer across data set sizes;
Fig. 4 compares the overall Q2 processing time of FASTDB and SkyServer across data set sizes;
Fig. 5 compares the overall Q3 processing time of FASTDB and SkyServer across data set sizes;
Fig. 6 compares the overall Q4 processing time of FASTDB and SkyServer across data set sizes;
Fig. 7 compares the overall Q5 processing time of FASTDB and SkyServer across data set sizes;
Fig. 8 compares the overall Q6 processing time of FASTDB and SkyServer across data set sizes;
Fig. 9 compares the overall Q7 processing time of FASTDB and SkyServer across data set sizes;
Fig. 10 compares the overall Q8 processing time of FASTDB and SkyServer across data set sizes;
Fig. 11 shows how SkyServer's response time for Q5 varies with data volume;
Fig. 12 shows how FASTDB's response time for Q5 varies with data volume.
Reference numerals in the drawings: 1 - data processing subsystem, 2 - data storage subsystem, 3 - system monitoring subsystem, 4 - coordinator engine, 5 - metadata database, 6 - monitoring client, 7 - monitoring agent, 8 - monitoring server, 9 - database node, 10 - monitoring system interface, 11 - web browser-based front-end interface, 12 - first execution engine, 13 - local storage, 14 - FASTDB programming interface, 15 - second execution engine, 16 - KVM virtual machine.
Detailed description of the invention
Embodiment 1. A kind of Database Systems based on Array Model, including data process subsystem 1, data process subsystem 1 assigns tasks to data storage subsystem 2, after data storage subsystem 2 is collected monitoring data by system monitoring subsystem 3, feeding back to data process subsystem 1, the distribution task of data process subsystem 1 is adjusted by system monitoring subsystem 3 by distribution method;
Data process subsystem 1 processes for data, analyzes and conversion;
Data storage subsystem 2 is for supporting that data arrive the seamless storage of array database engine;
System monitoring subsystem 3 is for all database nodes of monitor in real time, and obtains system monitoring daily record.
Said system monitoring subsystem 3 includes monitor client 6. Monitor client 6 collects monitoring data and sends it to monitoring agent end 7, which in turn feeds the monitoring data back to monitoring service end 8; monitoring service end 8 adjusts the task distribution of data process subsystem 1 according to the distribution method.
Said data process subsystem 1 includes coordinator's engine 4, in which first enforcement engine 12 is provided. Coordinator's engine 4 is responsible for loading the data that users load into the database system into local storage 13 of each database node 9, and for updating the statistical information on each database node 9's local data in metadatabase 5.
Said data storage subsystem 2 is provided with one or more database nodes 9. Each database node 9 is correspondingly provided with a monitor client 6, and a second enforcement engine 15 is also provided in database node 9.
The monitoring data collected by said system monitoring subsystem 3 include: CPU utilization, memory amount, disk read/write speed, network bandwidth utilization, and the average task waiting time, number of tasks, and database node load over the past t seconds, where 600 ≤ t ≤ 7200.
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If the CPU utilization on a database node 9 exceeds a%, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 65% ≤ a% ≤ 1. By default, the system sets a% = 65%; this parameter can be adjusted by the system administrator of the database system according to how the system is running;
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If the amount of memory that task T and its subtasks require when running on a database node 9 exceeds b% of the available memory of this database node 9, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 60% ≤ b% ≤ 1. By default, the system sets b% = 60%; this parameter can be adjusted by the system administrator of the database system according to how the system is running;
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If the disk read/write speed on a database node 9 exceeds c% of the maximum speed, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 60% ≤ c% ≤ 1. By default, the system sets c% = 60%; this parameter can be adjusted by the system administrator of the database system according to how the system is running;
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If the network bandwidth utilization on a database node 9 exceeds d% of the maximum bandwidth, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 60% ≤ d% ≤ 1. By default, the system sets d% = 60%; this parameter can be adjusted by the system administrator of the database system according to how the system is running;
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If the estimated average waiting time of the tasks currently waiting in the task queue on a database node 9 exceeds t1 seconds, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 300 ≤ t1 ≤ 7200. This parameter can be adjusted by the system administrator of the database system according to how the system is running. The estimated average waiting time is taken to be the average task waiting time of database node 9 over the past t seconds;
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If the number of tasks waiting to be executed on a database node 9 exceeds n1, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 10 ≤ n1. By default, the system sets n1 = 10; this parameter can be adjusted by the system administrator of the database system according to how the system is running.
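The six threshold rules above can be read together as a single eligibility filter. The following is a minimal Python sketch under that assumption. The names `NodeMetrics`, `Thresholds`, and `is_eligible` are hypothetical and do not come from the patent, all percentages are expressed as fractions between 0 and 1, and the default `t1 = 300` is an assumed value because the text gives only its range.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Hypothetical snapshot of one database node's monitoring data."""
    cpu_util: float        # current CPU utilization, 0..1
    free_mem_mb: float     # available memory on the node
    disk_frac: float       # current disk read/write speed / maximum speed
    net_frac: float        # current bandwidth use / maximum bandwidth
    est_avg_wait_s: float  # estimated average waiting time of queued tasks
    queued_tasks: int      # number of tasks waiting to be executed


@dataclass
class Thresholds:
    a: float = 0.65    # CPU utilization limit (default from the text)
    b: float = 0.60    # share of available memory a task may require
    c: float = 0.60    # disk speed limit as a share of the maximum
    d: float = 0.60    # bandwidth limit as a share of the maximum
    t1: float = 300.0  # waiting-time limit in seconds (assumed default)
    n1: int = 10       # queue-length limit (default from the text)


def is_eligible(m: NodeMetrics, task_mem_mb: float,
                th: Thresholds = Thresholds()) -> bool:
    """Return True only if none of the six threshold rules excludes the node."""
    return not (
        m.cpu_util > th.a
        or task_mem_mb > th.b * m.free_mem_mb
        or m.disk_frac > th.c
        or m.net_frac > th.d
        or m.est_avg_wait_s > th.t1
        or m.queued_tasks > th.n1
    )
```

Grouping the limits in one `Thresholds` object mirrors the text's statement that each parameter can be tuned by the system administrator without touching the rule logic.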
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If, during the past t2 seconds, the average CPU utilization of a database node 9 has exceeded e%, and the current CPU utilization exceeds r × e%, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t2 ≤ 7200, and 65% ≤ e% ≤ 1. By default, the system sets r = 0.6, t2 = 300, and e% = 65%; each parameter can be adjusted by the system administrator of the database system according to how the system is running;
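As an illustration of the rule above, the sketch below combines a windowed average with the current reading. It assumes monitoring samples arrive as `(timestamp_seconds, utilization)` pairs with utilization as a fraction; the function name `cpu_rule_excludes` and the behaviour on an empty window are assumptions, not part of the patent.

```python
def cpu_rule_excludes(samples, now, current, t2=300.0, e=0.65, r=0.6):
    """Return True if the node should be excluded under the sustained-CPU rule.

    samples: iterable of (timestamp_seconds, utilization) pairs, utilization in 0..1
    now:     current time in seconds, on the same clock as the samples
    current: current CPU utilization, 0..1
    """
    # Keep only the samples that fall inside the past t2 seconds.
    window = [u for ts, u in samples if now - t2 <= ts <= now]
    if not window:
        return False  # assumption: with no history the rule does not fire
    average = sum(window) / len(window)
    # Both conditions must hold: high average AND current load above r * e.
    return average > e and current > r * e
```

With the defaults, a node whose average over the window is 75% is excluded only while its current utilization stays above 0.6 × 65% = 39%.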
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If, during the past t3 seconds, the average memory utilization of a database node 9 has exceeded f%, the current memory utilization exceeds r × f%, and the amount of memory that task T and its subtasks require when running on this database node 9 exceeds g% of the available memory of this database node 9, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 0.6 ≤ r ≤ 1, g < f, 300 ≤ t3 ≤ 7200, 60% ≤ f% ≤ 1, and 75% ≤ g% ≤ 1. By default, the system sets r = 0.6, t3 = 300, f% = 65%, and g% = 75%; each parameter can be adjusted by the system administrator of the database system according to how the system is running;
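The memory rule differs from the other history-based rules in that three conditions must hold at once. A minimal sketch follows, assuming the average over the past t3 seconds has already been computed and that utilizations are fractions; the function name `memory_rule_excludes` and its parameter names are hypothetical, and the defaults are the ones stated in the text.

```python
def memory_rule_excludes(avg_mem_util, current_mem_util,
                         task_mem_mb, free_mem_mb,
                         f=0.65, g=0.75, r=0.6):
    """Return True if the node should be excluded under the memory rule."""
    return (
        avg_mem_util > f                     # sustained memory pressure
        and current_mem_util > r * f         # pressure has not subsided
        and task_mem_mb > g * free_mem_mb    # task would need too much of what is left
    )
```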
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If, during the past t4 seconds, the disk read/write speed of a database node 9 has exceeded h% of the maximum speed, and the current disk read/write speed exceeds r × h% of the maximum speed, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t4 ≤ 7200, and 60% ≤ h% ≤ 1. By default, the system sets r = 0.6, t4 = 300, and h% = 60%; each parameter can be adjusted by the system administrator of the database system according to how the system is running;
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If, during the past t5 seconds, the network bandwidth utilization of a database node 9 has exceeded k% of the maximum bandwidth, and the current network bandwidth utilization exceeds r × k% of the maximum bandwidth, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t5 ≤ 7200, and 60% ≤ k% ≤ 1. By default, the system sets r = 0.6, t5 = 300, and k% = 60%; each parameter can be adjusted by the system administrator of the database system according to how the system is running;
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If, during the past t6 seconds, the average waiting time of the tasks waiting in the task queue of a database node 9 is T1, and the estimated average waiting time of the tasks currently in its task queue exceeds r × T1, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 1.3 ≤ r and 300 ≤ t6 ≤ 7200. By default, the system sets r = 1.3 and t6 = 300; each parameter can be adjusted by the system administrator of the database system according to how the system is running. Assuming the current number of tasks is n2, with n2 greater than or equal to 1, T1 here is a relative reference time: T1 = t6/n2.
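A short worked example may help with the relative reference time T1 = t6/n2. With the defaults t6 = 300 and r = 1.3 and a current task count n2 = 4, T1 = 300/4 = 75 seconds, so the node is excluded once the estimated average waiting time exceeds 1.3 × 75 = 97.5 seconds. The Python sketch below encodes this; the function name `wait_rule_excludes` is hypothetical.

```python
def wait_rule_excludes(est_avg_wait_s, n2, t6=300.0, r=1.3):
    """Return True if the estimated average wait exceeds r * T1, with T1 = t6 / n2."""
    if n2 < 1:
        raise ValueError("n2 must be greater than or equal to 1")
    t1_ref = t6 / n2  # relative reference time T1 from the text
    return est_avg_wait_s > r * t1_ref
```

Because T1 shrinks as n2 grows, a node with many queued tasks is excluded at a lower estimated waiting time than a lightly loaded one.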
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If, during the past t7 seconds, the length of the task waiting queue of a database node 9 is N, and the number of tasks currently waiting in the task queue of this database node 9 exceeds r × N, then this database node 9 is not allowed to participate in the execution of task T and its subtasks, where 1.3 ≤ r, 300 ≤ t7 ≤ 7200, and 10 ≤ N. By default, the system sets r = 1.3, t7 = 300, and N = 10; each parameter can be adjusted by the system administrator of the database system according to how the system is running.
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
Denote the n database nodes 9 that are able to execute task T and its subtasks as CN1, CN2, ..., CNn. If, during the past t8 seconds, the average waiting times of the tasks running on CN1, CN2, ..., CNn are TCN1, TCN2, ..., TCNn respectively, and TCNi is the smallest of TCN1, TCN2, ..., TCNn, then database node CNi is assigned to participate in the execution of task T and its subtasks. For database nodes 9 with the same average task waiting time, tasks are distributed according to the principle that nodes with more available memory and lower CPU utilization take priority. Here 300 ≤ t8 ≤ 7200. By default, the system sets t8 = 300; this parameter can be adjusted by the system administrator of the database system according to how the system is running.
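The selection rule above amounts to taking a minimum under a composite ordering. The sketch below is one possible reading in Python: the `Candidate` type and `pick_node` function are hypothetical names, and applying the tie-breakers in the order "more available memory first, then lower CPU utilization" is an assumption, since the text names both criteria without ranking them.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """Hypothetical summary of an eligible node CNi over the past t8 seconds."""
    name: str
    avg_wait_s: float   # TCNi, average task waiting time
    free_mem_mb: float  # available memory
    cpu_util: float     # CPU utilization, 0..1


def pick_node(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the node with the shortest average wait; break ties on memory, then CPU."""
    if not candidates:
        raise ValueError("no eligible database node")
    # Negating free memory makes larger values sort first under min().
    return min(candidates, key=lambda c: (c.avg_wait_s, -c.free_mem_mb, c.cpu_util))
```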
During task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem 3, according to the following distribution method:
If none of the database nodes 9 is eligible to participate in the execution of task T and its subtasks, then task T is divided into more subtasks, and the corresponding database nodes 9 are assigned to participate in the execution of task T and its subtasks according to the principle that the database node whose queued tasks had the shortest average waiting time during the past t9 seconds takes priority. For database nodes 9 with the same average task waiting time, tasks are distributed according to the principle that nodes with more available memory and lower CPU utilization take priority. Here 300 ≤ t9 ≤ 7200. By default, the system sets t9 = 300; this parameter can be adjusted by the system administrator of the database system according to how the system is running.
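The text does not say how task T is divided or how the resulting subtasks are spread across nodes, so the following Python sketch makes two labeled assumptions: the task is a list of work items split into a caller-chosen number of pieces, and the pieces are handed out round-robin to nodes ranked by shortest average waiting time (ties broken by more available memory, then lower CPU utilization). The function name `fallback_assign` is hypothetical.

```python
def fallback_assign(work_items, nodes, pieces):
    """Split a task into `pieces` subtasks and assign them to ranked nodes.

    nodes: list of (name, avg_wait_s, free_mem_mb, cpu_util) tuples
    Returns a list of (node_name, subtask_items) pairs.
    """
    if not nodes or pieces < 1:
        raise ValueError("need at least one node and one piece")
    # Shortest average wait first; ties: more free memory, then lower CPU.
    ranked = sorted(nodes, key=lambda n: (n[1], -n[2], n[3]))
    # Assumption: split by striding through the work items.
    subtasks = [work_items[i::pieces] for i in range(pieces)]
    # Assumption: round-robin over the ranked nodes.
    return [(ranked[i % len(ranked)][0], sub) for i, sub in enumerate(subtasks)]
```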
Embodiment 2.
1. Architecture
As shown in Figure 1, FASTDB is a distributed system for large-scale scientific data analysis and management. It adopts a shared-nothing (share-nothing) architecture and consists of the following three subsystems: data process subsystem 1, system monitoring subsystem 3, and data storage subsystem 2. The data process subsystem allows users to upload their own scientific data for analysis, processing, and data conversion. The system monitoring subsystem monitors all database nodes 9 in real time and obtains a series of system monitoring logs. Data storage subsystem 2 supports seamless storage of scientific data into the SciDB storage engine. Data process subsystem 1 assigns tasks to data storage subsystem 2, and system monitoring subsystem 3 monitors the performance of data storage subsystem 2, collects performance information, and feeds it back to data process subsystem 1. A series of user-defined task allocation rules can be specified so that data process subsystem 1 can better allocate and regulate task execution based on system monitoring subsystem 3. FASTDB can be deployed on a single database node 9 or in a cluster environment that communicates over TCP/IP, and its extensible architecture can also be applied to basic cloud services.
Said data process subsystem 1 includes coordinator's engine 4, in which first enforcement engine 12 is provided. Coordinator's engine 4 is responsible for loading the data that users load into the database system into local storage 13 of each database node 9, and for updating the statistical information on each database node 9's local data in metadatabase 5.
The FASTDB system uses SciDB as its back-end storage engine for data storage and processing. A SciDB cluster has multiple database instances, which are distributed across different virtual machines (VMs). The virtualization technology adopted here is KVM full virtualization, in which the virtualization platform simulates the complete underlying hardware, including the processor, physical memory, clock, and peripherals, so that an operating system or other system software designed for the original hardware can run in the virtual machine without any modification. A virtual machine that uses full virtualization can be used in the same way as a physical platform, and the guest operating system and drivers running on the virtual platform behave as if they were running on real hardware. KVM was chosen because mainstream Linux kernels all have a built-in KVM module, which provides a simple implementation and continued support for critical Linux tasks; it can also always address the hardware directly, requires no modification of the virtualized operating system, and simplifies control of the virtualization process. The FASTDB cluster is built so that every KVM virtual machine 16 can be managed uniformly by a lightweight cloud platform. Users can connect to and operate the SciDB database through the system interfaces SciDB-J and iquery, using the AQL and AFL languages. After a user uploads private data through the front end, FASTDB preprocesses the data, converts it to CSV format, and finally stores it in the SciDB storage engine. Thereafter, the data process subsystem obtains data from the storage subsystem, performs various analyses according to the user's requirements, and returns the results to the user. All system status information, such as CPU, memory, and disk I/O, is collected by the monitoring subsystem and presented to the user.
2. System monitoring
To improve the task execution performance of the FASTDB system, we designed system monitoring subsystem 3. System monitoring is an important part of a distributed system, and the monitoring mechanism can determine whether the system state is normal. It can also monitor numerous network parameters as well as the health and integrity of servers. In most cases it is a prerequisite for system performance optimization. For example, when a user receives a warning message through system monitoring subsystem 3, the user can adjust analysis tasks in time according to the monitoring information and thereby optimize performance.
System monitoring subsystem 3 comprises two components: monitor client 6 and monitoring service end 8. A monitor client 6 is installed and configured on each database node 9 of the FASTDB system. Monitor client 6 can collect various kinds of system information through SNMP. Monitoring service end 8 processes the collected monitoring information according to predefined rules and also allows users to configure email-based alarms for nearly all events. All monitoring information, including static information and configuration information, is displayed on the web-browser-based front-end interface 11, which is connected to data process subsystem 1 through FASTDB DLL 14. Through monitoring system interface 10, users can clearly see the current state of the system. All control functions can be implemented through configuration files. The structural block diagram of system monitoring subsystem 3 is shown in Figure 2.
Considering the scale of the monitored servers, scalability, and maintainability, the monitoring system is designed with a three-tier architecture of monitoring service end 8, monitoring agent end 7, and monitor client 6, with monitoring agent end 7 as the middle tier. Monitoring agent end 7 only accepts configuration information from monitoring service end 8 and then periodically transfers data to monitoring service end 8; monitoring service end 8 adjusts the task distribution of data process subsystem 1 according to the distribution method. Monitoring agent end 7 locally retains only the most recent data that has not yet been sent. All configuration is carried out on monitoring service end 8.
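The agent behaviour described above, holding only data that has not yet been sent and forwarding it to the service end at intervals, can be sketched as a small buffer. This is an illustrative Python sketch only: the class name `MonitoringAgentBuffer`, the `send` callable that stands in for the transfer to monitoring service end 8, and the retry-on-failure behaviour are all assumptions, and the actual transport is not modelled.

```python
from collections import deque


class MonitoringAgentBuffer:
    """Keeps only unsent samples and forwards them in arrival order."""

    def __init__(self, send):
        # `send` is a hypothetical callable: it takes one sample and
        # returns True once the service end has accepted it.
        self._send = send
        self._unsent = deque()

    def record(self, sample):
        """Store a sample received from a monitor client."""
        self._unsent.append(sample)

    def flush(self):
        """Try to send everything; return how many samples remain unsent."""
        while self._unsent:
            if not self._send(self._unsent[0]):
                break  # keep the sample locally and retry on the next flush
            self._unsent.popleft()  # delivered, so no local copy is kept
        return len(self._unsent)
```

Calling `flush()` from a timer would give the periodic transfer; samples are dropped from local storage only after delivery succeeds.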

Claims (19)

1. A database system based on the array model, characterized in that: it includes data process subsystem (1); data process subsystem (1) assigns tasks to data storage subsystem (2); system monitoring subsystem (3) collects monitoring data from data storage subsystem (2) and feeds it back to data process subsystem (1); and system monitoring subsystem (3) adjusts the task distribution of data process subsystem (1) according to a distribution method;
Data process subsystem (1) is used for data processing, analysis, and conversion;
Data storage subsystem (2) supports seamless storage of data into the array database engine;
System monitoring subsystem (3) monitors all database nodes in real time and obtains system monitoring logs.
2. The database system based on the array model according to claim 1, characterized in that: said system monitoring subsystem (3) includes monitor client (6); monitor client (6) collects monitoring data and sends it to monitoring agent end (7), which in turn feeds the monitoring data back to monitoring service end (8); and monitoring service end (8) adjusts the task distribution of data process subsystem (1) according to the distribution method.
3. The database system based on the array model according to claim 1, characterized in that: said data process subsystem (1) includes coordinator's engine (4), in which first enforcement engine (12) is provided; coordinator's engine (4) is responsible for loading the data that users load into the database system into local storage (13) of each database node (9), and for updating the statistical information on each database node (9)'s local data in metadatabase (5).
4. The database system based on the array model according to claim 1, characterized in that: said data storage subsystem (2) is provided with one or more database nodes (9); each database node (9) is correspondingly provided with a monitor client (6), and a second enforcement engine (15) is also provided in database node (9).
5. The database system based on the array model according to claim 1, characterized in that: the monitoring data collected by said system monitoring subsystem (3) include: CPU utilization, memory amount, disk read/write speed, network bandwidth utilization, and the average task waiting time, number of tasks, and database node load over the past t seconds, where 600 ≤ t ≤ 7200.
6. The database system based on the array model according to any one of claims 1 to 5, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If the CPU utilization on a database node (9) exceeds a%, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 65% ≤ a% ≤ 1.
7. The database system based on the array model according to claim 6, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If the amount of memory that task T and its subtasks require when running on a database node (9) exceeds b% of the available memory of this database node (9), then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 60% ≤ b% ≤ 1.
8. The database system based on the array model according to claim 7, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If the disk read/write speed on a database node (9) exceeds c% of the maximum speed, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 60% ≤ c% ≤ 1.
9. The database system based on the array model according to claim 8, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If the network bandwidth utilization on a database node (9) exceeds d% of the maximum bandwidth, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 60% ≤ d% ≤ 1.
10. The database system based on the array model according to claim 9, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If the estimated average waiting time of the tasks currently waiting in the task queue on a database node (9) exceeds t1 seconds, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 300 ≤ t1 ≤ 7200.
11. The database system based on the array model according to claim 10, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If the number of tasks waiting to be executed on a database node (9) exceeds n1, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 10 ≤ n1.
12. The database system based on the array model according to claim 11, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If, during the past t2 seconds, the average CPU utilization of a database node (9) has exceeded e%, and the current CPU utilization exceeds r × e%, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t2 ≤ 7200, and 65% ≤ e% ≤ 1.
13. The database system based on the array model according to claim 12, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If, during the past t3 seconds, the average memory utilization of a database node (9) has exceeded f%, the current memory utilization exceeds r × f%, and the amount of memory that task T and its subtasks require when running on this database node (9) exceeds g% of the available memory of this database node (9), then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 0.6 ≤ r ≤ 1, g < f, 300 ≤ t3 ≤ 7200, 60% ≤ f% ≤ 1, and 75% ≤ g% ≤ 1.
14. The database system based on the array model according to claim 13, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If, during the past t4 seconds, the disk read/write speed of a database node (9) has exceeded h% of the maximum speed, and the current disk read/write speed exceeds r × h% of the maximum speed, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t4 ≤ 7200, and 60% ≤ h% ≤ 1.
15. The database system based on the array model according to claim 14, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If, during the past t5 seconds, the network bandwidth utilization of a database node (9) has exceeded k% of the maximum bandwidth, and the current network bandwidth utilization exceeds r × k% of the maximum bandwidth, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 0.6 ≤ r ≤ 1, 300 ≤ t5 ≤ 7200, and 60% ≤ k% ≤ 1.
16. The database system based on the array model according to claim 15, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If, during the past t6 seconds, the average waiting time of the tasks waiting in the task queue of a database node (9) is T1 and the number of tasks is n2, and the estimated average waiting time of the tasks currently in its task queue exceeds r × T1, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 1.3 ≤ r, 300 ≤ t6 ≤ 7200, 1 ≤ n2, and T1 = t6/n2.
17. The database system based on the array model according to claim 16, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If, during the past t7 seconds, the length of the task waiting queue of a database node (9) is N, and the number of tasks currently waiting in the task queue of this database node (9) exceeds r × N, then this database node (9) is not allowed to participate in the execution of task T and its subtasks, where 1.3 ≤ r, 300 ≤ t7 ≤ 7200, and 10 ≤ N.
18. The database system based on the array model according to claim 17, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
Denote the n database nodes (9) that are able to execute task T and its subtasks as CN1, CN2, ..., CNn. If, during the past t8 seconds, the average waiting times of the tasks running on CN1, CN2, ..., CNn are TCN1, TCN2, ..., TCNn respectively, and TCNi is the smallest of TCN1, TCN2, ..., TCNn, then database node CNi is assigned to participate in the execution of task T and its subtasks; for database nodes (9) with the same average task waiting time, tasks are distributed according to the principle that nodes with more available memory and lower CPU utilization take priority; where 300 ≤ t8 ≤ 7200.
19. The database system based on the array model according to claim 18, characterized in that: during task distribution, tasks are distributed based on the monitoring data collected by system monitoring subsystem (3), according to the following distribution method:
If none of the database nodes (9) is eligible to participate in the execution of task T and its subtasks, then task T is divided into more subtasks, and the corresponding database nodes (9) are assigned to participate in the execution of task T and its subtasks according to the principle that the database node whose queued tasks had the shortest average waiting time during the past t9 seconds takes priority; for database nodes (9) with the same average task waiting time, tasks are distributed according to the principle that nodes with more available memory and lower CPU utilization take priority, where 300 ≤ t9 ≤ 7200.
CN201510952434.6A 2015-12-18 2015-12-18 Array model-based database system Pending CN105630874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510952434.6A CN105630874A (en) 2015-12-18 2015-12-18 Array model-based database system

Publications (1)

Publication Number Publication Date
CN105630874A true CN105630874A (en) 2016-06-01

Family

ID=56045807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510952434.6A Pending CN105630874A (en) 2015-12-18 2015-12-18 Array model-based database system

Country Status (1)

Country Link
CN (1) CN105630874A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169130A (en) * 2017-06-08 2017-09-15 贵州优联博睿科技有限公司 The visual inquiry method and system of a kind of database
CN112596811A (en) * 2020-12-17 2021-04-02 杭州艾芯智能科技有限公司 Method, system, computer equipment and storage medium for reducing memory overhead by dynamic data loading

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1972311A (en) * 2006-12-08 2007-05-30 华中科技大学 A stream media server system based on cluster balanced load
CN102158540A (en) * 2011-02-18 2011-08-17 广州从兴电子开发有限公司 System and method for realizing distributed database
CN102831012A (en) * 2011-06-16 2012-12-19 日立(中国)研究开发有限公司 Task scheduling device and task scheduling method in multimode distributive system
US20140222871A1 (en) * 2011-12-29 2014-08-07 Teradata Us, Inc. Techniques for data assignment from an external distributed file system to a database management system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160601
