CN108491507A - Parallel continuous query method for uncertain traffic flow data based on a Hadoop distributed environment - Google Patents
- Publication number: CN108491507A
- Authority: CN (China)
- Legal status: Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a query method for uncertain high-dimensional traffic flow data based on a Hadoop distributed environment, comprising the following steps: receiving and storing stream data through the MapReduce batch-processing computing framework, and preprocessing the stream data to obtain a data set; and performing parallelized adaptive DBSCAN clustering on the data set and generating query results in combination with the query requirements. The invention copes effectively with changes in data characteristics and provides efficient query results in real time.
Description
Technical field
The present invention relates to the field of managing massive uncertain stream data, and in particular to a parallel continuous query method for uncertain traffic flow data based on a Hadoop distributed environment.
Background art
With the rapid development of social informatization, data in every field worldwide has grown explosively. According to the 2016 China Big Data Trading White Paper, the Chinese big data industry was expected to reach 1,362.6 billion yuan by 2020. The data market shows not only data scale growing by orders of magnitude; the data themselves have also acquired a series of new characteristics, including being unstructured, multi-source and heterogeneous, and dynamically evolving.
Intelligent transportation systems cover every aspect of transportation, and introducing big data concepts into the transportation system creates new ideas and technical substance for the development of smart railway and highway systems. The massive, complex traffic data streams accumulated by intelligent transportation systems come from a wide range of sources in diverse formats and call for data-intensive processing. At the same time, because of complex factors such as data integration, transmission delay and low device precision, uncertain data is widespread in traffic data streams. As intelligent transportation systems receive increasing attention, making rational use of big data resources has become an essential task for traffic data management platforms.
Applying big data technology to the mining and analysis of massive traffic operation data will provide methodological and data support for intelligent transportation. Stream data is the normal form of data in a transportation system, and in intelligent transportation, traffic stream data is massive and is stored and exchanged at high rates, so its acquisition, storage and retrieval are key problems for remote vehicle monitoring platforms. Moreover, meeting modern traffic control requirements such as traffic guidance requires fairly accurate judgement and prediction of traffic states, and therefore accurate traffic flow data in real time. However, because current systems lack robustness and cannot assess data quality on their own, some dimensions of traffic flow data may contain missing values, giving the stream uncertain-data characteristics. Traffic control requirements then become hard to meet for lack of reliable raw data, and the overall value of the intelligent transportation system suffers considerably.
Managing massive data on Hadoop makes full use of the scalability of MapReduce and addresses the scalability and scale problems that massive data management faces. However, because MapReduce is a batch-processing computing framework, it cannot adapt well to processing stream data, so the MapReduce framework needs auxiliary support to handle stream data effectively.
Therefore, improving existing query algorithms by combining big data and cloud computing, so that they fit better into intelligent transportation systems and meet the need for parallel continuous queries over streaming traffic data that is uncertain and high-dimensional, will undoubtedly be a strong accelerator for the development of transportation systems.
Summary of the invention
The object of the present invention is to provide a query method for uncertain high-dimensional traffic flow data based on a Hadoop distributed environment that copes effectively with changes in data characteristics and provides efficient query results in real time.
To achieve the above object, the present invention adopts the following technical solution: a query method for uncertain high-dimensional traffic flow data based on a Hadoop distributed environment, comprising the following steps:
receiving and storing stream data through the MapReduce batch-processing computing framework, and preprocessing the stream data to obtain a data set;
performing parallelized adaptive DBSCAN clustering on the data set and generating query results in combination with the query requirements.
Preferably, receiving and storing stream data through the MapReduce batch-processing computing framework and preprocessing the stream data comprises:
using a sliding window model so that the MapReduce batch-processing computing framework receives and stores stream data transmitted in real time within a stream data environment;
performing data item screening and principal component analysis (PCA) dimensionality reduction on the received stream data to obtain dimension-reduced data;
computing standard deviations from the dimension-reduced data and substituting them into an interval expression to obtain the data set.
Preferably, performing parallelized adaptive DBSCAN clustering on the data set and generating query results in combination with the query requirements comprises:
decomposing the data set into several data subsets;
performing adaptive DBSCAN clustering on each data subset to obtain its data distribution characteristics and data structure characteristics;
integrating the data distribution characteristics and data structure characteristics of the data subsets to obtain those of the whole data set;
partitioning the data according to the data structure characteristics of the whole data set and generating query results in combination with the query requirements.
Preferably, performing adaptive DBSCAN clustering on each data subset to obtain its data distribution characteristics and data structure characteristics comprises:
computing the KNN distribution of each data subset and analysing it statistically to obtain parameter settings;
performing DBSCAN clustering on each data subset with those parameter settings to obtain its data distribution characteristics and data structure characteristics.
The beneficial effects of the present invention are as follows:
(1) A sliding window model is introduced to read stream data, and the concept of "interval numbers" is introduced to rewrite the data. This compensates to some extent for the analysis error caused by uncertain data, and gives the MapReduce batch-processing computing framework a means of handling streaming data;
(2) The core computation obtains data structure characteristics through adaptive DBSCAN clustering, which copes effectively with changes in data characteristics and improves query efficiency. Especially with massive data, clustering can efficiently explore features common to the data and rapidly mine the characteristics of the data as a whole.
Description of the drawings
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the steps of the query method of the present invention;
Fig. 2 is a flow chart of the steps of the data preprocessing part of the present invention;
Fig. 3 is a schematic diagram of the result of rewriting raw data with the interval number representation in the present invention;
Fig. 4 is a flow chart of the steps of the core data computation part of the present invention;
Fig. 5 is a flow chart of the steps of adaptive DBSCAN clustering in the present invention.
Detailed description of the embodiments
To explain the present invention more clearly, it is further described below with reference to preferred embodiments and the accompanying drawings. Similar components are denoted by the same reference numerals in the drawings. Those skilled in the art will understand that the following description is illustrative rather than restrictive and should not be taken to limit the scope of protection of the present invention.
Addressing the streaming, high-dimensional and uncertain nature of traffic data in the context of intelligent transportation, the invention proposes a comprehensive query method for uncertain traffic flow data that runs in a distributed environment, is based on a density clustering algorithm, copes effectively with changes in data characteristics and provides efficient query results in real time.
The query method for uncertain high-dimensional traffic flow data based on a Hadoop distributed environment is described in detail below with reference to the above object. As shown in Fig. 1, it comprises the following steps:
Step 100: receive and store stream data through the MapReduce batch-processing computing framework, and preprocess the stream data to obtain a data set;
Step 200: perform parallelized adaptive DBSCAN clustering on the data set and generate query results in combination with the query requirements.
Fig. 2 is a flow chart of the steps of the data preprocessing part of the present invention. As shown in Fig. 2, step 100 comprises the following steps:
Step 110: use a sliding window model to receive the stream data transmitted in real time, and store the raw data short-term in a buffer, so that the data in each unit time slice can be computed in real time;
Stream data is a sequence of feature data that can often be regarded as a dynamic data set growing without bound over time. In short, the main characteristics of stream data are: (1) data arrive in real time, rapidly and without bound; (2) data arrive in an independent order, and properties such as generation rate and timing are unknown; (3) data streams change over time; (4) a single pass is required, and data generally cannot be retrieved for reprocessing once processed; (5) analysis of large volumes of stream data generally only requires query results to meet an accuracy tolerance, i.e., results are approximate. Given these characteristics, the MapReduce framework needs corresponding supporting work so that it can process stream data.
The MapReduce framework on the Hadoop platform is a batch-processing computing framework. Batch processing can compute arbitrary queries over different data sets and is generally used for in-depth analysis of large data sets. Stream processing, by contrast, ingests a sequence of data and incrementally updates indexes, reports and summary statistics in response to each arriving record, which makes it better suited to real-time monitoring and response. Batch and stream processing are not incompatible, however: combining them yields a hybrid mode that maintains both a real-time processing layer and a batch-processing layer, forming a more adaptable and more useful processing scheme. Therefore, a sliding window model is introduced to read or receive data promptly. Combined with a small data window and a buffer cache, this effectively reduces the algorithm's memory requirement while still receiving data in time and supporting in-depth analysis of recent data.
Concrete implementation of the sliding window: data transmitted in real time are stored in a memory buffer, and each data block contains multiple received raw records. The sliding window reads a fixed number of data blocks at a time; as time passes, the window moves and reads the data in the new window, so that processing and feature analysis focus on recent data.
Note that the sliding window model has an inherent defect: stale data in the window cannot always be deleted completely in time, which wastes some memory. To address this limitation, the invention builds on the sliding window model and uses buffer switching to handle expired tuples promptly. This prevents expired tuples that are not deleted in time from wasting memory resources and from affecting the later clustering process and its results. With this design and improvement, clustering quality and data-processing capacity are effectively improved while memory overhead is greatly reduced.
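The block-based sliding window described above can be sketched in Python as follows. This is a minimal in-memory sketch under stated assumptions, not the patent's implementation: the class name `SlidingWindow`, the record layout `(timestamp, values)` and the rule that the oldest block expires as soon as the window holds more than `window_blocks` blocks are all illustrative choices. Returning the expired block lets the caller release it immediately, which mirrors the prompt handling of expired tuples.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical record layout: (arrival timestamp, attribute values).
Record = Tuple[float, List[float]]


class SlidingWindow:
    """Keeps the most recent `window_blocks` buffered data blocks."""

    def __init__(self, window_blocks: int) -> None:
        if window_blocks <= 0:
            raise ValueError("window_blocks must be positive")
        self.window_blocks = window_blocks
        self.blocks: Deque[List[Record]] = deque()

    def push_block(self, block: List[Record]) -> List[Record]:
        """Slide the window forward by one block.

        Returns the records of the block that just expired (empty if the
        window was not yet full) so the caller can release them at once.
        """
        self.blocks.append(block)
        if len(self.blocks) > self.window_blocks:
            return self.blocks.popleft()
        return []

    def records(self) -> List[Record]:
        """All records currently inside the window, oldest block first."""
        return [rec for block in self.blocks for rec in block]
```

In this sketch each call to `push_block` would correspond to one unit time slice, and `records()` would feed the preprocessing of steps 120 and 130.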
Step 120: with reference to the data characteristics accumulated from historical data and according to the accuracy requirements of the actual problem, screen the received high-dimensional raw data in the unit time slice for the data items that strongly influence the principal components, then perform a simplified PCA dimensionality-reduction computation;
With the development of intelligent transportation, the number of data records and data attributes in the traffic domain is expanding rapidly, and high-dimensional data account for a considerable share. This can lead to quite poor performance in data analysis applications, so dimensionality reduction is becoming increasingly important for big data processing platforms.
Under the time constraints of stream processing, the data characteristics accumulated from historical data are used: PCA is computed repeatedly on large amounts of historical or recent classified traffic data to identify the data items that generally have a strong influence on the newly formed principal components of each formatted data record. Under a given accuracy tolerance and for specific requirements, data item screening and PCA dimensionality reduction are then performed preferentially using the results of this analysis.
As part of data preprocessing, this step effectively reduces the time complexity of the dimensionality-reduction algorithm: it not only makes subsequent processing of high-dimensional data easier but also improves the algorithm's running efficiency.
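Step 120 can be illustrated with NumPy. The sketch assumes that "data items that strongly influence the principal components" are scored by the sum of their absolute loadings on the leading components of historical data; the function names `screen_items` and `pca_reduce` and that scoring rule are hypothetical, since the patent does not specify its simplified PCA.

```python
import numpy as np


def screen_items(history: np.ndarray, n_components: int, keep: int) -> np.ndarray:
    """Indices of the `keep` data items (columns) with the largest summed
    absolute loadings on the first `n_components` principal components
    of the historical data."""
    centered = history - history.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = np.abs(vt[:n_components]).sum(axis=0)
    return np.sort(np.argsort(loadings)[::-1][:keep])


def pca_reduce(window: np.ndarray, items: np.ndarray, n_components: int) -> np.ndarray:
    """Project the current window, restricted to the screened items, onto
    its leading principal components."""
    x = window[:, items]
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

The screening is computed once (or periodically) on history, so each new time slice only pays for a PCA on the reduced set of items, which is where the time saving under stream constraints would come from.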
Step 130: compute standard deviations for the dimension-reduced data in the unit time slice obtained in step 120. The standard deviation of each data item is computed and substituted into the interval number expression to rewrite that data item, forming newly defined data point objects that make up the data set.
According to interval number theory, let $\sigma_i$ denote the error vector corresponding to data point $X_i$. Assuming normally distributed measurement error, a measurement falls in the interval $[X_i - \sigma_i, X_i + \sigma_i]$ with probability 68.3%, in $[X_i - 2\sigma_i, X_i + 2\sigma_i]$ with probability 95.4%, and in $[X_i - 3\sigma_i, X_i + 3\sigma_i]$ with probability 99.7%. A suitable interval representation is chosen according to the error accuracy required in practice.
Fig. 3 is a schematic diagram of the result of rewriting raw data with the interval number representation in the present invention. In the illustrative example, one of the above intervals is used to rewrite each raw data point $X_i$, so the raw data are rewritten into newly defined data objects that form the data set.
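The interval rewrite of step 130 can be sketched as below, assuming each data item's standard deviation is estimated over the current time slice (sample standard deviation, `ddof=1`) and the same multiplier k is applied to every item. The function name `to_intervals` and the output layout (lower and upper bounds in the last axis) are illustrative assumptions; Fig. 3 shows the patent's own representation.

```python
import numpy as np

# Coverage of [x - k*sigma, x + k*sigma] under a normal error model.
COVERAGE = {1: 0.683, 2: 0.954, 3: 0.997}


def to_intervals(data: np.ndarray, k: int = 2) -> np.ndarray:
    """Rewrite each value x in column j as [x - k*sigma_j, x + k*sigma_j],
    where sigma_j is the sample standard deviation of data item j.

    Returns shape (n_points, n_items, 2): [..., 0] lower, [..., 1] upper.
    """
    if k not in COVERAGE:
        raise ValueError("k must be 1, 2 or 3")
    sigma = data.std(axis=0, ddof=1)  # per-item standard deviation
    return np.stack([data - k * sigma, data + k * sigma], axis=-1)
```

With k = 2, for example, each rewritten object would cover the true value with roughly 95.4% probability under the normal-error assumption.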
Fig. 4 is a flow chart of the steps of the core data computation part of the present invention. As shown in Fig. 4, step 200 comprises the following steps:
Step 210: using the MapReduce parallel computing framework of the Hadoop distributed processing system, divide the data set obtained in steps 110 to 130 into several smaller data subsets, then perform adaptive DBSCAN clustering on each data subset to obtain the data distribution characteristics and data structure characteristics of each small subset;
Step 220: integrate the clustering results of the data subsets through the MapReduce parallel computing framework to obtain the data structure characteristics and data distribution characteristics of the whole data set;
When processing large data samples, traditional single-machine algorithms often suffer from excessive time and space overhead and poor result accuracy. For general data-processing algorithms, clustering algorithms included, memory and I/O consumption rise sharply when data volume increases dramatically, because system memory is limited. Rewriting the algorithm to run in a distributed environment and processing the sample set in blocks avoids these problems very effectively.
Through the distributed processing system, the data set is processed in blocks, so that a large raw data set is reasonably divided into several smaller data subsets that can be processed in parallel.
Therefore, the parallel computation is realized with the MapReduce parallel computing framework of the Hadoop distributed processing system. Parallel data processing and clustering are implemented by writing the Map() function, so the data decomposed into small data subsets can be processed faster and with more accurate results.
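The decomposition and integration of steps 210 and 220 follow the map/reduce pattern. The sketch below simulates that pattern in-process with Python's `map` and `functools.reduce` rather than Hadoop's Java API, and it assumes, for illustration, that a subset's data distribution characteristics are summarized by its count, mean and sum of squared deviations per data item. These summaries merge exactly with the pairwise update of Chan et al., so the reduced result equals the whole-data-set statistics. Merging the per-subset cluster structures is not covered here, and all function names are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

import numpy as np

# Per-subset summary: (count, per-item mean, per-item sum of squared deviations).
Summary = Tuple[int, np.ndarray, np.ndarray]


def split(data: np.ndarray, n_parts: int) -> List[np.ndarray]:
    """Decompose the data set into non-empty row blocks (the map inputs)."""
    return [p for p in np.array_split(data, n_parts) if len(p) > 0]


def map_summary(part: np.ndarray) -> Summary:
    """Map phase: distribution statistics of one data subset."""
    mean = part.mean(axis=0)
    m2 = ((part - mean) ** 2).sum(axis=0)
    return len(part), mean, m2


def reduce_summary(a: Summary, b: Summary) -> Summary:
    """Reduce phase: merge two subset summaries exactly (Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def whole_set_stats(data: np.ndarray, n_parts: int) -> Tuple[np.ndarray, np.ndarray]:
    """Per-item mean and sample variance of the whole data set, computed
    only from the per-subset summaries."""
    n, mean, m2 = reduce(reduce_summary, map(map_summary, split(data, n_parts)))
    return mean, m2 / (n - 1)
```

Because the merge is associative, a real reducer could combine summaries in any order and grouping, which is what makes this kind of integration safe to parallelize.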
Step 230: for a specific practical requirement, partition the data using the data structure characteristics obtained in step 220, then query the relevant data regions to obtain targeted query results.
After the data have been decomposed and the per-partition clustering results integrated, query conditions are designed according to the final query requirements, so that the data can be further analysed and studied in a targeted way.
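Step 230 leaves the query conditions to the application. As one hypothetical example, a query for "all records in the same traffic state as a given record" could be answered from cluster labels: find the clusters that have a member within Eps of the query record and return all their members. The function `query_by_cluster` and this query semantics are assumptions for illustration, and Euclidean distance is used for brevity.

```python
from typing import List

import numpy as np


def query_by_cluster(points: np.ndarray, labels: np.ndarray,
                     query: np.ndarray, eps: float) -> List[int]:
    """Hypothetical query: indices of all members of the clusters that have
    at least one member within `eps` of `query`. Noise (label -1) is never
    returned."""
    dist = np.linalg.norm(points - query, axis=1)
    hit = set(labels[(dist <= eps) & (labels != -1)].tolist())
    return [i for i, lab in enumerate(labels.tolist()) if lab in hit]
```

Answering from cluster membership rather than scanning every record is one way the cluster structure could narrow a query to the relevant data region.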
Fig. 5 is a flow chart of the steps of adaptive DBSCAN clustering in the present invention. In today's era, characterized by big data, feature analysis and requirement exploration over massive data are increasingly becoming the mainstream research direction in big data analysis and processing. Cloud computing and machine learning have gradually developed to meet the needs of processing and studying large data sets, and clustering algorithms, which can fully mine the characteristic features of data distribution and structure, are an approach with considerable potential in machine learning.
The present invention therefore uses a clustering algorithm for the core computation and processing of the data, exploiting its strength in efficiently exploring features common to the data in order to mine the characteristics of the data as a whole rapidly, rather than querying and attending to individual data records as existing algorithms do.
The DBSCAN clustering algorithm is a typical and efficient density-based clustering algorithm. It relies mainly on two parameters, the radius Eps and the density threshold minPts, whose settings strongly affect both the running speed of the clustering and the quality of its results.
In existing DBSCAN clustering algorithms, the values of Eps and minPts are essentially entered in advance by the user. The user sets the parameters from experience, revises them according to the results, and gradually finds values that yield a satisfactory clustering. This parameter selection scheme meets requirements reasonably well, but for large data sets the resource cost of each run of this cluster-compare-revise cycle is significant, and parameter selection is also less accurate than for small data sets. To address this, the invention introduces a systematic way of setting the parameter values adaptively by examining the data set.
As shown in Fig. 5, step 210 comprises the following steps:
Step 211: decompose the data set using the Map() function of MapReduce for parallel clustering computation;
Step 212: compute the KNN distribution of each data subset. From the k-dist distribution curves, find a value k0 of k whose curve representatively reflects the shape of the other dist_k curves, and analyse the k-nearest-neighbour distance data (dist_k0) for k0 statistically. Based on the analysis, the region where the dist_k probability distribution is densest is used to set the radius parameter Eps: the data in the selected region are model-fitted, the most suitable model is found, and its curve knee f(x0) is computed, giving radius Eps = f(x0);
The parameters Eps and minPts of the DBSCAN clustering algorithm can be selected adaptively: KNN distribution analysis and mathematical statistics on the data set yield more principled parameter settings fed back from the data.
Specifically, the distance distribution matrix DIST (n × n) is first computed from the preprocessed input data set D: the value of each element is calculated, and each row of DIST is sorted in ascending order to obtain the KNN distribution. From the k-dist distribution curves, a value k0 is found whose curve representatively reflects the shape of the other dist_k curves, and the k-nearest-neighbour distance data (dist_k0) for k0 are analysed statistically. The analysis shows that dist_k has a point where smooth variation gives way to a steep rise, marking the region where the dist_k probability distribution is densest; this point can be taken as the setting of the radius parameter Eps. Various models, such as Fourier, Gaussian and polynomial models, are therefore fitted to the data, the most suitable model is found, and its curve knee f(x0) is computed, giving radius Eps = f(x0).
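The Eps selection of step 212 can be sketched as follows. The function names are hypothetical, k (that is, k0) is taken as given rather than chosen by comparing curves, and the model-fitting step (Fourier, Gaussian, polynomial) is replaced by a simpler substitute: the knee is located directly on the empirical sorted k-dist curve as the point farthest below the chord joining its endpoints after both axes are scaled to [0, 1], a Kneedle-style heuristic. The full n × n matrix is fine for one data subset but grows quadratically with its size.

```python
import numpy as np


def knn_distribution(data: np.ndarray) -> np.ndarray:
    """DIST (n x n): pairwise Euclidean distances with each row sorted
    ascending. Column 0 is each point's distance to itself (0), so column k
    holds its k-nearest-neighbour distance."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sort(np.sqrt((diff ** 2).sum(axis=-1)), axis=1)


def eps_from_k_dist(data: np.ndarray, k: int) -> float:
    """Pick Eps at the knee of the ascending k-dist curve.

    Knee rule (an assumption, standing in for model fitting): scale both
    axes to [0, 1] and take the point lying farthest below the chord that
    joins the curve's endpoints.
    """
    dist_k = np.sort(knn_distribution(data)[:, k])
    x = np.linspace(0.0, 1.0, len(dist_k))
    span = dist_k[-1] - dist_k[0]
    y = (dist_k - dist_k[0]) / span if span > 0 else np.zeros_like(dist_k)
    x0 = int(np.argmax(x - y))  # after scaling, the chord is the line y = x
    return float(dist_k[x0])
```

On data with dense clusters and a few outliers, the sorted k-dist curve stays flat over the cluster points and jumps at the outliers, so the knee lands on the last flat values and Eps separates the two regimes.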
Step 213: after the radius Eps is obtained in step 212, the density threshold minPts is computed by first counting the number of objects in the Eps-neighbourhood of each point and then taking the mathematical expectation of these counts over all data objects as the value of minPts;
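Step 213 reduces to a mean over neighbourhood counts. In this sketch the neighbourhood of a point includes the point itself and the expectation is rounded to the nearest integer; both are assumptions, since the patent does not state them, and `min_pts_from_eps` is a hypothetical name.

```python
import numpy as np


def min_pts_from_eps(data: np.ndarray, eps: float) -> int:
    """MinPts = expected (mean) number of objects in a point's
    Eps-neighbourhood, counting the point itself, rounded to an integer."""
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    counts = (dist <= eps).sum(axis=1)
    return int(np.rint(counts.mean()))
```

For the four 1-D points 0, 0.5, 1 and 10 with Eps = 0.6, the counts are 2, 3, 2 and 1, so the mean (and MinPts) is 2.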
Step 214: using the values of Eps and minPts computed in the above steps as the main parameter settings, perform density-based DBSCAN clustering on each data subset. During clustering, the distance between data points is computed not with the Euclidean distance formula but with the interval number distance, finally yielding the clustering result of each data subset and the analysis of its data structure characteristics.
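The patent does not give its interval number distance formula. The sketch below assumes a common choice for interval-valued data, the root of the halved sum of squared differences between corresponding lower and upper bounds, $d(A,B) = \sqrt{\sum_j [(a_j^L - b_j^L)^2 + (a_j^U - b_j^U)^2] / 2}$, which reduces to the Euclidean distance when every interval has zero width. It plugs this distance into a standard DBSCAN over objects laid out as `(n, items, 2)`; the names `interval_distance_matrix` and `dbscan_intervals` are hypothetical.

```python
from collections import deque
from typing import List

import numpy as np

NOISE = -1
UNVISITED = -2


def interval_distance_matrix(objs: np.ndarray) -> np.ndarray:
    """Pairwise distances between interval-valued objects of shape (n, items, 2).

    Assumed metric: sqrt(sum over items of the squared lower-bound and
    squared upper-bound differences, divided by 2).
    """
    diff = objs[:, None, :, :] - objs[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=(2, 3)) / 2.0)


def dbscan_intervals(objs: np.ndarray, eps: float, min_pts: int) -> List[int]:
    """Standard DBSCAN using the interval distance; -1 marks noise."""
    dist = interval_distance_matrix(objs)
    neighbours = [np.flatnonzero(row <= eps) for row in dist]  # includes self
    labels = [UNVISITED] * len(objs)
    cluster = 0
    for i in range(len(objs)):
        if labels[i] != UNVISITED:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = deque(neighbours[i].tolist())
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours[j].tolist())
        cluster += 1
    return labels
```

The output of the interval rewrite in step 130 already has the `(n, items, 2)` layout, so in this sketch it could be passed to `dbscan_intervals` directly together with the Eps and MinPts from steps 212 and 213.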
Test results show that the method handles stream data processing effectively: as the data set size gradually increases, the running time shortens rapidly and efficiency is clearly higher than that of general query algorithms. In addition, the method effectively reduces the error caused by uncertain data and noise points, and it places few requirements on the characteristics of the data set.
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its embodiments. Those of ordinary skill in the art may make other changes or variations in different forms on the basis of the above description; it is not possible to list all embodiments here. Any obvious change or variation derived from the technical solution of the present invention remains within its scope of protection.
Claims (4)
1. A query method for uncertain high-dimensional traffic flow data based on a Hadoop distributed environment, characterized by comprising the following steps:
receiving and storing stream data through the MapReduce batch-processing computing framework, and preprocessing the stream data to obtain a data set;
performing parallelized adaptive DBSCAN clustering on the data set and generating query results in combination with the query requirements.
2. The query method according to claim 1, characterized in that receiving and storing stream data through the MapReduce batch-processing computing framework and preprocessing the stream data comprises:
using a sliding window model so that the MapReduce batch-processing computing framework receives and stores stream data transmitted in real time within a stream data environment;
performing data item screening and principal component analysis dimensionality reduction on the received stream data to obtain dimension-reduced data;
computing standard deviations from the dimension-reduced data and substituting them into an interval expression to obtain the data set.
3. The query method according to claim 1, characterized in that performing parallelized adaptive DBSCAN clustering on the data set and generating query results in combination with the query requirements comprises:
decomposing the data set into several data subsets;
performing adaptive DBSCAN clustering on each data subset to obtain its data distribution characteristics and data structure characteristics;
integrating the data distribution characteristics and data structure characteristics of the data subsets to obtain those of the whole data set;
partitioning the data according to the data structure characteristics of the whole data set and generating query results in combination with the query requirements.
4. The query method according to claim 3, characterized in that performing adaptive DBSCAN clustering on each data subset to obtain its data distribution characteristics and data structure characteristics comprises:
computing the KNN distribution of each data subset and analysing it statistically to obtain parameter settings;
performing DBSCAN clustering on each data subset with those parameter settings to obtain its data distribution characteristics and data structure characteristics.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810240305.8A CN108491507B (en) | 2018-03-22 | 2018-03-22 | Uncertain traffic flow data parallel continuous query method based on Hadoop distributed environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108491507A (en) | 2018-09-04 |
CN108491507B (en) | 2022-03-11 |
Family ID: 63319250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810240305.8A CN108491507B (en) (Expired - Fee Related) | Uncertain traffic flow data parallel continuous query method based on Hadoop distributed environment | 2018-03-22 | 2018-03-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108491507B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140244701A1 (en) * | 2013-02-25 | 2014-08-28 | Emc Corporation | Data analytics platform over parallel databases and distributed file systems |
CN104156463A (en) * | 2014-08-21 | 2014-11-19 | 南京信息工程大学 | Big-data clustering ensemble method based on MapReduce |
CN105677752A (en) * | 2015-12-30 | 2016-06-15 | 深圳先进技术研究院 | Streaming computing and batch computing combined processing system and method |
CN107341210A (en) * | 2017-06-26 | 2017-11-10 | 西安理工大学 | C-DBSCAN-K clustering algorithm under Hadoop platform |
Application events: 2018-03-22, CN application CN201810240305.8A (patent CN108491507B); status: not active (Expired - Fee Related)
Non-Patent Citations (2)
Title |
---|
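Feng Qingping (冯青平) et al.: "Traffic state recognition based on MapReduce and clustering algorithms" (基于MapReduce和聚类算法的交通状态识别), Information Technology (《信息技术》) * |
Xie Juanying (谢娟英) et al.: "K-nearest-neighbour optimized clustering algorithm by fast search of density peaks" (K近邻优化的密度峰值快速搜索聚类算法), Scientia Sinica Informationis (《中国科学:信息科学》) * |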
Also Published As
Publication number | Publication date |
---|---|
CN108491507B (en) | 2022-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sun et al. | | A machine learning method for predicting driving range of battery electric vehicles |
CN110782658B (en) | | Traffic prediction method based on LightGBM algorithm |
CN112069310B (en) | | Text classification method and system based on active learning strategy |
CN110533112B (en) | | Internet of vehicles big data cross-domain analysis and fusion method |
CN112037009A (en) | | Risk assessment method for consumption credit scene based on random forest algorithm |
CN110555989B (en) | | Xgboost algorithm-based traffic prediction method |
CN111008726B (en) | | Class picture conversion method in power load prediction |
Meng et al. | | A two-stage short-term traffic flow prediction method based on AVL and AKNN techniques |
CN107832456B (en) | | Parallel KNN text classification method based on critical value data division |
CN110619419B (en) | | Passenger flow prediction method for urban rail transit |
CN111105035A (en) | | Neural network pruning method based on combination of sparse learning and genetic algorithm |
CN111832839B (en) | | Energy consumption prediction method based on sufficient incremental learning |
Vychuzhanin et al. | | Analysis and structuring diagnostic large volume data of technical condition of complex equipment in transport |
CN114066073A (en) | | Power grid load prediction method |
CN113743453A (en) | | Population quantity prediction method based on random forest |
CN114463978B (en) | | Data monitoring method based on track traffic information processing terminal |
CN112035536A (en) | | Electric automobile energy consumption prediction method considering dynamic road network traffic flow |
CN116737360A (en) | | Multi-reference driving parameter adjusting server energy efficiency adjusting method and device for mixed load |
CN108491507A (en) | | A kind of parallel continuous Query method of uncertain traffic flow data based on Hadoop distributed environments |
Wang et al. | | A Second-Order HMM Trajectory Prediction Method based on the Spark Platform. |
CN114185956A (en) | | Data mining method based on canty and k-means algorithm |
CN111353523A (en) | | Method for classifying railway customers |
Sun et al. | | A novel abnormal traffic incident detection method based on improved support vector machine |
Zhang et al. | | Understanding mobility via deep multi-scale learning |
Luo et al. | | An interpretable prediction model for pavement performance prediction based on XGBoost and SHAP |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220311 |