CN1790344A - Method and apparatus for sampling and storing urban road traffic flow data - Google Patents

Method and apparatus for sampling and storing urban road traffic flow data

Info

Publication number
CN1790344A
Authority
CN
China
Prior art keywords
sample
sampling
data
optimal
traffic flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200410098917
Other languages
Chinese (zh)
Inventor
于雷
吴家庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN 200410098917
Publication of CN1790344A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and apparatus for sampling and storing urban traffic flow data. The sum of squared errors (SSE) method and the cross-validation (CV) method are applied to massive traffic flow data: the optimal sample size is determined first, each sample is then compared with the overall characteristics of the sample set, and the sample with the smallest difference within the given sample size is selected as the optimal sample. Storing the optimal sample data in place of the whole sample set preserves the patterns and information of the raw data. The apparatus comprises a raw data reading device, a sample size determination device, an SSE optimal sample device, a CV optimal sample device and an optimal sample data output device. The invention can also test how the optimal sample changes with the sample size and how sensitive the optimal sample is to the weights.

Description

A method and apparatus for sampling and storing urban road traffic flow data
Technical field
The present invention relates to a method and apparatus for sampling and storing urban road traffic flow data.
Background art
Sample surveys are an important means of obtaining statistical data and are widely used in social, economic and scientific research. Sampling techniques are an important branch of statistics. The basic probability sampling methods include simple random sampling, stratified sampling, cluster sampling, multistage sampling, systematic (equal-interval) sampling and unequal-probability sampling.
Raw road traffic flow data can be used directly for real-time adaptive traffic control (including signal timing and incident prediction), and can also serve potential applications such as traffic planning. Sampling the raw information effectively extracts the useful information and reduces the data volume, which better meets the needs of different users. For example, traffic planning software needs long-term trend analysis of freeway facilities, which means converting gigabytes of data into a single page of useful information. The key to a sampling technique is its precision and reliability. First, the population variance, i.e. the dispersion of the population, is estimated from historical data; the larger the variance, the larger the sample size required. Second, the required precision is determined; the higher the precision, the larger the sample size required. Finally, there is the sampling technique itself: a more efficient sampling design reduces the sample size required.
The paper "Sample size analysis in traffic information collection", written by high sea otter [sic] for the Second Beijing International ITS Conference, gave a method for determining a suitable sample size in data collection. For road transport, the Ministry of Communications issued the "Handbook for Investigators of the Road Transport Industry-wide Statistical Sampling Survey" in 1999, which specifically discusses the construction and selection of samples. Sampling traffic flow data, however, differs from sampling in the traditional mathematical sense.
A review of data management practice at existing urban traffic command centers shows that the difficulties concentrate in three areas: data size, data format and data quality. Most traffic control centers store historical mass data in ways that are too simple and arbitrary, and the data are not archived. Yet these data are very useful for meeting different future needs, so they need to be sampled for storage.
Summary of the invention
To overcome the deficiencies of the prior art, the invention provides a method and apparatus for sampling and storing urban road traffic flow data. The technical solution is as follows. The method comprises two optimization-based sampling methods. The first is the sum of squared errors (SSE) method, which compares each sample with the overall characteristics of the sample set and selects the sample with the smallest difference as the optimal sample within the given sample size. The second is the cross-validation (CV) method, which compares each sample with the overall characteristics of the remaining samples and selects the sample with the smallest difference as the optimal sample within the given sample size.
The apparatus for sampling and storing urban road traffic flow data comprises five parts: a raw data reading device, an optimal sample size determination device, a device that determines the optimal sampling day by the SSE method, a device that determines the optimal sampling day by the CV method, and an optimal sampling day data output device. The five parts are interconnected and together implement the method above.
An important part of traffic flow data management is storing massive real-time data. Because the volume of ITS data is huge and not all of it needs to be stored, one can select, from a group of similar data sets, a data sample that represents the whole. When this sample stands in for the whole data set, the system only has to keep the optimal sample data; this is what data sampling means here. In practice, samples sharing a common attribute are chosen for a given sample size (for example a time attribute: all are Monday morning-peak data), the sampling method is applied to obtain the optimal sample, and only the data of the optimal sampling day are stored rather than all the sample data, which effectively reduces the storage requirement. The invention involves two sampling methods, the sum of squared errors (SSE) method and the cross-validation (CV) method. Statistical methods are used to obtain the optimal sample size for the massive traffic flow data; each sample is then compared with the overall characteristics of the sample set, and the sample with the smallest difference is the optimal sample within the given sample size. Replacing the whole sample set with the optimal sample data saves storage space while retaining as much of the patterns and information of the raw data as possible. With the weights fixed, one can test how the optimal sample changes as the sample size changes; with the sample size fixed, one can test how the optimal sample changes under different weights, that is, the sensitivity of the optimal sample to the weights.
The invention draws on the central limit theorem of mathematical statistics and on the optimization-based SSE and CV methods. The central limit theorem relates the sample size needed for a skewed distribution to that needed for a normal distribution; it bridges different populations, so that the optimal sample size can be obtained with the normal-distribution method. The SSE and CV methods then sample the massive urban road traffic flow data to obtain the optimal sample data.
In principle, the method uses statistical methods to obtain the optimal sample size for the massive traffic flow data, and then compares each sample with the overall characteristics of the sample set; the sample with the smallest difference is the optimal sample within the given sample size.
The optimal sample size is obtained by way of the central limit theorem. Let the sample $(X_1, X_2, \ldots, X_n)$ be drawn from a normal population $N(\mu, \sigma^2)$. If the population variance is unknown, the statistic $T$ follows the $t$ distribution with $n-1$ degrees of freedom, and the optimal sample size can be obtained for a given significance level $\alpha$ (i.e. confidence level $1-\alpha$). If the population variance is known, the statistic $U = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$ follows the standard normal distribution, and the optimal sample size can be obtained for a given permissible error. If the actual distribution of the surveyed quantity does not fully conform to a normal distribution, the central limit theorem is used to relate the sample size for the skewed distribution to that for a normal distribution; this bridges the different populations, so that the optimal sample size can again be obtained with the normal-distribution method.
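For the known-variance case, the sample size follows from requiring that the confidence half-width z * sigma / sqrt(n) not exceed the permissible error E, which gives n >= (z * sigma / E)^2, where z is the standard normal quantile at 1 - alpha/2. This is the standard textbook result rather than a formula quoted from the patent. A minimal sketch follows; the function name and the numeric inputs are illustrative assumptions only.

```python
import math
from statistics import NormalDist


def sample_size_known_variance(sigma: float, max_error: float, alpha: float = 0.05) -> int:
    """Smallest n with z * sigma / sqrt(n) <= max_error (hypothetical helper).

    sigma      -- known population standard deviation
    max_error  -- permissible error E on the sample mean
    alpha      -- significance level (confidence level is 1 - alpha)
    """
    # Two-sided standard normal quantile, e.g. about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / max_error) ** 2)


# Illustrative numbers only: sigma = 10, permissible error = 2, 95 % confidence.
n = sample_size_known_variance(sigma=10.0, max_error=2.0, alpha=0.05)
print(n)  # 97
```

As the background section notes, a larger variance or a tighter permissible error both increase the required sample size in this formula.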
The SSE method obtains the optimal sampling day as follows. First, the mean of the whole batch of data is computed; the data contain three different traffic variables: flow, speed and occupancy. The deviation of each sample from this mean sample is then found. To bring the different traffic variables to a common scale, a quantization system is introduced into the sampling method and the quantized value of each variable is calculated. The quantized value of each variable is then multiplied by the weight of that variable, and the sample for which the sum of these three products is smallest among all samples is the optimal sample.
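The SSE steps above can be sketched as follows. The patent does not spell out the quantization system; because the quantized values in the tables run from 1.00 to 10.00, this sketch assumes a linear min-max scaling of each variable's SSE onto [1, 10]. The data layout, function names and toy numbers are likewise hypothetical.

```python
from typing import Dict, List, Sequence

VARIABLES = ("flow", "speed", "occupancy")
Day = Dict[str, List[float]]  # variable name -> time series for one day


def mean_profile(days: Dict[str, Day], var: str) -> List[float]:
    """Mean time series of one variable over all sample days."""
    series = [day[var] for day in days.values()]
    return [sum(vals) / len(vals) for vals in zip(*series)]


def sse(series: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of squared errors between one day and the reference profile."""
    return sum((x - r) ** 2 for x, r in zip(series, reference))


def quantize(values: Dict[str, float], lo: float = 1.0, hi: float = 10.0) -> Dict[str, float]:
    """ASSUMPTION: linear min-max scaling onto [lo, hi] to unify scales."""
    vmin, vmax = min(values.values()), max(values.values())
    if vmax == vmin:
        return {k: lo for k in values}
    return {k: lo + (hi - lo) * (v - vmin) / (vmax - vmin) for k, v in values.items()}


def sse_scores(days: Dict[str, Day], weights: Sequence[float]) -> Dict[str, float]:
    """Weighted sum of quantized SSE values per day (lower is better)."""
    totals = {name: 0.0 for name in days}
    for var, w in zip(VARIABLES, weights):
        ref = mean_profile(days, var)
        q = quantize({name: sse(day[var], ref) for name, day in days.items()})
        for name in days:
            totals[name] += w * q[name]
    return totals


def optimal_day(scores: Dict[str, float]) -> str:
    return min(scores, key=scores.get)


# Toy data: day "B" coincides with the mean profile of every variable.
days = {
    "A": {"flow": [10.0, 10.0], "speed": [50.0, 50.0], "occupancy": [5.0, 5.0]},
    "B": {"flow": [20.0, 20.0], "speed": [60.0, 60.0], "occupancy": [6.0, 6.0]},
    "C": {"flow": [30.0, 30.0], "speed": [70.0, 70.0], "occupancy": [7.0, 7.0]},
}
scores = sse_scores(days, (1 / 3, 1 / 3, 1 / 3))
print(optimal_day(scores))  # B
```

Quantizing each variable separately before weighting keeps a variable with large raw magnitudes (such as flow) from dominating the weighted sum.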
The cross-validation (CV) method is similar in principle to the SSE method: it also obtains the optimal sample by comparing each individual sample with a mean of the sample set. The difference is that, in the CV method, a sample is not compared with the mean of all samples but with the mean of the samples that remain after that sample is removed; the sample with the smallest difference is the optimal sample.
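The leave-one-out comparison that distinguishes the CV method can be sketched for a single traffic variable as below. The function names and toy flows are hypothetical; combining several variables would presumably reuse the same quantization and weighting as the SSE method, which this sketch leaves out.

```python
from typing import Dict, List


def leave_one_out_mean(series_by_day: Dict[str, List[float]], left_out: str) -> List[float]:
    """Mean time series over every day except the one being evaluated."""
    others = [s for name, s in series_by_day.items() if name != left_out]
    return [sum(vals) / len(vals) for vals in zip(*others)]


def cv_deviations(series_by_day: Dict[str, List[float]]) -> Dict[str, float]:
    """Squared deviation of each day from the mean of the remaining days."""
    result = {}
    for name, series in series_by_day.items():
        ref = leave_one_out_mean(series_by_day, name)
        result[name] = sum((x - r) ** 2 for x, r in zip(series, ref))
    return result


# Toy flows for three days, two time intervals each (illustrative only).
flows = {"A": [10.0, 10.0], "B": [20.0, 20.0], "C": [33.0, 33.0]}
deviations = cv_deviations(flows)
best = min(deviations, key=deviations.get)
print(best, deviations[best])  # B 4.5
```

Because the candidate day is excluded from its own reference mean, an outlying day cannot pull the reference toward itself, which is the practical difference from the SSE comparison.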
The beneficial effect of the invention is that the method can determine the optimal sampling day within a mass of data, retaining as much of the patterns and information of the raw data as possible while reducing the data volume. It also saves a large amount of storage space and meets the needs of different users for research on raw data. The invented sampling method can be summarized as follows. First, the raw data are obtained from the traffic control center. Then the data quality is checked and missing or erroneous data are repaired with a systematic method. Finally, the SSE method or the CV method is used to optimize over the data and select the optimal sample data. Taking whole-day sampling as an example, the purpose of the whole process is to select, for a specific day of the week (such as Monday or Tuesday), one day from the sample as the sampling day; the selected sampling day is the one that best represents that day of the week over the whole sampling period. The sampling unit can be a particular day, a whole week, a weekend, or even a specific time period. When new data arrive, they replace the oldest data in a rolling manner and the sampling program is run again. After sampling, the program stores the raw data that best represent the whole data stream, which significantly reduces the storage space required.
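The rolling procedure described above (new data replace the oldest data, then the sampling program is rerun) can be sketched with a fixed-length window. The class and method names are hypothetical, and for brevity the representative day is chosen by the squared deviation of a single variable from the window mean rather than by the full weighted SSE or CV score.

```python
from collections import deque
from typing import Iterable, List


class RollingSampler:
    """Keeps the most recent `window` days and reselects the representative day."""

    def __init__(self, window: int) -> None:
        # A deque with maxlen drops the oldest entry automatically on append.
        self.days = deque(maxlen=window)

    def add_day(self, label: str, series: Iterable[float]) -> None:
        self.days.append((label, list(series)))

    def representative(self) -> str:
        if not self.days:
            raise ValueError("no data in the window")
        series: List[List[float]] = [s for _, s in self.days]
        mean = [sum(vals) / len(vals) for vals in zip(*series)]
        # Day with the smallest squared deviation from the window mean.
        best = min(self.days, key=lambda d: sum((x - m) ** 2 for x, m in zip(d[1], mean)))
        return best[0]


sampler = RollingSampler(window=3)
for label, value in (("d1", 10.0), ("d2", 20.0), ("d3", 30.0)):
    sampler.add_day(label, [value])
first = sampler.representative()   # window is d1..d3
sampler.add_day("d4", [40.0])      # d1 rolls out of the window
second = sampler.representative()  # window is d2..d4
print(first, second)  # d2 d3
```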
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a diagram of the apparatus of the invention;
Fig. 2 is a flow chart of the apparatus of the invention;
Fig. 3 shows the quantized values and weighted values for a sample size of 10;
Fig. 4 shows the time-flow curves of the optimal sampling day, the worst sampling day and the population mean;
Fig. 5 shows the quantized values and weighted values for a sample size of 25 (including the ten weights).
Embodiment
Table 1 lists the quantized results for a sample size of 10; Table 2 is the staircase table of the weighted quantized values of flow, speed and occupancy as a function of sample size; Table 3 lists the quantized results for a sample size of 25; Table 4 shows how the optimal sampling day under the ten weights changes with sample size.
A corresponding apparatus for sampling and storing urban road traffic flow data has been designed according to the invented method; its structure is shown in Fig. 1. The whole process is implemented by a VB program connected to a back-end Oracle database. The apparatus consists of five main parts: the raw data reading device, the optimal sample size determination device, the device that determines the optimal sampling day by the SSE method, the device that determines the optimal sampling day by the CV method, and the optimal sampling day data output device. The five parts are interconnected and together implement the invented method. Data that have passed quality control go through the sampling apparatus, which yields the optimal sampling day and the raw data of that day. Throughout the process the user sets each parameter as required, so that the user's needs are fully taken into account.
Once the raw data to be analyzed are ready, the parameters of the data to be read (such as morning peak, evening peak or whole day) are set in the raw data reading device to indicate the type of data; after the parameters are set, the raw data are read in. After the data have been read successfully, the sampling parameters are set, and the data enter either the SSE optimal sampling day device or the CV optimal sampling day device. Both devices save the intermediate steps of processing the raw data in the corresponding tables of the Oracle database. The meaning of some important tables in the database is described first, as in Table 5. Finally, the optimal sampling day data output device produces the results. The output device takes the different needs of users into account: if the user needs not only the optimal sampling day for each sample size but also the optimal sampling day data under different weights for each sample size, the output device can provide them, and it can even carry out some necessary analysis.
Fig. 2 is the flow chart of the apparatus. The data obtained from the data reading device are imported into the corresponding database tables of the system. A series of fault-tolerance steps is then carried out and the raw data tables are corrected according to a given algorithm. The corrected data enter either the SSE optimal sampling day device or the CV optimal sampling day device. The sampling results are stored and passed to the output device, which analyzes the results in detail and makes some predictions in light of actual conditions.
In short, the user only needs to import the raw data to be analyzed into the corresponding tables and carry out the corresponding operations on the interface of the sampling apparatus to obtain the results. After the system has finished processing the imported data, the results are saved to the corresponding database.
With reference to the attached tables, an example is given below to illustrate an embodiment of the invention (because the SSE and CV processes are similar, the description focuses on the SSE method; the data come from Beijing's Third Ring Road):
Ten weight combinations are used, denoted ws1, ws2, ..., ws10: ws1 is (1/3, 1/3, 1/3); ws2 is (1/2, 1/4, 1/4); ws3 is (1/4, 1/2, 1/4); ws4 is (1/4, 1/4, 1/2); ws5 is (1/5, 3/10, 1/2); ws6 is (1/5, 1/2, 3/10); ws7 is (3/10, 1/5, 1/2); ws8 is (3/10, 1/2, 1/5); ws9 is (1/2, 1/5, 3/10); ws10 is (1/2, 3/10, 1/5). The three components are the weights of flow, speed and occupancy, in that order. ws1 = (1/3, 1/3, 1/3) means that flow, speed and occupancy are equally important in the sampling process; ws2 = (1/2, 1/4, 1/4) means that the sampling process places more emphasis on flow; and so on.
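The ten weight combinations, and the way a weight turns three quantized values into one total, can be written out as a short sketch. The dictionary and function names are hypothetical; the quantized values used in the example are those of 2002-3-13 in Table 1 (flow 3.12, speed 5.39, occupancy 2.30), whose ws1 total is listed there as 3.60.

```python
from fractions import Fraction as F

# (flow, speed, occupancy) weights for the ten combinations.
WEIGHTS = {
    "ws1": (F(1, 3), F(1, 3), F(1, 3)),
    "ws2": (F(1, 2), F(1, 4), F(1, 4)),
    "ws3": (F(1, 4), F(1, 2), F(1, 4)),
    "ws4": (F(1, 4), F(1, 4), F(1, 2)),
    "ws5": (F(1, 5), F(3, 10), F(1, 2)),
    "ws6": (F(1, 5), F(1, 2), F(3, 10)),
    "ws7": (F(3, 10), F(1, 5), F(1, 2)),
    "ws8": (F(3, 10), F(1, 2), F(1, 5)),
    "ws9": (F(1, 2), F(1, 5), F(3, 10)),
    "ws10": (F(1, 2), F(3, 10), F(1, 5)),
}


def weighted_total(quantized, weights):
    """Sum of quantized value times weight over flow, speed and occupancy."""
    return sum(float(w) * q for w, q in zip(weights, quantized))


# Every combination distributes a total weight of exactly 1.
all_normalized = all(sum(w) == 1 for w in WEIGHTS.values())

row_2002_03_13 = (3.12, 5.39, 2.30)  # quantized flow, speed, occupancy (Table 1)
total_ws1 = weighted_total(row_2002_03_13, WEIGHTS["ws1"])
print(all_normalized, f"{total_ws1:.2f}")  # True 3.60
```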
The Beijing road traffic flow data used come from 125 loop detectors on the Third Ring Road. Each detector produces data at 2-minute intervals every day, mainly comprising detector number, date, time, lane number, flow, speed, occupancy and long-vehicle flow. The test data span about 10 months, from March 2002 to December 2002. The embodiment is illustrated by sampling the whole-day data of consecutive Wednesdays for lane 2 of detector 03006 (located at the northwest corner of the Hujialou intersection).
The data are tested in two ways: with a single weight and with multiple weights. The purpose of the single-weight test is to find the optimal sampling day for a given sample size and to observe how the optimal sampling day changes with sample size. The main purpose of the multiple-weight test is a side-by-side comparison: with the sample size fixed, observe how the optimal sampling day changes under different weights, i.e. test the sensitivity of the optimal sampling day to the weights. A sample size of 5 means sampling the data of the five consecutive Wednesdays from 2002-3-13 to 2002-4-10. Similarly, a sample size of 36 means sampling the 36 consecutive Wednesdays from 2002-3-13 to 2002-11-27. The two tests and their analysis are described below.
Single-weight test and analysis
The test data are the whole-day data of lane 2 of detector 03006 on consecutive Wednesdays from 2002-3-13 to 2002-11-27; the weight is ws1 (1/3, 1/3, 1/3) and the sample size ranges from 5 to 36. The test for a sample size of 10 is described first, then the change as the sample size goes from 5 to 36 is analyzed, and finally the whole set of results is analyzed and summarized.
The optimal sampling day is the sampling day whose weighted quantized result is the minimum for a given sample size. The analysis is illustrated here with weight ws1. Table 1 lists the quantized results obtained by running the sampling apparatus, according to the invented sampling method, on the lane 2 data of 10 consecutive Wednesdays (i.e. a sample size of 10).
Fig. 3 shows the quantized values of flow, speed and occupancy and the weighted values for a sample size of 10.
As can be seen, the quantized result for 2002-4-17, 1.45, is the minimum, so for a sample size of 10 the optimal sampling day is 2002-4-17. In other words, the data of the sixth Wednesday, 2002-4-17, are closest to the mean of the data of all ten Wednesdays. Conversely, 2002-5-15, which has the maximum quantized result of 7.67, is the worst sampling day. The worst sampling day is the day whose weighted result is the maximum for a given sample size; it differs most from the population mean. Fig. 4 shows the time-flow curves of the optimal sampling day, the worst sampling day and the population mean.
It is easy to conclude that the data of the optimal sampling day match the population mean well, while the data of the worst sampling day differ most from it. The data of the optimal sampling day can therefore represent the data of the other days in this sample. For the ten days studied, keeping only the data of 2002-4-17 is enough to capture the characteristics and patterns of all ten days.
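Selecting the optimal and worst sampling days from the ws1 totals of Table 1 amounts to taking the minimum and the maximum. The sketch below uses the totals listed in Table 1, with the 1.45 for 2002-4-17 taken from the text above; the variable names are hypothetical.

```python
# ws1 weighted totals for the ten Wednesdays (sample size 10, Table 1).
totals_ws1 = {
    "2002-3-13": 3.60,
    "2002-3-20": 4.24,
    "2002-3-27": 6.21,
    "2002-4-3": 6.91,
    "2002-4-10": 2.11,
    "2002-4-17": 1.45,
    "2002-4-24": 4.02,
    "2002-5-8": 6.39,
    "2002-5-15": 7.67,
    "2002-5-22": 2.78,
}

# Optimal day: smallest weighted total; worst day: largest weighted total.
optimal_day = min(totals_ws1, key=totals_ws1.get)
worst_day = max(totals_ws1, key=totals_ws1.get)
print(optimal_day, worst_day)  # 2002-4-17 2002-5-15
```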
For other sample sizes (from 5 to 36) the analysis is the same, and it yields the staircase table (Table 2) showing how the weighted value changes with sample size between 5 and 36. The staircase table shows directly how the optimal sampling day changes with sample size. According to Table 2, when the sample size is 5, the day with the minimum weighted value of 1.51, 2002-4-10, is the optimal sampling day; when the sample size is 10, the day with the minimum weighted value of 1.45, 2002-4-17, is the optimal sampling day; and so on.
The tests and analysis above show that the optimal sampling day keeps changing as the sample size increases: it changes if and only if a newly added sample day is closer to the average level of the whole sample set. The results on real data show that a large sample size yields a more representative optimal sampling day than a small one. Of course, if sampling is designed as a rolling, continuous process in which the optimal sampling day only represents the interval covered by the chosen sample size, the size of the sample matters less.
Multiple-weight test and analysis
The main purpose of the multiple-weight test is a side-by-side comparison: with the sample size fixed, observe how the optimal sampling day changes under different weights, i.e. test the sensitivity of the optimal sampling day to the weights.
The analysis is similar to that for a single weight. Table 3 lists the quantized results for a sample size of 25.
Fig. 5 shows the quantized values of speed, flow and occupancy and the weighted values for a sample size of 25. From the way the weighted value of each parameter changes with sample size, Table 4 is obtained, which shows how the optimal sampling day under each weight changes with sample size. Table 4 shows that the optimal sampling day keeps changing as the sample size increases; for a fixed sample size, however, the optimal sampling days under the different weights are very likely to coincide.
The comparison across weights shows that, on the Beijing test data, the SSE sampling method is not very sensitive to changes in the weights: the optimal sampling day obtained with any one weight is, with high probability, the same as or close to the optimal sampling day obtained with the other weights. This also shows that the flow, speed and occupancy data collected by the detector are self-consistent: when the data of one variable are accurate, the data of the other variables are accurate as well.
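The weight-sensitivity check for a sample size of 25 can be reproduced from the quantized values in Table 3. To keep the sketch short, only the four lowest-scoring days are included (2002-4-10, 2002-4-17, 2002-8-28 and 2002-9-4); in Table 3 every other day has a higher total under all ten weights. The names are hypothetical, and the totals are recomputed from the rounded table values.

```python
from collections import Counter

# (flow, speed, occupancy) weights ws1..ws10.
WEIGHTS = {
    "ws1": (1 / 3, 1 / 3, 1 / 3), "ws2": (0.5, 0.25, 0.25),
    "ws3": (0.25, 0.5, 0.25), "ws4": (0.25, 0.25, 0.5),
    "ws5": (0.2, 0.3, 0.5), "ws6": (0.2, 0.5, 0.3),
    "ws7": (0.3, 0.2, 0.5), "ws8": (0.3, 0.5, 0.2),
    "ws9": (0.5, 0.2, 0.3), "ws10": (0.5, 0.3, 0.2),
}

# Quantized (flow, speed, occupancy) values from Table 3, sample size 25.
QUANTIZED = {
    "2002-4-10": (1.23, 2.37, 1.53),
    "2002-4-17": (1.46, 1.88, 1.76),
    "2002-8-28": (1.00, 1.49, 1.39),
    "2002-9-4": (1.10, 1.00, 1.93),
}


def optimal_day(weights):
    """Day with the smallest weighted sum of quantized values."""
    return min(
        QUANTIZED,
        key=lambda day: sum(w * q for w, q in zip(weights, QUANTIZED[day])),
    )


winners = {name: optimal_day(w) for name, w in WEIGHTS.items()}
counts = Counter(winners.values())
print(counts)
```

Seven of the ten weights select 2002-8-28 and the three speed-heavy weights (ws3, ws6, ws8) select 2002-9-4, which agrees with the sample-size-25 row of Table 4 and illustrates the low sensitivity to the weights described above.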
Of course, for the same sample size the optimal sampling day may still change as the weights change; that is, the choice of weights does influence the optimal sampling day. In practice, a suitable weight combination should be chosen after analyzing the actual conditions and requirements of the application.
Table 1. Quantized results for a sample size of 10 (Flow, Speed and Occupancy are quantized values; Total is the weighted total under ws1)
No. Date Flow Speed Occupancy Total(ws1)
1 2002-3-13 3.12 5.39 2.30 3.60
2 2002-3-20 10.00 1.72 1.00 4.24
3 2002-3-27 5.54 9.12 3.98 6.21
4 2002-4-3 3.96 9.86 6.90 6.91
5 2002-4-10 1.00 3.43 1.90 2.11
6 2002-4-17 1.23 1.00 2.13 1.45
7 2002-4-24 2.97 4.45 4.64 4.02
8 2002-5-8 3.76 8.25 7.16 6.39
9 2002-5-15 3.02 10.00 10.00 7.67
10 2002-5-22 2.74 3.35 2.25 2.78
Table 2. Staircase table of the weighted quantized value of flow, speed and occupancy as a function of sample size (columns are sample sizes 5 to 36; "-" marks a date not yet included in the sample)
Date 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
2002-3-13 2.59 2.93 2.92 3.17 3.82 3.60 3.26 3.04 3.42 2.86 2.73 2.75 2.63 2.57 2.57 2.54 2.45 2.47 2.45 2.57 2.69 2.82 2.80 2.86 2.83 2.88 2.93 2.97 3.00 2.98 2.74 2.53
2002-3-20 4.00 4.00 4.00 4.00 4.25 4.24 4.15 4.09 4.20 4.06 4.02 4.00 4.03 4.00 4.04 4.04 4.03 4.04 4.03 4.15 4.30 4.34 4.34 4.38 4.37 4.39 4.41 4.43 4.44 4.43 4.33 4.25
2002-3-27 5.41 6.01 5.77 6.36 6.19 6.21 6.15 6.21 6.69 5.43 4.91 5.09 5.13 5.31 5.39 5.40 5.45 5.42 5.47 5.54 5.58 5.52 5.31 5.26 5.23 5.17 5.20 5.20 5.15 5.10 4.69 4.44
2002-4-3 7.93 7.87 7.98 7.93 7.10 6.91 6.61 6.36 6.31 5.26 4.64 4.61 4.50 4.55 4.38 4.36 4.41 4.30 4.28 4.45 4.55 4.59 4.45 4.45 4.38 4.35 4.41 4.49 4.57 4.66 4.44 4.31
2002-4-10 1.51 1.83 1.67 1.84 2.29 2.11 1.92 1.74 1.86 1.63 1.52 1.52 1.52 1.44 1.51 1.46 1.42 1.44 1.40 1.58 1.71 1.78 1.76 1.82 1.81 1.83 1.87 1.90 1.92 1.92 1.79 1.65
2002-4-17 - 1.90 2.14 1.69 1.54 1.45 1.47 1.45 1.47 1.41 1.40 1.44 1.40 1.41 1.34 1.32 1.34 1.31 1.31 1.54 1.70 1.72 1.75 1.77 1.67 1.70 1.73 1.79 1.88 1.93 1.80 1.85
2002-4-24 - - 3.74 4.39 3.88 4.02 4.14 4.18 4.09 3.71 3.09 3.13 3.30 3.38 3.46 3.39 3.45 3.44 3.45 3.62 3.69 3.55 3.32 3.27 3.33 3.24 3.25 3.18 3.11 3.05 2.87 2.78
2002-5-8 - - - 7.47 6.58 6.39 6.08 5.84 5.73 4.74 4.22 4.21 4.07 4.06 3.86 3.86 3.85 3.65 3.67 3.87 4.02 4.04 3.92 3.99 3.89 3.90 3.97 4.12 4.22 4.31 4.11 3.97
2002-5-15 - - - - 7.52 7.67 7.79 7.90 7.84 6.67 5.46 5.63 5.64 5.86 5.85 5.84 6.00 5.87 5.92 6.06 6.06 5.81 5.34 5.18 5.15 5.01 5.02 5.01 4.97 4.99 4.73 4.62
2002-5-22 - - - - - 2.78 2.51 2.23 2.32 2.08 2.00 1.92 1.85 1.77 1.75 1.73 1.68 1.68 1.62 1.81 1.97 2.07 2.09 2.17 2.15 2.24 2.29 2.36 2.42 2.46 2.36 2.29
2002-5-29 - - - - - - 2.76 2.34 2.59 2.17 2.26 2.18 2.05 1.88 1.84 1.80 1.67 1.69 1.63 1.78 1.93 2.11 2.15 2.27 2.22 2.34 2.39 2.48 2.54 2.59 2.44 2.29
2002-6-5 - - - - - - - 3.07 3.18 2.66 2.65 2.41 2.24 2.08 2.06 2.04 1.91 1.95 1.85 2.04 2.21 2.38 2.41 2.51 2.47 2.61 2.68 2.78 2.82 2.86 2.76 2.63
2002-6-12 - - - - - - - - 8.21 7.08 5.99 5.96 5.95 5.99 5.83 5.83 5.84 5.67 5.67 5.88 5.96 5.91 5.58 5.58 5.56 5.44 5.50 5.55 5.57 5.63 5.46 5.32
2002-6-19 - - - - - - - - - 7.39 6.83 6.86 6.56 6.48 6.37 6.37 6.31 6.17 6.16 6.20 6.25 6.46 6.29 6.39 6.33 6.39 6.47 6.60 6.65 6.62 6.05 5.63
2002-6-26 - - - - - - - - - - 7.84 7.91 7.83 7.98 8.00 7.98 8.03 8.05 8.03 8.08 8.07 7.91 7.25 7.18 7.22 7.16 7.17 7.17 7.00 6.95 6.35 6.11
2002-7-3 - - - - - - - - - - - 4.61 4.51 4.39 4.29 4.19 4.13 4.11 4.05 4.18 4.28 4.40 4.30 4.40 4.40 4.48 4.51 4.62 4.66 4.67 4.35 4.17
2002-7-10 - - - - - - - - - - - - 7.03 6.89 6.70 6.71 6.58 6.51 6.45 6.44 6.47 6.75 6.58 6.70 6.58 6.75 6.82 6.92 7.00 7.08 6.54 6.12
2002-7-17 - - - - - - - - - - - - - 4.71 4.59 4.54 4.39 4.39 4.30 4.32 4.37 4.60 4.55 4.69 4.65 4.81 4.86 4.97 5.02 5.03 4.56 4.21
2002-7-24 - - - - - - - - - - - - - - 6.26 6.23 6.18 6.00 6.01 6.08 6.13 6.30 6.15 6.26 6.18 6.24 6.32 6.44 6.55 6.66 6.13 5.76
2002-7-31 - - - - - - - - - - - - - - - 2.95 2.94 2.92 2.89 3.05 3.13 3.17 3.20 3.27 3.30 3.29 3.30 3.33 3.38 3.34 3.05 2.90
2002-8-7 - - - - - - - - - - - - - - - - 1.67 1.72 1.60 1.77 1.95 2.16 2.17 2.27 2.24 2.40 2.49 2.60 2.65 2.72 2.62 2.50
2002-8-14 - - - - - - - - - - - - - - - - - 5.80 5.90 5.96 6.00 6.09 5.97 6.08 5.99 5.95 6.00 6.09 6.21 6.25 5.74 5.42
2002-8-21 - - - - - - - - - - - - - - - - - - 2.55 2.73 2.87 3.02 2.93 3.00 2.07 3.09 3.17 3.29 3.30 3.35 3.26 3.16
2002-8-28 - - - - - - - - - - - - - - - - - - - 1.14 1.29 1.40 1.45 1.53 1.50 1.57 1.61 1.69 1.74 1.77 1.67 1.59
2002-9-4 - - - - - - - - - - - - - - - - - - - - 1.34 1.31 1.26 1.25 1.26 1.24 1.26 1.25 1.26 1.26 1.26 1.25
2002-9-11 - - - - - - - - - - - - - - - - - - - - - 7.84 7.18 7.02 7.05 6.85 6.82 6.71 6.58 6.52 6.06 5.89
2002-9-18 - - - - - - - - - - - - - - - - - - - - - - 7.80 7.73 7.72 7.69 7.72 7.68 7.52 7.49 7.00 6.71
2002-9-25 - - - - - - - - - - - - - - - - - - - - - - - 4.63 4.62 4.54 4.58 4.58 4.51 4.51 4.40 4.31
2002-10-2 - - - - - - - - - - - - - - - - - - - - - - - - 3.15 3.23 3.29 3.39 3.50 3.61 3.45 3.23
2002-10-9 - - - - - - - - - - - - - - - - - - - - - - - - - 5.32 5.31 5.27 5.25 5.21 4.92 4.79
2002-10-16 - - - - - - - - - - - - - - - - - - - - - - - - - - 2.11 2.11 2.11 2.08 1.91 1.82
2002-10-23 - - - - - - - - - - - - - - - - - - - - - - - - - - - 5.74 5.69 5.61 5.11 4.84
2002-11-6 - - - - - - - - - - - - - - - - - - - - - - - - - - - - 6.48 7.49 6.74 6.40
2002-11-13 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 6.16 6.02 5.63
2002-11-20 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 7.76 7.85
2002-11-27 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 4.10
Table 3. Quantized results for a sample size of 25 (Flow, Speed and Occupancy are quantized values; ws1 to ws10 are the weighted totals under each weight)
No. Date Flow Speed Occupancy ws1 ws2 ws3 ws4 ws5 ws6 ws7 ws8 ws9 ws10
1 2002-3-13 3.03 3.30 1.74 2.69 2.78 2.84 2.46 2.47 2.78 2.44 2.91 2.70 2.85
2 2002-3-20 10.00 1.89 1.00 4.30 5.72 3.69 3.47 3.07 3.24 3.88 4.14 5.68 5.77
3 2002-3-27 6.39 6.68 3.66 5.58 5.78 5.85 5.10 5.11 5.72 5.08 5.99 5.63 5.93
4 2002-4-3 3.47 5.66 4.51 4.55 4.28 4.82 4.54 4.65 4.88 4.43 4.77 4.22 4.34
5 2002-4-10 1.23 2.37 1.53 1.71 1.59 1.88 1.67 1.72 1.89 1.61 1.86 1.55 1.63
6 2002-4-17 1.46 1.88 1.76 1.70 1.64 1.74 1.71 1.73 1.76 1.69 1.73 1.63 1.64
7 2002-4-24 3.89 3.52 3.66 3.69 3.74 3.65 3.68 3.66 3.64 3.70 3.66 3.75 3.73
8 2002-5-8 3.11 4.72 4.23 4.02 3.79 4.20 4.07 4.15 4.25 3.99 4.14 3.77 3.82
9 2002-5-15 3.90 7.04 7.24 6.06 5.52 6.31 6.36 6.51 6.47 6.20 6.14 5.53 5.51
10 2002-5-22 2.02 2.23 1.66 1.97 1.98 2.04 1.89 1.90 2.02 1.88 2.05 1.96 2.01
11 2002-5-29 1.89 2.67 1.24 1.93 1.92 2.12 1.76 1.80 2.09 1.72 2.15 1.85 1.99
12 2002-6-5 2.77 2.51 1.36 2.21 2.35 2.29 2.00 1.98 2.22 2.01 2.36 2.29 2.41
13 2002-6-12 5.76 5.24 6.89 5.96 5.91 5.78 6.19 6.17 5.84 6.22 5.72 5.99 5.83
14 2002-6-19 4.52 8.89 5.33 6.25 5.82 6.91 6.02 6.24 6.95 5.80 6.87 5.64 5.99
15 2002-6-26 4.21 10.00 10.00 8.07 7.10 8.55 8.55 8.84 8.84 8.26 8.26 7.10 7.10
16 2002-7-3 3.71 5.50 3.62 4.28 4.14 4.59 4.12 4.21 4.58 4.03 4.59 4.04 4.23
17 2002-7-10 3.50 9.54 6.35 6.47 5.72 7.23 6.44 6.74 7.38 6.14 7.09 5.57 5.88
18 2002-7-17 2.57 7.71 2.82 4.37 3.92 5.20 3.98 4.24 5.22 3.72 5.19 3.67 4.16
19 2002-7-24 3.13 9.83 5.42 6.13 5.38 7.05 5.95 6.29 7.17 5.62 6.94 5.16 5.60
20 2002-7-31 2.84 4.72 1.84 3.13 3.06 3.53 2.81 2.90 3.48 2.71 3.58 2.91 3.20
21 2002-8-7 1.78 2.09 1.99 1.95 1.91 1.99 1.96 1.98 2.00 1.95 1.98 1.91 1.92
22 2002-8-14 3.84 8.50 5.67 6.00 5.46 6.63 5.92 6.15 6.72 5.69 6.54 5.32 5.60
23 2002-8-21 3.23 2.66 2.73 2.87 2.96 2.82 2.84 2.81 2.79 2.86 2.84 2.97 2.96
24 2002-8-28 1.00 1.49 1.39 1.29 1.22 1.34 1.32 1.34 1.36 1.29 1.32 1.22 1.23
25 2002-9-4 1.10 1.00 1.93 1.34 1.28 1.26 1.49 1.49 1.30 1.50 1.22 1.33 1.23
Table 4. Optimal sampling day under the ten weights as a function of sample size (Beijing)
Sample size ws1 ws2 ws3 ws4 ws5 ws6 ws7 ws8 ws9 ws10
5 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10
6 2002-4-10 2002-4-10 2002-4-17 2002-4-10 2002-4-10 2002-4-17 2002-4-10 2002-4-17 2002-4-10 2002-4-10
7 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10 2002-4-10
8 2002-4-17 2002-4-10 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-10 2002-4-17 2002-4-10 2002-4-17
9 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17
10 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17
11 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17
12 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-10 2002-4-17 2002-4-17 2002-4-17
13 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17
14 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17
15 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-10 2002-4-17 2002-4-10 2002-4-17
16 2002-4-17 2002-4-10 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-10 2002-4-17 2002-4-10 2002-4-17
17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-10 2002-4-17 2002-4-10 2002-4-17
18 2002-4-17 2002-4-10 2002-4-17 2002-4-10 2002-4-10 2002-4-17 2002-4-10 2002-4-17 2002-4-10 2002-4-10
19 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17
20 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17
21 2002-4-17 2002-4-17 2002-4-17 2002-4-10 2002-4-17 2002-4-17 2002-4-10 2002-4-17 2002-4-10 2002-4-17
22 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-10 2002-4-17
23 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-17 2002-4-10 2002-4-17
24 2002-8-28 2002-8-28 2002-8-28 2002-8-28 2002-8-28 2002-8-28 2002-8-28 2002-8-28 2002-8-28 2002-8-28
25 2002-8-28 2002-8-28 2002-9-4 2002-8-28 2002-8-28 2002-9-4 2002-8-28 2002-9-4 2002-8-28 2002-8-28
26 2002-9-4 2002-9-4 2002-9-4 2002-8-28 2002-9-4 2002-9-4 2002-8-28 2002-9-4 2002-9-4 2002-9-4
27 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
28 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
29 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
30 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
31 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
32 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
33 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
34 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
35 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
36 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4 2002-9-4
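The weight sensitivity shown in Table 4 can be summarised per sample size by counting how many of the ten weights agree on the same optimal sampling day. The sketch below is only an illustration: the rows for sample sizes 5, 6 and 8 are transcribed from Table 4 (with dates written uniformly as YYYY-M-D), and the helper name `weight_agreement` is a hypothetical one, not something defined in the patent.

```python
from collections import Counter

# Rows transcribed from Table 4 (ws1..ws10) for three sample sizes.
rows = {
    5: ["2002-4-10"] * 10,
    6: ["2002-4-10", "2002-4-10", "2002-4-17", "2002-4-10", "2002-4-10",
        "2002-4-17", "2002-4-10", "2002-4-17", "2002-4-10", "2002-4-10"],
    8: ["2002-4-17", "2002-4-10", "2002-4-17", "2002-4-17", "2002-4-17",
        "2002-4-17", "2002-4-10", "2002-4-17", "2002-4-10", "2002-4-17"],
}

def weight_agreement(picks):
    """Return the most frequently chosen day and the share of weights choosing it."""
    day, count = Counter(picks).most_common(1)[0]
    return day, count / len(picks)

for size, picks in rows.items():
    day, share = weight_agreement(picks)
    print(f"sample size {size}: {day} chosen by {share:.0%} of the weights")
```

A share of 100% means the optimal sampling day does not depend on the weight at that sample size; lower shares indicate sample sizes where the choice is sensitive to the weight.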
Table 5. Database table names and their meanings
Table name Meaning
tab Stores the raw data imported by the user
temp1 Table obtained by filtering duplicate records out of tab, followed by quality control
temp All data for a given sample size to be sampled, taken from temp1
datatag All dates in temp1, after quality control
datetagtemp All dates for a given sample size
tabsse Table of deviations of each value for each working day, obtained with the SSE method
tabssesum Table of summed deviations over all values for each working day, obtained with the SSE method
tabssescore Table of how the weighted value obtained with the SSE method varies with the weight, for a given sample size
tabssescoretotal Table of how the weighted value obtained with the SSE method varies with the weight, for all sample sizes
tabsseanalyze Result table of the final optimal sampling day obtained with the SSE method
tabcv Table of deviations of each value for each working day, obtained with the CV method
tabcvsum Table of summed deviations over all values for each working day, obtained with the CV method
tabcvscore Table of how the weighted value obtained with the CV method varies with the weight, for a given sample size
tabcvscoretotal Table of how the weighted value obtained with the CV method varies with the weight, for all sample sizes
tabcvanalyze Result table of the final optimal sampling day obtained with the CV method
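The first steps of the table pipeline in Table 5 (raw data in tab, duplicates removed into temp1, distinct dates collected in datatag) could look roughly like the sketch below. It is a minimal illustration only: the patent does not name a database engine or column layout, so the use of SQLite and the columns `day`, `hour` and `volume` are assumptions, and the quality-control step is not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# tab: raw data as imported by the user (hypothetical columns).
conn.execute("CREATE TABLE tab (day TEXT, hour INTEGER, volume REAL)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        ("2002-04-10", 8, 1200.0),
        ("2002-04-10", 8, 1200.0),  # duplicate record
        ("2002-04-10", 9, 1350.0),
        ("2002-04-17", 8, 1180.0),
    ],
)

# temp1: tab with duplicate records filtered out.
conn.execute("CREATE TABLE temp1 AS SELECT DISTINCT day, hour, volume FROM tab")

# datatag: all distinct dates present in temp1.
conn.execute("CREATE TABLE datatag AS SELECT DISTINCT day FROM temp1")

temp1_count = conn.execute("SELECT COUNT(*) FROM temp1").fetchone()[0]
dates = [r[0] for r in conn.execute("SELECT day FROM datatag ORDER BY day")]
print(temp1_count, dates)
```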

Claims (12)

1. A method for sampling and storing urban road traffic flow data, characterized by two methods, based on mathematical statistics, for sampling massive urban road traffic flow data, comprising:
One is the error sum of squares (SSE) method, based on an optimized sampling approach: the difference between each sample and the overall characteristics of the samples is compared, and the sample with the smallest difference is identified as the optimal sample within the sample size range. The other is the cross-validation (CV) method, also based on an optimized sampling approach: the difference between each sample and the overall characteristics of the remaining samples is compared, and the sample with the smallest difference is identified as the optimal sample within the sample size range.
2. The method for sampling and storing urban road traffic flow data according to claim 1, characterized in that the optimal sample size is determined as follows: the sample $(X_1, X_2, \ldots, X_n)$ is drawn from a population that follows the normal distribution $N(\mu, \sigma^2)$; if the population variance is unknown, the statistic T follows a t distribution with $(n-1)$ degrees of freedom, and for a given significance level $\alpha$, i.e. a confidence level of $1-\alpha$, the optimal sample size can be obtained.
3. The method for sampling and storing urban road traffic flow data according to claim 1, characterized in that the optimal sample size is determined as follows: the sample $(X_1, X_2, \ldots, X_n)$ is drawn from a population that follows the normal distribution $N(\mu, \sigma^2)$; if the population variance is known, the statistic $U = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$ follows the standard normal distribution, and under a given permissible error, i.e. $\bar{x} - \mu = \delta$, the optimal sample size can be obtained.
4. The method for sampling and storing urban road traffic flow data according to claim 1, characterized in that the optimal sample size is determined as follows: even if the actual distribution of the quantity under investigation does not fully conform to a normal distribution, the central limit theorem is used to establish the relationship between the optimal sample sizes for the skewed distribution and for the normal distribution, providing a bridge between samples from different populations, so that the optimal sample size is obtained by the method for the normal distribution.
5. The method for sampling and storing urban road traffic flow data according to claim 1, characterized in that the error sum of squares method determines the optimal sample at a given sample size, the optimal sample data being able to represent the other sample data at that sample size, so that the most representative original data information is retained while the volume of stored data is reduced; or determines, for a fixed weight, how the sampled sample changes as the sample size varies; or, when determining the optimal sample at a given sample size, tests the sensitivity to the weight, i.e. how the optimal sample changes under different weights for a fixed sample size.
6. The method for sampling and storing urban road traffic flow data according to claim 5, characterized in that the cross-validation method, when determining the optimal sampling day at a given sample size, tests and compares real-time road traffic flow data from Beijing and the United States, including all-day, morning-peak and evening-peak data.
7. An apparatus for sampling and storing urban road traffic flow data, characterized in that it comprises five interconnected parts: a raw data reading device, an optimal sample size determining device, a device for determining the optimal sample by the error sum of squares method, a device for determining the optimal sample by the cross-validation method, and an optimal sample data output device.
8. The apparatus for sampling and storing urban road traffic flow data according to claim 7, characterized in that the central limit theorem is used to determine the optimal sample size, the error sum of squares method and the cross-validation method are used to determine the optimal sample data, and the sampling results for the various time periods are compared and evaluated.
9. The apparatus for sampling and storing urban road traffic flow data according to claim 7, characterized in that the optimal sample size determining device uses the methods of claims 2, 3 and 4 to determine the optimal sample size.
10. The apparatus for sampling and storing urban road traffic flow data according to claim 7, characterized in that the device for determining the optimal sample by the error sum of squares method uses the method of claim 5 to determine the optimal sample.
11. The apparatus for sampling and storing urban road traffic flow data according to claim 7, characterized in that the device for determining the optimal sample by the cross-validation method uses the method of claim 6 to determine the optimal sample.
12. The apparatus for sampling and storing urban road traffic flow data according to claim 7, characterized in that the optimal sample data output device provides the optimal sample at each sample size, and the optimal sample data under different weights at each sample size, for use in analysis.
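The sample size determination and the two selection methods named in the claims can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patented procedure: each day is assumed to be a vector of values (for example hourly flows), the "overall characteristic" is assumed to be the mean profile, the weights ws1 to ws10 are not modelled, and the sample size formula assumes the known-variance case with a two-sided critical value $z_{\alpha/2}$, giving $n = (z_{\alpha/2}\,\sigma/\delta)^2$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_known_variance(sigma, delta, alpha=0.05):
    """Smallest n such that z_{alpha/2} * sigma / sqrt(n) <= delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / delta) ** 2)

def mean_profile(days):
    """Element-wise mean of a list of equally long day vectors."""
    n = len(days)
    return [sum(col) / n for col in zip(*days)]

def sse_scores(days):
    """Squared deviation of each day from the mean profile of ALL days."""
    ref = mean_profile(days)
    return [sum((x - r) ** 2 for x, r in zip(day, ref)) for day in days]

def cv_scores(days):
    """Squared deviation of each day from the mean profile of the REMAINING days."""
    scores = []
    for i, day in enumerate(days):
        ref = mean_profile(days[:i] + days[i + 1:])
        scores.append(sum((x - r) ** 2 for x, r in zip(day, ref)))
    return scores

def best_index(scores):
    """Index of the day with the smallest score, i.e. the most representative day."""
    return min(range(len(scores)), key=scores.__getitem__)

# Made-up illustrative data: four days, three values per day.
days = [
    [10.0, 30.0, 20.0],
    [12.0, 33.0, 21.0],
    [11.0, 31.0, 22.0],
    [20.0, 45.0, 30.0],
]

n_required = sample_size_known_variance(sigma=2.0, delta=0.5, alpha=0.05)
best_sse = best_index(sse_scores(days))
best_cv = best_index(cv_scores(days))
print(n_required, best_sse, best_cv)
```

Under these simplified, unweighted assumptions the leave-one-out deviation of a day is exactly n/(n-1) times its deviation from the overall mean, so the two scores are proportional and select the same day. Any difference between the SSE and CV results would therefore come from details this sketch leaves out, such as the weighting of several indicators.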
CN 200410098917 2004-12-15 2004-12-15 Method and apparatus for sampling and storing urban road traffic flow data Pending CN1790344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410098917 CN1790344A (en) 2004-12-15 2004-12-15 Method and apparatus for sampling and storing urban road traffic flow data

Publications (1)

Publication Number Publication Date
CN1790344A true CN1790344A (en) 2006-06-21

Family

ID=36788193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410098917 Pending CN1790344A (en) 2004-12-15 2004-12-15 Method and apparatus for sampling and storing urban road traffic flow data

Country Status (1)

Country Link
CN (1) CN1790344A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101364344B (en) * 2008-06-27 2010-06-16 北京工业大学 Road network limitation capacity determining method based on pressure test
CN102262678A (en) * 2011-08-16 2011-11-30 郑毅 System for sampling mass data and managing sampled data
CN103093620A (en) * 2013-01-07 2013-05-08 东南大学 Determination method of motor vehicle traffic conflict number based on conflict traffic flow characteristics
CN104281691A (en) * 2014-10-11 2015-01-14 百度在线网络技术(北京)有限公司 Search engine based data processing method and platform
CN104281691B (en) * 2014-10-11 2017-07-21 百度在线网络技术(北京)有限公司 A kind of data processing method and platform based on search engine
CN107133718A (en) * 2017-04-17 2017-09-05 济南鼎道企业管理咨询有限公司 The sampling method for making sample of the large mineral resources commodity of solid kind
CN107133718B (en) * 2017-04-17 2020-07-24 济南鼎道企业管理咨询有限公司 Sampling and sample preparation method for solid bulk mineral resource commodities
CN112233747A (en) * 2020-11-16 2021-01-15 广东省新一代通信与网络创新研究院 Twin network data analysis method and system based on personal digital
CN115565610A (en) * 2022-09-29 2023-01-03 四川大学 Method and system for establishing recurrence transfer analysis model based on multiple sets of mathematical data
CN115565610B (en) * 2022-09-29 2024-06-11 四川大学 Recurrence and metastasis analysis model establishment method and system based on multiple groups of study data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication