CN108717422B - Data processing method and device - Google Patents

Info

Publication number
CN108717422B
Authority
CN
China
Prior art keywords
geographical
attribute
data
coordinate
data object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810366853.5A
Other languages
Chinese (zh)
Other versions
CN108717422A (en
Inventor
陈孟婕
徐硕
刘慧媛
王振洲
蒋庆朝
李奥
王宇
鲁峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FISHING ENGINEERING INST CHINESE INST OF AQUATIC PRODUCTS SCIENCE
Original Assignee
FISHING ENGINEERING INST CHINESE INST OF AQUATIC PRODUCTS SCIENCE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FISHING ENGINEERING INST CHINESE INST OF AQUATIC PRODUCTS SCIENCE filed Critical FISHING ENGINEERING INST CHINESE INST OF AQUATIC PRODUCTS SCIENCE
Priority to CN201810366853.5A priority Critical patent/CN108717422B/en
Publication of CN108717422A publication Critical patent/CN108717422A/en
Application granted granted Critical
Publication of CN108717422B publication Critical patent/CN108717422B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Instructional Devices (AREA)

Abstract

Disclosed herein are a data processing method and device. The method comprises: preprocessing a data source and outputting data objects that have at least two geographical attributes; performing the following processing on each output data object to obtain its geographical coordinate: extracting the geographic descriptor of the data object under each geographical attribute, and obtaining a geographic descriptor spliced from the geographic descriptors of at least two geographical attributes; performing address resolution separately on the geographic descriptor of each geographical attribute and on the spliced geographic descriptor to determine the corresponding geographical coordinates; verifying whether the geospatial distances between the geographical coordinates are valid, so as to determine candidate geographical coordinates; selecting one of the candidate geographical coordinates as the geographical coordinate of the current data object according to the relative accuracy with which the corresponding geographic descriptors describe the geographical location; and configuring a new attribute for each data object for which a geographical coordinate has been obtained. The application can improve the accuracy of geographic information resolution.

Description

Data processing method and device
Technical field
This application relates to the technical field of data processing, and in particular to a data processing method and device.
Background
For data objects that contain geographic information, performing data analysis and data mining in combination with the geographic context helps users understand and analyze in depth the data content, the data relationships, and location-based applications.
When a data object lacks quantitatively described geographic information, the conversion is generally performed by means of location references, remote-sensing data, or other location-resolution techniques. However, resolving remote-sensing data approaches the problem from the external appearance of the data, and the other location-resolution techniques need additional information as reference points and then require a complex tool model to be built for positioning. For resolving the location of data objects that are described by geographical attributes, all of these approaches are too cumbersome and therefore inefficient. Moreover, for data sets on a specific subject (for example, where two or more attributes of a data object contain geographic descriptors), the resolved geographic information is inaccurate and incomplete.
Summary of the invention
The application aims to solve at least one of the above technical problems.
The application provides a data processing method and device that can at least improve the accuracy of geographic information resolution.
The application proposes the following technical solutions.
A data processing method, comprising:
preprocessing a data source and outputting data objects that have at least two geographical attributes, each geographical attribute containing a geographic descriptor;
performing the following processing on each output data object to obtain its geographical coordinate: extracting the geographic descriptor of the data object under each geographical attribute, and obtaining a geographic descriptor spliced from the geographic descriptors of at least two geographical attributes; performing address resolution separately on the geographic descriptor of each geographical attribute and on the geographic descriptor spliced from the geographic descriptors of at least two geographical attributes, to determine the corresponding geographical coordinates; verifying whether the geospatial distances between the geographical coordinates are valid, so as to determine candidate geographical coordinates; and selecting, according to the relative accuracy with which the geographic descriptors corresponding to the candidate geographical coordinates describe the geographical location, one of the candidate geographical coordinates as the geographical coordinate of the current data object;
configuring a new attribute for each data object for which a geographical coordinate has been obtained, the new attribute containing the geographical coordinate.
Wherein the method further comprises: configuring a threshold for verifying whether the geospatial distance is valid. Verifying whether the geospatial distances between the geographical coordinates are valid, so as to determine the candidate geographical coordinates, comprises: calculating the geospatial distance between geographical coordinates that correspond to different geographic descriptors; comparing the geospatial distance with the threshold to judge whether the geospatial distance exceeds the threshold; and, when the geospatial distance does not exceed the threshold, verifying the geospatial distance as valid and determining the corresponding geographical coordinates to be candidate geographical coordinates.
Wherein the method further comprises: initializing the threshold to a default value. Verifying whether the geospatial distances between the geographical coordinates are valid comprises: comparing the geospatial distance with the default value to judge whether the geospatial distance exceeds the default value; when the geospatial distance does not exceed the default value, determining that the geospatial distance is valid and that the corresponding geographical coordinates are candidate geographical coordinates; and, when the geospatial distance exceeds the default value, either resetting the threshold and comparing again, or verifying the geospatial distance as invalid.
Wherein the method further comprises: before verifying whether the geospatial distances between the geographical coordinates are valid, extracting information on other attributes of the current data object, the other attributes being attributes, other than the geographical attributes, that relate to one or more of the coverage, area, size, component size, and end-to-end length of the data object; and setting the threshold according to the information on the other attributes of the current data object.
Wherein verifying whether the geospatial distances between the geographical coordinates are valid further comprises: when the geospatial distance exceeds the default value, extracting information on other attributes of the current data object, the other attributes being attributes, other than the geographical attributes, that relate to one or more of the coverage, area, size, component size, and end-to-end length of the data object; resetting the threshold according to the information on the other attributes of the current data object; comparing the geospatial distance with the reset threshold to judge whether the geospatial distance exceeds the reset threshold; when the geospatial distance exceeds the reset threshold, either resetting the threshold and performing the comparison again, or determining that the geospatial distance is invalid; and, when the geospatial distance does not exceed the reset threshold, determining that the geospatial distance is valid and determining the corresponding geographical coordinates to be candidate geographical coordinates.
Wherein setting or resetting the threshold comprises at least one of the following:
setting the threshold to the value of one of the other attributes;
setting the threshold to the minimum value among the other attributes;
setting the threshold to the maximum value among the other attributes;
estimating the end-to-end maximum of the data object from one or more of the other attributes, and setting the threshold to that end-to-end maximum.
Wherein setting the geographical coordinate as a new attribute of the data object comprises one of the following:
configuring a new field for the data object in the data source, the new field containing the geographical coordinate of the data object;
obtaining the data source identifier and the ID of the data object in the data source, forming a data set based at least on the data source identifier, the ID of the data object in the data source, and the geographical coordinate of the data object, and associating the data set with the data source through the data source identifier and the ID of the data object in the data source.
Wherein the method further comprises: creating connection attributes, each connection attribute being a combination of at least two geographical attributes and containing the geographic descriptor spliced from the geographic descriptors of those geographical attributes; and identifying the words that reflect geographical location in each geographic descriptor and configuring a weight value for the corresponding geographical attribute or connection attribute according to the number of such words, the weight value characterizing the relative accuracy with which the corresponding geographic descriptor describes the geographical location. Selecting one of the candidate geographical coordinates as the geographical coordinate of the current data object, according to the relative accuracy with which the corresponding geographic descriptors describe the geographical location, comprises: selecting, from the candidate geographical coordinates, the geographical coordinate whose corresponding geographical attribute or connection attribute has the largest weight value as the geographical coordinate of the current data object.
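The weight-based selection described above might be sketched as follows. This is only an illustration under stated assumptions: the vocabulary `LOCATION_WORDS`, the function names, and the sample coordinates are hypothetical and do not come from the application, which does not specify how location words are identified.

```python
# Hypothetical vocabulary of location-indicating words (an assumption for illustration).
LOCATION_WORDS = ("Province", "City", "County", "Town", "Village", "Port")

def location_weight(description: str) -> int:
    """Weight = number of location-indicating words found in the description."""
    return sum(description.count(word) for word in LOCATION_WORDS)

def choose_coordinate(candidates):
    """candidates: list of (description, (lon, lat)) pairs that passed verification.
    Returns the coordinate whose description carries the largest weight."""
    _, best_coordinate = max(candidates, key=lambda c: location_weight(c[0]))
    return best_coordinate

# Made-up sample values, not real survey data.
candidates = [
    ("Dandong City", (124.35, 40.00)),
    ("Liaoning Province Dandong City Donggang City Haiyanghong Central Fishing Port",
     (124.15, 39.86)),
]
chosen = choose_coordinate(candidates)
```

Counting words is a deliberately crude proxy for descriptive precision; a real system would likely need a gazetteer or word segmentation to recognise location words reliably.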
A data processing device, comprising:
a preprocessing module, configured to preprocess a data source and output data objects that have at least two geographical attributes, each geographical attribute containing a geographic descriptor;
a geographical coordinate determining module, configured to obtain the geographical coordinate of each data object output by the preprocessing module, and comprising an extraction module, an address resolution module, a verification module, and a selection module, wherein the extraction module is configured to extract the geographic descriptor of the data object under each geographical attribute and to obtain a geographic descriptor spliced from the geographic descriptors of at least two geographical attributes; the address resolution module is configured to perform address resolution separately on the geographic descriptor of each geographical attribute and on the geographic descriptor spliced from the geographic descriptors of at least two geographical attributes, to determine the corresponding geographical coordinates; the verification module is configured to verify whether the geospatial distances between the geographical coordinates are valid, so as to determine candidate geographical coordinates; and the selection module is configured to select, according to the relative accuracy with which the geographic descriptors corresponding to the candidate geographical coordinates describe the geographical location, one of the candidate geographical coordinates as the geographical coordinate of the current data object;
an attribute configuration module, configured to configure a new attribute for each data object for which a geographical coordinate has been obtained, the new attribute containing the geographical coordinate.
A data processing device, comprising a memory and a processor, the memory being configured to store a computer program, wherein the computer program, when executed by the processor, implements the above data processing method.
A computer-readable medium storing a computer program, wherein the computer program, when executed by a processor, implements the above data processing method.
The application can achieve at least one of the following technical effects:
In one aspect, in the embodiments of the present invention, the geographic descriptors of two kinds of geographical attributes of a data object are each converted into geographical coordinates that quantitatively describe a geographical location; candidate geographical coordinates are determined by verifying the geospatial distances between the corresponding geographical coordinates; and the most accurate of the candidate geographical coordinates is finally selected as the geographical coordinate of the data object. In this way, the accuracy of geographic information resolution for the data object can be greatly improved.
In another aspect, in the embodiments of the present invention, the geospatial distances between geographical coordinates can be verified using a threshold determined from other attributes of the data object, so that the accuracy of the corresponding geographical coordinates is judged in light of the characteristics of the data object itself, and a geographical coordinate whose accuracy matches those characteristics is finally obtained. In this way, the completeness and applicability of the geographic information resolution are ensured while its accuracy is improved.
In yet another aspect, in the embodiments of the present invention, the geographical coordinate of a data object is obtained by processing the original information of the data object, without constructing a complex tool model. This not only simplifies the process of geographic information resolution and improves resolution efficiency, but also allows the corresponding geographical coordinate to be determined in light of the individual characteristics of the data object, which gives greater flexibility.
Brief description of the drawings
Other features, objects, and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a schematic flowchart of the data processing method of an embodiment of the present invention;
Fig. 2 is an exemplary detailed flowchart of the data processing flow in an embodiment of the present invention;
Fig. 3 is a schematic diagram of an exemplary structure of the data processing device of an embodiment of the application;
Fig. 4 is a schematic diagram of the fishing port data processing flow in an exemplary application scenario;
Fig. 5 is an example of the executable address-resolution information for Input('geographical location fishing port title') in the exemplary application scenario;
Fig. 6 is a sample fragment of Result('geographical location fishing port title') in the exemplary application scenario;
Fig. 7 shows the processing flow and example results for geographic information set A in the exemplary application scenario;
Fig. 8 is an example of data and their link addresses in the exemplary application scenario;
Fig. 9 is a sample fragment of the process data set Dataset5 after verification in the exemplary application scenario.
Detailed description of the embodiments
The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the related invention.
It should be noted that, where no conflict arises, the embodiments of the application and the features in those embodiments may be combined with one another. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
The application can convert the geographic descriptors of two kinds of geographical attributes of a data object into geographical coordinates that quantitatively describe a geographical location, determine candidate geographical coordinates by verifying the geospatial distances between the corresponding geographical coordinates, and finally select the most accurate of the candidate geographical coordinates and add it to the corresponding data. This not only improves the efficiency of location resolution for data objects that have geographic descriptors; for data sets on a specific subject (for example, where two or more attributes of a data object contain geographic descriptors), the resolved geographic information is also more accurate, more complete, and better matched to the individual characteristics of the data object.
It should be noted that the application can be implemented by any computing device that supports the corresponding functions. For example, the computing device may be a physical server or a cluster of physical servers, a computer, a virtual server or a cluster of virtual servers, a distributed system, and so on. The application is applicable to any data that contains geographic information and, in particular, to scientific data that contains geographic information, for example fishing port data.
Specific implementations of the application are described in detail below.
Embodiment one
A data processing method, as shown in Fig. 1, may include:
Step 101, preprocessing a data source and outputting data objects that have at least two geographical attributes, each geographical attribute containing a geographic descriptor;
Step 102, performing the following processing on each output data object to obtain its geographical coordinate:
Step 1021, extracting the geographic descriptor of the data object under each geographical attribute, and obtaining a geographic descriptor spliced from the geographic descriptors of at least two geographical attributes;
Step 1022, performing address resolution separately on the geographic descriptor of each geographical attribute and on the geographic descriptor spliced from the geographic descriptors of at least two geographical attributes, to determine the corresponding geographical coordinates;
Step 1023, verifying whether the geospatial distances between the geographical coordinates are valid, so as to determine candidate geographical coordinates;
Step 1024, selecting, according to the relative accuracy with which the geographic descriptors corresponding to the candidate geographical coordinates describe the geographical location, one of the candidate geographical coordinates as the geographical coordinate of the current data object;
Step 103, configuring a new attribute for each data object for which a geographical coordinate has been obtained, the new attribute containing the geographical coordinate.
In this embodiment, the geographic descriptors of the various geographical attributes of a data object are each converted into geographical coordinates that quantitatively describe a geographical location, candidate geographical coordinates are determined by verifying the geospatial distances between the corresponding geographical coordinates, and the most accurate of the candidate geographical coordinates is finally selected as the geographical coordinate of the data object. This greatly improves the accuracy of geographic information resolution for the data object and requires no complex tool model, so the resolution process is simplified and resolution efficiency is improved. In addition, the corresponding geographical coordinate can be determined in light of the individual characteristics of the data object, which gives greater flexibility.
In this embodiment, a geographic descriptor is any text information that has geographic elements, that is, text containing characters or words that reflect a geographical location feature or are directly related to one. For example, "Haiyanghong Central Fishing Port, Dandong City" contains the word "Dandong City", which reflects a geographical location feature, and is therefore a geographic descriptor. As another example, "Liaoning Province, Dandong, Donggang City, Pusamiao Town, Haiyanghong Village, Clam Lump surrounding waters" contains several words that reflect geographical location features and is also a geographic descriptor.
In this embodiment, the data source may be preprocessed in many ways. For example, the preprocessing may include one or more of the following: null-value handling, unit unification, outlier checking, and normalization. Other means may also be used in the preprocessing, and no limitation is imposed herein.
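A minimal sketch of such preprocessing is given below, covering null-value handling, unit unification, and an outlier check (normalization is omitted for brevity). The field names `length` and `unit`, the kilometre-to-metre rule, and the `max_length_m` bound are assumptions chosen for illustration; the application does not prescribe them.

```python
def preprocess(records, length_field="length", unit_field="unit", max_length_m=100_000):
    """Drop records with missing lengths, convert km to m, and discard outliers."""
    cleaned = []
    for record in records:
        value = record.get(length_field)
        if value is None:                      # null-value handling
            continue
        if record.get(unit_field) == "km":     # unit unification: kilometres -> metres
            value = value * 1000
        if not 0 < value <= max_length_m:      # outlier check (bound is an assumption)
            continue
        cleaned.append({**record, length_field: value, unit_field: "m"})
    return cleaned

# Made-up sample records.
records = [
    {"name": "Port A", "length": 2, "unit": "km"},
    {"name": "Port B", "length": None, "unit": "m"},
    {"name": "Port C", "length": 800, "unit": "m"},
    {"name": "Port D", "length": -5, "unit": "m"},
]
cleaned = preprocess(records)
```

Here Port B is dropped for its missing value and Port D for its negative length, while Port A is kept with its length converted to metres.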
In one implementation of this embodiment, the processing of step 1021 may include: reading all attribute names from the data set corresponding to the data source to obtain an attribute name set; traversing the attribute name set to screen out the geographical attributes, and forming the screened-out geographical attributes into a geographical attribute set; traversing the geographical attribute set and creating connection attributes, each connection attribute being a combination of two or more different geographical attributes, where the field name of a connection attribute may be the splice of the field names of the corresponding geographical attributes, and the field of a connection attribute contains the geographic descriptor spliced from the geographic descriptors of the corresponding geographical attributes; adding the connection attributes to each data object in the data set of the data source to form a new data set; and extracting the geographic descriptors of each data object from the new data set, which may include the geographic descriptor of each geographical attribute and the geographic descriptor of each connection attribute. The extraction of geographic descriptors may also be realized in other ways. For example, the geographical attributes may be screened out first, the geographic descriptor of each geographical attribute then extracted, and the geographic descriptors of any two or more different geographical attributes spliced before address resolution. The specific manner of extracting the geographic descriptors of a data object is not limited herein.
It should be noted that the combination or splicing of two or more different geographical attributes refers to the combination or splicing of any two or more different geographical attributes among the geographical attributes of the same data object. For example, if a data object has two different geographical attributes, the combination or splicing is that of those two geographical attributes. As another example, if a data object has four different geographical attributes, the combination or splicing may include the pairwise combination or splicing of those four geographical attributes, the combination or splicing of any three of them, and the combination or splicing of all four. The specific manner of combination or splicing may be set freely according to the actual application scenario and the required accuracy of address resolution.
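The creation of connection attributes could be sketched as below, assuming each data object is a dictionary and the geographical attributes have already been screened out. The function name, the `+` used to splice field names, and the sample record are illustrative assumptions.

```python
from itertools import combinations

def add_connection_attributes(record, geo_fields, separator=""):
    """Return a copy of the record with one spliced field for every
    combination of two or more geographical attributes."""
    extended = dict(record)
    for size in range(2, len(geo_fields) + 1):
        for combo in combinations(geo_fields, size):
            name = "+".join(combo)  # field name = spliced attribute field names
            extended[name] = separator.join(str(record[f]) for f in combo)
    return extended

# Made-up sample record with two geographical attributes.
record = {"location": "Dandong City", "port_name": "Haiyanghong Central Fishing Port"}
extended = add_connection_attributes(record, ["location", "port_name"], separator=" ")
```

With four geographical attributes this produces 6 pairwise, 4 three-way, and 1 four-way connection attribute; in practice the set of combinations would be restricted according to the application scenario, as the text notes.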
In this embodiment, the address resolution in step 1022 may be performed in many ways. In one implementation, the address resolution may be realized by calling a geographical coordinate picking tool. In another implementation, the address resolution may be realized by a dedicated address resolution tool. The address resolution may also be realized in other ways, and no limitation is imposed herein. Here, the result of address resolution may include, but is not limited to, a longitude value and a latitude value, and the geographical coordinate may be the combination of the longitude value and the latitude value. The geographical coordinate may also take other forms, such as spherical coordinate values (r, θ, φ) or three-dimensional coordinate values (x, y, z), and no limitation is imposed herein.
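Because the application does not name a particular address resolution tool, the sketch below treats the geocoder as a pluggable callable. `resolve_all`, the `Coordinate` alias, and the lookup table `FAKE_GAZETTEER` are hypothetical stand-ins for illustration, not a real geocoding API.

```python
from typing import Callable, Dict, Optional, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude) in degrees

def resolve_all(descriptions: Dict[str, str],
                geocode: Callable[[str], Optional[Coordinate]]) -> Dict[str, Coordinate]:
    """Resolve every descriptor; keep only those the geocoder could resolve."""
    resolved = {}
    for attribute, text in descriptions.items():
        coordinate = geocode(text)
        if coordinate is not None:
            resolved[attribute] = coordinate
    return resolved

# Stand-in geocoder backed by a made-up lookup table (values are illustrative only).
FAKE_GAZETTEER = {"Dandong City": (124.35, 40.00)}
result = resolve_all(
    {"location": "Dandong City", "port_name": "Unknown Port"},
    FAKE_GAZETTEER.get,
)
```

Passing the geocoder in as a parameter keeps the rest of the pipeline independent of whichever resolution tool is actually used.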
In this embodiment, the method may further include: configuring a threshold for verifying whether the geospatial distance is valid. Verifying, in step 1023, whether the geospatial distances between the geographical coordinates are valid, so as to determine the candidate geographical coordinates, may include: calculating the geospatial distance between geographical coordinates that correspond to different geographic descriptors; comparing the geospatial distance with the threshold to judge whether the geospatial distance exceeds the threshold; and, when the geospatial distance does not exceed the threshold, determining that the geospatial distance is valid and that the corresponding geographical coordinates are candidate geographical coordinates.
In this embodiment, whether the geospatial distance is valid may be verified once, twice, or multiple times, with a different threshold value used for each verification. In practical applications, different verification modes may be used according to the specific application scenario, the individual characteristics of the data object, and the accuracy required.
In one implementation, the geospatial distance may be verified as follows: initializing the threshold to a default value; comparing the geospatial distance with the default value to judge whether the geospatial distance exceeds the default value; when the geospatial distance does not exceed the default value, determining that the geospatial distance is valid and that the corresponding geographical coordinates are candidate geographical coordinates; and, when the geospatial distance exceeds the default value, either resetting the threshold and comparing again, or verifying the geospatial distance as invalid.
In another implementation, the geospatial distance may be verified as follows: extracting information on other attributes of the current data object, the other attributes being attributes, other than the geographical attributes, that relate to one or more of the coverage, area, size, component size, and end-to-end length of the data object; and setting the threshold according to the information on the other attributes of the current data object. The geospatial distance is then compared with the threshold to judge whether it exceeds the threshold; when the geospatial distance does not exceed the threshold, the geospatial distance is verified as valid and the corresponding geographical coordinates are determined to be candidate geographical coordinates.
In addition, the above two implementations may be combined. For example, when the geospatial distance exceeds the default value, information on other attributes of the current data object may be extracted, the other attributes being attributes, other than the geographical attributes, that relate to one or more of the coverage, area, size, component size, and end-to-end length of the data object; the threshold is reset according to the information on the other attributes of the current data object; the geospatial distance is compared with the reset threshold to judge whether it exceeds the reset threshold; when the geospatial distance exceeds the reset threshold, the threshold is reset and the comparison is performed again, or the geospatial distance is determined to be invalid; and when the geospatial distance does not exceed the reset threshold, the geospatial distance is determined to be valid and the corresponding geographical coordinates are determined to be candidate geographical coordinates.
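The combined verification, a default threshold followed by thresholds reset from the other attributes, can be sketched as a loop over an ordered list of thresholds. The function name and the idea of passing the reset thresholds as a precomputed sequence are assumptions made for illustration.

```python
def verify_distance(distance_m, default_threshold_m, fallback_thresholds_m=()):
    """Return True if the distance passes the default threshold or any
    threshold derived afterwards from the object's other attributes."""
    for threshold in (default_threshold_m, *fallback_thresholds_m):
        if distance_m <= threshold:   # does not exceed the threshold -> valid
            return True
    return False                      # every threshold exceeded -> invalid

# A 1500 m distance fails a 1000 m default but passes a 2000 m reset threshold.
passes_after_reset = verify_distance(1500, 1000, fallback_thresholds_m=(2000,))
```

Limiting the number of fallback thresholds bounds how often the comparison is repeated before a distance is declared invalid.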
In each of the above implementations, setting or resetting the threshold may include one or more of the following: 1) setting the threshold to the value of one of the other attributes; 2) setting the threshold to the minimum value among the other attributes; 3) setting the threshold to the maximum value among the other attributes; 4) estimating the end-to-end maximum of the data object from one or more of the other attributes, and setting the threshold to that end-to-end maximum. Other ways may also be used, and no limitation is imposed herein.
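The threshold-setting options above might look like the following sketch. The function names are hypothetical, and the rectangle model used to estimate an end-to-end maximum from an area is purely an assumption; the application does not state how that estimate is made.

```python
import math

def threshold_from_attributes(sizes_m, strategy="max"):
    """sizes_m: size-related attribute values in metres (for example, lengths)."""
    if strategy == "min":
        return min(sizes_m)
    if strategy == "max":
        return max(sizes_m)
    raise ValueError(f"unknown strategy: {strategy}")

def end_to_end_from_area(area_m2, aspect_ratio=1.0):
    """Estimate the diagonal of a rectangle with the given area and
    length-to-width ratio (the rectangular shape is an assumption)."""
    width = math.sqrt(area_m2 / aspect_ratio)
    length = width * aspect_ratio
    return math.hypot(length, width)

# Made-up attribute values: breakwater length, quay length, berth width.
threshold = threshold_from_attributes([300, 1200, 50], strategy="max")
```

Choosing the maximum gives a permissive threshold and the minimum a strict one, so the choice trades completeness against accuracy.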
In this embodiment, the geospatial distance between geographical coordinates may be calculated by a distance formula, a similarity formula, or a custom similarity algorithm. In one implementation, the following Haversine formula may be used, together with the longitude and latitude values of the geographical coordinates, to calculate the geospatial distance between two geographical coordinates:
$$\operatorname{haversin}\left(\frac{d}{R}\right) = \operatorname{haversin}(y_2 - y_1) + \cos(y_1)\cos(y_2)\operatorname{haversin}(x_2 - x_1)$$
where $\operatorname{haversin}(\theta) = \sin^2(\theta/2) = (1 - \cos\theta)/2$;
and where $x_2$ and $x_1$ denote the longitude values in the two pieces of coordinate information, $y_2$ and $y_1$ denote the latitude values in the two pieces of coordinate information, $R$ is the radius of the Earth, and $d$ is the geospatial distance between the coordinate points corresponding to the two pieces of coordinate information.
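The Haversine calculation can be sketched in a few lines. The mean Earth radius of 6,371,000 m is an assumed value, since the application only refers to R as the radius of the Earth, and the function name is illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an assumed value)

def haversine_m(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance between two (longitude, latitude) points in degrees."""
    x1, y1, x2, y2 = map(math.radians, (lon1, lat1, lon2, lat2))
    # haversin(d/R) = haversin(y2 - y1) + cos(y1) * cos(y2) * haversin(x2 - x1)
    h = math.sin((y2 - y1) / 2) ** 2 + math.cos(y1) * math.cos(y2) * math.sin((x2 - x1) / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))

# One degree of latitude along a meridian.
one_degree_m = haversine_m(0, 0, 0, 1)
```

Solving for d gives d = 2R·arcsin(√h), which is the last line of the function.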
In another implementation, the longitude and latitude values in each piece of coordinate information are first mapped to three-dimensional coordinates, and the geospatial distance between the pieces of coordinate information is then calculated from the three-dimensional coordinates.
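One way to read this implementation is to place each point on a sphere and measure the straight-line (chord) distance between the resulting three-dimensional points. The spherical Earth model, the radius value, and the function names below are assumptions for illustration.

```python
import math

def to_cartesian(lon_deg, lat_deg, radius=6_371_000):
    """Map (longitude, latitude) in degrees to (x, y, z) on a sphere, in metres."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))

def chord_distance_m(a, b):
    """Straight-line distance between two (lon, lat) points through the sphere."""
    return math.dist(to_cartesian(*a), to_cartesian(*b))

# Antipodal points on the equator are one Earth diameter apart.
diameter_m = chord_distance_m((0, 0), (180, 0))
```

The chord is always slightly shorter than the surface distance, but for the short distances typically compared against a threshold the difference is negligible.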
Besides the two approaches above, the geospatial distance may also be calculated in other ways. For example, the longitude and latitude may be converted to radians and an angle calculation performed (calculating similarity with the cosine formula). As another example, when the Euclidean distance between the longitude and latitude values is small, the Euclidean distance may be used to calculate the geospatial distance, and when it is large, the Haversine formula above may be used. The specific way of calculating the geospatial distance is not limited herein.
It should be noted that the geospatial distance between geographic coordinates here refers to the geospatial distance between two geographic coordinates that correspond to different attributes. For example, if a data object has two different geographical attributes B1 and B2 and a corresponding connection attribute C1, the data object is resolved into three geographic coordinates A1 (corresponding to B1), A2 (corresponding to B2), and A3 (corresponding to C1). When calculating geospatial distances, the distance between each pair of these three coordinates must be calculated; that is, three geospatial distances are obtained for the data object: A12 (between A1 and A2), A23 (between A2 and A3), and A13 (between A1 and A3). During verification, each of these three distances is checked for validity separately. If only A12 is verified as valid, the corresponding A1 and A2 are the optional geographic coordinates.
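The pairwise check in the A1/A2/A3 example can be sketched as follows. The coordinates, the 1 km threshold, and the function names are made up for illustration, and the Haversine helper assumes a mean Earth radius of 6371 km.

```python
import math
from itertools import combinations


def haversine_km(p, q, radius: float = 6371.0) -> float:
    """Great-circle distance between two (lon, lat) points given in degrees."""
    x1, y1, x2, y2 = map(math.radians, (*p, *q))
    h = (math.sin((y2 - y1) / 2) ** 2
         + math.cos(y1) * math.cos(y2) * math.sin((x2 - x1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


def optional_coordinates(coords: dict, threshold_km: float) -> set:
    """Return the labels of coordinates taking part in at least one valid pair."""
    valid = set()
    for (a, pa), (b, pb) in combinations(coords.items(), 2):
        if haversine_km(pa, pb) <= threshold_km:  # distance does not exceed threshold
            valid.update((a, b))
    return valid


# Hypothetical coordinates: A1 and A2 lie close together, A3 is far away
coords = {"A1": (124.0, 39.8), "A2": (124.005, 39.8), "A3": (125.5, 40.5)}
```

With a 1 km threshold only the pair A1/A2 is valid, so A1 and A2 are kept as optional coordinates.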
In this embodiment, setting the geographic coordinate as a new attribute of the data object includes one of the following: 1) configuring a new field for the data object in the data source, the new field containing the geographic coordinate of the data object; 2) obtaining the data source identifier and the ID of the data object in the data source, forming a data set based at least on the data source identifier, the ID of the data object in the data source, and the geographic coordinate of the data object, and associating the data set with the data source through the data source identifier and the ID of the data object in the data source. A new attribute may also be configured for the data object in other ways; this is not limited herein.
In one implementation of this embodiment, a connection attribute may also be created. The connection attribute is a combination of at least two geographical attributes and contains the geographic descriptor formed by splicing the geographic descriptors of the at least two geographical attributes (the specific process is described above and is not repeated here). Words that reflect geographic location are identified in the geographic location information, and a weight value is configured for the corresponding geographical attribute or connection attribute according to the number of such words; the weight value characterizes the relative accuracy with which the corresponding geographic descriptor describes the geographic location. Choosing one of the optional geographic coordinates as the geographic coordinate of the current data object, according to the relative accuracy with which the corresponding geographic descriptors describe the geographic location, then comprises: choosing, from the optional geographic coordinates, the coordinate whose corresponding geographical attribute or connection attribute has the largest weight value. Geographic coordinates may also be selected in other ways. For example, they may be chosen by the name of the attribute to which a coordinate corresponds, keeping only coordinates whose attribute name contains "location". As another example, a priority may be set for each attribute according to the relative accuracy with which its geographic descriptor describes the geographic location, and the coordinate chosen by that priority. The specific way of choosing the geographic coordinate is not limited herein.
It should be noted that a word reflecting geographic location, as used herein, may be a character or word containing an administrative division unit or a place name. The division units include but are not limited to: country, province, municipality, autonomous region, county, town, village, and street. Taking fishing port data as an example, the same fishing port (that is, the same data object) has three geographic descriptors: D1 "Liaoning Province, Dandong, Donggang City, Bodhisattva Mausoleum Town, Ocean Red Village, waters near Clam Lump, Dandong City Ocean Red Central Fishing Port", D2 "Liaoning Province, Dandong, Donggang City, Bodhisattva Mausoleum Town, Ocean Red Village, waters near Clam Lump", and D3 "Dandong City Ocean Red Central Fishing Port". Identifying the words that reflect geographic location in these three descriptors shows that D1 contains the most such words, which indicates that D1 describes the geographic location most accurately, so the attribute corresponding to D1 is given the highest weight value. Likewise, the attribute corresponding to D2 is given an intermediate weight value and the attribute corresponding to D3 the lowest. In practical applications, other ways of determining the relative accuracy with which a geographic descriptor describes the geographic location may also be used; this is not limited herein.
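The word-counting idea in the D1/D2/D3 example can be sketched as below. This is a simplified illustration on English renderings of the descriptors: the word list `UNIT_WORDS` (including "city"), the rank-based weights, and the function names are assumptions, and a real implementation would work on the original place-name text.

```python
import re

# Assumed list of division-unit words; the text says the list is not exhaustive
UNIT_WORDS = {"country", "province", "municipality", "region",
              "county", "city", "town", "village", "street"}


def count_location_words(descriptor: str) -> int:
    """Count division-unit words appearing in a geographic descriptor."""
    tokens = re.findall(r"[a-z]+", descriptor.lower())
    return sum(1 for token in tokens if token in UNIT_WORDS)


def assign_weights(descriptors: dict) -> dict:
    """Rank attributes by word count; a higher count gets a higher weight.

    Ties keep their input order, so every attribute still receives a distinct weight.
    """
    ranked = sorted(descriptors, key=lambda name: count_location_words(descriptors[name]))
    return {name: rank for rank, name in enumerate(ranked, start=1)}


descriptors = {
    "D1": "Liaoning Province, Dandong, Donggang City, Bodhisattva Mausoleum Town, "
          "Ocean Red Village, waters near Clam Lump, Dandong City Ocean Red Central Fishing Port",
    "D2": "Liaoning Province, Dandong, Donggang City, Bodhisattva Mausoleum Town, "
          "Ocean Red Village, waters near Clam Lump",
    "D3": "Dandong City Ocean Red Central Fishing Port",
}
weights = assign_weights(descriptors)
```

Here D1 receives the largest weight, D2 the middle one, and D3 the smallest, matching the ordering described in the text.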
Fig. 2 shows an exemplary detailed flowchart of the data processing method of this embodiment, which may include:
Step 201, data source preprocessing: this includes null-value handling, unit unification, outlier screening, normalization, and the like, and excludes data objects in the data source that do not meet the requirements of subsequent processing (that is, data objects with fewer than two kinds of geographical attributes);
Step 202, data attribute extraction: extract the attribute information of each data object in the data source to form a geographically relevant attribute group and a non-geographic attribute group;
The geographically relevant attribute group includes two or more different geographical attributes and the corresponding connection attribute. The non-geographic attribute group includes one, two, or more other attributes, which may at least include attributes related to one or more of the coverage range, area, size, component size, and end-to-end length of the data object.
In a specific implementation, an exemplary process for forming the geographically relevant attribute group may include: reading all attribute names of the data source to form an attribute name set; traversing the attribute name set and filtering out the attributes that contain geographic descriptors, that is, the geographical attributes, to form the geographical attribute set of the data source; traversing the geographical attribute set to create the connection attribute of the data source, where the connection attribute is the concatenation of the geographical attributes in the set, its name may be obtained by merging the names of those geographical attributes, and its content may be obtained by splicing the information they contain; and finally extracting the geographical attributes and connection attribute of each data object to form the geographically relevant attribute group.
The non-geographic attribute group may include one, two, or more other attributes; the number of kinds of other attributes depends on the characteristics of the data object itself. Taking fishing port data as an example, each data object (describing one fishing port) has 9 attributes, including fishing port title, geographical location, wind sheltering grade, harbour length, shore protection length, and breakwater length. The fishing port title and geographical location attributes contain geographic information and can serve as geographical attributes, so they are extracted to form the geographically relevant attribute group, which can be expressed as: {geographical location + title}, {geographical location}, {title}. Harbour length, shore protection length, and breakwater length describe characteristics of the fishing port and can serve as other attributes, so the information of these three attributes is extracted to form the non-geographic attribute group.
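The grouping described in step 202 can be sketched as follows. This is a minimal illustration in which the geographical attribute names are supplied by hand rather than detected automatically; the function name, the "+" used to merge names, the empty splice separator, and the sample record are all hypothetical.

```python
def split_attribute_groups(record: dict, geo_attrs: list, other_attrs: list):
    """Split one data object into a geo-related group and a non-geographic group."""
    # Connection attribute: name merged from the geo attribute names,
    # content spliced from their descriptors (no separator assumed).
    connection_name = "+".join(geo_attrs)
    connection_value = "".join(record[name] for name in geo_attrs)

    geo_group = {name: record[name] for name in geo_attrs}
    geo_group[connection_name] = connection_value
    non_geo_group = {name: record[name] for name in other_attrs}
    return geo_group, non_geo_group


# Hypothetical fishing port record
record = {
    "fishing port title": "Port X",
    "geographical location": "Town Y",
    "harbour length": 300.0,
    "shore protection length": 500.0,
    "breakwater length": 200.0,
}
geo_group, non_geo_group = split_attribute_groups(
    record,
    geo_attrs=["geographical location", "fishing port title"],
    other_attrs=["harbour length", "shore protection length", "breakwater length"],
)
```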
Step 203, attribute weight configuration: configure a weight value for each geographical attribute according to the relative accuracy with which its geographic descriptor describes the geographic location;
Here, the weight value characterizes how accurately the corresponding geographical attribute describes the geographic location relative to the other geographical attributes in the same group; the weight values of the geographical attributes within one group all differ. In practical applications, the weight value may be expressed as a number, a grade, and so on. For example, it may be expressed as a percentage: the geographical attribute with the largest value describes the geographic location most accurately, and the attribute with the smallest value describes it least accurately relative to the others.
Step 204, initial threshold setting: configure the threshold used to verify the validity of the geospatial distance between geographic coordinates, and initialize the threshold to a default value;
In practical applications, different types of thresholds may be set according to the individual characteristics of the data objects, as the standard for verifying the accuracy of geographic coordinates.
Here, the default value may be a value determined from the average coverage range of the data objects, so that the validity of the geospatial distance between geographic coordinates can be given a quick preliminary check. In practical applications, the default value depends on the characteristics of the data objects. Taking fishing port data as an example, the default value may be an average value or an empirical value obtained by statistical analysis of the coverage ranges of multiple fishing ports.
Step 205, configure weights for the geographically relevant attribute group and the non-geographic attribute group respectively: a weight is configured for each geographical attribute and connection attribute in the geographically relevant attribute group, determined by the relative accuracy with which its geographic descriptor describes the geographic location; and a weight is configured for each other attribute in the non-geographic attribute group, determined by the correlation between its information and the threshold, where a higher correlation gives a larger weight.
Step 206, generate geographic coordinates: extract the geographic descriptor of each data object under each attribute in the geographically relevant attribute group and perform address resolution on the descriptor, obtaining the longitude and latitude values of each data object for its different geographical attributes and connection attribute;
In one implementation, batch address resolution may be performed on all data objects in the same data source, generating a geographic coordinate set containing the geographic coordinates of all data objects.
In one implementation, the process of generating geographic coordinates may be as follows. For each geographical attribute and each connection attribute, the geographic descriptors of all its data objects are extracted to form a corresponding geographic information set; one geographic information set contains the geographic descriptors of all data objects for the same geographical attribute or connection attribute. The geographic information set of each geographical attribute and each connection attribute is converted to a string data format so that address resolution can be executed. A batch address resolution tool is then called to resolve each geographic descriptor in each geographic information set into the corresponding longitude and latitude values, which are added to that geographic information set. After this processing, one geographic information set contains the geographic descriptors and longitude and latitude values of all data objects for one geographical attribute or connection attribute. In this way, a corresponding geographic information set is formed for each attribute in the geographically relevant attribute group.
Taking fishing port data as an example, the geographically relevant attribute group includes three attributes: fishing port title, geographical location, and fishing port title + geographical location. After address resolution, three geographic information sets are obtained: geographic information set A corresponding to 'fishing port title + geographical location', geographic information set B corresponding to 'geographical location', and geographic information set C corresponding to 'fishing port title'. Each geographic information set contains the geographic descriptor and the longitude and latitude values of each data object under the respective attribute.
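Building one geographic information set per attribute can be sketched as below. The geocoder here is a stand-in dictionary lookup, not a real third-party service, and the coordinates, record contents, and function name are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

LonLat = Tuple[float, float]


def build_geo_info_sets(records: List[dict],
                        attrs: List[str],
                        geocode: Callable[[str], Optional[LonLat]]) -> Dict[str, list]:
    """For each attribute, collect every object's descriptor and resolved lon/lat."""
    return {
        attr: [
            {"serial": i, "descriptor": rec[attr], "lonlat": geocode(rec[attr])}
            for i, rec in enumerate(records, start=1)
        ]
        for attr in attrs
    }


# Stand-in for a batch address resolution tool; values are made up
FAKE_GAZETTEER = {
    "Port X": (124.10, 39.80),
    "Town Y": (124.11, 39.81),
    "Town YPort X": (124.10, 39.80),
}
records = [{
    "fishing port title": "Port X",
    "geographical location": "Town Y",
    "geographical location fishing port title": "Town YPort X",
}]
geo_sets = build_geo_info_sets(
    records,
    ["geographical location fishing port title", "geographical location", "fishing port title"],
    FAKE_GAZETTEER.get,
)
```

Each entry of `geo_sets` plays the role of one of the sets A, B, and C in the fishing port example.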
In practical applications, the address resolution tool may be a geographic coordinate picking tool provided by a third-party platform.
Steps 207 to 208: for each data object, calculate the geospatial distances between its geographic coordinates and form a distance set, which contains the geospatial distances of all data objects;
In this step, calculating the geospatial distance between geographic coordinates means calculating the geospatial distance between the longitude and latitude values of the same data object in different geographic information sets.
Step 209, traverse the distance set and verify whether each geospatial distance is greater than the threshold; for geospatial distances greater than the threshold go to step 211, and for geospatial distances not greater than the threshold go to step 210.
Step 210, for a data object with any geospatial distance that passed verification, choose, from the geographic coordinates corresponding to the verified geospatial distances, the coordinate whose attribute has the largest weight value as the new attribute of that data object, and go to step 215.
Step 211, collect all geospatial distances that failed verification, and from them determine the data objects all of whose geospatial distances failed verification;
Step 212, read the information of the non-geographic attribute group of the data objects determined in step 211, together with its weights;
Step 213, based on a preset threshold calculation rule, use the information of the non-geographic attribute group of the data object to calculate and set a threshold suited to the current data object;
Here, the specific calculation may follow the examples given above or below and is not repeated.
Here, one example of the threshold calculation rule is to delimit the threshold according to the data distribution. For example, one kind of data is selected from one or more of the coverage range, area, size, component size, and end-to-end length of the data objects; when that data is normally distributed, the standard deviation of the data over all objects may be selected as the threshold.
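The distribution-based rule above can be sketched as follows. The sample lengths are invented, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which is intended.

```python
import statistics


def threshold_from_distribution(values: list) -> float:
    """Use the standard deviation of one attribute across all objects as the threshold."""
    return statistics.pstdev(values)  # population standard deviation (assumed)


# Hypothetical breakwater lengths of three fishing ports, in metres
breakwater_lengths = [100.0, 200.0, 300.0]
threshold = threshold_from_distribution(breakwater_lengths)
```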
Step 214, verify whether the geospatial distances of the data object are greater than the threshold: if any geospatial distance is not greater than the threshold, return to step 210; if all geospatial distances are greater than the threshold, discard all geographic coordinates of the data object;
Step 215, terminate.
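Steps 209 to 214 amount to a two-stage check for each data object: first against the default threshold, then against an object-specific threshold. The sketch below illustrates that control flow under stated assumptions: the distances, weights, and thresholds are supplied ready-made, and the function and variable names are hypothetical.

```python
from typing import Dict, Optional, Tuple

LonLat = Tuple[float, float]


def choose_coordinate(distances: Dict[Tuple[str, str], float],
                      weights: Dict[str, float],
                      coords: Dict[str, LonLat],
                      default_threshold: float,
                      object_threshold: float) -> Optional[LonLat]:
    """Pick the highest-weight coordinate among pairs that pass verification."""
    # Stage 1 uses the default threshold (step 209); stage 2 uses the
    # object-specific threshold computed from other attributes (steps 211-214).
    for threshold in (default_threshold, object_threshold):
        passed = {attr
                  for pair, dist in distances.items() if dist <= threshold
                  for attr in pair}
        if passed:  # step 210: choose by largest weight
            best = max(passed, key=lambda attr: weights[attr])
            return coords[best]
    return None  # every distance failed both checks: discard all coordinates


# Hypothetical data for one object
distances = {("A1", "A2"): 0.4, ("A1", "A3"): 5.0, ("A2", "A3"): 5.2}
weights = {"A1": 1, "A2": 2, "A3": 3}
coords = {"A1": (124.0, 39.8), "A2": (124.005, 39.8), "A3": (125.5, 40.5)}
```

Note that A3 has the largest weight but is never chosen when none of its distances pass, which is the purpose of the verification step.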
In the process shown in Fig. 2, the geographic coordinates (for example, longitude and latitude values) may also be added to the data object in step 210, so that the geographic descriptor of each data object is expressed as precise numerical values and the data can be displayed in the most intuitive way, as a map view.
Embodiment two
A data processing device, as shown in Fig. 3, may include:
A preprocessing module 31, configured to preprocess the data source and output the data objects that have no fewer than two kinds of geographical attributes, the geographical attributes containing geographic descriptors;
A geographic coordinate determining module 32, configured to obtain the geographic coordinate of each data object output by the preprocessing module, and including an extraction module 321, an address resolution module 322, a verification module 323, and a selection module 324. The extraction module 321 is configured to extract the geographic descriptor of the data object under each geographical attribute and to obtain the geographic descriptor spliced from the geographic descriptors of at least two geographical attributes. The address resolution module 322 is configured to perform address resolution separately on the geographic descriptor of each geographical attribute and on the geographic descriptor spliced from the geographic descriptors of at least two geographical attributes, determining the corresponding geographic coordinates. The verification module 323 is configured to verify whether the geospatial distance between the geographic coordinates is valid, so as to determine the optional geographic coordinates. The selection module 324 is configured to choose one of the optional geographic coordinates as the geographic coordinate of the current data object, according to the relative accuracy with which the geographic descriptors corresponding to the optional geographic coordinates describe the geographic location;
An attribute configuration module 33, configured to configure a new attribute for the data object whose geographic coordinate has been obtained, the new attribute containing the geographic coordinate.
In this embodiment, as shown in Fig. 3, the geographic coordinate determining module 32 may further include a threshold configuration module 325 and a calculation module 326. The threshold configuration module 325 is configured to set the threshold used to verify whether the geospatial distance is valid. The calculation module 326 is configured to calculate the geospatial distance between geographic coordinates corresponding to different geographic descriptors. The verification module 323 is specifically configured to compare the geospatial distance with the threshold and judge whether the distance exceeds it; when the geospatial distance does not exceed the threshold, the distance is verified as valid and the corresponding geographic coordinates are determined to be the optional geographic coordinates.
In one implementation, the threshold configuration module 325 may also be configured to initialize the threshold to a default value. The verification module 323 is specifically configured to compare the geospatial distance with the default value and judge whether the distance exceeds it; when the geospatial distance does not exceed the default value, the distance is valid and the corresponding geographic coordinates are determined to be the optional geographic coordinates; when the geospatial distance exceeds the default value, the threshold is reset and the comparison repeated, or the geospatial distance is verified as invalid.
In one implementation, the extraction module 321 may also be configured to extract information on other attributes of the current data object, the other attributes being attributes, apart from the geographical attributes, that relate to one or more of the coverage range, area, size, component size, and end-to-end length of the data object. The threshold configuration module 325 may also be configured to set the threshold according to the information on the other attributes of the current data object.
In one implementation, the verification module 323 may also be configured to send a reset instruction to the threshold configuration module 325 when the geospatial distance exceeds the default value, so that the threshold is reset. The threshold configuration module 325 is further configured to, after receiving the reset instruction from the verification module 323, reset the threshold according to the information on the other attributes of the current data object and return a reset-complete message to the verification module 323. The verification module 323 may also be configured to, after receiving the reset-complete message from the threshold configuration module 325, compare the geospatial distance with the reset threshold and judge whether the distance exceeds it; when the geospatial distance exceeds the reset threshold, it either sends a further reset instruction to the threshold configuration module 325 or directly determines the geospatial distance to be invalid; when the geospatial distance does not exceed the reset threshold, it determines the distance to be valid and the corresponding geographic coordinates to be the optional geographic coordinates.
In practical applications, the setting or resetting of the threshold by the threshold configuration module 325 may include one or more of the following: 1) setting the threshold to the value of one of the other attributes; 2) setting the threshold to the minimum value among the other attributes; 3) setting the threshold to the maximum value among the other attributes; 4) estimating the end-to-end maximum of the data object from one or more of the other attributes and setting the threshold to that end-to-end maximum.
In this embodiment, the attribute configuration module 33, in setting the geographic coordinate as a new attribute of the data object, may perform one of the following: 1) configuring a new field for the data object in the data source, the new field containing the geographic coordinate of the data object; 2) obtaining the data source identifier and the ID of the data object in the data source, forming a data set based at least on the data source identifier, the ID of the data object in the data source, and the geographic coordinate of the data object, and associating the data set with the data source through the data source identifier and the ID of the data object in the data source.
In this embodiment, the geographic coordinate determining module 32 may further include: a creation module 327, configured to create a connection attribute, the connection attribute being a combination of at least two geographical attributes and containing the geographic descriptor formed by splicing the geographic descriptors of the at least two geographical attributes; and a weight configuration module 328, configured to identify words that reflect geographic location in the geographic location information and to configure a weight value for the corresponding geographical attribute or connection attribute according to the number of such words, the weight value characterizing the relative accuracy with which the corresponding geographic descriptor describes the geographic location. The selection module 324 is specifically configured to choose, from the optional geographic coordinates, the coordinate whose corresponding geographical attribute or connection attribute has the largest weight value as the geographic coordinate of the current data object.
For other technical details of this embodiment, refer to embodiment one.
The data processing device of this embodiment can be implemented by any computing device capable of executing the data processing method described in embodiment one. In practical applications, the computing device may be a server, a computer, a distributed system, and so on.
Embodiment three
A data processing device, comprising a memory and a processor, the memory being used to store a computer program which, when executed by the processor, implements the data processing method described in embodiment one.
For other technical details of this embodiment, refer to embodiment one.
It should be noted that the data processing device can be implemented by any computing device capable of executing the data processing method described in embodiment one. In practical applications, the computing device may be a server, a computer, a distributed system, and so on.
Embodiment four
A computer-readable medium storing a computer program which, when executed by a processor, implements the data processing method described in embodiment one.
For other technical details of this embodiment, refer to embodiment one.
Application scenario example
The exemplary implementation of the above embodiments of this application is described in detail below, taking fishing port data as an example.
Data source: the data source comes from a fishing port database and contains data on multiple fishing ports. Each record (that is, each data object) corresponds to one fishing port, and each data object may have 9 attributes: 'fishing port title', 'geographical location', 'wind sheltering grade', 'harbour length', 'shore protection length', 'breakwater length', 'data provider', 'update date', and 'update time'.
As shown in Fig. 4, the processing of the above fishing port data may include:
First step, raw data preprocessing;
First, the original data set Dataset0 is obtained from the fishing port database;
Second, attribute analysis is performed on the original data set Dataset0 to obtain the process data set Dataset1 containing the new attribute (the connection attribute described above), in preparation for attribute extraction;
Specifically, all attribute names are read from the original data set to obtain an attribute name set containing the 9 attribute names such as fishing port title, geographical location, and wind sheltering grade. The attribute name set is traversed to screen for attributes containing geographic location information (the geographical attributes described above), and the attributes found are added to a set, giving: {fishing port title, geographical location}. A new attribute (the connection attribute described above) is created: geographical location + fishing port title. The name of the new attribute is 'geographical location fishing port title', and its field content is the field content of geographical location spliced with the field content of fishing port title. The new attribute is added to the original data set Dataset0 to obtain the process data set Dataset1, which contains 10 attributes such as geographical location fishing port title (the new attribute), fishing port title, geographical location, and wind sheltering grade.
Here, creating the new attribute means adding a new field to every record of the original data set Dataset0; the field is named 'geographical location fishing port title', and its content is the splicing of the contents of the two fields 'geographical location' and 'fishing port title'.
Finally, after the process data set Dataset1 is formed, data extraction is carried out to generate resolvable attribute information;
The data whose field name is 'geographical location fishing port title' is read from the process data set Dataset1 and, after formatting, yields information on which address resolution can be executed, with geographic position name: Input('geographical location fishing port title'). Fig. 5 shows an example of this information.
The data whose field name is 'geographical location' is read from the process data set Dataset1 and, after formatting, yields information on which address resolution can be executed, with geographic position name: Input('geographical location');
The data whose attribute is 'fishing port title' is read from the process data set Dataset1 and, after formatting, yields the geographic position name: Input('fishing port title');
Here, the formatted data is in string data format: var addr = ["data 1", "data 2", ..., "data n"], where data n is the geographic descriptor recorded under the respective field.
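The formatting step can be sketched as follows, assuming the batch tool expects a JavaScript-style array literal assigned to `addr` as shown above. The function name and the sample descriptors are hypothetical.

```python
import json


def format_for_batch_resolution(descriptors: list) -> str:
    """Render descriptors as the string form: var addr = ["data 1", ..., "data n"]."""
    # ensure_ascii=False keeps non-ASCII place names readable in the output
    return "var addr = " + json.dumps(list(descriptors), ensure_ascii=False)


formatted = format_for_batch_resolution(["Town Y", "Town Z"])
```

Using a JSON serializer rather than manual string joining also escapes any quotation marks that appear inside a descriptor.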
Second step, geographic data acquisition:
The three kinds of resolvable information generated in the first step are each input into the batch address resolution tool for address resolution, obtaining the corresponding geographic coordinate information, which includes longitude and latitude values.
Specifically, Input('geographical location fishing port title') is input into the batch address resolution tool, and the output data Result('geographical location fishing port title') is denoted geographic information set A. Fig. 6 shows an example fragment of Result('geographical location fishing port title').
Specifically, Input('geographical location') is input into the batch address resolution tool, and the output data Result('geographical location') is denoted geographic information set B.
Specifically, Input('fishing port title') is input into the batch address resolution tool, and the output data Result('fishing port title') is denoted geographic information set C.
In practical applications, the batch address resolution tool may be a geographic coordinate picking tool provided by a third-party platform, a coordinate conversion tool, and so on.
In the output of the address resolution tool, that is, in each geographic information set above, each record may include a serial number, geographic descriptor, longitude, and latitude value, and each record corresponds to one data object (that is, one record) in the original data set Dataset0.
Third step, data mart modeling processing;
In this step, by the output result of address resolution tool, that is, geography information set A, geography information set B, geographical letter Breath set C is respectively non-structured text storage format, this step formats these output results, to execute Subsequent processing.
In one implementation, Excel's text-to-columns tool can be called directly to process geographic information sets A, B and C, forming process data set Dataset2 for 'geographical location fishing port title', process data set Dataset3 for 'geographical location' and process data set Dataset4 for 'fishing port title'. Each process data set contains four attributes: serial number, geographic location name, longitude and latitude. Figure 7 shows the processing flow and an example result for geographic information set A.
In another implementation, the string extraction function MID and the character position search function FIND can be used to cut each record in each geographic information set field by field, forming the corresponding process data set.
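The MID/FIND approach could be sketched in Python roughly as follows, with `str.find`, `str.rfind` and slicing standing in for the spreadsheet functions. The record layout (serial number, description, longitude and latitude separated by commas) is an assumption made for illustration, since the actual output format of the address resolution tool is not given here; `GeoRecord`, `parse_record` and `parse_set` are hypothetical names.

```python
from typing import NamedTuple


class GeoRecord(NamedTuple):
    serial: int
    name: str
    longitude: float
    latitude: float


def parse_record(line: str, sep: str = ",") -> GeoRecord:
    """Cut one raw record into four fields using find() and slicing."""
    # Assumed layout: "<serial><sep><description><sep><longitude><sep><latitude>"
    first = line.find(sep)
    # The description may itself contain separators, so the last two
    # separators are located from the right-hand side.
    last = line.rfind(sep)
    second_last = line.rfind(sep, 0, last)
    if first == -1 or second_last <= first:
        raise ValueError(f"unexpected record layout: {line!r}")
    return GeoRecord(
        serial=int(line[:first]),
        name=line[first + 1:second_last].strip(),
        longitude=float(line[second_last + 1:last]),
        latitude=float(line[last + 1:]),
    )


def parse_set(raw_text: str) -> list[GeoRecord]:
    """Turn an unstructured text block into a structured process data set."""
    return [parse_record(ln) for ln in raw_text.splitlines() if ln.strip()]
```

Applied to the raw text of geographic information set A, B or C, `parse_set` would yield rows with the four attributes listed above, provided the assumed layout holds.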
Fourth step: data association.
Association attributes are added to process data set Dataset2. These attributes may be the original data set code (dbcode) and the corresponding data object ID. The Excel VLOOKUP function can be used to compute the dbcode and data object ID contents, generating process data set Dataset5 ('geographical location fishing port title'). The main content of process data set Dataset5 comprises six attributes: serial number, geographic location information, longitude, latitude, dbcode and ID.
Here, the original data set code (dbcode) is the unique identifier of the original data set Dataset0, and the data object ID identifies each data item in Dataset0. The ID can be obtained by parsing the link address of the data item on the publishing platform that hosts Dataset0. Figure 8 shows an example of a data item and its link address; the "2" in the link address is the data object ID.
Process data set Dataset5 is associated with the original data set Dataset0 through the original data set code and the data object ID. That is, a coordinate attribute is added to the corresponding data item in Dataset0. The coordinate attribute may comprise two fields, named longitude and latitude, whose contents are the longitude value and the latitude value respectively.
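As a rough illustration of this association step, the sketch below replaces VLOOKUP with a dictionary lookup. It assumes that the join key is the spliced text of 'geographical location' and 'fishing port title', which the text does not state explicitly; the field names (`location`, `port_name`, `name`, `id`) and both function names are hypothetical.

```python
def build_dataset5(dataset2, dataset0, dbcode):
    """Add dbcode and data object ID to each parsed row, VLOOKUP-style."""
    # Lookup table: spliced description text -> data object ID (assumed join key).
    id_by_name = {row["location"] + row["port_name"]: row["id"] for row in dataset0}
    dataset5 = []
    for row in dataset2:
        object_id = id_by_name.get(row["name"])
        if object_id is None:
            continue  # no match, comparable to VLOOKUP returning #N/A
        dataset5.append({**row, "dbcode": dbcode, "id": object_id})
    return dataset5


def attach_coordinates(dataset0, dataset5, dbcode):
    """Return copies of the Dataset0 objects with longitude/latitude fields added."""
    coords = {
        row["id"]: (row["longitude"], row["latitude"])
        for row in dataset5
        if row["dbcode"] == dbcode
    }
    result = []
    for obj in dataset0:
        enriched = dict(obj)
        if obj["id"] in coords:
            enriched["longitude"], enriched["latitude"] = coords[obj["id"]]
        result.append(enriched)
    return result
```

Objects without a matching row are returned unchanged, so a failed lookup never overwrites existing data.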
Fifth step: data verification.
Finally, the geographic coordinates in process data set Dataset5 are verified against process data set Dataset3 and process data set Dataset4.
Specifically, the process of verification may include:
Set the threshold Y to a default value, for example 1 kilometre;
Calculate the geospatial distance between the geographic coordinates in process data set Dataset5 and process data set Dataset3 that correspond to the same data object (i.e. the same geographic description information);
Judge whether the geospatial distance between process data set Dataset5 and process data set Dataset3 is greater than Y;
If the geospatial distance is not greater than Y, retain in process data set Dataset5 the data corresponding to that distance;
If the geospatial distance is greater than Y, calculate the geospatial distance between the geographic coordinates of the data object in process data set Dataset5 and process data set Dataset4;
Judge whether the geospatial distance between process data set Dataset5 and process data set Dataset4 is greater than Y. If the geospatial distance is not greater than Y, retain in process data set Dataset5 the data corresponding to that distance. If the geospatial distance is greater than Y, remove the corresponding data from process data set Dataset5 and add it to process data set Dataset7, then judge whether the corresponding data has already undergone supplementary verification (i.e. secondary verification): if so, end; otherwise, perform the secondary verification.
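The first verification pass above can be sketched in Python as follows. The text does not say how the geospatial distance is computed; the haversine great-circle formula is used here as an assumption. The data layout (a dictionary mapping each data object ID to a (longitude, latitude) pair) and the function names are likewise hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometres


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def first_check(dataset5, dataset3, dataset4, y_km=1.0):
    """Split Dataset5 into retained records and Dataset7 (failed both comparisons)."""
    kept, dataset7 = {}, {}
    for key, (lon, lat) in dataset5.items():
        # Dataset3 is tried first; any() short-circuits, so Dataset4 is
        # consulted only when the Dataset3 comparison fails.
        passed = any(
            key in ref and haversine_km(lon, lat, *ref[key]) <= y_km
            for ref in (dataset3, dataset4)
        )
        (kept if passed else dataset7)[key] = (lon, lat)
    return kept, dataset7
```

Records that end up in the second returned dictionary are the ones that would go on to secondary verification.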
Secondary verification process: read the content of the other attributes (attributes other than those containing geographic information) in the original data set Dataset0 or process data set Dataset1, for example harbour length, shore protection length and breakwater length, and adjust the threshold Y based on that content. Calculate the geospatial distance between process data set Dataset7 and process data set Dataset3, and judge whether it is greater than Y. If the geospatial distance is not greater than Y, remove the data corresponding to that distance from process data set Dataset7 and add it to process data set Dataset5;
If the geospatial distance is greater than Y, calculate the geospatial distance between the geographic coordinates of the data object in process data set Dataset7 and process data set Dataset4, and judge whether it is greater than Y. If the geospatial distance is not greater than Y, remove the data corresponding to that distance from process data set Dataset7 and add it to process data set Dataset5. If the geospatial distance is greater than Y, discard the corresponding data in process data set Dataset7 and end.
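The branching of the secondary verification might look like the sketch below. To keep it short, the distances to Dataset3 and Dataset4 are assumed to be precomputed in kilometres and passed in as dictionaries keyed by data object ID, as is the adjusted threshold for each object; these inputs and the name `second_check` are hypothetical.

```python
def second_check(dataset7, dist3_km, dist4_km, adjusted_y_km):
    """Re-test records that failed the first pass against an adjusted threshold.

    Returns (restored, discarded): records to move back into Dataset5 and
    records to give up on.
    """
    restored, discarded = {}, {}
    for key, coord in dataset7.items():
        y = adjusted_y_km[key]
        # `or` short-circuits: the Dataset4 distance matters only when the
        # Dataset3 distance is greater than the adjusted threshold.
        if dist3_km[key] <= y or dist4_km[key] <= y:
            restored[key] = coord
        else:
            discarded[key] = coord
    return restored, discarded
```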
In practice, the threshold Y can be adjusted based on the content of the other attributes in many ways.
For example, the threshold Y can be set to any one of harbour length, shore protection length and breakwater length.
As another example, the value of the threshold Y can be reset by "threshold = Min[harbour length, shore protection length, breakwater length]".
As another example, the value of the threshold Y can be reset by "threshold = Max[harbour length, shore protection length, breakwater length]".
As a further example, three weight values corresponding respectively to harbour length, shore protection length and breakwater length can be preset, and the value of the threshold Y can be calculated with a custom functional relation based on the three lengths and their corresponding weight values. The custom functional relation may use any kind of operation, such as exponentiation or root extraction.
As yet another example, the area of the fishing port can be estimated from harbour length, shore protection length and breakwater length, and the threshold Y can then be calculated from the fishing port area.
Besides the approaches above, other approaches may also be used; the specific calculation of the threshold Y is not limited here. In practice, the threshold Y is related to the area, size, component lengths, coverage and similar properties of the object described by the data (for example, a fishing port), and the threshold can be calculated during secondary verification by estimating such properties.
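A few of these adjustment strategies are sketched below. The lengths are assumed to be in metres and the threshold in kilometres; the weighted form is a plain weighted sum, and the area-based form (a rectangle spanned by the two longest lengths, reduced to the radius of a circle of equal area) is only one possible reading, since the text fixes neither the functional relation nor the area estimate. All function names are hypothetical.

```python
import math


def threshold_min_km(lengths_m):
    """threshold = Min[harbour length, shore protection length, breakwater length]"""
    return min(lengths_m.values()) / 1000.0


def threshold_max_km(lengths_m):
    """threshold = Max[harbour length, shore protection length, breakwater length]"""
    return max(lengths_m.values()) / 1000.0


def threshold_weighted_km(lengths_m, weights):
    """Weighted sum of the lengths (one simple choice of custom functional relation)."""
    return sum(weights[name] * value for name, value in lengths_m.items()) / 1000.0


def threshold_from_area_km(lengths_m):
    """Radius of a circle whose area equals a crude rectangular estimate of the port."""
    # Assumption: the two longest lengths span a rectangle approximating the port area.
    longest, second = sorted(lengths_m.values(), reverse=True)[:2]
    area_m2 = longest * second
    return math.sqrt(area_m2 / math.pi) / 1000.0
```

For a port with lengths of 800 m, 1200 m and 2000 m, the Min rule gives 0.8 km and the Max rule gives 2.0 km, which shows how strongly the choice of rule affects which records pass.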
Note that the example above uses only two verification rounds; in practice the number of rounds is not limited. Different data sources describe different objects, and the appropriate number of rounds may differ accordingly. In a specific application, the number of rounds can be set according to the characteristics of the objects described by the data source and the required accuracy of the geographic coordinates. Apart from the first round, which uses the default threshold, each round can be executed with a reset threshold. For example, the verification process in the example above may also be: in the first round, use the initialized default value; in the second round, reset the threshold by "threshold = Min[harbour length, shore protection length, breakwater length]" and use the reset threshold to re-verify the data that failed the first round; in the third round, reset the threshold by "threshold = Max[harbour length, shore protection length, breakwater length]" and use the reset threshold to re-verify the data that failed the second round.
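The three-round variant described above could be organised as a list of threshold rules applied in order, each round re-testing only the records that failed the previous one. In this sketch the distance for each data object is assumed to be precomputed in kilometres, and the three lengths are assumed to be given in kilometres as a tuple; `verify_in_rounds` is a hypothetical name.

```python
def verify_in_rounds(distances_km, lengths_km, default_y_km=1.0):
    """Return {object_id: number of the round that passed, or None if all failed}."""
    rounds = [
        lambda lengths: default_y_km,  # round 1: initialized default value
        lambda lengths: min(lengths),  # round 2: threshold = Min[...]
        lambda lengths: max(lengths),  # round 3: threshold = Max[...]
    ]
    result = {}
    pending = dict(distances_km)
    for number, rule in enumerate(rounds, start=1):
        still_pending = {}
        for key, dist in pending.items():
            if dist <= rule(lengths_km[key]):
                result[key] = number
            else:
                still_pending[key] = dist
        pending = still_pending  # only failed records go on to the next round
    result.update({key: None for key in pending})
    return result
```

Adding or removing a round then amounts to editing the `rounds` list, which matches the statement that the number of rounds is not fixed.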
Figure 9 shows a sample fragment of process data set Dataset5 after verification. Each data item may include six attributes: id, name, x, y, dbcode and preid. Here, id is the identifier of the data item; name is a text field containing the geographic location description and the fishing port title, formed by splicing the field contents of 'geographical location' and 'fishing port title' in the original data set Dataset0; x is the longitude value of the corresponding geographic coordinate; y is the latitude value of the corresponding geographic coordinate; dbcode is the original data set code; and preid is the ID of the corresponding data object in the original data set.
Note that in the processing flow above, the data association step may also be performed after data verification. Data association and data processing are both optional steps and can be added or removed depending on the actual application scenario, the address resolution method used, and so on.
The above description covers only preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in this application.

Claims (10)

1. A data processing method, characterized by comprising:
Preprocessing a data source and outputting data objects that each have no fewer than two geographic attributes, the geographic attributes containing geographic description information;
For each output data object, performing the following processing to obtain its geographic coordinate: extracting the geographic description information of the data object in each geographic attribute, and obtaining geographic description information formed by splicing the geographic description information of at least two geographic attributes; performing address resolution separately on the geographic description information of each geographic attribute and on the geographic description information formed by splicing the geographic description information of at least two geographic attributes, to determine the corresponding geographic coordinates; verifying whether the geospatial distances between the geographic coordinates are valid, to determine candidate geographic coordinates; and, according to the relative accuracy with which the geographic description information corresponding to each candidate geographic coordinate describes the geographic location, selecting one of the candidate geographic coordinates as the geographic coordinate of the current data object;
Configuring a new attribute for each data object whose geographic coordinate has been obtained, the new attribute including the geographic coordinate.
2. The data processing method according to claim 1, characterized in that:
The method further comprises: configuring a threshold for verifying whether the geospatial distance is valid;
Verifying whether the geospatial distances between the geographic coordinates are valid, to determine the candidate geographic coordinates, comprises: calculating the geospatial distance between the geographic coordinates corresponding to different geographic description information; comparing the geospatial distance with the threshold to judge whether the geospatial distance exceeds the threshold; and, when the geospatial distance does not exceed the threshold, verifying the geospatial distance as valid and determining the corresponding geographic coordinate to be a candidate geographic coordinate.
3. The data processing method according to claim 2, characterized in that:
The method further comprises: initializing the threshold to a default value;
Verifying whether the geospatial distances between the geographic coordinates are valid comprises: comparing the geospatial distance with the default value to judge whether the geospatial distance exceeds the default value; when the geospatial distance does not exceed the default value, the geospatial distance is valid and the corresponding geographic coordinate is determined to be a candidate geographic coordinate; when the geospatial distance exceeds the default value, resetting the threshold and comparing again, or verifying the geospatial distance as invalid;
The method further comprises: before verifying whether the geospatial distances between the geographic coordinates are valid, extracting information of other attributes of the current data object, the other attributes being attributes, other than the geographic attributes, that relate to one or more of the coverage area, area, size, component size and end-to-end length of the data object; and setting the threshold according to the information of the other attributes of the current data object.
4. The data processing method according to claim 2, characterized in that:
The method further comprises: initializing the threshold to a default value;
Verifying whether the geospatial distances between the geographic coordinates are valid further comprises:
When the geospatial distance exceeds the default value, extracting information of other attributes of the current data object, the other attributes being attributes, other than the geographic attributes, that relate to one or more of the coverage area, area, size, component size and end-to-end length of the data object; and resetting the threshold according to the information of the other attributes of the current data object;
Comparing the geospatial distance with the reset threshold to judge whether the geospatial distance exceeds the reset threshold;
When the geospatial distance exceeds the reset threshold, resetting the threshold and re-executing the comparison, or determining that the geospatial distance is invalid;
When the geospatial distance does not exceed the reset threshold, determining that the geospatial distance is valid and determining the corresponding geographic coordinate to be a candidate geographic coordinate.
5. The data processing method according to claim 3 or 4, characterized in that:
Setting or resetting the threshold comprises at least one of the following:
Setting the threshold to one of the other attributes;
Setting the threshold to the minimum value among the other attributes;
Setting the threshold to the maximum value among the other attributes;
Estimating an end-to-end maximum value of the data object based on one or more of the other attributes, and setting the threshold to the end-to-end maximum value.
6. The data processing method according to claim 1, characterized in that setting the geographic coordinate as a new attribute of the data object comprises one of the following:
Configuring a new field for the data object in the data source, the new field containing the geographic coordinate of the data object;
Obtaining the data source identifier and the ID of the data object in the data source; forming a data set based at least on the data source identifier, the ID of the data object in the data source and the geographic coordinate of the data object; and associating the data set with the data source through the data source identifier and the ID of the data object in the data source.
7. The data processing method according to claim 1, characterized in that:
The method further comprises: creating a connection attribute, the connection attribute being a combination of at least two geographic attributes and containing the geographic description information formed by splicing the geographic description information of the at least two geographic attributes; identifying, in the geographic location information, the words that reflect geographic location, and configuring a weight value for the corresponding geographic attribute or connection attribute according to the number of such words, the weight value characterizing the relative accuracy with which the corresponding geographic description information describes the geographic location;
Selecting one of the candidate geographic coordinates as the geographic coordinate of the current data object, according to the relative accuracy with which the geographic description information corresponding to each candidate geographic coordinate describes the geographic location, comprises: selecting, from the candidate geographic coordinates, the geographic coordinate whose corresponding geographic attribute or connection attribute has the largest weight value as the geographic coordinate of the current data object.
8. A data processing device, characterized by comprising:
A preprocessing module, configured to preprocess a data source and output data objects that each have no fewer than two geographic attributes, the geographic attributes containing geographic description information;
A geographic coordinate determining module, configured to obtain the geographic coordinate of each data object output by the preprocessing module, and comprising an extraction module, an address resolution module, a verification module and a selection module. The extraction module is configured to extract the geographic description information of the data object in each geographic attribute, and to obtain geographic description information formed by splicing the geographic description information of at least two geographic attributes. The address resolution module is configured to perform address resolution separately on the geographic description information of each geographic attribute and on the geographic description information formed by splicing the geographic description information of at least two geographic attributes, to determine the corresponding geographic coordinates. The verification module is configured to verify whether the geospatial distances between the geographic coordinates are valid, to determine candidate geographic coordinates. The selection module is configured to select, according to the relative accuracy with which the geographic description information corresponding to each candidate geographic coordinate describes the geographic location, one of the candidate geographic coordinates as the geographic coordinate of the current data object;
An attribute configuration module, configured to configure a new attribute for each data object whose geographic coordinate has been obtained, the new attribute including the geographic coordinate.
9. A data processing device, characterized by comprising a memory and a processor, the memory being configured to store a computer program which, when executed by the processor, implements the data processing method according to any one of claims 1 to 7.
10. A computer-readable medium, characterized in that it stores a computer program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
CN201810366853.5A 2018-04-23 2018-04-23 A kind of data processing method and device Expired - Fee Related CN108717422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810366853.5A CN108717422B (en) 2018-04-23 2018-04-23 A kind of data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810366853.5A CN108717422B (en) 2018-04-23 2018-04-23 A kind of data processing method and device

Publications (2)

Publication Number Publication Date
CN108717422A CN108717422A (en) 2018-10-30
CN108717422B true CN108717422B (en) 2019-03-08

Family

ID=63899348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810366853.5A Expired - Fee Related CN108717422B (en) 2018-04-23 2018-04-23 A kind of data processing method and device

Country Status (1)

Country Link
CN (1) CN108717422B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989365B (en) * 2019-12-16 2022-09-06 ***通信集团天津有限公司 Data processing method, device, equipment and storage medium
CN113220812B (en) * 2021-04-30 2022-03-29 广东省城乡规划设计研究院有限责任公司 Data spatialization method and device based on multi-source map platform cross validation
CN117408338B (en) * 2023-12-14 2024-03-12 神州医疗科技股份有限公司 Method and system for constructing knowledge graph of traditional Chinese medicine decoction pieces based on Chinese pharmacopoeia

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102447797A (en) * 2010-10-13 2012-05-09 上海众恒信息产业股份有限公司 Correlation method of GIS (Geographic Information System) and communication system and system
CN105426417A (en) * 2015-11-02 2016-03-23 四川效率源信息安全技术股份有限公司 Method for quickly looking up geographic position information in smartphone

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131596B2 (en) * 2009-04-15 2012-03-06 Mcquilken George C Method and system of payment for parking using a smart device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
一种***地理信息数据抽取方法;魏勇等;《信息工程大学学报》;20170415;第186-189页

Also Published As

Publication number Publication date
CN108717422A (en) 2018-10-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190308

Termination date: 20200423