CN110618982B - Multi-source heterogeneous data processing method, device, medium and electronic equipment - Google Patents

Multi-source heterogeneous data processing method, device, medium and electronic equipment

Info

Publication number
CN110618982B
Authority
CN
China
Prior art keywords
house source
data
original
source data
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811603354.XA
Other languages
Chinese (zh)
Other versions
CN110618982A (en
Inventor
Not disclosed (the inventor requested that the name not be published)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shiguang Renran Technology Co ltd
Original Assignee
Beijing Shiguang Renran Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shiguang Renran Technology Co ltd
Priority to CN201811603354.XA
Publication of CN110618982A
Application granted
Publication of CN110618982B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/16: Real estate

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the disclosure discloses a method, a device, a medium and an electronic device for processing multi-source heterogeneous data. The method comprises the following steps: determining original house source characteristics in original house source data; performing data aggregation processing on the original house source data based on the original house source characteristics, and determining first real house source data in the original house source data according to an aggregation result; performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result; and mapping the original house source characteristics in the second real house source data into standard house source characteristics, and obtaining first standard house source data based on the standard house source characteristics. In this way, house source data from different sources are deduplicated, verified and standardized, and the processed house source data are displayed to a user in a uniform format.

Description

Multi-source heterogeneous data processing method, device, medium and electronic equipment
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method, a device, a medium and an electronic device for processing multi-source heterogeneous data.
Background
With the development of internet technology, information provided by a network to a user is more and more abundant. For example, the user can browse the house property information through the network platform, so that the house renting or buying requirements are met.
Taking the second-hand house transaction field as an example, there are many property agencies that provide second-hand house source information, and each property agency may have its own house source system for displaying that information. Because the house source systems of different property agencies may store data in different formats, how to combine house source data from different data sources and provide users with second-hand house source data in a uniform format is a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the disclosure provides a method, a device, a medium and an electronic device for processing multi-source heterogeneous data, which can present house source data from different sources and in different formats to a user in a uniform format.
In a first aspect, an embodiment of the present disclosure provides a method for processing multi-source heterogeneous data, including:
determining original house source characteristics in original house source data;
performing data aggregation processing on the original house source data based on the original house source characteristics, and determining first real house source data in the original house source data according to an aggregation result;
performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result, wherein the data cleaning processing is a process of determining and deleting abnormal house source data, at least one house source characteristic of the abnormal house source data does not accord with a preset data cleaning rule, and the house source characteristic is the original house source characteristic or a standard house source characteristic;
and mapping the original house source characteristics in the second real house source data into standard house source characteristics, and obtaining first standard house source data based on the standard house source characteristics.
In the foregoing scheme, optionally, performing data aggregation processing on the original house source data based on the original house source characteristics, and determining first real house source data in the original house source data according to an aggregation result includes:
determining the characteristic vectors of the original house source data, and performing grouping processing on the original house source data according to the distance between the characteristic vectors to obtain a house source set;
and determining first real house source data in the original house source data contained in each house source set according to a preset data aggregation rule.
Further, determining the feature vectors of the original house source data, and performing grouping processing on the original house source data according to the distance between the feature vectors to obtain a house source set includes:
mapping the original house source characteristics to a set digital interval to obtain characteristic vectors of the original house source characteristics;
calculating distances between the feature vectors;
if the distance does not exceed a set first threshold value, determining that the original house source data corresponding to the characteristic vector participating in distance calculation belong to the same house source set;
and if the distance exceeds a set first threshold, determining that the original house source data corresponding to the feature vector participating in the distance calculation belong to different house source sets.
In the foregoing solution, optionally, performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result includes:
matching the original house source characteristics of each first real house source data with a preset data cleaning rule;
and determining second real house source data of which the original house source characteristics accord with the preset data cleaning rule according to the matching result.
In the foregoing scheme, optionally, the processing method further includes:
after first standard house source data are obtained based on the standard house source characteristics, data aggregation processing is carried out on the first standard house source data based on the standard house source characteristics, and second standard house source data in the first standard house source data are determined according to an aggregation result.
In the foregoing scheme, optionally, the processing method further includes:
after second standard house source data in the first standard house source data are determined according to the aggregation result, data cleaning processing is carried out on the second standard house source data based on the standard house source characteristics, third standard house source data are determined according to the cleaning result, and the third standard house source data are displayed.
Further, the processing method further comprises the following steps:
after the third standard house source data are displayed, acquiring user feedback information aiming at the third standard house source data, and determining false house source data in the third standard house source data according to the user feedback information;
and deleting the false house source data in the third standard house source data to obtain fourth standard house source data, and displaying the fourth standard house source data.
In the foregoing solution, optionally, the processing method further includes:
after first standard house source data are obtained based on the standard house source characteristics, acquiring an original data identifier of the original house source data, wherein the original data identifier is identifier information of the original house source data in an original data source;
and acquiring a standard data identifier of the first standard house source data, and storing the standard data identifier and the original data identifier in an associated manner.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for processing multi-source heterogeneous data, where the apparatus includes:
the characteristic determining module is used for determining original house source characteristics in the original house source data;
the data aggregation module is used for carrying out data aggregation processing on the original house source data based on the original house source characteristics and determining first real house source data in the original house source data according to an aggregation result;
the data cleaning module is used for performing data cleaning processing on the first real house source data based on the original house source characteristics and determining second real house source data in the first real house source data according to a cleaning result, wherein the data cleaning processing is a process of determining and deleting abnormal house source data, at least one house source characteristic of the abnormal house source data does not accord with a preset data cleaning rule, and the house source characteristic is the original house source characteristic or a standard house source characteristic;
and the characteristic mapping module is used for mapping the original house source characteristics in the second real house source data into standard house source characteristics, and obtaining first standard house source data based on the standard house source characteristics.
In the foregoing scheme, optionally, the data aggregation module includes:
the house source grouping submodule is used for determining the characteristic vectors of the original house source data and carrying out grouping processing on the original house source data according to the distance between the characteristic vectors to obtain a house source set;
and the house source aggregation submodule is used for determining first real house source data in the original house source data contained in each house source set according to a preset data aggregation rule.
Further, the house source grouping submodule is specifically configured to:
mapping the original house source characteristics to a set digital interval to obtain characteristic vectors of the original house source characteristics;
calculating distances between the feature vectors;
if the distance does not exceed a set first threshold value, determining that the original house source data corresponding to the characteristic vector participating in distance calculation belong to the same house source set;
and if the distance exceeds a set first threshold, determining that the original house source data corresponding to the feature vector participating in the distance calculation belong to different house source sets.
In the foregoing scheme, optionally, the data cleaning module is specifically configured to:
matching the original house source characteristics of each first real house source data with a preset data cleaning rule;
and determining second real house source data of which the original house source characteristics accord with the preset data cleaning rule according to the matching result.
In the foregoing solution, optionally, the processing apparatus further includes:
after first standard house source data are obtained based on the standard house source characteristics, data aggregation processing is carried out on the first standard house source data based on the standard house source characteristics, and second standard house source data in the first standard house source data are determined according to an aggregation result.
In the foregoing solution, optionally, the processing apparatus further includes:
after second standard house source data in the first standard house source data are determined according to the aggregation result, data cleaning processing is carried out on the second standard house source data based on the standard house source characteristics, third standard house source data are determined according to the cleaning result, and the third standard house source data are displayed.
Further, the processing apparatus further includes:
after the third standard house source data are displayed, user feedback information aiming at the third standard house source data is obtained, and false house source data in the third standard house source data are determined according to the user feedback information;
and deleting the false house source data in the third standard house source data to obtain fourth standard house source data, and displaying the fourth standard house source data.
In the above solution, optionally, the processing apparatus further includes:
after first standard house source data are obtained based on the standard house source characteristics, acquiring an original data identifier of the original house source data, wherein the original data identifier is identifier information of the original house source data in an original data source;
and acquiring a standard data identifier of the first standard house source data, and storing the standard data identifier and the original data identifier in an associated manner.
In a third aspect, an embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the processing method of multi-source heterogeneous data according to the embodiment of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable by the processor, where the processor implements a processing method of multi-source heterogeneous data according to an embodiment of the present disclosure when executing the computer program.
The embodiment of the disclosure provides a processing scheme for multi-source heterogeneous data: original house source characteristics are extracted from original house source data of different formats acquired from a plurality of data sources, data aggregation processing is carried out on the original house source data based on the original house source characteristics, and first real house source data are obtained based on the aggregation result; data cleaning processing is performed on the first real house source data based on the original house source characteristics, and second real house source data are obtained based on the cleaning result; and the original house source characteristics in the second real house source data are mapped into standard house source characteristics, and first standard house source data are obtained based on the standard house source characteristics. In this way, house source data from different sources are deduplicated, verified and standardized, and the processed house source data are displayed to the user in a unified format. This avoids the large amount of duplicate and false house source data that results from directly integrating multi-source heterogeneous data, improves the accuracy of the house source data displayed to the user, and improves the user stickiness of the network platform.
Drawings
Fig. 1 is a flowchart of a processing method of multi-source heterogeneous data according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of another processing method of multi-source heterogeneous data according to an embodiment of the present disclosure;
Fig. 3 is a block diagram of a processing apparatus for multi-source heterogeneous data according to an embodiment of the present disclosure;
Fig. 4 is a schematic view of a processing flow of multi-source heterogeneous data according to an embodiment of the present disclosure;
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not limiting of the disclosure. It should be further noted that, for the convenience of description, only some of the structures relevant to the present disclosure are shown in the drawings, not all of them.
Fig. 1 is a flowchart of a processing method of multi-source heterogeneous data according to an embodiment of the present disclosure, where the embodiment is applicable to a case where house source data from different sources and in different formats are integrated and displayed. As shown in Fig. 1, the method may include the steps of:
Step 110, determining original house source characteristics in the original house source data.
It should be noted that the original house source data are house source data in different data formats derived from different original data sources. The original data sources may be the Internet (house source data shown on the Internet), property agencies and the like. Because the original house source data are acquired from different original data sources, data duplication may occur. For example, the original house source data obtained from original data source M describe a second-hand house in building 5 of phase one, zone z, Happy community, district B, city A, while the original house source data obtained from original data source N describe a second-hand house in building 5 of zone z, Happy community, district B, city A. The original house source data obtained from the two original data sources are actually house source data of the same house source.
The house source characteristics are data describing a house source, including but not limited to fields such as city district, business district, the residential community to which the house source belongs, house type, area, floor, orientation and price. A piece of house source data can be represented by the house source characteristics described above. The original house source characteristics are the data items in the original house source data that describe the house source characteristics.
For example, the original house source data may be obtained from different data sources and stored in a local database. The original house source data are then obtained from the local database in sequence, and the data items corresponding to set fields are extracted from the obtained original house source data, where the set fields are the fields specified in a set field table. Each such data item is marked as the original house source characteristic corresponding to its set field. For example, assuming that the original house source data describe a second-hand house on the 5th floor of unit z, Happy community, district B, city A, and that the set fields include city, city district, the residential community to which the house source belongs, floor, house type and the like, original house source characteristics including "city A", "district B", "Happy community" and "5th floor" can be extracted based on the set fields.
Step 120, performing data aggregation processing on the original house source data based on the original house source characteristics, and determining first real house source data in the original house source data according to an aggregation result.
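The extraction of data items for the set fields can be sketched as follows. This is a minimal illustration only: the field names in `SET_FIELDS`, the dictionary record layout and the function `extract_original_features` are hypothetical and are not specified by the disclosure.

```python
# Hypothetical set field table: the fields to pull out of each raw record.
SET_FIELDS = ["city", "district", "community", "floor", "house_type"]


def extract_original_features(raw_record: dict) -> dict:
    """Return the data items of raw_record that correspond to the set fields.

    Fields absent from the record are kept as None so that later steps can
    tell which characteristics a record does not disclose.
    """
    return {field: raw_record.get(field) for field in SET_FIELDS}


raw = {
    "city": "City A",
    "district": "District B",
    "community": "Happy Community",
    "floor": "5th floor",
    "agent_phone": "000-0000",  # not a set field, so it is ignored
}
features = extract_original_features(raw)
```

Keeping undisclosed fields as `None` is a design choice of this sketch; it makes the "number of disclosed characteristics" easy to count in later steps.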
It should be noted that the data aggregation processing is a process of aggregating house source data having the same or similar house source characteristics. The first real house source data in the original house source data can be determined based on a preset data aggregation rule. The preset data aggregation rule can be set according to actual conditions and is preset in a set rule engine. There can be various data aggregation rules, and they can be dynamically adjusted according to actual use; the disclosure does not limit the specific data aggregation rules. For example, if different property agencies all upload the same house source, multiple identical original house source data may appear. Likewise, if multiple property brokers in the same property agency upload the same house source, or the same property broker uploads the same house source many times, multiple identical original house source data may also appear. To effectively remove duplicate house sources, a data aggregation rule may specify that the same property broker can upload the same house source only once. As another example, if the proportion of identical house source characteristics among a plurality of house source data exceeds 80%, it is determined that duplicate house sources exist among them, and the house source data disclosing the most house source characteristics is determined as the real house source data, and so on. If the house type and the orientation in the first house source data are empty while every house source characteristic in the second house source data is disclosed, it can be determined that the second house source data disclose more house source characteristics than the first house source data.
Exemplarily, the feature vectors of the original house source data are determined, and the original house source data are grouped according to the distance between the feature vectors to obtain a house source set. For example, mapping the original house source features to a set digital interval to obtain feature vectors of the original house source features; calculating distances between the feature vectors; if the distance does not exceed a set first threshold, determining that the original house source data corresponding to the feature vectors participating in distance calculation belong to the same house source set; and if the distance exceeds a set first threshold, determining that the original house source data corresponding to the feature vector participating in the distance calculation belong to different house source sets.
The original house source features can be mapped into a set digital interval by adopting a set rule, and a feature vector corresponding to the original house source features is obtained. The original house source data are then grouped by calculating the distance between the feature vectors. Calculating the distance between the feature vectors may consist of calculating the difference between two feature vectors.
For example, the feature vector corresponding to one of the original house source data is obtained and recorded as a reference vector. The distances between the reference vector and the feature vectors corresponding to the remaining original house source data are calculated, at least one target original house source data whose distance is smaller than a set first threshold is determined, and the at least one target original house source data and the original house source data corresponding to the reference vector are classified into the same house source set to obtain a first house source set. For the remaining original house source data that do not belong to the first house source set, the feature vector corresponding to one of them is randomly acquired as a new reference vector. The distances between the new reference vector and the feature vectors corresponding to the remaining original house source data are calculated, at least one new target original house source data whose distance is smaller than the set first threshold is determined, and the at least one new target original house source data and the original house source data corresponding to the new reference vector are classified into the same house source set to obtain a second house source set. The remaining original house source data that belong to neither of the two house source sets are grouped based on distance in a similar manner.
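The reference-vector grouping described above can be sketched as follows, under stated assumptions: the text does not fix a distance measure, so Euclidean distance (`math.dist`) is used here as one possible choice; the next reference is taken as the first ungrouped record rather than a random one, so that the result is reproducible; and the function name, sample vectors and threshold are hypothetical.

```python
import math


def group_by_distance(vectors, threshold):
    """Greedily group records around successive reference vectors.

    Returns a list of groups, each a list of indices into `vectors`.
    A record joins the current group when its distance to the reference
    vector does not exceed `threshold`.
    """
    remaining = list(range(len(vectors)))
    groups = []
    while remaining:
        ref = remaining.pop(0)  # next ungrouped record becomes the reference
        group = [ref]
        still_ungrouped = []
        for idx in remaining:
            if math.dist(vectors[ref], vectors[idx]) <= threshold:
                group.append(idx)
            else:
                still_ungrouped.append(idx)
        groups.append(group)
        remaining = still_ungrouped
    return groups


# Hypothetical feature vectors: (city code, floor, area in square meters).
vectors = [(1, 5, 90), (1, 5, 91), (2, 3, 60), (1, 5, 90), (2, 3, 61)]
groups = group_by_distance(vectors, threshold=1.5)
```

Because each record is compared only with the reference vector of the current group, the outcome can depend on which record is chosen as the reference; a full pairwise clustering would be an alternative design with a higher computational cost.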
After the original house source data are grouped, determining first real house source data in the original house source data contained in each house source set according to a preset data aggregation rule. For example, after the original house sources are grouped in the above manner, the original house source data in the same house source set have the same or similar original house source characteristics. And if the preset data aggregation rule indicates that the original house source data with the most public house source characteristics in the same house source set are real house source data, comparing the number of the house source characteristics disclosed by each original house source data in each house source set. And if some fields in some original house source data are null, the original house source data are considered to not disclose house source characteristics corresponding to the null fields. And according to the comparison result, using the original house source data with the most public house source characteristics as the first real house source data. It should be noted that the number of the first real house source data may be plural.
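As an illustration of the example rule above (the record disclosing the most house source characteristics is taken as real), the following sketch counts non-empty fields per record within one house source set. The helper names and sample records are hypothetical, and treating `None` and the empty string as "not disclosed" is an assumption of this sketch.

```python
def count_disclosed(record: dict) -> int:
    """Count the characteristics a record discloses (non-empty fields)."""
    return sum(1 for value in record.values() if value not in (None, ""))


def pick_real_records(house_source_set):
    """Return the record(s) disclosing the most characteristics.

    A list is returned because several records may tie for the maximum.
    """
    best = max(count_disclosed(record) for record in house_source_set)
    return [r for r in house_source_set if count_disclosed(r) == best]


house_source_set = [
    {"community": "Happy Community", "floor": 5, "house_type": None, "orientation": None},
    {"community": "Happy Community", "floor": 5, "house_type": "2 bedrooms", "orientation": "south"},
]
real = pick_real_records(house_source_set)
```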
And step 130, performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result.
It should be noted that the data cleaning processing is a process of determining and deleting abnormal house source data, wherein at least one house source characteristic of the abnormal house source data does not conform to a preset data cleaning rule, and the house source characteristic may be the original house source characteristic or the standard house source characteristic. The preset data cleaning rule can be set according to the actual situation and is preset in a set rule engine. There can be various data cleaning rules, and they can be dynamically adjusted according to actual use; the disclosure does not limit the specific data cleaning rules. For example, the preset data cleaning rule may be that no second-hand house priced above a ten-thousand yuan (a being a set value) appears in a certain city district. As another example, the preset data cleaning rule may be that no second-hand house with an area larger than v square meters appears in the residential community to which a certain house source belongs.
After the aggregation process, there may be some false house source data in the first real house source data. And performing data cleaning processing on the first real house source data based on the original house source characteristics to eliminate abnormal house source data.
In the embodiment of the present disclosure, the data cleaning processing on the first real house source data may be data cleaning processing on the first real house source data based on the original house source characteristics according to a preset data cleaning rule.
For example, the original house source characteristics of each first real house source data are matched with the preset data cleansing rule. And determining second real house source data of which the original house source characteristics accord with the preset data cleaning rule according to the matching result, thereby determining abnormal house source data of which the original house source characteristics do not accord with the preset data cleaning rule. And deleting abnormal house source data of which the original house source characteristics do not accord with preset data cleaning rules in the first real house source data, wherein the abnormal house source data comprise the first real house source data with the missing original house source characteristics or the first real house source data with the wrong original house source characteristics and the like.
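A rule-matching pass of this kind might look like the sketch below. The rule tables, limits, field names and units are all hypothetical placeholders; the disclosure leaves the concrete cleaning rules to a configurable rule engine.

```python
# Hypothetical cleaning rules (prices in units of ten thousand yuan).
PRICE_CAP_BY_DISTRICT = {"District B": 1000}
AREA_CAP_BY_COMMUNITY = {"Happy Community": 200}  # square meters
REQUIRED_FEATURES = ("district", "community", "price", "area")


def conforms_to_rules(record: dict) -> bool:
    """Return True if the record passes every cleaning rule."""
    # A record with a missing characteristic is treated as abnormal.
    if any(record.get(name) is None for name in REQUIRED_FEATURES):
        return False
    price_cap = PRICE_CAP_BY_DISTRICT.get(record["district"])
    if price_cap is not None and record["price"] > price_cap:
        return False
    area_cap = AREA_CAP_BY_COMMUNITY.get(record["community"])
    if area_cap is not None and record["area"] > area_cap:
        return False
    return True


def clean(records):
    """Keep only the records that conform to the cleaning rules."""
    return [record for record in records if conforms_to_rules(record)]


base = {"district": "District B", "community": "Happy Community"}
records = [
    {**base, "price": 300, "area": 90},    # conforms
    {**base, "price": 5000, "area": 90},   # price above the district cap
    {**base, "price": 300, "area": None},  # missing characteristic
    {**base, "price": 300, "area": 900},   # area above the community cap
]
cleaned = clean(records)
```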
Step 140, mapping the original house source characteristics in the second real house source data to standard house source characteristics, and obtaining first standard house source data based on the standard house source characteristics.
Illustratively, the original house source characteristics in each second real house source data are extracted, and the original house source characteristics are mapped to the standard house source characteristics according to the set field table. And combining the standard house source characteristics according to a set sequence to obtain first standard house source data. The set field table is a table of standard format of the house source characteristics, the standard format can be defined by research personnel, and mapping rules of the original house source characteristics and the standard house source characteristics are stored in the set field table.
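One way to picture the mapping through a set field table is the sketch below. The table contents, standard values and field order are invented for illustration; values with no mapping entry are passed through unchanged, which is an assumption rather than something the text specifies.

```python
# Hypothetical set field table: per field, original value -> standard value.
FIELD_TABLE = {
    "orientation": {"S": "south", "facing south": "south", "N": "north"},
    "house_type": {"2BR": "2 bedrooms", "two-bedroom": "2 bedrooms"},
}
# Hypothetical set sequence in which standard characteristics are combined.
STANDARD_ORDER = ["city", "district", "house_type", "orientation"]


def to_standard(record: dict) -> dict:
    """Map original characteristics to standard ones, in the set order."""
    standard = {}
    for field in STANDARD_ORDER:
        value = record.get(field)
        mapping = FIELD_TABLE.get(field, {})
        # Values without a mapping entry are kept as they are.
        standard[field] = mapping.get(value, value)
    return standard


source_m = {"orientation": "S", "house_type": "2BR", "city": "City A", "district": "District B"}
source_n = {"city": "City A", "district": "District B", "house_type": "two-bedroom", "orientation": "facing south"}
standard_m = to_standard(source_m)
standard_n = to_standard(source_n)
```

In this example the two differently formatted records end up identical after mapping, which is what allows a later aggregation pass over the standard house source data to detect them as duplicates.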
According to the technical scheme of the embodiment of the disclosure, the original house source characteristics of original house source data in different formats acquired from a plurality of data sources are extracted; data aggregation processing is performed on the original house source data based on the original house source characteristics, and first real house source data are obtained based on the aggregation result; data cleaning processing is performed on the first real house source data based on the original house source characteristics, and second real house source data are obtained based on the cleaning result; and the original house source characteristics in the second real house source data are mapped into standard house source characteristics according to the set field table, and first standard house source data are obtained based on the standard house source characteristics. The house source data from different sources are thereby deduplicated, verified and standardized, and the processed house source data are displayed to the user in a unified format. This avoids the large amount of duplicate and false house source data that results from directly integrating multi-source heterogeneous data, improves the accuracy of the house source data displayed to the user, and improves the user stickiness of the network platform.
Fig. 2 is a flowchart of another processing method for multi-source heterogeneous data according to an embodiment of the present disclosure, and the embodiment is specifically optimized based on various alternatives in the foregoing embodiments. As shown in fig. 2, the method comprises the steps of:
Step 201, determining original house source characteristics in original house source data.
Step 202, mapping the original house source characteristics to a set digital interval to obtain a characteristic vector of the original house source characteristics.
For example, it may be specified that the region information in the geographic position of a second-hand house in Beijing is mapped to 1, that of a second-hand house in Shanghai to 2, that of a second-hand house in Guangzhou to 3, that of a second-hand house in Shenzhen to 4, and so on, so that the region information in the geographic position in the description information of the second-hand house is mapped to the set digital interval according to the set rule.
In addition, second-hand houses in Beijing can be mapped according to their administrative districts. For example, it may be predefined that the administrative district in the geographic position of a second-hand house in Haidian District is mapped to 1, that of a second-hand house in Chaoyang District to 2, that of a second-hand house in Xicheng District to 3, and so on, so that the administrative district in the geographic position in the description information of the second-hand house is mapped to the set digital interval according to the set rule.
In addition, the residential communities in which second-hand houses are located are numbered according to community name, and the community number is used as the mapped value of the community in the description information of the second-hand house. In addition, the house type in the description information of the second-hand house is mapped to the set digital interval according to the house type information. For example, it may be specified that a house type of one bedroom and one living room is mapped to 11, one bedroom and two living rooms to 12, two bedrooms and one living room to 21, three bedrooms and one living room to 31, three bedrooms and two living rooms to 32, and so on.
In addition, the numerical value after the floor mapping is determined according to the floor in the description information of the second-hand house. For example, assuming that the source of the second-hand house is located on the first floor, the floor in the description information of the second-hand house is mapped to 1.
In addition, each of the eight orientations (east, south, west, north, southeast, northeast, southwest and northwest) is assigned in advance a value from 1 to 8, so that the orientation in the description information of the second-hand house is mapped to the set digital interval. For example, if the second-hand house faces south and south has been assigned the value 1, the orientation in the description information of the second-hand house is mapped to 1.
In addition, the price in the description information of the second-hand house is mapped to the set digital interval according to the price interval. For example, price intervals of 0 to 1 million, 1.01 million to 2 million, 2.01 million to 3 million, and so on are preset, with corresponding mapping values of 1, 2, 3, and so on, so that the price in the description information of the second-hand house is mapped to the set digital interval. After the mapping processing, the house source information of the second-hand house can be represented by a one-dimensional vector, which is the feature vector corresponding to the house source information. Optionally, the information of each house source may be stored in array form.
It should be noted that when some item of description information in the house source information is empty, the item of description information is mapped to 0, for example, when the orientation information in the house source information is empty, the orientation information is mapped to 0. In addition, the numerical values of the mapping operations recited in the embodiments of the present disclosure are examples and are not limiting.
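The mapping rules above can be sketched as a small encoder. The code values follow the examples in the text (city codes, house type as bedrooms and living rooms, price intervals of 1 million, empty items mapped to 0). The field names, the expression of the price in units of ten thousand, and all orientation codes other than south are assumptions made for this sketch. Values that are not in a code table are also mapped to 0 here, which is a choice of the sketch rather than something the disclosure states.

```python
import math
from typing import Dict, List, Optional, Tuple

CITY_CODES: Dict[str, int] = {"Beijing": 1, "Shanghai": 2,
                              "Guangzhou": 3, "Shenzhen": 4}

# Only "south" -> 1 comes from the text; the other assignments are assumed.
ORIENTATION_CODES: Dict[str, int] = {
    "south": 1, "east": 2, "west": 3, "north": 4,
    "southeast": 5, "northeast": 6, "southwest": 7, "northwest": 8,
}


def encode_layout(layout: Optional[Tuple[int, int]]) -> int:
    """(bedrooms, living rooms) -> 11, 12, 21, 31, 32, ...; empty -> 0."""
    if layout is None:
        return 0
    bedrooms, living_rooms = layout
    return bedrooms * 10 + living_rooms


def encode_price(price_wan: Optional[float]) -> int:
    """Price in units of ten thousand: 0-100 -> 1, 101-200 -> 2, ...; empty -> 0."""
    if price_wan is None:
        return 0
    return max(1, math.ceil(price_wan / 100))


def encode(listing: Dict[str, object]) -> List[int]:
    """Build the one-dimensional feature vector for one house source."""
    return [
        CITY_CODES.get(listing.get("city"), 0),
        encode_layout(listing.get("layout")),
        listing.get("floor") or 0,
        ORIENTATION_CODES.get(listing.get("orientation"), 0),
        encode_price(listing.get("price_wan")),
    ]
```

For instance, a two-bedroom, one-living-room house on the first floor in Beijing, facing south and priced at 1.5 million, would be encoded as `[1, 21, 1, 1, 2]` under these assumptions.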
Step 203, calculating the distance between the feature vectors.
Step 204, determining whether the distance between the feature vectors corresponding to the house source information exceeds a set first threshold; if so, executing step 205; otherwise, executing step 206.
Step 205, determining that the feature vectors participating in the distance calculation belong to different house source sets.
If the distance exceeds the set first threshold, it is determined that the feature vectors participating in the distance calculation belong to different house source sets.
Step 206, determining that the feature vectors participating in the distance calculation belong to the same house source set.
If the distance does not exceed the set first threshold, it is determined that the feature vectors participating in the distance calculation belong to the same house source set.
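Steps 203 to 206 can be sketched as a pairwise comparison followed by grouping. Two assumptions are made here that the steps themselves do not state: the distance is taken to be the Euclidean distance, and membership is treated as transitive (if A is close to B and B is close to C, all three fall into one house source set), which is implemented with a simple union-find structure.

```python
import math
from typing import Dict, List, Sequence


def group_by_distance(vectors: Sequence[Sequence[float]],
                      threshold: float) -> List[List[int]]:
    """Return house source sets as lists of indices into `vectors`."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Follow parent links to the set representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            # Distance not exceeding the first threshold: same set.
            if math.dist(vectors[i], vectors[j]) <= threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The pairwise loop is quadratic in the number of house sources, so a real system with many listings would likely need some form of pre-bucketing; that is outside the scope of this sketch.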
Step 207, determining first real house source data in the original house source data contained in each house source set according to a preset data aggregation rule.
Illustratively, at least one first real house source data meeting the preset data aggregation rule is screened out from each house source set according to the preset data aggregation rule.
Step 208, performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result.
Step 209, mapping the original house source characteristics in the second real house source data into standard house source characteristics, and obtaining first standard house source data based on the standard house source characteristics.
Step 210, according to a preset data aggregation rule, performing data aggregation processing on the first standard house source data based on standard house source characteristics, and determining second standard house source data in the first standard house source data according to an aggregation result.
Illustratively, the standard house source characteristics of each first standard house source data are matched against the preset data aggregation rule, and according to the matching result, the second standard house source data, whose standard house source characteristics satisfy the preset data aggregation rule, are determined.
Step 211, performing data cleaning processing on the second standard house source data based on the standard house source characteristics according to a preset data cleaning rule, determining third standard house source data according to a cleaning result, and displaying the third standard house source data.
In this embodiment of the disclosure, the data cleaning processing on the second standard house source data based on the standard house source characteristics may be a process of determining and deleting abnormal house source data in the second standard house source data, where at least one standard house source characteristic of the abnormal house source data does not conform to the preset data cleaning rule.
Illustratively, the standard house source characteristics of each second standard house source data are matched against the preset data cleaning rule, and according to the matching result, the third standard house source data, whose standard house source characteristics conform to the preset data cleaning rule, are determined. The third standard house source data are displayed on the house source display system, so that a user accessing the house source display system through the Internet can browse the third standard house source data in a uniform format.
Step 212, obtaining user feedback information for the third standard house source data, and determining false house source data in the third standard house source data according to the user feedback information.
In the present disclosure, the user feedback information may be feedback information about false house sources. Target third standard house source data that are repeatedly reported as false house sources can be marked as false house source data according to the user feedback information.
Step 213, deleting the false house source data in the third standard house source data to obtain fourth standard house source data, and displaying the fourth standard house source data.
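Steps 212 and 213 can be sketched as follows. The disclosure does not state how much feedback is needed before a house source is marked as false, so the report-count threshold `min_reports` and the `id` field are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def remove_false_listings(listings: List[Dict[str, object]],
                          reports: Iterable[str],
                          min_reports: int = 3) -> Tuple[List[Dict[str, object]], Set[str]]:
    """Drop listings reported as false at least `min_reports` times.

    `reports` holds one standard data identifier per user report.
    Returns the remaining listings and the identifiers marked as false.
    """
    counts = Counter(reports)
    false_ids = {listing_id for listing_id, n in counts.items()
                 if n >= min_reports}
    remaining = [item for item in listings if item["id"] not in false_ids]
    return remaining, false_ids
```

In this sketch `listings` corresponds to the third standard house source data and `remaining` to the fourth standard house source data.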
According to the technical scheme, original house source data before being displayed on the house source display system are subjected to data aggregation processing and data cleaning processing based on original house source characteristics, after the original house source characteristics are mapped into standard house source characteristics, the standard house source data are subjected to data aggregation processing and data cleaning processing based on the standard house source characteristics, and after the standard house source data are displayed on the house source display system, false house source data in the displayed standard house source data can be deleted based on user feedback information, authenticity of house sources displayed on the house source display system is effectively improved, and the situation that false house sources and repeated house sources are uploaded to the house source display system is avoided.
On the basis of the above alternatives, the following may further be added: after the first standard house source data are obtained based on the standard house source characteristics, an original data identifier of the original house source data is acquired, where the original data identifier is the identification information of the original house source data in an original data source, and the original data source can be understood as the source of the original data, such as the Internet or a real estate broker; and a standard data identifier of the first standard house source data is acquired, and the standard data identifier and the original data identifier are stored in association. Because the original data identifier is the identification information of the original house source data in the original data source, it uniquely represents the original house source data. Storing the original data identifier of at least one original house source data in association with the standard data identifier of the first standard house source data enables synchronous modification of the data. For example, assuming that the original data identifiers of a plurality of original house source data are all associated with a certain standard data identifier, when a modification of an original house source characteristic of one of the original house source data is detected, the original data identifiers of the original house source data associated with that standard data identifier are obtained, and the corresponding original house source characteristic is modified in the original house source data corresponding to each of those original data identifiers.
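The associated storage and synchronous modification can be sketched with a two-way index. The class and function names, the representation of an original data identifier as a `(source, id)` tuple, and the in-memory `store` dictionary are all hypothetical; a real system would presumably keep this association in a database.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class IdentifierIndex:
    """Associates each standard data identifier with its original identifiers."""

    def __init__(self) -> None:
        self._originals: Dict[Hashable, Set[Hashable]] = defaultdict(set)
        self._standard: Dict[Hashable, Hashable] = {}

    def associate(self, standard_id: Hashable, original_id: Hashable) -> None:
        """Store the standard and original identifiers in association."""
        self._originals[standard_id].add(original_id)
        self._standard[original_id] = standard_id

    def associated_originals(self, original_id: Hashable) -> Set[Hashable]:
        """All original identifiers sharing this one's standard identifier."""
        standard_id = self._standard[original_id]
        return set(self._originals[standard_id])


def propagate_change(index: IdentifierIndex,
                     store: Dict[Hashable, Dict[str, object]],
                     original_id: Hashable,
                     feature: str,
                     value: object) -> None:
    """Apply a detected modification to every associated original record."""
    for oid in index.associated_originals(original_id):
        store[oid][feature] = value
```

A modification detected on one original record is thus written to every original record that maps to the same standard house source, while records associated with other standard identifiers are left untouched.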
Fig. 3 is a block diagram of a processing apparatus for multi-source heterogeneous data according to an embodiment of the present disclosure, which may be implemented by software and/or hardware, and is generally integrated in an electronic device, and may present, to a user, house source data from different sources and in different formats in a unified format by performing a processing method for multi-source heterogeneous data. As shown in fig. 3, the apparatus includes:
a characteristic determining module 310, configured to determine an original house source characteristic in original house source data;
the data aggregation module 320 is configured to perform data aggregation processing on the original house source data based on the original house source characteristics, and determine first real house source data in the original house source data according to an aggregation result;
a data cleansing module 330, configured to perform data cleansing processing on the first real room source data based on the original room source characteristics, and determine, according to a cleansing result, second real room source data in the first real room source data, where the data cleansing processing is a process of determining and deleting abnormal room source data, at least one room source characteristic of the abnormal room source data does not conform to a preset data cleansing rule, and the room source characteristic is the original room source characteristic or a standard room source characteristic;
the feature mapping module 340 is configured to map an original room source feature in the second real room source data to a standard room source feature, and obtain first standard room source data based on the standard room source feature.
According to the processing device for multi-source heterogeneous data, the original house source characteristics of original house source data in different formats acquired from a plurality of data sources are extracted; data aggregation processing is performed on the original house source data based on the original house source characteristics, and first real house source data are obtained based on the aggregation result; data cleaning processing is performed on the first real house source data based on the original house source characteristics, and second real house source data are obtained based on the cleaning result; and the original house source characteristics in the second real house source data are mapped into standard house source characteristics according to the set field table, and first standard house source data are obtained based on the standard house source characteristics. The house source data from different sources are thereby deduplicated, verified and standardized, and the processed house source data are displayed to the user in a unified format. This avoids the large amount of duplicate and false house source data that results from directly integrating multi-source heterogeneous data, improves the accuracy of the house source data displayed to the user, and improves the user stickiness of the network platform.
In the foregoing solution, optionally, the data aggregation module 320 includes:
a house source grouping submodule, configured to determine the feature vectors of the original house source data and to group the original house source data according to the distances between the feature vectors to obtain house source sets;
and a house source aggregation submodule, configured to determine first real house source data in the original house source data contained in each house source set according to a preset data aggregation rule.
Further, the house source grouping submodule is specifically configured to:
mapping the original house source characteristics to a set digital interval to obtain characteristic vectors of the original house source characteristics;
calculating distances between the feature vectors;
if the distance does not exceed a set first threshold, determining that the original house source data corresponding to the feature vectors participating in distance calculation belong to the same house source set;
and if the distance exceeds a set first threshold, determining that the original house source data corresponding to the feature vector participating in the distance calculation belong to different house source sets.
In the foregoing scheme, optionally, the data cleansing module 330 is specifically configured to:
matching the original house source characteristics of each first real house source data with a preset data cleaning rule;
and determining second real house source data of which the original house source characteristics accord with the preset data cleaning rule according to the matching result.
In the foregoing solution, optionally, the processing apparatus is further configured to:
after the first standard house source data are obtained based on the standard house source characteristics, perform data aggregation processing on the first standard house source data based on the standard house source characteristics, and determine second standard house source data in the first standard house source data according to the aggregation result.
In the above solution, optionally, the processing apparatus is further configured to:
after the second standard house source data in the first standard house source data are determined according to the aggregation result, perform data cleaning processing on the second standard house source data based on the standard house source characteristics, determine third standard house source data according to the cleaning result, and display the third standard house source data.
Further, the processing apparatus is further configured to:
after the third standard house source data are displayed, obtain user feedback information for the third standard house source data, and determine false house source data in the third standard house source data according to the user feedback information;
and delete the false house source data in the third standard house source data to obtain fourth standard house source data, and display the fourth standard house source data.
In the above solution, optionally, the processing apparatus is further configured to:
after the first standard house source data are obtained based on the standard house source characteristics, acquire an original data identifier of the original house source data, wherein the original data identifier is the identification information of the original house source data in an original data source;
and acquire a standard data identifier of the first standard house source data, and store the standard data identifier and the original data identifier in association.
Optionally, the processing apparatus for multi-source heterogeneous data further includes a capture module, configured to obtain original house source data from different original data sources, and store the original house source data in the local database.
The workflow of the processing apparatus for multi-source heterogeneous data is explained below with reference to a block diagram. Fig. 4 is a block diagram of a processing flow of multi-source heterogeneous data according to an embodiment of the present disclosure. As shown in fig. 4, after a data acquisition event is triggered, the capture module 410 acquires original house source data from different original data sources and stores the original house source data in the local database. For example, a data acquisition event may be triggered periodically, at a set period, to acquire original house source data from an original data source. For another example, a data acquisition event may be triggered when it is detected that the original data source has published original house source data, so as to acquire the original house source data from the original data source. It should be noted that there are many ways to trigger the data acquisition event, and the disclosure is not specifically limited in this respect. The characteristic determining module 420 sequentially reads the original house source data from the local database, and extracts the original house source identifier and the original house source characteristics of the original house source data. Which original house source characteristics are extracted may be determined from the set field table maintained in the presentation module 460. The data aggregation module 430 performs data aggregation processing on the original house source data based on the original house source characteristics to determine first real house source data from the original house source data.
The data cleaning module 440 performs data cleaning processing on the first real house source data based on the original house source characteristics to determine second real house source data from the first real house source data. The feature mapping module 450 maps the original house source characteristics in the second real house source data into standard house source characteristics according to the set field table, and obtains the first standard house source data based on the standard house source characteristics. The first standard house source data are presented through the presentation module 460. Since one standard house source data may correspond to multiple original house source data, that is, multiple original house source identifiers may be mapped to one standard house source identifier, the synchronization module 470 stores the standard data identifier and the original data identifiers in association in order to facilitate data synchronization.
In order to improve the accuracy of the displayed data, after the original house source characteristics are mapped to the standard house source characteristics, the data aggregation module 430 may perform data aggregation processing on the standard house source data based on the standard house source characteristics according to a preset data aggregation rule to obtain second standard house source data. Optionally, the data cleaning module 440 performs data cleaning processing on the second standard house source data based on the standard house source characteristics according to a preset data cleaning rule, so as to obtain third standard house source data. The third standard house source data is presented through the presentation module 460.
The embodiment of the disclosure further provides an electronic device, and the processing device of the multi-source heterogeneous data provided by the embodiment of the disclosure can be integrated in the electronic device. The electronic device of the disclosed embodiment includes a terminal device or a server, wherein the terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure. The electronic device may include: one or more processors; and
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method comprising:
determining original house source characteristics in original house source data;
performing data aggregation processing on the original house source data based on the original house source characteristics, and determining first real house source data in the original house source data according to an aggregation result;
performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result, wherein the data cleaning processing is a process of determining and deleting abnormal house source data, at least one house source characteristic of the abnormal house source data does not accord with a preset data cleaning rule, and the house source characteristic is the original house source characteristic or a standard house source characteristic;
and mapping the original house source characteristics in the second real house source data into standard house source characteristics, and obtaining first standard house source data based on the standard house source characteristics.
It should be understood that the illustrated electronic device 500 is merely an example, and that the electronic device 500 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The following describes in detail an electronic device integrated with a processing apparatus for multi-source heterogeneous data according to this embodiment.
As shown in fig. 5, electronic device 500 may include a processor (e.g., central processing unit, graphics processor, etc.) 520 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)530 or a program loaded from memory 510 into a Random Access Memory (RAM) 540. In the RAM540, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processor 520, the ROM530, and the RAM540 are connected to each other through a bus 550. An input/output (I/O) interface 560 is also connected to bus 550.
Generally, the following devices may be connected to the I/O interface 560: input devices 580 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, or the like; output devices 590 including, for example, a Liquid Crystal Display (LCD), speaker, vibrator, etc.; and a memory 510 including, for example, a tape, a hard disk, and the like. The electronic device 500 may also include a communication device 570. The communication device 570 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method for processing multi-source heterogeneous data provided by embodiments of the present disclosure. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device, or installed from the memory, or installed from the ROM. When the computer program is executed by the processor, the above-described functions defined in the processing method of multi-source heterogeneous data of the embodiment of the present disclosure are performed.
It should be noted that the computer readable medium in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the embodiments of the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method of:
determining original house source characteristics in original house source data;
performing data aggregation processing on the original house source data based on the original house source characteristics, and determining first real house source data in the original house source data according to an aggregation result;
performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result;
and mapping the original house source characteristics in the second real house source data into standard house source characteristics, and obtaining first standard house source data based on the standard house source characteristics.
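The last step above, mapping original house source characteristics to standard ones, can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the source names (`source_a`, `source_b`), the original field names, the standard field names and the converters are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical per-source mapping tables (assumption, not from the disclosure):
# original characteristic name -> (standard characteristic name, value converter).
FIELD_MAPS: Dict[str, Dict[str, Tuple[str, Callable[[Any], Any]]]] = {
    "source_a": {
        "mianji": ("area_sqm", float),
        "huxing": ("layout", str),
        "zujin": ("monthly_rent", float),
    },
    "source_b": {
        "area": ("area_sqm", lambda v: float(str(v).replace("m2", ""))),
        "rooms": ("layout", str),
        "price": ("monthly_rent", float),
    },
}


def to_standard(record: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Map one record's original characteristics to standard characteristics."""
    mapping = FIELD_MAPS[source]
    standard: Dict[str, Any] = {}
    for original_name, value in record.items():
        # Characteristics without a mapping entry are dropped in this sketch.
        if original_name in mapping:
            standard_name, convert = mapping[original_name]
            standard[standard_name] = convert(value)
    return standard
```

Records from different sources then share one schema, so later aggregation and cleaning can compare them directly.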
Of course, the storage medium provided by the embodiments of the present disclosure contains computer-executable instructions, and the computer-executable instructions are not limited to the operations of false house source identification as described above, and may also perform related operations in the processing method of multi-source heterogeneous data provided by any embodiment of the present disclosure.
The processing device, the storage medium and the electronic device for multi-source heterogeneous data provided in the above embodiments can execute the processing method for multi-source heterogeneous data provided in any embodiment of the present disclosure, and have the corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in the above embodiments, reference may be made to the processing method of multi-source heterogeneous data provided in any embodiment of the present disclosure.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the names of the modules and units do not limit the modules or units themselves.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other combinations of features described above or equivalents thereof without departing from the spirit of the disclosure. One example is a technical solution formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) the present disclosure.

Claims (10)

1. A multi-source heterogeneous data processing method is characterized by comprising the following steps:
determining original house source characteristics in original house source data;
performing data aggregation processing on the original house source data based on the original house source characteristics, and determining first real house source data in the original house source data according to an aggregation result;
performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result, wherein the data cleaning processing is a process of determining and deleting abnormal house source data, at least one house source characteristic of the abnormal house source data does not accord with a preset data cleaning rule, and the house source characteristic is the original house source characteristic or a standard house source characteristic;
mapping the original house source characteristics in the second real house source data into standard house source characteristics, and obtaining first standard house source data based on the standard house source characteristics;
wherein performing data aggregation processing on the original house source data based on the original house source characteristics, and determining first real house source data in the original house source data according to an aggregation result, comprises:
determining the characteristic vectors of the original house source data, and performing grouping processing on the original house source data according to the distance between the characteristic vectors to obtain a house source set;
determining first real house source data in original house source data contained in each house source set according to a preset data aggregation rule;
the data aggregation process is a process of aggregating house source data having the same or similar house source characteristics together.
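As a sketch of how first real house source data might be picked from each house source set, the example below keeps one record per set. The claim only says a "preset data aggregation rule" is used; the rule here (prefer the record with the most filled-in characteristics, break ties by the latest `updated_at`) and the field names are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple


def pick_real_listings(house_sets: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Return one representative record per house source set.

    Assumed aggregation rule (hypothetical): the record with the most
    non-empty characteristics wins; ties go to the latest 'updated_at'
    (ISO date strings compare correctly as text).
    """

    def score(record: Dict[str, Any]) -> Tuple[int, str]:
        filled = sum(
            1
            for name, value in record.items()
            if name != "updated_at" and value not in (None, "")
        )
        return (filled, record.get("updated_at", ""))

    # Empty sets are skipped; every non-empty set yields exactly one record.
    return [max(group, key=score) for group in house_sets if group]
```

Any other deterministic rule (for example, trusting one data source over another) could be substituted for `score` without changing the surrounding flow.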
2. The method according to claim 1, wherein determining feature vectors of the original house source data, and performing grouping processing on the original house source data according to distances between the feature vectors to obtain a house source set comprises:
mapping the original house source characteristics to a set digital interval to obtain characteristic vectors of the original house source characteristics;
calculating distances between the feature vectors;
if the distance does not exceed a set first threshold value, determining that the original house source data corresponding to the characteristic vector participating in distance calculation belong to the same house source set;
and if the distance exceeds a set first threshold, determining that the original house source data corresponding to the feature vector participating in the distance calculation belong to different house source sets.
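The grouping in claim 2 can be illustrated with a short sketch. Several choices here are assumptions rather than details from the claims: the "set digital interval" is taken to be [0, 1] with min-max scaling, the distance is Euclidean, and pairs within the threshold are merged with a union-find structure. Merging this way is transitive, so two records can end up in one set through a chain of close neighbours even if their own distance exceeds the threshold; a stricter reading of the claim would need a different grouping strategy.

```python
import math
from typing import Dict, List, Sequence


def to_unit_interval(value: float, low: float, high: float) -> float:
    """Min-max scale a numeric characteristic into [0, 1], clamping outliers."""
    if high == low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))


def group_by_distance(vectors: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group vector indices; pairs with distance <= threshold share a set."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Follow parent links to the set representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) <= threshold:
                parent[find(j)] = find(i)  # merge the two sets

    groups: Dict[int, List[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

The pairwise loop is quadratic in the number of records, which is acceptable for a sketch; a production system would likely block or index records first.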
3. The method according to claim 1, wherein performing data cleaning processing on the first real house source data based on the original house source characteristics, and determining second real house source data in the first real house source data according to a cleaning result comprises:
matching the original house source characteristics of each first real house source data with a preset data cleaning rule;
and determining second real house source data of which the original house source characteristics accord with the preset data cleaning rule according to the matching result.
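A minimal sketch of the rule matching in claim 3 follows. The rule table, the characteristic names and the numeric bounds are hypothetical; treating a missing characteristic as a rule violation is also an assumption of this example.

```python
from typing import Any, Callable, Dict, List

# Hypothetical preset cleaning rules (assumption, not from the disclosure):
# characteristic name -> predicate that a valid value must satisfy.
CLEANING_RULES: Dict[str, Callable[[Any], bool]] = {
    "area_sqm": lambda v: isinstance(v, (int, float)) and 5 <= v <= 1000,
    "monthly_rent": lambda v: isinstance(v, (int, float)) and v > 0,
}


def clean(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep records whose characteristics satisfy every cleaning rule.

    A record is treated as abnormal, and dropped, as soon as at least one
    ruled characteristic is missing or fails its predicate.
    """
    kept = []
    for record in records:
        if all(
            name in record and rule(record[name])
            for name, rule in CLEANING_RULES.items()
        ):
            kept.append(record)
    return kept
```

Keeping the rules in a table separates the preset rules from the matching logic, so rules can be added without touching `clean`.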
4. The method of claim 1, after obtaining the first standard house source data based on the standard house source characteristics, further comprising:
and performing data aggregation processing on the first standard house source data based on standard house source characteristics, and determining second standard house source data in the first standard house source data according to an aggregation result.
5. The method of claim 4, after determining second standard house source data in the first standard house source data according to the aggregation result, further comprising:
and performing data cleaning processing on the second standard house source data based on the standard house source characteristics, determining third standard house source data according to a cleaning result, and displaying the third standard house source data.
6. The method of claim 5, further comprising, after displaying the third standard house source data:
acquiring user feedback information for the third standard house source data, and determining false house source data in the third standard house source data according to the user feedback information;
and deleting the false house source data in the third standard house source data to obtain fourth standard house source data, and displaying the fourth standard house source data.
7. The method according to any one of claims 1 to 6, further comprising, after obtaining first standard house source data based on the standard house source characteristics:
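The feedback step in claim 6 could look roughly like the sketch below. The claim does not say how feedback is turned into a "false" decision; counting reports from distinct users against a threshold (`min_reports`) is an assumption, as are the field names `user_id`, `listing_id` and `reported_false`.

```python
from collections import Counter
from typing import Any, Dict, List


def remove_reported(
    listings: List[Dict[str, Any]],
    feedback: List[Dict[str, Any]],
    min_reports: int = 3,
) -> List[Dict[str, Any]]:
    """Drop listings that at least `min_reports` distinct users flagged as false."""
    # De-duplicate so that repeated reports from one user count only once.
    distinct = {
        (item["user_id"], item["listing_id"])
        for item in feedback
        if item.get("reported_false")
    }
    counts = Counter(listing_id for _, listing_id in distinct)
    # Counter returns 0 for listings nobody reported.
    return [listing for listing in listings if counts[listing["id"]] < min_reports]
```

Requiring several independent reports is one simple guard against a single user removing a genuine listing.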
acquiring an original data identifier of the original house source data, wherein the original data identifier is identifier information of the original house source data in an original data source;
and acquiring a standard data identifier of the first standard house source data, and storing the standard data identifier and the original data identifier in an associated manner.
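One way to store the standard data identifier in association with the original data identifiers, as in claim 7, is a small link table. The sketch uses Python's built-in `sqlite3`; the table name `id_link`, its columns and the identifier formats are hypothetical and not taken from the disclosure.

```python
import sqlite3
from typing import Iterable, List, Tuple


def store_id_links(conn: sqlite3.Connection, links: Iterable[Tuple[str, str, str]]) -> None:
    """Persist (standard_id, source, original_id) associations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_link ("
        " standard_id TEXT NOT NULL,"
        " source TEXT NOT NULL,"
        " original_id TEXT NOT NULL,"
        " PRIMARY KEY (source, original_id))"
    )
    # An original record maps to exactly one standard record; re-inserting
    # the same (source, original_id) pair updates its link.
    conn.executemany("INSERT OR REPLACE INTO id_link VALUES (?, ?, ?)", links)
    conn.commit()


def originals_for(conn: sqlite3.Connection, standard_id: str) -> List[Tuple[str, str]]:
    """Trace a standard record back to its records in the original data sources."""
    rows = conn.execute(
        "SELECT source, original_id FROM id_link"
        " WHERE standard_id = ? ORDER BY source, original_id",
        (standard_id,),
    )
    return rows.fetchall()
```

With such a table, a standard record shown to users can be traced back to every source listing it was derived from.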
8. A device for processing multi-source heterogeneous data, comprising:
the characteristic determining module is used for determining original house source characteristics in the original house source data;
the data aggregation module is used for carrying out data aggregation processing on the original house source data based on the original house source characteristics and determining first real house source data in the original house source data according to an aggregation result;
the data cleaning module is used for performing data cleaning processing on the first real house source data based on the original house source characteristics and determining second real house source data in the first real house source data according to a cleaning result, wherein the data cleaning processing is a process of determining and deleting abnormal house source data, at least one house source characteristic of the abnormal house source data does not accord with a preset data cleaning rule, and the house source characteristics are the original house source characteristics or standard house source characteristics;
the characteristic mapping module is used for mapping the original house source characteristic in the second real house source data into a standard house source characteristic and obtaining first standard house source data based on the standard house source characteristic;
the data aggregation module comprises:
the house source grouping sub-module is used for determining the characteristic vectors of the original house source data and carrying out grouping processing on the original house source data according to the distance between the characteristic vectors to obtain a house source set;
the house source aggregation sub-module is used for determining first real house source data in original house source data contained in each house source set according to a preset data aggregation rule;
the data aggregation process is a process of aggregating house source data having the same or similar house source characteristics together.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a method for processing multi-source heterogeneous data according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for processing multi-source heterogeneous data according to any one of claims 1 to 7 when executing the computer program.
CN201811603354.XA 2018-12-26 2018-12-26 Multi-source heterogeneous data processing method, device, medium and electronic equipment Active CN110618982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811603354.XA CN110618982B (en) 2018-12-26 2018-12-26 Multi-source heterogeneous data processing method, device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811603354.XA CN110618982B (en) 2018-12-26 2018-12-26 Multi-source heterogeneous data processing method, device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110618982A CN110618982A (en) 2019-12-27
CN110618982B true CN110618982B (en) 2022-09-30

Family

ID=68920302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811603354.XA Active CN110618982B (en) 2018-12-26 2018-12-26 Multi-source heterogeneous data processing method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110618982B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552869B (en) * 2020-03-31 2022-04-22 北京城市网邻信息技术有限公司 House source information display method and device
CN111581182A (en) * 2020-04-21 2020-08-25 北京龙云科技有限公司 Data cleaning method and device
CN111798251A (en) * 2020-07-03 2020-10-20 北京字节跳动网络技术有限公司 Verification method and device of house source data and electronic equipment
CN112699289A (en) * 2020-12-30 2021-04-23 上海瑞家信息技术有限公司 House resource information aggregation display method and device, electronic equipment and computer readable medium
CN113192178B (en) * 2021-04-20 2024-02-09 北京异乡旅行网络科技有限公司 House source information processing method, device and system
CN113450163A (en) * 2021-08-30 2021-09-28 贝壳找房(北京)科技有限公司 House source presentation condition analysis method and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197312A (en) * 2018-01-31 2018-06-22 平安好房(上海)电子商务有限公司 Method, device, equipment and readable storage medium for obtaining house source data
CN108197311A (en) * 2018-01-31 2018-06-22 平安好房(上海)电子商务有限公司 House source data aggregation display method, device, equipment and readable storage medium
CN108536825A (en) * 2018-04-10 2018-09-14 苏州市中地行信息技术有限公司 A method for identifying whether house source data is duplicated
CN108763570A (en) * 2018-06-05 2018-11-06 北京拓世寰宇网络技术有限公司 A method and device for identifying identical house sources
CN109035078A (en) * 2018-08-31 2018-12-18 北京诸葛找房信息技术有限公司 A house source aggregation method based on multi-dimensional information similarity calculation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7516128B2 (en) * 2006-11-14 2009-04-07 International Business Machines Corporation Method for cleansing sequence-based data at query time
US20110186633A1 (en) * 2008-08-21 2011-08-04 Akihiro Okabe Electronic shelf label system, commodity price management device, portable terminal device, electronic shelf label device, commodity price management method, commodity price update method, commodity price management program, and commodity price update program
CN106484774B (en) * 2016-09-12 2020-10-20 北京歌华有线电视网络股份有限公司 Correlation method and system for multi-source video metadata
CN106874381B (en) * 2017-01-09 2020-12-22 重庆邮电大学 Radio environment map data processing system based on Hadoop
CN107329852B (en) * 2017-06-09 2020-09-04 广州虎牙信息科技有限公司 Hive-based data processing method and system and terminal equipment

Also Published As

Publication number Publication date
CN110618982A (en) 2019-12-27

Similar Documents

Publication Publication Date Title
CN110618982B (en) Multi-source heterogeneous data processing method, device, medium and electronic equipment
KR102121361B1 (en) Method and device for identifying the type of geographic location where the user is located
US10523768B2 (en) System and method for generating, accessing, and updating geofeeds
CN110633381B (en) Method and device for identifying false house source, storage medium and electronic equipment
CN110633726A (en) Room source identification method and device, storage medium and electronic equipment
CN111046237B (en) User behavior data processing method and device, electronic equipment and readable medium
CN110619039A (en) Method and device for checking house property information, storage medium and electronic equipment
CN110515968B (en) Method and apparatus for outputting information
WO2019047524A1 (en) Information processing method and apparatus
CN111522927A (en) Entity query method and device based on knowledge graph
CN110619553B (en) Commodity information display method and device, electronic equipment and storage medium
CN112860662A (en) Data blood relationship establishing method and device, computer equipment and storage medium
CN111597466A (en) Display method and device and electronic equipment
CN114661811A (en) Data display method and device, electronic equipment and storage medium
CN111311305A (en) Method and system for analyzing user public traffic band based on user track
CN111143408A (en) Event processing method and device based on business rules
JP2023040276A (en) Information processor, information processing method, and program
CN110458743B (en) Community management method, device, equipment and storage medium based on big data analysis
CN112035581B (en) Model-based task processing method, device, equipment and medium
CN109785178B (en) Method and apparatus for generating information
CN111339394B (en) Method and device for acquiring information
CN111241368B (en) Data processing method, device, medium and equipment
CN113239889A (en) Image recognition method, device, equipment, storage medium and computer program product
CN112699289A (en) House resource information aggregation display method and device, electronic equipment and computer readable medium
CN111222048A (en) User number query calculation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant